In a Large Language Model conversation, different roles have a clear hierarchy and have distinctly different abilities to influence the
conversation, define its boundaries, and control the actions of other participants.
Nowadays, the trio of system
, user
, and assistant
defining the core roles of many Large Language Model (LLM)
interactions is expanding to include a more diverse set of roles, such as developer, tool, function, and even more nuanced roles in multi-agent
systems.
Injecting unchecked user input in privileged prompts, such as system
, gives unauthorized third parties the ability to break out of
contexts and constraints that you assume the LLM follows.
What is the potential impact?
When attackers detect privilege discrepancies while injecting into your LLM application, they then map out their capabilities in terms of actions
and knowledge extraction, and then act accordingly.
The impact is very dependent on the "screenplay" of the intended dialogues between model,
user(s), third-parties, tools, which you had in mind while designing the application.
Below are some real-world scenarios that illustrate some impacts of an attacker exploiting the vulnerability.
Data manipulation
A malicious prompt injection enables data leakages or possibly impacting the LLM discussions of other users.
Denial of service and code execution
Malicious prompt injections could allow the attacker to possibly leverage internal tooling such as MCP, to delete sensitive or important data, or
to send tremendous amounts of requests to third-party services, leading to financial losses or getting banned from such services.
This threat is
particularly insidious if the attacked organization does not maintain a disaster recovery plan (DRP).