In a Large Language Model conversation, different roles have a clear hierarchy and have distinctly different abilities to influence the
conversation, define its boundaries, and control the actions of other participants.
Injecting unchecked user inputs in privileged prompts gives unauthorized third parties the ability to break out of contexts and constraints that
you assume the LLM will follow.
Fundamentally, the trio of system
, user
, and assistant
defines the core roles of many Large Language Model
(LLM) interactions. However, the landscape of conversational AI is expanding to include a more diverse set of roles, such as developer, tool,
function, and even more nuanced roles in multi-agent systems.
In essence, the LLM conversation roles are no longer a simple triad. The most important thing to keep in mind is that these roles must stay
coherent to the least privilege principle, where each role has the minimum level of access necessary to perform its function.
What is the potential impact?
When attackers detect privilege discrepancies while injecting into your LLM application, they will try to map out their capabilities in terms of
actions and knowledge extraction, and act accordingly.
Below are some real-world scenarios that illustrate some impacts of an attacker exploiting the vulnerability.
Data manipulation
A malicious prompt injection enables data leakages or possibly impacting the LLM discussions of other users.
Denial of service and code execution
Malicious prompt injections could allow the attacker to possibly leverage internal tooling such as MCP, to delete sensitive or important data, or
to send tremendous amounts of requests to third-party services, leading to financial losses or getting banned from such services.
This threat is
particularly insidious if the attacked organization does not maintain a disaster recovery plan (DRP).