User experience
When prompts are blocked
When the collector blocks a user prompt, the user sees a banner that includes:
- Message indicating that the prompt was blocked
- Request ID that users can copy and provide to Support
For example:
Malicious Prompt was detected and blocked.
Request ID: prq_b6m7di4yao3lc4q75j5lddx5y7licu5v ⧉
Block messages
By default, the banner displays a summary of the detection. You can replace this with a custom block message to provide context relevant to your organization. For example, include a link to an internal acceptable-use policy or instructions for requesting an exception.
To configure custom block messages, see Block messages in Access Rules and Prompt Rules.
When data is transformed
When the collector transforms data, the AI provider receives the prompt with sensitive values redacted and malicious URLs, IP addresses, and domains defanged. Some sites might still show the original user input in the chat history.
Users see a banner message that includes:
- Message indicating that sensitive data was redacted or malicious references were defanged
- Request ID that users can copy and provide to Support
For example:
Your organization's security policy modified sensitive or malicious content before sending it to the AI provider.
Request ID: prq_b6m7di4yao3lc4q75j5lddx5y7licu5v ⧉
Users see transformed values in AI responses when the model repeats those values in its output.
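The following is a minimal sketch of what redaction and defanging can look like, not AIDR's actual implementation. The function names, the `hxxp` and `[.]` defanging convention, and the input shapes are assumptions chosen for illustration. The `<PERSON>` placeholder mirrors the example later on this page.

```python
import re


def defang(indicator: str) -> str:
    """Make a URL, domain, or IP address non-clickable.

    Uses the common hxxp / [.] convention. This is an assumption:
    the exact output format AIDR produces may differ.
    """
    # Rewrite a leading http:// or https:// scheme to hxxp:// or hxxps://.
    indicator = re.sub(r"^http(s?)://", r"hxxp\1://", indicator, flags=re.IGNORECASE)
    # Bracket every dot so the value no longer resolves as a link.
    return indicator.replace(".", "[.]")


def transform_prompt(prompt: str, sensitive: dict, malicious: list) -> str:
    """Return the prompt as the AI provider would receive it.

    sensitive: hypothetical map of detected value -> placeholder label.
    malicious: hypothetical list of detected URLs, domains, or IP addresses.
    """
    for value, placeholder in sensitive.items():
        prompt = prompt.replace(value, placeholder)
    for indicator in malicious:
        prompt = prompt.replace(indicator, defang(indicator))
    return prompt


if __name__ == "__main__":
    print(
        transform_prompt(
            "Ask Muffin Man about http://evil.example.com/login from 203.0.113.7",
            sensitive={"Muffin Man": "<PERSON>"},
            malicious=["http://evil.example.com/login", "203.0.113.7"],
        )
    )
```

Because the model only receives the transformed text, any placeholder or defanged value it repeats in its answer appears in that form to the user.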
Inconsistent behavior across AI provider sites
AI provider sites handle AIDR security interventions differently based on their client-side web processing. These implementations can change at any time, are outside AIDR's control, and might result in inconsistent user experiences across platforms.
Example
The ChatGPT conversation interface captures user input and updates chat history based on what the AI model processed. Depending on how AIDR processes user input, the displayed conversation might not match what the user originally entered:
- When AIDR transforms data in a user prompt:
  - User enters a prompt containing sensitive data.
  - ChatGPT adds the user input to the chat interface. The input briefly appears unchanged until ChatGPT updates it based on the model's response.
  - AIDR browser collector intercepts the prompt, processes it, and sends the transformed version to the AI model.
  - ChatGPT receives the model response and:
    - Updates the user prompt displayed in the chat interface with the actual prompt received by the model.
    - Adds the model response to the chat history.
  Example exchange:
  - User enters: "Do you know Muffin Man?"
  - User's input is added to the chat history unmodified: "Do you know Muffin Man?"
  - AIDR's Confidential and PII Entity detector replaces the person name with a placeholder before sending the prompt to the AI model.
  - When the model responds:
    - AIDR browser extension shows a banner message.
    - User input in the chat history becomes "Do you know <PERSON>".
    - Model response is added to the chat history and might read: "I do not know who <PERSON> is from that message..."
- When AIDR blocks a user prompt, the behavior differs because no content reaches the AI model:
  - User enters a prompt that AIDR blocks (for example, a prompt with harmful intent that the Malicious Prompt detector catches).
  - ChatGPT adds the user input to the chat interface.
  - AIDR browser collector intercepts the prompt, processes it, and blocks it from being sent to the model.
  - AIDR browser extension shows a banner message.
  - Because no model response arrives, ChatGPT doesn't update the conversation. The user prompt remains in the chat history and can't be removed or modified.
Other AI providers, such as Claude, Gemini, and enterprise platforms, might handle these scenarios differently due to variations in their client-side implementations.
For example, Claude AI currently behaves like ChatGPT when AIDR transforms a user prompt. However, when AIDR blocks a prompt, Claude AI doesn't add it to the conversation.
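The two ChatGPT sequences described above can be modeled with a small simulation. This is an illustrative sketch only: `ChatView` and `submit_prompt` are hypothetical names, not part of AIDR or any provider's API, and the model is deliberately simplified to what the user ends up seeing.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BLOCK_BANNER = "Malicious Prompt was detected and blocked."
TRANSFORM_BANNER = (
    "Your organization's security policy modified sensitive or malicious "
    "content before sending it to the AI provider."
)


@dataclass
class ChatView:
    """Toy model of a ChatGPT-style chat history as the user sees it."""
    history: List[str] = field(default_factory=list)
    banner: Optional[str] = None


def submit_prompt(view: ChatView, prompt: str, action: str,
                  transformed: str = "", reply: str = "") -> ChatView:
    """Simulate one prompt. action is 'allow', 'transform', or 'block'."""
    # The site adds the raw input to the chat interface right away.
    view.history.append(prompt)

    if action == "block":
        # Nothing reaches the model, so no response arrives and the
        # site never updates the prompt it already displayed.
        view.banner = BLOCK_BANNER
        return view

    if action == "transform":
        view.banner = TRANSFORM_BANNER
        # When the response arrives, the site replaces the displayed
        # prompt with the prompt the model actually received.
        view.history[-1] = transformed

    view.history.append(reply)
    return view


if __name__ == "__main__":
    view = submit_prompt(
        ChatView(), "Do you know Muffin Man?", "transform",
        transformed="Do you know <PERSON>",
        reply="I do not know who <PERSON> is from that message...",
    )
    print(view.history)
```

A site that behaves like Claude AI on blocked prompts would differ only in the `block` branch, where the raw prompt would not be added to the history at all.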
Report Only mode
If the input rules in a browser collector policy are set to Report, or the policy is in Report Only Mode, the user experience is unaffected.
AIDR logs detections without blocking prompts or modifying data.
Output rules in browser collector policies always run in Report Only Mode.
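As a rough sketch of how these rules combine, the function below resolves the action that actually affects the user. The function name, parameter names, and action strings are hypothetical and chosen for illustration. They are not AIDR configuration keys.

```python
def effective_action(rule_action: str, direction: str, policy_report_only: bool) -> str:
    """Resolve what the browser collector does for one detection.

    rule_action: 'block', 'transform', or 'report' (hypothetical labels).
    direction: 'input' for user prompts, 'output' for model responses.
    policy_report_only: True when the whole policy is in Report Only Mode.
    """
    # Output rules in browser collector policies always run in Report Only Mode.
    if direction == "output":
        return "report"
    # A policy-level Report Only Mode overrides individual input rules.
    if policy_report_only:
        return "report"
    # Otherwise the input rule's own action applies.
    return rule_action


if __name__ == "__main__":
    print(effective_action("block", "input", policy_report_only=False))
    print(effective_action("block", "input", policy_report_only=True))
    print(effective_action("transform", "output", policy_report_only=False))
```

Whenever the result is `report`, the detection is logged and the user sees no banner, no blocked prompt, and no modified data.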