What if your chatbot misled customers?

Aimdall, an AI-driven testing agent, analyzed 1000+ public chatbots
and flagged several trust, privacy and ethical risks

Overall RRI distribution of tested AI Conversational Agents

Aimdall interacts with chatbots through legitimate yet strategically crafted queries designed to expose problematic responses containing:

ETHICAL RISK

FACTUAL INACCURACIES

OFF-TOPIC DEVIATIONS

PRIVACY CONCERNS

The results are summarized for each site in a Response Risk Index from 1 to 10. The graph shows the distribution of these indices across all analyzed websites.

COMPLIANT & RELIABLE – 8% of chatbots consistently delivered accurate and ethical responses.
POTENTIAL CONCERNS - 54% of chatbots showed occasional issues, like mild bias or inaccurate answers
SIGNIFICANT ISSUES – 38% of chatbots generated biased or hallucinated responses and mishandled sensitive data, showing clear signs of compliance failures and reputational risk.

AI Conversational Agents, also known as chatbots, are rapidly transforming online services by replacing human support agents with low-cost, 24/7 automated assistants. Widely adopted across different industries, these virtual assistants are great at answering FAQs, searching for a specific product in a store, booking appointments, and much more.

AI Agents are not just Large Language Models: they rely on multiple components working together to provide intelligent, contextual, and useful responses within the online services that implement them.

Traditionally, when talking about Security for AI, threat modeling has focused on adversarial behaviors and system vulnerabilities. However, with the rapid deployment of AI Conversational Agents in user-facing roles, a new class of risks has emerged-those stemming from inaccurate or inappropriate responses delivered during seemingly benign interactions.

These are not the result of malicious attacks, but rather systemic limitations in the Agents’ understanding or reasoning that may arise during legitimate use of the system. Our proposed threat model introduces this previously unaddressed risk surface, where flawed outputs and sensitive data mishandling can degrade user experience, erode trust, and expose businesses to reputational and regulatory consequences.

CONTACTS

For inquiries regarding this report, please contact us: [email protected]