Ontario auditors find AI note takers fail in healthcare settings

21 May 2026

Ontario's auditor general has uncovered significant flaws in AI-powered note-taking systems used by doctors, raising concerns about patient safety. The audit revealed that these AI scribes, recommended by the provincial government, often produce incorrect, incomplete, and even fabricated information. Such inaccuracies could lead to inadequate or harmful treatment plans and affect patient health outcomes. The findings underscore the need for rigorous evaluation and oversight of AI tools in healthcare as the province tries to balance technological innovation with patient safety.

The promise and pitfalls of AI scribes

AI scribes have been introduced as a solution to alleviate the administrative burdens faced by healthcare professionals. These tools are designed to transcribe patient-doctor interactions into structured medical notes, ostensibly saving time and improving efficiency. However, the recent audit by Ontario's auditor general has highlighted significant shortcomings in these systems. The report found that all 20 government-approved vendors exhibited issues with accuracy or completeness in their AI scribe outputs.

Among the vendors, nine were found to hallucinate patient information, while 12 recorded information incorrectly. These errors are not just technical glitches; they can directly affect patient care. In some cases, AI scribes fabricated referrals or misrepresented medication names, which could lead to serious treatment errors.

The audit also revealed that 17 vendors failed to capture key details about mental health issues discussed during consultations. This omission is particularly concerning given the sensitive nature of mental health care and the importance of accurate documentation in treatment planning. The findings underscore the need for thorough vetting and continuous monitoring of AI tools in healthcare.

Evaluating AI scribe performance

The auditor general's report raised concerns about the evaluation processes of AI scribe vendors prior to their approval for use in Ontario's healthcare system. Despite being pre-qualified by the government, many vendors did not meet essential evaluation criteria. Some vendors did not provide adequate documentation or assessments, which are critical for ensuring reliability and safety.

These gaps in evaluation raise questions about whether existing oversight is sufficient to ensure that AI tools in healthcare are reliable and safe. The report highlighted that the AI scribe systems were not adequately assessed for accuracy, security, and privacy, which are critical factors in safeguarding patient information.

Moreover, bias in AI systems was not sufficiently addressed. Comprehensive evaluations are needed to mitigate the risk of bias, which could otherwise lead to skewed or inaccurate medical documentation. These shortfalls underscore the need for stricter regulatory frameworks and accountability measures in the deployment of AI technologies in healthcare.

Implications for patient care

The inaccuracies in AI scribe outputs have significant implications for patient care. Incorrect or incomplete medical notes can lead to misdiagnoses, inappropriate treatments, and potentially harmful outcomes. The auditor general's report emphasized that these errors could result in inadequate treatment plans, thereby impacting patient health outcomes.

Despite these risks, some doctors already use AI scribes and report saving time on administrative tasks. This time saving is a key selling point, as it allows healthcare providers to focus more on patient care than on paperwork. However, the trade-off between efficiency and accuracy remains a critical concern.

The report's findings highlight the need for healthcare providers to remain vigilant when using AI tools. Doctors are encouraged to review AI-generated notes thoroughly to ensure accuracy and completeness before making clinical decisions. This additional layer of oversight is essential to mitigate the risks associated with AI scribe errors.

Challenges and limitations

The implementation of AI scribes in healthcare is not without its challenges. One major issue is the capability-reliability gap, where AI systems demonstrate impressive capabilities but fall short in reliability. This gap is evident in the persistent inaccuracies and hallucinations observed in AI scribe outputs.

Another challenge is the lack of standardized evaluation criteria for AI tools in healthcare. Without comprehensive assessments of accuracy, security, and privacy, it is difficult to ensure that AI technologies are used safely and effectively.

Furthermore, the issue of bias in AI systems remains a significant concern. Without proper evaluation and mitigation strategies, AI scribes could perpetuate existing biases in medical documentation, leading to unequal treatment outcomes. Addressing these challenges requires a concerted effort from policymakers, healthcare providers, and AI developers to establish robust evaluation frameworks and accountability measures.

Future prospects and considerations

Looking ahead, the future of AI scribes in healthcare will depend on addressing the current challenges and limitations. Enhancing the reliability and accuracy of AI systems is paramount to ensuring patient safety and improving healthcare outcomes. This will require ongoing research and development, as well as rigorous testing and evaluation of AI tools before they are implemented in clinical settings.

Policymakers and healthcare providers must work together to establish clear guidelines and standards for the use of AI in healthcare. This includes developing comprehensive evaluation criteria that address accuracy, security, privacy, and bias mitigation. Such measures will help build trust in AI technologies and ensure their safe and effective integration into healthcare practices.

As AI technologies continue to evolve, it is crucial to remain vigilant and proactive in addressing potential risks and challenges. By fostering collaboration and innovation, the healthcare industry can harness the potential of AI scribes to improve patient care while safeguarding against the pitfalls of inaccuracy and bias.

Frequently Asked Questions

What are AI scribes?

AI scribes are software tools that use artificial intelligence to transcribe spoken interactions between doctors and patients into written medical notes. They are designed to streamline the documentation process, saving time for healthcare providers by reducing the administrative burden associated with manual note-taking.

What issues have been found with AI scribes in Ontario?

The auditor general of Ontario found that AI scribes often produce incorrect, incomplete, and even fabricated information. These inaccuracies can lead to inadequate or harmful treatment plans, posing risks to patient safety. The report also highlighted gaps in how AI scribe vendors were evaluated, including inadequate assessment of accuracy, security, and privacy.

How can the reliability of AI scribes be improved?

Improving the reliability of AI scribes involves rigorous testing and evaluation of the tools before implementation. Establishing standardized evaluation criteria that address accuracy, security, privacy, and bias mitigation is essential. Ongoing research and development, as well as collaboration between policymakers, healthcare providers, and AI developers, are crucial to enhancing the reliability and effectiveness of AI scribes in healthcare.