Evaluating clinical AI scribes: a practical vetting framework for practice leaders.
Clinical Advisor at CalmScribe. Licensed Psychotherapist in Private Practice.
PUBLISHED: April 12, 2026 • UPDATED: May 28, 2026
Direct Answer Summary
Evaluate any clinical AI scribe against eight criteria before it touches a single session. Confirm the vendor will sign a Business Associate Agreement, then look past the contract to where PHI actually lives, how long audio and transcripts are retained, and whether client data trains the vendor's models. Verify independent certifications, build a documented client-consent process, and treat clinician review and sign-off as non-negotiable. Format support, EHR integration, clean data export, and transparent pricing round out the checklist.
Ambient AI scribes have moved quickly from pilot curiosity to a standing item on the operations agenda of behavioral health practices. The pitch is consistent: the tool listens to a session, drafts a structured progress note, and gives the clinician back the evenings they currently spend charting. For a field where after-hours documentation is a leading contributor to burnout, that is a serious offer.
It is also an offer that puts a third party between the therapist and the most sensitive data a practice holds. Ambient AI scribing works by capturing the spoken content of a clinical encounter, transcribing it, and mapping that transcript into a note format. Every step in that pipeline involves protected health information, and behavioral health records carry an unusually high sensitivity because they document the contents of private therapeutic conversation.
This guide is a vetting framework, not a product ranking. It will not tell you which scribe to buy. It gives you the eight questions that separate a defensible vendor from a liability, along with the specific things to ask each one and the answers that should make you walk away. Use it as a procurement worksheet: send the questions, get the answers in writing, and keep them on file.
What an ambient scribe actually does
Before evaluating vendors, it helps to be precise about the workflow. An ambient scribe is not a dictation tool. With legacy dictation, the clinician speaks the note aloud, including structure and punctuation. An ambient scribe instead listens passively to the natural back-and-forth of the session, produces a transcript, and uses a language model to reshape that transcript into a draft note in a recognized clinical format.
Draft is the key word. The output is an unsigned draft that the clinician then reads, corrects, and signs. The model does not practice; the clinician does. Every criterion below ultimately serves one goal: making sure the tool produces accurate, private, defensible drafts that a licensed clinician can stand behind.
The eight-criterion vetting framework
1. Will they sign a Business Associate Agreement?
This is the threshold question, and the answer must be an unconditional yes. Under HIPAA, any vendor that creates, receives, maintains, or transmits PHI on a covered entity's behalf is a business associate, and a signed Business Associate Agreement (BAA) is legally required before that vendor touches client data. A scribe that records sessions is squarely in scope.
Ask for the actual BAA template up front, not a marketing promise that one exists. Read it. Confirm it names breach-notification timelines, defines permitted uses of PHI, and addresses what happens to data when the contract ends. A vendor that hesitates, charges extra for a BAA, or only offers one on an enterprise tier is telling you something. That said, a signed BAA assigns liability; it does not prove the engineering underneath is sound. It is the floor, not the ceiling.
2. Data custody: where does PHI live, and what happens to it?
This is where most of the real risk sits, and where vendors vary the most. You are trying to trace the full lifecycle of a recorded session from microphone to deletion. Ask each vendor, in writing:
- Is the raw audio retained after the transcript is generated, or deleted immediately?
- How long are transcripts and finished notes retained, and can the retention window be configured to match our policy?
- Is client data, audio, or transcript ever used to train or fine-tune your models or any third-party model?
- Where is the data physically stored (data residency), and which subprocessors or cloud providers touch it?
- Is PHI encrypted in transit and at rest, and who holds the encryption keys?
- On offboarding, can we export all of our data and have you certify its deletion?
The single most important answer is the training question. Training models on client conversations is a material privacy concern for a behavioral health practice, and the burden is on the vendor to show that such use is contractually and technically prevented. Vague language like "we may use aggregated data to improve our service" deserves a direct follow-up until you understand exactly what data leaves your practice's environment.
3. Certifications, and what they actually mean
Certifications are useful signals, but only if you understand what each one certifies. The common ones cluster into three buckets.
SOC 2 Type II is an independent audit, conducted by CPAs against AICPA criteria, of whether a vendor's security controls operated effectively over a sustained period (typically several months to a year). It is the strongest routine signal that a vendor's security claims are externally verified rather than self-asserted. Worth knowing: SOC 2 is not a statutory HIPAA requirement. HIPAA mandates safeguards; it does not mandate a SOC 2 report. A vendor can be HIPAA-compliant without SOC 2, and a SOC 2 report does not by itself prove HIPAA compliance. In practice, a current SOC 2 Type II report plus a signed BAA is a reasonable baseline. Ask to see the report under NDA and check its date and scope.
HIPAA safeguards refer to the administrative, physical, and technical controls the Security Rule requires. "HIPAA compliant" is a self-attestation; there is no government HIPAA certificate. Ask what specific safeguards back the claim: access controls, audit logging, encryption, workforce training, and a documented risk analysis.
ISO/IEC 42001 is the newer international standard for AI management systems. It speaks directly to the AI-specific risks a scribe introduces: managing model behavior, identifying bias, and governing the safety of generated output. It is not yet common, so its absence is not disqualifying, but its presence is a meaningful sign that the vendor treats the AI layer as a governed system rather than a black box.
4. Client consent and notice
Recording a therapy session through a scribe is not something to bolt on quietly. Clients have a right to know that an AI tool is processing their words, and in many jurisdictions you have a legal obligation to obtain consent before recording at all.
Recording-consent laws split into two models. One-party consent states require only one participant to a conversation to consent to recording. Two-party (also called all-party) consent states require every participant to agree. The clinician's own consent does not satisfy an all-party requirement; the client must consent too. Verify your state's rule, and if you practice across state lines via telehealth, apply the stricter standard.
Beyond the recording statute, informed consent is a clinical and ethical baseline. The client should understand, in plain language, what the tool does, what happens to the audio, and that they can decline without it affecting their care. Ask the vendor how consent is captured and documented in the workflow: a checkbox is not the same as a documented, dated informed-consent conversation in the chart.
5. Clinical accuracy, style fidelity, and human review
Language models can produce fluent text that is confidently wrong. In a clinical note, a fabricated symptom, an invented quote, or a misattributed risk statement is not a typo; it is a safety and liability problem. Hallucination risk is the defining clinical weakness of AI scribes, and the only reliable control is a clinician who reads every draft against their own memory of the session before signing.
This is the non-negotiable. The clinician who signs the note is fully responsible for its content under their license, regardless of which tool drafted it. No vendor claim about accuracy removes the requirement for human-in-the-loop review and sign-off. Evaluate scribes partly on how well they support that review: do they make it easy to compare the draft against source, flag uncertain passages, and edit before signing?
Style fidelity is the quieter accuracy question. A note that is technically correct but does not sound like the clinician, or that drops the specific observable language payers expect, still creates rework. Run a pilot on real (consented) sessions and judge the drafts the way an auditor would, using the same standards as our DAP note guide: are interventions specific, is medical necessity supported, are observations measurable rather than generic?
6. Format support
Behavioral health uses several note structures, and the right one depends on the setting and the payer. Confirm the scribe supports the formats your practice and your contracts actually require: SOAP, DAP, BIRP, and GIRP are the common ones. A tool that only emits a single rigid template will force clinicians to reformat by hand, which erodes the time savings that justified the purchase. Ask whether formats are configurable per clinician and per payer, and whether custom templates are supported.
7. EHR integration and offboarding
A scribe that drafts beautiful notes you then copy and paste into your EHR by hand is a half-built tool. Ask how notes reach the chart: a real integration, a copy-paste workflow, or something in between. Just as important, and easy to forget during a demo, is how you get your data out. Confirm you can export all notes and records in a usable format and that the vendor will delete its copy on request. Offboarding terms belong in the contract before you sign, not in a support ticket two years later.
8. Pricing transparency
Clear, published pricing is a proxy for how a vendor treats customers. Look for per-clinician or per-session costs you can model, and ask directly about the things that often hide in the fine print: minimum seat commitments, overage charges, whether the BAA or certain integrations sit behind a higher tier, and the terms for cancellation. A vendor unwilling to put pricing in writing during evaluation is unlikely to get more transparent after you have signed.
The vetting scorecard and the red flags
Use the first list below as a checklist for every vendor on your shortlist, and the second as a set of disqualifiers. A single red flag is reason to slow down and get a clear written answer before going further.
Vetting scorecard
- Signs a BAA at no extra cost, on the tier we would actually buy.
- States in writing that client data is never used to train models.
- Documents audio and transcript retention, with a configurable deletion policy.
- Provides a current SOC 2 Type II report (under NDA) and names its HIPAA safeguards.
- Supports a documented client-consent step in the workflow.
- Requires explicit clinician review and sign-off before a note is final.
- Supports the note formats and EHR our practice actually uses.
- Lets us export all data and certifies deletion on offboarding.
- Publishes pricing we can model without a sales call.
Red flags
- Will not sign a BAA, or gates it behind a higher pricing tier.
- Cannot say clearly whether client data trains its models, or reserves the right to use it.
- No documented retention or deletion policy for audio and transcripts.
- Cites a certification but will not share the report, date, or scope.
- Markets notes as "ready to file" or implies clinician review is optional.
- No mechanism to capture or document client consent.
- Locks your data in with no clean export or deletion path.
- Opaque pricing, mandatory long contracts, or pressure to skip the pilot.
Sponsorship disclosure
CalmScribe is sponsored by the team behind Nextvisit, an ambient AI clinical scribe. We are telling you that here because it is relevant to this exact topic. This guide does not recommend Nextvisit or any other product, and it deliberately names no vendor specifics. Run every tool you consider, including Nextvisit, through the same eight criteria above, and demand the same written answers from each. A framework only protects your practice if you apply it evenly.
One closing point on scope. These tools handle progress notes, which are part of the official medical record. They are a poor fit for psychotherapy process notes, which HIPAA treats as a separate, more protected category that you keep apart from the chart. If you are unclear on that distinction, it is worth settling before you introduce a scribe; our guide on psychotherapy notes under HIPAA covers where that line sits. For more on the broader documentation pillars, see the full guides hub.
Frequently asked questions
Is a Business Associate Agreement enough for HIPAA compliance?
No. A signed Business Associate Agreement is mandatory before any vendor handles protected health information, but it is a contractual floor, not proof of secure engineering. The BAA assigns legal liability and breach-notification duties. You still have to verify how the vendor stores, encrypts, retains, and deletes PHI, and whether it uses client data to train models. Treat the BAA as the entry ticket, then evaluate the underlying data custody practices and independent audits separately.
Can AI scribe notes be used as the legal clinical record?
Yes, but only after a clinician reviews, edits, and signs the draft. An ambient AI scribe produces an unsigned draft. It becomes part of the legal record when the treating clinician verifies its accuracy and attests to it under their own license. The clinician remains fully responsible for the content of the signed note, including any errors or hallucinations the model introduced. Human review and sign-off is non-negotiable.
Do I need client consent to use an AI scribe?
Yes. Recording or processing a session through an AI scribe requires informed client consent, and many states have specific recording-consent laws. One-party consent states require only one participant to agree, while two-party (all-party) consent states require every participant to agree before recording. Behavioral health sessions involve heightened sensitivity, so document consent explicitly, explain what the tool does with the audio, and offer clients the option to decline without affecting their care.
References & Sources
This framework draws on the HIPAA Privacy and Security Rules, including the definition of protected health information at 45 CFR 160.103 and of psychotherapy notes at 45 CFR 164.501, the AICPA SOC 2 reporting framework, ISO/IEC 42001 for AI management systems, and the American Psychological Association (APA) Record Keeping Guidelines. State recording-consent laws vary; verify the rule that applies in each state where you practice. To suggest corrections, contact our editorial desk.