Third‑party AI and health data: technical controls and contractual must-haves
A practical checklist for encrypting, restricting, retaining, and contracting safely before sending scanned health records to external AI.
As AI vendors rush to offer “health” features, IT teams are being asked to do something deceptively difficult: move scanned medical records into external systems without creating a privacy, security, or compliance incident. The pressure is real because the value is real too—better search, faster summarization, and support for staff who are buried in intake packets, referral letters, and legacy PDFs. But when the data includes PHI, the bar changes immediately. Before a single scan is sent to a third-party AI provider, teams need a hard checklist for encryption, access control, retention limits, auditability, and vendor contracts that look and behave like a BAA.
The recent rollout of consumer-facing health features underscores why this matters. In the BBC’s reporting on OpenAI’s ChatGPT Health launch, the company said health conversations would be stored separately and not used for model training, but privacy advocates still warned that “airtight” safeguards are essential for sensitive data. That tension captures the enterprise reality perfectly: vendor promises are not a control framework. For practical implementation guidance around secure workflows, see our broader resources on building retrieval datasets for internal AI assistants, the risks of relying on commercial AI, and why human review still matters when accuracy and accountability are on the line.
1) Start with the data classification: not all scans should ever reach third-party AI
Map the record types before you map the workflow
The first technical control is often the most overlooked: determine exactly what you are sending. A scanned image of a prescription label, a PDF of a discharge summary, and a full chart export may all look like “documents,” but they carry very different risk profiles. PHI, ePHI, and any document that could identify a patient should be treated as restricted by default, not merely sensitive. If your team has not classified inputs by content type, origin, and business purpose, you cannot reliably decide whether a third-party AI provider is appropriate at all.
Define the minimum necessary use case
Under HIPAA’s minimum necessary principle, the right question is not “Can the model read the whole record?” It is “What is the smallest subset of data needed to achieve the workflow outcome?” In many cases, OCR, redaction, or field extraction can happen internally before any external AI call. Teams trying to operationalize this should look at adjacent patterns like retrieval dataset design and quantifying avoidable waste; the same discipline applies here. Less data in the prompt means lower breach impact, lower retention risk, and often lower cost.
Build an allowlist, not a wish list
Many organizations make the mistake of allowing “any AI tool approved by procurement” to process regulated records. That is too broad. Create an explicit allowlist for providers, use cases, data classes, and account types. If a vendor cannot demonstrate that it supports your required controls for encryption, regional processing, retention, logging, and contract terms, the default answer should be no. This is especially important when the same provider offers consumer and enterprise products, because product-level privacy promises can differ significantly.
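To make the allowlist enforceable rather than aspirational, encode it where the integration actually lives. The sketch below is a minimal Python illustration of a deny-by-default check; the provider names, use cases, and data classes are hypothetical placeholders for your own approved combinations.

```python
# Minimal sketch of an explicit AI-provider allowlist. Provider names,
# data classes, and use cases here are hypothetical placeholders.
ALLOWLIST = {
    ("summarize_referral", "deidentified_text"): {
        "provider": "approved-vendor-enterprise",
        "account_type": "enterprise",  # consumer tiers are never allowed
        "region": "us",
    },
}

def is_call_allowed(use_case: str, data_class: str, provider: str) -> bool:
    """Deny by default: only explicitly approved combinations may leave the boundary."""
    entry = ALLOWLIST.get((use_case, data_class))
    return entry is not None and entry["provider"] == provider

# Raw PHI scans are not in the allowlist, so this returns False.
print(is_call_allowed("summarize_referral", "raw_phi_scan", "approved-vendor-enterprise"))
```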
2) Encryption is necessary, but it is not enough
Encrypt data in transit with modern transport controls
Every call from your environment to the third-party AI service should use TLS 1.2+ with strong cipher suites, certificate validation, and endpoint verification. If the vendor supports mTLS, private connectivity, or mutually authenticated API gateways, use them. Where possible, route traffic through controlled egress points so you can monitor destinations, enforce policy, and prevent shadow AI use. A lot of enterprise teams already apply this kind of rigor to devices and peripherals; the same “don’t trust the cable by default” mindset you’d use in basic hardware purchasing decisions belongs in your AI integration architecture too.
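Here is a minimal sketch of what that looks like at the call site, assuming the vendor exposes an HTTPS API and supports client certificates. The endpoint URL and certificate paths are placeholders; the point is pinned server validation plus mutual authentication rather than trusting the default trust store blindly.

```python
import requests

# Hypothetical vendor endpoint and certificate paths; adjust to your environment.
VENDOR_ENDPOINT = "https://api.example-ai-vendor.com/v1/summarize"

response = requests.post(
    VENDOR_ENDPOINT,
    json={"text": "minimized, redacted snippet only"},
    timeout=30,
    # Validate the server against a pinned CA bundle instead of the OS default store.
    verify="/etc/pki/ai-vendor/ca-bundle.pem",
    # Mutual TLS, if the vendor supports client certificates.
    cert=("/etc/pki/ai-vendor/client.crt", "/etc/pki/ai-vendor/client.key"),
)
response.raise_for_status()
```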
Encrypt data at rest on both sides of the integration
It is not enough for your systems to encrypt records before upload. You also need clarity on how the vendor encrypts stored data, where keys live, who can access them, and whether customer-managed keys are supported. If the provider stores prompts, outputs, or uploaded files, those stores must be encrypted at rest with robust key management and documented rotation policies. Ask whether backups, replicas, caches, and disaster recovery copies are encrypted too, because that is where weak implementations often fail. This is also where architecture choices matter: if the vendor cannot segregate customer data or prove tenant isolation, the solution is not mature enough for PHI.
Prefer envelope encryption and controlled key ownership when possible
For high-sensitivity workflows, especially when records are scanned in bulk, consider envelope encryption so your application controls the data encryption key lifecycle even if the vendor handles storage. In some deployments, the right model is to redact or tokenize patient identifiers before the AI layer sees them. In others, the safest approach is to keep the entire sealed record in your own environment and send only narrowly scoped text snippets to the provider. Good practice here looks a lot like disciplined device evaluation: compare vendor claims, implementation limits, and operational reality before you commit, similar to the way teams approach technical product review checklists or troubleshooting a new laptop before acceptance.
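A minimal envelope-encryption sketch using the Python cryptography library's Fernet primitive is shown below. In production the key-encryption key would live in your KMS or HSM rather than in application memory; it is generated inline here only to keep the example self-contained.

```python
from cryptography.fernet import Fernet

# Key-encryption key (KEK): in production this belongs in a KMS/HSM,
# not in application code. Generated inline only for this sketch.
kek = Fernet.generate_key()

def envelope_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt the record with a fresh data-encryption key (DEK), then wrap the DEK with the KEK."""
    dek = Fernet.generate_key()
    ciphertext = Fernet(dek).encrypt(plaintext)
    wrapped_dek = Fernet(kek).encrypt(dek)  # only the wrapped DEK is stored alongside the data
    return ciphertext, wrapped_dek

def envelope_decrypt(ciphertext: bytes, wrapped_dek: bytes) -> bytes:
    dek = Fernet(kek).decrypt(wrapped_dek)
    return Fernet(dek).decrypt(ciphertext)

ct, wdek = envelope_encrypt(b"scanned discharge summary bytes")
assert envelope_decrypt(ct, wdek) == b"scanned discharge summary bytes"
```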
3) Access control must be engineered, not assumed
Limit who can send data and who can read outputs
Third-party AI risk is not just about the provider. It is also about how broadly your own users can submit records and retrieve outputs. Put RBAC or ABAC in place so only approved roles can access health-document workflows. Separate power users, reviewers, admins, and auditors. If the output includes clinical language or decision support, make sure downstream consumption is also controlled; otherwise, a narrowly approved ingestion path becomes an uncontrolled sharing channel.
Use strong identity, SSO, and session protections
Require SSO with MFA, short-lived sessions, device posture checks, and just-in-time elevation for administrative functions. Avoid shared service accounts for human review, because they destroy accountability in audit logs. API keys should be scoped tightly, rotated regularly, and stored in a secrets manager, not a code repository or shared spreadsheet. If the vendor cannot support granular service principal permissions, that limitation should be flagged in your third-party risk review. Strong identity controls are the foundation for any trustworthy data workflow, just as identity and provenance are central to building durable trust relationships in any high-stakes setting.
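As one illustration, fetching the vendor credential from a secrets manager at call time keeps it out of code, config files, and spreadsheets. The sketch below assumes AWS Secrets Manager via boto3 and a placeholder secret name; swap in whatever secrets store your organization actually runs.

```python
import boto3

def get_vendor_api_key() -> str:
    """Fetch the AI vendor API key at call time instead of baking it into code or config.

    Assumes AWS Secrets Manager; the secret name below is a placeholder.
    Rotation is then handled in the secrets store, not in the application.
    """
    client = boto3.client("secretsmanager")
    secret = client.get_secret_value(SecretId="prod/ai-vendor/api-key")
    return secret["SecretString"]
```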
Log every access and every handoff
At minimum, you should capture who uploaded the record, what file or fields were included, when the external call was made, which vendor endpoint received it, and what output was returned. These logs should be tamper-evident, centrally retained, and protected from ordinary user deletion. If the AI output is later reviewed by a clinician, case manager, or support specialist, that review should also be logged. A complete chain-of-custody record does more than satisfy auditors—it helps you investigate whether a model hallucination, a user error, or a policy failure caused a downstream problem.
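One lightweight way to make application-level logs tamper-evident is to hash-chain each entry so any after-the-fact edit breaks the chain. The sketch below is illustrative only; real deployments would ship these events to a centrally retained, write-once store rather than an in-memory list.

```python
import hashlib
import json
import time

audit_log: list[dict] = []

def append_audit_event(actor: str, action: str, document_id: str, vendor_endpoint: str) -> dict:
    """Append a hash-chained audit record so after-the-fact edits are detectable."""
    prev_hash = audit_log[-1]["entry_hash"] if audit_log else "0" * 64
    event = {
        "timestamp": time.time(),
        "actor": actor,
        "action": action,            # e.g. "upload", "external_call", "human_review"
        "document_id": document_id,  # a reference, never the document content itself
        "vendor_endpoint": vendor_endpoint,
        "prev_hash": prev_hash,
    }
    event["entry_hash"] = hashlib.sha256(json.dumps(event, sort_keys=True).encode()).hexdigest()
    audit_log.append(event)
    return event

append_audit_event("jdoe", "external_call", "doc-1042", "https://api.example-ai-vendor.com/v1/summarize")
```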
4) Retention limits are a contractual requirement, not a nice-to-have
Set explicit retention and deletion windows
The biggest surprise for many teams is how much data AI vendors want to keep by default. Some providers retain prompts and outputs to improve service quality, troubleshoot abuse, or support safety investigations. For PHI, that default may be unacceptable unless contractually narrowed. Your policy should specify how long uploaded documents, derived artifacts, logs, embeddings, and cached outputs can persist. If a vendor cannot delete data on schedule, you do not have a deletion policy—you have a hope.
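Even before the contract is signed, it helps to encode the intended windows so engineering and legal are negotiating over the same numbers. The values below are illustrative placeholders, not recommendations; the real figures belong in the contract.

```python
from datetime import datetime, timedelta, timezone

# Illustrative retention windows per artifact type; actual numbers come from your contract.
RETENTION_DAYS = {
    "uploaded_document": 0,   # never persisted at the vendor
    "derived_summary": 30,
    "operational_log": 90,
    "embedding": 0,
}

def purge_deadline(artifact_type: str, created_at: datetime) -> datetime:
    """Return the date by which this artifact must be provably deleted."""
    return created_at + timedelta(days=RETENTION_DAYS[artifact_type])

print(purge_deadline("derived_summary", datetime.now(timezone.utc)))
```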
Distinguish operational logs from content retention
Vendors often argue that they need “some” retention for security and debugging. That may be true, but your contract must separate metadata logs from the underlying content. You may allow short-lived operational telemetry while prohibiting storage of the actual scanned record or full prompt text. Make sure the vendor’s architecture, support process, and incident response tooling align with that distinction. This is similar to how organizations handle public-facing analytics versus source records: the metadata can exist for a purpose, but it must never become an unintended archive of sensitive information.
Verify deletion, not just policy language
Deletion commitments should include timing, scope, and proof. Ask for deletion certificates, API-based delete confirmations, or customer-accessible audit evidence. Also ask what happens to backups, replicas, indexes, and fine-tuning corpora. If the vendor says data is “not used for training,” that is helpful, but it is not the same as deletion. An accountable program requires the ability to prove that records were removed from active systems and excluded from future reuse.
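If the vendor exposes any programmatic deletion evidence, wire it into your own records. The endpoint and response shape below are entirely hypothetical; the point is that a confirmation you can store and audit beats a policy PDF.

```python
import requests

def verify_deletion(document_id: str, api_key: str) -> dict:
    """Fetch a deletion confirmation from a hypothetical vendor endpoint.

    Most vendors name this differently, if they offer it at all; treat this
    as a placeholder for whatever evidence mechanism your contract secures.
    """
    resp = requests.get(
        f"https://api.example-ai-vendor.com/v1/deletions/{document_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=15,
    )
    resp.raise_for_status()
    # Expected shape (hypothetical): {"document_id": ..., "deleted_at": ..., "scope": ["primary", "backups"]}
    return resp.json()
```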
5) Contract terms: what a BAA-like agreement must actually cover
Why the contract matters as much as the control stack
Technical safeguards can be excellent and still leave you exposed if the contract is weak. For HIPAA-covered data, you need a BAA with any vendor acting as a business associate or subcontractor that handles PHI on your behalf. Even when the provider says it is not a covered entity relationship, enterprise customers should still negotiate BAA-like clauses that mirror the operational obligations you need: use limitations, breach reporting, subcontractor controls, and return-or-destruction commitments. Without those terms, your security team may be left relying on marketing copy instead of enforceable obligations.
Key clauses to demand
A serious vendor agreement should spell out: permitted uses of data, prohibition on model training unless explicitly opted in, data localization options if needed, retention periods, deletion SLAs, audit rights or independent assurance reports, incident notification timelines, security control baselines, and indemnity or liability boundaries that reflect the sensitivity of the data. It should also require the vendor to flow the same protections down to subprocessors. If your organization operates in regulated healthcare, privacy counsel should verify whether HIPAA, state privacy laws, and international transfer rules impose additional requirements. For teams building policy artifacts, the same logic used in AI policy customization applies here: do not adopt a template blindly; tailor it to the actual risk.
Don’t forget exit and portability terms
What happens if the vendor changes its terms, raises prices, or has a security incident? Your agreement should give you a clean exit path, including export formats, data return timing, and guaranteed deletion after termination. This matters because the “AI pilot” that starts as a productivity experiment often becomes a system of record-adjacent workflow faster than anyone expects. If you need a model for how contractual and operational dependencies can reshape a service, see how teams think through usage-based pricing and vendor leverage or the jump from consumer tool to enterprise integration.
6) Third-party risk management should be evidence-based, not questionnaire theater
Ask for artifacts, not just promises
Many vendor risk reviews collapse into checkbox exercises. For a third-party AI provider, you should request SOC 2 reports, ISO 27001 evidence, pen test summaries, architecture diagrams, subprocessor lists, and documented data-flow diagrams. Where PHI is involved, insist on clarity about whether the service uses prompt data for service improvement, how support personnel access customer content, and whether human review occurs. If the vendor cannot answer these questions precisely, the risk profile is probably higher than procurement would like to admit.
Validate controls in a sandbox or pilot
Before production rollout, test the exact workflow with nonproduction documents and realistic attack scenarios. Try malformed PDFs, duplicated identifiers, oversized scans, redacted documents, and documents that contain both PHI and non-PHI fields. Measure whether the provider retains inputs, whether logs expose content, whether outputs can be downloaded by unintended users, and whether the system respects deletion requests. This is the enterprise version of evaluating a new product under pressure, much like the way reviewers compare complex system tradeoffs or assess design choices under uncertainty.
Score vendors on control maturity, not feature count
Some AI products look impressive because they offer long-context analysis, natural-language search, and polished dashboards. Those features are irrelevant if the provider cannot prove strong segregation, controlled retention, and contractable data protections. Build a scorecard that weights security architecture, privacy commitments, retention controls, auditability, legal terms, and operational support. In practice, the vendor with fewer features but stronger governance is often the one that survives enterprise scrutiny.
7) A practical implementation checklist for IT teams
Before pilot: lock down scope and approve the data path
Start with a one-page use-case definition that names the data source, record types, purpose of processing, user roles, and permitted outputs. Mark whether the workflow touches PHI, whether the data leaves your tenant, and whether the AI provider is a processor, subcontractor, or independent controller. Then document the minimum necessary content and decide what must be redacted before transmission. If you cannot define the data path clearly on paper, you should not implement it in code.
During build: implement guardrails in the application layer
Insert policy checks before upload, use structured redaction where possible, and block disallowed document classes automatically. Add encryption, egress allowlisting, secrets management, request signing, and rate limiting. Ensure user interfaces clearly warn staff what can and cannot be uploaded, because accidental oversharing is one of the most common failure modes. The workflow should also enforce role-based review and prevent downloads from becoming a parallel shadow archive.
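A minimal illustration of a pre-upload guardrail: block disallowed document classes outright and mask obvious identifiers before anything leaves the boundary. The patterns below catch only the crudest identifiers and are no substitute for a real redaction pipeline; the class names are placeholders.

```python
import re

# Illustrative pre-upload guardrail: block disallowed document classes and
# redact obvious identifiers before anything leaves the boundary.
BLOCKED_CLASSES = {"full_chart_export", "raw_phi_scan"}

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MRN_PATTERN = re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE)

def guard_upload(document_class: str, text: str) -> str:
    """Raise on disallowed classes; otherwise return text with obvious identifiers masked."""
    if document_class in BLOCKED_CLASSES:
        raise PermissionError(f"Document class '{document_class}' may not be sent to external AI.")
    text = SSN_PATTERN.sub("[REDACTED-SSN]", text)
    text = MRN_PATTERN.sub("[REDACTED-MRN]", text)
    return text

print(guard_upload("referral_letter", "Patient MRN: 12345678, follow-up in 2 weeks."))
```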
Before go-live: test deletion, logging, and failure modes
Run a tabletop exercise that simulates data-subject access requests, vendor outage, accidental upload of the wrong file, and a request to purge all patient-associated data. Verify that your logs can support incident response without themselves becoming a privacy liability. Confirm that your contract, technical settings, and internal SOPs all agree on retention and deletion. Teams often underestimate this phase, but that is where resilient systems are made.
8) Example architecture patterns that reduce risk without killing utility
Pattern 1: internal preprocessing, external inference
In this design, OCR, redaction, document classification, and identity stripping happen inside your boundary. Only the extracted, minimized text goes to the AI provider, and even then only for narrow tasks like summarization or classification. This is usually the best balance for organizations that need AI assistance but cannot justify sending raw scans off-platform. It also reduces the chances that a vendor stores an entire medical record when only a few fields are needed.
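Sketched in code, the pattern looks roughly like this. The OCR stub, minimization logic, and vendor endpoint are placeholders for whatever your stack actually uses; the structural point is that the raw scan never crosses the boundary, only a minimized snippet does.

```python
import requests

def ocr_inside_boundary(scan_path: str) -> str:
    """Placeholder: in practice this calls your internal OCR service inside your tenant."""
    with open(scan_path, "rb") as f:
        _ = f.read()  # the raw bytes stay here; they are never forwarded
    return "DISCHARGE SUMMARY ... diagnosis: ... medications: ..."

def minimize(text: str) -> str:
    """Keep only what the downstream task needs; a crude cap stands in for real field extraction."""
    return text[:2000]

def summarize_externally(snippet: str, api_key: str) -> str:
    """Send only the minimized snippet to a hypothetical vendor endpoint."""
    resp = requests.post(
        "https://api.example-ai-vendor.com/v1/summarize",  # placeholder endpoint
        json={"text": snippet},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["summary"]
```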
Pattern 2: secure enclave or private cloud deployment
Where budgets and vendor maturity allow, a private deployment or isolated tenant can materially lower risk. You still need a contract, but the operational controls become stronger: stricter access policies, more predictable retention, and better alignment with your network and logging stack. For high-volume healthcare operations, this model can be easier to audit than a generic public SaaS integration. If your organization already manages sensitive workflows in structured systems, the same disciplined approach seen in talent sourcing analytics or real-time analytics overlays can help you design a safer pipeline.
Pattern 3: human-in-the-loop only, with no content persistence
For the most sensitive scenarios, use AI only as a transient assistive layer that receives a document, returns a suggestion, and then discards content immediately under contract. Human reviewers remain responsible for the final output, and the system stores only the approved result plus the audit trail. This pattern is not appropriate for every use case, but it is often the safest way to adopt AI when privacy obligations are strict and tolerance for vendor retention is low.
9) Comparison table: control choices and what they buy you
| Control area | Baseline approach | Stronger enterprise approach | Risk reduced |
|---|---|---|---|
| Encryption in transit | HTTPS only | TLS 1.2+, mTLS, private egress | MITM, endpoint spoofing, traffic interception |
| Encryption at rest | Vendor-managed encryption | Customer-managed keys or envelope encryption | Unauthorized storage access, key exposure |
| Access control | Shared admin roles | SSO, MFA, RBAC/ABAC, JIT elevation | Privilege abuse, account compromise |
| Retention | Vendor default retention | Contractual deletion SLAs and no-training clause | Excess persistence, secondary use, discovery exposure |
| Auditability | Basic app logs | Tamper-evident centralized logs with file-level traceability | Weak incident response, poor accountability |
| Contract terms | Click-through terms | BAA or BAA-like agreement with subprocessor flow-down | Regulatory noncompliance, vendor ambiguity |
| Vendor review | Questionnaire only | Artifacts, testing, and periodic reassessment | Hidden control gaps, stale assumptions |
Pro tip: If your team cannot answer four questions quickly—what data is sent, how it is encrypted, who can see it, and when it is deleted—you are not ready for production. The safest AI integration is the one that is boringly well documented.
10) Common failure modes and how to avoid them
“The vendor said it doesn’t train on our data”
That statement may be true, but it is incomplete. Training is only one risk. Vendors may still retain content for moderation, debugging, abuse prevention, or service improvement unless the contract says otherwise. Your team must explicitly control those other uses. Also remember that outputs can still create privacy leakage if they are stored in downstream systems without access controls.
“We’ll just send de-identified documents”
True de-identification is hard, especially in scanned records with contextual clues, small datasets, and handwritten annotations. A document that looks anonymized to a developer may still be re-identifiable to an operations team or a vendor with auxiliary data. Treat de-identification as a formal process, not a label. If you can’t defend the method technically and legally, assume the document remains sensitive.
“Procurement approved it, so we’re covered”
Procurement approval is not a substitute for security review, privacy review, legal review, and architecture review. Third-party risk is a cross-functional process because the failure modes are cross-functional. The right answer is a documented approval chain with clear owners for data governance, legal terms, and technical controls. Otherwise, the organization ends up with a vendor relationship that nobody truly owns when something goes wrong.
11) Decision checklist: ready, not ready, or no-go
Ready for a limited pilot
You are ready if you have a narrowly defined use case, restricted data scope, SSO/MFA, encrypted transport and storage, tamper-evident logs, a signed BAA or equivalent, and a deletion SLA you can verify. You should also have internal SOPs for exception handling, incident response, and user training. A pilot should be limited to nonproduction or tightly controlled production subsets, with frequent review checkpoints.
Ready for production
Production readiness requires not just a contract and a pilot, but repeatable controls: monitoring, periodic vendor reassessment, tested deletion workflows, and evidence that the vendor’s subprocessors are controlled. You should also have a rollback plan if the provider changes terms or if audit findings reveal unacceptable retention or access issues. If your architecture supports segmentation, use it to isolate the AI workflow from the broader patient-record environment.
No-go
If the vendor will not sign a BAA where required, will not document retention, cannot explain access controls, or insists on broad rights to use your data for training or product improvement, the answer is no. That remains true even if the user experience is excellent or the business team is enthusiastic. Sensitive healthcare data is not the place to trade governance for convenience. Organizations that adopt that discipline are more likely to build sustainable systems that outlast the current AI hype cycle, much like the companies that survive by focusing on trust, simplicity, and consistency rather than novelty.
Conclusion: treat external AI like a regulated processor, not a productivity toy
Third-party AI can absolutely help healthcare and operations teams extract value from scanned records, but only if the integration is built with the same seriousness you would apply to any regulated data processor. Encryption, access control, retention limits, and BAA-grade contractual protections are not separate concerns; they are one control system. When any one piece is missing, the risk shifts from theoretical to operational very quickly. The BBC’s reporting on health-focused AI tools is a reminder that product ambition moves faster than governance, which means IT teams have to be the adults in the room.
If you are evaluating a provider now, start with a short list: classify the data, minimize the payload, lock down identity, verify storage and deletion controls, and insist on enforceable vendor terms. Then pilot carefully, measure the real behavior, and keep the door open for exit. That is how you turn AI from a compliance liability into a controlled capability.
Related Reading
- Cloud, Commerce and Conflict: The Risks of Relying on Commercial AI in Military Ops - A broader look at why commercial AI needs hard boundaries in high-stakes environments.
- Building a Retrieval Dataset from Market Reports for Internal AI Assistants - Useful patterns for minimizing, structuring, and governing document inputs.
- An Ethical AI in Schools Policy Template: What Every Principal Should Customize - A policy-first approach to tailoring AI rules to specific risk profiles.
- Why Human Content Still Wins: Evidence-Based Playbook for High Ranking Pages - Reinforces why human review remains essential in sensitive workflows.
- When Interest Rates Rise: Pricing Strategies for Usage-Based Cloud Services - Helps teams think through vendor economics, lock-in, and contract flexibility.
FAQ
Do we need a BAA for every AI vendor that sees PHI?
If the vendor is creating, receiving, maintaining, or transmitting PHI on your behalf and fits the definition of a business associate, a BAA is generally required under HIPAA. Even when the relationship is more complex, enterprise teams should still negotiate BAA-like terms that clearly define permitted uses, retention, breach response, and deletion.
Is encryption enough to make scanned records safe for external AI?
No. Encryption is essential, but it only protects data in transit and at rest. You still need access control, least privilege, logging, retention limits, and contractual restrictions on secondary use. A provider can be technically encrypted and still be operationally risky if it stores data too long or allows broad internal access.
Can we rely on the vendor’s “no training on your data” statement?
Not by itself. That statement addresses one use case, but it does not automatically cover retention, support access, backups, subprocessors, or deleted copies. You need the commitment in the contract and, ideally, a configuration or technical control that enforces it.
What’s the safest way to send scanned records to AI?
The safest pattern is usually internal preprocessing: OCR and redact inside your boundary, then send only the minimum necessary text to a tightly controlled external service. For highly sensitive use cases, keep the raw scan on your side and use the vendor only for transient, narrowly scoped analysis.
How often should third-party AI vendors be reassessed?
At least annually, and sooner if the vendor changes products, terms, subprocessors, security posture, or data handling practices. If the use case is mission-critical or involves highly sensitive PHI, many organizations reassess quarterly or after any material incident.
What should be in the contract besides a BAA?
At minimum: data-use limitations, no-training language, retention and deletion commitments, breach notification timelines, subprocessor flow-downs, audit evidence, exit and portability terms, and clarity on backups and logs. The contract should match the actual architecture, not just the vendor’s default terms.