
The Legal Stack for AI Companies: A Complete Guide

February 18, 2026
Marc Hoag

This is educational material and does not constitute legal advice, nor does it create an attorney-client relationship. If you have legal questions, contact and engage an attorney.

The Core Premise

Every AI company needs a contract architecture. Not a pile of templates, but an architecture. A system where each document exists for a specific reason, sits in a specific relationship to the others, and covers a specific set of risks.

This guide teaches that architecture from the ground up. By the end, you should be able to explain to anyone what the components are, why they exist, who signs what with whom, in what order, and where things typically go wrong.

We’ll use a simple running example throughout: Startup, Inc., a company that built an AI-powered SaaS product on top of foundation models like OpenAI’s GPT models or Anthropic’s Claude. They have business customers. Those customers have end users. And Startup, Inc. relies on various vendors to deliver its service.

That’s the setup. Let’s build.

Part 1: The Roles

Before contracts make sense, roles have to make sense. Data protection law (GDPR, CCPA/CPRA, and the growing number of state privacy laws) creates categories of actors, each with different obligations. If you misidentify who’s who, your entire contract stack is built wrong.

The Data Subject is the human whose data is being processed. When someone uses an AI product, types a query, uploads a document, or receives a response, they’re the Data Subject. They’re not a party to your commercial contracts, but they’re the person these protections exist to serve.

The Controller is the entity that decides why data is processed (the purpose) and how in broad terms (the means). The Controller answers the question: what are we trying to accomplish, and what’s the general approach? In most B2B scenarios, your customer is the Controller. They decided to use AI for some business purpose. They chose to engage Startup, Inc. to help them do it.

The Processor is the entity that processes data on behalf of the Controller, following the Controller’s instructions. The Processor answers a different question: how do we execute what the Controller has asked for? In most B2B scenarios, Startup, Inc. is the Processor. They handle data according to the customer’s purposes and instructions.

The Sub-Processor is a Processor’s processor. When Startup, Inc. engages another company to help deliver its service (cloud hosting, an LLM provider, analytics tools), that downstream company is a Sub-Processor.

Here’s what this looks like in practice:

End User (Data Subject)
  ↓ uses product
Enterprise Customer (Controller)
  ↓ engages
Startup, Inc. (Processor)
  ↓ relies on
CloudCo, LLM Provider, AnalyticsCo (Sub-Processors)

The critical insight is that wherever personal data flows, a contract must exist between those parties governing that data. The Controller needs a data protection agreement with the Processor. The Processor needs data protection agreements with each Sub-Processor. Miss a link, and you have a compliance gap.

Part 2: The Contracts

Now the pieces. For an AI company serving business customers, the contract stack has four layers.

Layer 1: Master Service Agreement (MSA)

The MSA is the foundational commercial contract between Startup, Inc. and its customer. It governs the overall relationship: scope of services, payment terms, warranties and disclaimers, liability caps, indemnification, termination rights, intellectual property, and confidentiality.

For AI companies specifically, the MSA is where you allocate AI-specific risk. Traditional software warranties assume deterministic behavior, where the same inputs produce the same outputs. AI doesn’t work that way, and the MSA must reflect that reality. We’ll dig into this in Part 5.

Layer 2: Data Processing Addendum (DPA)

The DPA is a contract governing how Startup, Inc. handles Personal Data on behalf of the customer. It’s usually attached as a schedule to the MSA.

A DPA is required when processing Personal Data of individuals protected by GDPR, CCPA/CPRA, and the growing number of state privacy laws (Virginia, Colorado, Connecticut, Utah, Texas, Oregon, Montana, Iowa, Delaware, Nebraska, New Jersey, Tennessee, Maryland, Minnesota, and others). As a practical matter, assume you need one for every US business customer.

The DPA covers the categories of data processed, purposes of processing, security measures, sub-processor obligations, data subject rights procedures, breach notification requirements, and data deletion or return on termination.

Here’s the key principle: DPA obligations cascade downstream. Your customer’s DPA with you creates obligations that you must flow down to your Sub-Processors. You need DPAs with every vendor that touches Personal Data.

Layer 3: Business Associate Agreement (BAA)

The BAA is a contract required under HIPAA when processing Protected Health Information (PHI). You need one only when processing PHI on behalf of a Covered Entity (healthcare provider, health plan, clearinghouse) or another Business Associate. If your customer isn’t in healthcare and you’re not processing health data, you don’t need one.

The BAA covers permitted uses and disclosures of PHI, safeguard requirements, breach reporting obligations, subcontractor requirements, and PHI return or destruction on termination.

Here’s a critical distinction that trips people up: DPA and BAA are not interchangeable. A DPA is required for any Personal Data under GDPR, CCPA/CPRA, and other state privacy laws. A BAA is required only for PHI under HIPAA. If you serve a healthcare customer and process PHI, you need both. A BAA doesn’t satisfy GDPR obligations. A DPA doesn’t satisfy HIPAA obligations.

Layer 4: Acceptable Use Policy (AUP)

The AUP is a policy defining prohibited uses of the platform, usually attached as a schedule to the MSA. It covers prohibited activities, content restrictions, compliance requirements, and consequences of violation like suspension or termination.

For AI companies, the AUP matters because AI platforms can be misused in ways traditional software cannot. Users might try to generate harmful content, extract training data, or use outputs for fraud. The AUP is your contractual basis to say no and, if necessary, cut them off.

The Complete Stack

For a white-label or enterprise customer, the full stack looks like this:

MSA: Commercial terms, AI warranties, liability allocation

Schedule A: Service Description

Schedule B: DPA (required for GDPR/state privacy compliance)

Schedule C: AUP (prohibited uses)

Schedule D: BAA (required only if processing PHI for healthcare customers)

For B2C or website customers, the structure simplifies: Terms of Service and Privacy Policy with GDPR and state privacy law provisions baked in, plus BAA provisions incorporated into the Terms if you’re processing PHI.

Part 3: When Each Agreement Is Required

This is where people get confused. Let’s be precise about what triggers each requirement.

The DPA Trigger

A DPA is required when you process Personal Data of individuals protected by GDPR, CCPA/CPRA, or the growing number of state privacy laws.

Personal Data is defined broadly. It includes names, emails, phone numbers, IP addresses, user queries and inputs, AI responses (if they contain or reference personal information), usage patterns and behavior data, and device identifiers.

Processing is defined even more broadly. It means any operation on data: collection, storage, retrieval, use, transmission, deletion, anything. Under GDPR, even transient routing counts. If data touches your servers in any way, you’re processing it.

The practical rule is this: if you can see it, log it, route it, or handle it, even momentarily, you need a DPA.

The BAA Trigger

A BAA is required when you create, receive, maintain, or transmit Protected Health Information on behalf of a Covered Entity or Business Associate.

Note that word “transmit.” The trigger isn’t storage. If PHI passes through your servers, even transiently, encrypted, and never persisted, you’re a Business Associate and you need a BAA.

The difference between DPA and BAA isn’t the trigger threshold. Both are triggered by transient handling. The difference is the obligations that attach once triggered. HIPAA requires specific technical safeguards, audit controls, workforce training, breach notification timelines, and the “minimum necessary” standard. Signing a BAA means committing that you can actually deliver on those requirements. You don’t sign one casually.

Decision Table

Customer Scenario | DPA? | BAA?
US-only, no CA/VA/CO/CT or other covered state residents, non-healthcare | Likely no, but verify against all applicable state laws | No
Has California users, non-healthcare | Yes | No
Has EU users, non-healthcare | Yes | No
Healthcare, processing PHI | Yes | Yes
Healthcare, but no PHI involved | Maybe* | No

*Still likely yes if processing any Personal Data of CA, EU, or other covered state individuals.
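
If it helps to see that logic in one place, here is a minimal sketch of the decision table in code. The profile fields and the example customer are simplified assumptions; a real determination turns on every applicable law and belongs with counsel.

from dataclasses import dataclass

# Simplified, hypothetical customer profile used only to illustrate the
# decision table above. A real determination requires legal analysis.
@dataclass
class CustomerProfile:
    processes_personal_data: bool      # any Personal Data, even transiently routed
    has_eu_data_subjects: bool         # individuals protected by GDPR
    has_covered_state_residents: bool  # CA/VA/CO/CT and other covered states
    processes_phi: bool                # Protected Health Information under HIPAA

def needs_dpa(c: CustomerProfile) -> bool:
    # DPA trigger: processing Personal Data of individuals protected by GDPR,
    # CCPA/CPRA, or another state privacy law. Transient handling counts.
    return c.processes_personal_data and (
        c.has_eu_data_subjects or c.has_covered_state_residents
    )

def needs_baa(c: CustomerProfile) -> bool:
    # BAA trigger: creating, receiving, maintaining, or transmitting PHI on
    # behalf of a Covered Entity or another Business Associate.
    return c.processes_phi

healthcare_customer = CustomerProfile(
    processes_personal_data=True,
    has_eu_data_subjects=False,
    has_covered_state_residents=True,
    processes_phi=True,
)
print(needs_dpa(healthcare_customer), needs_baa(healthcare_customer))  # True True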

Part 4: The Flow of Agreements

Understanding who signs what with whom is essential. Here’s how the contracts flow.

Customer to Startup, Inc.

The customer (as Controller) needs a data processing agreement with Startup, Inc. (as Processor). This is your DPA. Either you provide one for customers to sign, or they provide one that you negotiate and execute.

The MSA, DPA, AUP, and (if applicable) BAA all live in this relationship.

Startup, Inc. to Sub-Processors

Startup, Inc. (as Processor) needs data protection agreements with every Sub-Processor that handles Personal Data. This is a GDPR and state privacy law requirement: you must ensure your Sub-Processors are bound by data protection terms at least as protective as your DPA with the customer.

Typical Sub-Processors for an AI company include cloud infrastructure providers like AWS, GCP, or Azure; LLM providers like OpenAI or Anthropic; analytics tools like Mixpanel, Amplitude, or Segment; email and communications services like SendGrid or Twilio; payment processors like Stripe; monitoring and logging tools like Datadog or Sentry; and authentication providers like Auth0 or Okta.

Most major vendors have standard DPAs available, usually self-service through their Trust Center or legal page. The hard part isn’t execution. The hard part is knowing you need them in the first place.

If you also have healthcare customers and process PHI, you need BAAs with every Sub-Processor that touches PHI.

The Cascade

Every arrow where Personal Data flows requires a corresponding agreement.
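
To make the cascade concrete, here is a small sketch that walks the data-flow edges from the running example and names the agreement each one needs. The edge list is an illustrative assumption, not a complete map of any real system.

# Data-flow edges from the running example; each edge that carries Personal
# Data needs its own data protection agreement. Names are illustrative.
DATA_FLOWS = [
    ("Enterprise Customer", "Startup, Inc."),  # also carries the MSA, AUP, and a BAA if PHI is involved
    ("Startup, Inc.", "CloudCo"),
    ("Startup, Inc.", "LLM Provider"),
    ("Startup, Inc.", "AnalyticsCo"),
]

for source, recipient in DATA_FLOWS:
    print(f"{source} -> {recipient}: DPA required")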

Part 5: The AI Liability Problem

Here’s what makes AI companies different from traditional SaaS: you cannot warrant AI outputs.

Why Traditional Warranties Fail

Traditional software warranties work because software is deterministic. Given the same inputs, you get the same outputs. You can test every code path. You can guarantee behavior.

AI systems are probabilistic. They produce different outputs for similar inputs. They hallucinate. They generate unexpected results even when working exactly as designed. Unexpected behavior isn’t a bug. It’s inherent to how these systems work.

If your MSA warrants that your AI will produce accurate, compliant, or expected outputs, you’ve made a promise you cannot keep.

What You Can Warrant

You can warrant things you actually control.

  • System architecture and security: your infrastructure is built to documented specifications, with appropriate security controls.
  • Data handling: you process data according to the DPA, with proper encryption, access controls, and retention policies.
  • Platform availability: the service meets defined uptime SLAs.
  • Compliance posture: you maintain SOC 2 (a security audit standard), ISO 27001 (an information security management certification), or other relevant certifications.
  • No malware: your code doesn’t contain malicious components.

These are statements about your system. Things you design, build, and control.

What You Cannot Warrant

You cannot warrant things the AI determines.

You cannot warrant specific outputs or responses. You cannot warrant accuracy of AI-generated content. You cannot warrant that AI won’t hallucinate. You cannot warrant that outputs will be legally compliant in all contexts. You cannot warrant that behavior will never be unexpected.

The Practical Framework

When a customer tries to make you liable for AI outputs (and they will), here’s the framework.

Accept responsibility for: platform availability and security, data handling per the DPA, the system operating as documented, and defects in your code and infrastructure.

Reject responsibility for: AI outputs that deviate from expectations absent a defect in your system, downstream compliance issues from how the customer uses outputs, and “unexpected behaviors” inherent to AI systems.

The language matters. A customer might propose: “Vendor shall be responsible for compliance issues arising from AI-generated outputs.”

That sounds reasonable until you realize AI outputs are inherently variable. You’ve just accepted unlimited liability for something you can’t control.

Counter-position: “Vendor shall be responsible for defects in the documented system design. Vendor is not responsible for AI outputs that deviate from expected behavior absent such defects.”

The Insurance Reality

The AI liability insurance market is evolving rapidly. Standard E&O and Cyber policies likely cover data breaches, security incidents, platform downtime, and processing errors in your infrastructure.

For AI-specific risks like hallucinations, model drift, and output liability, specialized products are now emerging. Several insurers have launched AI-specific liability coverage, including products covering hallucinations and model drift. Evaluate whether AI-specific liability insurance makes sense for your risk profile, but don’t rely on it as a substitute for proper contractual protections. The market is still maturing, coverage terms vary significantly, and not all AI risks may be insurable at reasonable cost.

Structure your contracts to allocate risk appropriately regardless of insurance availability.

Part 6: BYOK and the Controller/Processor Question

Bring Your Own Key (BYOK) arrangements, where the customer uses their own API key for the underlying LLM, create ambiguity in the controller/processor chain. Working through this ambiguity is a good test of whether you actually understand the framework.

Why BYOK Matters

In a standard setup, Startup, Inc. uses its own API key with the LLM provider. Data flows from Customer to Startup, Inc. to LLM Provider. The LLM Provider is Startup, Inc.’s Sub-Processor. Simple.

With BYOK, the customer provides their own API key. Now who has the relationship with the LLM Provider?

Scenario A: True BYOK

The customer has their own independent account with the LLM provider. They control that relationship. They signed their own terms, they manage their own settings, they get billed directly. Startup, Inc. is just passing traffic through.

In this scenario, the Customer is the Controller, Startup, Inc. is the Processor (for orchestration and routing), and the LLM Provider is a Processor directly to the Customer, not Startup, Inc.’s Sub-Processor.

The contract implications: Customer needs a DPA from Startup, Inc. Customer needs a separate DPA from the LLM Provider. Startup, Inc. does not need a DPA with the LLM Provider for this customer’s data.

Scenario B: Platform-Managed Key

The customer provides an API key, but Startup, Inc. controls the actual integration. Startup, Inc. decides what prompts get sent, what settings are used, how responses are processed. The customer’s key is really just for billing or cost allocation.

In this scenario, the Customer is the Controller, Startup, Inc. is the Processor, and the LLM Provider is Startup, Inc.’s Sub-Processor.

The contract implications: Customer needs a DPA from Startup, Inc. Startup, Inc. needs a DPA with the LLM Provider. Customer does not need a separate DPA with the LLM Provider.

Scenario C: No External LLM

Startup, Inc. runs its own models or uses fully self-hosted infrastructure. No external LLM provider involved.

In this scenario, the Customer is the Controller, Startup, Inc. is the Processor, and there’s no LLM Provider in the chain.

The contract implications: Customer needs a DPA from Startup, Inc. No LLM provider agreements needed.

The Test

How do you know which scenario applies? Ask this question: who can give the LLM provider binding processing instructions?

If the customer can independently instruct the LLM provider on how to handle data, it’s Scenario A. If only Startup, Inc. can configure and instruct the LLM provider, it’s Scenario B.
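
As a minimal sketch, the test reduces to two questions. Anything subtler (shared configuration rights, split billing) deserves real analysis rather than a few lines of code; the assumption here is that the answer is clean.

def byok_scenario(uses_external_llm: bool, customer_can_instruct_provider: bool) -> str:
    # Encodes the test above: who can give the LLM provider binding
    # processing instructions?
    if not uses_external_llm:
        return "Scenario C: no external LLM in the chain"
    if customer_can_instruct_provider:
        return "Scenario A: true BYOK (LLM provider processes directly for the customer)"
    return "Scenario B: platform-managed key (LLM provider is Startup, Inc.'s Sub-Processor)"

print(byok_scenario(uses_external_llm=True, customer_can_instruct_provider=False))
# Scenario B: platform-managed key (LLM provider is Startup, Inc.'s Sub-Processor)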

One Thing That Doesn’t Change

Regardless of BYOK configuration, if data passes through Startup, Inc.’s servers, even transiently, Startup, Inc. is a Processor and needs a DPA with the customer.

BYOK doesn’t change the Startup, Inc. to Customer relationship. It only affects whether the LLM Provider is Startup, Inc.’s Sub-Processor or the Customer’s direct Processor.

Part 7: The Sub-Processor Audit

Every DPA you sign includes language like this: “Processor shall ensure each Sub-Processor is bound by data protection obligations no less protective than this DPA.”

That’s not boilerplate. That’s an obligation. You need to actually do it.

The Obvious Sub-Processors

The obvious ones are easy to spot: cloud hosting (AWS, GCP, Azure), LLM providers (OpenAI, Anthropic), and primary databases (MongoDB Atlas, Supabase, PlanetScale).

The Less Obvious Ones

The less obvious ones often get missed:

  • Analytics platforms like Mixpanel, Amplitude, Segment, or Google Analytics
  • Monitoring and logging tools like Datadog, Sentry, or LogRocket
  • Email services like SendGrid, Mailchimp, or Postmark
  • Authentication providers like Auth0, Okta, or Clerk
  • Payment processors like Stripe or Braintree
  • Support platforms like Intercom or Zendesk
  • CDNs like Cloudflare or Fastly, which may log user data
  • CI/CD tools like GitHub Actions or CircleCI, if they touch customer data in builds

If it processes Personal Data, even just logging an IP address, it’s a Sub-Processor.

The Audit Process

The process is straightforward. List every third-party service you use. Identify which ones process Personal Data. Execute DPAs with those that do (most have self-service DPAs available). Document everything in a Sub-Processor list. Update the list whenever you add new vendors.

If you have healthcare customers, repeat for BAAs with any Sub-Processor that touches PHI.
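
If a spreadsheet feels too loose, the same audit can live in code. Here is a minimal sketch; the fields and example vendors are illustrative assumptions, not a complete compliance record.

from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative inventory entry; fields and vendors are hypothetical examples.
@dataclass
class SubProcessor:
    name: str
    purpose: str
    processes_personal_data: bool
    processes_phi: bool = False
    dpa_executed: Optional[date] = None
    baa_executed: Optional[date] = None

    def gaps(self) -> list[str]:
        issues = []
        if self.processes_personal_data and self.dpa_executed is None:
            issues.append(f"{self.name}: processes Personal Data but no DPA on file")
        if self.processes_phi and self.baa_executed is None:
            issues.append(f"{self.name}: touches PHI but no BAA on file")
        return issues

inventory = [
    SubProcessor("CloudCo", "hosting", processes_personal_data=True,
                 dpa_executed=date(2026, 1, 15)),
    SubProcessor("AnalyticsCo", "product analytics", processes_personal_data=True),
]

for vendor in inventory:
    for issue in vendor.gaps():
        print(issue)  # AnalyticsCo: processes Personal Data but no DPA on file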

Maintaining the List

Most DPAs require you to maintain a list of Sub-Processors and notify customers of changes. Keep this list current. When you add a new vendor that processes Personal Data, execute the DPA with that vendor, add them to your Sub-Processor list, and notify customers per your DPA’s requirements (often 30 days advance notice).

Part 8: AI-Specific DPA Provisions

A generic SaaS DPA template won’t fully address AI-specific concerns. Here’s what to add or ensure is covered.

Training Prohibition

Customers will want assurance that their data isn’t used to train models, whether yours or your LLM provider’s.

Include explicit language: “Processor shall not use Customer Data to train, improve, or develop machine learning models, and shall ensure Sub-Processors are contractually prohibited from the same.”

Then verify your LLM provider’s terms actually support this. Most enterprise LLM agreements include training opt-outs, but you need to confirm and document it.

Prompt Isolation

For multi-tenant AI platforms, customers may want assurance that their prompts, data, and configurations don’t leak to other tenants.

Address this in your security measures section. Describe how you maintain tenant isolation in prompt construction, response handling, and any fine-tuning or customization.

Configurable Logging

AI interactions often generate logs: prompts, responses, metadata. Different customers have different logging requirements. Some want full audit trails. Some want minimal logging for privacy. Some have specific retention requirements.

Build flexibility into your architecture and document the options in your DPA.
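
One way to build that flexibility is a per-tenant logging configuration along the lines of the sketch below. The field names and defaults are assumptions, not a prescribed schema.

from dataclasses import dataclass

# Hypothetical per-tenant logging configuration; fields and defaults are
# illustrative assumptions, not a prescribed schema.
@dataclass
class LoggingConfig:
    log_prompts: bool = False     # store raw user prompts
    log_responses: bool = False   # store raw model responses
    log_metadata: bool = True     # timestamps, token counts, latency
    retention_days: int = 30      # days before logs are deleted

# One tenant wants a full audit trail; another wants minimal logging.
audit_heavy_tenant = LoggingConfig(log_prompts=True, log_responses=True, retention_days=365)
privacy_focused_tenant = LoggingConfig(retention_days=7)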

Architecture Transparency

Sophisticated customers will want to understand data flows. Be prepared to explain and document what data passes to LLM providers, what data is stored versus transient, what preprocessing or postprocessing occurs, and where data is geographically processed.

Retention Specificity

Generic DPAs often say data will be retained as necessary for the service. For AI, be more specific. Spell out retention periods for conversation or session data, logs, derived analytics, and model fine-tuning data if any.
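
That specificity can be as simple as a retention schedule keyed by data category, as in the sketch below. The categories and periods are illustrative assumptions; the DPA should state whatever you actually commit to.

# Illustrative retention schedule; categories and periods are assumptions,
# not recommendations. The DPA should reflect the actual commitments.
RETENTION_SCHEDULE_DAYS = {
    "conversation_data": 30,    # prompts, responses, session transcripts
    "application_logs": 14,     # operational logs that contain Personal Data
    "derived_analytics": 365,   # aggregated or de-identified usage metrics
    "fine_tuning_data": 0,      # 0 = not retained (and not used for training)
}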

Part 9: MSA Red Flags

When customers redline your MSA, watch for these patterns.

AI Output Liability Creep

The red flag: “Vendor shall be responsible for compliance issues arising from AI-generated outputs when used per documentation.”

Why it’s dangerous: AI outputs are inherently unpredictable. This makes you liable for something you cannot control, with no limiting principle.

The counter: limit responsibility to system defects. “Vendor is responsible for defects in documented system design. Vendor is not responsible for variations in AI output absent such defects.”

Gutted Suspension Rights

The red flag: Customer revises your suspension rights from “AUP breach or security risk” to “material, demonstrable violation of law posing imminent harm, with 30-day cure period.”

Why it’s dangerous: You can’t act quickly on AUP violations, non-payment, or security threats. By the time you navigate that language, the damage is done.

The counter: Restore reasonable suspension triggers for AUP breach, non-payment, and security risk. Cure periods are fine for some issues but not all.

One-Sided Termination Plus Extended Wind-Down

The red flag: Customer adds termination for convenience (theirs, not yours) plus a 6-month wind-down period with “migration assistance.”

Why it’s dangerous: Customer can decide to leave and get six months of essentially free service while transitioning to a competitor.

The counter: If they want convenience termination, fine, but no extended free wind-down. Or make termination for convenience mutual.

Deleted Use Case Vetting

The red flag: Customer strikes language requiring them to screen downstream use cases or prohibit high-risk applications.

Why it’s dangerous: For platforms where the customer is reselling or white-labeling your service, you need a contractual hook to hold them accountable for how their customers use it.

The counter: Restore the vetting requirement. This is especially important for AI platforms with potential for misuse.

Expanded Warranty Language

The red flag: Customer adds AI-specific warranties like “AI outputs will be accurate” or “AI will comply with applicable law” or “AI will not produce unexpected results.”

Why it’s dangerous: You’re now warranting things that are definitionally impossible to guarantee.

The counter: Strike or heavily qualify. Reference the limitations in your documentation. Make clear what you warrant (system operation) versus what you don’t (AI outputs).

Part 10: Order of Operations

If you’re building this from scratch, here’s the sequence.

Phase 1: Core Documents (Weeks 1-2)

Draft your MSA with AI-specific provisions covering warranty limitations and liability allocation. Draft your DPA with AI-specific provisions covering training prohibition, prompt isolation, and architecture transparency. Draft your AUP defining prohibited uses. If you plan to serve healthcare customers, draft a BAA template.

Phase 2: Primary Sub-Processors (Weeks 3-4)

Execute DPAs with your highest-priority Sub-Processors: cloud infrastructure, LLM providers, primary database, and core analytics. Document everything in your Sub-Processor list.

Phase 3: Secondary Sub-Processors (Weeks 5-6)

Execute DPAs with remaining Sub-Processors: monitoring and logging, email and communications, authentication, payments, and support tools. Update your Sub-Processor list.

Ongoing

Review customer redlines against the red flags above. Update your Sub-Processor list when adding vendors. Conduct annual reviews of DPA and BAA status with all Sub-Processors. Monitor regulatory changes, particularly the EU AI Act and new state privacy laws.

Checklist

For B2C/Website Customers

  • Terms of Service with GDPR and state privacy law provisions
  • Privacy Policy with GDPR and state privacy law provisions
  • BAA provisions incorporated if processing PHI
  • Cookie consent mechanism if applicable

For Enterprise/White-Label Customers

  • Master Service Agreement
  • DPA as schedule, for all customers
  • AUP as schedule, for all customers
  • BAA as schedule, only if processing PHI

Sub-Processor Compliance

  • Complete vendor inventory
  • DPAs executed with all vendors that process Personal Data
  • BAAs executed with all vendors that process PHI if applicable
  • Sub-Processor list documented and maintained
  • Update process defined for new vendors

AI-Specific Provisions

  • Training prohibition language in DPA
  • Prompt isolation documentation
  • Configurable logging options
  • Architecture transparency documentation
  • Specific retention timelines
  • Warranty limitations in MSA
  • Liability caps appropriate for AI risk

A Note on the EU AI Act

This guide focuses on the foundational contract architecture that every AI company needs. However, readers serving EU customers or using EU-based models should be aware that the EU AI Act is now partially in force, with key provisions having taken effect in 2025 and more coming in 2026.

The EU AI Act creates additional obligations for AI providers and deployers, including risk classification requirements, transparency obligations, and specific rules for general-purpose AI (GPAI) models. These requirements layer on top of the contract stack described here and may require additional contractual provisions depending on your AI system’s risk classification and your role in the AI value chain.

If you serve EU customers or your AI systems are deployed in the EU, work with qualified counsel to ensure your contracts address EU AI Act requirements in addition to the data protection framework covered in this guide.

Read here for a deep dive analysis of the EU AI Act.

Key Takeaways

DPAs and BAAs are not interchangeable. DPAs cover Personal Data under GDPR, CCPA/CPRA, and other state privacy laws. BAAs cover PHI under HIPAA. If you serve a healthcare customer processing PHI, you need both.

You’re always a Processor. Data passes through your servers even with BYOK. You need DPAs with customers and with your Sub-Processors.

You cannot warrant AI outputs. You can warrant system design, security, and data handling. You cannot promise that AI won’t produce unexpected results.

The Sub-Processor obligation is broader than you think. Every SaaS tool that touches Personal Data needs a DPA. Every tool that touches PHI needs a BAA.

Watch for MSA red flags. AI output liability, gutted suspension rights, one-sided termination, and deleted use case vetting are patterns that can hurt you.

The hard part is knowing what you need. Most major vendors have standard DPAs available. The challenge is recognizing that you need them and actually executing them.


This guide is for informational purposes and does not constitute legal advice. Every company’s situation is different. Work with qualified counsel to implement these frameworks for your specific circumstances.
