
Which AI providers don't train on your API data?

Public commitments not to train on customer API data by default, cell by cell, with sources. Each cell answers one question: is there a public commitment not to train on customer API data by default? Statuses below are evidence grades, not endorsements: “no public evidence” means we could not verify a commitment from public sources, not that the answer is no.

OpenAI API · first-party API

● Yes, public · confidence: high · verified 2026-07-05

Docs state "data sent to the OpenAI API is not used to train or improve OpenAI models (unless you explicitly opt in to share data with us)". No-training is the default; sharing is opt-in only.

source · archived copy · full cell

Azure OpenAI Service · OpenAI model, served by Microsoft Azure

● Yes, public · confidence: high · verified 2026-07-05

Microsoft's public commitment (data-privacy page, verified 2026-07-05): prompts, completions, embeddings, and training data "are NOT available to OpenAI", "are NOT used by providers of Models sold by Azure to improve their models", and "are NOT used to train any generative AI foundation models without your permission or instruction". Models are stateless, and fine-tuned models are exclusive to the customer. The original URL learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy now canonicalizes to the Foundry responsible-ai path.

source · archived copy · full cell

Anthropic API · first-party API

● Yes, public · confidence: high · verified 2026-07-05

Commercial Terms (Customer Content section) state Anthropic may not train models on Customer Content from the Services; API docs reiterate retained data "is never used for model training without your express permission". Important distinction: in Aug/Sep 2025 Anthropic changed CONSUMER terms (Claude Free/Pro/Max) to allow training when the user enables the setting, with 5-year retention if enabled (anthropic.com/news/updates-to-our-consumer-terms, decision deadline 2025-10-08). That change covers consumer accounts only; the commercial API default (no training on customer content) is unchanged. This cell records the commercial-API answer.

source · archived copy · full cell

Claude via AWS Bedrock · Anthropic model, served by AWS Bedrock

● Yes, public · confidence: high · verified 2026-07-05

Bedrock FAQ: "your content is not used to improve the base models and is not shared with any model providers"; AWS and third-party model providers "will not use any inputs to or outputs from Amazon Bedrock to train" their models. Architecturally, model-provider deployment accounts give Anthropic no access to prompts/completions (docs.aws.amazon.com/bedrock/latest/userguide/data-protection.html). Caveat: the newest Claude models (Fable 5, Mythos 5) require an explicit provider_data_share opt-in that shares retained traffic with Anthropic for trust-and-safety review. This is a safety-review carve-out, not a training grant; the no-training commitment still applies.

source · archived copy · full cell

Claude via Google Vertex AI · Anthropic model, served by Google Cloud Vertex AI

● Yes, public · confidence: high · verified 2026-07-05

Google Cloud Service Terms Section 17 (Training Restriction) commits that Google will not use customer data to train or fine-tune AI/ML models without customer permission or instruction. The Vertex AI generative AI data governance page states that prompts, responses, and adapter training data are not used to train foundation models by default, and that customer prompts/responses are not shared with third parties, including partner-model providers such as Anthropic. The archived snapshot is of the pre-migration cloud.google.com URL for the same page.

source · archived copy · full cell

Gemini via Vertex AI · Google model, served by Google Cloud Vertex AI

● Yes, public · confidence: high · verified 2026-07-05

"As outlined in Section 17 'Training Restriction' in the Service Terms section of Service Specific Terms, Google won't use your data to train or fine-tune any AI/ML models without your prior permission or instruction. This applies to all managed models on Gemini Enterprise Agent Platform, including GA and pre-GA models." The old URL cloud.google.com/vertex-ai/generative-ai/docs/data-governance now redirects to this page; the archived snapshot is of the pre-rename URL.

source · archived copy · full cell

AWS Bedrock (platform) · platform row

● Yes, public · confidence: high · verified 2026-07-05

Bedrock FAQ commits that "AWS and the third-party model providers will not use any inputs to or outputs from Amazon Bedrock to train Amazon Nova, Amazon Titan, or any third-party models," and that inputs/outputs are not shared with model providers. The Bedrock user guide additionally documents that model providers have no access to the AWS-operated Model Deployment Accounts, so they cannot see Bedrock logs or customer prompts/completions (https://docs.aws.amazon.com/bedrock/latest/userguide/data-protection.html). Caveat: the separate provider_data_share retention mode (see retention_zdr) shares data with the model provider for trust & safety, not training.

source · archived copy · full cell

Mistral La Plateforme · first-party API

◔ Partial · confidence: high · verified 2026-07-05

Not a blanket no-training commitment for the whole platform: paid (Scale) API customers are opted out of training by default, but free-tier API usage is opted IN by default and requires a manual console toggle to opt out. The DPA confirms that Mistral acts as controller for training purposes unless the customer has opted out or uses a product that is opted out by default.

source · full cell

Mistral via Azure AI · Mistral AI model, served by Microsoft Azure

● Yes, public · confidence: high · verified 2026-07-05

Explicit commitment for serverless deployments: "Microsoft doesn't share these prompts and outputs with the model provider. Also, Microsoft doesn't use these prompts and outputs to train or improve Microsoft models, the model provider's models, or any third party's models." The Foundry Models FAQ repeats this ("customer data is never shared with model providers"). Caveat: Microsoft may share customer contact information and transaction/usage-volume details (not content) with the model publisher for marketplace purposes.

source · archived copy · full cell

Cohere API · Cohere model, served by Cohere (first-party)

◔ Partial · confidence: medium · verified 2026-07-05

No default no-training commitment on the SaaS API: Cohere states customers "can opt out from your prompts and generations being used to train Cohere models" via dashboard settings, i.e. training use is on unless the customer toggles it off (opt-out, not opt-in). Cohere says it filters/strips common personal information before any training use. For private/cloud-partner deployments, Cohere receives no prompts or generations at all. Confidence is medium because the default-on state is implied by the opt-out framing rather than stated as "default".

source · archived copy · full cell

Cohere via AWS Bedrock · Cohere model, served by AWS Bedrock

● Yes, public · confidence: high · verified 2026-07-05

Platform-level commitment for this offering: Bedrock states customer content is not used to improve base models and is not shared with model providers (i.e., Cohere never sees prompts/completions). Bedrock's Model Deployment Account design gives providers no access to inference infrastructure or logs. Bedrock's newer data-retention modes include a provider_data_share opt-in required by certain models; Cohere models are not listed among those requiring it, so the default behavior for Cohere models remains no provider sharing.

source · archived copy · full cell

Llama via AWS Bedrock · Meta model, served by AWS Bedrock

● Yes, public · confidence: high · verified 2026-07-05

Bedrock FAQ: "Your content is not used to improve the base models and is not shared with any model providers"; inputs and outputs are not shared with model providers. Architecturally, Bedrock runs each provider's model in an AWS-operated Model Deployment Account that the provider cannot access, so Meta has no access to customer prompts/completions (docs.aws.amazon.com/bedrock/latest/userguide/data-protection.html). No Llama model is documented as requiring the provider_data_share retention mode (that opt-in currently applies to certain Anthropic models only).

source · archived copy · full cell

Llama via Azure AI · Meta model, served by Microsoft Azure (Azure AI Foundry / Models-as-a-Service)

● Yes, public · confidence: high · verified 2026-07-05

Explicit public commitment for serverless API (MaaS) deployments: "Microsoft doesn't share these prompts and outputs with the model provider. Also, Microsoft doesn't use these prompts and outputs to train or improve Microsoft models, the model provider's models, or any third party's models." Fine-tuning data is likewise not used to train other models. Caveat (not content): Microsoft may share customer contact information and transaction/usage-volume details with the model publisher (Meta) for marketplace purposes.

source · archived copy · full cell

xAI API · xAI model, served by xAI (first-party)

● Yes, public · confidence: high · verified 2026-07-05

API security FAQ: "xAI never trains on your API inputs or outputs without your explicit permission." No-training is the default for API traffic; opt-in is required for training use. (Consumer Grok products have different defaults; this cell covers the API offering only.)

source · archived copy · full cell

DeepSeek API (first-party) · first-party API

○ No public evidence · confidence: high · verified 2026-07-05

No public commitment NOT to train on customer data by default was found; the privacy policy states the opposite. It lists Prompts/Inputs ("text input, voice input, prompt, uploaded files, photos, feedback, chat history, or other content that you provide to our model and Services") as collected data and states they are used "to improve and develop the Services and to train and improve our technology, such as our machine learning models and algorithms." The policy's rights section lists "the right to opt-out of using your Personal Data for training our models or optimizing our technologies" for all users (not only the European Region). The Open Platform Terms of Service address customers' use of Inputs/Outputs (including distillation) but do not state whether DeepSeek trains on API data. The privacy policy states it applies to DeepSeek "apps, websites, software, and related services" that link to it, excluding downstream applications built by platform developers.

source · archived copy · full cell

DeepSeek via Fireworks AI · DeepSeek model, served by Fireworks AI

● Yes, public · confidence: high · verified 2026-07-05

Privacy policy: "We do not use your prompts, training data, or API inputs to train or improve our AI models without your explicit opt-in." The public DPA reinforces this contractually, prohibiting "using Covered Data to train, fine-tune, or otherwise improve any shared or foundational model." Because Fireworks hosts open weights, customer data also cannot reach DeepSeek for training.

source · archived copy · full cell