Writing PRDs for AI Products: The New Framework

If you write a Product Requirements Document (PRD) for an AI feature using a standard 2019 template, the feature is set up to fail in production.

Standard PRDs are built for deterministic software: If the user clicks button A, show screen B. AI products are non-deterministic: If the user clicks button A, the system might generate a brilliant insight, it might apologize for not knowing the answer, or it might confidently hallucinate a disastrously wrong conclusion.

Because the output is unpredictable, the PRD must shift its focus from defining exact outputs to defining acceptable boundaries and safety nets. Here is the new PRD framework required to ship AI products in 2026.

The Core Difference: Managing Non-Determinism

When writing an AI PRD, you are not writing logic; you are writing a constitution. You must define what the model must do, what it must never do, and how you will measure the difference.

Replace your standard PRD sections with this updated framework.

Section 1: The AI "Job to be Done" & Persona

You still need the standard "Why are we building this?" section, but you must explicitly define the AI's persona and constraints.

Model Persona: "The agent acts as a polite, highly concise data analyst. It must never use emojis or conversational filler (e.g., 'Sure! I can help with that!'). It must only output data."
Knowledge Boundary: "The model is restricted strictly to the user's uploaded CRM data. It must not answer general knowledge questions. If asked 'Who won the Super Bowl?', it must reply: 'I am only authorized to query your CRM data.'"
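Persona and boundary rules are most useful when they can be checked automatically. Here is a minimal Python sketch that turns the two requirements above into checks. The filler-phrase list, the emoji ranges, and the function names are illustrative assumptions, not a standard.

```python
import re

# Mandated refusal, copied from the Knowledge Boundary requirement.
OUT_OF_SCOPE_REPLY = "I am only authorized to query your CRM data."

# Hypothetical, non-exhaustive list of conversational filler the persona forbids.
FILLER_PHRASES = ("sure! i can help with that", "great question", "happy to help")

# Rough emoji detector covering common Unicode emoji blocks (an assumption, not complete).
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def persona_violations(response: str) -> list[str]:
    """Return the persona rules a response breaks; an empty list means compliant."""
    violations = []
    if EMOJI_PATTERN.search(response):
        violations.append("contains emoji")
    lowered = response.lower()
    for phrase in FILLER_PHRASES:
        if phrase in lowered:
            violations.append(f"contains filler: {phrase!r}")
    return violations


def is_correct_refusal(response: str) -> bool:
    """True if an out-of-scope question received exactly the mandated reply."""
    return response.strip() == OUT_OF_SCOPE_REPLY
```

For example, `persona_violations("Sure! I can help with that! 📈 Revenue: $1.2M")` flags both the emoji and the filler phrase. A plain `"Revenue: $1.2M"` passes.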

Section 2: Token Economics & Cost Modeling

Standard software has negligible marginal compute costs. AI does not. You must treat token costs as a core product requirement, not just an engineering afterthought.

Model Selection Hypothesis: "We propose using Claude 3 Haiku for initial triage to save costs, routing to GPT-4o only for complex summarization."
Cost Threshold: "The total compute cost per user interaction must not exceed $0.02. If cost exceeds this threshold during QA (or latency exceeds the Section 3 budgets), we must fall back to a smaller model."
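Once token counts and prices are known, the cost threshold can be checked mechanically. Here is a minimal sketch of a two-tier router that follows the hypothesis above. It assumes per-million-token prices supplied by the team. The model names and prices in the usage note are placeholders, not real vendor pricing.

```python
from dataclasses import dataclass

COST_CEILING_USD = 0.02  # Cost Threshold from the PRD: max spend per user interaction.


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_per_million: float   # USD per 1M input tokens (supplied by the team)
    output_per_million: float  # USD per 1M output tokens (supplied by the team)


def interaction_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call; input and output tokens are billed at separate rates."""
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000


def choose_model(small: ModelPrice, large: ModelPrice, is_complex: bool,
                 input_tokens: int, output_tokens: int) -> ModelPrice:
    """Send complex requests to the large model unless that would break the ceiling."""
    if is_complex and interaction_cost(large, input_tokens, output_tokens) <= COST_CEILING_USD:
        return large
    # Triage tier by default, and the fallback when the large model is too expensive.
    return small
```

With hypothetical prices of $5 (input) and $15 (output) per million tokens for the large tier, a complex request with 2,000 input and 500 output tokens costs $0.0175 and stays under the ceiling. At 3,000 input tokens it would cost $0.0225, so the router falls back to the triage model.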

Section 3: Latency & "Time-to-First-Token" UX

Users will tolerate a 1-second load time for a standard dashboard. They will abandon an AI feature if they stare at a spinning loader for 4 seconds while the LLM thinks.

Streaming Requirements: "The first token must appear in the UI within 400ms of prompt submission (time-to-first-token), and the rest of the response must stream in as it is generated, to create the illusion of instant responsiveness."
Distraction UX: "If the RAG retrieval pipeline takes longer than 2 seconds, the UI must display a dynamic skeleton loader with status text ('Searching your knowledge base...' -> 'Reading documents...' -> 'Synthesizing...')."
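Both requirements above can be written as checks that QA runs. Here is a minimal sketch that assumes the response arrives as an iterator of text chunks produced by a `submit` callable. A real SDK's streaming interface will differ. The stage keys are hypothetical names for the three retrieval steps.

```python
import time
from typing import Callable, Iterable, Optional

TTFT_BUDGET_S = 0.400   # Streaming Requirements: first token within 400 ms of submission.
SKELETON_DELAY_S = 2.0  # Distraction UX: show status text once retrieval passes 2 s.

# Status text per retrieval stage; the keys are hypothetical stage names.
STATUS_TEXT = {
    "search": "Searching your knowledge base...",
    "read": "Reading documents...",
    "synthesize": "Synthesizing...",
}


def time_to_first_token(submit: Callable[[], Iterable[str]]) -> float:
    """Seconds from prompt submission until the first non-empty chunk arrives."""
    start = time.perf_counter()  # Clock starts before the prompt is submitted.
    for chunk in submit():
        if chunk:
            return time.perf_counter() - start
    raise ValueError("stream produced no tokens")


def meets_ttft_budget(submit: Callable[[], Iterable[str]]) -> bool:
    """True if the first token arrives within the 400 ms budget."""
    return time_to_first_token(submit) <= TTFT_BUDGET_S


def loader_status(elapsed_s: float, stage: str) -> Optional[str]:
    """Status text for the current stage, or None before the 2-second mark."""
    if elapsed_s <= SKELETON_DELAY_S:
        return None
    return STATUS_TEXT[stage]
```

Driving the status text by pipeline stage rather than by fixed timers means the loader only says "Reading documents..." when documents are actually being read.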

Section 4: Safety Guardrails & Fallback States

This is the most critical section. What happens when the AI fails? Assume that it will.

Hallucination Fallback: "If the system's confidence score drops below 0.8, or if the RAG pipeline returns zero relevant documents, the system must trigger the Fallback State: Do not attempt to generate an answer. Display: 'I could not find a definitive answer in your documents. Would you like to connect with human support?'" LLMs do not return a calibrated confidence score by default, so the PRD must also state how this score is computed (for example, from retrieval relevance scores or a separate verifier model).
Toxicity/Prompt Injection Guardrails: "All user inputs must pass through a lightweight classification model to detect toxic content and prompt injection attempts before reaching the core LLM."
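The guardrails above amount to a gate in front of generation. Here is a minimal sketch that assumes the team supplies the retriever, the input classifier, the confidence scorer, and the generator as plain functions. The parameter names and the `BLOCKED_MESSAGE` copy are hypothetical. The 0.8 floor and the fallback text come from the requirement.

```python
from typing import Callable, Sequence

CONFIDENCE_FLOOR = 0.8
FALLBACK_MESSAGE = ("I could not find a definitive answer in your documents. "
                    "Would you like to connect with human support?")
BLOCKED_MESSAGE = "This request could not be processed."  # Hypothetical copy.


def handle_query(
    query: str,
    retrieve: Callable[[str], Sequence[str]],
    looks_like_injection: Callable[[str], bool],
    score_confidence: Callable[[str, Sequence[str]], float],
    generate: Callable[[str, Sequence[str]], str],
) -> str:
    # 1. Guardrail: screen the raw user input before it reaches the core LLM.
    if looks_like_injection(query):
        return BLOCKED_MESSAGE
    # 2. Zero relevant documents triggers the Fallback State.
    docs = retrieve(query)
    if not docs:
        return FALLBACK_MESSAGE
    # 3. Below the confidence floor, do not attempt an answer at all.
    if score_confidence(query, docs) < CONFIDENCE_FLOOR:
        return FALLBACK_MESSAGE
    return generate(query, docs)
```

This sketch scores confidence before generation, for example from retrieval relevance. Scoring the drafted answer with a separate verifier is another option the PRD could specify instead.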

Section 5: "Evals" (Evaluation Metrics)

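Manual QA alone cannot cover an AI feature: the same input can produce a different output on every run, so spot-checking a few responses proves little. You must define the automated evaluations (Evals) in the PRD.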

The Ground Truth Dataset: "Before engineering begins, Product will supply a dataset of 100 sample user questions and their ideal answers."
LLM-as-a-Judge Criteria: "We will run an automated script using an evaluator LLM. The evaluator must grade the feature's responses to the Ground Truth dataset. The feature cannot launch to production unless it achieves a 95% accuracy score and a 0% toxicity score on the Evals."
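The launch gate can be expressed as a small harness. Here is a minimal sketch that assumes the evaluator LLM is wrapped in a `judge` function returning a correctness verdict and a toxicity flag for each response. `judge` is a hypothetical stand-in, not a real library API. The thresholds are the 95% and 0% from the requirement.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Verdict:
    correct: bool  # Judge's call: does the response match the ideal answer?
    toxic: bool    # Judge's call: is the response toxic?


@dataclass(frozen=True)
class EvalReport:
    accuracy: float
    toxicity: float

    @property
    def can_launch(self) -> bool:
        # Launch criteria from the PRD: at least 95% accuracy and 0% toxicity.
        return self.accuracy >= 0.95 and self.toxicity == 0.0


def run_evals(
    ground_truth: Sequence[tuple[str, str]],    # (question, ideal_answer) pairs
    feature: Callable[[str], str],              # The AI feature under test
    judge: Callable[[str, str, str], Verdict],  # (question, ideal, actual) -> Verdict
) -> EvalReport:
    if not ground_truth:
        raise ValueError("ground truth dataset is empty")
    verdicts = [judge(q, ideal, feature(q)) for q, ideal in ground_truth]
    n = len(verdicts)
    return EvalReport(
        accuracy=sum(v.correct for v in verdicts) / n,
        toxicity=sum(v.toxic for v in verdicts) / n,
    )
```

On a 100-question dataset, six wrong answers give 94% accuracy, and `can_launch` is then `False`.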

Conclusion: The "Kill Switch"

Finally, every AI PRD must include a Kill Switch protocol.

Because models can drift or suddenly become toxic due to underlying API updates you don't control, you must define the criteria for shutting the feature off.

Kill Switch Criteria: "If user error reports related to hallucinations exceed 2% of total daily queries, engineering is authorized to instantly disable the AI feature and revert the UI to the legacy search bar."
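Here is a minimal sketch of the kill-switch check. It assumes telemetry provides daily counts of hallucination-related error reports and total queries. `FeatureFlags` and `ai_answers_enabled` are hypothetical stand-ins for whatever feature-flag system the team uses.

```python
KILL_THRESHOLD = 0.02  # Kill Switch Criteria: hallucination reports above 2% of daily queries.


class FeatureFlags:
    """Hypothetical stand-in for the team's real feature-flag service."""

    def __init__(self) -> None:
        self.ai_answers_enabled = True  # False means the UI shows the legacy search bar.


def should_kill(hallucination_reports: int, total_queries: int) -> bool:
    """True when hallucination-related reports exceed 2% of the day's queries."""
    if total_queries <= 0:
        return False  # No traffic yet, so there is nothing to judge.
    return hallucination_reports / total_queries > KILL_THRESHOLD


def apply_kill_switch(flags: FeatureFlags, hallucination_reports: int, total_queries: int) -> bool:
    """Disable the AI feature if the criteria are met; return True if it was switched off."""
    if flags.ai_answers_enabled and should_kill(hallucination_reports, total_queries):
        flags.ai_answers_enabled = False
        return True
    return False
```

Keeping the threshold check as a pure function makes it easy to unit-test. Flipping the flag is the only side effect.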

An AI PRD is a risk mitigation document. Build the fences, define the costs, and design for failure.

FAQ

Do I still need user stories in an AI PRD?

Yes, but they should focus on the interaction layer (e.g., "As a user, I want to see the citations for the AI's answer so I can trust the data"). You do not write user stories for the LLM's internal reasoning.

How do I estimate the cost of an AI feature in the PRD?

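Estimate input tokens per query as your system prompt plus the average user prompt plus any retrieved context, and estimate the average output length separately. Multiply each by your chosen model's price for that token type (input and output tokens are usually priced differently) and add the two. Then multiply that per-query cost by the expected queries per user per day and by expected daily active users.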
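The same estimate as a formula. The T terms are average tokens per query (system prompt, user prompt, retrieved context, output). P_in and P_out are the model's USD prices per million input and output tokens. q is expected queries per user per day. The second line is a worked example with hypothetical figures, not real pricing: T_sys = 500, T_user = 100, T_ctx = 1,400, T_out = 500, P_in = 5, P_out = 15, DAU = 10,000, q = 5.

```latex
C_{\text{day}} = \text{DAU} \times q \times
  \frac{(T_{\text{sys}} + T_{\text{user}} + T_{\text{ctx}})\,P_{\text{in}} + T_{\text{out}}\,P_{\text{out}}}{10^{6}}

C_{\text{day}} = 10{,}000 \times 5 \times \frac{2000 \times 5 + 500 \times 15}{10^{6}}
  = 50{,}000 \times 0.0175 = 875 \ \text{USD per day}
```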

What is a 'Ground Truth' dataset?

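It is a spreadsheet containing realistic user inputs and the ideal reference outputs you expect the AI to generate. Because the model's wording varies between runs, responses are graded against these references (e.g., by an LLM-as-a-Judge) rather than by exact string match. It serves as the automated testing baseline to ensure the model behaves correctly before launch.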
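If the spreadsheet is exported as CSV, Python's standard `csv` module is enough to load and sanity-check it. Here is a minimal sketch that assumes two columns named `question` and `ideal_answer`. These column names are hypothetical; use whatever headers the team agrees on.

```python
import csv
from typing import Iterable


def load_ground_truth(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Read (question, ideal_answer) pairs, rejecting blank or duplicate questions."""
    rows: list[tuple[str, str]] = []
    seen: set[str] = set()
    # csv.DictReader accepts any iterable of lines, e.g. an open file object.
    for record_no, row in enumerate(csv.DictReader(lines), start=1):
        question = (row.get("question") or "").strip()
        answer = (row.get("ideal_answer") or "").strip()
        if not question or not answer:
            raise ValueError(f"record {record_no}: question and ideal_answer are both required")
        if question in seen:
            raise ValueError(f"record {record_no}: duplicate question {question!r}")
        seen.add(question)
        rows.append((question, answer))
    return rows
```

In practice you would pass an open file, e.g. `load_ground_truth(open("ground_truth.csv", newline=""))`, where the filename is illustrative. Failing fast on blanks and duplicates keeps a malformed spreadsheet from silently skewing the eval scores.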