Structured Outputs from LLMs: JSON Mode, Tool Calls, and Schema Validation in Practice

Getting a language model to return reliably structured data is not just about asking nicely. Here's the pattern that actually works at production scale.

Anurag Verma

6 min read

Language models output text. Your application needs structured data. The gap between those two facts is where most AI integration bugs live.

The naive approach: ask the model to return JSON, parse the response, catch the errors. This works in development when you’re writing the prompts yourself and checking every output. It fails in production when the model decides to wrap the JSON in a markdown code block, include an explanation before the JSON, or return a slightly different structure than you specified.
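To make that failure mode concrete, here is a minimal sketch of what prompt-only parsing tends to turn into. The helper name `extractJsonLoosely` and the sample reply are assumptions made up for illustration; the point is that a plain `JSON.parse` rejects a fenced, chatty reply, and the salvage step is a heuristic rather than a guarantee.

```typescript
// Hypothetical helper: recover a JSON object from a chatty model reply.
function extractJsonLoosely(reply: string): unknown {
  // Fast path: the reply is already bare JSON.
  try {
    return JSON.parse(reply);
  } catch {
    // Fall through to the salvage path below.
  }
  // Salvage path: take the span from the first "{" to the last "}".
  const start = reply.indexOf('{');
  const end = reply.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model reply');
  }
  return JSON.parse(reply.slice(start, end + 1));
}

// Build a markdown code fence indirectly so this sample stays readable.
const fence = '`'.repeat(3);
const chattyReply = `Sure! Here is the data:\n${fence}json\n{"name": "Sarah Chen"}\n${fence}`;

const recovered = extractJsonLoosely(chattyReply) as { name: string };
console.log(recovered.name); // Sarah Chen
```

The salvage path breaks as soon as the surrounding explanation contains its own braces, which is one reason the rest of this article moves the constraint into the API call instead of the parser.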

The 2026 approach is different. Every major model provider now exposes structured output mechanisms that constrain the model’s output to match a schema you define. Getting this right removes an entire category of runtime errors.

JSON Mode vs Structured Outputs

These are two different things that are often conflated.

JSON mode tells the model to produce valid JSON. That’s all it guarantees. The model is still free to use any key names, any nesting structure, any field types. You get syntactically valid JSON with semantically unpredictable structure.

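Structured outputs (or function/tool calling with strict schema enforcement) constrain the model to produce JSON that matches a specific schema you define. If your schema says { name: string, age: number }, strict enforcement means the response contains exactly those keys with those types. Without strict enforcement, tool calls usually follow the schema but are not guaranteed to.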

Structured outputs require more setup but are what you want for production systems.
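The difference is easy to see with two replies that both parse as JSON. The sketch below uses a hand-written type guard, `isPerson`, as a stand-in for real schema validation; the guard and both sample replies are assumptions for illustration only.

```typescript
// The shape the application actually expects.
interface Person {
  name: string;
  age: number;
}

// Hand-written type guard standing in for real schema validation.
function isPerson(value: unknown): value is Person {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.name === 'string' && typeof record.age === 'number';
}

// Valid JSON, but with a different key name and a string where a number belongs.
const jsonModeReply = '{"full_name": "Ada Lovelace", "age": "36"}';
// Valid JSON that also matches the expected shape.
const schemaReply = '{"name": "Ada Lovelace", "age": 36}';

console.log(isPerson(JSON.parse(jsonModeReply))); // false
console.log(isPerson(JSON.parse(schemaReply))); // true
```

JSON mode only rules out the parse error; the first reply is the kind of output it still allows.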

Using Tool Calls for Data Extraction

The most reliable way to get structured data from a model today is through tool/function calling. You define a tool with an input schema, the model “calls” the tool, and you receive the arguments as a parsed object.

In TypeScript with the Anthropic SDK:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const extractContactInfo = async (text: string) => {
  const response = await client.messages.create({
    model: 'claude-opus-4-6',
    max_tokens: 1024,
    tools: [
      {
        name: 'extract_contact',
        description: 'Extract contact information from text',
        input_schema: {
          type: 'object',
          properties: {
            name: { type: 'string', description: 'Full name' },
            email: { type: 'string', description: 'Email address' },
            phone: { type: 'string', description: 'Phone number with country code' },
            company: { type: 'string', description: 'Company or organization name' },
          },
          required: ['name'],
        },
      },
    ],
    tool_choice: { type: 'tool', name: 'extract_contact' },
    messages: [
      {
        role: 'user',
        content: `Extract the contact information from this text:\n\n${text}`,
      },
    ],
  });

  const toolUse = response.content.find(block => block.type === 'tool_use');
  if (!toolUse || toolUse.type !== 'tool_use') {
    throw new Error('Model did not call the expected tool');
  }

  return toolUse.input as {
    name: string;
    email?: string;
    phone?: string;
    company?: string;
  };
};

const result = await extractContactInfo(
  'Please contact Sarah Chen at sarah@acme.com or +1-555-0123 for more information about Acme Corp.'
);

console.log(result);
// { name: 'Sarah Chen', email: 'sarah@acme.com', phone: '+1-555-0123', company: 'Acme Corp' }

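tool_choice: { type: 'tool', name: 'extract_contact' } forces the model to call that specific tool. Without this, the model may decide to respond with text instead. The as cast on toolUse.input is a compile-time assertion only and performs no runtime check, so treat the returned object as unvalidated until it has passed a schema check.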

With OpenAI’s structured outputs API:

import OpenAI from 'openai';
import { zodResponseFormat } from 'openai/helpers/zod';
import { z } from 'zod';

const client = new OpenAI();

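// Strict structured outputs require every field to be present, so fields that
// may be missing are modeled as nullable rather than optional.
const ContactSchema = z.object({
  name: z.string(),
  email: z.string().email().nullable(),
  phone: z.string().nullable(),
  company: z.string().nullable(),
});

const extractContact = async (text: string) => {
  // Newer SDK releases also expose this helper as client.chat.completions.parse.
  const response = await client.beta.chat.completions.parse({
    model: 'gpt-4o-2024-11-20',
    messages: [
      {
        role: 'user',
        content: `Extract contact info: ${text}`,
      },
    ],
    response_format: zodResponseFormat(ContactSchema, 'contact'),
  });

  const message = response.choices[0].message;
  // parsed is null when the model refuses; the reason is in message.refusal.
  if (!message.parsed) {
    throw new Error(message.refusal ?? 'No parsed output returned');
  }

  // TypeScript knows the type is z.infer<typeof ContactSchema>
  return message.parsed;
};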

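OpenAI’s structured outputs guarantee that the output matches the JSON Schema generated from your Zod schema, and the parsed field is already typed. The exception is a refusal: parsed is then null and the refusal field on the message carries the explanation.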

Schema Validation on the Output

Even with structured output APIs, runtime validation is worth doing. The model’s output passes the provider’s schema validation, but your application’s logic may have additional constraints that schemas can’t express cleanly, such as a date that must be a real calendar date or totals that must add up.

import { z } from 'zod';

const LineItemSchema = z.object({
  description: z.string().min(1),
  quantity: z.number().int().positive(),
  unit_price: z.number().positive(),
  total: z.number().positive(),
});

const InvoiceSchema = z.object({
  invoice_number: z.string(),
  vendor: z.string(),
  date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
  line_items: z.array(LineItemSchema).min(1),
  subtotal: z.number().positive(),
  tax: z.number().nonnegative(),
  total: z.number().positive(),
});

// Validate and get typed result
const parseInvoice = (rawOutput: unknown) => {
  const result = InvoiceSchema.safeParse(rawOutput);
  if (!result.success) {
    console.error('Invoice validation failed:', result.error.flatten());
    return null;
  }
  return result.data;
};

The safeParse approach avoids thrown exceptions and gives you a clean error path when something unexpected comes through.
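Some constraints span several fields, which per-field rules like the ones above do not cover. The following is a minimal, dependency-free sketch of such a check, assuming the field names from the invoice schema above; the `Invoice` interface here is a reduced subset, and `checkInvoiceTotals`, `approxEqual`, and the tolerance value are hypothetical choices for illustration.

```typescript
// Reduced shapes mirroring the numeric fields of the invoice schema.
interface LineItem {
  description: string;
  quantity: number;
  unit_price: number;
  total: number;
}

interface Invoice {
  line_items: LineItem[];
  subtotal: number;
  tax: number;
  total: number;
}

// Compare currency amounts with a small tolerance for float rounding.
const approxEqual = (a: number, b: number, epsilon = 0.005): boolean =>
  Math.abs(a - b) < epsilon;

// Returns a list of human-readable problems; an empty list means consistent.
function checkInvoiceTotals(invoice: Invoice): string[] {
  const problems: string[] = [];
  invoice.line_items.forEach((item, index) => {
    if (!approxEqual(item.quantity * item.unit_price, item.total)) {
      problems.push(`line_items[${index}].total does not equal quantity * unit_price`);
    }
  });
  const itemSum = invoice.line_items.reduce((sum, item) => sum + item.total, 0);
  if (!approxEqual(itemSum, invoice.subtotal)) {
    problems.push('subtotal does not equal the sum of line item totals');
  }
  if (!approxEqual(invoice.subtotal + invoice.tax, invoice.total)) {
    problems.push('total does not equal subtotal + tax');
  }
  return problems;
}

const sample: Invoice = {
  line_items: [{ description: 'Widget', quantity: 2, unit_price: 9.5, total: 19 }],
  subtotal: 19,
  tax: 1.9,
  total: 25,
};

console.log(checkInvoiceTotals(sample));
// [ 'total does not equal subtotal + tax' ]
```

Returning a list of problems rather than throwing keeps this in line with the safeParse style: the caller decides whether an inconsistent invoice is rejected, retried, or flagged for review.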

Python with Pydantic

from typing import Literal

import anthropic
from pydantic import BaseModel, Field

class SentimentResult(BaseModel):
    # Literal becomes an enum in the generated JSON Schema, so the allowed
    # labels are part of the schema rather than only a description hint.
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0.0, le=1.0)
    key_phrases: list[str] = Field(description="phrases that drove the sentiment")
    summary: str

client = anthropic.Anthropic()

def analyze_sentiment(text: str) -> SentimentResult:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=512,
        tools=[
            {
                "name": "report_sentiment",
                "description": "Report sentiment analysis results",
                "input_schema": SentimentResult.model_json_schema(),
            }
        ],
        tool_choice={"type": "tool", "name": "report_sentiment"},
        messages=[{"role": "user", "content": f"Analyze the sentiment: {text}"}],
    )

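    # Use a default so a missing tool call raises a clear error, not StopIteration.
    tool_use = next(
        (block for block in response.content if block.type == "tool_use"),
        None,
    )
    if tool_use is None:
        raise RuntimeError("Model did not call the expected tool")

    # Raises pydantic.ValidationError if the arguments violate the model.
    return SentimentResult.model_validate(tool_use.input)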

result = analyze_sentiment("The product works but customer support took 3 days to respond.")
# Illustrative output; actual values vary from run to run.
print(result.sentiment)      # e.g. negative
print(result.confidence)     # e.g. 0.72
print(result.key_phrases)    # e.g. ['customer support', '3 days to respond']

model_json_schema() generates a JSON Schema from the Pydantic model, which the tool definition uses. model_validate() validates the model’s output against the Pydantic model with full type checking.

Handling Model Refusals and Partial Outputs

Structured output modes reduce but don’t eliminate the need to handle bad outputs. Models can still refuse to fill in fields when information isn’t present, return null for required fields, or hit token limits mid-structure.

const extractWithRetry = async (text: string, maxAttempts = 2) => {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const result = await extractContactInfo(text);
      
      // Business logic validation the schema can't catch
      if (!result.name || result.name.trim().length === 0) {
        if (attempt < maxAttempts - 1) continue;
        return null;
      }
      
      return result;
    } catch (err) {
      if (attempt === maxAttempts - 1) return null;
    }
  }
  return null;
};

Keep retry counts low. One retry is usually appropriate. More than two suggests your prompt or schema needs work, not more retries.
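Truncation in particular can be detected before any parsing happens. Anthropic's Messages API reports a stop_reason on each response, and a value of max_tokens means generation was cut off. The sketch below works on a minimal structural stand-in for the response; `ResponseLike`, `Outcome`, and `classifyToolResponse` are hypothetical names, not SDK types.

```typescript
// Minimal stand-in for the parts of a Messages API response used here.
interface ContentBlockLike {
  type: string;
  input?: unknown;
}

interface ResponseLike {
  stop_reason: string | null;
  content: ContentBlockLike[];
}

type Outcome =
  | { kind: 'ok'; input: unknown }
  | { kind: 'truncated' }
  | { kind: 'no_tool_call' };

function classifyToolResponse(response: ResponseLike): Outcome {
  // A max_tokens stop means the tool arguments may be cut off mid-structure.
  if (response.stop_reason === 'max_tokens') {
    return { kind: 'truncated' };
  }
  const toolUse = response.content.find(block => block.type === 'tool_use');
  if (!toolUse) {
    return { kind: 'no_tool_call' };
  }
  return { kind: 'ok', input: toolUse.input };
}

const cutOff: ResponseLike = {
  stop_reason: 'max_tokens',
  content: [{ type: 'tool_use', input: { name: 'Sarah' } }],
};

console.log(classifyToolResponse(cutOff).kind); // truncated
```

Separating the outcomes matters for the retry policy: a truncated response is more likely to be fixed by raising max_tokens than by repeating the same request.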

When to Use Which Approach

Use tool/function calling when you need guaranteed structure, the schema is specific, and you’re building a production feature that will run at volume.

Use JSON mode when you need valid JSON but the structure is flexible, or when you’re prompting an older model that doesn’t support structured outputs well.

Use prompt-only JSON extraction when the schema is very simple (1-2 fields), you’re prototyping, or the model is a small local model where tool calling support is limited.

Use streaming with structured outputs for long-form structured responses where you need to show progress. Most providers support streaming even with schema constraints — the JSON builds incrementally, and you parse when the stream completes.
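The accumulate-then-parse pattern for streaming can be sketched without any provider SDK. Here `fakeJsonStream` is a hypothetical stand-in for a provider's stream of text fragments, and `collectStructured` is an assumed helper name; a real integration would feed it the text deltas from the provider's streaming API.

```typescript
// Hypothetical stand-in for a provider stream: yields JSON text fragments.
async function* fakeJsonStream(): AsyncGenerator<string> {
  yield '{"title": "Quarterly';
  yield ' report", "sections"';
  yield ': 3}';
}

// Accumulate fragments, report progress, and parse only once the stream ends.
async function collectStructured<T>(
  chunks: AsyncIterable<string>,
  onProgress: (charsSoFar: number) => void,
): Promise<T> {
  let buffer = '';
  for await (const chunk of chunks) {
    buffer += chunk;
    onProgress(buffer.length);
  }
  // Partial buffers are usually not valid JSON, so parsing waits until here.
  return JSON.parse(buffer) as T;
}

async function main(): Promise<void> {
  const report = await collectStructured<{ title: string; sections: number }>(
    fakeJsonStream(),
    chars => console.log(`received ${chars} characters`),
  );
  console.log(report.title); // Quarterly report
}

main();
```

The progress callback is what drives the UI; the parsed object is only trusted once the whole stream has arrived and should still go through the same validation as a non-streamed response.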

The investment in structured outputs pays off quickly. The parser errors, the “model responded with markdown instead of JSON” bugs, the undefined-access crashes on missing fields — these stop happening. What remains is prompt work and schema design, which are problems worth solving once rather than debugging indefinitely in production.
