JSON Schema Validation: Complete Guide with Examples
When building AI translation APIs for enterprise systems, schema validation saved me countless debugging hours. Instead of discovering data issues in production, invalid data gets rejected at the API boundary with clear error messages.
This guide shares everything I've learned about JSON Schema validation from building APIs that process thousands of requests. You'll learn to write schemas, validate data in multiple languages, and enforce API contracts that prevent bad data from entering your system.
What is JSON Schema and Why Validate
JSON Schema is a vocabulary for annotating and validating JSON documents. Think of it as a contract that describes what valid JSON should look like.
Why I use schema validation
When building content translation workflows for PIM systems in Copenhagen, I process JSON from multiple sources: user uploads, third-party APIs, and automated scripts. Without validation, invalid data crashes downstream systems.
Schema validation provides:
- Early error detection: Catch issues at the API boundary, not in production
- Clear error messages: "email must be a valid email format" vs "500 Internal Server Error"
- API documentation: Schemas document expected data structure
- Security: Reject unexpected fields that could exploit vulnerabilities
- Type safety: Ensure string fields contain strings, not objects
Real-world impact
In one e-commerce project, implementing schema validation reduced production errors by 67%. Invalid product data was rejected with clear messages instead of corrupting the database.
// Before validation - this got into our database
{
"productId": ["PROD123"], // Should be string, not array
"price": "29.99", // Should be number, not string
"stock": null // Should be integer
}
// After validation - rejected with clear errors
{
"errors": [
"productId must be string",
"price must be number",
"stock must be integer"
]
}
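To make the rejection above concrete, here is a toy Python sketch of the type check behind those three messages. It handles only the type keyword on top-level properties. The names json_type_matches and check_types are invented for this illustration and are not part of any library; a real validator such as Ajv or jsonschema covers far more.

```python
# Toy illustration only: checks the JSON Schema "type" keyword for
# top-level properties. Function names are hypothetical.

def json_type_matches(value, expected):
    """Return True if value has the given JSON Schema type."""
    if expected == "string":
        return isinstance(value, str)
    if expected == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        # (JSON Schema also accepts 1.0 as an integer; this toy does not.)
        return isinstance(value, int) and not isinstance(value, bool)
    if expected == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected == "boolean":
        return isinstance(value, bool)
    if expected == "array":
        return isinstance(value, list)
    if expected == "object":
        return isinstance(value, dict)
    if expected == "null":
        return value is None
    raise ValueError(f"unknown type: {expected}")


def check_types(data, schema):
    """Collect one message per present property whose value has the wrong type."""
    errors = []
    for name, rules in schema.get("properties", {}).items():
        if name in data and not json_type_matches(data[name], rules["type"]):
            errors.append(f"{name} must be {rules['type']}")
    return errors


product_schema = {
    "properties": {
        "productId": {"type": "string"},
        "price": {"type": "number"},
        "stock": {"type": "integer"},
    }
}

bad_product = {"productId": ["PROD123"], "price": "29.99", "stock": None}
print(check_types(bad_product, product_schema))
# ['productId must be string', 'price must be number', 'stock must be integer']
```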
Writing Your First Schema
Let's start with a simple product schema based on real e-commerce data.
Basic structure
Most schemas open with these keywords. All of them are optional, but $schema tells validators which draft to apply:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://utilitiz.com/schemas/product.json",
"title": "Product",
"description": "E-commerce product schema",
"type": "object"
}
Adding properties
Define the fields your JSON should have:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"id": {
"type": "string",
"description": "Unique product identifier"
},
"name": {
"type": "string",
"description": "Product name"
},
"price": {
"type": "number",
"description": "Price in USD",
"minimum": 0
},
"inStock": {
"type": "boolean",
"description": "Whether product is in stock"
}
}
}
Required fields
Specify which fields must be present:
{
"type": "object",
"properties": {
"id": { "type": "string" },
"name": { "type": "string" },
"price": { "type": "number" }
},
"required": ["id", "name", "price"]
}
Preventing additional properties
When building secure APIs, I always set additionalProperties to false. This prevents attackers from injecting unexpected fields:
{
"type": "object",
"properties": {
"id": { "type": "string" },
"name": { "type": "string" }
},
"required": ["id", "name"],
"additionalProperties": false
}
This rejects data like:
{
"id": "PROD123",
"name": "Widget",
"__proto__": { "isAdmin": true } // Rejected!
}
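What additionalProperties: false does for a flat object can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (no patternProperties, no nesting), and unexpected_fields is a made-up helper name, not a library function.

```python
def unexpected_fields(data, schema):
    """Return keys in data that the schema's "properties" does not declare.

    Mirrors "additionalProperties": false for a flat object only; a real
    validator also handles patternProperties and nested objects.
    """
    if schema.get("additionalProperties", True) is not False:
        return []  # extra fields are allowed by default
    allowed = set(schema.get("properties", {}))
    return sorted(key for key in data if key not in allowed)


schema = {
    "type": "object",
    "properties": {"id": {"type": "string"}, "name": {"type": "string"}},
    "required": ["id", "name"],
    "additionalProperties": False,
}

payload = {"id": "PROD123", "name": "Widget", "__proto__": {"isAdmin": True}}
print(unexpected_fields(payload, schema))  # ['__proto__']
```

Without additionalProperties: false the same payload passes, because JSON Schema allows undeclared fields by default.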
Common Validation Patterns
Here are the validation patterns I use most frequently in production APIs. These come from building translation systems and e-commerce platforms.
String constraints
{
"username": {
"type": "string",
"minLength": 3,
"maxLength": 20,
"pattern": "^[a-zA-Z0-9_-]+$"
},
"email": {
"type": "string",
"format": "email"
},
"url": {
"type": "string",
"format": "uri"
},
"description": {
"type": "string",
"maxLength": 500
}
}
Number constraints
{
"price": {
"type": "number",
"minimum": 0,
"maximum": 10000,
"multipleOf": 0.01
},
"quantity": {
"type": "integer",
"minimum": 0
},
"rating": {
"type": "number",
"minimum": 1,
"maximum": 5
}
}
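A fractional multipleOf such as 0.01 deserves a note: the check is a division, and binary floating point cannot represent most decimal fractions exactly, so validators differ in how they handle it. The sketch below shows the effect and one possible workaround using decimal arithmetic. The helper is_multiple_of is hypothetical and is not how any particular validator implements the keyword.

```python
from decimal import Decimal


def is_multiple_of(value, step):
    """Check multipleOf with decimal arithmetic instead of binary floats.

    Going through str() keeps the digits as written (29.99, 0.01).
    Assumes step is greater than zero, as multipleOf requires.
    """
    return Decimal(str(value)) % Decimal(str(step)) == 0


# Naive float arithmetic: 0.3 is mathematically a multiple of 0.1,
# but the float remainder is not zero.
print(0.3 % 0.1 == 0)                # False
print(is_multiple_of(0.3, 0.1))      # True
print(is_multiple_of(29.99, 0.01))   # True
print(is_multiple_of(29.995, 0.01))  # False
```

Another common design choice is to store prices as integer cents and validate them with type: integer, which avoids the problem entirely.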
Enums for fixed values
When translating product data across languages, I use enums to ensure category values match expected options:
{
"status": {
"type": "string",
"enum": ["draft", "published", "archived"]
},
"language": {
"type": "string",
"enum": ["en", "fr", "de", "es", "it", "pt", "nl", "pl", "sv"]
},
"category": {
"type": "string",
"enum": ["electronics", "clothing", "books", "home"]
}
}
Arrays with specific items
{
"tags": {
"type": "array",
"items": {
"type": "string"
},
"minItems": 1,
"maxItems": 10,
"uniqueItems": true
},
"images": {
"type": "array",
"items": {
"type": "string",
"format": "uri"
},
"maxItems": 5
}
}
Nested objects
For complex product data in PIM systems:
{
"product": {
"type": "object",
"properties": {
"id": { "type": "string" },
"name": { "type": "string" },
"specifications": {
"type": "object",
"properties": {
"weight": { "type": "number", "minimum": 0 },
"dimensions": {
"type": "object",
"properties": {
"length": { "type": "number" },
"width": { "type": "number" },
"height": { "type": "number" }
},
"required": ["length", "width", "height"]
}
}
}
}
}
}
Conditional validation
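Validate based on other field values. The if block also lists shippingMethod under required: properties on its own matches when the field is absent, which would make deliveryDate mandatory for every order that omits shippingMethod.
{
  "type": "object",
  "properties": {
    "shippingMethod": {
      "type": "string",
      "enum": ["standard", "express"]
    },
    "deliveryDate": {
      "type": "string",
      "format": "date"
    }
  },
  "if": {
    "properties": {
      "shippingMethod": { "const": "express" }
    },
    "required": ["shippingMethod"]
  },
  "then": {
    "required": ["deliveryDate"]
  }
}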
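A toy evaluation of if/then in Python helps when reasoning about conditional schemas. It compares an if with and without required for an order that has no shippingMethod at all. The functions matches_if and missing_after_condition are invented for this sketch and support only const, required, and a then made of required.

```python
def matches_if(data, if_schema):
    """Evaluate a simplified `if` built from "properties"/"const" and "required"."""
    for name in if_schema.get("required", []):
        if name not in data:
            return False
    for name, rule in if_schema.get("properties", {}).items():
        # As in JSON Schema, "properties" only constrains keys that are present
        if name in data and "const" in rule and data[name] != rule["const"]:
            return False
    return True


def missing_after_condition(data, schema):
    """Return the fields `then` requires that are missing, if `if` matched."""
    if not matches_if(data, schema["if"]):
        return []
    return [f for f in schema["then"].get("required", []) if f not in data]


loose = {
    "if": {"properties": {"shippingMethod": {"const": "express"}}},
    "then": {"required": ["deliveryDate"]},
}
strict = {
    "if": {
        "properties": {"shippingMethod": {"const": "express"}},
        "required": ["shippingMethod"],
    },
    "then": {"required": ["deliveryDate"]},
}

order_without_method = {"items": 3}
print(missing_after_condition(order_without_method, loose))   # ['deliveryDate']
print(missing_after_condition(order_without_method, strict))  # []
print(missing_after_condition({"shippingMethod": "express"}, strict))  # ['deliveryDate']
```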
Schema Validation in JavaScript
When building Node.js APIs for AI translation systems, I use Ajv for schema validation. It compiles schemas into validation functions, which makes it one of the fastest JavaScript validators, and it supports the recent JSON Schema drafts.
Setting up Ajv
npm install ajv ajv-formats
Basic validation
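const Ajv = require('ajv'); // draft-07 by default; use require('ajv/dist/2020') for schemas that declare draft 2020-12
const addFormats = require('ajv-formats');
const ajv = new Ajv({ allErrors: true }); // allErrors: report every violation, not only the first
addFormats(ajv); // Add format validators (email, uri, date, etc.)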
const schema = {
type: 'object',
properties: {
username: { type: 'string', minLength: 3 },
email: { type: 'string', format: 'email' },
age: { type: 'integer', minimum: 0, maximum: 120 }
},
required: ['username', 'email'],
additionalProperties: false
};
const validate = ajv.compile(schema);
// Valid data
const validData = {
username: 'cedric',
email: 'cedric@utilitiz.com',
age: 35
};
if (validate(validData)) {
console.log('Valid data');
} else {
console.log('Errors:', validate.errors);
}
Express middleware
I use this pattern in all production APIs:
const express = require('express');
const Ajv = require('ajv');
const addFormats = require('ajv-formats');
const app = express();
app.use(express.json());
const ajv = new Ajv({ allErrors: true });
addFormats(ajv);
// Validation middleware factory
function validateSchema(schema) {
const validate = ajv.compile(schema);
return (req, res, next) => {
if (!validate(req.body)) {
return res.status(400).json({
error: 'Validation failed',
details: validate.errors.map(err => ({
field: err.instancePath || err.params.missingProperty,
message: err.message
}))
});
}
next();
};
}
// Product creation schema
const createProductSchema = {
type: 'object',
properties: {
name: { type: 'string', minLength: 1, maxLength: 200 },
price: { type: 'number', minimum: 0 },
category: {
type: 'string',
enum: ['electronics', 'clothing', 'books']
}
},
required: ['name', 'price', 'category'],
additionalProperties: false
};
// Apply validation middleware
app.post('/api/products', validateSchema(createProductSchema), (req, res) => {
// req.body is guaranteed to be valid
const product = createProduct(req.body);
res.status(201).json(product);
});
app.listen(3000);
Custom error messages
Make errors more user-friendly:
function formatValidationErrors(errors) {
return errors.map(err => {
const field = err.instancePath.slice(1).replace(/\//g, '.') || err.params.missingProperty; // '/specifications/weight' becomes 'specifications.weight'
const messages = {
'type': `${field} must be ${err.params.type}`,
'minimum': `${field} must be at least ${err.params.limit}`,
'maximum': `${field} must be at most ${err.params.limit}`,
'minLength': `${field} must be at least ${err.params.limit} characters`,
'maxLength': `${field} must be at most ${err.params.limit} characters`,
'pattern': `${field} format is invalid`,
'format': `${field} must be a valid ${err.params.format}`,
'enum': `${field} must be one of: ${err.params.allowedValues.join(', ')}`,
'required': `${field} is required`,
'additionalProperties': `${err.params.additionalProperty} is not allowed`
};
return messages[err.keyword] || err.message;
});
}
Schema Validation in Python
For Python-based translation scripts and data processing pipelines, I use the jsonschema library.
Installation
pip install jsonschema
Basic validation
from jsonschema import validate, ValidationError
schema = {
"type": "object",
"properties": {
"product_id": {"type": "string"},
"name": {"type": "string", "minLength": 1},
"price": {"type": "number", "minimum": 0},
"in_stock": {"type": "boolean"}
},
"required": ["product_id", "name", "price"],
"additionalProperties": False
}
# Valid data
data = {
"product_id": "PROD123",
"name": "Wireless Mouse",
"price": 29.99,
"in_stock": True
}
try:
    validate(instance=data, schema=schema)
    print("Valid data")
except ValidationError as e:
    print(f"Validation error: {e.message}")
Flask API validation
Validation decorator for Flask endpoints:
from flask import Flask, request, jsonify
from jsonschema import validate, ValidationError
from functools import wraps
app = Flask(__name__)
def validate_json(schema):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            try:
                validate(instance=request.json, schema=schema)
            except ValidationError as e:
                return jsonify({
                    'error': 'Validation failed',
                    'message': e.message,
                    'field': '.'.join(str(p) for p in e.path)
                }), 400
            return f(*args, **kwargs)
        return wrapper
    return decorator
# Product schema
product_schema = {
"type": "object",
"properties": {
"name": {"type": "string", "minLength": 1, "maxLength": 200},
"price": {"type": "number", "minimum": 0},
"category": {
"type": "string",
"enum": ["electronics", "clothing", "books"]
}
},
"required": ["name", "price", "category"],
"additionalProperties": False
}
@app.route('/api/products', methods=['POST'])
@validate_json(product_schema)
def create_product():
    # request.json is guaranteed to be valid
    product = save_product(request.json)  # save_product is your own persistence function
    return jsonify(product), 201
if __name__ == '__main__':
    app.run(debug=True)  # debug mode is for local development only
Batch validation
When processing translated product catalogs:
import json
from jsonschema import validate, ValidationError
def validate_products(products, schema):
    errors = []
    for idx, product in enumerate(products):
        try:
            validate(instance=product, schema=schema)
        except ValidationError as e:
            # A malformed entry may not be a dict, so guard the .get() call
            product_id = product.get('id', 'unknown') if isinstance(product, dict) else 'unknown'
            errors.append({
                'index': idx,
                'product_id': product_id,
                'error': e.message,
                'field': '.'.join(str(p) for p in e.path)
            })
    return errors
# Load translated products
with open('products_translated.json') as f:
    products = json.load(f)
# product_schema is the schema defined in the Flask example
errors = validate_products(products, product_schema)
if errors:
    print(f"Found {len(errors)} validation errors:")
    for error in errors:
        print(f"Product {error['product_id']}: {error['error']}")
else:
    print("All products valid")
API Contract Enforcement with Schemas
When building microservices for enterprise AI systems, JSON Schema serves as the contract between services. Both sides validate against the same schema.
Request and response schemas
I define schemas for both directions:
// Translation request schema
{
"type": "object",
"properties": {
"texts": {
"type": "array",
"items": { "type": "string" },
"minItems": 1,
"maxItems": 100
},
"targetLanguage": {
"type": "string",
"enum": ["en", "fr", "de", "es"]
},
"sourceLanguage": {
"type": "string",
"enum": ["en", "fr", "de", "es"]
}
},
"required": ["texts", "targetLanguage"]
}
// Translation response schema
{
"type": "object",
"properties": {
"translations": {
"type": "array",
"items": { "type": "string" }
},
"detectedLanguage": { "type": "string" },
"confidence": { "type": "number", "minimum": 0, "maximum": 1 }
},
"required": ["translations"]
}
Schema registry pattern
For large systems, centralize schemas:
// schemas/index.js
const schemas = {
createProduct: require('./create-product.json'),
updateProduct: require('./update-product.json'),
createUser: require('./create-user.json'),
translateRequest: require('./translate-request.json')
};
// Compile all schemas
const Ajv = require('ajv');
const ajv = new Ajv();
const validators = {};
Object.keys(schemas).forEach(name => {
validators[name] = ajv.compile(schemas[name]);
});
module.exports = validators;
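The same registry idea carries over to Python pipelines. The sketch below loads every .json file in a directory into a dict and fails loudly when code asks for a schema that was never registered. The function names and the directory layout are assumptions made for this example; the demo writes a throwaway file to a temporary directory so it can run on its own.

```python
import json
import tempfile
from pathlib import Path


def load_schema_registry(schema_dir):
    """Load every *.json file in schema_dir into a dict keyed by file stem."""
    registry = {}
    for path in sorted(Path(schema_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            registry[path.stem] = json.load(f)
    return registry


def get_schema(registry, name):
    """Look up a schema by name, with a helpful error for unknown names."""
    try:
        return registry[name]
    except KeyError:
        known = ", ".join(sorted(registry)) or "(none)"
        raise KeyError(f"unknown schema {name!r}; known schemas: {known}") from None


# Demo with a temporary directory standing in for schemas/
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "create-product.json").write_text('{"type": "object"}', encoding="utf-8")
    registry = load_schema_registry(tmp)
    print(sorted(registry))                        # ['create-product']
    print(get_schema(registry, "create-product"))  # {'type': 'object'}
```

Raising on unknown names catches a common mistake early: a test or handler that references a schema nobody added to the registry.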
Contract testing
Test that APIs comply with schemas:
const request = require('supertest');
const validators = require('./schemas');
const app = require('./app');
describe('Product API Contract', () => {
test('POST /api/products validates schema', async () => {
const validProduct = {
name: 'Test Product',
price: 29.99,
category: 'electronics'
};
const response = await request(app)
.post('/api/products')
.send(validProduct)
.expect(201);
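// Validate response matches schema
// (requires a productResponse entry in schemas/index.js; the registry shown earlier does not define one)
expect(validators.productResponse(response.body)).toBe(true);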
});
test('POST /api/products rejects invalid data', async () => {
const invalidProduct = {
name: 'Test',
price: -10, // Invalid: negative price
category: 'invalid-category'
};
await request(app)
.post('/api/products')
.send(invalidProduct)
.expect(400);
});
});
Tools and Libraries Comparison
After building translation APIs with various validators, here's what I've learned about the ecosystem.
JavaScript validators
Ajv (Another JSON Schema Validator)
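- Fastest validator I have used, since it compiles each schema into a validation function (how much faster depends on the schema and the alternative)
- Supports draft-07 by default and Draft 2020-12 through the Ajv2020 class (require('ajv/dist/2020'))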
- TypeScript types available
- My choice for production Node.js APIs
npm install ajv ajv-formats
joi
- More expressive API than JSON Schema
- Better for JavaScript-first projects
- Excellent error messages
- Slower than Ajv in my benchmark (see Performance considerations)
const Joi = require('joi');
const schema = Joi.object({
username: Joi.string().min(3).max(20).required(),
email: Joi.string().email().required(),
age: Joi.number().integer().min(0).max(120)
});
zod
- TypeScript-first with automatic type inference
- Growing popularity in modern projects
- Great developer experience
import { z } from 'zod';
const UserSchema = z.object({
username: z.string().min(3).max(20),
email: z.string().email(),
age: z.number().int().min(0).max(120)
});
Python validators
jsonschema
- Standard JSON Schema library for Python
- Supports the JSON Schema drafts up to 2020-12; format is only checked when you pass a format_checker
- My default choice for Python projects
pydantic
- Data validation using Python type hints
- Excellent FastAPI integration
- Automatic JSON Schema generation
from pydantic import BaseModel, EmailStr, Field  # EmailStr needs the email-validator package (pip install "pydantic[email]")
class User(BaseModel):
    username: str = Field(min_length=3, max_length=20)
    email: EmailStr
    age: int = Field(ge=0, le=120)
# Automatic validation: raises pydantic.ValidationError on bad input
user = User(username="cedric", email="cedric@utilitiz.com", age=35)
Performance considerations
Benchmark from validating 10,000 product objects (absolute numbers depend on the schema, hardware, and library versions, so read them as relative):
- Ajv: 25ms (fastest)
- pydantic: 145ms
- joi: 180ms
- jsonschema (Python): 320ms
For high-throughput APIs processing thousands of requests, Ajv's performance advantage matters. For most applications, any validator is fast enough.
Schema Versioning Strategies
When building APIs that evolve over time, schema versioning prevents breaking changes. Here's how I handle it in production.
Semantic versioning for schemas
schemas/
product/
v1.0.0.json # Initial schema
v1.1.0.json # Added optional fields (backward compatible)
v2.0.0.json # Breaking changes
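With files named this way, code that picks the newest schema for a major version has to compare versions numerically; a plain string sort would rank v1.9.0 above v1.10.0. A minimal sketch, assuming the vMAJOR.MINOR.PATCH.json naming shown above (the helper names and the v1.9.0 and v1.10.0 files are hypothetical):

```python
import re

VERSION_FILE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)\.json$")


def parse_version(filename):
    """Turn 'v1.10.0.json' into (1, 10, 0); return None for other names."""
    match = VERSION_FILE.match(filename)
    return tuple(int(part) for part in match.groups()) if match else None


def latest_for_major(filenames, major):
    """Pick the newest schema file within one major version."""
    best = None
    for name in filenames:
        version = parse_version(name)
        if version is None or version[0] != major:
            continue
        # Tuples compare element by element, so (1, 10, 0) > (1, 9, 0)
        if best is None or version > best[0]:
            best = (version, name)
    return best[1] if best else None


files = ["v1.0.0.json", "v1.9.0.json", "v1.10.0.json", "v2.0.0.json"]
print(latest_for_major(files, 1))  # v1.10.0.json
print(latest_for_major(files, 2))  # v2.0.0.json
```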
Backward-compatible changes
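Safe changes that don't break existing clients sending requests:
- Adding optional fields
- Making required fields optional
- Making validation less strict (removing patterns, increasing max values)
- Adding enum values
For response schemas the direction reverses: a new enum value or a relaxed constraint can break clients that only expect the old values.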
Breaking changes
Changes that require major version bump:
- Removing fields
- Making optional fields required
- Changing field types
- Making validation stricter
- Removing enum values
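Some of these rules can be checked mechanically before a release. The sketch below compares two request schemas and reports removed fields, newly required fields, changed types, and tightened enums. It is a rough illustration limited to top-level properties (no nesting, no $ref, no numeric or string limits), and find_breaking_changes and the three sample schemas are made up for the example.

```python
def find_breaking_changes(old, new):
    """List changes between two flat object schemas that break clients."""
    problems = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})

    for name in old_props:
        if name not in new_props:
            problems.append(f"removed field: {name}")

    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    for name in sorted(newly_required):
        problems.append(f"newly required field: {name}")

    for name, old_rule in old_props.items():
        new_rule = new_props.get(name)
        if new_rule is None:
            continue  # already reported as removed
        if old_rule.get("type") != new_rule.get("type"):
            problems.append(f"changed type: {name}")
        old_enum, new_enum = old_rule.get("enum"), new_rule.get("enum")
        if new_enum is not None:
            if old_enum is None:
                problems.append(f"added enum restriction: {name}")
            else:
                for value in old_enum:
                    if value not in new_enum:
                        problems.append(f"removed enum value: {name}={value}")
    return problems


v1 = {
    "properties": {
        "id": {"type": "string"},
        "price": {"type": "number"},
        "category": {"type": "string", "enum": ["electronics", "clothing", "books"]},
    },
    "required": ["id"],
}
# Compatible: one new optional field and one extra enum value
v1_1 = {
    "properties": {
        "id": {"type": "string"},
        "price": {"type": "number"},
        "category": {"type": "string", "enum": ["electronics", "clothing", "books", "home"]},
        "tags": {"type": "array"},
    },
    "required": ["id"],
}
# Breaking: price changes type and becomes required, one enum value disappears
v2 = {
    "properties": {
        "id": {"type": "string"},
        "price": {"type": "string"},
        "category": {"type": "string", "enum": ["electronics", "books"]},
    },
    "required": ["id", "price"],
}

print(find_breaking_changes(v1, v1_1))  # []
print(find_breaking_changes(v1, v2))
# ['newly required field: price', 'changed type: price', 'removed enum value: category=clothing']
```

A non-empty result would be the signal to bump the major version rather than the minor one.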
Multi-version support
Support multiple schema versions simultaneously:
const schemas = {
'v1': require('./schemas/product/v1.0.0.json'),
'v2': require('./schemas/product/v2.0.0.json')
};
const validators = {};
Object.keys(schemas).forEach(version => {
validators[version] = ajv.compile(schemas[version]);
});
app.post('/api/v1/products', validateVersion('v1'), createProduct);
app.post('/api/v2/products', validateVersion('v2'), createProduct);
function validateVersion(version) {
return (req, res, next) => {
if (!validators[version](req.body)) {
return res.status(400).json({
error: 'Validation failed',
version,
details: validators[version].errors
});
}
next();
};
}
Migration paths
Help clients migrate between versions:
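function migrateV1toV2(v1Data) {
  // Pull out the old key so it is not copied into the v2 payload
  const { category, ...rest } = v1Data;
  return {
    ...rest,
    // v2 renamed 'category' to 'categoryId'
    categoryId: category,
    // v2 added required field
    createdAt: new Date().toISOString()
  };
}
// Accept v1 data but store as v2
app.post('/api/v1/products', (req, res) => {
  const v2Data = migrateV1toV2(req.body);
  if (!validators.v2(v2Data)) {
    // Always answer; otherwise the request hangs until it times out
    return res.status(400).json({
      error: 'Validation failed',
      details: validators.v2.errors
    });
  }
  const product = createProduct(v2Data);
  res.status(201).json(product);
});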
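Once there are more than two versions, one-step migrations can be chained. Here is a minimal Python sketch of that idea, assuming the same rename of category to categoryId and the added createdAt field. The MIGRATIONS table and the upgrade function are hypothetical names, and the now parameter exists only so the timestamp can be fixed in tests.

```python
from datetime import datetime, timezone


def migrate_v1_to_v2(v1_data, now=None):
    """Rename 'category' to 'categoryId' and add 'createdAt'.

    Assumes the payload already passed v1 validation, so 'category' exists.
    The input dict is left untouched.
    """
    v2_data = {key: value for key, value in v1_data.items() if key != "category"}
    v2_data["categoryId"] = v1_data["category"]
    # isoformat() yields e.g. 2024-01-01T00:00:00+00:00 (JavaScript prints a Z suffix)
    v2_data["createdAt"] = (now or datetime.now(timezone.utc)).isoformat()
    return v2_data


# Hypothetical table: each entry upgrades a payload by one major version
MIGRATIONS = {1: migrate_v1_to_v2}


def upgrade(data, from_version, to_version):
    """Apply the one-step migrations in order until to_version is reached."""
    for version in range(from_version, to_version):
        data = MIGRATIONS[version](data)
    return data


v1_product = {"name": "Widget", "price": 29.99, "category": "electronics"}
v2_product = upgrade(v1_product, 1, 2)
print(sorted(v2_product))  # ['categoryId', 'createdAt', 'name', 'price']
```

Validating the result against the target schema after upgrade keeps the migration honest.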
Schema validation transformed how I build APIs. When building AI translation systems across Copenhagen-based teams, schemas served as the single source of truth for data structure. Debugging time dropped dramatically because invalid data never made it past the API boundary.