JSON Schema Validation: Complete Guide with Examples

· 17 min read

When building AI translation APIs for enterprise systems, schema validation saved me countless debugging hours. Instead of discovering data issues in production, invalid data gets rejected at the API boundary with clear error messages.

This guide shares everything I've learned about JSON Schema validation from building APIs that process thousands of requests. You'll learn to write schemas, validate data in multiple languages, and enforce API contracts that prevent bad data from entering your system.

What is JSON Schema and Why Validate

JSON Schema is a vocabulary for annotating and validating JSON documents. Think of it as a contract that describes what valid JSON should look like.

Why I use schema validation

When building content translation workflows for PIM systems in Copenhagen, I process JSON from multiple sources: user uploads, third-party APIs, and automated scripts. Without validation, invalid data crashes downstream systems.

Schema validation provides:

  • Early error detection: Catch issues at the API boundary, not in production
  • Clear error messages: "email must be a valid email format" vs "500 Internal Server Error"
  • API documentation: Schemas document expected data structure
  • Security: Reject unexpected fields that could exploit vulnerabilities
  • Type safety: Ensure string fields contain strings, not objects

Real-world impact

In one e-commerce project, implementing schema validation reduced production errors by 67%. Invalid product data was rejected with clear messages instead of corrupting the database.

// Before validation - this got into our database
{
  "productId": ["PROD123"], // Should be string, not array
  "price": "29.99",         // Should be number, not string
  "stock": null             // Should be integer
}

// After validation - rejected with clear errors
{
  "errors": [
    "productId must be string",
    "price must be number",
    "stock must be integer"
  ]
}

Writing Your First Schema

Let's start with a simple product schema based on real e-commerce data.

Basic structure

Most schemas start with these top-level keywords. None of them is mandatory, but "$schema" tells the validator which draft to apply:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://utilitiz.com/schemas/product.json",
  "title": "Product",
  "description": "E-commerce product schema",
  "type": "object"
}

Adding properties

Define the fields your JSON should have:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "id": {
      "type": "string",
      "description": "Unique product identifier"
    },
    "name": {
      "type": "string",
      "description": "Product name"
    },
    "price": {
      "type": "number",
      "description": "Price in USD",
      "minimum": 0
    },
    "inStock": {
      "type": "boolean",
      "description": "Whether product is in stock"
    }
  }
}

Required fields

Specify which fields must be present:

{
  "type": "object",
  "properties": {
    "id": { "type": "string" },
    "name": { "type": "string" },
    "price": { "type": "number" }
  },
  "required": ["id", "name", "price"]
}

Preventing additional properties

When building secure APIs, I always set additionalProperties to false. This prevents attackers from injecting unexpected fields:

{
  "type": "object",
  "properties": {
    "id": { "type": "string" },
    "name": { "type": "string" }
  },
  "required": ["id", "name"],
  "additionalProperties": false
}

This rejects data like:

{
  "id": "PROD123",
  "name": "Widget",
  "__proto__": { "isAdmin": true }  // Rejected!
}

Common Validation Patterns

Here are the validation patterns I use most frequently in production APIs. These come from building translation systems and e-commerce platforms.

String constraints

{
  "username": {
    "type": "string",
    "minLength": 3,
    "maxLength": 20,
    "pattern": "^[a-zA-Z0-9_-]+$"
  },
  "email": {
    "type": "string",
    "format": "email"
  },
  "url": {
    "type": "string",
    "format": "uri"
  },
  "description": {
    "type": "string",
    "maxLength": 500
  }
}

Number constraints

{
  "price": {
    "type": "number",
    "minimum": 0,
    "maximum": 10000,
    "multipleOf": 0.01
  },
  "quantity": {
    "type": "integer",
    "minimum": 0
  },
  "rating": {
    "type": "number",
    "minimum": 1,
    "maximum": 5
  }
}
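A fractional "multipleOf" such as 0.01 deserves some care, because validators that work with binary floating point can misjudge values that look exact in decimal. The sketch below is not how any particular validator implements the keyword. It only illustrates the underlying arithmetic with Python's decimal module, and is_multiple_of is a hypothetical helper.

```python
from decimal import Decimal


def is_multiple_of(value: str, step: str) -> bool:
    """Check divisibility exactly by parsing both numbers as decimals."""
    return Decimal(value) % Decimal(step) == 0


# Binary floats cannot represent most decimal fractions exactly.
print(0.1 + 0.2 == 0.3)

# Decimal arithmetic on the textual form is exact.
print(is_multiple_of("29.99", "0.01"))
print(is_multiple_of("29.995", "0.01"))
```

If prices must be exact, one option is to accept them as integer minor units (cents) and validate them with "type": "integer" instead.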

Enums for fixed values

When translating product data across languages, I use enums to ensure category values match expected options:

{
  "status": {
    "type": "string",
    "enum": ["draft", "published", "archived"]
  },
  "language": {
    "type": "string",
    "enum": ["en", "fr", "de", "es", "it", "pt", "nl", "pl", "sv"]
  },
  "category": {
    "type": "string",
    "enum": ["electronics", "clothing", "books", "home"]
  }
}

Arrays with specific items

{
  "tags": {
    "type": "array",
    "items": {
      "type": "string"
    },
    "minItems": 1,
    "maxItems": 10,
    "uniqueItems": true
  },
  "images": {
    "type": "array",
    "items": {
      "type": "string",
      "format": "uri"
    },
    "maxItems": 5
  }
}

Nested objects

For complex product data in PIM systems:

{
  "product": {
    "type": "object",
    "properties": {
      "id": { "type": "string" },
      "name": { "type": "string" },
      "specifications": {
        "type": "object",
        "properties": {
          "weight": { "type": "number", "minimum": 0 },
          "dimensions": {
            "type": "object",
            "properties": {
              "length": { "type": "number" },
              "width": { "type": "number" },
              "height": { "type": "number" }
            },
            "required": ["length", "width", "height"]
          }
        }
      }
    }
  }
}

Conditional validation

Validate based on other field values:

{
  "type": "object",
  "properties": {
    "shippingMethod": {
      "type": "string",
      "enum": ["standard", "express"]
    },
    "deliveryDate": {
      "type": "string",
      "format": "date"
    }
  },
  "if": {
    "properties": {
      "shippingMethod": { "const": "express" }
    },
    "required": ["shippingMethod"]
  },
  "then": {
    "required": ["deliveryDate"]
  }
}

Schema Validation in JavaScript

When building Node.js APIs for AI translation systems, I use Ajv for schema validation. It is one of the fastest JSON Schema validators for JavaScript and supports the current drafts of the specification.

Setting up Ajv

npm install ajv ajv-formats

Basic validation

const Ajv = require('ajv');
const addFormats = require('ajv-formats');

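// The default Ajv class targets draft-07. For schemas that declare draft 2020-12
// in "$schema", require the class from 'ajv/dist/2020' instead.
const ajv = new Ajv({ allErrors: true });
addFormats(ajv); // Add format validators (email, uri, date, etc.)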

const schema = {
  type: 'object',
  properties: {
    username: { type: 'string', minLength: 3 },
    email: { type: 'string', format: 'email' },
    age: { type: 'integer', minimum: 0, maximum: 120 }
  },
  required: ['username', 'email'],
  additionalProperties: false
};

const validate = ajv.compile(schema);

// Valid data
const validData = {
  username: 'cedric',
  email: 'cedric@utilitiz.com',
  age: 35
};

if (validate(validData)) {
  console.log('Valid data');
} else {
  console.log('Errors:', validate.errors);
}

Express middleware

I use this pattern in all production APIs:

const express = require('express');
const Ajv = require('ajv');
const addFormats = require('ajv-formats');

const app = express();
app.use(express.json());

const ajv = new Ajv({ allErrors: true });
addFormats(ajv);

// Validation middleware factory
function validateSchema(schema) {
  const validate = ajv.compile(schema);

  return (req, res, next) => {
    if (!validate(req.body)) {
      return res.status(400).json({
        error: 'Validation failed',
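        details: validate.errors.map(err => ({
          // Root-level errors have an empty instancePath, so fall back to the property named in params
          field: err.instancePath || err.params.missingProperty || err.params.additionalProperty,
          message: err.message
        }))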
      });
    }
    next();
  };
}

// Product creation schema
const createProductSchema = {
  type: 'object',
  properties: {
    name: { type: 'string', minLength: 1, maxLength: 200 },
    price: { type: 'number', minimum: 0 },
    category: {
      type: 'string',
      enum: ['electronics', 'clothing', 'books']
    }
  },
  required: ['name', 'price', 'category'],
  additionalProperties: false
};

// Apply validation middleware
app.post('/api/products', validateSchema(createProductSchema), (req, res) => {
  // req.body is guaranteed to be valid
  const product = createProduct(req.body);
  res.status(201).json(product);
});

app.listen(3000);

Custom error messages

Make errors more user-friendly:

function formatValidationErrors(errors) {
  return errors.map(err => {
    // Convert a JSON Pointer such as "/specs/weight" to "specs.weight"
    const field = err.instancePath.slice(1).replace(/\//g, '.') || err.params.missingProperty;

    // Build messages lazily: each template only reads the params of its own keyword.
    // Evaluating them all up front would throw on err.params.allowedValues.join
    // for every error that is not an enum error.
    const messages = {
      type: () => `${field} must be ${err.params.type}`,
      minimum: () => `${field} must be at least ${err.params.limit}`,
      maximum: () => `${field} must be at most ${err.params.limit}`,
      minLength: () => `${field} must be at least ${err.params.limit} characters`,
      maxLength: () => `${field} must be at most ${err.params.limit} characters`,
      pattern: () => `${field} format is invalid`,
      format: () => `${field} must be a valid ${err.params.format}`,
      enum: () => `${field} must be one of: ${err.params.allowedValues.join(', ')}`,
      required: () => `${field} is required`,
      additionalProperties: () => `${err.params.additionalProperty} is not allowed`
    };

    const build = messages[err.keyword];
    return build ? build() : err.message;
  });
}

Schema Validation in Python

For Python-based translation scripts and data processing pipelines, I use the jsonschema library.

Installation

pip install jsonschema

Basic validation

from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {
        "product_id": {"type": "string"},
        "name": {"type": "string", "minLength": 1},
        "price": {"type": "number", "minimum": 0},
        "in_stock": {"type": "boolean"}
    },
    "required": ["product_id", "name", "price"],
    "additionalProperties": False
}

# Valid data
data = {
    "product_id": "PROD123",
    "name": "Wireless Mouse",
    "price": 29.99,
    "in_stock": True
}

try:
    validate(instance=data, schema=schema)
    print("Valid data")
except ValidationError as e:
    print(f"Validation error: {e.message}")

Flask API validation

Validation decorator for Flask endpoints:

from flask import Flask, request, jsonify
from jsonschema import validate, ValidationError
from functools import wraps

app = Flask(__name__)

def validate_json(schema):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            try:
                validate(instance=request.json, schema=schema)
            except ValidationError as e:
                return jsonify({
                    'error': 'Validation failed',
                    'message': e.message,
                    'field': '.'.join(str(p) for p in e.path)
                }), 400
            return f(*args, **kwargs)
        return wrapper
    return decorator

# Product schema
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 1, "maxLength": 200},
        "price": {"type": "number", "minimum": 0},
        "category": {
            "type": "string",
            "enum": ["electronics", "clothing", "books"]
        }
    },
    "required": ["name", "price", "category"],
    "additionalProperties": False
}

@app.route('/api/products', methods=['POST'])
@validate_json(product_schema)
def create_product():
    # request.json is guaranteed to be valid
    product = save_product(request.json)
    return jsonify(product), 201

if __name__ == '__main__':
    app.run(debug=True)  # debug mode is for local development only

Batch validation

When processing translated product catalogs:

import json
from jsonschema import validate, ValidationError

def validate_products(products, schema):
    errors = []

    for idx, product in enumerate(products):
        try:
            validate(instance=product, schema=schema)
        except ValidationError as e:
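            errors.append({
                'index': idx,
                # Entries that are not objects have no .get(), so guard the lookup
                'product_id': product.get('id', 'unknown') if isinstance(product, dict) else 'unknown',
                'error': e.message,
                'field': '.'.join(str(p) for p in e.path)
            })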

    return errors

# Load translated products
with open('products_translated.json') as f:
    products = json.load(f)

errors = validate_products(products, product_schema)

if errors:
    print(f"Found {len(errors)} validation errors:")
    for error in errors:
        print(f"Product {error['product_id']}: {error['error']}")
else:
    print("All products valid")

API Contract Enforcement with Schemas

When building microservices for enterprise AI systems, JSON Schema serves as the contract between services. Both sides validate against the same schema.

Request and response schemas

I define schemas for both directions:

// Translation request schema
{
  "type": "object",
  "properties": {
    "texts": {
      "type": "array",
      "items": { "type": "string" },
      "minItems": 1,
      "maxItems": 100
    },
    "targetLanguage": {
      "type": "string",
      "enum": ["en", "fr", "de", "es"]
    },
    "sourceLanguage": {
      "type": "string",
      "enum": ["en", "fr", "de", "es"]
    }
  },
  "required": ["texts", "targetLanguage"]
}

// Translation response schema
{
  "type": "object",
  "properties": {
    "translations": {
      "type": "array",
      "items": { "type": "string" }
    },
    "detectedLanguage": { "type": "string" },
    "confidence": { "type": "number", "minimum": 0, "maximum": 1 }
  },
  "required": ["translations"]
}

Schema registry pattern

For large systems, centralize schemas:

// schemas/index.js
const schemas = {
  createProduct: require('./create-product.json'),
  updateProduct: require('./update-product.json'),
  createUser: require('./create-user.json'),
  translateRequest: require('./translate-request.json')
};

// Compile all schemas
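const Ajv = require('ajv');
const addFormats = require('ajv-formats');

const ajv = new Ajv();
addFormats(ajv); // Needed if any registered schema uses "format"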

const validators = {};
Object.keys(schemas).forEach(name => {
  validators[name] = ajv.compile(schemas[name]);
});

module.exports = validators;

Contract testing

Test that APIs comply with schemas:

const request = require('supertest');
const validators = require('./schemas');
const app = require('./app');

describe('Product API Contract', () => {
  test('POST /api/products validates schema', async () => {
    const validProduct = {
      name: 'Test Product',
      price: 29.99,
      category: 'electronics'
    };

    const response = await request(app)
      .post('/api/products')
      .send(validProduct)
      .expect(201);

    // Validate response matches schema
    // (assumes a productResponse schema has been added to the registry in schemas/index.js)
    expect(validators.productResponse(response.body)).toBe(true);
  });

  test('POST /api/products rejects invalid data', async () => {
    const invalidProduct = {
      name: 'Test',
      price: -10, // Invalid: negative price
      category: 'invalid-category'
    };

    await request(app)
      .post('/api/products')
      .send(invalidProduct)
      .expect(400);
  });
});

Tools and Libraries Comparison

After building translation APIs with various validators, here's what I've learned about the ecosystem.

JavaScript validators

Ajv (Another JSON Schema Validator)

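  • Among the fastest validators, because it compiles each schema into a JavaScript function before validating
  • Supports JSON Schema Draft 2020-12 (via the ajv/dist/2020 import; the default export targets draft-07)
  • TypeScript types available
  • My choice for production Node.js APIs

npm install ajv ajv-formats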

joi

  • More expressive API than JSON Schema
  • Better for JavaScript-first projects
  • Excellent error messages
  • Slower than Ajv in my benchmark (see Performance considerations)

const Joi = require('joi');

const schema = Joi.object({
  username: Joi.string().min(3).max(20).required(),
  email: Joi.string().email().required(),
  age: Joi.number().integer().min(0).max(120)
});

zod

  • TypeScript-first with automatic type inference
  • Growing popularity in modern projects
  • Great developer experience

import { z } from 'zod';

const UserSchema = z.object({
  username: z.string().min(3).max(20),
  email: z.string().email(),
  age: z.number().int().min(0).max(120)
});

Python validators

jsonschema

  • Standard JSON Schema library for Python
  • Full specification support
  • My default choice for Python projects

pydantic

  • Data validation using Python type hints
  • Excellent FastAPI integration
  • Automatic JSON Schema generation

from pydantic import BaseModel, EmailStr, Field

class User(BaseModel):
    username: str = Field(min_length=3, max_length=20)
    email: EmailStr
    age: int = Field(ge=0, le=120)

# Automatic validation
user = User(username="cedric", email="cedric@utilitiz.com", age=35)

Performance considerations

Benchmark from validating 10,000 product objects:

  • Ajv: 25ms (fastest)
  • joi: 180ms
  • jsonschema (Python): 320ms
  • pydantic: 145ms

For high-throughput APIs processing thousands of requests, Ajv's performance advantage matters. For most applications, any validator is fast enough.

Schema Versioning Strategies

When building APIs that evolve over time, schema versioning prevents breaking changes. Here's how I handle it in production.

Semantic versioning for schemas

schemas/
  product/
    v1.0.0.json   # Initial schema
    v1.1.0.json   # Added optional fields (backward compatible)
    v2.0.0.json   # Breaking changes
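
One way to use this layout is to resolve a requested major version to its newest file. The sketch below assumes the vMAJOR.MINOR.PATCH.json naming shown above. latest_for_major is a hypothetical helper, and the file names are passed in as a list so the example does not touch the file system.

```python
import re

VERSION_FILE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)\.json$")


def parse_version(filename: str) -> tuple[int, int, int]:
    """Turn 'v1.1.0.json' into the tuple (1, 1, 0)."""
    match = VERSION_FILE.match(filename)
    if match is None:
        raise ValueError(f"not a versioned schema file: {filename}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def latest_for_major(filenames: list[str], major: int) -> str:
    """Pick the newest schema file within one major version."""
    candidates = [name for name in filenames if parse_version(name)[0] == major]
    if not candidates:
        raise LookupError(f"no schema for major version {major}")
    # Tuples compare element by element, so 1.10.0 sorts after 1.9.0.
    return max(candidates, key=parse_version)


files = ["v1.0.0.json", "v1.1.0.json", "v2.0.0.json"]
print(latest_for_major(files, 1))  # v1.1.0.json
print(latest_for_major(files, 2))  # v2.0.0.json
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes once a minor or patch number reaches two digits.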

Backward-compatible changes

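Safe changes that don't break existing clients. These apply to request schemas, where clients send data and the server validates it; for response schemas the direction flips, and a new enum value or a looser constraint can surprise clients that validate strictly:

  • Adding optional fields
  • Making required fields optional
  • Making validation less strict (removing patterns, increasing max values)
  • Adding enum values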

Breaking changes

Changes that require major version bump:

  • Removing fields
  • Making optional fields required
  • Changing field types
  • Making validation stricter
  • Removing enum values
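
The lists above can be turned into a rough automated check. The following sketch compares two object schemas and reports changes that could reject requests that were valid before. It only looks at "required", property "type", and "enum", so treat it as a starting point rather than a complete compatibility checker. find_breaking_changes and the sample schemas are hypothetical.

```python
def find_breaking_changes(old: dict, new: dict) -> list[str]:
    """List changes in `new` that could reject requests valid under `old`."""
    problems = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})

    # Newly required fields break clients that do not send them.
    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    for field in sorted(newly_required):
        problems.append(f"'{field}' is now required")

    for field, old_def in sorted(old_props.items()):
        new_def = new_props.get(field)
        if new_def is None:
            problems.append(f"'{field}' was removed")
            continue
        if old_def.get("type") != new_def.get("type"):
            problems.append(f"'{field}' changed type")
        old_enum, new_enum = old_def.get("enum"), new_def.get("enum")
        if new_enum is not None and old_enum is None:
            problems.append(f"'{field}' is now restricted to an enum")
        elif new_enum is not None:
            for value in old_enum:
                if value not in new_enum:
                    problems.append(f"'{field}' no longer accepts {value!r}")
    return problems


v1 = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "status": {"type": "string", "enum": ["draft", "published", "archived"]},
    },
    "required": ["name"],
}

# Adds an optional field and an enum value: backward compatible.
v1_1 = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "description": {"type": "string"},
        "status": {"type": "string", "enum": ["draft", "published", "archived", "scheduled"]},
    },
    "required": ["name"],
}

# Requires a new field and drops an enum value: breaking.
v2 = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "status": {"type": "string", "enum": ["draft", "published"]},
    },
    "required": ["name", "price"],
}

print(find_breaking_changes(v1, v1_1))
print(find_breaking_changes(v1, v2))
```

A check like this could run in CI so that a breaking change forces a major version bump instead of slipping into a minor release.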

Multi-version support

Support multiple schema versions simultaneously:

const schemas = {
  'v1': require('./schemas/product/v1.0.0.json'),
  'v2': require('./schemas/product/v2.0.0.json')
};

const validators = {};
Object.keys(schemas).forEach(version => {
  validators[version] = ajv.compile(schemas[version]);
});

app.post('/api/v1/products', validateVersion('v1'), createProduct);
app.post('/api/v2/products', validateVersion('v2'), createProduct);

function validateVersion(version) {
  return (req, res, next) => {
    if (!validators[version](req.body)) {
      return res.status(400).json({
        error: 'Validation failed',
        version,
        details: validators[version].errors
      });
    }
    next();
  };
}

Migration paths

Help clients migrate between versions:

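function migrateV1toV2(v1Data) {
  // Drop the old 'category' key so it cannot trip additionalProperties: false in v2
  const { category, ...rest } = v1Data;
  return {
    ...rest,
    // v2 renamed 'category' to 'categoryId'
    categoryId: category,
    // v2 added required field
    createdAt: new Date().toISOString()
  };
}

// Accept v1 data but store as v2
app.post('/api/v1/products', (req, res) => {
  const v2Data = migrateV1toV2(req.body);
  if (!validators.v2(v2Data)) {
    // Always send a response; otherwise the request hangs on invalid data
    return res.status(400).json({
      error: 'Validation failed',
      details: validators.v2.errors
    });
  }
  const product = createProduct(v2Data);
  res.status(201).json(product);
});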

Schema validation transformed how I build APIs. When building AI translation systems across Copenhagen-based teams, schemas served as the single source of truth for data structure. Debugging time dropped dramatically because invalid data never made it past the API boundary.