Working with JSON in Python: Read, Write, and Transform

· 13 min read

Python's json module is part of the standard library, so you don't need to install anything. It handles most JSON tasks perfectly well. For data analysis, pandas has some nice shortcuts too.

Basic Operations

Four functions cover most use cases:

json.loads() and json.dumps()

For working with strings (the "s" stands for string).

import json

# String to Python object
json_string = '{"name": "Alice", "age": 28}'
data = json.loads(json_string)
print(data['name'])  # Alice

# Python object to string
user = {"name": "Bob", "age": 30}
json_string = json.dumps(user)
print(json_string)  # '{"name": "Bob", "age": 30}'

json.load() and json.dump()

For working with files.

import json

# Read from file
with open('data.json') as f:
    data = json.load(f)

# Write to file
with open('output.json', 'w') as f:
    json.dump(data, f)

Quick reference

  • json.loads(string) - parse string to Python
  • json.dumps(obj) - convert Python to string
  • json.load(file) - read file to Python
  • json.dump(obj, file) - write Python to file

Reading JSON Files

Basic file reading

import json

with open('users.json', 'r', encoding='utf-8') as f:
    users = json.load(f)

for user in users:
    print(user['name'])

From a URL

import json
import urllib.request

url = 'https://api.example.com/users'
with urllib.request.urlopen(url) as response:
    data = json.load(response)

# Or with the requests library (install with: pip install requests)
import requests

response = requests.get('https://api.example.com/users')
response.raise_for_status()  # Fail loudly on HTTP errors
data = response.json()  # Built-in JSON parsing

Handle large files with streaming

For really large files, you might want to use a streaming parser like ijson to avoid loading everything into memory:

# pip install ijson
import ijson

with open('huge_file.json', 'rb') as f:
    for item in ijson.items(f, 'item'):
        process(item)  # Process one item at a time

Writing JSON Files

Basic writing

import json

users = [
    {"name": "Alice", "age": 28},
    {"name": "Bob", "age": 30}
]

with open('users.json', 'w', encoding='utf-8') as f:
    json.dump(users, f)

Pretty print to file

with open('users.json', 'w', encoding='utf-8') as f:
    json.dump(users, f, indent=2)

Handle non-ASCII characters

By default, non-ASCII characters get escaped. To keep them readable:

data = {"city": "Tokyo", "greeting": "Hello"}

# Default: escapes unicode
json.dumps(data)
# '{"city": "Tokyo", "greeting": "Hello"}'

# Keep unicode readable
json.dumps(data, ensure_ascii=False)
# '{"city": "Tokyo", "greeting": "Hello"}'

Formatting Output

Indentation

data = {"user": {"name": "Alice", "settings": {"theme": "dark"}}}

# Single line, with spaces after separators (default)
json.dumps(data)
# '{"user": {"name": "Alice", "settings": {"theme": "dark"}}}'

# Pretty with 2 spaces
json.dumps(data, indent=2)
# {
#   "user": {
#     "name": "Alice",
#     "settings": {
#       "theme": "dark"
#     }
#   }
# }

# With tabs
json.dumps(data, indent='\t')

Sort keys

data = {"c": 3, "a": 1, "b": 2}

json.dumps(data, sort_keys=True)
# '{"a": 1, "b": 2, "c": 3}'

Compact separators

# Remove spaces for smaller output
json.dumps(data, separators=(',', ':'))
# '{"a":1,"b":2,"c":3}'

JSON with pandas

If you're doing data analysis, pandas makes JSON handling really easy.

Read JSON to DataFrame

import pandas as pd

# From file
df = pd.read_json('users.json')

# From string (pandas 2.1+ expects a buffer, not a raw string)
from io import StringIO

json_string = '[{"name": "Alice", "age": 28}, {"name": "Bob", "age": 30}]'
df = pd.read_json(StringIO(json_string))

# From URL
df = pd.read_json('https://api.example.com/users')

DataFrame to JSON

# To string
json_string = df.to_json(orient='records')

# To file
df.to_json('output.json', orient='records', indent=2)

# Different orientations
df.to_json(orient='records')  # [{"name": "Alice"}, ...]
df.to_json(orient='columns')  # {"name": {"0": "Alice"}, ...}
df.to_json(orient='index')    # {"0": {"name": "Alice"}, ...}

Normalize nested JSON

Flatten nested structures into a table:

nested_data = [
    {"name": "Alice", "address": {"city": "NYC", "zip": "10001"}},
    {"name": "Bob", "address": {"city": "LA", "zip": "90001"}}
]

df = pd.json_normalize(nested_data)
# Columns: name, address.city, address.zip
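
When records contain nested lists, record_path explodes the list into rows while meta carries parent fields along; a sketch with made-up order data:

orders = [
    {"name": "Alice", "orders": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]},
    {"name": "Bob", "orders": [{"sku": "C3", "qty": 5}]}
]

df = pd.json_normalize(orders, record_path='orders', meta='name')
# Columns: sku, qty, name (one row per order)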

Error Handling

Catching parse errors

import json

def safe_parse(json_string):
    try:
        return json.loads(json_string)
    except json.JSONDecodeError as e:
        print(f"Invalid JSON: {e}")
        return None

data = safe_parse('{"broken": }')  # Invalid JSON: ...
data = safe_parse('{"valid": true}')  # Returns dict

Handle file not found

def load_json_file(filepath, default=None):
    try:
        with open(filepath) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
    except json.JSONDecodeError:
        return default

config = load_json_file('config.json', {})

Common Patterns

Merge JSON files

import json
from pathlib import Path

def merge_json_files(directory):
    all_data = []
    for json_file in Path(directory).glob('*.json'):
        with open(json_file) as f:
            data = json.load(f)
            if isinstance(data, list):
                all_data.extend(data)
            else:
                all_data.append(data)
    return all_data
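
Usage is a one-liner; the exports directory name here is just an example:

combined = merge_json_files('exports')
print(f"Merged {len(combined)} records")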

Convert CSV to JSON

import csv
import json

with open('data.csv', newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    data = list(reader)  # Every value is read as a string

with open('data.json', 'w') as json_file:
    json.dump(data, json_file, indent=2)
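
DictReader hands back strings for everything, so cast numeric columns before dumping if the JSON should carry real numbers (the age column is an assumption about the CSV):

# Cast numeric columns first; 'age' is a hypothetical column name
for row in data:
    row['age'] = int(row['age'])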

Custom serialization

Handle types that aren't JSON serializable:

from datetime import datetime
import json

class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)

data = {"created": datetime.now()}
json.dumps(data, cls=DateTimeEncoder)
# '{"created": "2026-01-28T10:30:00"}'

Advanced Patterns

When building enterprise AI systems for translating product catalogs and processing PIM data, I've learned these advanced JSON handling patterns are essential.

Custom encoders and decoders

In my experience working with international e-commerce data, standard JSON serialization doesn't handle many Python types. Custom encoders solve this.

import json
from decimal import Decimal
from datetime import datetime, date

class ExtendedEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Decimal):
            return float(obj)
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        if isinstance(obj, set):
            return list(obj)
        return super().default(obj)

# Usage
data = {
    "price": Decimal("19.99"),
    "date": datetime.now(),
    "tags": {"new", "sale"}
}
json.dumps(data, cls=ExtendedEncoder)
# '{"price": 19.99, "date": "2026-01-28T10:30:00", "tags": ["new", "sale"]}'

Datetime handling across timezones

When translating content for global markets, timezone handling becomes critical. Here's the pattern I use to avoid timezone bugs.

from datetime import datetime, timezone
import json

def datetime_handler(obj):
    if isinstance(obj, datetime):
        # Always convert to UTC and include timezone
        return obj.astimezone(timezone.utc).isoformat()
    return obj

# Serialize with timezone info
data = {"published": datetime.now(timezone.utc)}
json.dumps(data, default=datetime_handler)
# '{"published": "2026-01-28T10:30:00+00:00"}'

# Parse with timezone awareness
def parse_datetime(dct):
    for k, v in dct.items():
        if isinstance(v, str) and 'T' in v:
            try:
                dct[k] = datetime.fromisoformat(v)
            except ValueError:
                pass
    return dct

json_string = '{"published": "2026-01-28T10:30:00+00:00"}'
json.loads(json_string, object_hook=parse_datetime)

Streaming large JSON files with ijson

When processing multi-gigabyte product feeds for translation, loading the entire file into memory isn't an option. I use ijson to stream-parse one item at a time.

# pip install ijson
import ijson

def process_large_json(filename):
    with open(filename, 'rb') as f:
        # Parse array items one at a time
        for item in ijson.items(f, 'products.item'):
            # Process each product without loading entire file
            process_product(item)

# For deeply nested data
def extract_nested(filename):
    with open(filename, 'rb') as f:
        # Navigate to specific path
        for value in ijson.items(f, 'data.users.item.email'):
            print(value)  # Only extracts email fields

Updating specific fields

JSON doesn't support in-place partial updates, so changing one field in a config file is still a read-modify-write cycle. A dotted-path helper keeps it tidy:

import json

def update_json_field(filename, path, new_value):
    """Update a specific field in a JSON file"""
    with open(filename, 'r') as f:
        data = json.load(f)

    # Navigate to the field using path
    keys = path.split('.')
    current = data
    for key in keys[:-1]:
        current = current[key]

    # Update the field
    current[keys[-1]] = new_value

    # Write back
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2)

# Usage
update_json_field('config.json', 'database.port', 5433)
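
If the process dies mid-write, the config is left truncated. A common hardening, sketched here rather than taken from the helper above, is to write to a temp file and swap it in atomically with os.replace:

import json
import os
import tempfile

def update_json_atomic(filename, path, new_value):
    """Read-modify-write, but the final replace is atomic."""
    with open(filename, 'r') as f:
        data = json.load(f)

    keys = path.split('.')
    current = data
    for key in keys[:-1]:
        current = current[key]
    current[keys[-1]] = new_value

    # Write to a temp file in the same directory, then swap it in
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(filename) or '.')
    with os.fdopen(fd, 'w') as f:
        json.dump(data, f, indent=2)
    os.replace(tmp_path, filename)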

Performance Comparison

In production systems processing thousands of JSON files daily, I've benchmarked different Python JSON libraries. The standard library is good, but alternatives can be 10x faster.

Standard json module

Built-in, reliable, but not the fastest. Good for most use cases.

import json
import time

data = {"users": [{"id": i, "name": f"User{i}"} for i in range(10000)]}

start = time.perf_counter()  # perf_counter is the right clock for timing
json_str = json.dumps(data)
parsed = json.loads(json_str)
print(f"json module: {time.perf_counter() - start:.3f}s")

ujson (UltraJSON)

2-3x faster than standard json. I use this in API endpoints where milliseconds matter.

# pip install ujson
import ujson

start = time.perf_counter()
json_str = ujson.dumps(data)
parsed = ujson.loads(json_str)
print(f"ujson: {time.perf_counter() - start:.3f}s")
# Typically 2-3x faster than json module

orjson (Fastest)

When building AI translation pipelines that process gigabytes of product data, orjson's 5-10x speed improvement is game-changing. It's written in Rust.

# pip install orjson
import orjson

start = time.perf_counter()
json_bytes = orjson.dumps(data)  # Returns bytes, not str
parsed = orjson.loads(json_bytes)
print(f"orjson: {time.perf_counter() - start:.3f}s")
# Typically 5-10x faster than json module

# orjson has nice defaults:
# - Handles datetime automatically
# - Handles UUID automatically
# - Handles dataclasses automatically
from datetime import datetime
data = {"created": datetime.now()}
orjson.dumps(data)  # Just works, no custom encoder needed
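
Formatting in orjson goes through option flags rather than keyword arguments; OPT_INDENT_2 and OPT_SORT_KEYS cover the common cases:

# Pretty-printed, key-sorted output via option flags
orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS)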

When to use which

  • json - Default choice, always available, good enough for most cases
  • ujson - When you need better performance but want a drop-in replacement
  • orjson - When performance is critical (APIs, data processing pipelines, large files)
  • ijson - When files are too large to fit in memory (streaming parser)

In my translation systems, I use orjson for API responses and ijson for processing multi-gigabyte product feeds. The performance gain is worth the extra dependency.
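
If orjson is only a nice-to-have in your environment, a guarded import keeps the code portable; this is a sketch of the pattern, with json_dumps/json_loads as made-up wrapper names:

import json

try:
    import orjson

    def json_dumps(obj):
        # orjson returns bytes; decode to match json.dumps
        return orjson.dumps(obj).decode('utf-8')

    def json_loads(s):
        return orjson.loads(s)
except ImportError:
    def json_dumps(obj):
        return json.dumps(obj)

    def json_loads(s):
        return json.loads(s)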