Working with JSON in Python: Read, Write, and Transform
Python's json module is part of the standard library, so you don't need to install anything. It handles most JSON tasks perfectly well. For data analysis, pandas has some nice shortcuts too.
Basic Operations
Four functions cover most use cases:
json.loads() and json.dumps()
For working with strings (the "s" stands for string).
import json
# String to Python object
json_string = '{"name": "Alice", "age": 28}'
data = json.loads(json_string)
print(data['name']) # Alice
# Python object to string
user = {"name": "Bob", "age": 30}
json_string = json.dumps(user)
print(json_string) # '{"name": "Bob", "age": 30}'
json.load() and json.dump()
For working with files.
import json
# Read from file
with open('data.json') as f:
    data = json.load(f)

# Write to file
with open('output.json', 'w') as f:
    json.dump(data, f)
Quick reference
- json.loads(string) - parse string to Python
- json.dumps(obj) - convert Python to string
- json.load(file) - read file to Python
- json.dump(obj, file) - write Python to file
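The four functions pair up naturally. A quick round-trip sketch, using a temporary file so it runs anywhere:

```python
import json
import os
import tempfile

record = {"name": "Alice", "age": 28}

# String round-trip: dumps then loads
text = json.dumps(record)
assert json.loads(text) == record

# File round-trip: dump then load
path = os.path.join(tempfile.mkdtemp(), 'record.json')
with open(path, 'w') as f:
    json.dump(record, f)
with open(path) as f:
    assert json.load(f) == record
```

If both round-trips succeed silently, you have the four functions straight.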
Reading JSON Files
Basic file reading
import json
with open('users.json', 'r', encoding='utf-8') as f:
    users = json.load(f)

for user in users:
    print(user['name'])
From a URL
import json
import urllib.request
url = 'https://api.example.com/users'
with urllib.request.urlopen(url) as response:
    data = json.load(response)
# Or with requests library (install with: pip install requests)
import requests
response = requests.get('https://api.example.com/users')
data = response.json() # Built-in JSON parsing
Handle large files with streaming
For really large files, you might want to use a streaming parser like ijson to avoid loading everything into memory:
# pip install ijson
import ijson
with open('huge_file.json', 'rb') as f:
    for item in ijson.items(f, 'item'):
        process(item)  # Process one item at a time
Writing JSON Files
Basic writing
import json
users = [
    {"name": "Alice", "age": 28},
    {"name": "Bob", "age": 30}
]

with open('users.json', 'w', encoding='utf-8') as f:
    json.dump(users, f)
Pretty print to file
with open('users.json', 'w', encoding='utf-8') as f:
    json.dump(users, f, indent=2)
Handle non-ASCII characters
By default, non-ASCII characters get escaped. To keep them readable:

data = {"city": "東京", "greeting": "こんにちは"}

# Default: escapes unicode
json.dumps(data)
# '{"city": "\u6771\u4eac", "greeting": "\u3053\u3093\u306b\u3061\u306f"}'

# Keep unicode readable
json.dumps(data, ensure_ascii=False)
# '{"city": "東京", "greeting": "こんにちは"}'
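The same flag matters when writing files. A small sketch (the filename is illustrative) that pairs ensure_ascii=False with an explicit UTF-8 encoding so the file stays human-readable:

```python
import json
import os
import tempfile

data = {"city": "Zürich", "note": "café"}

# ensure_ascii=False plus encoding='utf-8' writes the characters as-is
path = os.path.join(tempfile.mkdtemp(), 'readable.json')
with open(path, 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False)

with open(path, encoding='utf-8') as f:
    raw = f.read()
print(raw)  # {"city": "Zürich", "note": "café"}, no \uXXXX escapes
```

Remember to pass the same encoding when reading the file back.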
Formatting Output
Indentation
data = {"user": {"name": "Alice", "settings": {"theme": "dark"}}}
# Compact (default)
json.dumps(data)
# '{"user": {"name": "Alice", "settings": {"theme": "dark"}}}'
# Pretty with 2 spaces
json.dumps(data, indent=2)
# {
#   "user": {
#     "name": "Alice",
#     "settings": {
#       "theme": "dark"
#     }
#   }
# }
# With tabs
json.dumps(data, indent='\t')
Sort keys
data = {"c": 3, "a": 1, "b": 2}
json.dumps(data, sort_keys=True)
# '{"a": 1, "b": 2, "c": 3}'
Compact separators
# Remove spaces for smaller output
json.dumps(data, separators=(',', ':'))
# '{"a":1,"b":2,"c":3}'
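To see what the compact separators actually save, a quick size comparison (the exact numbers will vary with your data):

```python
import json

data = {"users": [{"id": i, "name": f"User{i}"} for i in range(100)]}

default_out = json.dumps(data)                         # separators (', ', ': ')
compact_out = json.dumps(data, separators=(',', ':'))  # no spaces at all

print(len(default_out), len(compact_out))
assert len(compact_out) < len(default_out)
```

The saving is one byte per separator, which adds up quickly for large arrays of small objects.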
JSON with pandas
If you're doing data analysis, pandas makes JSON handling really easy.
Read JSON to DataFrame
import pandas as pd
# From file
df = pd.read_json('users.json')
# From string (newer pandas versions deprecate passing a literal string, so wrap it in StringIO)
from io import StringIO
json_string = '[{"name": "Alice", "age": 28}, {"name": "Bob", "age": 30}]'
df = pd.read_json(StringIO(json_string))
# From URL
df = pd.read_json('https://api.example.com/users')
DataFrame to JSON
# To string
json_string = df.to_json(orient='records')
# To file
df.to_json('output.json', orient='records', indent=2)
# Different orientations
df.to_json(orient='records') # [{"name": "Alice"}, ...]
df.to_json(orient='columns') # {"name": {"0": "Alice"}, ...}
df.to_json(orient='index') # {"0": {"name": "Alice"}, ...}
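A quick way to sanity-check an orientation is to round-trip it through read_json. A sketch with orient='records' (wrapping the string in StringIO, which newer pandas versions require):

```python
import pandas as pd
from io import StringIO

df = pd.DataFrame([{"name": "Alice", "age": 28}, {"name": "Bob", "age": 30}])

# records orientation serializes row by row, so it round-trips cleanly
json_string = df.to_json(orient='records')
df2 = pd.read_json(StringIO(json_string), orient='records')

assert list(df2['name']) == ['Alice', 'Bob']
```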
Normalize nested JSON
Flatten nested structures into a table:
nested_data = [
{"name": "Alice", "address": {"city": "NYC", "zip": "10001"}},
{"name": "Bob", "address": {"city": "LA", "zip": "90001"}}
]
df = pd.json_normalize(nested_data)
# Columns: name, address.city, address.zip
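When the nesting includes lists (one record per order, say), json_normalize can explode them with record_path and keep parent fields via meta. A sketch with made-up data:

```python
import pandas as pd

data = [
    {"name": "Alice", "orders": [{"id": 1, "total": 9.99}, {"id": 2, "total": 4.50}]},
    {"name": "Bob", "orders": [{"id": 3, "total": 12.00}]}
]

# One row per order, with the parent's name carried along
df = pd.json_normalize(data, record_path='orders', meta='name')
print(df)
# Columns: id, total, name (three rows total)
```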
Error Handling
Catching parse errors
import json
def safe_parse(json_string):
    try:
        return json.loads(json_string)
    except json.JSONDecodeError as e:
        print(f"Invalid JSON: {e}")
        return None

data = safe_parse('{"broken": }')     # Invalid JSON: ...
data = safe_parse('{"valid": true}')  # Returns dict
Handle file not found
def load_json_file(filepath, default=None):
    try:
        with open(filepath) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
    except json.JSONDecodeError:
        return default

config = load_json_file('config.json', {})
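A natural counterpart is a save helper that writes atomically, so a crash mid-write can't leave a half-written file. A sketch of the common temp-file-then-rename pattern (the helper name is my own):

```python
import json
import os
import tempfile

def save_json_file(filepath, data):
    """Write JSON atomically: dump to a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(filepath))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2)
        os.replace(tmp_path, filepath)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)
        raise

config_path = os.path.join(tempfile.mkdtemp(), 'config.json')
save_json_file(config_path, {"debug": True})
```

The temp file lives in the same directory as the target so the rename never crosses filesystems.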
Common Patterns
Merge JSON files
import json
from pathlib import Path
def merge_json_files(directory):
    all_data = []
    for json_file in Path(directory).glob('*.json'):
        with open(json_file) as f:
            data = json.load(f)
        if isinstance(data, list):
            all_data.extend(data)
        else:
            all_data.append(data)
    return all_data
Convert CSV to JSON
import csv
import json
with open('data.csv') as csv_file:
    reader = csv.DictReader(csv_file)
    data = list(reader)

with open('data.json', 'w') as json_file:
    json.dump(data, json_file, indent=2)
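Going the other way (JSON to CSV) works with csv.DictWriter, assuming a flat list of objects that share the same keys. A sketch with inline data:

```python
import csv
import io
import json

json_data = '[{"name": "Alice", "age": 28}, {"name": "Bob", "age": 30}]'
rows = json.loads(json_data)

# Use the first row's keys as the CSV header
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(rows)

print(buf.getvalue())
```

Nested objects need flattening first (for example with pd.json_normalize) before they fit a CSV.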
Custom serialization
Handle types that aren't JSON serializable:
from datetime import datetime
import json
class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)
data = {"created": datetime.now()}
json.dumps(data, cls=DateTimeEncoder)
# '{"created": "2026-01-28T10:30:00"}'
Advanced Patterns
When building enterprise AI systems for translating product catalogs and processing PIM data, I've learned these advanced JSON handling patterns are essential.
Custom encoders and decoders
In my experience working with international e-commerce data, standard JSON serialization doesn't handle many Python types. Custom encoders solve this.
import json
from decimal import Decimal
from datetime import datetime, date
class ExtendedEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Decimal):
            return float(obj)
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        if isinstance(obj, set):
            return list(obj)
        return super().default(obj)

# Usage
data = {
    "price": Decimal("19.99"),
    "date": datetime.now(),
    "tags": {"new", "sale"}
}
json.dumps(data, cls=ExtendedEncoder)
# '{"price": 19.99, "date": "2026-01-28T10:30:00", "tags": ["new", "sale"]}'
Datetime handling across timezones
When translating content for global markets, timezone handling becomes critical. Here's the pattern I use to avoid timezone bugs.
from datetime import datetime, timezone
import json
def datetime_handler(obj):
    if isinstance(obj, datetime):
        # Always convert to UTC and include timezone
        return obj.astimezone(timezone.utc).isoformat()
    # A default hook must raise TypeError for types it can't handle;
    # returning the object unchanged makes the encoder loop forever
    raise TypeError(f"Type {type(obj)} is not JSON serializable")

# Serialize with timezone info
data = {"published": datetime.now(timezone.utc)}
json.dumps(data, default=datetime_handler)
# '{"published": "2026-01-28T10:30:00+00:00"}'
# Parse with timezone awareness
def parse_datetime(dct):
    for k, v in dct.items():
        if isinstance(v, str) and 'T' in v:
            try:
                dct[k] = datetime.fromisoformat(v)
            except ValueError:
                pass
    return dct

json_string = '{"published": "2026-01-28T10:30:00+00:00"}'
data = json.loads(json_string, object_hook=parse_datetime)
Streaming large JSON files with ijson
When processing multi-gigabyte product feeds for translation, loading the entire file crashes. I use ijson to stream parse one item at a time.
# pip install ijson
import ijson
def process_large_json(filename):
    with open(filename, 'rb') as f:
        # Parse array items one at a time
        for item in ijson.items(f, 'products.item'):
            # Process each product without loading the entire file
            process_product(item)

# For deeply nested data
def extract_nested(filename):
    with open(filename, 'rb') as f:
        # Navigate to a specific path
        for value in ijson.items(f, 'data.users.item.email'):
            print(value)  # Only extracts email fields
Partial JSON updates
When updating specific fields in large JSON config files without loading the entire structure:
import json
def update_json_field(filename, path, new_value):
    """Update a specific field in a JSON file."""
    with open(filename, 'r') as f:
        data = json.load(f)

    # Navigate to the field using the dotted path
    keys = path.split('.')
    current = data
    for key in keys[:-1]:
        current = current[key]

    # Update the field
    current[keys[-1]] = new_value

    # Write back
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2)
# Usage
update_json_field('config.json', 'database.port', 5433)
Performance Comparison
In production systems processing thousands of JSON files daily, I've benchmarked different Python JSON libraries. The standard library is good, but alternatives can be 10x faster.
Standard json module
Built-in, reliable, but not the fastest. Good for most use cases.
import json
import time
data = {"users": [{"id": i, "name": f"User{i}"} for i in range(10000)]}
start = time.time()
json_str = json.dumps(data)
parsed = json.loads(json_str)
print(f"json module: {time.time() - start:.3f}s")
ujson (UltraJSON)
2-3x faster than standard json. I use this in API endpoints where milliseconds matter.
# pip install ujson
import ujson
start = time.time()
json_str = ujson.dumps(data)
parsed = ujson.loads(json_str)
print(f"ujson: {time.time() - start:.3f}s")
# Typically 2-3x faster than json module
orjson (Fastest)
When building AI translation pipelines that process gigabytes of product data, orjson's 5-10x speed improvement is game-changing. It's written in Rust.
# pip install orjson
import orjson
start = time.time()
json_bytes = orjson.dumps(data) # Returns bytes, not str
parsed = orjson.loads(json_bytes)
print(f"orjson: {time.time() - start:.3f}s")
# Typically 5-10x faster than json module
# orjson has nice defaults:
# - Handles datetime automatically
# - Handles UUID automatically
# - Handles dataclasses automatically
from datetime import datetime
data = {"created": datetime.now()}
orjson.dumps(data) # Just works, no custom encoder needed
When to use which
- json - Default choice, always available, good enough for most cases
- ujson - When you need better performance but want a drop-in replacement
- orjson - When performance is critical (APIs, data processing pipelines, large files)
- ijson - When files are too large to fit in memory (streaming parser)
In my translation systems, I use orjson for API responses and ijson for processing multi-gigabyte product feeds. The performance gain is worth the extra dependency.