JSON Streaming and Real-Time Data: NDJSON, SSE, WebSockets
When building real-time translation pipelines for our AI platform, I discovered that loading entire JSON files into memory wasn't viable. Processing 500MB JSON files crashed the server. We needed streaming: parsing JSON as it arrives and processing it incrementally, without waiting for the full response.
This guide shares what I've learned about JSON streaming and real-time data patterns from building production systems that handle thousands of events daily. These techniques apply to large file processing, real-time dashboards, live chat, and any system where latency matters.
When to Use Streaming vs Batch Processing
The decision between streaming and batch processing JSON comes down to data size, latency requirements, and memory constraints. I learned this the hard way.
Use streaming when:
- Files are large: Processing 100MB+ JSON files without loading everything into memory
- Latency matters: Showing first results before the full response arrives (progressive loading)
- Real-time updates: Chat messages, stock prices, live dashboards
- Memory is limited: Processing on devices with constrained resources
- Unknown data size: When you don't know how much data will arrive
Use batch processing when:
- Small payloads: Files under 10MB that fit comfortably in memory
- Need full data: Operations require the entire dataset (sorting, aggregating)
- Simplicity matters: Batch processing is easier to implement and debug
- No time pressure: Results can wait for complete processing
Real example: Log analysis
When building our log analysis tool, we started with batch processing. Load the entire log file, parse JSON, analyze. It worked great until logs hit 200MB. The Node.js process would consume 1.5GB RAM and take 30 seconds to respond.
After switching to streaming NDJSON, memory usage stayed under 50MB and first results appeared in under 1 second. Total processing time was about the same, but the user experience was dramatically better.
NDJSON (Newline Delimited JSON)
NDJSON is JSON streaming's secret weapon. Each line is a complete, valid JSON object. This makes it perfect for streaming because you can parse line-by-line without waiting for the entire file.
NDJSON format example
{"id":1,"name":"Alice","email":"alice@example.com"}
{"id":2,"name":"Bob","email":"bob@example.com"}
{"id":3,"name":"Charlie","email":"charlie@example.com"}
Compare this to standard JSON array format:
[
{"id":1,"name":"Alice","email":"alice@example.com"},
{"id":2,"name":"Bob","email":"bob@example.com"},
{"id":3,"name":"Charlie","email":"charlie@example.com"}
]
With NDJSON, you can start processing the first line immediately. With a JSON array, you must parse the entire structure before accessing any data.
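The same line-at-a-time idea works with any chunked text source (a fetch() response body, a socket, a file stream). A minimal sketch, assuming the input is an async iterable of string chunks; parseNDJSON is an illustrative name, not from any library. The one subtlety is buffering a partial line that spans two chunks:

```javascript
// Incrementally parse NDJSON from any async source of string chunks.
// Objects that span a chunk boundary are handled by keeping the
// trailing partial line in a buffer until the rest arrives.
async function* parseNDJSON(chunks) {
  let buffer = '';
  for await (const chunk of chunks) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // last element may be an incomplete line
    for (const line of lines) {
      if (line.trim()) yield JSON.parse(line);
    }
  }
  if (buffer.trim()) yield JSON.parse(buffer); // final unterminated line
}
```

Each parsed object is yielded as soon as its closing newline (or the end of input) arrives, so downstream code starts working before the source finishes.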
When I use NDJSON
- Log files: Each log entry is one line, easily grep-able and streamable
- Database exports: Exporting thousands of records without memory issues
- Data pipelines: Processing large datasets incrementally
- API streaming: Sending results as they're computed
Converting between formats
Standard JSON array to NDJSON:
const fs = require('fs');
// Read the full JSON array into memory (fine for small files;
// for large files, use a streaming parser instead)
const data = JSON.parse(fs.readFileSync('data.json', 'utf8'));
// Write NDJSON
const output = fs.createWriteStream('data.ndjson');
data.forEach(item => {
output.write(JSON.stringify(item) + '\n');
});
output.end();
NDJSON to standard JSON array:
const fs = require('fs');
const readline = require('readline');
async function ndjsonToArray(inputFile, outputFile) {
const rl = readline.createInterface({
input: fs.createReadStream(inputFile)
});
const results = [];
for await (const line of rl) {
if (line.trim()) {
results.push(JSON.parse(line));
}
}
fs.writeFileSync(outputFile, JSON.stringify(results, null, 2));
}
ndjsonToArray('data.ndjson', 'data.json').catch(console.error);
NDJSON best practices
- One complete JSON object per line (no multi-line formatting)
- Use UTF-8 encoding
- End each line with \n (Unix line endings)
- Skip empty lines during parsing
- Include error recovery (skip malformed lines, log errors)
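The last two practices can be sketched as a small helper: parse each line, skip empty ones, and recover from malformed lines instead of aborting the whole stream (parseNDJSONLines is an illustrative name):

```javascript
// Parse an array of NDJSON lines with error recovery: empty lines are
// skipped silently, malformed lines are logged and counted but do not
// stop processing.
function parseNDJSONLines(lines) {
  const records = [];
  let skipped = 0;
  for (const line of lines) {
    if (!line.trim()) continue; // skip empty lines
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      skipped++;
      console.error('Skipping malformed line:', line.slice(0, 80));
    }
  }
  return { records, skipped };
}
```

Reporting the skipped count matters in production: a sudden spike in malformed lines usually means an upstream producer broke its format.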
Streaming Parsers and Libraries
Standard JSON.parse() requires the entire JSON string in memory. Streaming
parsers process JSON incrementally, emitting events as data arrives.
JSONStream (Node.js)
My go-to library for streaming large JSON files in Node.js. Perfect for processing arrays and filtering data on-the-fly.
const fs = require('fs');
const JSONStream = require('JSONStream');
// Stream large JSON array
fs.createReadStream('large-data.json')
.pipe(JSONStream.parse('users.*')) // Parse array at 'users' path
.on('data', (user) => {
// Process each user as it arrives
console.log(user.name);
processUser(user);
})
.on('end', () => {
console.log('Stream finished');
})
.on('error', (err) => {
console.error('Stream error:', err);
});
This processes a 500MB JSON file using constant memory (around 10-20MB).
oboe.js (Browser)
Client-side streaming JSON parser. When building our real-time translation interface, I used oboe.js to show results as they arrived from the API.
oboe('/api/translations')
.node('items.*', function(item) {
// Called for each item in 'items' array
displayResult(item);
// Return oboe.drop to free memory after processing
return oboe.drop;
})
.done(function(finalJSON) {
console.log('Complete response:', finalJSON);
})
.fail(function(error) {
console.error('Request failed:', error);
});
The oboe.drop pattern is crucial for memory efficiency. It tells oboe to
discard processed items, preventing memory buildup.
stream-json (Node.js)
A more modern alternative to JSONStream with better performance:
const fs = require('fs');
const {chain} = require('stream-chain');
const {parser} = require('stream-json');
const {streamArray} = require('stream-json/streamers/StreamArray');
const pipeline = chain([
fs.createReadStream('data.json'),
parser(),
streamArray()
]);
pipeline.on('data', ({key, value}) => {
// key is the array index
// value is the object
console.log(`Item ${key}:`, value);
});
pipeline.on('end', () => {
console.log('Done');
});
Performance comparison
I benchmarked these libraries processing a 100MB JSON file (1 million objects) on my laptop:
| Method | Memory Usage | Time | Notes |
|---|---|---|---|
| JSON.parse() | 850MB | 3.2s | Fast but massive memory |
| JSONStream | 15MB | 12.1s | Stable memory, slower |
| stream-json | 18MB | 8.4s | Best balance |
| NDJSON (readline) | 8MB | 4.5s | Fastest streaming option |
For new projects, I recommend NDJSON format with simple readline parsing. It's fast, memory-efficient, and easy to debug.
Server-Sent Events (SSE) with JSON
When building real-time dashboards, I use Server-Sent Events for server-to-client streaming. It's simpler than WebSockets when you only need one-way communication.
What is SSE?
SSE is a browser API that opens a persistent HTTP connection. The server can push JSON updates to the client anytime. Perfect for live notifications, stock tickers, real-time logs.
Server implementation (Node.js)
const express = require('express');
const app = express();
app.get('/api/live-stats', (req, res) => {
// Set headers for SSE
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
// Send initial data
sendEvent(res, 'connected', { timestamp: Date.now() });
// Send updates every 2 seconds
const interval = setInterval(() => {
const stats = {
users: Math.floor(Math.random() * 1000),
requests: Math.floor(Math.random() * 10000),
timestamp: Date.now()
};
sendEvent(res, 'stats', stats);
}, 2000);
// Clean up when client disconnects
req.on('close', () => {
clearInterval(interval);
res.end();
});
});
function sendEvent(res, event, data) {
res.write(`event: ${event}\n`);
res.write(`data: ${JSON.stringify(data)}\n\n`);
}
app.listen(3000);
Client implementation (Browser)
const eventSource = new EventSource('/api/live-stats');
eventSource.addEventListener('connected', (e) => {
const data = JSON.parse(e.data);
console.log('Connected:', data);
});
eventSource.addEventListener('stats', (e) => {
const stats = JSON.parse(e.data);
updateDashboard(stats);
});
eventSource.onerror = (error) => {
console.error('SSE error:', error);
eventSource.close();
};
When I use SSE
- Live dashboards: Real-time metrics and analytics
- Notifications: Push alerts to users
- Progress tracking: Long-running job status updates
- Live logs: Streaming server logs to admin interface
SSE advantages over WebSockets
- Simpler protocol (just HTTP)
- Automatic reconnection built-in
- Works through most proxies and firewalls
- No special server infrastructure needed
- Built-in event types and message IDs
SSE limitations
- One-way only (server to client)
- Limited browser connections (6 per domain in some browsers)
- Text-only (but JSON works fine)
For bi-directional communication (client sends messages too), use WebSockets instead.
WebSocket JSON Messaging Patterns
When building our real-time collaboration tool, SSE wasn't enough. We needed clients to send data to the server and to each other. WebSockets provided the solution.
Basic WebSocket with JSON
Server (Node.js with ws library):
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });
wss.on('connection', (ws) => {
console.log('Client connected');
// Send welcome message
ws.send(JSON.stringify({
type: 'welcome',
message: 'Connected to server',
timestamp: Date.now()
}));
// Handle incoming messages
ws.on('message', (data) => {
try {
const message = JSON.parse(data);
handleMessage(ws, message);
} catch (err) {
ws.send(JSON.stringify({
type: 'error',
message: 'Invalid JSON'
}));
}
});
ws.on('close', () => {
console.log('Client disconnected');
});
});
function handleMessage(ws, message) {
switch (message.type) {
case 'chat':
// Broadcast to all clients
broadcast({ type: 'chat', ...message });
break;
case 'typing':
// Send typing indicator to others
broadcastExcept(ws, { type: 'typing', user: message.user });
break;
default:
ws.send(JSON.stringify({ type: 'error', message: 'Unknown type' }));
}
}
function broadcast(data) {
wss.clients.forEach((client) => {
if (client.readyState === WebSocket.OPEN) {
client.send(JSON.stringify(data));
}
});
}
function broadcastExcept(sender, data) {
wss.clients.forEach((client) => {
if (client !== sender && client.readyState === WebSocket.OPEN) {
client.send(JSON.stringify(data));
}
});
}
Client (Browser):
const ws = new WebSocket('ws://localhost:8080');
ws.onopen = () => {
console.log('Connected to server');
// Send a message
ws.send(JSON.stringify({
type: 'chat',
user: 'Alice',
message: 'Hello everyone!'
}));
};
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
switch (data.type) {
case 'welcome':
console.log('Welcome:', data.message);
break;
case 'chat':
displayMessage(data.user, data.message);
break;
case 'typing':
showTypingIndicator(data.user);
break;
default:
console.warn('Unknown message type:', data.type);
}
};
ws.onerror = (error) => {
console.error('WebSocket error:', error);
};
ws.onclose = () => {
console.log('Disconnected from server');
};
Message structure pattern
I always use a consistent message structure:
{
"type": "message_type",
"id": "unique-message-id",
"timestamp": 1707315600000,
"data": {
// Message-specific payload
}
}
Benefits:
- type field enables easy routing and handling
- id allows tracking and deduplication
- timestamp helps with ordering and debugging
- Separating data keeps the structure clean
Heartbeat and connection health
WebSocket connections can silently die. Implement heartbeats:
// Server
const HEARTBEAT_INTERVAL = 30000; // 30 seconds
wss.on('connection', (ws) => {
ws.isAlive = true;
ws.on('pong', () => {
ws.isAlive = true;
});
});
const heartbeat = setInterval(() => {
wss.clients.forEach((ws) => {
if (ws.isAlive === false) {
return ws.terminate();
}
ws.isAlive = false;
ws.ping();
});
}, HEARTBEAT_INTERVAL);
// Stop pinging when the server itself shuts down
wss.on('close', () => clearInterval(heartbeat));
// Client
let heartbeatInterval;
ws.onopen = () => {
heartbeatInterval = setInterval(() => {
if (ws.readyState === WebSocket.OPEN) {
ws.send(JSON.stringify({ type: 'ping' }));
}
}, 30000);
};
ws.onclose = () => {
clearInterval(heartbeatInterval);
};
When I use WebSockets
- Real-time chat: Instant messaging between users
- Collaborative editing: Multiple users editing same document
- Multiplayer games: Low-latency game state synchronization
- Live trading: Real-time stock prices and order execution
- IoT dashboards: Bi-directional device communication
Performance Comparison: Streaming vs Batch
I ran benchmarks comparing different JSON processing approaches for a common use case: processing 1 million log entries (100MB JSON file).
Test scenario
- 1 million JSON objects (log entries)
- Total size: 100MB
- Task: Filter for errors, count by type, extract timestamps
- Hardware: MacBook Pro M1, 16GB RAM, Node.js v20
Results
| Approach | Time | Peak Memory | Time to First Result |
|---|---|---|---|
| Batch (JSON.parse) | 3.2s | 850MB | 3.2s |
| JSONStream | 12.1s | 15MB | 0.2s |
| stream-json | 8.4s | 18MB | 0.15s |
| NDJSON (readline) | 4.5s | 8MB | 0.08s |
Key insights
- Batch is fastest for total processing time, but uses 100x more memory
- NDJSON streaming provides the best balance of speed and memory efficiency
- Time to first result is critical for user experience - streaming wins by 40x
- Memory usage with streaming stays constant regardless of file size
Real-world impact
When I switched our log analysis tool from batch to NDJSON streaming, the perceived performance improvement was dramatic. Users saw results immediately instead of waiting 30+ seconds, even though total processing time was similar.
More importantly, we eliminated out-of-memory crashes that occurred with large log files. The streaming version handles 10GB files without issues. Check out our guide on JSON performance optimization for more techniques.
Real-Time Use Cases from Production
Here are specific implementations I've built using these patterns.
Use case 1: AI translation streaming
When translating large documents with OpenAI, I stream results back to the user as they arrive:
// Server
app.get('/api/translate', async (req, res) => {
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
const stream = await openai.chat.completions.create({
model: 'gpt-4',
messages: [{role: 'user', content: req.query.text}],
stream: true
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
sendEvent(res, 'translation', { text: content });
}
}
sendEvent(res, 'done', { complete: true });
res.end();
});
Users see translated text appearing in real-time, word by word.
Use case 2: Live analytics dashboard
Real-time metrics streaming to dashboard using SSE:
// Aggregate metrics every second
setInterval(async () => {
const metrics = await database.getRealtimeMetrics();
// Broadcast to all connected clients
clients.forEach(res => {
sendEvent(res, 'metrics', metrics);
});
}, 1000);
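The clients collection in that snippet has to be maintained somewhere. A minimal registry sketch (registerClient and broadcastEvent are illustrative names): add each response when it connects, remove it when the request closes, and write one SSE frame per broadcast:

```javascript
// Registry of connected SSE clients for the broadcast loop above.
const clients = new Set();

// Call from the SSE route handler after setting the event-stream headers.
function registerClient(req, res) {
  clients.add(res);
  req.on('close', () => clients.delete(res)); // drop on disconnect
}

// Write one SSE frame (event name + JSON data) to every client.
function broadcastEvent(event, data) {
  const frame = `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
  for (const res of clients) {
    res.write(frame);
  }
}
```

Cleaning up on the close event is the important part: without it, dead connections accumulate and every broadcast writes to sockets nobody is reading.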
Use case 3: Log tailing
Stream server logs to browser in real-time:
const { Tail } = require('tail');
const logFile = new Tail('/var/log/app.log');
logFile.on('line', (line) => {
try {
const logEntry = JSON.parse(line); // Assuming NDJSON logs
// Send to all connected WebSocket clients
broadcast({
type: 'log',
level: logEntry.level,
message: logEntry.message,
timestamp: logEntry.timestamp
});
} catch (err) {
// Skip malformed lines
}
});
Use case 4: Progress tracking for bulk operations
Show progress when processing large datasets:
app.post('/api/process', async (req, res) => {
res.setHeader('Content-Type', 'text/event-stream');
const items = await getItemsToProcess(); // 10,000 items
let processed = 0;
for (const item of items) {
await processItem(item);
processed++;
// Send progress update every 100 items
if (processed % 100 === 0) {
sendEvent(res, 'progress', {
processed,
total: items.length,
percentage: (processed / items.length * 100).toFixed(1)
});
}
}
sendEvent(res, 'complete', { processed });
res.end();
});
Implementation Best Practices
Lessons learned from building production streaming systems.
Error handling
Always wrap JSON parsing in try-catch:
ws.on('message', (data) => {
try {
const message = JSON.parse(data);
handleMessage(message);
} catch (err) {
console.error('Invalid JSON:', err);
ws.send(JSON.stringify({
type: 'error',
message: 'Invalid JSON format'
}));
}
});
Backpressure handling
When streaming large files, respect backpressure to avoid memory issues. Learn more in our performance optimization guide.
const readable = fs.createReadStream('large.ndjson');
const writable = processStream();
readable.on('data', (chunk) => {
const canContinue = writable.write(chunk);
if (!canContinue) {
// Pause reading until drain event
readable.pause();
}
});
writable.on('drain', () => {
readable.resume();
});
Connection recovery
Implement exponential backoff for reconnections:
class ReconnectingWebSocket {
constructor(url) {
this.url = url;
this.reconnectDelay = 1000;
this.maxReconnectDelay = 30000;
this.connect();
}
connect() {
this.ws = new WebSocket(this.url);
this.ws.onopen = () => {
console.log('Connected');
this.reconnectDelay = 1000; // Reset delay on successful connect
};
this.ws.onclose = () => {
console.log('Disconnected, reconnecting...');
setTimeout(() => this.connect(), this.reconnectDelay);
// Exponential backoff
this.reconnectDelay = Math.min(
this.reconnectDelay * 2,
this.maxReconnectDelay
);
};
}
}
Security considerations
Always validate and sanitize JSON data, especially in real-time systems. See our security best practices guide.
// Validate message structure
function validateMessage(message) {
if (!message.type || typeof message.type !== 'string') {
throw new Error('Invalid message type');
}
if (message.type === 'chat' && !message.message) {
throw new Error('Chat message required');
}
// Sanitize HTML in user content
if (message.message) {
message.message = sanitizeHtml(message.message);
}
return message;
}
Monitoring and debugging
Log events for debugging streaming issues:
function sendEvent(res, event, data) {
const payload = JSON.stringify(data);
// Log for debugging
console.log(`[SSE] ${event}: ${payload.substring(0, 100)}...`);
res.write(`event: ${event}\n`);
res.write(`data: ${payload}\n\n`);
}