JSON Streaming and Real-Time Data: NDJSON, SSE, WebSockets

· 15 min read

When building real-time translation pipelines for our AI platform, I discovered that loading entire JSON files into memory wasn't viable. Processing 500MB JSON files crashed the server. We needed streaming: parsing JSON as it arrives and processing it incrementally, without waiting for the full response.

This guide shares what I've learned about JSON streaming and real-time data patterns from building production systems that handle thousands of events daily. These techniques apply to large file processing, real-time dashboards, live chat, and any system where latency matters.

When to Use Streaming vs Batch Processing

The decision between streaming and batch processing JSON comes down to data size, latency requirements, and memory constraints. I learned this the hard way.

Use streaming when:

  • Files are large: Processing 100MB+ JSON files without loading everything into memory
  • Latency matters: Showing first results before the full response arrives (progressive loading)
  • Real-time updates: Chat messages, stock prices, live dashboards
  • Memory is limited: Processing on devices with constrained resources
  • Unknown data size: When you don't know how much data will arrive

Use batch processing when:

  • Small payloads: Files under 10MB that fit comfortably in memory
  • Need full data: Operations require the entire dataset (sorting, aggregating)
  • Simplicity matters: Batch processing is easier to implement and debug
  • No time pressure: Results can wait for complete processing

Real example: Log analysis

When building our log analysis tool, we started with batch processing. Load the entire log file, parse JSON, analyze. It worked great until logs hit 200MB. The Node.js process would consume 1.5GB RAM and take 30 seconds to respond.

After switching to streaming NDJSON, memory usage stayed under 50MB and first results appeared in under 1 second. The full processing took the same time, but user experience was dramatically better.

NDJSON (Newline Delimited JSON)

NDJSON is JSON streaming's secret weapon. Each line is a complete, valid JSON object. This makes it perfect for streaming because you can parse line-by-line without waiting for the entire file.

NDJSON format example

{"id":1,"name":"Alice","email":"alice@example.com"}
{"id":2,"name":"Bob","email":"bob@example.com"}
{"id":3,"name":"Charlie","email":"charlie@example.com"}

Compare this to standard JSON array format:

[
  {"id":1,"name":"Alice","email":"alice@example.com"},
  {"id":2,"name":"Bob","email":"bob@example.com"},
  {"id":3,"name":"Charlie","email":"charlie@example.com"}
]

With NDJSON, you can start processing the first line immediately. With a JSON array and standard JSON.parse(), you must read the entire structure before accessing any data (streaming parsers, covered below, work around this at the cost of extra complexity).
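The same property matters over the network, where a read chunk can end mid-object. A minimal chunk-safe splitter might look like this (a sketch; the function name is my own, not a library API):

```javascript
// Buffers incoming chunks and emits one parsed object per complete line.
// A chunk may end in the middle of a JSON object; the remainder is kept
// in the buffer until the next chunk arrives.
function createNdjsonSplitter() {
  let buffer = '';
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // last element is an incomplete line (or '')
    return lines.filter(line => line.trim()).map(line => JSON.parse(line));
  };
}

const push = createNdjsonSplitter();
console.log(push('{"id":1}\n{"id'));  // [ { id: 1 } ]
console.log(push('":2}\n'));          // [ { id: 2 } ]
```

Each call returns whatever complete objects the new chunk finished; partial trailing data waits in the buffer.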

When I use NDJSON

  • Log files: Each log entry is one line, easily grep-able and streamable
  • Database exports: Exporting thousands of records without memory issues
  • Data pipelines: Processing large datasets incrementally
  • API streaming: Sending results as they're computed

Converting between formats

Standard JSON array to NDJSON:

const fs = require('fs');

// Read the JSON array (a one-off conversion still loads the whole file;
// the streaming payoff comes when consumers read the NDJSON output)
const data = JSON.parse(fs.readFileSync('data.json', 'utf8'));

// Write NDJSON
const output = fs.createWriteStream('data.ndjson');
data.forEach(item => {
  output.write(JSON.stringify(item) + '\n');
});
output.end();

NDJSON to standard JSON array:

const fs = require('fs');
const readline = require('readline');

async function ndjsonToArray(inputFile, outputFile) {
  const rl = readline.createInterface({
    input: fs.createReadStream(inputFile)
  });

  const results = [];
  for await (const line of rl) {
    if (line.trim()) {
      results.push(JSON.parse(line));
    }
  }

  fs.writeFileSync(outputFile, JSON.stringify(results, null, 2));
}

ndjsonToArray('data.ndjson', 'data.json').catch(console.error);

NDJSON best practices

  • One complete JSON object per line (no multi-line formatting)
  • Use UTF-8 encoding
  • End each line with \n (Unix line endings)
  • Skip empty lines during parsing
  • Include error recovery (skip malformed lines, log errors)
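The last two practices can be folded into one small guard. A sketch of the shape I use (names are my own):

```javascript
// Parse one NDJSON line defensively: skip blanks, recover from bad lines.
// Returns the parsed object, or null if the line is empty or malformed.
function parseNdjsonLine(line, lineNumber, onError = console.error) {
  const trimmed = line.trim();
  if (!trimmed) return null; // skip empty lines
  try {
    return JSON.parse(trimmed);
  } catch (err) {
    onError(`Skipping malformed line ${lineNumber}: ${err.message}`);
    return null;
  }
}
```

One bad line then costs you a log entry, not the whole stream.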

Streaming Parsers and Libraries

Standard JSON.parse() requires the entire JSON string in memory. Streaming parsers process JSON incrementally, emitting events as data arrives.

JSONStream (Node.js)

My go-to library for streaming large JSON files in Node.js. Perfect for processing arrays and filtering data on-the-fly.

const fs = require('fs');
const JSONStream = require('JSONStream');

// Stream large JSON array
fs.createReadStream('large-data.json')
  .pipe(JSONStream.parse('users.*'))  // Emit each element of the 'users' array
  .on('data', (user) => {
    // Process each user as it arrives
    console.log(user.name);
    processUser(user);
  })
  .on('end', () => {
    console.log('Stream finished');
  })
  .on('error', (err) => {
    console.error('Stream error:', err);
  });

This processes a 500MB JSON file using constant memory (around 10-20MB).

oboe.js (Browser)

Client-side streaming JSON parser. When building our real-time translation interface, I used oboe.js to show results as they arrived from the API.

oboe('/api/translations')
  .node('items.*', function(item) {
    // Called for each item in 'items' array
    displayResult(item);

    // Return oboe.drop to free memory after processing
    return oboe.drop;
  })
  .done(function(finalJSON) {
    console.log('Complete response:', finalJSON);
  })
  .fail(function(error) {
    console.error('Request failed:', error);
  });

The oboe.drop pattern is crucial for memory efficiency. It tells oboe to discard processed items, preventing memory buildup.

stream-json (Node.js)

More modern alternative to JSONStream with better performance:

const {chain} = require('stream-chain');
const {parser} = require('stream-json');
const {streamArray} = require('stream-json/streamers/StreamArray');

const pipeline = chain([
  fs.createReadStream('data.json'),
  parser(),
  streamArray()
]);

pipeline.on('data', ({key, value}) => {
  // key is the array index
  // value is the object
  console.log(`Item ${key}:`, value);
});

pipeline.on('end', () => {
  console.log('Done');
});

Performance comparison

I benchmarked these libraries processing a 100MB JSON file (1 million objects) on my laptop:

Method              Memory usage   Time    Notes
JSON.parse()        850MB          3.2s    Fast but massive memory
JSONStream          15MB           12.1s   Stable memory, slower
stream-json         18MB           8.4s    Best balance
NDJSON (readline)   8MB            4.5s    Fastest streaming option

For new projects, I recommend NDJSON format with simple readline parsing. It's fast, memory-efficient, and easy to debug.

Server-Sent Events (SSE) with JSON

When building real-time dashboards, I use Server-Sent Events for server-to-client streaming. It's simpler than WebSockets when you only need one-way communication.

What is SSE?

SSE is a browser API that opens a persistent HTTP connection. The server can push JSON updates to the client anytime. Perfect for live notifications, stock tickers, real-time logs.

Server implementation (Node.js)

const express = require('express');
const app = express();

app.get('/api/live-stats', (req, res) => {
  // Set headers for SSE
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  // Send initial data
  sendEvent(res, 'connected', { timestamp: Date.now() });

  // Send updates every 2 seconds
  const interval = setInterval(() => {
    const stats = {
      users: Math.floor(Math.random() * 1000),
      requests: Math.floor(Math.random() * 10000),
      timestamp: Date.now()
    };

    sendEvent(res, 'stats', stats);
  }, 2000);

  // Clean up when client disconnects
  req.on('close', () => {
    clearInterval(interval);
    res.end();
  });
});

function sendEvent(res, event, data) {
  res.write(`event: ${event}\n`);
  res.write(`data: ${JSON.stringify(data)}\n\n`);
}

app.listen(3000);

Client implementation (Browser)

const eventSource = new EventSource('/api/live-stats');

eventSource.addEventListener('connected', (e) => {
  const data = JSON.parse(e.data);
  console.log('Connected:', data);
});

eventSource.addEventListener('stats', (e) => {
  const stats = JSON.parse(e.data);
  updateDashboard(stats);
});

eventSource.onerror = (error) => {
  console.error('SSE error:', error);
  eventSource.close();
};

When I use SSE

  • Live dashboards: Real-time metrics and analytics
  • Notifications: Push alerts to users
  • Progress tracking: Long-running job status updates
  • Live logs: Streaming server logs to admin interface

SSE advantages over WebSockets

  • Simpler protocol (just HTTP)
  • Automatic reconnection built-in
  • Works through most proxies and firewalls
  • No special server infrastructure needed
  • Built-in event types and message IDs
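That last advantage deserves an example: if the server emits id: lines, the browser automatically resends the newest one in a Last-Event-ID request header when it reconnects, so you can resume the stream. A sketch (the replay logic is left out):

```javascript
// SSE frame with an id the browser echoes back after auto-reconnect.
function sendEventWithId(res, id, event, data) {
  res.write(`id: ${id}\n`);
  res.write(`event: ${event}\n`);
  res.write(`data: ${JSON.stringify(data)}\n\n`);
}

// On a new connection, check req.headers['last-event-id'] and replay
// any events the client missed before streaming live updates.
```

You get resumable streams essentially for free, which WebSockets make you build yourself.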

SSE limitations

  • One-way only (server to client)
  • Limited browser connections (6 per domain over HTTP/1.1; HTTP/2 multiplexing lifts this)
  • Text-only (but JSON works fine)

For bi-directional communication (client sends messages too), use WebSockets instead.

WebSocket JSON Messaging Patterns

When building our real-time collaboration tool, SSE wasn't enough. We needed clients to send data to the server and to each other. WebSockets provided the solution.

Basic WebSocket with JSON

Server (Node.js with ws library):

const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (ws) => {
  console.log('Client connected');

  // Send welcome message
  ws.send(JSON.stringify({
    type: 'welcome',
    message: 'Connected to server',
    timestamp: Date.now()
  }));

  // Handle incoming messages
  ws.on('message', (data) => {
    try {
      const message = JSON.parse(data);
      handleMessage(ws, message);
    } catch (err) {
      ws.send(JSON.stringify({
        type: 'error',
        message: 'Invalid JSON'
      }));
    }
  });

  ws.on('close', () => {
    console.log('Client disconnected');
  });
});

function handleMessage(ws, message) {
  switch (message.type) {
    case 'chat':
      // Broadcast to all clients
      broadcast({ type: 'chat', ...message });
      break;
    case 'typing':
      // Send typing indicator to others
      broadcastExcept(ws, { type: 'typing', user: message.user });
      break;
    default:
      ws.send(JSON.stringify({ type: 'error', message: 'Unknown type' }));
  }
}

function broadcast(data) {
  wss.clients.forEach((client) => {
    if (client.readyState === WebSocket.OPEN) {
      client.send(JSON.stringify(data));
    }
  });
}

function broadcastExcept(sender, data) {
  wss.clients.forEach((client) => {
    if (client !== sender && client.readyState === WebSocket.OPEN) {
      client.send(JSON.stringify(data));
    }
  });
}

Client (Browser):

const ws = new WebSocket('ws://localhost:8080');

ws.onopen = () => {
  console.log('Connected to server');

  // Send a message
  ws.send(JSON.stringify({
    type: 'chat',
    user: 'Alice',
    message: 'Hello everyone!'
  }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  switch (data.type) {
    case 'welcome':
      console.log('Welcome:', data.message);
      break;
    case 'chat':
      displayMessage(data.user, data.message);
      break;
    case 'typing':
      showTypingIndicator(data.user);
      break;
    default:
      console.warn('Unknown message type:', data.type);
  }
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = () => {
  console.log('Disconnected from server');
};

Message structure pattern

I always use a consistent message structure:

{
  "type": "message_type",
  "id": "unique-message-id",
  "timestamp": 1707315600000,
  "data": {
    // Message-specific payload
  }
}

Benefits:

  • type field enables easy routing and handling
  • id allows tracking and deduplication
  • timestamp helps with ordering and debugging
  • Separating data keeps structure clean

Heartbeat and connection health

WebSocket connections can silently die. Implement heartbeats:

// Server
const HEARTBEAT_INTERVAL = 30000; // 30 seconds

wss.on('connection', (ws) => {
  ws.isAlive = true;

  ws.on('pong', () => {
    ws.isAlive = true;
  });
});

const heartbeat = setInterval(() => {
  wss.clients.forEach((ws) => {
    if (ws.isAlive === false) {
      return ws.terminate();
    }

    ws.isAlive = false;
    ws.ping();
  });
}, HEARTBEAT_INTERVAL);

// Stop the timer when the server shuts down
wss.on('close', () => clearInterval(heartbeat));

// Client (application-level ping: browsers can't send protocol-level
// pings, so the server should treat {type: 'ping'} as a keep-alive)
let heartbeatInterval;

ws.onopen = () => {
  heartbeatInterval = setInterval(() => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(JSON.stringify({ type: 'ping' }));
    }
  }, 30000);
};

ws.onclose = () => {
  clearInterval(heartbeatInterval);
};

When I use WebSockets

  • Real-time chat: Instant messaging between users
  • Collaborative editing: Multiple users editing same document
  • Multiplayer games: Low-latency game state synchronization
  • Live trading: Real-time stock prices and order execution
  • IoT dashboards: Bi-directional device communication

Performance Comparison: Streaming vs Batch

I ran benchmarks comparing different JSON processing approaches for a common use case: processing 1 million log entries (100MB JSON file).

Test scenario

  • 1 million JSON objects (log entries)
  • Total size: 100MB
  • Task: Filter for errors, count by type, extract timestamps
  • Hardware: MacBook Pro M1, 16GB RAM, Node.js v20

Results

Approach             Time    Peak memory   Time to first result
Batch (JSON.parse)   3.2s    850MB         3.2s
JSONStream           12.1s   15MB          0.2s
stream-json          8.4s    18MB          0.15s
NDJSON (readline)    4.5s    8MB           0.08s

Key insights

  • Batch is fastest for total processing time, but uses 100x more memory
  • NDJSON streaming provides the best balance of speed and memory efficiency
  • Time to first result is critical for user experience - streaming wins by 40x
  • Memory usage with streaming stays constant regardless of file size

Real-world impact

When I switched our log analysis tool from batch to NDJSON streaming, the perceived performance improvement was dramatic. Users saw results immediately instead of waiting 30+ seconds, even though total processing time was similar.

More importantly, we eliminated out-of-memory crashes that occurred with large log files. The streaming version handles 10GB files without issues. Check out our guide on JSON performance optimization for more techniques.

Real-Time Use Cases from Production

Here are specific implementations I've built using these patterns.

Use case 1: AI translation streaming

When translating large documents with OpenAI, I stream results back to the user as they arrive:

// Server
app.get('/api/translate', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{role: 'user', content: req.query.text}],
    stream: true
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      sendEvent(res, 'translation', { text: content });
    }
  }

  sendEvent(res, 'done', { complete: true });
  res.end();
});

Users see translated text appearing in real-time, word by word.

Use case 2: Live analytics dashboard

Real-time metrics streaming to dashboard using SSE:

// Aggregate metrics every second
setInterval(async () => {
  const metrics = await database.getRealtimeMetrics();

  // Broadcast to all connected clients
  clients.forEach(res => {
    sendEvent(res, 'metrics', metrics);
  });
}, 1000);
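The clients collection in that loop is just a registry of open responses. Factored out for clarity (a sketch with names of my own; each res is an SSE response like the ones shown in the SSE section):

```javascript
// Minimal client registry for SSE broadcasting.
function createClientRegistry() {
  const clients = new Set();
  return {
    add(res) { clients.add(res); },
    remove(res) { clients.delete(res); },
    size() { return clients.size; },
    broadcast(event, data) {
      const payload = `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
      clients.forEach(res => res.write(payload));
    }
  };
}
```

The SSE route adds res on connect and removes it in the req.on('close') handler, so the broadcast loop never writes to a dead connection.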

Use case 3: Log tailing

Stream server logs to browser in real-time:

const { Tail } = require('tail');
const logFile = new Tail('/var/log/app.log');

logFile.on('line', (line) => {
  try {
    const logEntry = JSON.parse(line); // Assuming NDJSON logs

    // Send to all connected WebSocket clients
    broadcast({
      type: 'log',
      level: logEntry.level,
      message: logEntry.message,
      timestamp: logEntry.timestamp
    });
  } catch (err) {
    // Skip malformed lines
  }
});

Use case 4: Progress tracking for bulk operations

Show progress when processing large datasets:

app.post('/api/process', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');

  const items = await getItemsToProcess(); // 10,000 items
  let processed = 0;

  for (const item of items) {
    await processItem(item);
    processed++;

    // Send progress update every 100 items
    if (processed % 100 === 0) {
      sendEvent(res, 'progress', {
        processed,
        total: items.length,
        percentage: (processed / items.length * 100).toFixed(1)
      });
    }
  }

  sendEvent(res, 'complete', { processed });
  res.end();
});

Implementation Best Practices

Lessons learned from building production streaming systems.

Error handling

Always wrap JSON parsing in try-catch:

ws.on('message', (data) => {
  try {
    const message = JSON.parse(data);
    handleMessage(message);
  } catch (err) {
    console.error('Invalid JSON:', err);
    ws.send(JSON.stringify({
      type: 'error',
      message: 'Invalid JSON format'
    }));
  }
});

Backpressure handling

When streaming large files, respect backpressure to avoid memory issues. Learn more in our performance optimization guide.

const readable = fs.createReadStream('large.ndjson');
const writable = processStream();

readable.on('data', (chunk) => {
  const canContinue = writable.write(chunk);

  if (!canContinue) {
    // Pause reading until drain event
    readable.pause();
  }
});

writable.on('drain', () => {
  readable.resume();
});

Connection recovery

Implement exponential backoff for reconnections:

class ReconnectingWebSocket {
  constructor(url) {
    this.url = url;
    this.reconnectDelay = 1000;
    this.maxReconnectDelay = 30000;
    this.connect();
  }

  connect() {
    this.ws = new WebSocket(this.url);

    this.ws.onopen = () => {
      console.log('Connected');
      this.reconnectDelay = 1000; // Reset delay on successful connect
    };

    this.ws.onclose = () => {
      console.log('Disconnected, reconnecting...');
      setTimeout(() => this.connect(), this.reconnectDelay);

      // Exponential backoff
      this.reconnectDelay = Math.min(
        this.reconnectDelay * 2,
        this.maxReconnectDelay
      );
    };
  }
}

Security considerations

Always validate and sanitize JSON data, especially in real-time systems. See our security best practices guide.

// Validate message structure
function validateMessage(message) {
  if (!message.type || typeof message.type !== 'string') {
    throw new Error('Invalid message type');
  }

  if (message.type === 'chat' && !message.message) {
    throw new Error('Chat message required');
  }

  // Sanitize HTML in user content
  if (message.message) {
    message.message = sanitizeHtml(message.message);
  }

  return message;
}

Monitoring and debugging

Log events for debugging streaming issues:

function sendEvent(res, event, data) {
  const payload = JSON.stringify(data);

  // Log for debugging
  console.log(`[SSE] ${event}: ${payload.substring(0, 100)}...`);

  res.write(`event: ${event}\n`);
  res.write(`data: ${payload}\n\n`);
}