Migrating from XML to JSON: A Complete Guide
I've guided several companies through XML to JSON migrations, including a major e-commerce platform that had 15 years of XML-based APIs serving thousands of users. The decision to migrate wasn't taken lightly, and the execution required careful planning to avoid disrupting production systems.
This guide shares the strategies, tools, and pitfalls I've learned from these real migrations. Whether you're modernizing an API, replacing legacy systems, or just evaluating whether to migrate, this will help you make informed decisions.
Why Companies Migrate from XML to JSON
When I joined one company, they were still serving XML responses from APIs built in 2008. Every new mobile developer complained about the complexity. Here's why we decided to migrate.
JSON is simpler and more readable
Compare the same data in both formats:
<?xml version="1.0" encoding="UTF-8"?>
<product>
<id>12345</id>
<name>Wireless Mouse</name>
<price currency="USD">29.99</price>
<tags>
<tag>electronics</tag>
<tag>accessories</tag>
</tags>
</product>
{
"id": 12345,
"name": "Wireless Mouse",
"price": {
"amount": 29.99,
"currency": "USD"
},
"tags": ["electronics", "accessories"]
}
JSON is immediately understandable. Developers can parse it visually. XML requires constant tag matching and namespace resolution.
Native JavaScript support
JSON is JavaScript Object Notation. In the browser and Node.js, parsing is built-in and fast:
// JSON parsing - native and simple
const data = JSON.parse(response);
console.log(data.name); // "Wireless Mouse"
// XML parsing - DOMParser in the browser, a third-party library in Node.js
const parser = new DOMParser();
const xmlDoc = parser.parseFromString(response, "text/xml");
const name = xmlDoc.getElementsByTagName("name")[0].textContent;
Smaller payload sizes
JSON is typically 20-30% smaller than equivalent XML. When I migrated one API, average response sizes dropped from 4.2KB to 2.8KB, reducing bandwidth costs by 33%.
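The gap is easy to check for your own data. A minimal Node.js sketch (the sample record is made up; measure real responses before quoting savings):

```javascript
// Serialize the same record both ways and compare byte sizes.
const record = { id: 12345, name: 'Wireless Mouse', price: 29.99 };

const json = JSON.stringify(record);
const xml =
  '<product><id>12345</id><name>Wireless Mouse</name>' +
  '<price>29.99</price></product>';

// JSON carries the same data with less markup overhead.
console.log(`JSON: ${Buffer.byteLength(json)} bytes`);
console.log(`XML:  ${Buffer.byteLength(xml)} bytes`);
```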
Better mobile app experience
Every iOS and Android developer on our team preferred JSON. Native JSON parsing is fast, well-documented, and doesn't require third-party XML libraries.
Modern API ecosystem
GraphQL, REST best practices, OpenAPI specifications - the entire modern API ecosystem is built around JSON. Staying with XML meant isolation from tooling and community support.
Key Structural Differences to Consider
XML and JSON have fundamental differences. Understanding these is critical before starting a migration.
Attributes vs properties
XML has both elements and attributes. JSON only has properties:
<product id="12345" status="active">
<name>Wireless Mouse</name>
</product>
You must decide how to represent attributes in JSON:
// Option 1: Flatten everything
{
"id": 12345,
"status": "active",
"name": "Wireless Mouse"
}
// Option 2: Use @-prefix for attributes (common in converters)
{
"@id": 12345,
"@status": "active",
"name": "Wireless Mouse"
}
// Option 3: Separate metadata object
{
"id": 12345,
"attributes": {
"status": "active"
},
"name": "Wireless Mouse"
}
I recommend Option 1 (flattening) for new APIs. It's the most natural JSON structure, as long as attribute names never collide with child element names.
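If your converter emits the @-prefix convention (Option 2), flattening to Option 1 is a small recursive pass. A sketch, assuming attributes and child elements never share a name:

```javascript
// Flatten "@"-prefixed attribute keys into plain properties, recursing
// through nested objects and arrays. A collision between an attribute and
// a child element of the same name would silently overwrite - guard for
// that in a real pipeline.
function flattenAttributes(node) {
  if (Array.isArray(node)) return node.map(flattenAttributes);
  if (node === null || typeof node !== 'object') return node;
  const flat = {};
  for (const [key, value] of Object.entries(node)) {
    const name = key.startsWith('@') ? key.slice(1) : key;
    flat[name] = flattenAttributes(value);
  }
  return flat;
}
```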
Namespaces
XML namespaces don't have a direct JSON equivalent:
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<m:GetProduct xmlns:m="http://example.com/products">
<m:ProductID>12345</m:ProductID>
</m:GetProduct>
</soap:Body>
</soap:Envelope>
In JSON, you typically drop namespaces or use prefixed property names:
{
"soap_envelope": {
"soap_body": {
"get_product": {
"product_id": 12345
}
}
}
}
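One way to get there mechanically is a recursive key-rewriting pass. This sketch turns the namespace colon into an underscore and snake_cases the rest; the exact mapping is a project convention, not a standard:

```javascript
// Rewrite "soap:Envelope"-style keys to "soap_envelope", recursing through
// the converted tree. CamelCase local names become snake_case.
function renameKeys(node) {
  if (Array.isArray(node)) return node.map(renameKeys);
  if (node === null || typeof node !== 'object') return node;
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    const snake = key
      .replace(':', '_')
      .replace(/([a-z0-9])([A-Z])/g, '$1_$2')
      .toLowerCase();
    out[snake] = renameKeys(value);
  }
  return out;
}
```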
Mixed content
XML allows text mixed with child elements. JSON doesn't:
<description>
This product is <emphasis>amazing</emphasis> and available now.
</description>
You must restructure this in JSON:
{
"description": {
"text": "This product is amazing and available now.",
"emphasis": ["amazing"]
}
}
// Or use markup in string
{
"description_html": "This product is <em>amazing</em> and available now."
}
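For the structured form shown first, a small extraction pass can pull the markup apart. A sketch that handles a single, non-nested <emphasis> tag (real mixed content usually warrants a proper XML parser):

```javascript
// Pull <emphasis> spans out of a mixed-content string, returning the
// plain text plus the list of emphasized phrases.
function splitMixedContent(raw) {
  const emphasis = [];
  const text = raw
    .replace(/<emphasis>([\s\S]*?)<\/emphasis>/g, (_, inner) => {
      emphasis.push(inner);
      return inner; // keep the words, drop the markup
    })
    .trim();
  return { text, emphasis };
}
```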
Order preservation
XML elements have a defined order. JSON objects are unordered (though arrays are ordered). If order matters, use arrays:
<steps>
<step>Preheat oven</step>
<step>Mix ingredients</step>
<step>Bake for 30 minutes</step>
</steps>
{
"steps": [
"Preheat oven",
"Mix ingredients",
"Bake for 30 minutes"
]
}
Type inference
XML represents everything as text. JSON has native types:
<product>
<price>29.99</price>
<in_stock>true</in_stock>
<quantity>150</quantity>
</product>
In JSON, you can use native types without quotes:
{
"price": 29.99,
"in_stock": true,
"quantity": 150
}
This requires careful type conversion during migration to avoid breaking clients.
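A coercion pass over the converter's all-string output handles most of this. The matching rules below are assumptions to tune for your data - for example, IDs with leading zeros must stay strings:

```javascript
// Recursively coerce string values to native JSON types: booleans first,
// then anything that looks like a decimal number (leading zeros excluded).
function coerceTypes(value) {
  if (Array.isArray(value)) return value.map(coerceTypes);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, coerceTypes(v)])
    );
  }
  if (typeof value !== 'string') return value;
  if (value === 'true') return true;
  if (value === 'false') return false;
  if (/^-?\d+(\.\d+)?$/.test(value) && !/^0\d/.test(value)) {
    return Number(value);
  }
  return value;
}
```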
Conversion Strategies and Tooling
When migrating that e-commerce platform, I evaluated dozens of tools and strategies. Here's what actually worked in production.
Automated conversion tools
xml2js (Node.js)
const xml2js = require('xml2js');
const parser = new xml2js.Parser({
explicitArray: false, // Don't wrap single elements in arrays
ignoreAttrs: false, // Preserve attributes
attrkey: '@', // Group attributes under an '@' key
charkey: '_text' // Store text content in _text
});
const xml = `
<product id="12345">
<name>Wireless Mouse</name>
<price>29.99</price>
</product>
`;
parser.parseString(xml, (err, result) => {
console.log(JSON.stringify(result, null, 2));
});
Output:
{
  "product": {
    "@": {
      "id": "12345"
    },
    "name": "Wireless Mouse",
    "price": "29.99"
  }
}
Note that attributes land under the attrkey object and every value arrives as a string - hence the transformation pipelines below.
xmltodict (Python)
import xmltodict
import json
xml_data = """
<product id="12345">
<name>Wireless Mouse</name>
<price>29.99</price>
</product>
"""
data = xmltodict.parse(xml_data)
print(json.dumps(data, indent=2))
Custom transformation pipeline
For complex migrations, automated tools aren't enough. I built custom transformation pipelines with validation:
class XMLToJSONMigration {
constructor() {
this.parser = new xml2js.Parser({ explicitArray: false });
this.transformations = [];
}
addTransformation(fn) {
this.transformations.push(fn);
}
async convert(xmlString) {
// Step 1: Parse XML
let data = await this.parser.parseStringPromise(xmlString);
// Step 2: Apply custom transformations
for (const transform of this.transformations) {
data = transform(data);
}
// Step 3: Validate result
this.validate(data);
return data;
}
validate(data) {
// Custom validation logic
if (!data.product || !data.product.id) {
throw new Error('Invalid product structure');
}
}
}
// Usage
const migration = new XMLToJSONMigration();
// Move the id attribute (xml2js groups attributes under '$' by default) to a plain property
migration.addTransformation(data => {
if (data.product && data.product['$'] && data.product['$'].id) {
data.product.id = parseInt(data.product['$'].id, 10);
delete data.product['$'];
}
return data;
});
// Convert price string to number
migration.addTransformation(data => {
if (data.product && data.product.price) {
data.product.price = parseFloat(data.product.price);
}
return data;
});
const result = await migration.convert(xmlString);
Dual-format strategy for gradual migration
The safest approach I've used: serve both XML and JSON simultaneously during transition.
// Express.js example
app.get('/api/products/:id', async (req, res) => {
const product = await db.getProduct(req.params.id);
// Check Accept header or query parameter
const format = req.query.format ||
(req.accepts('application/json') ? 'json' : 'xml');
if (format === 'json') {
res.json(product);
} else {
const xml = convertToXML(product);
res.type('application/xml').send(xml);
}
});
This allows gradual client migration without breaking existing systems.
Handling XML-Specific Features
These XML features don't map directly to JSON and require special handling.
CDATA sections
XML CDATA allows unescaped text:
<description><![CDATA[
Price: $29.99 <- Special offer!
Available <now>
]]></description>
In JSON, plain strings suffice - only quotes, backslashes, and control characters need escaping:
{
"description": "Price: $29.99 <- Special offer!\nAvailable <now>"
}
Processing instructions
XML processing instructions don't exist in JSON:
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
Store these separately if needed, or drop them entirely for API responses.
Comments
XML comments have no JSON equivalent:
<!-- This is a comment -->
<product>...</product>
JSON has no comment syntax. Use a metadata field if comments are critical:
{
"_comment": "This is a comment",
"product": {...}
}
Empty elements
XML has multiple ways to represent empty (or whitespace-only) elements:
<tag/>
<tag></tag>
<tag> </tag>
In JSON, choose a consistent representation:
{
"tag": null // Option 1: explicit null
// "tag": "" // Option 2: empty string
// Omit the key // Option 3: don't include it
}
I recommend omitting empty values entirely - it trims payload size and spares clients a round of null checks.
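Dropping empty values consistently is easy to automate. A sketch of one convention (skip it if your clients need to distinguish empty from absent):

```javascript
// Recursively remove null, empty-string, and whitespace-only values.
function omitEmpty(node) {
  if (Array.isArray(node)) return node.map(omitEmpty);
  if (node === null || typeof node !== 'object') return node;
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    if (value === null) continue;
    if (typeof value === 'string' && value.trim() === '') continue;
    out[key] = omitEmpty(value);
  }
  return out;
}
```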
Testing and Validation During Migration
Testing is where most migrations fail. When you're handling millions of API requests daily, you can't afford data loss or incorrect transformations.
Snapshot testing
Convert a large sample of XML responses and save them as test fixtures:
const fs = require('fs');
const assert = require('assert');
describe('XML to JSON migration', () => {
it('converts product responses correctly', async () => {
const xmlSamples = fs.readdirSync('./test/xml-samples');
for (const file of xmlSamples) {
const xml = fs.readFileSync(`./test/xml-samples/${file}`, 'utf8');
const json = await convertXMLToJSON(xml);
// Load expected JSON
const expected = JSON.parse(
fs.readFileSync(`./test/json-expected/${file}.json`, 'utf8')
);
assert.deepStrictEqual(json, expected);
}
});
});
Schema validation
Define JSON Schemas to ensure converted data meets requirements:
const Ajv = require('ajv');
const ajv = new Ajv();
const productSchema = {
type: 'object',
required: ['id', 'name', 'price'],
properties: {
id: { type: 'integer' },
name: { type: 'string', minLength: 1 },
price: { type: 'number', minimum: 0 }
}
};
const validate = ajv.compile(productSchema);
// After conversion
const json = await convertXMLToJSON(xml);
const valid = validate(json);
if (!valid) {
console.error(validate.errors);
throw new Error('Schema validation failed');
}
A/B testing in production
Run both XML and JSON converters in parallel, comparing results without exposing JSON to clients yet:
app.get('/api/products/:id', async (req, res) => {
const product = await db.getProduct(req.params.id);
// Convert to both formats
const xml = convertToXML(product);
const json = convertToJSON(product);
// Log differences for analysis
if (process.env.MIGRATION_MODE === 'testing') {
await logConversionComparison(product.id, xml, json);
}
// Still return XML for now
res.type('application/xml').send(xml);
});
Data integrity checks
Validate that no data is lost during conversion:
function validateConversion(original, converted) {
// countFields (not shown) recursively counts leaf values in an object
const checks = {
fieldCount: countFields(original) === countFields(converted),
requiredFieldsPresent: ['id', 'name', 'price'].every(
field => converted[field] !== undefined
),
numericTypesCorrect:
typeof converted.id === 'number' &&
typeof converted.price === 'number'
};
const failed = Object.entries(checks)
.filter(([_, passed]) => !passed)
.map(([check]) => check);
if (failed.length > 0) {
throw new Error(`Validation failed: ${failed.join(', ')}`);
}
return true;
}
Real Migration Case Study: E-Commerce API
Let me walk through a complete migration I led for a company with 15-year-old XML APIs serving 2 million daily requests.
The situation
- 200+ XML API endpoints
- 50+ client applications (web, mobile, partner integrations)
- Complex SOAP APIs with extensive namespaces
- Average response size: 4.2KB XML
- No downtime tolerance - 99.9% SLA
Migration timeline (6 months)
Month 1: Analysis and planning
- Audited all 200 endpoints and their usage
- Identified 20 high-priority endpoints (80% of traffic)
- Analyzed XML structures and edge cases
- Built sample JSON schemas
- Got stakeholder buy-in
Month 2: Tooling and infrastructure
- Built custom conversion pipeline with validation
- Created comprehensive test suite (500+ test cases)
- Set up monitoring and logging
- Configured dual-format serving infrastructure
Month 3: Pilot migration
- Migrated 5 low-risk endpoints
- Deployed both XML and JSON versions
- Migrated internal tools to test JSON endpoints
- Monitored performance and error rates
- Fixed issues discovered during pilot
Month 4-5: Full migration rollout
- Migrated remaining 195 endpoints in batches of 20
- Updated API documentation with JSON examples
- Provided migration guides for client developers
- Offered support office hours for questions
- Maintained both formats during transition
Month 6: Deprecation
- Added deprecation warnings to XML responses
- Sent emails to known API consumers
- Monitored XML usage declining week over week
- When XML usage dropped below 5%, shut down XML endpoints
Results
- Performance: Average response size dropped from 4.2KB to 2.8KB (33% reduction)
- Developer satisfaction: New client integrations completed 40% faster
- Error rate: Remained stable at 0.02% throughout migration
- Downtime: Zero unplanned downtime
- Cost savings: $18,000/year in bandwidth costs
Key lessons learned
- Start with high-traffic endpoints - Maximize impact early
- Test extensively - We caught 50+ edge cases in testing
- Communicate constantly - Weekly updates prevented surprises
- Don't rush deprecation - Some legacy clients took 4 months to migrate
- Monitor everything - Detailed logs saved us multiple times
Common Pitfalls and How to Avoid Them
These are the mistakes I've seen (and made) during migrations.
Pitfall 1: Losing type information
XML converters often make everything a string:
// Bad - everything is a string
{
"price": "29.99",
"quantity": "150",
"in_stock": "true"
}
// Good - proper types
{
"price": 29.99,
"quantity": 150,
"in_stock": true
}
Solution: Add explicit type conversion in your transformation pipeline.
Pitfall 2: Inconsistent null handling
Decide upfront how to handle missing/null values:
// Be consistent across all endpoints
{
"optional_field": null // Include with null? Or...
// Omit entirely?
}
Solution: Document your convention and enforce it with linting and validation.
Pitfall 3: Breaking existing clients
Changing field names or structure breaks existing integrations. Use versioning or maintain compatibility.
Solution: Serve both formats during the transition, and version any breaking changes explicitly.
Pitfall 4: Inadequate testing
Edge cases will surprise you. Test with real production data.
Solution: Create a test corpus from production logs. Test against thousands of real requests.
Pitfall 5: Ignoring performance
JSON parsing is generally faster, but measure to be sure.
Solution: Benchmark before and after, and optimize only where measurements show a need.
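A quick way to get parse-time numbers for your own payloads is a Node.js micro-benchmark like this sketch (iteration count and payload are arbitrary; wrap the same loop around your XML parser for a fair comparison):

```javascript
// Time repeated JSON.parse calls on a representative payload.
const payload = JSON.stringify({
  id: 12345,
  name: 'Wireless Mouse',
  price: 29.99,
  tags: ['electronics', 'accessories']
});

const start = process.hrtime.bigint();
for (let i = 0; i < 100_000; i++) JSON.parse(payload);
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

console.log(`100k parses in ${elapsedMs.toFixed(1)} ms`);
```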
Migration Checklist
Use this checklist for your own migration project:
Planning phase
- Audit all XML endpoints and usage patterns
- Identify high-priority endpoints
- Analyze XML structures and special features (namespaces, CDATA, etc.)
- Define JSON structure conventions
- Get stakeholder approval and timeline
- Communicate plans to API consumers
Development phase
- Choose or build conversion tools
- Create transformation pipeline
- Define JSON schemas for validation
- Build comprehensive test suite
- Set up dual-format serving infrastructure
- Configure monitoring and logging
Testing phase
- Test with real production data samples
- Validate schema compliance
- Check data integrity (no data loss)
- Verify type conversions are correct
- Test error handling and edge cases
- Benchmark performance
Deployment phase
- Deploy both XML and JSON endpoints
- Start with pilot endpoints (low risk, high value)
- Monitor error rates and performance
- Fix issues quickly
- Migrate remaining endpoints in batches
- Update API documentation
Deprecation phase
- Add deprecation warnings to XML responses
- Set sunset date (6-12 months out)
- Notify all known API consumers
- Monitor XML usage declining
- Provide migration support
- Shut down XML endpoints when usage is low