JSON vs YAML vs TOML: How to Pick the Right Config Format
Compare JSON, YAML, and TOML side by side with syntax examples, parser benchmarks, and a decision table for when each format shines or breaks.
At 2:14 AM on a Tuesday, our deploy pipeline failed. The diff looked clean. The linter passed. Code review had two approvals. But the staging cluster refused to boot, throwing a cryptic error about an unexpected key on line 47 of a Kubernetes manifest. Thirty minutes of caffeine-fueled debugging later, the culprit was a single tab character mixed into a YAML file that otherwise used spaces. The YAML parser read the tab as a syntax error and bailed. No data was lost, but the post-mortem was embarrassing.
That incident pushed our team to actually compare config formats instead of defaulting to whatever the last project used. If you've ever inherited a config file and wondered why it wasn't written in something simpler, this post is the comparison I wish I'd had that night.
The same config in three formats
Before comparing features, it helps to see the exact same data expressed in each format. Here's a small app configuration with a database connection, feature flags, and a list of allowed origins.
JSON
{
  "app": {
    "name": "acme-api",
    "port": 8080,
    "debug": false
  },
  "database": {
    "host": "db.internal",
    "port": 5432,
    "pool_size": 20,
    "ssl": true
  },
  "features": {
    "dark_mode": true,
    "beta_signup": false
  },
  "allowed_origins": [
    "https://acme.com",
    "https://staging.acme.com"
  ]
}
YAML
app:
  name: acme-api
  port: 8080
  debug: false
database:
  host: db.internal
  port: 5432
  pool_size: 20
  ssl: true
features:
  dark_mode: true
  beta_signup: false
allowed_origins:
  - https://acme.com
  - https://staging.acme.com
TOML
# Top-level keys must come before the first [table] header;
# a key written after [features] would belong to that table.
allowed_origins = [
  "https://acme.com",
  "https://staging.acme.com",
]

[app]
name = "acme-api"
port = 8080
debug = false

[database]
host = "db.internal"
port = 5432
pool_size = 20
ssl = true

[features]
dark_mode = true
beta_signup = false
At a glance, all three get the job done. The differences matter when files grow past 100 lines, when humans edit them by hand, and when parsers need to be strict.
Feature comparison table
| Feature | JSON | YAML | TOML |
|---|---|---|---|
| Comments | No | Yes (#) | Yes (#) |
| Multi-line strings | Escaped \n only | Multiple styles (\|, >) | Triple-quoted """ |
| Date/time type | No (string only) | Yes (implicit) | Yes (RFC 3339) |
| Trailing commas | Not allowed | N/A in block style (no commas) | Allowed in arrays |
| Indentation sensitivity | None | Significant (spaces only) | None |
| Anchors/aliases | No | Yes (& / *) | No |
| Spec complexity | ~6 pages | ~80 pages | ~20 pages |
| Deep nesting | Good | Good | Awkward past 3 levels |
YAML's quiet traps
YAML is the most expressive of the three, and that expressiveness introduces real footguns. Two deserve special attention.
The Norway problem
In YAML 1.1 (still the default in many parsers, including PyYAML), unquoted values are typed implicitly. The country code NO for Norway? A YAML 1.1 parser reads it as boolean false. The same coercion turns yes and on into true, and off into false. This has caused real production data corruption in internationalization systems.
# YAML 1.1 behavior (PyYAML, Ruby's Psych)
countries:
  - DK # string "DK"
  - NO # boolean false!
  - SE # string "SE"

# Fix: quote the value
countries:
  - "DK"
  - "NO"
  - "SE"
YAML 1.2 fixed this: its core schema recognizes only true and false as booleans, so NO, yes, on, and off stay strings. But many popular libraries still default to 1.1 behavior. If you're writing YAML, always quote strings that could be mistaken for booleans. JSON and TOML don't have this problem because they require explicit syntax for each type.
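If YAML is generated by a script, the quoting rule can be applied mechanically. The sketch below is an assumption-laden illustration rather than part of any YAML library: needs_quotes and emit_scalar are hypothetical helpers, the word list is the boolean set from the YAML 1.1 type registry, and other implicit types (numbers, null, timestamps) are not covered.

```python
import re

# Boolean spellings from the YAML 1.1 type registry. Individual parsers may
# recognize only a subset (PyYAML, for example, skips the single-letter forms).
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)

def needs_quotes(value: str) -> bool:
    """Return True if a bare scalar could be read as a boolean under YAML 1.1."""
    return YAML11_BOOL.fullmatch(value) is not None

def emit_scalar(value: str) -> str:
    """Quote only the values that are at risk; leave the rest bare."""
    return f'"{value}"' if needs_quotes(value) else value

codes = ["DK", "NO", "SE"]
lines = [f"- {emit_scalar(code)}" for code in codes]
print("\n".join(lines))
```

The same check could run as a lint step over hand-written files, flagging bare values instead of rewriting them.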
Deserialization attacks
YAML supports custom type tags, and some parsers map them to language-specific constructors. PyYAML's !!python/object tag, for example, lets a document instantiate arbitrary Python objects during parsing. In 2013, Ruby on Rails suffered a critical remote code execution vulnerability (CVE-2013-0156) partly because request parameters could reach a YAML deserializer. Python's yaml.load() had the same class of vulnerability when called without an explicit loader; PyYAML 5.1 began warning about that usage, and yaml.safe_load() is the recommended way to parse untrusted input.
The rule is simple: never parse untrusted YAML with a full loader. Always use safe/strict parsing modes. JSON doesn't have this problem by design since it has no type tag system. TOML also avoids it since its spec doesn't allow arbitrary type construction.
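In Python, the safe mode looks roughly like this. The sketch assumes the third-party PyYAML package is installed, and the tagged document is a harmless stand-in for a real payload (it asks the loader to call os.getcwd).

```python
import yaml  # PyYAML, a third-party package (assumed installed)

# A document that asks the loader to call a Python function while parsing.
UNTRUSTED = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(UNTRUSTED)
    rejected = False
except yaml.YAMLError as exc:
    # safe_load has no constructor for python/* tags, so it refuses the input.
    rejected = True
    print(type(exc).__name__)

# Plain data still loads normally.
config = yaml.safe_load("database:\n  host: db.internal\n  port: 5432\n")
print(rejected, config)
```

Catching yaml.YAMLError keeps the handler broad enough to cover both malformed input and rejected tags.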
Multi-line strings compared
How each format handles a multi-line SQL query tells you a lot about their design philosophy.
// JSON - escaped newlines, hard to read
{
"query": "SELECT id, name\nFROM users\nWHERE active = true\nORDER BY created_at DESC"
}
# YAML - literal block, line breaks preserved
query: |
  SELECT id, name
  FROM users
  WHERE active = true
  ORDER BY created_at DESC
# TOML - triple-quoted, clear boundaries
query = """
SELECT id, name
FROM users
WHERE active = true
ORDER BY created_at DESC"""
JSON is painful here. YAML is the most readable, but choosing between |, >, |-, and >- requires understanding chomping behavior (see our YAML multi-line strings guide for the full breakdown). TOML's triple-quote approach is less flexible but harder to get wrong.
Parser performance benchmarks
To get a sense of parse speed, I ran a benchmark on a ~500-line config file (equivalent content in each format) using popular libraries in Python, Node.js, and Go. Each test parsed the file 10,000 times on an M2 MacBook Pro. These numbers are approximate and vary by library version, but the relative order is consistent.
| Language | JSON | YAML | TOML |
|---|---|---|---|
| Python | 0.8 ms/parse | 4.2 ms/parse | 1.9 ms/parse |
| Node.js | 0.3 ms/parse | 2.8 ms/parse | 1.1 ms/parse |
| Go | 0.12 ms/parse | 1.4 ms/parse | 0.6 ms/parse |
JSON wins every time. That's expected: JSON's grammar is tiny, and most runtimes ship a heavily optimized parser in their standard library. YAML is consistently the slowest because its spec has more rules to evaluate (anchors, tags, implicit typing). TOML sits in the middle. For config files loaded once at startup, these differences don't matter. For hot-path parsing (API responses, streaming data), JSON is the only sensible choice of the three.
Where each format actually belongs
After using all three across different projects, here's my honest take on where each one works best.
JSON: machine-first communication
- APIs and data interchange: JSON is the universal format. Every language has a fast parser. Every API speaks it.
- Lock files and generated configs: package-lock.json, tsconfig.json, composer.json. These are read mostly by tools, not humans. No comments needed.
- Databases and message queues: Documents stored in MongoDB, messages sent through Kafka, and values cached in Redis are very often JSON or a close relative of it.
JSON's biggest weakness is the lack of comments. If you find yourself wanting to annotate a JSON config, that's a signal you should probably switch formats. The JSON specification at json.org is intentionally minimal, and that's a feature.
YAML: complex, hierarchical configs
- Kubernetes manifests and Helm charts: The entire k8s ecosystem is built on YAML. Fighting this is pointless.
- CI/CD pipelines: GitHub Actions, GitLab CI, CircleCI all use YAML. Anchors help reduce repetition in large pipeline files.
- Ansible and infrastructure-as-code: When your config is 500+ lines with deep nesting, YAML's concise syntax helps.
YAML's biggest weakness is its implicit type coercion and indentation sensitivity. If your team has junior developers editing YAML by hand, expect bugs. The full YAML 1.2.2 spec is 80+ pages for a reason.
TOML: human-first app configuration
- Rust projects: Cargo.toml made TOML mainstream. The Rust ecosystem standardized on it.
- Python packaging: pyproject.toml replaced setup.cfg and setup.py. PEP 518 and 621 codified this.
- Flat-to-medium configs: Settings files, Hugo site config, Netlify config. TOML shines when nesting stays under 3 levels.
TOML's biggest weakness is deep nesting. Once you need arrays of tables inside tables inside tables, the syntax gets verbose and confusing. The TOML spec at toml.io is clean and readable, which reflects the format's design goal.
Decision table
When a new project needs a config format, I run through these questions:
| Question | Answer |
|---|---|
| Is the file mostly read by machines? | JSON |
| Does the ecosystem mandate a format? | Use that format (k8s = YAML, Cargo = TOML) |
| Do you need comments in the config? | YAML or TOML (not JSON) |
| Is nesting deeper than 3 levels? | YAML or JSON (not TOML) |
| Are non-developers editing the file? | TOML (safest syntax, fewest surprises) |
| Is the file under 100 lines? | Any format works; pick what the team knows |
My concrete recommendation
If I'm starting a greenfield project today and I get to choose, I pick TOML for app configuration. The reasoning: most app configs are flat or one level deep, developers need comments to explain non-obvious settings, and TOML's explicit typing means a string like "NO" has to be quoted and always stays a string, while 8080 is always an integer. You don't get surprised at runtime.
For anything that talks to other services, JSON. For anything the k8s ecosystem touches, YAML (but with a linter like yamllint enforced in CI).
The worst choice is picking a format by habit and never questioning it. If your 2 AM incident came from YAML's whitespace rules or implicit type coercion, maybe your next config file shouldn't be YAML.
Quick reference links
- JSON specification (json.org)
- YAML 1.2.2 specification
- TOML specification (toml.io)
- YAML multi-line strings guide for block scalar details