YAML Formatter & Validator
Format, validate, and convert YAML files. Perfect for Kubernetes, Docker Compose, and configuration files.
YAML Quick Reference
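| Syntax | Meaning |
|---|---|
| key: value | Key-value pair |
| - item | List item |
| # comment | Comment |
| key: \| | Multi-line literal |
| key: > | Multi-line folded |
| &anchor | Anchor definition (reused with *anchor) |

What is YAML?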
YAML (YAML Ain't Markup Language) is a human-readable data serialization format commonly used for configuration files and data exchange between languages with different data structures.
YAML is often preferred over JSON for configuration files due to its cleaner syntax, support for comments, and better readability for complex nested structures.
Common Use Cases
- Kubernetes: Deployments, Services, ConfigMaps
- Docker Compose: Multi-container applications
- CI/CD: GitHub Actions, GitLab CI, CircleCI
- Configuration: Application settings, API specs
- Infrastructure: Ansible, CloudFormation
YAML Best Practices
Do's
- ✅ Use spaces for indentation (2 or 4)
- ✅ Be consistent with indentation
- ✅ Use comments to explain complex sections
- ✅ Quote strings with special characters
- ✅ Use anchors to avoid repetition
Don'ts
- ❌ Never use tabs for indentation
- ❌ Avoid deeply nested structures
- ❌ Don't mix indentation styles
- ❌ Avoid very long lines
- ❌ Don't forget to escape special chars
YAML vs JSON vs XML
| Feature | YAML | JSON | XML |
|---|---|---|---|
| Readability | Excellent | Good | Fair |
| Comments | ✅ Supported | ❌ Not supported | ✅ Supported |
| Multi-line strings | ✅ Native | ⚠️ Escaped | ✅ CDATA |
| Schema validation | ✅ Available | ✅ JSON Schema | ✅ XSD |
| Parse speed | Slower | Fastest | Slow |
How YAML Works
YAML Syntax Fundamentals
YAML (YAML Ain't Markup Language) uses indentation-based structure similar to Python. Key-value pairs are written as key: value, with colons separating keys from values. Indentation (spaces, never tabs) indicates nesting levels—child elements are indented under parents. This whitespace-significant syntax makes YAML highly readable but sensitive to formatting errors. Two or four spaces per indentation level are standard; consistency is critical.
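To make the nesting rule concrete, here is a deliberately tiny Python parser for one YAML subset: nested key: value mappings indented with spaces. It is a sketch for illustration only, not a real YAML parser. It does not handle lists, quoting, comments, or typing. The function name parse_block_mapping is hypothetical.

```python
def parse_block_mapping(text: str) -> dict:
    """Parse a tiny YAML subset: nested `key: value` mappings indented with spaces.

    Every scalar stays a string, and `key:` with nothing nested under it becomes None.
    """
    root: dict = {}
    stack = [(0, root)]  # (column of this mapping's keys, mapping); innermost last
    pending = None       # (parent column, parent mapping, key) awaiting nested keys
    for line_no, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # blank lines carry no structure
        body = raw.lstrip(" ")
        if body.startswith("\t"):
            raise ValueError(f"line {line_no}: tab used for indentation")
        indent = len(raw) - len(body)
        if pending is not None:
            parent_indent, parent, parent_key = pending
            pending = None
            if indent > parent_indent:
                # The first nested key fixes the column for the whole child mapping.
                child: dict = {}
                parent[parent_key] = child
                stack.append((indent, child))
        # A shallower line closes every deeper mapping...
        while indent < stack[-1][0]:
            stack.pop()
        # ...and must then line up exactly with an open mapping's keys.
        if indent != stack[-1][0]:
            raise ValueError(f"line {line_no}: indentation matches no enclosing mapping")
        key, sep, value = body.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"line {line_no}: expected 'key: value'")
        mapping = stack[-1][1]
        if value.strip():
            mapping[key.strip()] = value.strip()
        else:
            mapping[key.strip()] = None
            pending = (indent, mapping, key.strip())
    return root


doc = "server:\n  host: localhost\n  tls:\n    enabled: yes\n  port: 8080\nname: demo\n"
print(parse_block_mapping(doc))
```

The stack mirrors how real parsers handle indentation. A deeper line opens a child. A shallower line closes mappings until the columns line up. A column that matches no open mapping is an error.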
Sequences (arrays) use dash-space syntax: - item creates list entries. Lists can be nested by indenting items under a parent key. Inline list syntax [item1, item2, item3] is also supported but less common in configuration files. Objects (mappings) use key: value pairs, and can be nested arbitrarily. The combination of lists and mappings creates hierarchical data structures matching JSON's expressiveness.
Scalars (simple values) include strings, numbers, booleans, and null. Strings don't require quotes unless they contain special characters, start with special YAML symbols, or could be confused with other types. Numbers are typed automatically (integers, floats). Booleans are true/false; YAML 1.1 parsers also accept yes/no and on/off. Null values are ~, null, or simply omitting the value. This implicit typing makes YAML concise but sometimes ambiguous.
Multi-line strings have two forms: literal (|) preserves newlines and formatting exactly as written, useful for scripts or formatted text. Folded (>) joins lines into a single paragraph, replacing newlines with spaces except for blank lines which become paragraph breaks. These operators make YAML superior to JSON for embedded text content, eliminating the need for \n escapes.
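The difference between the two block styles comes down to how line breaks are treated. The sketch below computes the string a parser would produce for each style under the default "clip" chomping, which keeps exactly one final newline. It makes two simplifying assumptions: the lines have already had their block indentation removed, and the folded input has no leading blank lines or more-indented lines. The function names are hypothetical.

```python
def literal_block(lines: list[str]) -> str:
    """Value of a `|` block scalar with clip chomping: line breaks kept as written."""
    content = list(lines)
    while content and content[-1] == "":
        content.pop()  # clip chomping drops trailing blank lines...
    return "\n".join(content) + "\n" if content else ""  # ...but keeps one final newline


def folded_block(lines: list[str]) -> str:
    """Value of a `>` block scalar with clip chomping (more-indented lines ignored)."""
    parts: list[str] = []
    blank_run = 0
    for line in lines:
        if line == "":
            blank_run += 1
            continue
        if parts:
            # Adjacent lines fold into one space; each blank line becomes one newline.
            parts.append("\n" * blank_run if blank_run else " ")
        parts.append(line)
        blank_run = 0
    return "".join(parts) + "\n" if parts else ""


body = ["first line", "second line", "", "new paragraph", ""]
print(repr(literal_block(body)))
print(repr(folded_block(body)))
```

The body list corresponds to a block scalar whose content is "first line", "second line", a blank line, "new paragraph", and a trailing blank line.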
Comments use # and extend to end-of-line. Unlike JSON, YAML allows comments anywhere, making it excellent for configuration files that need documentation. Comments can explain settings, provide examples, or temporarily disable options. This feature alone makes YAML preferable to JSON for human-edited configs despite YAML's parsing complexity and security concerns.
Parsing and Type Inference
YAML parsers process input in stages: lexical analysis (tokenizing), syntactic analysis (building a parse tree or event stream), and semantic analysis (type inference, anchor resolution). The parser must track indentation levels carefully, because each indentation change affects structure. JSON marks structure with explicit delimiters ({} and []). YAML relies on whitespace instead, which makes parsing more complex and error-prone.
Type inference determines data types from string representations. The plain scalar 42 becomes an integer, 3.14 becomes a float, and true becomes a boolean. However, this causes subtle issues. Under YAML 1.1 rules, the country code NO (Norway) becomes boolean false. ZIP codes like 02134 lose their leading zero when parsed as numbers, and YAML 1.1 parsers even read them as octal. Dates in ISO 8601 format (2024-01-15) are automatically parsed as date objects in some parsers.
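The pitfalls above can be reproduced with a small resolver. This is a simplified Python sketch of YAML 1.1 implicit typing for unquoted scalars. It omits binary, sexagesimal numbers, .inf/.nan, full timestamps, and the single-letter y/n booleans, and real parsers differ in details. The name resolve_11 is hypothetical.

```python
import datetime
import re

# Simplified YAML 1.1 implicit typing for plain (unquoted) scalars.
BOOL_11 = {
    spelling: value
    for word, value in {"yes": True, "no": False, "true": True,
                        "false": False, "on": True, "off": False}.items()
    for spelling in (word, word.capitalize(), word.upper())
}
NULL_11 = {"", "~", "null", "Null", "NULL"}
INT_HEX = re.compile(r"[-+]?0x[0-9a-fA-F_]+")
INT_OCT = re.compile(r"[-+]?0[0-7_]+")  # a leading zero means octal in YAML 1.1
INT_DEC = re.compile(r"[-+]?(?:0|[1-9][0-9_]*)")
FLOAT = re.compile(r"[-+]?[0-9][0-9_]*\.[0-9_]*(?:[eE][-+][0-9]+)?")
DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def resolve_11(plain: str):
    """Return the value a YAML 1.1-style parser would build for an unquoted scalar."""
    if plain in NULL_11:
        return None
    if plain in BOOL_11:
        return BOOL_11[plain]
    if DATE.fullmatch(plain):
        return datetime.date.fromisoformat(plain)
    digits = plain.replace("_", "")  # YAML 1.1 allows '_' as a digit separator
    if INT_HEX.fullmatch(plain):
        return int(digits, 16)
    if INT_OCT.fullmatch(plain):
        return int(digits, 8)
    if INT_DEC.fullmatch(plain):
        return int(digits)
    if FLOAT.fullmatch(plain):
        return float(digits)
    return plain  # anything else stays a string


for token in ["NO", "02134", "2024-01-15", "1.20", "42", "Oslo"]:
    print(f"{token!r} -> {resolve_11(token)!r}")
```

Quoting any of these values skips this resolution step entirely, because quoted scalars are always strings.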
Explicit typing uses tags to override inference. !!str forces string type: !!str 123 remains the string "123" rather than becoming integer 123. Tags like !!int, !!float, !!bool, !!null, !!binary, !!timestamp specify types explicitly. Custom tags enable application-specific types. While powerful, tags reduce readability and are rarely needed for standard configuration files.
Anchors and aliases enable references to avoid repetition. &name creates an anchor, and *name references it. For example, defaults: &defaults anchors a mapping (containing, say, name: app). A later prod mapping can then include <<: *defaults to inherit those keys. The merge key << combines mappings, which is useful for sharing common configuration across environments. This DRY (Don't Repeat Yourself) feature is powerful but can make configs harder to understand.
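The merge semantics can be sketched in Python on data that has already been loaded. An alias is modeled as a reference to the same dict object. The rules applied here follow the YAML 1.1 merge key type: keys written explicitly in the mapping win, and when << holds a list, earlier mappings override later ones. The merge key comes from the YAML 1.1 type repository and is not part of the YAML 1.2 core schema, although many parsers still support it. The function name resolve_merge is hypothetical.

```python
def resolve_merge(mapping: dict) -> dict:
    """Apply a '<<' merge key to an already-loaded mapping."""
    sources = mapping.get("<<", [])
    if isinstance(sources, dict):
        sources = [sources]
    result: dict = {}
    for source in reversed(sources):  # later sources first, so earlier ones overwrite them
        result.update(source)
    # Keys written explicitly in the mapping always win over merged ones.
    result.update({key: value for key, value in mapping.items() if key != "<<"})
    return result


# Python equivalent of:
#   defaults: &defaults
#     name: app
#     replicas: 1
#   prod:
#     <<: *defaults
#     replicas: 3
defaults = {"name": "app", "replicas": 1}  # the anchored mapping
document = {"defaults": defaults, "prod": {"<<": defaults, "replicas": 3}}  # alias = same object
prod = resolve_merge(document["prod"])
print(prod)
```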
Safe loading versus full loading affects security and features. Safe loading (recommended) disables dangerous features like arbitrary code execution through Python object serialization (!!python/object). Full loading enables all YAML features but can execute code if parsing untrusted input—a major security vulnerability. Always use safe loaders for user-provided YAML; full loading is only safe with trusted sources you control.
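Conceptually, a safe loader is an allowlist. It knows how to build plain data types for the standard tags and refuses everything else, instead of importing whatever class or function a tag names. The Python sketch below illustrates only that idea; the names SAFE_CONSTRUCTORS, UnsafeTagError, and safe_construct are hypothetical. Real safe loaders such as PyYAML's yaml.safe_load() behave similarly: a tag like !!python/object raises an error rather than being constructed.

```python
TAG_PREFIX = "tag:yaml.org,2002:"

# Constructors for plain data types only; this dict is the whole allowlist.
SAFE_CONSTRUCTORS = {
    TAG_PREFIX + "str": str,
    TAG_PREFIX + "int": int,
    TAG_PREFIX + "float": float,
    TAG_PREFIX + "bool": lambda text: {"true": True, "false": False}[text.lower()],
    TAG_PREFIX + "null": lambda text: None,
}


class UnsafeTagError(ValueError):
    """Raised when a document requests a type outside the safe allowlist."""


def safe_construct(tag: str, text: str):
    """Build a value for a resolved tag, refusing anything not explicitly allowed."""
    constructor = SAFE_CONSTRUCTORS.get(tag)
    if constructor is None:
        # e.g. tag:yaml.org,2002:python/object/apply:os.system. An unsafe loader would
        # import and call that function; this sketch never looks it up at all.
        raise UnsafeTagError(f"refusing to construct a value for tag {tag!r}")
    return constructor(text)


print(safe_construct(TAG_PREFIX + "int", "8080"))
try:
    safe_construct(TAG_PREFIX + "python/object/apply:os.system", "echo pwned")
except UnsafeTagError as err:
    print(err)
```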
Common Use Cases in DevOps
Kubernetes manifests define cluster resources in YAML. Deployments, Services, ConfigMaps, and Secrets use YAML configuration. The hierarchical structure maps well to Kubernetes' resource model: apiVersion, kind, metadata, and spec fields organize configuration logically. Multi-document YAML files (documents separated by ---) allow defining multiple resources in one file, simplifying deployment of related components together.
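The document separator is simple enough to sketch. The Python code below splits a multi-document stream on lines that begin with --- and lists each document's top-level kind. The function names are hypothetical. The sketch ignores ... end markers and any content written on the same line as ---. For real manifests, use a YAML library's multi-document loader instead.

```python
import re

# '---' at the start of a line, followed by whitespace or end of line, starts a document.
DOC_START = re.compile(r"---(?:[ \t].*)?")


def _has_content(doc: str) -> bool:
    """True if the chunk holds something other than blank lines and comments."""
    return any(line.strip() and not line.lstrip().startswith("#") for line in doc.splitlines())


def split_documents(stream: str) -> list[str]:
    """Split a multi-document stream on '---' marker lines, dropping empty chunks."""
    documents, current = [], []
    for line in stream.splitlines():
        if DOC_START.fullmatch(line):
            documents.append("\n".join(current))
            current = []
        else:
            current.append(line)
    documents.append("\n".join(current))
    return [doc for doc in documents if _has_content(doc)]


def kinds(stream: str) -> list[str]:
    """Top-level `kind:` of each document, as found in Kubernetes manifests."""
    found = []
    for doc in split_documents(stream):
        match = re.search(r"^kind:[ \t]*(\S+)", doc, re.MULTILINE)
        found.append(match.group(1) if match else "")
    return found


manifest = """apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
---
apiVersion: v1
kind: Service
metadata:
  name: web
"""
print(kinds(manifest))
```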
Docker Compose uses YAML to define multi-container applications. Services, networks, and volumes are specified declaratively. Older files declare a top-level version (version: '3.8') to select the available features; the current Compose Specification treats this field as obsolete. Environment variables, port mappings, volume mounts, and dependencies between services create complete application stacks. YAML's readability makes docker-compose.yml files accessible to developers unfamiliar with Docker's CLI complexity.
CI/CD pipelines (GitHub Actions, GitLab CI, CircleCI) use YAML to define workflows. Jobs, steps, triggers, and dependencies create automation pipelines. YAML's structure naturally represents pipeline stages and parallel execution. Comments document workflow purposes, and anchors/aliases reduce duplication across similar jobs. The format's human-friendliness encourages teams to maintain CI/CD configs alongside code.
Infrastructure as Code tools such as Ansible and CloudFormation use YAML for resource definitions (Terraform, by contrast, uses its own HCL syntax). Ansible playbooks describe server configuration and orchestration. CloudFormation templates define AWS resources. YAML's readability makes infrastructure definitions accessible to operators without programming backgrounds. The self-documenting nature (with comments) creates executable documentation of infrastructure state.
Application configuration files benefit from YAML's features. Rails, Spring Boot, and many frameworks use YAML for settings. Environment-specific configs (development, staging, production) share common values via anchors. Sensitive values can be externalized through environment variable interpolation. The framework provides this on top of YAML, which has no interpolation of its own. Multi-line strings simplify embedded SQL or scripts. Comments explain non-obvious settings, creating maintainable configuration.
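Because YAML has no interpolation of its own, placeholders are resolved by the application after parsing. The Python sketch below walks an already-loaded config and substitutes values. Its ${NAME} / ${NAME:default} syntax is hypothetical, loosely modeled on Spring-style placeholders, and the environment is passed in as a mapping so it can be tested; real code might pass os.environ.

```python
import re
from typing import Mapping

# Hypothetical placeholder syntax: ${NAME} or ${NAME:default}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def interpolate(value, env: Mapping[str, str]):
    """Recursively replace placeholders in the strings of a loaded config."""
    if isinstance(value, dict):
        return {key: interpolate(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [interpolate(item, env) for item in value]
    if not isinstance(value, str):
        return value  # numbers, booleans, and None pass through unchanged

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name} is not set")

    return PLACEHOLDER.sub(substitute, value)


config = {"database": {"host": "${DB_HOST:localhost}", "password": "${DB_PASSWORD}"}, "port": 5432}
print(interpolate(config, {"DB_PASSWORD": "s3cret"}))
```

Keeping substitution after parsing means the YAML file itself never contains the secret, and a missing variable fails loudly instead of silently becoming an empty string.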
Common Pitfalls and Best Practices
Indentation errors are the most common YAML mistake. YAML forbids tabs for indentation, so mixing spaces and tabs causes parsing failures. Inconsistent indentation among sibling keys (one at 2 spaces, the next at 4) either changes the structure or fails to parse. Use an editor with YAML syntax highlighting and validation. Configure your editor to show whitespace characters and convert tabs to spaces. Linters catch indentation problems before runtime failures.
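A linter's indentation check can be approximated in a few lines. This Python sketch reports tabs in leading whitespace, which is a YAML error. It also reports indents that are not a multiple of a chosen step, which is only a style rule of the kind linters such as yamllint enforce. It is simplistic: for example, it would also flag unusual indentation inside block scalars. The function name check_indentation is hypothetical.

```python
def check_indentation(text: str, step: int = 2) -> list[str]:
    """Report tab indentation (a YAML error) and indents off a chosen step (style)."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank lines and comments are not checked
        leading = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in leading:
            problems.append(f"line {line_no}: tab in indentation")
        elif len(leading) % step:
            problems.append(f"line {line_no}: indent of {len(leading)} is not a multiple of {step}")
    return problems


sample = "app:\n  name: web\n\tport: 80\n   debug: true\n"
for problem in check_indentation(sample):
    print(problem)
```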
Quoting strings prevents type inference issues. Under YAML 1.1 rules, the value no becomes boolean false unless quoted as "no". Codes like NO (Norway) or ON (Ontario) trigger the same boolean conversion. Numeric-looking strings become numbers: the version number 1.20 becomes the float 1.2, losing its trailing zero. Phone numbers with leading zeros (012345) lose those zeros. When in doubt, quote strings explicitly; this is safer than relying on implicit typing.
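A review step can flag unquoted values that a parser is likely to retype before they cause trouble. The Python sketch below scans simple key: value lines and reports plain values that look like YAML 1.1 booleans, leading-zero numbers, floats, or dates. The patterns are simplified, and the function name find_ambiguous_values is hypothetical. Quoted values and ordinary integers are deliberately left alone.

```python
import re

# Plain scalars that a parser may turn into non-strings.
RISKY = re.compile(
    r"(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF)"  # booleans
    r"|[-+]?0[0-9_]+"                # leading zero: octal in 1.1, loses its zeros in 1.2
    r"|[-+]?[0-9][0-9_]*\.[0-9_]*"   # floats such as 1.20
    r"|[0-9]{4}-[0-9]{2}-[0-9]{2}"   # dates
)
# `key: value  # optional comment`, where the value does not start with a quote.
LINE = re.compile(r"^\s*([\w.-]+):[ \t]+([^'\"#\s][^#]*?)[ \t]*(?:#.*)?$")


def find_ambiguous_values(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, key, value) for unquoted values a parser may retype."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        match = LINE.match(line)
        if match and RISKY.fullmatch(match.group(2)):
            findings.append((line_no, match.group(1), match.group(2)))
    return findings


config = 'country: NO\nzip: 02134\nversion: 1.20  # release\nname: Oslo\nquoted: "NO"\nreplicas: 3\n'
print(find_ambiguous_values(config))
```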
Security considerations demand safe parsing of untrusted input. Never load user-provided data with an unsafe loader (in Python, yaml.unsafe_load() or yaml.load() with an unsafe Loader), because it can execute arbitrary code. Use safe loaders (yaml.safe_load()) that disable dangerous features. Validate loaded data against schemas. To prevent denial-of-service attacks, limit file size and nesting depth, and cap alias expansion. This guards against deeply nested structures and against billion laughs attacks, where nested aliases expand exponentially so that even a small file produces enormous data.
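The billion laughs arithmetic explains why a file-size limit alone is not enough. Suppose anchor a1 is a list of nine aliases to a scalar a0, a2 is nine aliases to a1, and so on. Then nine levels expand to 9^9 = 387,420,489 scalars, although the file stays tiny. The Python sketch below models such an anchor graph as a hypothetical dict. It computes the expanded size without building the data, memoizing per anchor, and rejects anything over a budget. It assumes the graph contains no recursive aliases, and the function names and the 10,000-node default limit are hypothetical.

```python
from functools import lru_cache


def billion_laughs(levels: int, fanout: int = 9) -> dict:
    """Hypothetical anchor graph: a0 is a scalar, each a<k> is `fanout` aliases to a<k-1>."""
    anchors = {"a0": None}  # None marks a scalar leaf
    for level in range(1, levels + 1):
        anchors[f"a{level}"] = [f"a{level - 1}"] * fanout
    return anchors


def expanded_size(anchors: dict, root: str) -> int:
    """Count scalar leaves after fully expanding aliases, without building the data."""
    @lru_cache(maxsize=None)
    def size(name: str) -> int:
        children = anchors[name]
        if children is None:
            return 1
        return sum(size(child) for child in children)  # each alias re-expands its target
    return size(root)


def check_alias_budget(anchors: dict, root: str, limit: int = 10_000) -> int:
    """Reject documents whose alias expansion would exceed `limit` nodes."""
    total = expanded_size(anchors, root)
    if total > limit:
        raise ValueError(f"alias expansion of {total} nodes exceeds limit of {limit}")
    return total


print(expanded_size(billion_laughs(9), "a9"))
```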
Validation with schemas catches errors early. Tools like yamllint verify syntax. Application-specific validation checks required fields, value ranges, and relationships. JSON Schema can validate YAML after conversion to JSON. Kubernetes uses OpenAPI schemas to validate manifests. Schema validation in CI/CD prevents deploying broken configurations. Document expected schemas to guide users creating configs.
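In practice, a JSON Schema library would perform this validation. A hand-rolled Python check still shows what such a check does with loaded YAML data: it reports missing required fields, wrong types, and unknown keys. The schema format (field name mapped to expected type and a required flag) and the function name validate are hypothetical.

```python
# Hypothetical schema format: field name -> (expected type, required?)
SCHEMA = {
    "name": (str, True),
    "replicas": (int, True),
    "image": (str, True),
    "debug": (bool, False),
}


def validate(config: dict, schema: dict) -> list[str]:
    """Return human-readable errors for a loaded config; an empty list means valid."""
    errors = []
    for field, (expected, required) in schema.items():
        if field not in config:
            if required:
                errors.append(f"missing required field '{field}'")
            continue
        value = config[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"'{field}' should be {expected.__name__}, got {type(value).__name__}")
    for field in sorted(config.keys() - schema.keys()):
        errors.append(f"unknown field '{field}'")
    return errors


print(validate({"name": "web", "replicas": "3", "debug": "no", "port": 80}, SCHEMA))
```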
Version control best practices improve collaboration. Keep YAML files in git for history and code review. Use meaningful commit messages when changing configs. Format YAML consistently (linters enforce this). Split large files into logical components. Document breaking changes in migration guides. Tag releases so rollbacks reference specific config versions. Treat infrastructure configs with the same rigor as application code.
Converting Between YAML and JSON
YAML to JSON conversion is straightforward as long as the document uses only types JSON can represent: parse the YAML into an object structure, then serialize it as JSON. Comments are lost (JSON doesn't support them). Anchors and aliases are resolved into duplicated data. Multi-line strings become escaped strings with \n. Custom tags and types such as dates may not convert properly; stick to standard types for portability. The output is valid JSON but loses YAML's readability features.
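The losses described above show up as soon as loaded data reaches a JSON serializer. The Python sketch starts from data shaped the way a YAML 1.1 parser might return it for the document in the comment. That shape is an assumption. The sketch then serializes the data with the standard json module. Without default=str, json.dumps would raise a TypeError on the date object.

```python
import datetime
import json

# Data as a YAML 1.1 parser might return it for:
#   base: &base {image: nginx}
#   web: *base
#   script: |
#     echo one
#     echo two
#   released: 2024-01-15
base = {"image": "nginx"}
loaded = {
    "base": base,
    "web": base,  # an alias is the same object, not a copy
    "script": "echo one\necho two\n",
    "released": datetime.date(2024, 1, 15),
}


def to_json(data) -> str:
    # JSON has no date type; default=str turns unsupported objects into strings.
    return json.dumps(data, default=str, indent=2)


output = to_json(loaded)
print(output)
```

In the output, the shared mapping appears twice as independent copies, the literal block becomes one string with \n escapes, and the date becomes a plain string.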
JSON to YAML conversion creates more readable configs. JSON's braces and quotes are removed, indentation indicates structure. The result is valid YAML and often more compact. However, automatic conversion produces basic YAML without comments, anchors, or multi-line strings. Manual editing adds these features post-conversion. Use conversion as a starting point, then enhance with YAML-specific features.
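A basic converter of this kind fits in a few dozen lines. The Python sketch below emits block-style YAML from JSON text. It leaves strings unquoted only when they look like simple identifiers and cannot be read as booleans or null. Everything else is quoted by reusing JSON's string syntax, which is valid as a YAML 1.2 double-quoted scalar. The quoting rule and the function names are assumptions for illustration, and, as noted above, the output carries no comments, anchors, or block scalars.

```python
import json
import re

PLAIN = re.compile(r"[A-Za-z_][A-Za-z0-9_./-]*")
RESERVED = {"true", "false", "yes", "no", "on", "off", "null", "y", "n"}


def scalar(value) -> str:
    """Render a JSON scalar (or empty container), quoting strings a parser might retype."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float, dict, list)):
        return json.dumps(value)  # containers only reach here when empty: {} or []
    if PLAIN.fullmatch(value) and value.lower() not in RESERVED:
        return value
    return json.dumps(value)  # JSON string escapes are valid in YAML double quotes


def emit(value, depth: int = 0) -> list[str]:
    """Return block-style YAML lines, indenting two spaces per nesting level."""
    pad = "  " * depth
    lines = []
    if isinstance(value, dict) and value:
        for key, item in value.items():
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}{scalar(key)}:")
                lines.extend(emit(item, depth + 1))
            else:
                lines.append(f"{pad}{scalar(key)}: {scalar(item)}")
    elif isinstance(value, list) and value:
        for item in value:
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}-")  # nested collection continues on the next lines
                lines.extend(emit(item, depth + 1))
            else:
                lines.append(f"{pad}- {scalar(item)}")
    else:
        lines.append(pad + scalar(value))
    return lines


def json_to_yaml(text: str) -> str:
    return "\n".join(emit(json.loads(text))) + "\n"


src = '{"name": "app", "replicas": 3, "country": "NO", "ports": [80, 443], "env": {"DEBUG": false}}'
print(json_to_yaml(src))
```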
Round-trip conversion rarely produces identical output. Converting YAML to JSON and back loses comments and anchors, and may change scalar representations (quoted vs unquoted strings). Different parsers may format output differently. For version control, establish canonical formatting (for example with Prettier, which supports YAML) so round-trips don't create spurious diffs.
Tools for conversion include command-line utilities (yq, jq), programming libraries (PyYAML, js-yaml), and online converters. The yq tool processes YAML like jq processes JSON, enabling transformations beyond simple conversion. Libraries in every major language support YAML, though implementations vary in features and safety. Choose tools actively maintained and security-conscious, especially for production use.
YAML 1.2 and Compatibility
YAML 1.2 (2009) is the current specification, but many tools still use 1.1 (2005). Key differences: 1.2 replaces leading-zero octal with an explicit 0o prefix. Under the 1.2 core schema, 0123 is the decimal integer 123, while 1.1 reads it as octal (decimal 83). YAML 1.2 also restricts booleans to true/false (1.1 also accepts yes/no/on/off) and makes YAML a superset of JSON. These incompatibilities cause subtle bugs when mixing tools that target different versions.
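The version differences are easiest to see by resolving the same unquoted tokens under both rule sets. The two Python functions below are simplified sketches: YAML 1.1 typing and YAML 1.2 core-schema typing, limited to null, booleans, and integers. Their names are hypothetical.

```python
import re

NULLS = {"", "~", "null", "Null", "NULL"}


def resolve_yaml11(plain: str):
    """Simplified YAML 1.1 typing of a plain scalar (null, booleans, ints incl. octal)."""
    if plain in NULLS:
        return None
    words = {"yes": True, "no": False, "on": True, "off": False, "true": True, "false": False}
    for word, value in words.items():
        if plain in (word, word.capitalize(), word.upper()):
            return value
    if re.fullmatch(r"[-+]?0[0-7]+", plain):
        return int(plain, 8)  # a leading zero means octal in 1.1
    if re.fullmatch(r"[-+]?(?:0|[1-9][0-9]*)", plain):
        return int(plain)
    return plain


def resolve_yaml12_core(plain: str):
    """Simplified YAML 1.2 core-schema typing of a plain scalar."""
    if plain in NULLS:
        return None
    if plain in {"true", "True", "TRUE"}:
        return True
    if plain in {"false", "False", "FALSE"}:
        return False
    if re.fullmatch(r"0o[0-7]+", plain):
        return int(plain[2:], 8)  # octal needs an explicit 0o prefix in 1.2
    if re.fullmatch(r"[-+]?[0-9]+", plain):
        return int(plain)  # leading zeros are just decimal digits
    return plain


for token in ["yes", "off", "0123", "0o17", "true"]:
    print(f"{token!r}: 1.1 -> {resolve_yaml11(token)!r}, 1.2 -> {resolve_yaml12_core(token)!r}")
```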
Parser implementation quality varies widely. PyYAML (Python) implements YAML 1.1, while ruamel.yaml supports 1.2. JavaScript's js-yaml supports 1.2. Go's gopkg.in/yaml.v3 implements most of 1.2 but keeps some 1.1 behavior for backward compatibility. Some parsers have bugs with edge cases (deeply nested structures, unusual quoting). Test your YAML with the actual parser your application uses, not just online validators that might use different parsers.
Strictness settings affect what's accepted. Some parsers allow duplicate keys (last wins), others error. Some permit tabs despite the spec forbidding them. Strict mode enables maximum compliance checking, catching issues like duplicate keys, invalid indentation, or deprecated features. Use strict parsing during development to catch problems early, before they cause production failures.
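The two duplicate-key policies can be shown with Python's standard json module, since JSON-style flow mappings are also valid YAML 1.2 (whose specification requires mapping keys to be unique). By default, json.loads keeps the last value. Passing an object_pairs_hook that rejects repeats turns this into a strict error. The hook name reject_duplicates is hypothetical; YAML libraries expose their own strictness options.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that fails on duplicate keys instead of keeping the last one."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key {key!r}")
        seen[key] = value
    return seen


doc = '{"replicas": 2, "image": "nginx", "replicas": 5}'
lenient = json.loads(doc)  # last one wins: replicas ends up as 5
print(lenient)
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(f"strict mode: {err}")
```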
Future of YAML faces criticism for complexity and security issues. JSON5 and TOML offer alternatives with similar readability but simpler parsing and better safety. However, YAML's ecosystem and momentum (Kubernetes, Docker, CI/CD platforms) ensure continued relevance. Recommendations: use YAML for established ecosystems, consider alternatives for new projects, always parse safely, validate inputs, and keep YAML simple (avoid advanced features like anchors in public-facing configs).
FAQ
Why does my YAML file fail to parse?
Common issues: mixing tabs and spaces (YAML forbids tabs), inconsistent indentation, missing colons after keys, unquoted strings that look like booleans (no, yes, on, off) or numbers with special formats. Use a YAML linter to catch syntax errors. Enable "show whitespace" in your editor to see tabs vs spaces. Validate with the same parser your application uses—different parsers have subtle differences.
Should I use YAML or JSON for configuration?
Use YAML for human-edited configurations where readability and comments matter (DevOps tools, application settings). Use JSON for machine-generated configs, API data exchange, or when parsing performance is critical. YAML is more error-prone (indentation sensitivity, type inference ambiguity) but more maintainable for complex configs. JSON is faster, simpler, and safer but verbose and lacks comments.
How do I prevent leading zeros from being removed?
Quote strings that should preserve leading zeros: zip_code: "02134" instead of zip_code: 02134. Unquoted, 02134 is read as the octal number 1116 under YAML 1.1 or as the integer 2134 under YAML 1.2, losing the leading zero either way. The same applies to phone numbers, product codes, and version numbers. When in doubt, quote it; explicit string typing prevents type inference issues. Use schema validation to enforce string types for fields prone to this problem.
Is YAML safe for parsing untrusted input?
Only with safe loading! Unsafe YAML loading (such as Python's yaml.unsafe_load(), or yaml.load() with an unsafe Loader) can execute arbitrary code through object serialization, enabling remote code execution attacks. Always use safe loaders: yaml.safe_load() in Python, or load() in js-yaml 4 and later (older 3.x releases used safeLoad()). Validate and sanitize loaded data. Limit file size to prevent DoS attacks. For untrusted input, consider JSON instead; it's simpler and safer.