JSON (JavaScript Object Notation) is a standardized format for representing structured data. Although JSON grew out of the JavaScript programming language, it’s now a ubiquitous method of data exchange between systems. Most modern-day APIs accept JSON requests and issue JSON responses, so it’s useful to have a good working knowledge of the format and its features.
In this article, we’ll explain what JSON is, how it expresses different data types, and the ways you can produce and consume it in popular programming languages. We’ll also cover some of JSON’s limitations and the alternatives that have emerged.
JSON Basics
JSON was originally devised by Douglas Crockford as a stateless format for communicating data between browsers and servers. Back in the early 2000s, websites were beginning to asynchronously fetch extra data from their servers after the initial page load. As a text-based format derived from JavaScript, JSON made it simpler to fetch and consume data within these applications. The specification was eventually standardized as ECMA-404 in 2013.
JSON is always transmitted as a string. These strings can be decoded into a range of basic data types, including numbers, booleans, arrays, and objects. This means object hierarchies and relationships can be preserved during transmission, then reassembled on the receiving end in a way that’s appropriate for the programming environment.
A Basic JSON Example
This is a JSON representation of a blog post:
{
  "id": 1001,
  "title": "What is JSON?",
  "author": {
    "id": 1,
    "name": "James Walker"
  },
  "tags": ["api", "json", "programming"],
  "published": false,
  "publishedTimestamp": null
}
This example demonstrates all the JSON data types. It also illustrates the concision of JSON-formatted data, one of the characteristics that’s made it so appealing for use in APIs. In addition, JSON is relatively easy to read as-is, unlike more verbose formats such as XML.
JSON Data Types
Six types of data can be natively represented in JSON:
- Strings – Strings are written between double quotation marks; characters may be escaped using backslashes.
- Numbers – Numbers are written as digits without quotation marks. You can include a fractional component to denote a float. Most JSON parsing implementations assume an integer when there’s no decimal point present.
- Booleans – The literal values true and false are supported.
- Null – The null literal value can be used to signify an empty or omitted value.
- Arrays – An array is a simple list denoted by square brackets. Each element in the list is separated by a comma. Arrays can contain any number of items and they can use all the supported data types.
- Objects – Objects are created by curly brackets. They’re a collection of key-value pairs where the keys are strings, wrapped in double quotation marks. Each key has a value that can take any of the available data types. You can nest objects to create cascading hierarchies. A comma separates each key-value pair from the next; no comma follows the final pair.
JSON parsers automatically convert these data types into structures appropriate to their language. You don’t need to manually cast id to an integer, for example. Parsing the entire JSON string is sufficient to map values back to their original data format.
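To make that concrete, here’s a minimal Python sketch that parses the blog post from earlier and prints the native type each field ends up with. The variable names raw and post are just illustrative choices.

```python
import json

# The blog post document from the example earlier in the article.
raw = """
{
  "id": 1001,
  "title": "What is JSON?",
  "author": {"id": 1, "name": "James Walker"},
  "tags": ["api", "json", "programming"],
  "published": false,
  "publishedTimestamp": null
}
"""

post = json.loads(raw)

# Each JSON type arrives as the closest native Python type:
# number -> int, string -> str, object -> dict, array -> list,
# boolean -> bool, null -> None.
for key, value in post.items():
    print(f"{key}: {type(value).__name__}")
```

No casting is involved: id is already an int and published is already a bool once json.loads() returns.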
Semantics and Validation
JSON has certain rules that need to be respected when you encode your data. Strings that don’t adhere to the syntax won’t be parseable by consumers.
It’s particularly important to pay attention to quotation marks around strings and object keys. You must also ensure a comma’s used after each entry in an object or array. JSON doesn’t allow a trailing comma after the last entry though – unintentionally including one is a common cause of validation errors. Most text editors will highlight syntax problems for you, helping to uncover mistakes.
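As a rough illustration of the trailing comma problem, the Python sketch below runs two strings through the standard json module. The helper name is_valid_json is our own invention for this example, and the exact error message varies between Python versions.

```python
import json

valid = '{"id": 1001, "tags": ["api", "json"]}'
trailing_comma = '{"id": 1001, "tags": ["api", "json"],}'


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        # The exception carries the position of the first problem found.
        print(f"Invalid JSON: {error.msg} (line {error.lineno}, column {error.colno})")
        return False
    return True


print(is_valid_json(valid))           # True
print(is_valid_json(trailing_comma))  # False
```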
Despite these common trip points, JSON is one of the easiest data formats to write by hand. Most people find the syntax quick and convenient once they gain familiarity with it. Overall JSON tends to be less error-prone than XML, where mismatched opening and closing tags, invalid schema declarations, and character encoding problems often cause issues.
Designating JSON Content
The .json extension is normally used when JSON is saved to a file. JSON content has the standardized MIME type application/json, although text/json is sometimes used for compatibility reasons. Nowadays you should rely on application/json for Accept and Content-Type HTTP headers.
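Here’s one way those headers might be set when preparing a request with Python’s standard library. The URL https://api.example.com/posts is a placeholder rather than a real endpoint, and the request is only constructed, not sent.

```python
import json
from urllib.request import Request

payload = {"title": "What is JSON?", "published": False}

# https://api.example.com/posts is a hypothetical URL used for illustration.
request = Request(
    "https://api.example.com/posts",
    data=json.dumps(payload).encode("utf-8"),  # request bodies must be bytes
    headers={
        "Content-Type": "application/json",  # format of the body we send
        "Accept": "application/json",        # format we want back
    },
    method="POST",
)

# Passing the request to urllib.request.urlopen() would send it;
# that step is left out so the sketch runs without a network connection.
print(request.get_method(), request.full_url)
```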
Most APIs that use JSON will encapsulate everything in a top-level object, such as {"error": 1000}. This isn’t required though – a literal type is valid as the top-level node in a file, so a bare 1000, "hello", true, or null is valid JSON too. These values decode to their respective scalars in your programming language.
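A quick Python sketch shows this behavior. The sample documents are our own illustrative values.

```python
import json

# One top-level object followed by four top-level literals.
documents = ['{"error": 1000}', '1000', '"hello"', 'true', 'null']

decoded = [json.loads(text) for text in documents]

for text, value in zip(documents, decoded):
    print(f"{text!r} -> {value!r} ({type(value).__name__})")
```

Each string is a complete JSON document in its own right, so json.loads() handles all five without complaint.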
Working With JSON
Most programming languages have built-in JSON support. Here’s how to interact with JSON data in a few popular environments.
JavaScript
In JavaScript the JSON.stringify() and JSON.parse() methods are used to encode and decode JSON strings:
const post = {
  id: 1001,
  title: "What Is JSON?",
  author: {
    id: 1,
    name: "James Walker"
  }
};

// Serialize the object to a JSON string
const encodedJson = JSON.stringify(post);

// {"id":1001,"title":"What Is JSON?",...}
console.log(encodedJson);

// Parse the string back into an object
const decodedJson = JSON.parse(encodedJson);

// James Walker
console.log(decodedJson.author.name);
PHP
The equivalent functions in PHP are json_encode() and json_decode():
$post = [
    "id" => 1001,
    "title" => "What Is JSON?",
    "author" => [
        "id" => 1,
        "name" => "James Walker"
    ]
];

// Serialize the array to a JSON string
$encodedJson = json_encode($post);

// {"id":1001,"title":"What Is JSON?",...}
echo $encodedJson;

// Passing true decodes JSON objects as associative arrays
$decodedJson = json_decode($encodedJson, true);

// James Walker
echo $decodedJson["author"]["name"];
Python
Python provides json.dumps() and json.loads() to serialize and deserialize respectively:
import json

post = {
    "id": 1001,
    "title": "What Is JSON?",
    "author": {
        "id": 1,
        "name": "James Walker"
    }
}

# Serialize the dictionary to a JSON string
encodedJson = json.dumps(post)

# {"id": 1001, "title": "What Is JSON?", ...}
print(encodedJson)

# Parse the string back into a dictionary
decodedJson = json.loads(encodedJson)

# James Walker
print(decodedJson["author"]["name"])
Ruby
Ruby offers JSON.generate and JSON.parse:
require "json"

post = {
  "id" => 1001,
  "title" => "What Is JSON?",
  "author" => {
    "id" => 1,
    "name" => "James Walker"
  }
}

# Serialize the hash to a JSON string
encodedJson = JSON.generate(post)

# {"id":1001,"title":"What Is JSON?",...}
puts encodedJson

# Parse the string back into a hash
decodedJson = JSON.parse(encodedJson)

# James Walker
puts decodedJson["author"]["name"]
JSON Limitations
JSON is a lightweight format that’s focused on conveying the values within your data structure. This makes it quick to parse and easy to work with but means there are drawbacks that can cause frustration. Here are some of the biggest problems.
No Comments
JSON data can’t include comments. The lack of annotations reduces clarity and forces you to put documentation elsewhere. This can make JSON unsuitable for situations such as config files, where modifications are infrequent and the purposes of fields could be unclear.
No Schemas
The JSON format itself doesn’t let you define a schema for your data. There’s no way to declare within a document that id is a required integer field, for example. This can lead to unintentionally malformed data structures.
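In practice this means the checks have to live in your application code. The sketch below shows one hand-rolled approach in Python; the function name validate_post and the rule that title must be a string are assumptions made for the example, not part of any standard.

```python
import json


def validate_post(post: object) -> list[str]:
    """Return a list of problems; an empty list means the post passed."""
    if not isinstance(post, dict):
        return ["top-level value must be an object"]

    problems = []
    post_id = post.get("id")
    if "id" not in post:
        problems.append("id is required")
    # bool is a subclass of int in Python, so rule it out explicitly.
    elif isinstance(post_id, bool) or not isinstance(post_id, int):
        problems.append("id must be an integer")

    # Hypothetical extra rule for this example: title must be a string.
    if not isinstance(post.get("title"), str):
        problems.append("title must be a string")
    return problems


good = json.loads('{"id": 1001, "title": "What is JSON?"}')
bad = json.loads('{"id": "1001", "title": "What is JSON?"}')

print(validate_post(good))  # []
print(validate_post(bad))   # ['id must be an integer']
```

Both documents are perfectly valid JSON; only the application-level check notices that the second one carries id as a string.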
No References
Fields can’t reference other values in the data structure. This often causes repetition that increases filesize. Returning to the blog post example from earlier, you could have a list of blog posts as follows:
{
  "posts": [
    {
      "id": 1001,
      "title": "What is JSON?",
      "author": {
        "id": 1,
        "name": "James Walker"
      }
    },
    {
      "id": 1002,
      "title": "What is SaaS?",
      "author": {
        "id": 1,
        "name": "James Walker"
      }
    }
  ]
}
Both posts have the same author but the information associated with that object has had to be duplicated. In an ideal world, JSON parser implementations would be able to produce the structure shown above from input similar to the following:
{
  "posts": [
    {
      "id": 1001,
      "title": "What is JSON?",
      "author": "{{ .authors.james }}"
    },
    {
      "id": 1002,
      "title": "What is SaaS?",
      "author": "{{ .authors.james }}"
    }
  ],
  "authors": {
    "james": {
      "id": 1,
      "name": "James Walker"
    }
  }
}
This is not currently possible with standard JSON.
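You can approximate the behavior yourself after parsing, though. The Python sketch below treats the "{{ .authors.james }}" placeholder from the example as an application-level convention and swaps each one for the value it points to. The placeholder syntax, the REFERENCE pattern, and the resolve function are all hypothetical; a JSON parser sees these placeholders as ordinary strings.

```python
import json
import re

# Matches the hypothetical "{{ .authors.james }}" placeholder convention.
REFERENCE = re.compile(r"^\{\{\s*\.([\w.]+)\s*\}\}$")


def resolve(value, root):
    """Recursively replace placeholder strings with the value they point to in root."""
    if isinstance(value, dict):
        return {key: resolve(item, root) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, root) for item in value]
    if isinstance(value, str):
        match = REFERENCE.match(value)
        if match:
            target = root
            # Walk the dotted path, e.g. "authors.james", one key at a time.
            for part in match.group(1).split("."):
                target = target[part]
            return target
    return value


raw = """
{
  "posts": [
    {"id": 1001, "title": "What is JSON?", "author": "{{ .authors.james }}"},
    {"id": 1002, "title": "What is SaaS?", "author": "{{ .authors.james }}"}
  ],
  "authors": {
    "james": {"id": 1, "name": "James Walker"}
  }
}
"""

document = json.loads(raw)
posts = resolve(document["posts"], document)

print(posts[0]["author"]["name"])  # James Walker
```

The drawback is that every consumer of the data has to know about the convention and implement it, which is exactly the kind of coordination a format-level feature would avoid.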
No Advanced Data Types
The six supported data types omit many common kinds of value. JSON can’t natively store dates, times, or geolocation points, so you need to decide on your own format for this information.
This causes inconvenient discrepancies and edge cases. If your application handles timestamps as strings, like 2022-07-01T12:00:00+00:00, but an external API presents time as seconds past the Unix epoch – 1657287000 – you’ll need to remember when to use each of the formats.
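One way to contain the problem is to normalize both representations at the edge of your application. Here’s a small Python sketch along those lines. The to_datetime helper is our own name, and treating strings without a UTC offset as UTC is an assumption you’d need to confirm for your data.

```python
from datetime import datetime, timezone


def to_datetime(value) -> datetime:
    """Convert an ISO 8601 string or Unix epoch seconds to a UTC datetime."""
    if isinstance(value, str):
        parsed = datetime.fromisoformat(value)
        if parsed.tzinfo is None:
            # Assumption: strings without an offset are already in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    # bool is a subclass of int, so reject it before the numeric check.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"unsupported timestamp value: {value!r}")
    return datetime.fromtimestamp(value, tz=timezone.utc)


print(to_datetime("2022-07-01T12:00:00+00:00"))  # 2022-07-01 12:00:00+00:00
print(to_datetime(1657287000))                   # 2022-07-08 13:30:00+00:00
```

Once both inputs are datetime objects, the rest of the code no longer needs to care which format a given API used.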
JSON Alternatives
YAML is the leading JSON alternative. It’s a superset of the format that has a more human-readable presentation, custom data types, and support for references. It’s intended to address most of the usability challenges associated with JSON.
YAML has seen wide adoption for config files and within DevOps, IaC, and observability tools. It’s less frequently used as a data exchange format for APIs. YAML’s relative complexity means it’s less approachable to newcomers. Small syntax errors can cause confusing parsing failures.
Protocol buffers (protobufs) are another emerging JSON contender designed to serialize structured data. Protobufs have data type declarations, required fields, and support for most major programming languages. The system is gaining popularity as a more efficient way of transmitting data over networks.
Summary
JSON is a text-based data representation format that can encode six different data types. JSON has become a staple of the software development ecosystem; it’s supported by all major programming languages and has become the default choice for most REST APIs developed over the past couple of decades.
While JSON’s simplicity is part of its popularity, it also imposes limitations on what you can achieve with the format. The lack of support for schemas, comments, object references, and custom data types means some applications will find they outgrow what’s possible with JSON. Younger alternatives such as YAML and Protobuf have helped to address these challenges, while XML remains a contender for applications that want to define a data schema and don’t mind the verbosity.