JSON Schema’s Role in Building and Deploying Your API
What is JSON Schema? It provides a precise way of describing the structure and constraints of any JSON value, which makes it supremely useful for documenting the requests made to and the responses returned by any JSON API. This article will look at its role across the software development lifecycle of an API.
Documenting JSON Response formats
Perhaps the most obvious use case for defining a schema for your data is for documenting the structure of your API responses.
Let’s look at a simple response for a book API:
{ "title": "The Art of Lying", "pages": 412, "is_fiction": true, "status": "published", "updated_at": "2017-04-12T23:20:50.52Z" }
We can describe the structure of those responses using the following JSON Schema document:
{ "$schema": "http://json-schema.org/draft-04/schema#" "title": "Book", "description": "Describes a book in our online database", "type": "object", "properties": { "title": { "description": "The title of the book" "type": "string", "minLength": 1 }, "pages": { "type": "integer", "minimum": 1 }, "is_fiction": { "type": "boolean" }, "updated_at": { "type": "string", "format": "date-time" } }, "additionalProperties": false }
Consumers of our API will find the above a useful reference as to what fields are available and what types of data they store.
To make things even more official, we can even add a Link response header that points to our schema document. Here is a PHP example of sending such a header (the schema URL below is just a placeholder):
header('Link: <https://example.com/schemas/book.json#>; rel="describedby"');
I highly recommend this guide for JSON Schema authors for more discussion and examples than are present on the regular JSON Schema website.
Documenting JSON Request Formats
Perhaps even more valuable than documenting the format of responses is documenting the format of requests. You might figure out through some trial and error what a response looks like, but it’s virtually impossible to guess what format might be required when posting data to an endpoint.
To make matters worse, there is no standard place to put a link to the necessary schema. You could reference a schema in an error message, but we can see pretty quickly the need for an organizational approach that ties JSON Schemas to routes and request methods. This is where API description tools come in.
API Blueprint, RAML, and Open API Spec (formerly Swagger) are the most common tools for documenting your API. All of them offer support for JSON Schema definitions, albeit to differing degrees.
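For instance, a Swagger 2.0 definition can point a response at a reusable schema with the familiar $ref keyword. Here is a minimal sketch (the route is hypothetical and the Book definition is a trimmed-down version of the schema above):

swagger: "2.0"
info:
  title: Book API
  version: "1.0"
paths:
  /books/{id}:
    get:
      parameters:
        - name: id
          in: path
          required: true
          type: string
      responses:
        "200":
          description: A single book, in the format described by our Book schema
          schema:
            $ref: "#/definitions/Book"
definitions:
  Book:
    type: object
    properties:
      title:
        type: string
      pages:
        type: integer
      is_fiction:
        type: boolean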
A Developer’s Ideal Workflow
Ultimately what we as developers want is a simpler workflow. Ideally, we want a solution that solves the following problems:
- One source of truth (one place to update a definition). If we have JSON Schema documents for each type of data involved in our API and we can reference that schema from our API tool, then we will have accomplished a single source of truth for all use cases. Check!
- Rapid Prototyping. Using any coherent API tool will grant us the ability to prototype rapidly. Services like Apiary consume Swagger or API Blueprint files and can act as a mock API! As soon as you agree on the schema of a response, the front-end team can work on writing code that will format it and the back-end team can work on code that will fetch the raw data and return it to the client in the given format. Check!
- Generate Documentation. If we have nicely structured documents for our API routes, including JSON Schemas for the data structures in them, it is relatively easy to extract the titles, descriptions, and examples into readable documentation (yes, there are lots of tools that do this). Check!
- Validate payloads before you send them. JSON Schema validators have been written in virtually every language. You can use them client-side to validate a payload before it’s sent or server-side to validate the format before you execute business logic validation (see the PHP sketch after this list). Check!
- Test an API against its documentation. If we have a comprehensive tool that documents each route and method along with the JSON Schema of the response or payload, then it is not much of a stretch to imagine iterating over the defined routes and validating that they accept and/or return the objects that match the JSON Schema formats defined. Dredd is one NPM package that does this homework for us: it validates an API against its documentation (currently it supports Swagger and API Blueprint). Abao is a package that performs this task for RAML definitions. Check!
- Generate SDKs. Each of the big API tools supports the generation of SDK code to access the API. Check!
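To make the payload-validation item concrete, here is a minimal server-side sketch in PHP. It assumes the widely used justinrainbow/json-schema package is installed via Composer; the file paths and the 422 response code are placeholder choices, not requirements:

<?php
// Composer autoloader for the justinrainbow/json-schema package (assumed installed).
require 'vendor/autoload.php';

use JsonSchema\Validator;

// Decode the incoming payload as objects (not associative arrays),
// which is the form the validator expects.
$payload = json_decode(file_get_contents('php://input'));

// Validate the payload against our Book schema before any business logic runs.
$validator = new Validator();
$validator->validate($payload, (object) ['$ref' => 'file://' . realpath('schemas/book.json')]);

if (!$validator->isValid()) {
    http_response_code(422);
    foreach ($validator->getErrors() as $error) {
        printf("[%s] %s\n", $error['property'], $error['message']);
    }
    exit;
}

// ...continue with business-logic validation and persistence...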
Given all these green lights, it would seem that we are living in a developer fantasy land where all is perfect! Well, we are “bloody close,” as Phil Sturgeon quipped in his excellent article on the topic, but we are not quite there.
I would like to posit that the most serious shortcomings in any API tool have to do with how thoroughly that tool implements the JSON Schema spec. This isn’t to say that everything is awesome and perfect so long as an API tool fully implements JSON Schema. However, following the JSON Schema spec avoids the most egregious problems — we can fix aesthetics far more easily than we can patch over architectural impossibilities.
Let’s review how our three main API tooling options help or hinder our ideal workflow.
API Blueprint shortcomings
Although this is a popular and well-supported tool that does let you reference JSON Schemas for indicating request or response formats, it does struggle with including multiple files. That poses a major problem when it comes to having a single source of truth (item 1 from the above list). What if two or more of your API endpoints return a response of the same schema? If we want a single source of truth, then all of the endpoints should reference the same file.
There are workarounds for this problem; others have documented methods for using multiple files in API Blueprint. JSON Schema already supports a powerful $ref keyword which can act as an “include” statement, so it would be nice if API Blueprint could leverage this functionality.
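To illustrate what that “include” looks like in pure JSON Schema, a list-of-books schema can simply reference the Book schema from earlier (assuming it lives in a sibling file named book.json):

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "BookList",
  "description": "A collection of books, each validated against the shared Book schema",
  "type": "array",
  "items": { "$ref": "book.json#" }
}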
RAML
RAML does support including multiple files via its own !include directive, so it can easily reference the same schema file from multiple places. It also fully supports the JSON Schema spec along with its powerful $ref keyword, so schemas can reference sub-schemas or remote schemas without problems.
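A brief RAML 0.8-style sketch (the resource and file paths are placeholders) showing the same book.json schema being pulled in for a response body:

#%RAML 0.8
title: Book API
/books/{id}:
  get:
    responses:
      200:
        body:
          application/json:
            schema: !include schemas/book.json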
Most of the complaints I’ve seen about RAML are about the styling of its documentation generation (item 3 in the above list), or the fact that it is only represented in YAML instead of having a JSON option, both of which are rather superficial. The differences between its structure and Swagger’s are minimal.
The only thing that confuses me about RAML is why it isn’t more widely adopted. It does a good job of meeting the requirements of our ideal developer’s workflow.
Swagger and its limitations
For better or worse, Swagger has the hip factor going for it, so it seems to be where the winds are blowing. However, it does suffer from several inane limitations that (you guessed it) stem from its lack of support for the JSON Schema standard.
Right off the bat, if you have perfectly well-defined and 100-percent-valid JSON Schema documents that describe everything in your API, Swagger will not work. That’s right: Swagger 2.0 only supports some (but not all) of the JSON Schema keywords, and Open API 3.0 (the next iteration of Swagger) is not guaranteed to solve all the shortcomings. It is struggling to implement some of the JSON Schema features that have been around for years, let alone the newer ones that are scheduled for imminent release.
Since JSON Schema has been around for a long time, it might be fair to say that Swagger is not doing a great job of respecting its elders. The Swagger spec (and its Open API replacement) has itself been fully described by a JSON Schema document, so it seems counterproductive to reinvent a wheel that demonstrably doesn’t roll as well. Why, for example, must we rely on a temperamental online editor to validate our schemas?
Let’s set out some warning flags so you are aware of some of the landmines. For example, Swagger will not allow your schemas to declare which version of JSON Schema they were written in. This is done using the $schema keyword, but Swagger does not support it.
Another painful shortcoming is that Swagger does not yet support the concept of nullable fields. In JSON Schema parlance, a field can be defined as an array of types, e.g. {"type": ["string", "null"]}, to indicate a nullable string. If Swagger encounters this perfectly valid JSON Schema convention, it will choke. Not good!
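For example, a nullable subtitle field (a made-up field for illustration) is perfectly legal draft-04 JSON Schema, yet to keep Swagger 2.0 happy you would typically have to drop the "null" entry and lose that information:

{
  "type": "object",
  "properties": {
    "subtitle": { "type": ["string", "null"] }
  }
}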
JSON Schema’s patternProperties keyword is extremely useful for applying a schema to any property whose name matches a regular expression, but, you guessed it, Swagger doesn’t support it.
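A brief sketch (the property-name pattern is invented) of how it works: any property whose name starts with "isbn_" must be a string, and anything else is rejected:

{
  "type": "object",
  "patternProperties": {
    "^isbn_": { "type": "string" }
  },
  "additionalProperties": false
}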
Another shortcoming is its lack of support for a schema’s “id” (or “$id”) property. In JSON Schema parlance, a schema’s id acts somewhat like a namespace, so that referenced schemas can be properly understood either as sub-schemas under the parent umbrella or as independent definitions. So if your referenced schema uses $ref to point to another schema, beware! Swagger may not approve. This can make it extremely difficult to spread your definitions out across multiple files. Swagger would seem to prefer that all reusable definitions get stored in the root document’s definitions object, but that is hardly practical when working with a multi-file setup.
One of the most challenging definitions involves “circular references,” where an instance of one data type may contain child properties that are instances of the same data type. To be fair, this is tricky regardless of which API tool you use, but it becomes especially difficult when the tool has somewhat haphazard support for the JSON Schema functionality behind it.
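In raw JSON Schema, such a circular reference is easy to express. Here is a sketch using a hypothetical Category schema whose subcategories are themselves categories ("$ref": "#" points back at the root of the same schema):

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "http://example.com/schemas/category.json#",
  "title": "Category",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "subcategories": {
      "type": "array",
      "items": { "$ref": "#" }
    }
  }
}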
At the end of the day, you can make Swagger work for you, but you have to operate within its limitations. At a minimum, that means defacing your JSON Schema documents and sometimes that means going as far as arbitrarily imposing limitations that may not accurately describe your API. We risk running afoul of items 3, 4, and 5 from our sacred list.
Maintaining Your API Definitions Over Time
Regardless of which API tool you are using to develop your API, eventually you will need to maintain it. This task is usually easier when your definitions are spread out across multiple files. RAML supports this easily and Swagger can do it with some caveats.
See this article about splitting a Swagger definition into multiple files. During my explorations, I authored a GitHub repo with some multi-file Swagger examples that you may find useful as a reference.
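The general idea, shown here as a sketch of one operation’s responses section (the file layout is hypothetical, and whether your toolchain resolves it cleanly varies), is to point the response at a schema kept in its own file rather than in the root document’s definitions object:

responses:
  "200":
    description: A single book
    schema:
      $ref: "definitions/book.yaml#/Book"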
Testing Your API
As long as your API tool defines your application’s routes and methods, it’s a simple matter of iterating over them and hitting those endpoints to verify that they do what they say they do. As mentioned previously, Dredd and Abao are two NPM packages that perform this tedious task. The main goal is to easily verify that your API is doing what you expect it to, and it’s also a great place to start if you’re practicing test-driven or behavior-driven development (TDD or BDD).
Summary
I could spend some time pondering the likelihood of RAML’s or API Blueprint’s demise since Swagger seems so popular, but really, none of the solutions fully accomplishes what I want as a developer (yet).
I think that we are on the cusp of achieving it, but only if and when one of the tools fully implements the already feature-rich JSON Schema standard will we really have the freedom we seek. I think the Open API standard is headed in that direction, and as long as one of these tools arrives at that destination, I’ll be happy to use it in my next API.
Published on Java Code Geeks with permission by Everett Griffiths, partner at our JCG program. See the original article here: JSON Schema’s Role in Building and Deploying Your API. Opinions expressed by Java Code Geeks contributors are their own.