Robust Software

Tales of a code samurai

Pragmatic Web Service Design

Web services are a crucial part of most solutions nowadays. I spend a significant portion of my time designing and writing them, and I have read a lot about them to make each one better, faster and more resilient than the last. This is a summary of how I approach web service design and the things I bear in mind.

Protocols and content types

Unless you require extreme performance from your service, use the most compatible technologies available. Today that means HTTP, JSON and HTML forms. The lowest common denominator in any solution is usually JavaScript in the browser, and this shapes all your decisions about how to expose your service. HTTP, JSON and HTML forms are the easiest things to work with in JavaScript and they are well supported in other languages. XML is an option, but JSON is a more compact transport format and much easier to work with in JavaScript.

Before you write a web service, make sure to learn HTTP inside and out. It is a powerful protocol that solves many more problems than most people realise. I would recommend the book RESTful Web Services as a starting point; it demonstrates how to create a web service that is sympathetic to HTTP and there is a useful glossary in the back. This is not going to be another post about REST, but if you know about it already there will be some familiar concepts.

Using the correct HTTP status code is important. Think of it as a well-established domain-specific language for what the server thought of your request. It removes the need to duplicate something like a response code within the body of the response. One caveat on exploiting all that HTTP gives you: be wary of 3xx redirect responses to AJAX requests, as the browser follows the redirect itself and your script only ever sees the final response, never the 3xx.
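As a minimal sketch of letting the status code do the talking, here is one way a handler might map its outcome onto a status line while keeping the body free of any duplicate code. The outcome names and the `respond` function are my own illustrative assumptions, not part of any framework.

```python
import json
from http import HTTPStatus

# Hypothetical outcomes a handler might report; the names are illustrative only.
STATUS_FOR_OUTCOME = {
    "ok": HTTPStatus.OK,                # 200
    "created": HTTPStatus.CREATED,      # 201
    "invalid": HTTPStatus.BAD_REQUEST,  # 400
    "missing": HTTPStatus.NOT_FOUND,    # 404
}


def respond(outcome, payload):
    """Return (status line, JSON body); the body carries data only, no duplicate code."""
    status = STATUS_FOR_OUTCOME[outcome]
    return f"{status.value} {status.phrase}", json.dumps(payload)
```

A consumer can then branch on the status alone and only parse the body when it needs the details.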

For sending and receiving data I recommend HTML form values in, JSON out. You may want to send requests in as JSON instead and that’s not a bad choice; I just find HTML forms easier. However, I would recommend using the same input and output content types across your whole API whenever possible. It is easier to consume an API when you do not have to think about what format a given method accepts and responds with.
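A rough sketch of "form values in, JSON out" using only the Python standard library might look like the following. The `handle_create_user` name and the fields in the usage note are hypothetical; a real handler would validate the input as well.

```python
import json
from urllib.parse import parse_qs


def handle_create_user(form_body: str) -> str:
    """Accept an application/x-www-form-urlencoded body, return a JSON document."""
    fields = parse_qs(form_body)
    # parse_qs returns a list per key; this sketch keeps the first value of each field.
    user = {key: values[0] for key, values in fields.items()}
    return json.dumps(user, sort_keys=True)
```

Calling `handle_create_user("name=Ada+Lovelace&email=ada%40example.com")` yields a JSON object with the decoded `name` and `email` values.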

Avoid supporting several content types, particularly for the first couple of releases. You will end up iterating over your service during initial development, and having to maintain compatibility for several content types adds wasteful overhead at this point. Get it shipped with one pair of content types, then look to fix your bugs, harden your API and learn from real usage patterns. Put this into the next version and repeat. Once things stabilise you can evaluate whether support for several content types is needed based upon real-world requests, without it being such a burden as your API goes through the churn of first contact with the real world.

Core design considerations

A lot of the considerations are identical to those of designing any API but there are some additional ones that are specific to web services as they involve transmission over a network.

Log everything

It should be obvious, but you need to log every request that comes in. You may need to replicate an issue that is the result of several requests, and without logging this will be difficult. You want to easily answer the question “what was the consumer doing at the time?”. Logs will also allow you to monitor usage, response times and other useful metrics. As your first attempt at an API is no more than an informed guess, you need to be collecting metrics in order to make an educated decision about what to do next.
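One way to get this for free on every request is a small wrapper around the application. The sketch below assumes a WSGI application and uses a logger name of my own choosing. It also assumes the application calls `start_response` before returning, which holds for simple handlers but not for every WSGI app.

```python
import logging
import time

logger = logging.getLogger("api.requests")  # assumed logger name


def log_requests(app):
    """Wrap a WSGI application so every request is logged with status and duration."""
    def middleware(environ, start_response):
        started = time.perf_counter()
        seen = {}

        def recording_start_response(status, headers, exc_info=None):
            # Remember the status so it can be logged once the app returns.
            seen["status"] = status
            return start_response(status, headers, exc_info)

        body = app(environ, recording_start_response)
        elapsed_ms = (time.perf_counter() - started) * 1000
        logger.info(
            "%s %s?%s -> %s in %.1fms",
            environ.get("REQUEST_METHOD"),
            environ.get("PATH_INFO"),
            environ.get("QUERY_STRING", ""),
            seen.get("status"),
            elapsed_ms,
        )
        return body

    return middleware
```

Because the log line includes the method, path, query string, status and timing, it can serve both for replaying a sequence of requests and for basic usage metrics.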

Comply with the expected behaviour of HTTP

There are several expected behaviours with HTTP such as GET not producing side effects and PUT and DELETE requests being idempotent. This all comes from knowing HTTP as previously encouraged. Being able to justify a design decision by referring to RFCs is awesome.
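To make those expectations concrete, here is a toy in-memory store whose operations behave the way the corresponding HTTP methods are expected to. The `ResourceStore` class is purely illustrative and stands in for whatever storage your service really uses.

```python
class ResourceStore:
    """In-memory stand-in for a service's storage, keyed by URL path."""

    def __init__(self):
        self._items = {}

    def get(self, path):
        # GET is safe: it reads state and never changes it.
        return self._items.get(path)

    def put(self, path, representation):
        # PUT is idempotent: it replaces the resource with the full representation,
        # so repeating the same request leaves the same state.
        self._items[path] = dict(representation)

    def delete(self, path):
        # DELETE is idempotent: deleting an already-deleted resource changes nothing.
        self._items.pop(path, None)
```

A consumer that is unsure whether a PUT or DELETE arrived can simply send it again, which is exactly the property a flaky network makes valuable.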

Fewer methods returning more data

The biggest bottleneck in communicating with a server is no longer the size of the data being communicated; it is establishing connections. This is particularly true for internal services. The common bottleneck with web services is the number of concurrent requests they can serve, not the amount of data being transferred. You want to aim to have fewer methods but return more data from them, thereby reducing the number of requests consumers need to make. Having fewer methods also makes it easier for the consumer to make the right choice.

Imagine everything a consumer is likely to want to display as a result of making the request and give it to them. The only thing you need to be careful of is your internal implementation bleeding into your API. You do not want to expose yourself in public, especially early on. Following this advice makes it likely the consumer will get all the data they need in one request rather than making a separate request for each part. This reduces the surface area of your API which has several great benefits:

  • as a developer I have fewer methods to maintain
  • as a consumer I have a lower cognitive load; I might even be able to memorise your API
  • as an administrator, fewer requests help me with caching and usually mean I can use less hardware, improving service and making scaling up cheaper

In terms of web service surface area and required requests, less is definitely more.
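As a small sketch of returning more data from one method, the function below embeds each post's author in the response rather than handing back an identifier the consumer must look up with a second request. The `USERS` and `POSTS` data and the `list_posts` name are hypothetical.

```python
# Hypothetical in-memory data standing in for the service's storage.
USERS = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
POSTS = [
    {"id": 10, "author_id": 1, "text": "First post"},
    {"id": 11, "author_id": 2, "text": "Second post"},
]


def list_posts():
    """Return every post with its author embedded, so one request is enough."""
    return [
        {"id": post["id"], "text": post["text"], "author": USERS[post["author_id"]]}
        for post in POSTS
    ]
```

Note that the internal `author_id` column never reaches the consumer, which keeps the storage layout free to change.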

Saying that data size is not the bottleneck may seem to contradict my recommendation of JSON on efficiency grounds, but passing up a more efficient format when there is no compelling reason to do so is foolish.

Highlander principle

There should be one, and only one, way to do something. Not only does it save you effort, it makes things easier for the consumer. Sometimes this may mean that one action requires several requests, but this is less confusing in the long run than creating a specific method for every action. A good-enough solution is generally acceptable: performing several simple steps is cognitively easier than wading through a sea of methods to find the one intended for your task.

Again I may seem to be contradicting myself, as this goes against the idea of reducing the number of connections required to do something, and it does to some extent. However, most systems are at least 80% reads, this situation usually applies to writes, and it should not occur regularly if you have modelled the domain correctly. Taken together, it should be a drop in the ocean of the general usage of the API.

Any method you add on a hunch about future use is one you will have to support for the lifetime of the API. It is better to wait until you have real statistics and use cases to work from than to increase the surface area of your API speculatively. If an action requiring several requests becomes common practice then you can add a method to simplify it, and you will be certain that it is adding value to your code base and consumers.

Give people URLs

Whenever possible, provide URLs to the consumer; do not make them work them out. Every URL a consumer has to create is a support call waiting to happen. When someone has to create a URL it is likely your internals are bleeding into your API. Once someone else is creating a URL you can never change that URL, and that restricts your refactoring and scaling options. When you create URLs for your consumers you can rename them and point them towards different servers, to name just two options that would be closed to you the second you are not in control of your URLs.
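Here is a minimal sketch of what handing out URLs could look like. The base URL, the field names `url` and `timeline_url`, and both functions are assumptions made up for the example.

```python
BASE_URL = "https://api.example.com/v1"  # assumed; only the service decides this


def user_representation(user_id, name):
    """Describe a user along with the URLs a consumer may want to follow next."""
    user_url = f"{BASE_URL}/users/{user_id}"
    return {
        "name": name,
        "url": user_url,
        "timeline_url": f"{user_url}/timeline",
    }


def follow(representation, link_name):
    """Consumer side: pick a URL out of a response instead of constructing one."""
    return representation[link_name]
```

If the timeline later moves to another host or path, only `user_representation` changes; consumers calling `follow(user, "timeline_url")` keep working.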

Version from the outset

There will be several versions of your web service. Think about how you will identify the schema of your data when it is returned, and think about how you will host several versions of the API, as you cannot just switch one on and the previous version off. It will happen and it will be difficult to lever in at a later date, so make it a solved problem.
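One possible shape for this, sketched under my own assumptions, is to tag each response with a schema version and host the versions side by side under separate URL prefixes. The `/v1` and `/v2` paths, the `schema_version` field and the handler names are all hypothetical.

```python
def user_v1(user):
    return {"schema_version": 1, "name": user["name"]}


def user_v2(user):
    # Version 2 only adds data; version 1 keeps being served unchanged.
    return {"schema_version": 2, "name": user["name"], "url": user["url"]}


# Both versions are hosted side by side under their own URL prefix.
HANDLERS = {"/v1/user": user_v1, "/v2/user": user_v2}


def dispatch(path, user):
    """Route a request path to the handler for the requested API version."""
    return HANDLERS[path](user)
```

Existing consumers stay on `/v1/user` for as long as they need, while new ones can opt in to `/v2/user`.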

Example - Twitter’s feed

Here’s the response for a single tweet from my timeline:

[ { ...,
    "entities" : { "hashtags" : [  ],
        "urls" : [  ],
        "user_mentions" : [ { "id" : 1102,
              "id_str" : "1102",
              "indices" : [ 3,
                  10
                ],
              "name" : "David Ulevitch",
              "screen_name" : "davidu"
            } ]
      },
    "favorited" : false,
    "geo" : null,
    "id" : 34612066580955136,
    "id_str" : "34612066580955136",
    "in_reply_to_screen_name" : null,
    "in_reply_to_status_id" : null,
    "in_reply_to_status_id_str" : null,
    "in_reply_to_user_id" : null,
    "in_reply_to_user_id_str" : null,
    "place" : null,
    "retweet_count" : 10,
    "retweeted" : false,
    "retweeted_status" : {
        ...,
        "id" : 34583191385935872,
        "id_str" : "34583191385935872",
        ...,
        "retweet_count" : 10,
        ...,
        "text" : "The last 5% of a project is always the worst half of a project. :-)",
        "truncated" : false,
        "user" : { "contributors_enabled" : false,
            "created_at" : "Sun Jul 16 02:30:23 +0000 2006",
            "description" : "Positively disruptive.  Started OpenDNS, ...",
            ...,
            "id" : 1102,
            "id_str" : "1102",
            ...,
            "profile_background_color" : "9BE5E9",
            "profile_background_image_url" : "http://a3.twimg.com/profile...jpg",
            ...
          }
      },
    "source" : "web",
    "text" : "RT @davidu: The last 5% of a project is always the worst half of a project. :-)",
    "truncated" : false,
    "user" : { "contributors_enabled" : false,
        "created_at" : "Thu May 29 20:25:50 +0000 2008",
        "description" : "Freelance software developer fond of Linux, ...",
        ...
      }
  } ]

As you can see, this contains everything about the tweet, including which tweet was being retweeted, who the users were, what their profile preferences are, where their profile image is; everything you could want, really. As a consumer I don’t need to make at least two additional requests to retrieve the details for the involved users, which would lead to an N+1 load on the server.

The only thing that sticks out to me as possibly being bad is the id values; that smells of internal details leaking out. Instead they might use something like user_url and give a URL for the user’s full profile, and so forth. There also doesn’t appear to be any mention of a version for the message, but they may handle versioning by using different URLs for different versions of the API.

Things to bear in mind during implementation

Do not reinvent the wheel

For example, methods to achieve caching already exist which utilise the caching mechanisms built into HTTP itself. Use these when possible rather than reimplementing caching yourself. Squid and Varnish are two open source caching proxies that are easy to set up. Learn about the wide array of established HTTP headers available; it is rare that you need to create a custom header.
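As an example of leaning on what HTTP already provides, here is a rough sketch of a conditional GET using the standard `ETag`, `If-None-Match` and `Cache-Control` headers. The function names and the 60 second `max-age` are my own assumptions, and the sketch only handles a single validator in `If-None-Match`, not a list or `*`.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validator from the response body."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def conditional_get(body: bytes, if_none_match=None):
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        # The client already holds this exact representation; send no body.
        return 304, headers, b""
    return 200, headers, body
```

Because these are standard headers, a cache such as Squid or Varnish sitting in front of the service can make use of them without any custom protocol.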

Comprehensive documentation

Yes, even for internal use. If your API is not documented it will be hard to use and underutilised. Without documentation you will spend a lot of time explaining how to use your API, when you could write it down once in an HTML file and have that explain it to everyone.

Include examples of performing common tasks with your API, as well as an example for each method. Ideally the reader will be able to run these examples as they read. Ask a designer to throw a template together for you; it doesn’t need to be amazing but it will make reading your documentation a more pleasurable experience.

No breaking changes, ever

Once you’ve published it and it is exposed to the world, you cannot change anything, ever. You cannot even fix bugs, as people will have written workarounds for them. You can add methods or return more data in responses, but you must never alter what has been published before. People will be relying on it and you will fuck them up.
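One way to guard against accidental breakage is to compare a sample response from the published version with one from the candidate release. The helper below is a sketch of that idea only: the name `is_backwards_compatible` is made up, and it checks field presence and value types, not behaviour.

```python
def is_backwards_compatible(old, new):
    """True when every field of the old response survives with the same type in the new one."""
    for key, old_value in old.items():
        if key not in new:
            return False  # a published field was removed
        new_value = new[key]
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            # Recurse into nested objects such as an embedded user.
            if not is_backwards_compatible(old_value, new_value):
                return False
        elif type(old_value) is not type(new_value):
            return False  # a published field changed type
    return True
```

Adding a field passes the check, while removing or retyping a published field fails it, which matches the "add, never alter" rule.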

TL;DR

  • Writing an API is very hard
  • Really learn HTTP
  • Remove choices and complexity by using fewer content types and exposing fewer methods
  • Reduce the quantity of requests by returning more data in responses
  • Give URLs to the consumer, don’t let them create them
  • Documentation, documentation, documentation