RESTful APIs from Scratch: Lessons Learnt (so far)

sufw · ‎06-03-2012

About nine months ago, we started to investigate the implementation of Financial Supply Chain Management in our ECC system in order to streamline our Accounts Receivables process. One of several changes we had in mind was the provision of account self-service capabilities for the several tens of thousands of business customers who maintained a charge account with the company.

We initially looked to customizing SAP’s standard Biller Direct solution, but soon discovered that a significant re-write of the standard solution would be needed to meet our branding and accessibility requirements. We thus proceeded with “Plan B” – to build from scratch a modern, Java-based web application which would be both visually appealing and performant enough to handle thousands of concurrent sessions. We also decided that this application should not have its own data store but rely exclusively on ECC for all of its data.

After some deliberation, we decided to apply REST principles to the design of the API used by this web application to interact with ECC. Since a user’s interaction with the application would consist of a number of naturally connected tasks, REST’s reliance on hyperlinks (“HATEOAS”) was of great benefit as it eliminated the need for the Java web app to have a hard-coded sequence of SOAP operations to invoke. So the hypertext in the REST API would roughly match a user’s progression from a view of open invoices, to making a payment against some line items, followed by raising disputes, modifying address details, etc.

After piloting an early and feature-limited version of the application with a few customers, we are now only a couple of weeks from launching the final version and beginning to transition larger numbers of customers across. With development all done, and testing almost complete, now is probably as good a time as any to reflect on the many things I have learned during the process of designing the REST API connecting the Java web application to the ECC backend.

Mistakes are after all the best way to truly learn, and I have learnt a tremendous amount. Any mistakes are also all mine – the team who built the API has been fantastic in their attitude and execution from the beginning, and has been crucial to the success of this approach. If it had been left to me alone, we would now have an API implemented on a whiteboard and a couple of text files in Notepad!

So, here’s my list (so far), and in no particular order. I hope to be able to add many more points in the future, as this would mean I will have learnt more! :smile:

Solve Security Early!

Security is about much more than authenticating the party making the HTTP request. In most cases, you will want to authenticate both the application interacting with your API, as well as the actual human user, and those are very different things. A number of options exist which may be suitable, including three-legged OAuth, but you probably don’t want to implement OAuth 2.0 from scratch. API Management tools (e.g. Apigee, Layer7, Vordel) can help tremendously, but not thinking about this up-front can cause a lot of rework in later iterations. Which leads me to my next point:

Make Some Design Decisions Upfront

These days, no hip developer or architect wants to be accused of doing BDUF (Big Design Up-Front, generally considered to be A Bad Thing). However, postponing some decisions with potentially far-reaching consequences can cause technical or design debt to amass, and then hit you in a later iteration when you don’t expect it – like the security example above.

We initially implemented a hack using a custom HTTP header in an early iteration, but then didn't have the time to fix this. As a result, this header has to be present (and contain appropriate values) in order for the API to provide any success responses. Developers using the API should of course read the documentation which says as much, but that’s out-of-band information, which is also generally A Bad Thing in REST.

Read The Fine Standards

Read any applicable standards properly. I skimmed some stuff, and as a result designed an API which violates a recommendation in RFC5988 to use URLs for custom link rel names.

So instead of:

we now have

It seems like a small thing, but “foobar” is not part of the IANA link relations registry (yet?), so API clients won’t know what to do with it. If I had used a URL, then at least a client developer could punch that into a browser and hope to get some documentation. Another example of how to reduce out-of-band information and have a self-documenting API! It also still bugs me every time I look at it (ain’t hindsight great!), and would be very high on the list of changes for v2 of the API.
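To make the difference concrete, here is a rough sketch of the two styles of link relation in an RFC 5988 Link header. This is not the example from our API; the URLs and the "dispute" relation are made up purely for illustration.

```python
def link_header(target: str, rel: str) -> str:
    """Format a single RFC 5988 Link header value."""
    return f'<{target}>; rel="{rel}"'


# A registered relation type such as "next" is a plain token.
registered = link_header("http://api.example.com/v1/invoices?page=2", "next")

# An extension relation type is a URI, so a curious client developer
# can paste it into a browser and (hopefully) find documentation.
# Both URLs here are hypothetical.
extension = link_header(
    "http://api.example.com/v1/invoices/42/dispute",
    "http://api.example.com/rels/dispute",
)

print("Link: " + registered)
print("Link: " + extension)
```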

API Versioning

While not properly puritanical*, version IDs are useful to avoid or postpone refactoring and rework caused by non-backwards-compatible changes. The Apigee Best Practices guide recommends something simple like “v1” at the root of the API’s URL space, which to me seems pleasant enough.

* URLs identify resources. Resources are conceptual – they don’t have versions in the same way that the concept of a dog doesn’t have a version. The representation of a dog however can have many different versions. For example, one version of an image/jpeg representation of a dog could be a dusty scan of an old photo, while another version could be an improved high-res digital image.

Lay the Foundations for Caching

Most important for the foundation is some “version” indicator. A timestamp on the server should be hashed into a UUID. Raw DateTime stamps are better than nothing, but UUIDs are better as they are less ambiguous.

(UUIDs look like an opaque string, so nobody’s going to be tempted to parse them. Version IDs which happen to look like a dateTime are still version IDs and parsing them as dates will just end in tears for someone at some point. The same goes for integers – at some point someone is going to try to increment them in the client and then wonder why things break).

If you don’t have a version ID, it’s probably best to somehow add one into the backend system’s data model if you think you might ever want to cache or edit those representations.
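As a minimal sketch of the "hash a timestamp into a UUID" idea, assuming the backend can supply a last-changed timestamp per resource: a name-based UUID (version 5) gives an opaque, repeatable version ID. The namespace URL, the function name and the key format are all my own inventions for this illustration.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical namespace for this API's version IDs; any fixed UUID would do.
VERSION_NAMESPACE = uuid.uuid5(
    uuid.NAMESPACE_URL, "http://api.example.com/v1/versions"
)


def version_id(resource_key: str, last_changed: datetime) -> str:
    """Derive an opaque, repeatable version ID from a change timestamp."""
    name = f"{resource_key}|{last_changed.isoformat()}"
    return str(uuid.uuid5(VERSION_NAMESPACE, name))


changed = datetime(2012, 6, 3, 9, 30, tzinfo=timezone.utc)
first = version_id("order/12345", changed)
later = version_id("order/12345", changed.replace(minute=31))

print(first)
print(later)
```

The same resource and timestamp always yield the same ID, a later change yields a different one, and nothing about the string invites a client to parse it as a date or increment it.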

Caching Properly is Hard!

It is remotely possible that in a moment of weakness and youthful ignorance, I might once have implied that caching was free. Well, I exaggerated for presentation impact, but TINSTAAFL still applies. The tools and infrastructure might be open-source, mature, rock-solid and widely implemented, but it costs time to properly design and build a cacheable API.

Once you have version IDs, you’ll want to use them in the ETag HTTP header. An ETag is HTTP’s way of expressing version IDs in an HTTP header regardless of what format the payload is in, and they should be unique for a domain, so having decided on UUIDs earlier is starting to pay off already 😉

HTTP supports a whole raft of conditional requests to optimize the performance of GET requests and maximize the cache hit ratios. Without this, the internet would probably not have been able to be scaled to what we have today!

This is what a conditional GET could look like:

>>> GET /order/12345 HTTP/1.1

>>> If-None-Match: "82b6bc44-0914-c012-97b9-9b4bca641a5d"

<<< HTTP/1.1 304 Not Modified

(hooray, we’ve saved some bandwidth and hopefully also response time!)

or alternatively:

<<< HTTP/1.1 200 OK

<<< ETag: "550e8400-e29b-41d4-a716-446655440000"

<<< … insert payload here

(i.e. the server’s representation of that order has changed, here’s the new one and its ETag)
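On the server side, the decision between those two responses can be sketched roughly as follows. This is a simplified illustration rather than our actual handler: the function name and return shape are made up, and it only handles strong entity tags (HTTP requires the tag value itself to be a quoted string).

```python
def conditional_get(request_headers: dict, current_etag: str, body: bytes):
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    quoted = f'"{current_etag}"'
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The header may list several entity tags, or "*" for "any version".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or quoted in candidates:
            # The client's copy is current: send headers only, no payload.
            return 304, {"ETag": quoted}, b""
    # No condition, or the client's copy is stale: send the full representation.
    return 200, {"ETag": quoted}, body


fresh = conditional_get({"If-None-Match": '"abc"'}, "abc", b"order data")
stale = conditional_get({"If-None-Match": '"old"'}, "abc", b"order data")
print(fresh[0], stale[0])
```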

To further complicate things, there are other caching-related headers which can be combined for tricky stuff. “Vary” for example, but I won’t go into details here or this blog will never end.

Luckily we don’t have to solve all of this up front. With HTTP, the server is in charge of the interaction with the client, so if it gets a conditional request which it doesn’t understand, it has options like the following:

<<< HTTP/1.1 412 Precondition Failed

<<< Content-Type: text/plain

<<< Sorry, this has not been implemented yet.

Be Careful when Editing!

Of course version IDs and conditional requests are not restricted to GET operations either. If your API supports editing of any kind, then I would consider it essential to use conditional PUT requests to push updates onto the server. The ETag the client got from its initial GET would be used in an “If-Match” header, much like the “If-None-Match” header in the example above. This solves two problems at once:

Changing outdated representations

Thanks to Murphy’s Law, it’s often safe to assume that at some point, a client will try to perform an update using a stale representation retrieved from the cache. Of course the client has no way of knowing whether it was served directly from the backend or any intermediate caches such as the ICM cache, reverse proxies and the like. Such scenarios could result in inconsistent data if the server does not have version IDs available to compare between the incoming request and the version in its persistent store. Conditional requests which use the version ID held by the client as a requisite condition are an effective way of preventing these problems.

Preventing Race Conditions

This is really just a variation on the first problem, in that the second edit PUT request is being performed against an outdated version from the server’s point of view. Since it’s basically certain that a race condition will occur at some point in an application used by more than one person, this is a great side-effect of having thought about version IDs up-front even if the initial version of the API only supports data retrieval through GET.
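Both problems come down to the same check on the server. Here is a minimal in-memory sketch of it, assuming every stored representation carries a version ID; the class and method names are invented for this illustration and the dictionary merely stands in for the real backend.

```python
import uuid


class OrderStore:
    """In-memory stand-in for the backend: key -> (etag, representation)."""

    def __init__(self):
        self._orders = {}

    def create(self, key, representation):
        etag = str(uuid.uuid4())
        self._orders[key] = (etag, representation)
        return etag

    def put(self, key, representation, if_match):
        """Return (status, etag) for a PUT carrying an If-Match version ID."""
        current_etag, _ = self._orders[key]
        if if_match != current_etag:
            # The client edited an outdated copy: refuse with 412.
            return 412, current_etag
        new_etag = str(uuid.uuid4())
        self._orders[key] = (new_etag, representation)
        return 200, new_etag


store = OrderStore()
etag = store.create("order/12345", {"quantity": 1})

# Two clients both hold the same version and both try to update it.
first_status, _ = store.put("order/12345", {"quantity": 2}, if_match=etag)
second_status, _ = store.put("order/12345", {"quantity": 3}, if_match=etag)
print(first_status, second_status)
```

The first update wins and changes the version ID; the second is rejected, so that client has to GET the current representation and try again instead of silently overwriting someone else's change.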

Don’t Partially Update

Tempting as it might be to increase efficiency, reduce network traffic and the like by performing partial updates, don’t do it! It’s just a bad idea. Apart from violating the idempotency constraint of the HTTP PUT verb, it creates far more problems than it solves: dealing with concurrent access becomes difficult and messy; processing requests in sequence becomes critical; we need code to handle duplicate requests, and that’s before we’ve even started to consider how to design JSON or XML structures which allow us to express the difference between “null” and “delete” – is a data element in the request absent because it’s meant to be deleted, or because it’s been omitted for brevity? In a word – messy. :sad:
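A tiny sketch of the “null” versus “delete” ambiguity, using made-up customer data and one arbitrary merge rule (copy every key that is present in the request). Neither the field names nor the rule come from our API.

```python
import json

stored = {"name": "ACME Pty Ltd", "phone": "555-0100", "fax": "555-0199"}

# Two partial updates which differ only in how they treat "fax".
omitted = json.loads('{"phone": "555-0111"}')
explicit_null = json.loads('{"phone": "555-0111", "fax": null}')


def naive_merge(current: dict, patch: dict) -> dict:
    """One possible rule: overwrite every key that appears in the patch."""
    merged = dict(current)
    merged.update(patch)
    return merged


after_omitted = naive_merge(stored, omitted)
after_null = naive_merge(stored, explicit_null)

print(after_omitted["fax"])
print(after_null["fax"])
```

Under this rule the omitted field survives and the null field is blanked, but a different server could just as reasonably treat both the same way. With a full-representation PUT the question never comes up.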

Versions, Versions, Versions

Let me repeat because I think this is really quite important: REST APIs are nice partly because they scale much better than SOAP ever could. The web is essentially a subset of REST with humans as API consumers, so it’s fair to say that REST APIs can be truly web-scale. Having version IDs on representations enables patterns such as optimistic locking and effective caching, and thus minimizes the risk of inconsistent data while also improving performance.

Synthetic Resources

Don't be afraid to create 'synthetic' resources to represent temporary collections of otherwise independent resources, and perform operations on all of them together. This is the first step beyond seeing REST as simply performing CRUD operations on database rows or business objects, and taking an outside-in, resource-oriented view of the problem domain. And this is A Good Thing.

For example, one way of requesting many documents (each identified by its own URL of course) as a single ZIP file is to POST the document URLs to a handler resource. The server then creates a synthetic collection resource and returns a URL to it as a pointer. The client can poll that URL for its result, like this:

>>> POST http://api.acme.com/v1/documentCollection HTTP/1.1

>>> Content-Type: application/json

>>> Accept: application/zip

>>> ["http://acme.com/docs/q8h2", "http://acme.com/docs/sdo8y", "http://acme.com/docs/ok23a", "http://acme.com/docs/1kma6"]

<<< HTTP/1.1 201 Created

<<< Location: http://api.acme.com/collections/d0f84d8a

(the client can now poll this location for the result…)

>>> GET http://api.acme.com/collections/d0f84d8a HTTP/1.1

<<< HTTP/1.1 204 No Content

…and when the server is done generating the content:

<<< HTTP/1.1 200 OK

<<< Content-Type: application/zip

<<< ETag: "f1dbe2f4-830d-49c6-a4a5-9891230e8182"

<<< ...insert payload here

This is just a simple example of this pattern; there are of course other implementation options, including some which make use of existing standard formats such as OpenSearch or HTML forms.

One downside of this pattern is the increased chattiness and thus increased network traffic and latencies. It also prioritises the server's need for workload management over the client's need for timeliness – if creating the collection is computationally expensive, such as rendering PDF documents, then the server can take its time doing this and is not restricted by the client’s HTTP timeout in a simple synchronous exchange. It’s of course no silver bullet, but a design choice which can be appropriate for some cases.
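From the client's point of view, the POST-then-poll exchange can be sketched like this. To keep it self-contained there is no real HTTP involved: FakeCollectionServer is a made-up stand-in which simply reports "ready" after a fixed number of polls, and the payload is a placeholder.

```python
class FakeCollectionServer:
    """Stand-in for the API: the ZIP is 'ready' after a fixed number of polls."""

    def __init__(self, polls_until_ready: int):
        self._remaining = polls_until_ready

    def post_collection(self, document_urls):
        # Mirrors the 201 Created + Location response in the exchange above.
        return 201, "http://api.acme.com/collections/d0f84d8a"

    def get(self, url):
        if self._remaining > 0:
            self._remaining -= 1
            return 204, b""   # still being generated
        return 200, b"PK"     # placeholder for the ZIP payload


def fetch_collection(server, document_urls, max_polls=10):
    """POST the document URLs, then poll the returned location until done."""
    status, location = server.post_collection(document_urls)
    if status != 201:
        raise RuntimeError(f"unexpected status {status}")
    for attempt in range(1, max_polls + 1):
        status, body = server.get(location)
        if status == 200:
            return attempt, body
        # A real client would wait here, ideally backing off between polls.
    raise TimeoutError("collection not ready after max_polls attempts")


attempts, payload = fetch_collection(
    FakeCollectionServer(polls_until_ready=2),
    ["http://acme.com/docs/q8h2", "http://acme.com/docs/sdo8y"],
)
print(attempts, payload)
```

The loop makes the chattiness visible: every extra poll is another round trip, which is the price paid for letting the server work at its own pace.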

Consider the Alternative Dispatcher Layer

Frameworks can greatly simplify things when building RESTful APIs from scratch. SAP’s own NetWeaver Gateway is such a framework, as is the Alternative Dispatcher Layer (aka ADL) managed by dj.adams on the SCN CodeExchange. Frameworks differ in their scope and complexity, and need to be chosen carefully lest they intrude too much into the design, forcing developers to ‘fight’ them. Gateway is arguably one which places quite a lot of restrictions and limitations onto the implementation team, whereas the ADL is designed to be light-weight and flexible. Conversely, it requires developers to write more code than a more fully-featured framework like Gateway. Again, no silver bullet but a design choice.

We did consider an early version of Gateway but ultimately decided against it as it lacked support for crucial features of our API – among them support for representations other than OData and server-driven content negotiation. We probably would have picked ADL had it been publicly available at the start of the project (BDUF!), and retrofitting a framework halfway through was not something any of us wanted to do. There’s always the next API though! :wink:

Developers, Developers, Developers!

Of course, it would not have been possible for me to actually learn any of the above had it not been for a bunch of great developers who turned some of my – sometimes whacky – ideas into specs and working code. Many thanks to andre.olivier2, custodio.deoliveira, brad.pokroy and others; lessons learnt from this implementation were entirely due to faults in my design rather than their implementation of it.

Permalink · ‎06-03-2012

Great content Sasha - just when I thought there couldn't be more - there was another page. Very glad that you have taken the time to share this with us all.

Thanks!

GrahamRobbo · ‎06-04-2012

Great information Sascha - you get my vote for blog of the year. cc alvaro.tejadagalindo3

custodio_deoliveira · ‎06-04-2012

Great stuff Sascha. I only think you should have started with the "developers, developers, developers" bit :cool:

fred_verheul · ‎06-04-2012

Wow Sascha, great informative blog, and on one of my favourite subjects! I really envy you, working on such a project :wink: . Thanks for sharing!

Btw, I second Graham's vote for blog of the year.

sufw · ‎06-04-2012

Haha, would you believe this was the short version?! 😉

sufw · ‎06-04-2012

Well, I wanted to leave the best part until the end. Thank you very much for all your great work on the implementation! Where the rubber hits the road is where it matters - you can't log onto a system on a whiteboard! 🙂

Former Member · ‎06-04-2012

Hi Sascha,

great blog, immediately had to bookmark some of the resources you mentioned.

Regarding caching, I wonder whether Caching Properly is not only Hard, but also whether it is worth it given practices shared over the history of HTTP.

First, to my knowledge quite a lot of web-based "eCommerce" applications try hard to avoid any caching of critical (non-static) resources at all by using all possible means to tell any appliance down the stream not to cache, such as Cache-Control: no-cache, Expires: 0, Pragma: no-cache, Cache-Control: max-age=0 and the like. I believe this has become the customary model since no one wants to trust the caching of the web when it comes to critical resources.

Second, since business entities like orders are not static resources like an HTML file, the final appliance in the request stream, the web server on an ABAP server for instance, can't simply and quickly look up the date of a file on the filesystem; it has to call a handler class on the application server. If orders are not somehow cached in the app server, a full instantiation of the object, or at least a DB read, is necessary to find out whether the object has changed. IMHO, given the time this "bottleneck" application server needs to retrieve the requested information, it doesn't matter much to the total response time whether the request returns the full response or just a 304 (for an average resource, which is usually simple text rather than multimedia). To avoid all potential caching troubles, I'd try to go without caching if no heavy multimedia content (images, videos, other binaries, ...) were involved.

But anyway, very informative blog. Thanks!

sufw · ‎06-04-2012

Hi Fred,

Thank you Fred. I hope you also admire the developers who had to turn all of my crazy ideas into working code - they are the difference between success and failure! :smile:

Sascha

sufw · ‎06-04-2012

Hi Anton,

you know what? I totally agree! Caching isn't always (often?) going to be desirable as many resources in a transactional REST API do change frequently and caching them can lead to undesirable outcomes. We also found that working out a resource's "last modified" timestamp or its ETag is often no less computationally expensive than just retrieving the whole thing from the database.

However, there will be exceptions and I think it's nevertheless important to consider this early on in the API design even if the answer will be not to cache, or to actively prevent caching. I can think of cases where the underlying data structures are composed of header and detail tables and require joins or complex computations to create, and in some of those cases computing an ETag first and checking that may indeed have a good pay-off.

Our API also has a bit of content which is effectively immutable - payment advices or invoices (once finalised) are good examples ripe for long-term caching. But many cases are less clear-cut and may require a trade-off to be made between 100% end to end data consistency, and scalability in a cost-effective manner. Another design choice. In the end we settled on a mix of caching durations (from 0 to long term) as well as a two-tiered setup with the ICM cache handling resources with longer freshness, while an application-side cache based on Ehcache handled shorter-term resources and which could be flushed selectively by the API consumer.

Thank you for the thoughtful response; it's really great to see that there is no single best solution and things depend on our viewpoints.

Sascha

sufw · ‎06-04-2012

Thank you Graham, but I'm pretty sure that this blog has already won.

bradp · ‎06-04-2012

Great blog and brilliant architecture Sascha! It was an awesome experience getting to be involved in this and a huge learning curve. I'm sold on it :smile:

former_member183750 · ‎06-04-2012

Wow. Talk about timely. Many, many thanks for this.

I just started to look at the upcoming RESTful APIs for SAP BI 4.0. One day - soon I hope, I'll be making a similar blog. Doubt it will be as insightful as this, but info is gold these days.

matt_steiner · ‎06-04-2012

Brilliant blog - thanks a ton for sharing. As dagfinn.parnas blogged in his recent post it is quite simple Exposing a REST API from #SAPNWCloud. Have you tried the SAP NetWeaver Cloud platform yet? If not, you should really give it a try... :wink:

Keep up the great work!

sufw · ‎06-06-2012

Hi Brad,

and thank you for making it all happen by turning some scrawling on a whiteboard into working code. Great work mate, and looking forward to the next one!

Sascha

sufw · ‎06-06-2012

Hi Ludek,

don't say that it won't be insightful - I for one have absolutely no idea at all about BI 4, so would definitely be interested in it!

Sascha

sufw · ‎06-06-2012

You mean me write actual code?!? I think you must be mistaking me for Alisdair! 🙂

I think I might be able to hack together some JavaScript by copy/pasting from Google, but Java might be a bit beyond me I'm afraid... :neutral:

qmacro · ‎06-06-2012

Wow this sure is an excellent post and a 'keeper' resource. I'd made some notes the other night but never got round to finalising them; I then realised it was a silly idea to hold back, and that I should just 'release early, release often' with respect to comments too!

I love that you embraced the REST constraints from the start. Especially the one that many people (including me!) have a tendency to leave until later 'because it's hard' - the HATEOAS constraint. I'm wondering whether the introduction of OData to the data landscape will change that, because, while not entirely a RESTful system*, it certainly gives you a lot of HATEOAS out of the box for next-to-nothing.

(* More on that another time, perhaps. I did notice a comment on your Gateway deployment post today that I haven't had time to follow up, that was of a similar nature.)

I really appreciate the pointers to stuff, there are things out there that I was vaguely aware of, but had been pushed down my reading list until now. The Apigee Best Practices guide is coming out of the printer, bathtub-reading-ready, as I type this.

I would like to take a devil's advocate position on your thoughts in the "API Versioning" section, particularly the point about being puritanical. I would suggest that while resources don't have versions, APIs do. So the question to me is "Is a resource the API, or a particular version of the API?" I would say the latter, especially as two or more versions can validly co-exist. And on the co-existence front, I do think there's a difference between a version and a representation; after all, there can be, say, a PDF representation of version 1 of a document, and an HTML representation, and a MS-Word representation of version 2 of the same document. What is / are the resources then? I would say document-v1 and document-v2.

I agree with Anton and you - Caching is hard, and not always the panacea one might imagine. The stuff on ETags reminds me of a little hack I did a while ago - Etag enabled wget.

Also, your comments about partial updates are very interesting and relevant. I certainly agree that they violate PUT's idempotency. But what about POST, though? Rather than the 'classic' POST to a 'collection', semantically you could POST to an 'item' with partial information in the representation. On the one hand, though, when I write that, it doesn't feel right. On the other hand, POSTing to a collection is, at least philosophically, POSTing a partial to that collection resource, i.e. adding just one bit of information to the whole collection. I'm rambling a bit now.

The comment about empty strings made me smile. Partially (geddit) because it's a conversation that I've just had this week with someone, and partially because it reminds me of one of my favourite quotes, this one from the father of Perl, Larry Wall:

"And don't tell me there isn't one bit of difference between space and null, because that's exactly how much difference there is :-)" (see http://en.wikiquote.org/wiki/Larry_Wall#1990)

Cheers

dj

matt_steiner · ‎06-07-2012

Nice to see the godfather of REST in a SAP environment joining the conversation! Guess that means you're now Sir Sascha Wenninger... 🙂

I fully agree that the Apigee API guide is a great read and I recommend to everyone to read it who's planning to develop a RESTish API. Stumbled across it on InfoQ a while back and I loved its simplicity and the great recommendations it contains. Geek porn at its best ...

Looking forward to the upcoming discussions 🙂

qmacro · ‎06-07-2012

Actually I read some of the Apigee guide last night and did choke on the suggestion that people might best stick to only 3 status codes (200, 400 and 500). It did say that if that wasn't enough then go ahead and use some more (e.g. 201) but I did find it a little odd. I can understand somewhat as the document does make a disclaimer about championing 'pragmatic REST' but this is more like 'too-little REST'.

Some of the document's content is very good and does remind me of the great stuff from Joe Gregorio a few years back, too.

matt_steiner · ‎06-07-2012

Hm... It's been a while since I read it, but didn't it say to use the standard codes and provide a fall-back to always return 200 when an optional URL parameter was used (e.g. ommitErrorCodes=true) or the like?

In any case... I think it's a good starting point if you're about to go down that road. You can always make your own rules if you've got good reasons. You know - kind of: there's an exception to every rule.

When I read it - it mostly made sense...