What do people have against URLs?

The web or rather HTTP is RESTful and as a result it should make the web a wonderfully simple thing. Everything is a deferencable resource, addressable at a URI. You then have a small set of well defined operations which can be applied to each resource, HTTP gives you: POST, GET, PUT and DELETE which you can read as CREATE, READ, UPDATE and DELETE. And that’s all you need. So what is it with folk wanting to muck it?

As Elliotte Rusty Harold puts it, all resources are identified by URLs:

Tagging distinct resources with distinct URLs enables bookmarking, linking, search engine storage, and painting on billboards. It is much easier to find a resource when you can say, “Go to http://www.example.com/foo/bar” than when you have to say, “Go to http://www.example.com/. Type ‘bar’ into the form field. Then press the foo button.”

Do not be afraid of URLs. Most resources should be identified only by URLs. For example, a customer record should have a URL such as http://example.com/patroninfo/username rather than http://example.com/patroninfo. That is, each customer should have a separate URL that links directly to their record (protected by a password, of course), rather than all your customers sharing a single URL whose content changes depending on the value of some login cookie.

Yet despite this there are quite a lot of sites that persist in doing the exact opposite – personalising a site by changing the content available at a URL. And when you do that you are nolonger designing sites the way the internet was designed to work because you stop having a resource identified by a URL and instead have something else entirely, you end up trying to make the web a statefull system and that’s bad news.

The client and server may each have state, but neither relies on the other side remembering what its state is. All necessary information is transferred in each communication. Statelessness enables scalability through caching and proxy servers. It also enables a server to be easily replaced by a server farm as necessary. There’s no requirement that the same server respond to the same client two times in a row.

Robust, scalable web applications work with HTTP rather than against it. RESTful applications can do everything that more familiar client/server applications do, and they can do it at scale. However, implementing this may require some of the largest changes to your systems. Nonetheless, if you’re experiencing scalability problems, these can be among the most critical refactorings to make.

It just doesn’t make sense to work against the nature of the web – not if you want a site that scales and plays nicely with the rest of the web. Keeping one resource per URL is the way forward and that includes personalisation – personalise your site at specific URLs such as …/user/me/stuff/.

Now failing to implement a RESTful architecture is one thing but deciding to reinvent HTTP URIs – now that is truly perverse. I’m talking technologies such as about eXtensible Resource Identifier (XRI) and Digital Object Identifiers (DOI). Neither of which make much sense to me. And when it comes to XRI I’m in good company:

We are not satisfied that XRIs provide functionality not readily available from http: URIs. Accordingly the TAG recommends against taking the XRI specifications forward, or supporting the use of XRIs as identifiers in other specifications.

Tim Berners-Lee and Stuart Williams, co-chairs, W3C Technical Architecture Group

No one has been able to explain why “=tomscott” or “doi:10.1000/182” is easier than a URL, unless of course you’re an i-names provider or DOI Registration Agency. In my mind neither DOI and XRI are needed – we have a very respectable, proven technology to identify resources – I don’t see any need to invent anything else. Likewise I don’t think we need anything more that the existing HTTP operations.

All we need to do then is adopt these four simple rules:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information.
  4. Include links to other URIs. so that they can discover more things.

Failing to adopt this approach doesn’t destroy anything, but it does mean you’re failing to take advantage of how the internet is built and, as Sir Tim puts it that means:

…you are misses an opportunity to make your data interconnected. This in turn limits the ways it can later be reused in unexpected ways. It is the unexpected re-use of information which is the value added by the web.

Photo: Colon Slash Slash, by Jeff Smallwood. Used under licence.