What do people have against URLs?

The web, or rather HTTP, is RESTful, and that should make the web a wonderfully simple thing. Everything is a dereferenceable resource, addressable at a URI. You then have a small set of well-defined operations that can be applied to each resource. HTTP gives you POST, GET, PUT and DELETE, which you can read as CREATE, READ, UPDATE and DELETE. And that’s all you need. So what is it with folk wanting to muck it up?
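As a rough illustration of that mapping, here is a minimal Python sketch of the four operations applied to resources held in memory. The `resources` store, the `handle` function and the status codes chosen are assumptions for illustration only; the sketch follows the simplified POST-as-create, PUT-as-update reading above, which real HTTP servers do not always observe.

```python
# Hypothetical in-memory store: maps a URI path to its representation.
resources = {}


def handle(method, uri, body=None):
    """Apply one HTTP operation to the resource at `uri`.

    Returns a (status_code, body) tuple.
    """
    if method == "POST":  # CREATE
        if uri in resources:
            return 409, None  # already exists
        resources[uri] = body
        return 201, body
    if method == "GET":  # READ
        if uri not in resources:
            return 404, None
        return 200, resources[uri]
    if method == "PUT":  # UPDATE
        if uri not in resources:
            return 404, None
        resources[uri] = body
        return 200, body
    if method == "DELETE":  # DELETE
        if uri not in resources:
            return 404, None
        del resources[uri]
        return 204, None
    return 405, None  # any other method is not allowed
```

Every call names the resource by its URI and says which of the four operations to apply; nothing else is needed to address it.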

As Elliotte Rusty Harold puts it, all resources are identified by URLs:

Tagging distinct resources with distinct URLs enables bookmarking, linking, search engine storage, and painting on billboards. It is much easier to find a resource when you can say, “Go to http://www.example.com/foo/bar” than when you have to say, “Go to http://www.example.com/. Type ‘bar’ into the form field. Then press the foo button.”

Do not be afraid of URLs. Most resources should be identified only by URLs. For example, a customer record should have a URL such as http://example.com/patroninfo/username rather than http://example.com/patroninfo. That is, each customer should have a separate URL that links directly to their record (protected by a password, of course), rather than all your customers sharing a single URL whose content changes depending on the value of some login cookie.
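To make the distinction concrete, here is a small Python sketch of the one-URL-per-customer approach. The `RECORDS` data, the function names and the status codes are hypothetical; the point is only that the URL selects the resource, while the credentials decide whether access is allowed.

```python
# Hypothetical customer records keyed by username.
RECORDS = {"alice": {"name": "Alice"}, "bob": {"name": "Bob"}}

PREFIX = "http://example.com/patroninfo/"


def record_url(username):
    """Each customer gets their own URL."""
    return PREFIX + username


def get_record(url, authenticated_user):
    """Resolve a per-customer URL to a (status_code, record) tuple.

    The URL alone identifies the record. The authenticated user is
    used only for the access check, never to choose the content.
    """
    if not url.startswith(PREFIX):
        return 404, None
    username = url[len(PREFIX):]
    if username not in RECORDS:
        return 404, None
    if authenticated_user != username:
        return 403, None  # right URL, wrong person
    return 200, RECORDS[username]
```

With this shape, `get_record(record_url("alice"), "bob")` is refused rather than silently returning Bob's record, so the same URL never means two different things.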

Yet despite this, quite a lot of sites persist in doing the exact opposite – personalising a site by changing the content available at a URL. When you do that you are no longer designing sites the way the internet was designed to work: you stop having a resource identified by a URL and instead have something else entirely. You end up trying to make the web a stateful system, and that’s bad news.

The client and server may each have state, but neither relies on the other side remembering what its state is. All necessary information is transferred in each communication. Statelessness enables scalability through caching and proxy servers. It also enables a server to be easily replaced by a server farm as necessary. There’s no requirement that the same server respond to the same client two times in a row.

Robust, scalable web applications work with HTTP rather than against it. RESTful applications can do everything that more familiar client/server applications do, and they can do it at scale. However, implementing this may require some of the largest changes to your systems. Nonetheless, if you’re experiencing scalability problems, these can be among the most critical refactorings to make.
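The statelessness described in the quote can be sketched in a few lines of Python. The `Server` class, the request dictionary and the page size are all assumptions made for illustration; what matters is that each request carries everything needed to answer it, so any server in the farm gives the same response.

```python
class Server:
    """A hypothetical stateless server: it keeps no per-client session."""

    PAGE_SIZE = 2

    def __init__(self, name, catalogue):
        self.name = name
        self.catalogue = catalogue  # resource data, not client state

    def handle(self, request):
        # Everything needed is in the request itself: the path and the
        # page number. Nothing is remembered from earlier requests.
        items = self.catalogue.get(request["path"])
        if items is None:
            return {"status": 404, "body": None}
        start = (request.get("page", 1) - 1) * self.PAGE_SIZE
        return {"status": 200, "body": items[start:start + self.PAGE_SIZE]}


catalogue = {"/books": ["a", "b", "c", "d"]}
farm = [Server("s1", catalogue), Server("s2", catalogue)]

# The same request can be sent to either server with the same result.
request = {"path": "/books", "page": 2}
responses = [server.handle(request) for server in farm]
```

Had the page position lived in a server-side session instead of the request, the second server would have had no way to answer correctly.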

It just doesn’t make sense to work against the nature of the web – not if you want a site that scales and plays nicely with the rest of the web. Keeping one resource per URL is the way forward and that includes personalisation – personalise your site at specific URLs such as …/user/me/stuff/.

Now failing to implement a RESTful architecture is one thing, but deciding to reinvent HTTP URIs – now that is truly perverse. I’m talking about technologies such as eXtensible Resource Identifier (XRI) and Digital Object Identifiers (DOI), neither of which makes much sense to me. And when it comes to XRI I’m in good company:

We are not satisfied that XRIs provide functionality not readily available from http: URIs. Accordingly the TAG recommends against taking the XRI specifications forward, or supporting the use of XRIs as identifiers in other specifications.

Tim Berners-Lee and Stuart Williams, co-chairs, W3C Technical Architecture Group

No one has been able to explain why “=tomscott” or “doi:10.1000/182” is easier than a URL, unless of course you’re an i-names provider or DOI Registration Agency. To my mind neither DOI nor XRI is needed – we have a very respectable, proven technology to identify resources, and I don’t see any need to invent anything else. Likewise I don’t think we need anything more than the existing HTTP operations.

All we need to do then is adopt these four simple rules:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information.
  4. Include links to other URIs, so that they can discover more things.
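Here is a minimal Python sketch of what the four rules buy you. The `WEB` dictionary stands in for real HTTP lookups, and its URIs and labels are invented for illustration: every thing is named by an HTTP URI, looking it up returns something useful, and the links let a client discover the rest.

```python
# Hypothetical linked data: each HTTP URI names a thing (rules 1 and 2),
# returns useful information (rule 3) and links onward (rule 4).
WEB = {
    "http://example.com/people/tom": {
        "label": "Tom",
        "links": ["http://example.com/posts/urls"],
    },
    "http://example.com/posts/urls": {
        "label": "What do people have against URLs?",
        "links": [
            "http://example.com/people/tom",
            "http://example.com/topics/rest",
        ],
    },
    "http://example.com/topics/rest": {"label": "REST", "links": []},
}


def look_up(uri):
    """Stand-in for an HTTP GET on `uri`."""
    return WEB[uri]


def discover(start):
    """Follow links breadth-first from `start`, returning URIs in the order found."""
    seen = [start]
    queue = [start]
    while queue:
        uri = queue.pop(0)
        for link in look_up(uri)["links"]:
            if link not in seen:
                seen.append(link)
                queue.append(link)
    return seen
```

Starting from a single URI, `discover` reaches every linked resource without knowing anything about them in advance.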

Failing to adopt this approach doesn’t destroy anything, but it does mean you’re failing to take advantage of how the internet is built and, as Sir Tim puts it, that means:

…you miss an opportunity to make your data interconnected. This in turn limits the ways it can later be reused in unexpected ways. It is the unexpected re-use of information which is the value added by the web.

Photo: Colon Slash Slash, by Jeff Smallwood. Used under licence.

Comments

  1. Hi Tom, Interesting post which I found thanks to Bob Dylan at the BBC.

    If I’ve understood you right, you’re confusing Identifiers (URIs) and Names (URNs), which sometimes are the same thing, other times not.

    If you say everybody should just “Use HTTP URIs so that people can look up those names” then you force people to tightly couple the names they give things with the identifiers they use for them.

    I reckon sometimes this is fine, but other times it’s a bad idea. Consider ISBNs, which are URNs that can be used to identify books, completely independently of HTTP.

    urn:isbn:1844138518 is a name which can obviously have many different URIs. Despite this, it is a perfectly good name that uniquely identifies a book by Chris Anderson.

    The same is true for DOIs, urn:doi:10.1016/S1535-6108(02)00133-2 is a unique name for an interesting paper in biology.

    So if you insist that all URIs should be HTTP URIs, (rather than sometimes just humble URNs), you are forcing people to tightly couple their naming and identity schemes to HTTP, even when it might not be appropriate for them to do so.

    Just my €0.02!

  2. Duncan

    For sure, if you are using a different platform then HTTP URIs won’t be appropriate – as you say, ISBN is an appropriate URN for printed books.

    But if the resource is available on the web, i.e. over HTTP, then I see no good reason to use any identifier other than an HTTP URI.

    DOIs can’t easily be dereferenced even if they identify a web resource – why add another level of indirection? The same goes for XRI.

  3. I see, I suppose what I’m trying to say is, URNs are useful when you want to give something a unique but location-independent name.

    There are plenty of examples where this is useful

    ISBNs
    DOIs
    Life Science Identifiers (LSIDS)

    In many cases they can be magically transformed from URNs to HTTP URIs if required.

    e.g.

    amazon.co.uk/exec/obidos/ASIN/1844138518

    http://dx.doi.org/10.1016/S1535-6108(02)00133-2

    So URNs needn’t be HTTP URIs, but the two can happily co-exist without breaking any of the web’s magical goodness…
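The transformation mentioned in this comment can be sketched in Python as a lookup table from URN namespace to HTTP prefix. The `RESOLVERS` table and the `urn_to_http` function are hypothetical; the prefixes follow the two example URLs above, and the "http://" scheme on the Amazon prefix is an assumption, since the example omits it.

```python
# Hypothetical resolver table: URN namespace -> HTTP URI prefix.
# Prefixes follow the two examples in the comment; the "http://" scheme
# on the Amazon prefix is an assumption.
RESOLVERS = {
    "isbn": "http://amazon.co.uk/exec/obidos/ASIN/",
    "doi": "http://dx.doi.org/",
}


def urn_to_http(urn):
    """Turn a URN such as 'urn:isbn:1844138518' into an HTTP URI.

    Raises ValueError for strings that are not URNs and for
    namespaces the table does not know about.
    """
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError("not a URN: " + urn)
    namespace, name = parts[1].lower(), parts[2]
    if namespace not in RESOLVERS:
        raise ValueError("no resolver for namespace: " + namespace)
    return RESOLVERS[namespace] + name
```

Note that every namespace needs its own entry in the table, so software that meets an unfamiliar scheme has nothing to dereference.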

  4. Pete,

    I agree.

    Yes, you can use DOI’s or LSID’s, but then all the programs that process them have to know how to dereference them. Web crawlers and other software will need to be able to recognize that the identifier is a DOI and then use DOI-specific code to dereference it. There is a similar problem with LSID’s.

    Are those who advocate DOI’s and LSIDs promising to write and maintain this code for all the semantic web libraries in use?

    I think that there is confusion about URI’s because of their similarity to URL’s. They are the same as DOI’s and LSID’s in that they are globally unique strings of characters. They are different and preferable because they are relatively well supported and understood.

    There are other existing ways to deal with the issues of server persistence and location independence.
    The simplest way is to make your data (or part of it) available as an RDF dump.
    My URI’s work just fine on the Linked Open Data cloud even if my server is down.
