What do people have against URLs?

The web, or rather HTTP, is RESTful, and that should make the web a wonderfully simple thing. Everything is a dereferenceable resource, addressable at a URI. You then have a small set of well-defined operations which can be applied to each resource: HTTP gives you POST, GET, PUT and DELETE, which you can read as CREATE, READ, UPDATE and DELETE. And that’s all you need. So what is it with folk wanting to muck it up?
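To make the mapping concrete, here’s a minimal sketch using Python’s requests library against a hypothetical customers collection – the URL, payloads and the Location header behaviour are illustrative assumptions, not a real API:

```python
import requests

BASE = "http://www.example.com/customers"   # hypothetical collection URL

# CREATE: POST a new customer to the collection
r = requests.post(BASE, json={"name": "Alice"})
new_url = r.headers.get("Location")          # the server says where the new resource lives

# READ: GET the resource at its own URL
r = requests.get(new_url)

# UPDATE: PUT a replacement representation back to the same URL
r = requests.put(new_url, json={"name": "Alice Smith"})

# DELETE: remove the resource
r = requests.delete(new_url)
```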

As Elliotte Rusty Harold puts it, all resources are identified by URLs:

Tagging distinct resources with distinct URLs enables bookmarking, linking, search engine storage, and painting on billboards. It is much easier to find a resource when you can say, “Go to http://www.example.com/foo/bar” than when you have to say, “Go to http://www.example.com/. Type ‘bar’ into the form field. Then press the foo button.”

Do not be afraid of URLs. Most resources should be identified only by URLs. For example, a customer record should have a URL such as http://example.com/patroninfo/username rather than http://example.com/patroninfo. That is, each customer should have a separate URL that links directly to their record (protected by a password, of course), rather than all your customers sharing a single URL whose content changes depending on the value of some login cookie.

Yet despite this there are quite a lot of sites that persist in doing the exact opposite – personalising a site by changing the content available at a URL. When you do that you are no longer designing sites the way the internet was designed to work, because you stop having a resource identified by a URL and instead have something else entirely. You end up trying to make the web a stateful system, and that’s bad news.

The client and server may each have state, but neither relies on the other side remembering what its state is. All necessary information is transferred in each communication. Statelessness enables scalability through caching and proxy servers. It also enables a server to be easily replaced by a server farm as necessary. There’s no requirement that the same server respond to the same client two times in a row.

Robust, scalable web applications work with HTTP rather than against it. RESTful applications can do everything that more familiar client/server applications do, and they can do it at scale. However, implementing this may require some of the largest changes to your systems. Nonetheless, if you’re experiencing scalability problems, these can be among the most critical refactorings to make.
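To make statelessness concrete, here’s a small sketch of a request that carries everything the server needs – the token, validator and URL are hypothetical – so any server in a farm, or an intermediate cache, can answer it without remembering a previous exchange:

```python
import requests

response = requests.get(
    "http://example.com/patroninfo/username",     # the resource's own URL
    headers={
        "Authorization": "Bearer SOME-TOKEN",     # credentials travel with every request
        "If-None-Match": '"abc123"',              # lets a cache or the origin reply 304 Not Modified
    },
)
```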

It just doesn’t make sense to work against the nature of the web – not if you want a site that scales and plays nicely with the rest of the web. Keeping one resource per URL is the way forward, and that includes personalisation – personalise your site at specific URLs such as …/user/me/stuff/.
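As a rough sketch of what that looks like server-side – hypothetical Flask routes and toy data, not any real site’s implementation – each user’s stuff gets its own bookmarkable URL, rather than one shared URL whose content depends on a cookie:

```python
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Toy data: which programmes each user has saved (purely illustrative).
STUFF = {
    "tom": ["/programmes/abc123"],
    "jo": ["/programmes/def456"],
}

@app.route("/user/<username>/stuff/")
def user_stuff(username):
    if username not in STUFF:
        abort(404)
    # One URL per user's collection: bookmarkable, linkable and cacheable.
    return jsonify(items=STUFF[username])
```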

Now failing to implement a RESTful architecture is one thing, but deciding to reinvent HTTP URIs – now that is truly perverse. I’m talking about technologies such as the eXtensible Resource Identifier (XRI) and the Digital Object Identifier (DOI), neither of which makes much sense to me. And when it comes to XRI I’m in good company:

We are not satisfied that XRIs provide functionality not readily available from http: URIs. Accordingly the TAG recommends against taking the XRI specifications forward, or supporting the use of XRIs as identifiers in other specifications.

Tim Berners-Lee and Stuart Williams, co-chairs, W3C Technical Architecture Group

No one has been able to explain why “=tomscott” or “doi:10.1000/182” is easier than a URL, unless of course you’re an i-names provider or a DOI Registration Agency. To my mind neither DOI nor XRI is needed – we have a very respectable, proven technology for identifying resources – and I don’t see any need to invent anything else. Likewise I don’t think we need anything more than the existing HTTP operations.

All we need to do then is adopt these four simple rules (sketched in code after the list):

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information.
  4. Include links to other URIs, so that they can discover more things.
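A minimal sketch of those rules in action – the URL and the JSON shape are assumptions made up for illustration:

```python
import requests

# Rule 2: the name is an HTTP URI, so it can simply be looked up.
doc = requests.get(
    "http://example.com/artists/example-band",   # hypothetical URI
    headers={"Accept": "application/json"},
).json()

print(doc["name"])                        # Rule 3: looking it up returned useful information.

for link in doc.get("links", []):         # Rule 4: the document links on to other URIs...
    related = requests.get(link).json()   # ...which can be dereferenced to discover more things.
```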

Failing to adopt this approach doesn’t destroy anything, but it does mean you’re failing to take advantage of how the internet is built and, as Sir Tim puts it, that means:

…you are missing an opportunity to make your data interconnected. This in turn limits the ways it can later be reused in unexpected ways. It is the unexpected re-use of information which is the value added by the web.

Photo: Colon Slash Slash, by Jeff Smallwood. Used under licence.

Web design 2.0 – it’s all about the resource and its URL

Jamie’s recent post about his work on the design of BBC’s /programmes service highlights an important trend in the design of modern web products. It’s all about the resource and its URL.

Topography thumbnail

Web site design used to be focused on the page and the sitemap – we assumed users would visit a site and browse around a bit while they were there – so sites wrapped their content in heavy branding and optimised their navigation around the ‘left-hand nav’.

This sort of made sense if, as a publisher, you were dealing with a small number of pages, your paradigm for web publishing was the same as print publishing, and your technology was limited to static files. However, it forces users into a ‘wagon wheel’ pattern of navigation: they read something, navigate up to a top-level index page (e.g. news, programmes, a–z &c.) to find something else of interest, then navigate out again to another resource.

Site owners effectively thought of their sites as silos – self-contained objects, webs of pages, with a handful of doors (links) in and out – well, even if they didn’t think of them as silos they sure treated them as such. But as Tom Coates puts it, Web 2.0 is about moving from a “web of pages to a web of data”:

A web of data sources, services for exploring and manipulating data, and ways that users can connect them together.

This has some important implications for the design of web sites. Users expect to be able to navigate directly from resource to resource, from concept to concept. Look at the YouTube design – right there on the right are links to other related videos, and the top-level navigation is lightweight and focused around aggregations or list views.

Lego Darth Vader Canteen Incident

In other words, web 2.0 sites have two main classes of page: primary objects or resources, and aggregation views which give users multiple routes into the same resource. And each resource is located at a single URL. That last point might seem painfully self-evident – ‘one resource = one URL’ – but either it’s not that obvious or it’s difficult to achieve.

Wikipedia are the masters of this – they work to ensure that they have only one entry per concept, which means they have one URL per concept. And that’s a lot better, if harder to achieve, than one URL per resource. On Wikipedia an entry will be moved, merged or split to ensure that it deals with only one concept. But even if your resource deals with more than one concept, it should only be found at one (persistent) URL: so that search engines can index it properly (so people can find it); so people can link to it; and so third-party applications can be integrated with it. Simon Willison explains why, although people may well be able to discern that two URLs probably point to the same thing, machines can’t.
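One practical consequence: when an entry is moved or merged, the old URL should keep working by redirecting permanently to the canonical one. A hypothetical Flask sketch – the routes and mapping are made up:

```python
from flask import Flask, redirect

app = Flask(__name__)

# Toy mapping of entries that have been renamed or merged.
MOVED = {"colour": "color"}

@app.route("/wiki/<title>")
def article(title):
    if title in MOVED:
        # 301 keeps old links, bookmarks and search-engine entries pointing at
        # the one canonical URL for the concept.
        return redirect(f"/wiki/{MOVED[title]}", code=301)
    return f"<h1>{title}</h1>"
```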

Once you have a single resource at a single persistent URL you can start to do some interesting things. You can make that resource available in a variety of different formats, each optimised for a different use: HTML for web browsers, XHTML MP or WML for mobile, JSON for Ajax applications, etc. You can start to expose your data to other web applications, and then you can start to benefit from the network effect.
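Here’s a rough sketch of serving one resource in more than one representation from a single URL, chosen via the Accept header – the route and data are hypothetical, and the same principle extends to the mobile formats:

```python
from flask import Flask, jsonify, render_template_string, request

app = Flask(__name__)

# Purely illustrative resource data.
PROGRAMME = {"title": "An Example Programme", "synopsis": "Made up for illustration."}

@app.route("/programmes/<pid>")
def programme(pid):
    # Pick the best representation the client says it accepts.
    best = request.accept_mimetypes.best_match(["application/json", "text/html"])
    if best == "application/json":
        return jsonify(PROGRAMME)                       # JSON for Ajax and other machine clients
    return render_template_string(                      # HTML for ordinary browsers
        "<h1>{{ p.title }}</h1><p>{{ p.synopsis }}</p>", p=PROGRAMME)
```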

One consequence of a network effect is that the purchase of a good by one individual indirectly benefits others who own the good — for example by purchasing a telephone a person makes other telephones more useful.

Making your site and its data available in this way means that others will be able to link to your resources to help give context to their content. And if you’ve done your job well they won’t even need to manually ‘link’ to it – if you happen to be using a common set of identifiers and vocabularies then web applications can do a lot of the work for you.

If, however, you start off by thinking about web pages, site maps, left-hand navigation and visual design, you will very quickly photoshop yourself into a corner. You will find it difficult to create a web of data; instead you will end up with a bag of pages, others won’t be able to integrate with your data, and your relevance on the web will fall or fail to start.

And if you are after a case study of what to do, you should have a look at Matt’s presentation at FOWA on the development of Dopplr.