The web as an ethical layer

If you’ve been around the web for any length of time you’ve probably seen a diagram similar to this:

It’s the classic internet hourglass with signal carriers down the bottom, IP in the middle and applications up top. You can see the World Wide Web perched atop HTTP, one more technical layer in a technical layer cake.

It’s maybe because I’m not that technical but I’ve never really seen the web as a technical layer on top of the internet. In terms of technical design there’s not that much there. The design decisions of the web always seemed more political / ethical than pure technical. So at least in my opinion the web is a political / ethical layer above the internet.

As Ben Ward recently pointed out, we tend to obsess over new standards (OAuth, OpenID, Contacts, Connect, Geolocation, microformats, widgets, AJAX, HTML5, local storage, SPDY, ‘The Cloud’) and lose track of what the web is actually for. In Ben’s words: “articles and poems and pictures and movies and music, everywhere! How brilliant is that!” Or, put in my simple terms, the point of the web is universal access to information. Everything else is just window dressing and mostly leads to restrictions. I think just about every blog post I’ve written includes this quote from Tim Berners-Lee. Now doesn’t seem like a good time to break that habit so:

The Web is designed as a universal space. Its universality is its most important facet. I spend many hours giving talks just to emphasize this point. The success of the Web stems from its universality as do most of the architectural constraints.

And that’s more important than any tech spec. The web isn’t politically / ethically neutral and wasn’t designed by people who are / were politically / ethically neutral. Which is why the most important design decision of the web was statelessness and the most important architectural style is REST. Statelessness means everyone has equal access to information regardless of age or gender or ethnic background or physical location or physical ability etc etc etc. Because the web doesn’t care about who you are, only what you asked for.

Which is also why accessibility really matters. Anything that restricts access to information to any one group is bad. Which means accessibility also means mobile views (because that’s the main access point for many people in “less developed” countries) and data views (because for some people the access they want is to the raw data).

And it’s why anything that attempts to impose state on top of the web is, in general, bad. It just adds friction and any friction reduces people’s access to information. So walled gardens, paywalls, anything that requires you to log in, anything that forces you to accept cookies, anything that needs to know something about you before it gives you information.

At the risk of descending properly into freetard territory the other great thing about the web is once you’ve found what you’re looking for nothing is locked down (other than a few clumsy attempts at DRM). More or less anything you find (text, images, a/v files) can be taken away and played with and recontextualised and republished and taken again…

Which sometimes is bad. Like when someone posts a picture of their friend pulling a silly face to Flickr. And fails to understand licensing and makes it available for commercial use. And some company takes it, adds a demeaning strapline and posts it on billboards across Australia, causing some degree of pain and distress.

But more often it’s a good thing. Because I could search for the TimBL quote above, find it, copy it and paste it in here. And when my daughter does her homework (actually she’s 4 so doesn’t really have any yet) she can go to the web and take a picture and paste it into the story she’s writing. And sometimes she’ll probably steal and sometimes probably give credit but in general what you can find you can borrow and take into your real life and reshape and recontextualise and make new meanings. And that’s good.

And there’s other ways it’s good. The election day Sun front page barely left the presses before pictures of it were winging round the web. And then people took that image, downloaded it (because the web makes that easy – it didn’t have to but it does), modified it and uploaded new versions. Which people commented on and talked about so more people made more versions and talked more about press bias and made jokes. And I think that’s healthy.

There have been occasional attempts to fragment the web. To create an academic space or a commercial space or a copyleft space or a ‘safe’ space. Apple’s shiny iThing app store model is just the latest attempt. Usually the motivations have been honourable. But the effect is always to create something that’s less free than the open web; to take a public space and turn it into a policed enclosure. Or maybe like a public space in the same way a shopping mall might be thought to be a public space but is owned and controlled and often privately policed. Policing access is dangerous because it removes universality. And policing re-contextualisation is dangerous because it takes away the right to fair usage (my daughter’s homework…). But the people who really do want to steal will always find a way round any form of rights restriction that’s embodied in code and not in community norms. So you punish the “fair users” in an attempt to restrict the real “criminals” who get round the restrictions anyway. And end up building something that just frustrates.

So I really think the web (not the internet which is really just some pipes) is the greatest thing we’ve ever created. More than telly, more than radio, more than newspapers, more than books. Because it’s universal and because it’s open for reuse.

But there are problems. Anything that requires a computer and a phone line (or at least a web capable mobile) can’t quite be universal unless everyone has those things (or lives in a community with shared access to those things.) There’s a lot of talk about digital inclusion, about taxes to fund broadband and about universal access to the web. But it all misses the point. It was never just about having access to other people’s information. It was always about everybody, everywhere having the ability to add their thoughts, the things they know, to the web. Treating digital inclusion as a question of connecting pipes to homes is an easy mistake to make because it follows established patterns of water and gas and electricity and television aerials. But the web was never designed to be a broadcast / distribution mechanism. Digital inclusion doesn’t just mean everyone needs to have a receiver on their roof; it means they need access to a transmitter too. Without the ability to transmit, to publish, people just become passive consumers of other people’s information. And digital inclusion has to include the ability to produce as well as consume.

So physical access is only the first hurdle. Once you’re over that, the barrier to publishing is still too high. Owning your own publishing space means you have to start understanding domain names and DNS and server set ups and code installs and updates. Which for most people is just too difficult. It’s certainly too difficult for me which is why I end up publishing this here (wherever here turns out to be). Luckily “social media” sites arrived to fill the skills gap. But social media is a bit of a misnomer. The web was always supposed to be social and always meant to be open to contributions from everyone. The innovation of social media wasn’t really socialness. From Flickr to WordPress to Blogger to YouTube to Twitter the real innovation was the commoditisation of publishing technology. Now everyone could share what they knew. But at a price.

The most obvious price of commodity publishing is loss of control over your content. In almost all cases the hosting organisation will take a permissive licence on your content:

a worldwide, non-exclusive, royalty-free, transferable licence (with right to sub-licence) to use, reproduce, distribute, prepare derivative works of, display, and perform that User Submission in connection with the provision of the Services and otherwise in connection with the provision of the Website and YouTube’s business, including without limitation for promoting and redistributing part or all of the Website (and derivative works thereof) in any media formats and through any media channels

where for YouTube you can pretty much substitute any website with user submissions from Facebook to the BBC. It means you retain copyright but we give ourselves so many rights your copyright is virtually useless. Content acquisition on the cheap. It’s a bigger problem than digital literacy because there’s no point educating people about the issues if they still can’t publish and avoid them.

The second major problem is privacy. You’d have had to be living life under stones to not notice that privacy has become the big issue of the year. Facebook in particular have gotten regularly flamed for their ever decreasing privacy circle. Now they’re stepping outside the realms of knowing about your social network, your status and your photos and attempting to own the graph of what you like from elsewhere on the web. There are, as ever, arguments on both sides but the only one really worth reading is danah boyd’s Privacy and Publicity in the Context of Big Data. There’s too much in there to really sum up in a one liner but my attempt would be: privacy issues aren’t about how much information you share; they’re about the gap between your perception of the context of sharing and the reality. Extrapolating from that, once you trust your personal information to “the cloud” you lose control over the context of use. Your data can be meshed with other data in ways you didn’t even begin to anticipate. And the rules around context can be nudged in whatever direction most benefits the cloud service.

It’s like building a giant Tesco loyalty card in the sky. Clive Humby, chairman of Dunnhumby (the people who run the real Tesco Clubcard), once said:

credit-card data tells you how they live generally, the supermarket data tells you their motivations, the media data tells you how to talk to them. If you have those three things, you’re in marketing nirvana

The social media “cloud” seems uncomfortably like Mr Humby’s dream web. And unlike Tesco it doesn’t even pay you for your data. Obviously there are worse fates than being the target of one of Clive’s targeted mail drops. Liberal democracies tend to assume they’ll always be liberal democracies. History seems to suggest otherwise. If the worst were to happen do you really want all that personal data out there outside your control? You might end up with more to worry about than whether your prospective boss sees you drunk on Facebook.

Is Clive’s web the one we really want to build? Or is there a fairer, more distributed solution that allows everyone to share the things they know on their terms? With the power to publish, redact, edit… I’m probably in danger of jumping on Steven Pemberton’s bandwagon (who’s been saying this for several years now) but until everyone owns and controls their own publishing space we won’t really have built the web. And (with my day job hat on) until “the public” can “broadcast” without fear or favour we won’t really have built public service broadcasting.

I’ll leave with this:

It’s the original logo for the World Wide Web drawn by its co-creator Robert Cailliau. Until Dan Brickley pointed me at it I wasn’t even aware of its existence. The most important point is it doesn’t attempt to qualify the ‘us’; it just means everyone.

In my dream world everybody working with the web in any capacity would have this stapled above their desk. So when all the talk of product planning and sprint planning and deployment and test driven development and check ins and check outs and branded experience and user stories gets too tiring you can look up and remember why we’re doing this.

The problem with breadcrumb trails

The other day I was chatting with some of the designers at work about secondary navigation and the subject of breadcrumb trails came up. Breadcrumb trails are those bits of navigation summed up by Jakob Nielsen as:

a single line of text to show a page’s location in the site hierarchy. While secondary, this navigation technique is increasingly beneficial to users.

and illustrated on Wikipedia by:

Home page > Section page > Subsection page

For reasons which will hopefully become clear the whole subject of breadcrumb trails vexes me and rather than shout into Twitter I thought I’d type up some thoughts so here goes.

Types of breadcrumb trail

There are 2 main types of breadcrumb trail:

  • path based trails show the path the user has navigated through to arrive at the current page
  • location based trails show where the page is located in the ‘website hierarchy’

Both of these are problematic so let’s deal with each in turn.

Path based breadcrumb trails

The first thought most people have when confronted by the concept of a breadcrumb trail is Hansel and Gretel. In the story the children were led into the forest and as they walked dropped a trail of bread crumbs. The intention was to retrace their steps out of the forest by following the trail of breadcrumbs (at least until the birds ate them).

The important point is that Hansel and Gretel weren’t conducting a topographical study of the forest. The trail they laid down was particular to their journey. If Alice and Bob had been wandering round the same forest on the same day they might have left a trail of cookie crumbs or even traced out their journey with string. The 2 journeys might have crossed or merged for a while but each trail would be individual to the trail maker(s).

The path based breadcrumb trail is the same principle but traced out in pages the user has taken to get to the page they’re on now. So what’s wrong with that?

If you’re a user experience person you’ve probably heard developers talking about REST and RESTful APIs and possibly thought REST was just techy stuff that doesn’t impact on UX. This would be wrong. From a developer point of view REST provides an architectural style for working with the grain of the web. And the grain of the web is HTTP and HTTP is stateless.

So what does that mean? It means when you ask for a page across the web the only data sent in the request is (HTTP) get me this resource and a tiny bit of incidental header information (what representation / format you want the resource in – desktop HTML, mobile HTML, RSS; which languages do you prefer etc). When the server receives the request it doesn’t know or need to know anything about the requester. In short HTTP does not know who you are, does not know ‘where’ you are, does not care where you’ve come from.
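To make that a bit more concrete, here’s a rough sketch in Python of what a bare page request looks like on the wire. The host, path and header values are made up for illustration, and real browsers add a few more headers of their own, but the point stands: the request names a resource and some preferences about format and language, and nothing else is needed to get an answer.

```python
def build_get_request(host, path, accept="text/html", language="en-GB"):
    """Return the raw text of a minimal HTTP/1.1 GET request.

    The host, path and header values passed in are illustrative
    assumptions, not taken from any real site.
    """
    lines = [
        f"GET {path} HTTP/1.1",          # "get me this resource"
        f"Host: {host}",                 # which site the resource lives on
        f"Accept: {accept}",             # preferred representation / format
        f"Accept-Language: {language}",  # preferred language
        "Connection: close",
    ]
    # Header lines end with CRLF; a blank line ends the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_get_request("example.org", "/section/subsection/page")
print(request)
```

Notice what isn’t in there: no user name, no journey, no session. Anything like that has to be bolted on afterwards.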

There are various reasons given for this design style; some of them technical, some of them ethical. As ever the ethical arguments are far more interesting so:

The Web is designed as a universal space. Its universality is its most important facet. I spend many hours giving talks just to emphasize this point. The success of the Web stems from its universality as do most of the architectural constraints.

is my favourite quote from Tim Berners-Lee. It’s the universality of the web that led to the design decision of stateless HTTP and the widespread adoption of REST as a way to work with that design. Put simply anybody with a PC and a web connection can request a page on the web and they’ll get the same content; regardless of geographic location, accessibility requirements, gender, ethnic background, relative poverty and all other external factors. And it’s the statelessness of HTTP that allows search bots to crawl (and index) pages just like any other user.

You can of course choose to work against the basic grain of the web and use cookies to track users and their journeys across your site. If you do choose that route then it’s possible to dynamically generate a path based breadcrumb trail unique to that user’s navigation path. But that functionality doesn’t come out of the box; you’re just giving yourself more code to write and maintain. And that code will just replicate functionality already built into the browser: the back button and browser history.
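As a rough illustration of the extra machinery involved, here’s a minimal Python sketch of a cookie-backed trail. The session ids, the in-memory store and the paths are all hypothetical; a real site would hand out the session id in a cookie and keep the store somewhere more durable.

```python
from collections import defaultdict

# Hypothetical server-side store: session id (handed out in a cookie)
# mapped to the list of pages that session has visited so far.
trails = defaultdict(list)


def record_visit(session_id, path):
    """Append a page to this session's trail and return a copy of the trail."""
    trail = trails[session_id]
    if not trail or trail[-1] != path:  # ignore simple reloads of the same page
        trail.append(path)
    return list(trail)


# One user clicks down through the hierarchy...
for page in ["/", "/section", "/section/subsection", "/section/subsection/page"]:
    alice_trail = record_visit("alice-session", page)

# ...another arrives at the same page straight from a search result.
bob_trail = record_visit("bob-session", "/section/subsection/page")

print(" > ".join(alice_trail))
print(" > ".join(bob_trail))
```

Same URI, two different trails. The page now depends on who is asking and where they’ve been, which is exactly the state that HTTP was designed not to need.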

It’s possibly also worth pointing out that any navigation links designed to be seen by a single user are not, by definition, seen by any other user. This includes search bots which are to all intents and purposes just very dumb users. Any effort you put into creating links through user specific path based breadcrumbs will not be seen or followed by Google et al so will accrue no extra SEO juice and won’t make your content any more findable by other users. Besides which…

…it’s really not about where you’ve come from

One of my main bugbears with usability testing is the tendency to sit the user down in front of a browser already open at the homepage of the site to be tested. There’s an expectation that user experience is a matter of navigating hierarchies from homepage to section page to sub-section page to content page. If this were true then path based breadcrumb trails might make some sense.

But in reality how many of your journeys start at a site homepage? And how many start from a Google search or a blog post link or an RSS feed or a link shared by friends in Facebook or Twitter? You can easily find yourself deep inside a site structure without ever needing to navigate through that site structure. In which case a path based trail becomes meaningless.

In fairness Jakob Nielsen points out pretty much the same thing in the ‘Hierarchy or History’ section of his post:

Offering users a Hansel-and-Gretel-style history trail is basically useless, because it simply duplicates functionality offered by the Back button, which is the Web’s second-most-used feature.

A history trail can also be confusing: users often wander in circles or go to the wrong site sections. Having each point in a confused progression at the top of the current page doesn’t offer much help.

Finally, a history trail is useless for users who arrive directly at a page deep within the site.

All this is true but it’s only part of the truth. The real point is that path based breadcrumb trails work against the most fundamental design decision of the web: universality through statelessness. By choosing to layer state behaviour over the top of HTTP you’re choosing to pick a fight with HTTP. As ever it’s best to pick your fights with care…

Location based breadcrumb trails

In the same post Jakob Nielsen goes on to say:

breadcrumbs show their greatest usability benefit [for users arriving directly at a page deep within a site], but only if you implement them correctly – as a way to visualize the current page’s location in the site’s information architecture. Breadcrumbs should show the site hierarchy, not the user’s history.

But what’s meant by ‘site hierarchy’?

Hierarchy and ‘old’ media

Imagine you’re in proud possession of a set of box sets of Doctor Who series 1-4. Each box has 4, 5 or 6 DVDs. Each DVD has 2 or 3 episodes. Each episode has 10 or so chapters:

This structure is obviously mono-hierarchical; each thing has a single parent. So the chapter belongs to one episode, the episode is on one disc, the disc is in one box set. It’s the same pattern with tracks on CDs, chapters in books, sections in newspapers…

With ‘old’ media the physical constraints of the delivery mechanisms enforce a mono-hierarchical structure. Which makes it easy to signpost to users ‘where’ they are. An article in a newspaper can be in the news section or the comment section or the sport section or the culture section but it’s only ever found in one (physical) place (unless there’s a cock-up at the printing press). So it gets an appropriate section banner and a page number and a page position.

But how does this map to the web?

Files and folders, sections and subsections, identifiers and locations

The first point is people like to organise things. And they do this by categorising, sub-categorising and filing appropriately, dividing up the world into sets and sub-sets and sub-sub-sets… Many of the physical methods of categorisation have survived as metaphors into the digital world. So we have folders and files and inboxes and sent items and trash cans.

In the early days of the web the easiest way to publish pages was to stick a web server on a Unix box and point to the folder you wanted to expose. All the folders inside that folder and all the folders inside those folders and all the files in all the folders were suddenly exposed to the world via HTTP. And because of the basic configuration of web servers they were exposed according to the folder structure on the server. So a logo image filed in a folder called new, filed in a folder called branding, filed in a folder called images would get the URL /images/branding/new/logo.jpg. It was around this time that people started to talk about URLs (mapping resources to document locations on web servers) rather than HTTP URIs (file location independent identifiers for resources).
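The folder-to-URL mapping is simple enough to sketch in a few lines of Python. The document root of /var/www is an assumption for the sake of the example; the rest follows the logo example above.

```python
from pathlib import PurePosixPath


def url_path_for(document_root, file_path):
    """Map a file under the exposed folder to the URL path it is served at.

    Mirrors the default behaviour described above: the URL is just the
    file's location relative to the exposed folder.
    """
    relative = PurePosixPath(file_path).relative_to(document_root)
    return "/" + relative.as_posix()


# "/var/www" is a hypothetical document root chosen for illustration.
logo_url = url_path_for("/var/www", "/var/www/images/branding/new/logo.jpg")
print(logo_url)  # /images/branding/new/logo.jpg
```

Move the file to a different folder and the URL changes with it, which is the sense in which these addresses are locations rather than identifiers.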

Obviously file and folder structures are also mono-hierarchical; it’s not possible for a file to be in 2 folders simultaneously. And the easiest and most obvious way to build site navigation was to follow this hierarchical pattern. So start at the home page, navigate to a section page, navigate to a sub-section page and navigate to a ‘content’ page; just as you navigate through folders and files on your local hard drive. Occasionally some sideways movement was permitted but mostly it was down, down, up, down….

Many of the early battles in Information Architecture were about warping the filing systems and hierarchies that made sense inside businesses into filing systems and hierarchies that made sense to users. But it was still about defining, exposing and navigating hierarchies of information / pages. In this context the location based breadcrumb trail made sense. As Nielsen says, the job of the location based breadcrumb trail is to show the site hierarchy, and if you have a simple, well-defined hierarchy why not let users see where they are in it? So location based breadcrumb trails make sense for simple sites. The problem comes with…

Complex sites and breadcrumb trails

Most modern websites are no longer built by serving static files out of folders on web servers. Instead pages are assembled on the fly as and when users request them by pulling data out of a database and content out of a CMS, munging together with feeds from other places and gluing the whole lot together with some HTML and a dash of CSS. (Actually, when I say most I have no idea of the proportions of dynamic vs static websites but all the usual suspects (Amazon, Facebook, Twitter, Flickr etc) work dynamically.) Constructing a site dynamically makes it much easier to publish many, many pages; both aggregation pages and content pages. The end result is a flatter site with more complex polyhierarchical structures that don’t fit into the traditional IA discipline of categorisation and filing.

The problem is that wholly contained sets within sets within sets are a bad way to model most things. Real things in real life just don’t lend themselves to being described as leaves in a mono-hierarchical taxonomy. It’s here that I part company with Nielsen who, in the same post, goes on to say:

For non-hierarchical sites, breadcrumbs are useful only if you can find a way to show the current page’s relation to more abstract or general concepts. For example, if you allow users to winnow a large product database by specifying attributes (of relevance to users, of course), the breadcrumb trail can list the attributes that have been selected so far. A toy site might have breadcrumbs like these: Home > Girls > 5-6 years > Outdoor play

There’s an obvious problem here. In real life sets are fuzzy and things can be ‘filed’ into multiple categories. Let’s pretend the toy being described by Nielsen is a garden swing that’s also perfectly suited to a 5-6 year old boy. In this case journeys to the swing product page might be ‘Home > Girls > 5-6 years > Outdoor play’ or ‘Home > Boys > 5-6 years > Outdoor play’. If there’s an aggregation of all outdoor playthings there might be journeys like ‘Home > Outdoor play > Girls > 5-6 years’ and ‘Home > Outdoor play > Boys > 5-6 years’. If the swing goes on sale there might be additional journeys like ‘Home > Offers > Outdoor play’ etc.

Now it’s not clear from the quote whether Nielsen is only talking about breadcrumb trails on pages you navigate through on your way to the product page or also including the product page itself. But the problem remains. If the garden swing can be filed under multiple categories in your ‘site structure’ which ‘location’ does the product page breadcrumb trail display? There are 4 possible ways to deal with this:

  • Drop the breadcrumb trail from your product pages. But the product pages are the most important pages on the website. They’re the pages you want to turn up in search results and be shared by users. I can’t imagine it was Nielsen’s intent to show crumbtrails on aggregation pages but not on content pages so…
  • Make the breadcrumb trail reflect the journey / attribute selection of the current user. Unless I’m misreading / misunderstanding Nielsen this seems to be what he’s suggesting by “the breadcrumb trail can list the attributes that have been selected so far”. Quite how this differs from a path based breadcrumb trail confuses me. Again you’re serving one page at one URI that changes state depending on where the user has ‘come from’. Again you’re choosing to fight the statelessness of HTTP. And again the whole thing fails if the user has not navigated to that page via your ‘site structure’ but instead arrived via Google or Bing or a Twitter link or a link in an email…
  • Serve (almost) duplicate pages at every location the thing might be categorised under with the breadcrumb trail tweaked to reflect ‘location’. For all kinds of reasons (not least your Google juice and general sanity) serving duplicate pages is bad. It’s something you can do but really, really doesn’t come recommended.
  • Serve a single page at a single RESTful URI and make a call about which of the many potential categories is the most appropriate.
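The last of those options can be sketched in Python along these lines. The URI, the class and the choice of primary trail are all hypothetical; the category paths are the garden swing journeys from above. The thing to notice is that the breadcrumb is a property of the page, decided once by whoever publishes it, and not of the visitor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductPage:
    """One page at one URI, filed under many categories (hypothetical model)."""
    uri: str
    title: str
    trails: tuple     # every category path the thing could be filed under
    primary: int = 0  # index of the one trail chosen editorially

    def breadcrumb(self):
        # Same answer for every requester: no cookies, no journey tracking.
        return " > ".join(self.trails[self.primary])


swing = ProductPage(
    uri="/products/garden-swing",  # made-up URI for illustration
    title="Garden swing",
    trails=(
        ("Home", "Girls", "5-6 years", "Outdoor play"),
        ("Home", "Boys", "5-6 years", "Outdoor play"),
        ("Home", "Outdoor play"),
        ("Home", "Offers", "Outdoor play"),
    ),
    primary=2,  # someone makes the call: it's 'most' an outdoor play thing
)

print(swing.breadcrumb())  # Home > Outdoor play
```

Which trail gets picked is a judgement call, and somebody will always disagree with it, but at least everyone (search bots included) sees the same page.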

The last option can be seen in use on The Guardian website which attempts to replicate the linear content category sectioning which works so well in the print edition into an inherently non-linear web form. So the Chelsea stand by John Terry and insist he took no money article has a location breadcrumb trail of:

whereas the High court overturns superinjunction granted to England captain John Terry article has a location breadcrumb trail of:

At some point in the past it’s possible (probable?) that the superinjunction story was linked to from the homepage, the sport page, the Chelsea page, the John Terry page etc. But someone has made the call that although the article could be filed under Football or Chelsea or John Terry or Press freedom it’s actually ‘more’ a press freedom story than it is a John Terry story.

The point I’m trying to make is that breadcrumb design for a non-hierarchical site is tricky. It’s particularly tricky for news and sport where a single story might belong ‘inside’ many categories. But if you’re lucky…

It isn’t about ‘site structure’, it’s about ‘thing structure’

Traditional IA has been about structuring websites in a way that journeys through the pages of those sites make the most amount of sense to the most amount of users. The Linked Data approach moves away from that, giving URIs to real life things, shaping pages around those things and promoting journeys that mirror the real life connections between those things.

Two examples are BBC Programmes and BBC Wildlife Finder. Neither of these sites is hierarchical and the ontologies they follow aren’t hierarchical either. An episode of Doctor Who might be ‘filed’ under Series 2 or Drama or Science Fiction or programmes starring David Tennant or programmes featuring Daleks or programmes on BBC Three on the 4th February 2010. So again location based breadcrumb trails are tricky. But like The Guardian one of the many possible hierarchies is chosen to act as the breadcrumb trail:

which is echoed in the navigation box on the right of the page:

The same navigation box also allows journeys to previous and next episodes in the story arc:

The interesting point is that the breadcrumb links all point to pages about things in the ontology – not to category / aggregation pages. So it’s less about reflecting ‘site structure’ and more about reflecting the relationship between real world things. Which is far easier to map to a user’s mental models.

Wildlife Finder is similar but subtly different. The location breadcrumb at the top of the page is a reflection of ‘site structure’. In the original Wildlife Finder it didn’t exist but initial user testing found that many people felt ‘lost’ in the site structure so it was added in. Subsequent user testing found that its addition solved the ‘lost’ problem. So in an input / output duality sense it’s primarily an output mechanism; it makes far more sense as a marker of where you are than a navigation device to take you elsewhere.

Much more interesting is the Scientific Classification box which reflects ‘thing structure’ (in this case the taxonomic rank of the Polar Bear), establishes the ‘location’ of the thing the page is about and allows navigation by relationships between things rather than via ‘site structure’:
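As a rough sketch of the idea (and not a description of how Wildlife Finder is actually built), here’s how a ‘thing structure’ trail might be derived in Python. Each thing simply points at its parent thing, using the standard taxonomic ranks for the polar bear, and the trail falls out of following those relationships.

```python
# Each thing points at its parent thing. These are the standard taxonomic
# ranks for the polar bear; the data structure itself is an assumption
# made for illustration.
PARENT = {
    "Ursus maritimus": "Ursus",  # species -> genus
    "Ursus": "Ursidae",          # genus -> family
    "Ursidae": "Carnivora",      # family -> order
    "Carnivora": "Mammalia",     # order -> class
    "Mammalia": "Chordata",      # class -> phylum
    "Chordata": "Animalia",      # phylum -> kingdom
}


def lineage(thing):
    """Walk parent relationships up to the root, then list them root first."""
    trail = [thing]
    while trail[-1] in PARENT:
        trail.append(PARENT[trail[-1]])
    return list(reversed(trail))


polar_bear_trail = lineage("Ursus maritimus")
print(" > ".join(polar_bear_trail))
```

Nothing here knows about sections or sub-sections of a website; the ‘location’ comes entirely from relationships between the things themselves.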

In summary

  • We need a new word for crumbtrails. Even seasoned UX professionals get misled by the Hansel and Gretel implications. Unfortunately ‘UX widgets that expose the location of the domain object in the ontology of things’ doesn’t quite cut it
  • Secondary navigation is hard; signposting current ‘location’ to a user is particularly hard. IAs need to worry as much about ‘thing structure’ as ‘site structure’
  • Building pages around things and building navigation around relationships between things makes life easier
  • HTTP and REST are not techy / developer / geeky things. They’re the fundamental building blocks on top of which all design and user experience is built