Michael and I did a bit of domain modelling this afternoon – below is our first attempt at a science domain model. It’s almost certainly wrong but I quite like it and I would love to hear what you think, especially if you are a scientist!
To give a bit of context – the idea behind the ontology is to provide a relatively high level model to describe the scientific method so that organisations, such as the BBC, could structure their content (archive footage, news stories etc.) using the model.
So this is the question: do you always need separate URIs for the non-information resource and the information resource? That is, do you need an identifier for both the document and the thing the document is about? Your answer to that question will depend a lot on your attitude to the semantic web project.
Until recently I would have said “yes, you do need both”, but lately I’ve been thinking that perhaps it’s not quite so black and white.
Before I get into why, it probably makes sense to backtrack a little and explain the background to the question. After all, for many people this question seems odd: why on earth would you need a URI for anything other than the web page, the document?
In the real world we give all sorts of things identifiers: people have passports and National Insurance numbers; buildings get postcodes; books get ISBNs; and so on. We do this because it’s useful to be able to unambiguously identify stuff, to be able to point at things, discuss them and share information about them.
On the Internet we have email addresses; on the Web, URIs. OpenID, for example, is predicated on the notion that a person can have a URI to identify themselves. And the Linked Data project gives URIs to all sorts of things: people, places, animals, music and, through dbpedia, the myriad things described in Wikipedia.
Once you have an identifier for a thing you can make assertions about that thing. How big it is, where it is (in the real world), when it was created, who owns it, anything. You can also describe how those things relate to other things – this person is friends with this person and works for this company, which is at this address etc.
Now many people will tell you (indeed I probably will too) that you need to distinguish the statements you make about the thing in the real world from the statements about the document. For example, a URI for me might return a document with some information about me, but the creation date for that document and the creation date for me are two different things. And because you don’t want to get confused it’s better to have a URI for the thing and another one for the document making assertions about the thing. Make sense?
For those that are interested there are a couple of different ways of achieving this separation. For the purposes of this post it’s not important to know how to do this, but if you’re interested have a look at this paper by Richard.
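To make it concrete, here’s a minimal sketch of the hash URI pattern (one of the approaches described in that paper) using Python’s rdflib. The example.org URIs, the person’s name and the date are all made up for illustration:

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS, FOAF, RDF

g = Graph()

# Two URIs, one subject: the hash URI names the person (the thing),
# the hashless URI names the document describing them.
person = URIRef("http://example.org/about#me")
doc = URIRef("http://example.org/about")

g.add((person, RDF.type, FOAF.Person))
g.add((person, FOAF.name, Literal("Anne Example")))  # hypothetical person

g.add((doc, RDF.type, FOAF.Document))
g.add((doc, FOAF.primaryTopic, person))
# The document's creation date, *not* the person's:
g.add((doc, DCTERMS.created, Literal("2010-02-22")))

print(g.serialize(format="turtle"))
```

Statements about the person and statements about the document stay cleanly separated, which is exactly the point.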
But here’s the thing, many people will tell you that this is all too complex and frankly unnecessary, indeed you may well be thinking the same thing right about now.
Some people will tell you that the whole non-information resource thing isn’t necessary – we have a web of documents and we just don’t need to worry about URIs for non-information resources; others will claim that everything is a thing and so every URL is, in effect, a non-information resource.
Michael, however, recently made a very good point (as usual): all the interesting assertions are about real world things not documents. The only metadata, the only assertions people talk about when it comes to documents are relatively boring: author, publication date, copyright details etc.
If this is the case then perhaps we should focus on using RDF to describe real world things, and not the documents about those things.
On the Web there are a number of different ways of making an assertion about a thing (as identified via a URI): you can state how it relates to other things, you can link it to a piece of data (e.g. RDF literals) or you can link it to a document which makes some statements about the thing (e.g. a news article).
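A minimal sketch of those three kinds of assertion, again in rdflib; the EX vocabulary and all the URIs are invented for illustration:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF

EX = Namespace("http://example.org/vocab/")  # hypothetical vocabulary
g = Graph()

lion = URIRef("http://example.org/things/lion#thing")
savannah = URIRef("http://example.org/things/savannah#thing")

# 1. state how the thing relates to another thing
g.add((lion, EX.livesIn, savannah))

# 2. link the thing to a piece of data (an RDF literal)
g.add((lion, EX.scientificName, Literal("Panthera leo")))

# 3. link the thing to a document that makes statements about it
g.add((lion, FOAF.page, URIRef("http://example.org/news/lion-cubs")))
```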
The question is: is there much utility in worrying about non-information resources in this third scenario – do you need separate URIs for the documents? Obviously each document still needs a URL so you can link to it, and you should make it available in a variety of representations, but do you need a separate identifier for a non-information resource?
I think not.
This is how I’ve started to think about it: RDF is a great way of describing how (real world) things relate to each other, and for this you need URIs for non-information resources. And because you’re dealing with real world things (I know documents are real world things too, but going down this path is how we ended up with the confusion we have today) you will hopefully have interesting and useful links to other things, useful chunks of data and links to useful documents about that thing. Those documents could be in any format: an HTML document, a (Flash) movie, an MP3 file, even a CSV file. The point is that the documents decorate the tree: they are discoverable via the RDF graph, but they don’t need to be published as RDF themselves.
An RDF graph of things is therefore a great way to discover documents, and to make assertions about and share what we know about those things. Or, put another way, RDF is a way of building a vocabulary to describe how web resources relate to real world objects. I may, however, be wrong and I would be interested to hear what others think.
Just over two years ago I wrote a post about the importance of the resource and the URL — and I still stand by what I said there: the core of a website should be the resource and its URL. And if those resources describe real world things and they are linked together in the way people think about the world then you can navigate the site by hopping from resource to resource in an intuitive fashion. But I think I missed something important in that post — the role of curation, the role of storytelling.
When we started work on Wildlife Finder we designed the site around the core concepts that we knew people cared about and that we had content about, i.e. species, their habitats and adaptations, and we’ve been publishing resources about those concepts since last September. We’ve since published the model (the Wildlife Ontology) describing how those concepts relate to each other. I’ve talked about this work as providing us with the Lego bricks, but I also realised that we needed to use those Lego bricks to build stories, to help guide people through the content. Our first foray into online storytelling with these Lego bricks is the Collections.
Collections allow us to curate a set of resources – to group and sequence clips and other resources to tell stories like the plight of the tiger or the year’s work of the BBC’s natural history unit. Silver Oliver has recently written about why he thinks this approach is important, why curation matters in a metadata driven information architecture. It’s a very good post and you should read it. But I thought I would share a bit about the intellectual framework behind how I think of this stuff. As with most of my ideas it’s not my idea but one I’ve borrowed from someone brighter than me, in this case Nathan Shedroff, who proposed a framework for thinking about how to build Lego bricks and then things with those bricks. It’s a framework I’ve been using for a few years now.
Wildlife Finder provides information by repackaging data from elsewhere – by organising programme clips, news stories etc. around natural history resources and concepts. This is good (I hope) because it provides useful additional context; but it’s not the whole story. In Shedroff’s model this process creates information: it adds context to data by presenting and organising it in a new, useful way. This is really what encyclopedias provide — structured information presented and organised in useful ways. The next step is to take this information and build stories with it, to build knowledge and facilitate conversations.
As I say, with Wildlife Finder we have started to tell stories by localising the information into Collections. But of course, now we have a unified domain model (which links together programmes and concepts within the natural world) there are other ways in which we can add context and build knowledge on top of these resources, in addition to Collections. There are lots of ways we can create new experiences but, as you can see from the diagram above, we don’t hold a monopoly on storytelling: those that consume the information, our audiences and ‘users’, could also build stories. Although the BBC doesn’t really let people build their own stories, other sites and organisations do, notably Flickr, which has a series of interesting approaches that let its users add context to photos through Groups, Galleries, Sets and Collections.
By a mile, the highlight of the last week or so was the 2nd Linked Data meet-up. Silver and Georgi did a great job of organising the day and I came away with a real sense that not only are we on the cusp of seeing a lot of data on the web, but also that the UK is at the centre of this particular revolution. All very exciting.
For my part I presented the work we’ve been doing on Wildlife Finder – how we’re starting to publish and consume data on the web. Ed Summers has a great write-up of what we’re doing, and I’ve also published my slides here:
In terms of Wildlife Finder there are a few things that I wanted to highlight:
- If you’re interested in the RDF and how we’re modelling the data we’ve documented the wildlife ontology here. In addition to the ontology itself we’ve also included some background on why we modelled the information in the way we have.
- If you want to get your hands on the RDF/XML then either add .rdf to the end of most of our URLs (more on this later) or configure your client to request RDF/XML – we’ve implemented content negotiation so you’ll just get the data (there’s a sketch of this after the list).
- But… we’ve not implemented everything just yet. Specifically, the adaptations aren’t published as RDF – this is because we’re making a few changes to the structure of this information and I didn’t want to publish the data and then change it. Nor have we published information on species conservation status – that’s simply because we’ve not finished yet (sorry).
- It’s not all RDF – we’re also marking up our taxa pages with the species microformat, which gives more structure to the common and scientific names.
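And, as promised, a sketch of fetching the RDF. This assumes the third-party requests library and an illustrative URL; check the site for the real URL scheme:

```python
import requests  # third-party: pip install requests

url = "http://www.bbc.co.uk/nature/species/Polar_Bear"  # illustrative URL

# Ask for RDF/XML explicitly; the server content-negotiates on Accept.
response = requests.get(url, headers={"Accept": "application/rdf+xml"})
print(response.headers.get("Content-Type"))
print(response.text[:500])

# Or take the first route and just append .rdf to the URL:
response = requests.get(url + ".rdf")
```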
Anyway I hope you find this useful.
The other day I was chatting with some of the designers at work about secondary navigation and the subject of breadcrumb trails came up. Breadcrumb trails are those bits of navigation summed up by Jakob Nielsen as:
a single line of text to show a page’s location in the site hierarchy. While secondary, this navigation technique is increasingly beneficial to users.
and illustrated on Wikipedia by:
Home page > Section page > Subsection page
For reasons which will hopefully become clear, the whole subject of breadcrumb trails vexes me, and rather than shout into Twitter I thought I’d type up some thoughts. So here goes.
Types of breadcrumb trail
There are 2 main types of breadcrumb trail:
- path based trails show the path the user has navigated through to arrive at the current page
- location based trails show where the page is located in the ‘website hierarchy’
Both of these are problematic so let’s deal with each in turn.
Path based breadcrumb trails
The first thought most people have when confronted by the concept of a breadcrumb trail is Hansel and Gretel. In the story the children were led into the forest and, as they walked, dropped a trail of breadcrumbs. The intention was to retrace their steps out of the forest by following the trail of breadcrumbs (at least until the birds ate them).
The important point is that Hansel and Gretel weren’t conducting a topographical study of the forest. The trail they laid down was particular to their journey. If Alice and Bob had been wandering round the same forest on the same day they might have left a trail of cookie crumbs or even traced out their journey with string. The 2 journeys might have crossed or merged for a while but each trail would be individual to the trail maker(s).
The path based breadcrumb trail is the same principle, but traced out in the pages the user has visited to get to the page they’re on now. So what’s wrong with that?
If you’re a user experience person you’ve probably heard developers talking about REST and RESTful APIs and possibly thought REST was just techy stuff that doesn’t impact on UX. This would be wrong. From a developer point of view REST provides an architectural style for working with the grain of the web. And the grain of the web is HTTP and HTTP is stateless.
So what does that mean? It means that when you ask for a page across the web, the only data sent in the request is “get me this resource” plus a tiny bit of incidental header information (which representation/format you want the resource in – desktop HTML, mobile HTML, RSS; which languages you prefer; etc.). When the server receives the request it doesn’t know, or need to know, anything about the requester. In short, HTTP does not know who you are, does not know ‘where’ you are and does not care where you’ve come from.
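To see quite how little a request carries, here’s a sketch of one in Python (the hostname and path are made up):

```python
import http.client

# Everything the server learns about us is in this single request:
# no memory of previous requests, no notion of a 'journey'.
conn = http.client.HTTPConnection("example.org")
conn.request("GET", "/some/page", headers={
    "Accept": "text/html",        # which representation we'd like
    "Accept-Language": "en-GB",   # which language we'd prefer
})
response = conn.getresponse()
print(response.status, response.reason)
conn.close()
```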
There are various reasons given for this design style; some of them technical, some of them ethical. As ever the ethical arguments are far more interesting so:
The Web is designed as a universal space. Its universality is its most important facet. I spend many hours giving talks just to emphasize this point. The success of the Web stems from its universality as do most of the architectural constraints.
is my favourite quote from Tim Berners-Lee. It’s the universality of the web that led to the design decision of stateless HTTP and the widespread adoption of REST as a way to work with that design. Put simply, anybody with a PC and a web connection can request a page on the web and they’ll get the same content, regardless of geographic location, accessibility requirements, gender, ethnic background, relative poverty and all other external factors. And it’s the statelessness of HTTP that allows search bots to crawl (and index) pages just like any other user.
It’s possibly also worth pointing out that any navigation links designed to be seen by a single user are not, by definition, seen by any other user. This includes search bots which are to all intents and purposes just very dumb users. Any effort you put into creating links through user specific path based breadcrumbs will not be seen or followed by Google et al so will accrue no extra SEO juice and won’t make your content any more findable by other users. Besides which…
…it’s really not about where you’ve come from
One of my main bugbears with usability testing is the tendency to sit the user down in front of a browser already open at the homepage of the site to be tested. There’s an expectation that user experience is a matter of navigating hierarchies from homepage to section page to sub-section page to content page. If this were true then path based breadcrumb trails might make some sense.
But in reality how many of your journeys start at a site homepage? And how many start from a Google search, a blog post link, an RSS feed or a link shared by friends on Facebook or Twitter? You can easily find yourself deep inside a site structure without ever needing to navigate through that site structure. In which case a path based trail becomes meaningless.
In fairness Jakob Nielsen points out pretty much the same thing in the ‘Hierarchy or History’ section of his post:
Offering users a Hansel-and-Gretel-style history trail is basically useless, because it simply duplicates functionality offered by the Back button, which is the Web’s second-most-used feature.
A history trail can also be confusing: users often wander in circles or go to the wrong site sections. Having each point in a confused progression at the top of the current page doesn’t offer much help.
Finally, a history trail is useless for users who arrive directly at a page deep within the site.
All this is true but it’s only part of the truth. The real point is that path based breadcrumb trails work against the most fundamental design decision of the web: universality through statelessness. By choosing to layer state behaviour over the top of HTTP you’re choosing to pick a fight with HTTP. As ever it’s best to pick your fights with care…
Location based breadcrumb trails
In the same post Jakob Nielsen goes on to say:
breadcrumbs show their greatest usability benefit [for users arriving directly at a page deep within a site], but only if you implement them correctly – as a way to visualize the current page’s location in the site’s information architecture. Breadcrumbs should show the site hierarchy, not the user’s history.
But what’s meant by ‘site hierarchy’?
Hierarchy and ‘old’ media
Imagine you’re in proud possession of a set of Doctor Who series 1-4 box sets. Each box has 4, 5 or 6 DVDs. Each DVD has 2 or 3 episodes. Each episode has 10 or so chapters.
This structure is obviously mono-hierarchical; each thing has a single parent. So the chapter belongs to one episode, the episode is on one disc, the disc is in one box set. It’s the same pattern with tracks on CDs, chapters in books, sections in newspapers…
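That single-parent constraint is easy to express in code. A toy sketch (the disc and chapter assignments are invented):

```python
class Node:
    """A node in a mono-hierarchy: each node has exactly one parent."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

box = Node("Doctor Who series 1 box set")
disc = Node("Disc 2", parent=box)
episode = Node("Dalek", parent=disc)
chapter = Node("Chapter 3", parent=episode)

# Walking up the single chain of parents yields one, and only one, path:
node, path = chapter, []
while node:
    path.append(node.name)
    node = node.parent
print(" > ".join(reversed(path)))
# Doctor Who series 1 box set > Disc 2 > Dalek > Chapter 3
```

Because every node has one parent, every node has exactly one ‘location’; that’s what makes signposting trivial.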
With ‘old’ media the physical constraints of the delivery mechanisms enforce a mono-hierarchical structure. Which makes it easy to signpost to users ‘where’ they are. An article in a newspaper can be in the news section or the comment section or the sport section or the culture section but it’s only ever found in one (physical) place (unless there’s a cock-up at the printing press). So it gets an appropriate section banner and a page number and a page position.
But how does this map to the web?
Files and folders, sections and subsections, identifiers and locations
The first point is that people like to organise things. And they do this by categorising, sub-categorising and filing appropriately, dividing up the world into sets and sub-sets and sub-sub-sets… Many of the physical methods of categorisation have survived as metaphors into the digital world. So we have folders and files and inboxes and sent items and trash cans.
In the early days of the web the easiest way to publish pages was to stick a web server on a Unix box and point to the folder you wanted to expose. All the folders inside that folder and all the folders inside those folders and all the files in all the folders were suddenly exposed to the world via HTTP. And because of the basic configuration of web servers they were exposed according to the folder structure on the server. So a logo image filed in a folder called new, filed in a folder called branding, filed in a folder called images would get the URL /images/branding/new/logo.jpg. It was around this time that people started to talk about URLs (mapping resources to document locations on web servers) rather than HTTP URIs (file location independent identifiers for resources).
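You can still see that behaviour today with Python’s built-in static file server, which exposes a folder tree as a URL tree in exactly this way:

```python
# Run this from the folder you want to expose. A file saved at
# ./images/branding/new/logo.jpg is then served at
# http://localhost:8000/images/branding/new/logo.jpg
import http.server
import socketserver

with socketserver.TCPServer(("", 8000), http.server.SimpleHTTPRequestHandler) as httpd:
    httpd.serve_forever()
```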
Obviously file and folder structures are also mono-hierarchical; it’s not possible for a file to be in 2 folders simultaneously. And the easiest and most obvious way to build site navigation was to follow this hierarchical pattern. So start at the home page, navigate to a section page, navigate to a sub-section page and navigate to a ‘content’ page; just as you navigate through folders and files on your local hard drive. Occasionally some sideways movement was permitted but mostly it was down, down, up, down….
Many of the early battles in Information Architecture were about warping the filing systems and hierarchies that made sense inside businesses into filing systems and hierarchies that made sense to users. But it was still about defining, exposing and navigating hierarchies of information / pages. In this context the location based breadcrumb trail made sense. As Nielsen says, the job of the location based breadcrumb trail is to “show the site hierarchy”. And if you have a simple, well-defined hierarchy, why not let users see where they are in it? So location based breadcrumb trails make sense for simple sites. The problem comes with…
Complex sites and breadcrumb trails
Most modern websites are no longer built by serving static files out of folders on web servers. Instead pages are assembled on the fly as and when users request them: data is pulled out of a database, content out of a CMS, munged together with feeds from other places and the whole lot glued together with some HTML and a dash of CSS. (Actually, when I say most I have no idea of the proportions of dynamic vs static websites, but all the usual suspects (Amazon, Facebook, Twitter, Flickr etc.) work dynamically.) Constructing a site dynamically makes it much easier to publish many, many pages; both aggregation pages and content pages. The end result is a flatter site with more complex polyhierarchical structures that don’t fit into the traditional IA discipline of categorisation and filing.
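A crude sketch of that assembly process, with all the data sources stubbed out as dictionaries for illustration:

```python
# Hypothetical stand-ins for a database, a CMS and external feeds.
DATABASE = {"garden-swing": {"name": "Garden swing", "price": "£49.99"}}
CMS = {"garden-swing": "A sturdy swing, perfect for outdoor play."}
FEEDS = {"garden-swing": ["Swing safety tips", "More outdoor toys"]}

def render_page(slug):
    """Assemble a page on the fly rather than reading a static file."""
    thing = DATABASE[slug]
    body = CMS[slug]
    related = "".join(f"<li>{item}</li>" for item in FEEDS[slug])
    return (f"<h1>{thing['name']}</h1><p>{thing['price']}</p>"
            f"<p>{body}</p><ul>{related}</ul>")

print(render_page("garden-swing"))
```

No file called garden-swing.html exists anywhere; the page is conjured up per request, which is precisely why it can sit in as many ‘places’ as you like.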
The problem is that wholly contained sets within sets within sets are a bad way to model most things. Real things in real life just don’t lend themselves to being described as leaves in a mono-hierarchical taxonomy. It’s here that I part company with Nielsen who, in the same post, goes on to say:
For non-hierarchical sites, breadcrumbs are useful only if you can find a way to show the current page’s relation to more abstract or general concepts. For example, if you allow users to winnow a large product database by specifying attributes (of relevance to users, of course), the breadcrumb trail can list the attributes that have been selected so far. A toy site might have breadcrumbs like these: Home > Girls > 5-6 years > Outdoor play
There’s an obvious problem here. In real life sets are fuzzy and things can be ‘filed’ into multiple categories. Let’s pretend the toy being described by Nielsen is a garden swing that’s also perfectly suited to a 5-6 year old boy. In this case journeys to the swing product page might be ‘Home > Girls > 5-6 years > Outdoor play’ or ‘Home > Boys > 5-6 years > Outdoor play’. If there’s an aggregation of all outdoor playthings there might be journeys like ‘Home > Outdoor play > Girls > 5-6 years’ and ‘Home > Outdoor play > Boys > 5-6 years’. If the swing goes on sale there might be additional journeys like ‘Home > Offers > Outdoor play’ etc.
Now it’s not clear from the quote whether Nielsen is only talking about breadcrumb trails on pages you navigate through on your way to the product page, or also about the product page itself. But the problem remains. If the garden swing can be filed under multiple categories in your ‘site structure’, which ‘location’ does the product page breadcrumb trail display? There are 4 possible ways to deal with this:
- drop the breadcrumb trail from your product pages. But the product pages are the most important pages on the website; they’re the pages you want to turn up in search results and be shared by users. I can’t imagine it was Nielsen’s intent to show crumbtrails on aggregation pages but not on content pages so…
- make the breadcrumb trail reflect the journey / attribute selection of the current user. Unless I’m misreading / misunderstanding Nielsen, this seems to be what he’s suggesting by “the breadcrumb trail can list the attributes that have been selected so far”. Quite how this differs from a path based breadcrumb trail confuses me. Again you’re serving one page at one URI that changes state depending on where the user has ‘come from’. Again you’re choosing to fight the statelessness of HTTP. And again the whole thing fails if the user has not navigated to that page via your ‘site structure’ but instead arrived via Google or Bing or a Twitter link or a link in an email…
- serve (almost) duplicate pages at every location the thing might be categorised under, with the breadcrumb trail tweaked to reflect ‘location’. For all kinds of reasons (not least your Google juice and general sanity) serving duplicate pages is bad. It’s something you can do but it really, really doesn’t come recommended.
- serve a single page at a single RESTful URI and make a call about which of the many potential categories is the most appropriate (there’s a sketch of this below).
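Here’s a sketch of what that fourth option might look like; the product data and category names are invented. The page knows all the categories it could be filed under, but one is editorially nominated as canonical and the breadcrumb is derived from that alone, whoever is looking at it:

```python
product = {
    "name": "Garden swing",
    # every category the product could be filed under...
    "categories": [
        ["Home", "Girls", "5-6 years", "Outdoor play"],
        ["Home", "Boys", "5-6 years", "Outdoor play"],
        ["Home", "Offers", "Outdoor play"],
    ],
    # ...and one editorial call about which is 'most' appropriate
    "canonical_category": ["Home", "Outdoor play"],
}

def breadcrumb(product):
    """A stateless breadcrumb: the same output for every visitor."""
    return " > ".join(product["canonical_category"])

print(breadcrumb(product))  # Home > Outdoor play
```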
This fourth option can be seen in use on The Guardian website, which attempts to replicate the linear content category sectioning that works so well in the print edition in an inherently non-linear web form. So the ‘Chelsea stand by John Terry and insist he took no money’ article sits under one location breadcrumb trail, whereas the ‘High court overturns superinjunction granted to England captain John Terry’ article sits under a different one.
At some point in the past it’s possible (probable?) that the superinjunction story was linked to from the homepage, the sport page, the Chelsea page, the John Terry page etc. But someone has made the call that although the article could be filed under Football or Chelsea or John Terry or Press freedom it’s actually ‘more’ a press freedom story than it is a John Terry story.
The point I’m trying to make is that breadcrumb design for a non-hierarchical site is tricky. It’s particularly tricky for news and sport where a single story might belong ‘inside’ many categories. But if you’re lucky…
It isn’t about ‘site structure’, it’s about ‘thing structure’
Traditional IA has been about structuring websites in a way that journeys through the pages of those sites make the most amount of sense to the most amount of users. The Linked Data approach moves away from that, giving URIs to real life things, shaping pages around those things and promoting journeys that mirror the real life connections between those things.
Two examples are BBC Programmes and BBC Wildlife Finder. Neither of these sites is hierarchical, and the ontologies they follow aren’t hierarchical either. An episode of Doctor Who might be ‘filed’ under Series 2 or Drama or Science Fiction or programmes starring David Tennant or programmes featuring Daleks or programmes on BBC Three on the 4th February 2010. So again location based breadcrumb trails are tricky. But, like The Guardian, one of the many possible hierarchies is chosen to act as the breadcrumb trail, which is echoed in the navigation box on the right of the page. The same navigation box also allows journeys to previous and next episodes in the story arc.
The interesting point is that the breadcrumb links all point to pages about things in the ontology – not to category / aggregation pages. So it’s less about reflecting ‘site structure’ and more about reflecting the relationship between real world things. Which is far easier to map to a user’s mental models.
Wildlife Finder is similar but subtly different. The location breadcrumb at the top of the page is a reflection of ‘site structure’. In the original Wildlife Finder it didn’t exist, but initial user testing found that many people felt ‘lost’ in the site structure so it was added in. Subsequent user testing found that its addition solved the ‘lost’ problem. So in an input / output duality sense it’s primarily an output mechanism: it makes far more sense as a marker of where you are than as a navigation device to take you elsewhere.
Much more interesting is the Scientific Classification box, which reflects ‘thing structure’ (in this case the taxonomic rank of the Polar Bear), establishes the ‘location’ of the thing the page is about and allows navigation by relationships between things rather than via ‘site structure’.
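For the Polar Bear that ‘thing structure’ is its taxonomy. A sketch of how such a widget might be derived (the ranks are real; the rendering is invented):

```python
# Taxonomic ranks for the polar bear (Ursus maritimus).
taxonomy = [
    ("Kingdom", "Animalia"),
    ("Phylum", "Chordata"),
    ("Class", "Mammalia"),
    ("Order", "Carnivora"),
    ("Family", "Ursidae"),
    ("Genus", "Ursus"),
    ("Species", "Ursus maritimus"),
]

# Each taxon would link to the page about that thing, so the 'trail' is
# a walk through relationships between things, not through site sections.
print(" > ".join(name for rank, name in taxonomy))
```

Which brings me to a few closing thoughts: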
- We need a new word for crumbtrails. Even seasoned UX professionals get misled by the Hansel and Gretel implications. Unfortunately ‘UX widgets that expose the location of the domain object in the ontology of things’ doesn’t quite cut it
- Secondary navigation is hard; signposting current ‘location’ to a user is particularly hard. IAs need to worry as much about ‘thing structure’ as ‘site structure’
- Building pages around things and building navigation around relationships between things makes life easier
- HTTP and REST are not techy / developer / geeky things. They’re the fundamental building blocks on top of which all design and user experience is built
As a child I loved Lego. I could let my imagination run riot, design and build cars, space stations, castles and airplanes.
My brother didn’t like Lego, instead preferring to play with Action Men and toy cars. These sorts of toys did nothing for me, and from the perspective of an adult I can understand why. I couldn’t modify them, I couldn’t create anything new. Perhaps I didn’t have a good enough imagination because I needed to make my ideas real. I wanted to build things, I still do.
Then the most exciting thing happened. My dad bought a BBC micro.
Obviously computers such as the BBC Micro were in many, many ways different from today’s Macs and, if you must, PCs. They were several orders of magnitude less powerful than today’s computers but, importantly, they were designed to be programmed by the user; you were encouraged to do so. It was expected that that’s what you would do. So from a certain perspective they were more powerful.
BBC Micros didn’t come preloaded with word processors, spreadsheets and graphics editors, and they certainly weren’t WIMPs.
They also came with two thick manuals: one telling you how to set the computer up, the other how to program it.
This was all very exciting: I suddenly had something with which I could build incredibly complex things. I could, in theory at least, build something more complex than the planes, spaceships and cars I had modelled with Lego a few years before.
Like so many children of my age I cut my computing teeth on the BBC Micro. I learnt to program computers, and played a lot of games!
Unfortunately all was not well. You see, I wasn’t very good at programming my BBC Micro. I could never actually build the things I had pictured in my mind’s eye; I just wasn’t talented enough.
You see, Lego hit a sweet spot that those early computers on the one hand, and Action Man on the other, missed.
What Lego provided was reusable bits.
When Christmas or my birthdays came around I would start off by building everything suggested by the sets I was given. But I would then dismantle the models and reuse those bricks to build something new, whatever was in my head. By reusing bricks from lots of different sets I could build different models. The more sets I got given, the more things I could build.
Action Men simply didn’t offer any of those opportunities; I couldn’t create anything new.
Early computers were certainly very capable of providing a creative platform, but they lacked the reusable bricks; it was more like being given an infinite supply of clay. And clay is harder to reuse than bricks.
Today, with the online world we are in a similar place but with digital bits and bytes rather than moulded plastic bits and bricks.
The Web allows people to create their own stories – it allows people to follow their nose, to create threads through the information about the things that interest them, commenting on and discussing it along the way. But the Web also allows developers to reuse previously published information within new, different contexts to tell new stories.
But only if we build it right.
Most Lego bricks are designed to allow you to stick one brick to another. But not all bricks can be stuck to all others. Some can only be put at the top – these are the tiles and pointy bricks to build your spires, turrets and roofs. These bricks are important, but they can only be used at the end because you can’t build on top of them.
The same is true of the Web – we need to start by building the reusable bits, then the walls and only then the towers and spires and twiddly bits.
But this can be difficult – the shiny towers are seductive and the draw to start with them can be strong; only to find that you then need to knock them down and start again when you want to reuse the bits inside.
We often don’t give ourselves the best opportunity to womble with what we’ve got – to reuse what others make, to reuse what we make ourselves. Or to let others outside our organisations build with our stuff. If you want to take these opportunities then publish your data the webby way.