Building coherence at bbc.co.uk

Michael and I have written an article for the latest addition [pdf] of Talis’s magazine Nodalities, reproduced below. If you are interested in the process behind this then I can’t recommend enough Michael’s awesome post “How we make website” over on the BBC’s Radio Lab blog.

—-

Telling (non-linear) stories

For the past 86 years the BBC has plied its trade as a storytelling organisation. In the world of linear broadcasting we’ve even gotten very good at it. Guiding the audience through complex news story lines, explaining the natural world and, interleaved narrative arcs and the plotlines of drama has become our forte. But storytelling in a linear world is different from storytelling in the non-linear, hypertext world of the web.

Joining www.bbc.co.uk to the rest of the web (of course as @gkob points out those should be dbpedia URIs)

With the exception of BBC News Online (news.bbc.co.uk) the online world has often been seen as a supporting adjunct to the linear broadcast world. Over the years we’ve commissioned and built sites to provide online support for programmes; but we’ve too often taken our linear storytelling expertise and attempted to replicate the same techniques on the web – with mixed success. Unlike linear broadcast storylines the web doesn’t provide people with a predicted and controlled linear journey. Instead we dip in and out of any given website — following different journeys — to find the information we want at that time.

Many of our programme support sites have been commissioned and developed in isolation. So you see an Archers site and an Eastenders site and a Top Gear site which are internally coherent but which fail to link up other than via editorially determined cross promotions. Want to see who presents Top Gear? No problem, we can do that. Want to see what else those people present? Sorry, can’t do that. By developing self-contained microsites the BBC has produced some good stuff but it has also been unable to reach its full potential because it hasn’t managed to join up all of its resources. By failing to link up the content (on both a data and a user experience level) the stuff we publish can never becomes greater than the sum of its parts. Without these links we can’t make bbc.co.uk a coherent experience. As a user, it’s very difficult to find everything the BBC has published about any given subject, nor can you easily navigate across BBC domains following a particular semantic thread. For example, you can’t yet navigate from a page about a musician to a page with all the programmes that have played that artist.

So how do you tell stories on a web scale? We could stick with the easy option and try to control ‘user journeys’ across the site. Provide links to where we think the user should go next. But that’s little better than those flip a dice, go to page 30 dungeons and dragons books we all had as kids. We had to recognise that non-linear storytelling puts the narrative arc into the hands of the user. What to read, what to click, where to go next is really up to you. So storylines split and merge, meta-narratives emerge and fracture; ‘user journeys’ slip out of (editorial) control.

All of this comes from the power of the link – back to basics. But we can only provide precisely targeted links at the user experience level if those links exist at a data level. And that’s the difficult part. The organic growth of our sites has been mirrored in the organic growth of our content and data management systems. We currently have a range of systems across the business for managing different bits of content throughout the production chain. And like our public facing sites none of these speak the same language or share the same identifiers. A typical episode of Top Gear might have 6 separate identifiers on it’s way from scriptwriter to airwaves to archive. Once you’ve solved this problem you hit the problem of multiple identifiers for James May and once you’ve got one canonical James May you’re back to the problem of multiple identifiers for all the other programmes he’s presented…

Solving these problems makes for a more linked, more coherent bbc.co.uk. But an internally coherent bbc.co.uk isn’t enough. bbc.co.uk needs to be weaved into the rest of the web, not merely on the web. It needs to be linked in to all those other Top Gear / James May pages out there… Luckily the tips, tricks and techniques pioneered by the Linked Data community give us some clues here.

Add into this mix the fact that there’s some data the BBC can never hope to provide. So we know when an artist is played on radio or TV. But we can’t hope to know when they were born, or where they were born, or which bands they’ve been in, or who they’re married to etc. If we want to tell stories around music all this is important data. And we can only get it by tapping into the collective knowledge of the web.

BBC in the web of data

I’d like to claim that when we set out to develop /programmes we had the warm embrace of the semantic web in mind. But that would be a lie. We were however building on very similar philosophical foundations.

In the work leading up to bbc.co.uk/programmes we were all too aware of the importance of persistent web identifiers, permanent URIs and the importance of links as a way to build meaning. To achieve all this we broke with BBC tradition by designing from the domain model up rather than the interface down. The domain model provided us with a set of objects (brands, series, episodes, versions, ondemands, broadcasts etc) and their sometimes tangled interrelationships.

We were also convinced that the value in programme websites lay not in the implicit metadata of the domain model but rather in the way this domain model overlapped and intersected with other domains. As ever the links are more important than the nodes because that’s where the context lives: programmes:segment <features> music:track, programmes:segment <features> food:recipe etc. In this way we could weave new ‘user journeys’ into and out of /programmes, into and out of bbc.co.uk. From archive episodes no longer available online, to a recipe page, to a chef, to another recipe and back to a recent episode. Using well targeted content specific links we could not only escape the dead end content silos that characterised bbc.co.uk but point users back to programmes that would hopefully inform, educate and of course entertain.

Finally we believed in the merits of opening our data and building on top of other people’s open data. When we looked to rebuild bbc.co.uk/music we looked at a number of commercial providers of music metadata. They all did a similar job to MusicBrainz (musicbrainz.org) – similar models, similar data quality etc. But choosing to go with a commercial provider would have precluded our ability to provide any kind of machine friendly (API if you must) views. The decision to publish JSON or vanilla XML or RDF would have been a decision to give the 3rd party business model away. So we went with the open alternative – an open, public domain provider, one that is more in keeping with our public service remit and one that represents better value for money for the license fee payer – which has to be a lesson to someone.

Without ever explicitly talking RDF we’d built a site that complied with Tim Berners-Lee’s four principles for Linked Data:

Use URIs as names for things. – CHECK
Use HTTP URIs so that people can look up those names. – CHECK
When someone looks up a URI, provide useful information. – Well, if we’re only talking HTML, RSS, ATOM, JSON etc. CHECK
Include links to other URIs. so that they can discover more things. – Again if we’re talking HTML only CHECK

By keeping everything in its right place we’d also built a sane, maintainable, scalable, accessible site that search engines love and could be easily evolved to add new features and functionality. So to anyone considering how best to build websites we’d recommend you throw out the Photoshop and embrace Domain Driven Design and the Linked Data approach every time. Even if you never intend to publish RDF it just works.

Around this time we met by chance with some people from the Linking Open Data community and the two worlds collided. Obviously TBL wasn’t talking only HTML in the last 2 principles but aside from that the parallels were striking. We set about converting our programmes domain model into an RDF ontology which we’ve since published under a Creative Commons License (www.bbc.co.uk/ontologies/programmes/). Which took one person about a week. The trick here isn’t the RDF mapping – it’s having a well thought through and well expressed domain model. And if you’re serious about building web sites that’s something you need anyway. Using this ontology we began to add RDF views to /programmes (e.g. www.bbc.co.uk/programmes/b00f91wz.rdf). Again the work needed was minimal.

So for those considering the Linked Data approach we’d say that 95% of the work is work you should be doing just to build for the (non-semantic) web. Get the fundamentals right and the leap to the Semantic Web is really more of a hop.

Why bother with RDF?

For all the pages we’ve published we’ve only had a limited success at making this information available for others to use, to hack with and to build new services with. While we’ve not done a very good job of making bbc.co.uk a coherent experience for people the situation is worse for machines.

It is our belief that rather than publishing proprietary APIs it is better to use the ubiquitous technologies of URIs and HTTP. This approach supports the generative nature of the Web, making it easy for third parties to build with BBC metadata without learning BBC specific APIs and at the same time providing the BBC and its users with immediate benefits.

Services like Flickr, Twitter and the like have in many, many ways followed the same principles we adopted for programmes and music — or if they didn’t then the end results look pretty similar — they are wonderful services. However, if as a third party developer you want to deal with the semantics, accessing the data via the Giant Global Graph to find everything about a certain person, place or topic and you wanted to include data from Flickr then you will need to deal with the specifics of Flickr. I suspect that it wouldn’t be that difficult for Flickr to add RDF representations – if they did then Flickr content would be part of a common way of doing things. We want BBC data to be part of a common way of doing things.

Our hope in making BBC data available as RDF is that we will make it as generative as possible – helping others to do interesting things with our data. The BBC has a public service remit, a remit that means it should look beyond its internal business needs to help create public value around useful technologies and around its content for others to benefit from. The longer term aim of this work is to not only expose BBC data but to ensure that it is contextually linked to the wider web. We have started along this path by linking to Wikipedia (DBpedia in the RDF view) and MusicBrainz from the artist pages but this could be extended for programmes and events.