Linking bbc.co.uk to the Linked Data cloud

I’ve been doing a few talks recently – most recently at the somewhat confused OKCon (Open Knowledge) Conference. The audience was extremely diverse and so I tried to not only talk about what we’ve done but also introduce the concept of Linked Data and explain what it is.

Linked Data is a grassroots project to use web technologies to expose data on the web. For many people it is synonymous with the semantic web and, while this isn’t quite true, it does, as far as I’m concerned, represent a very large subset of the semantic web project. Interestingly, it can also be thought of as ‘the web done right’, the web as it was originally designed to be.

But what is it?

Well it can be described with 4 simple rules.

1. Use URIs to identify things not only documents

The web was designed to be a web of things with documents making assertions about those real-world things. Just as a passport or driving license, in the real world, can be thought of as providing an identifier for a person making an assertion about who they are, so URIs can be thought of as providing identifiers for people, concepts or things on the web.

Minting URIs for things rather than pages helps make the web more human literate because it means we are identifying those things that people care about.

2. Use HTTP URIs – they are globally unique and anyone can dereference them

The beauty of the web is its ubiquitous nature – it is decentralised and able to function on any platform. This is because of TimBL’s key invention, the HTTP URI.

URIs are globally unique, open to all and decentralised. Don’t go using DOI or any other identifier – on the web all you need is an HTTP URI.

3. Provide useful information [in RDF] when someone looks up a URI

And obviously you need to provide some information at that URI. When people dereference it you need to give them some data – ideally as RDF as well as HTML. Providing the data as RDF means that machines can process that information for people to use, which makes it more useful.

4. Include links to other URIs to let people discover related information

And of course you also need to provide links to other resources so people can continue their journey, and that means contextual links to other resources elsewhere on the web, not just your site.

And that’s it.

Pretty simple really and, other than the RDF bit, I would argue that these principles should be followed for any website – they just make sense.
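To make the four rules a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the example.org and example.com URIs are made up, the description is held as plain (subject, predicate, object) tuples rather than in a real RDF library, and treating any string that starts with http:// as a URI is a deliberate simplification.

```python
# A minimal sketch of the four rules. All example.org / example.com URIs
# are hypothetical and exist only for illustration.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Rules 1 and 2: an HTTP URI that identifies the thing (the band),
# not the page about it.
ARTIST = "http://example.org/artists/u2#artist"

# Rule 3: useful information about the thing, as subject-predicate-object triples.
TRIPLES = [
    (ARTIST, FOAF_NAME, "U2"),
    # Rule 4: a link to a URI on someone else's site.
    (ARTIST, OWL_SAME_AS, "http://example.com/bands/u2#band"),
]


def is_http_uri(value):
    """Simplification: treat anything starting with http(s):// as a URI."""
    return value.startswith(("http://", "https://"))


def describe(uri, triples):
    """Return the (predicate, object) pairs known about a URI."""
    return [(p, o) for s, p, o in triples if s == uri]


def outbound_links(uri, triples):
    """Return the other URIs a description points at."""
    return [o for _, o in describe(uri, triples) if is_http_uri(o)]


def to_ntriples(triples):
    """Serialise the triples as N-Triples, one statement per line."""
    lines = []
    for s, p, o in triples:
        if is_http_uri(o):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(TRIPLES))
```

The point of the sketch is only that a description is a set of statements about a URI, and that some of those statements point at other URIs a client can follow.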

But why?

Before the Web people still networked their computers – but to access those computers you needed to know about the network, the routing and the computers themselves.

Those in their late 30s will probably remember the film War Games. Because it was written before the Web had been invented, David and Jennifer, the two ‘hackers’, had to find and connect directly to each computer; they had to know about the computer’s location.

Phoning up another computer
War Games, 1983

The joy of the web is that it adds a level of abstraction – freeing you from the networking, routing and server location – it lets you focus on the document.

Following the principles of Linked Data allows us to add a further level of abstraction – freeing us from the document and letting us focus on the things, people and stuff that matters to people. It helps us design a system that is more human literate, and more useful.

This is possible because we are identifying real world stuff and the relationships between them.

Free information from data silos

Of course there are other ways of achieving this – lots of sites now provide APIs, which is good, just not great. Each of those APIs tends to be proprietary and specific to the site. As a result there’s an overhead every time someone wants to add that data source.

These APIs give you access to the silo – but the silo still remains. Using RDF and Linked Data means there is a generic method to access data on the web.
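One way to picture the difference is the rough sketch below. It assumes two hypothetical publishers who both describe the same thing using the same (made-up) example.org URI, with their data already fetched and held as sets of (subject, predicate, object) tuples. Because the identifiers are shared, combining the sources is just a set union, and the same query function works on either source or on both.

```python
# Two hypothetical publishers describing the same thing under one shared URI.
# All example.org URIs and vocabulary terms are made up for illustration.
BEAR = "http://example.org/species/brown-bear#species"

site_a = {
    (BEAR, "http://example.org/vocab/commonName", "Brown Bear"),
}
site_b = {
    (BEAR, "http://example.org/vocab/habitat",
     "http://example.org/habitats/forest#habitat"),
}


def merge(*graphs):
    """Combine any number of triple sets; shared URIs line up automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


everything = merge(site_a, site_b)
for triple in match(everything, s=BEAR):
    print(triple)
```

With site-specific APIs the equivalent step would usually need one adapter per site before the data could be combined at all.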

What are we doing at the BBC?

First up it’s worth pointing out the obvious: the BBC is a big place and so it would be wrong to assume that everything we’re doing online is following these principles. But there’s quite a lot of stuff going on that does.

We do have the BBC’s programme support, music discovery and, soon, natural history content all adopting these principles. In other words, persistent HTTP URIs for programmes, artists, species and habitats that can be dereferenced to HTML, RDF, JSON and mobile views.

We want HTTP URIs for every concept, not just every HTML webpage – an individual page is made up of multiple resources, multiple concepts. So, for example, an artist page transcludes the resources ‘/:artist/news’ and ‘/:artist/reviews’ – but those resources also have their own URIs. If they didn’t they wouldn’t be on the web.

Also, because there’s only one web, we only have one URI for a resource but a number of different representations of that resource. So the URI for the programme ‘Nature’s Great Events’ is:

bbc.co.uk/programmes/b00ht655#programme

Through content negotiation we will be able to serve an HTML, RDF, or mobile document to represent that programme.
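As a rough illustration of how content negotiation can work (a sketch, not a description of the BBC’s actual implementation), the Python below picks a representation from an HTTP Accept header. The http://www. prefix on the programme URI, the list of available media types and the function names are all assumptions of mine. Note that the client drops the #programme fragment before making the request, so the server only ever sees the document URI.

```python
from urllib.parse import urldefrag

# Assumption: scheme and "www." added to the URI given in the post.
THING_URI = "http://www.bbc.co.uk/programmes/b00ht655#programme"

# Assumption: the representations the server can produce, in order of preference.
AVAILABLE = ["text/html", "application/rdf+xml", "application/json"]


def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def specificity(media_range, media_type):
    """Return 2 for an exact match, 1 for type/*, 0 for */*, None for no match."""
    if media_range == media_type:
        return 2
    range_type, _, range_subtype = media_range.partition("/")
    if range_subtype == "*" and range_type == media_type.split("/")[0]:
        return 1
    if media_range == "*/*":
        return 0
    return None


def negotiate(accept_header, available=AVAILABLE):
    """Return the best available media type, or None if nothing is acceptable."""
    ranges = parse_accept(accept_header)
    best_type, best_q = None, 0.0
    for media_type in available:
        matches = [
            (specificity(media_range, media_type), q)
            for media_range, q in ranges
            if specificity(media_range, media_type) is not None
        ]
        if not matches:
            continue
        _, q = max(matches)  # the most specific matching range sets the q value
        if q > best_q:  # ties keep the server's own preference order
            best_type, best_q = media_type, q
    return best_type


document_uri = urldefrag(THING_URI).url  # what the client actually requests
print(document_uri, negotiate("application/rdf+xml"))
```

A browser sending something like "text/html,application/xhtml+xml,*/*;q=0.8" would get the HTML document from this sketch, while a client asking only for "application/rdf+xml" would get RDF from the same URI.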

We then need to link all of this stuff up within the BBC. So that, for example, you can go from a tracklist on an episode page of Jo Whiley on the Radio 1 site to the U2 artist page and then from there to all episodes of Chris Evans which have played U2. Or from an episode of Nature’s Great Events to the page about Brown Bears to all BBC TV programmes about Brown Bears.

But obviously the BBC is only one corner of the web. So we also need to link with the rest of the web.

Because we’re now thinking on a webscale we’ve started to think about the web as a CMS.

Where a URI already exists to represent a concept we use it rather than minting our own. The new music site transcludes and links back to Wikipedia to provide biographical information about an artist. Rather than minting our own URI for artist biographic info we use Wikipedia’s.

Likewise when we want to add music metadata to the music site we add MusicBrainz.
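As a sketch of what reusing someone else’s identifiers might look like in practice, the Python below builds a link between a local artist URI and the corresponding MusicBrainz page from a MusicBrainz identifier (an MBID, which is a UUID). The URI patterns, the choice of foaf:isPrimaryTopicOf as the linking property and the function name are my assumptions for illustration, and the identifier used is a placeholder rather than a real artist.

```python
import uuid

# FOAF property linking a thing to a document that is mainly about it.
FOAF_IS_PRIMARY_TOPIC_OF = "http://xmlns.com/foaf/0.1/isPrimaryTopicOf"


def artist_links(mbid):
    """Build one (subject, predicate, object) link from a MusicBrainz ID.

    Raises ValueError if mbid is not a well-formed UUID.
    """
    canonical = str(uuid.UUID(mbid))  # validates and normalises the identifier
    # Assumed URI patterns, shown only to illustrate key reuse.
    local = f"http://www.bbc.co.uk/music/artists/{canonical}#artist"
    external = f"https://musicbrainz.org/artist/{canonical}"
    return (local, FOAF_IS_PRIMARY_TOPIC_OF, external)


# Placeholder identifier, not a real artist.
PLACEHOLDER_MBID = "12345678-1234-5678-1234-567812345678"
print(artist_links(PLACEHOLDER_MBID))
```

Because both URIs are derived from the same key, anyone who knows the identifier on one site can find the matching resource on the other without a lookup table.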

9 responses to “Linking bbc.co.uk to the Linked Data cloud”

  1. […] other words I want daytum.com to be following the Linked Data principles rather than an ajax only […]

  2. […] Scott has a presentation on Linking bbc.co.uk to the Linked Data cloud and the article  DBpedia Examples using Linked Data and Sparql provides a simple example of using […]

  3. […] you’re new to the concepts of linked open data I can recommend a couple of great blog posts, but I’m also going to attempt to give a basic overview […]

  4. […] data in the PBS enterprise. It makes me think that perhaps having a session on what the BBC are doing with Linked Data would be a useful? This was written by ed. Posted on Monday, March 22, 2010, at […]

  5. […] information, and relationships between objects. Further, the BBC have done or enabled some exciting linked-data based projects that expose the programme catalog, mash-up BBC content with user-generated content, and […]

  6. Fascinating. I appreciate the human-literate approach to data connectedness. This is a movement I can join.

  7. Hi Tom,

    I am interested in the alternate serialisations you implemented on the BBC website, which allow adding .xml or .json to the end of the URI to get the content in those formats.

    Could you please point me at some resources of how to develop this? I am using a LAMP platform not your cool Perl-on-Rails :-)

    Thanks a lot

  8. Ian, thanks for your comments on my useless and distracting blog post. Indeed marketers are a nuisance because one really doesn’t need marketing if the product sells itself. Look at Apple etc. So a few qualifications first: number one, I have been a believer in the definitional boundaries as well as the stack of the semantic web via RDF and OWL etc. And I agree with you that we should really focus on what we know how to achieve. In that process though I believe we need some crisp definition or label (the good old elevator pitch for business-close nuisances like me) so that more companies, more sites, more consumers both understand the value of breaking the data silos and adopt it. But I also believe that 90 percent of the existing web isn’t quite there yet. And frankly if we intend to be successful at this with ears that either aren’t tech or haven’t read Weaving the Web, we might just want to pitch it without the word URI in it. Not because we should be ashamed but only because one cannot expect this type of audience to be sold on something when they must first understand what the stack is. Is marketing talk bs? Perhaps. Does one need marketing when she has the best product in the world? Nope. But what about if we don’t quite have the best product or have to sell it to other groups and not just each other? My intention with hosting the debate on my blog wasn’t to increase traffic nor to dig the old hatch of semantic web definitional wormholes or even web 3.0. Frankly if we’re successful in turning this into a mainstream success I don’t care what you call it, and as long as it achieves what it’s supposed to achieve we can call it the unnamed (I should probably switch to this for the remainder of my month’s posts, you know, similar to the artist formerly known as semantic web). But my fear is that while on the road to success and while doing what works we might need to explain it in a fairly consistent and crisp way.
If Linked Data, so be it, but my hunch is that, as Kingsley mentioned in many a private and public talk, having data open and linked (RDF or not) is a prerequisite to be able to do anything serendipitous and of more information richness and meaning. Only the bottom of the stack, not the whole of it. Saying that if you’re not RDF you’re anti-web is both unrealistic and counterproductive. In fact all you’d want at a minimum for you to be even more mainstream is for these old web companies to have their data open on the web. What does it matter if it’s in RDF or not? Your company, and I am sure Kingsley’s as well, can process and convert that data to RDF if you strongly believe that’s what needs to happen from a technology perspective. In fact I would surmise it’d be better for you for them, your data sources, to only be on the web and open. You can do the rest and in fact you can make more money than those fools. So in other words: what’s needed here is open data. The semantic, linked etc. pieces of this puzzle aren’t needed from your clients or data sources because you can semanticize or link it yourself in your own web. This of course wouldn’t be the best but would be a better start than trying to convince people it’s one way or the highway. I still think that discussions like this, if bound by non-emotional and non-sectarian arguments, can be a pretty useful brain and argument teaser. It’s about the process, as it’s just as important as its outcome, the product. And quite frankly, as much as we like to command what is the best way to do a certain thing, the unnamed thing will both name itself and define itself one way or another. That’s the beauty of the web, whether semantic or linked or not. Unless we adopt a very Che stance (counterproductive as well), why shouldn’t we work and partner with our audiences (whether consumers or potential clients or anti-web sites) to get them to achieve together what we all see as the value of open data? In the end it’s not about RDF or URIs, is it?
It’s really about open data on the web. Dumb marketers like me with million dollar budgets and executive power to do it must understand the business value (meaning what’s the ultimate value whether we’re using one technical implementation or another) BEFORE I can commit to one particular way to implement.
