March 2009
Mon Tue Wed Thu Fri Sat Sun
« Feb   Apr »
 1
2345678
9101112131415
16171819202122
23242526272829
3031  

Month March 2009

Linking bbc.co.uk to the Linked Data cloud

I’ve been doing a few talks recently – most recently at the somewhat confused OKCon (Open Knowledge) Conference. The audience was extremely diverse and so I tried to not only talk about what we’ve done but also introduce the concept of Linked Data and explain what it is.

Linked Data is a grassroots project to use web technologies to expose data on the web. It is for many people  synonymous with the semantic web – and while this isn’t quite true. It does, as far as I’m concerned, represent a very large subset of the semantic web project. Interestingly, it can also be thought of as the ‘the web done right’, the web as it was originally designed to be.

But what is it?

Well it can be described with 4 simple rules.

1. Use URIs to identify things not only documents

The web was designed to be a web of things with documents making assertions about those real-world things. Just as a passport or driving license, in the real world, can be thought of as providing an identifier for a person making an assertion about who they are, so URIs can be thought of as providing identifiers for people, concepts or things on the web.

Minting URIs for things rather than pages helps make the web more human literate because it means we are identifying those things that people care about.

2. Use HTTP URIs – they are globally unique and anyone can dereference them

The beauty of the web is its ubiquitous nature – it is decentralised and able to function on any platform. This is because of TimBL’s key invention the HTTP URI.

URI’s are globally unique, open to all and decentralised. Don’t go using DOI or any other identifier – on the web all you need is an HTTP URI.

3. Provide useful information [in RDF] when someone looks up a URI

And obviously you need to provide some information at that URI. When people dereference it you need to give them some data – ideally as RDF as well as HTML. Providing the data as RDF means that machines can process that information for people to use. Making it more useful.

4. Include links to other URIs to let people discover related information

And of course you also need to provide links to other resources so people can continue their journey, and that means contextual links to other resources elsewhere on the web, not just your site.

And that’s it.

Pretty simple really and other than the RDF bit, I would argue that these principles should be followed for any website – they just make sense.

But why?

Before the Web people still networked their computers – but to access those computers you needed to know about the network, the routing and the computers themselves.

For those in their late 30s you’ll probably remember the film War Games – because this was written before the Web had been invented David and Jennifer the two ‘hackers’ had to find and connect directly to each computer; they had to know about the computer’s location.

Phoning up another computer

War Games, 1983

The joy of the web is that it adds a level of abstraction – freeing you from the networking, routing and server location – it lets you focus on the document.

Following the principles of Linked Data allows us to add a further level of abstraction – freeing us from the document and letting us focus on the things, people and stuff that matters to people. It helps us design a system that is more human literate, and more useful.

This is possible because we are identifying real world stuff and the relationships between them.

Free information from data silos

Of course there are other ways of achieving this – lots of sites now provide APIs which is good just not great. Each of those APIs tend to be proprietary and specific to the site. As a result there’s an overhead every time someone wants to add that data source.

These APIs give you access to the silo – but the silo still remains. Using RDF and Linked Data means there is a generic method to access data on the web.

What are we doing at the BBC?

First up it’s worth pointing out the obvious: the BBC is a big place and so it would be wrong to assume that everything we’re doing online is following these principles. But there’s quite a lot of stuff going on that does.

We do have – BBC’s programme support, music discovery and, soon, natural history content all adopting these principles. In other words persistent HTTP URIs that can be dereferenced to HTML, RDF, JSON and mobile views for programmes, artists, species and habitats.

We want HTTP URIs for every concept, not HTML webpage – an individual page is made up of multiple resource, multiple concepts. So for example an artist page transcludes the resource ‘/:artist/news’ and ‘/:artist/reviews’ – but those resources also have their own URIs. If they didn’t they wouldn’t be on the web.

Also because there’s only one web we only have one URI for a resource but a number of different representation for that resource. So the URI for the proggramme ‘Nature’s Great Events’ is:

bbc.co.uk/programmes/b00ht655#programme

Through content negotiation we will able to server an HTML, RDF, or mobile document to represent that programme.

We then need to link all of this stuff up within the BBC. So that, for example, you can go from a tracklist on an episode page of Jo Whiley on the Radio 1 site to the U2 artist page and then from there to all episodes of Chris Evans which have played U2. Or from an episode of Nature’s Great Events to the page about Brown Bears to all BBC TV programmes about Brown Bears.

But obviously the BBC is only one corner of the web. So we also need to link with the rest of the web.

Because we’re now thinking on a webscale we’ve started to think about the web as a CMS.

Where URIs already exist to represent that concept we are using it rather than minting our own. The new music site transcludes and links back to Wikipedia to provide biographical information about an artist. Rather than minting our own URI for artist biographic info we use Wikipedia’s.

Likewise when we want to add music metadata to the music site we add MusicBrainz.

What does the history of the web tell us about its future?

Following my invitation to speak at the WWW@20 celebrations [my bit starts about 133 minutes into the video] – this is my attempt to squash the most interesting bits into a somewhat coherent 15 minute presentation.

20 years ago Tim Berners-Lee was working, as a computer scientist, at CERN. What he noticed was that, much like the rest of the world, sharing information between research groups was incredibly difficult. Everyone had their own document management solution, running on their own flavour of hardware over different protocols.  His solution to the problem was a lightweight method of linking up existing (and new) stuff over IP – a hypertext solution – which he dubbed the World Wide Web – and documented in a memo “Information Management: A Proposal“.

Then for a year or so nothing happened. Nothing happened for a number of reasons, including the fact that IP, and the ARPANET before that, was popular in America but less so in Europe. Indeed senior managers at CERN had recently sent out a memo to all department heads reminding them that IP wasn’t a supported protocol – people were being told not to use it!

Also because CERN was full of engineers everyone thought they could build their own solution, do better than what was already there – no one wanted to play together. And of course because CERN was there to do particle physics not information management.

Then TimBL got his hands on a NeXT Cube – officially he was evaluating the machine not building a web server – but, with the support of his manager, that’s what he did — build the first web server and client. There then ensued a period of negotiation to get the idea out freely, for everyone to use, which happened in 1993. This coincided, more or less, with the University of Minnesota’s decision to charge a license fee for Gopher. Then the web took off especially in the US where IP was already popular.

first web server

The Worlds first Webserver.

The beauty of TimBL’s proposal was it’s simplicity – it was designed to work on any platform and importantly with the existing technology. The team knew that to make it work it had to be as easy as possible. He only wanted people to do one thing, that one thing was to give their resources identifiers – links – URIs; so information could be linked and discovered.

This is then is the key invention – the URL.

To make this work URLs were designed to work with existing protocols, in particular it needed to work with FTP and Gopher. That’s why there’s a colon in the URL — so that URLs can be given for stuff that’s already available via other protocols. As an aside, TimBL’s said his biggest mistake was the inclusion of // in the URL — the idea was that one slash meant the resource is on the local machine and two somewhere else on the web, but because everyone used http://foo.bar it means the second / is redundant. I love that this is TimBL’s biggest mistake.

He also implemented a quick tactical solution to get things up and running and demonstrate what he was talking about — HTML. HTML was originally just one of a number of supported doctypes – it wasn’t intended to be the doctype but HTML took off because it was easy. Apparently the plan was to implement a mark-up language that worked a bit like the NeXT application builder. But they didn’t get round to it before Mosaic came along with the first browser (TimBL’s first client was a browser-editor) and then it was all too late. And we’ve been left with something so ugly I doubt even it’s parents love it.

The curious thing, however, is that if you read the original memo — despite its simplicity — it’s clear that we’re still implementing it, we’re still working on the the original spec. Its just that we’ve tended to forget what it said or decided to get sidetracked for a while with some other stuff. So forget about Web 2.0.

For example, the original Web was read-write. Not only that but it used style sheets and a WYSIWYG editing interface — no tags, no mark-up. They didn’t think anyone would want to edit the raw mark-up.

The first web site was read and write

The first web site was read and write

You can also see that the URL’s hidden, you get to it via a property dialog.

This is because the whole point of the web is that it provides a level of abstraction, allowing you to forget about the infrastructure, the servers and the routing. You only needed to worry about the document. For those who remember the film War Games — you will remember that they had to ‘phone up individual computers — they needed this networking information to access the computer, they needed to know its location before they could use it. The beauty of the Web and the URL is that the location shouldn’t matter to the end user.

URIs are there to provide persistent identifiers across the web — they’re not a function of ownership, branding, look and feel, platform or anything else for that matter.

The original team described CERN’s IT ecosystem as a zoo because there were so many different flavours of hardware, different operating systems and protocols in use. The purpose of the web was to be ubiquitous, to work on any machine, open to everyone. It was designed to work no matter what machine or operating system you’re running. This is, of course, achieved by having one identifier, one HTTP URI and defererence that to the appropriate document based on the capacities of that machine.

We should be adopting the same approach today when it comes to delivery to mobile, IPTV, connected devices etc. — we should have one URI for a resource and allow the client to request the document it needs. As Tim intended. The technology is there to do this — we just don’t using it very often.

The original memo also talked about linking people, documents, things and concepts, and data. But we are only now getting around to building it. Through technologies such as OpenID and FOAF we can give people identifiers on the web and describe their social graph, the relationships between those people. And through RDF we can publish information so that machines can process it, describing the nature of and the relationship between the different nodes of data.

Information Management: A Proposal

Information Management: A Proposal by Tim Berners-Lee

The original memo described, and the original server supported, link typing so that you could describe not only real word things but also the nature of the relationship between those things. Like RDF and HTML 5 now does, 20 years later. This focus on data is all a good idea because it lets you treat the web like a giant database. Making computers human literate by linking up bits of data so that the tools, devices and apps connected to the web can do more of the work for you, making it easier to find the things that interest you.

The semantic web project – and TimBL’s original memo – is all about helping people access data in a standard fashion so that we can add another level of abstraction – letting people focus on the things that matter to them. This is what, I believe, we should be striving for for the web’s future because I agree with Dan Brickley, to understand the future of the web you first need to understand it’s origins.

Don’t think about HTML documents – think about the things and concepts that matter to people and give each it’s own identifier, it’s own URI and then put in place the technology to dereference that URI to the document appropriate to the device. Whether that be a desktop PC, a mobile device, an IPTV or third party app.

Interesting stuff from around the web 2009-03-20

Ben Seagal, Tim Berners-Lee and Robert Calliau with the WWW proposal and first webserver at the WWW@20 celebrations, CERN

Ben Seagal, Tim Berners-Lee and Robert Calliau with TimBL's original proposal and first webserver at the WWW@20 celebrations, CERN

Semantic web news

Linked Data? Web of Data? Semantic Web? WTF? [Tom Heath]
“Think about HTML documents; when people started weaving these together with hyperlinks we got a Web of documents. Now think about data. When people started weaving individual bits of data together with RDF triples (that expressed the relationship between these bits of data) we saw the emergence of a Web of data. Linked Data is no more complex than this – connecting related data across the Web using URIs, HTTP and RDF.”

The Programmes Ontology [BBC]
Yves has updated the programmes ontology to handle “temporal annotations” tracklistings and segments and outlets etc.

Twitter news

The Twitter Global Mind [Rocketboom]
Don’t understand what all the fuss about Twitter? Watch this. Yes it’s about social networking and communication but it’s also about realtime search.

Twitter to begin charging brands for commercial use [Brand Republic News]
Co-founder Biz Stone told Marketing: ‘We are noticing more companies using Twitter and individuals following them. We can identify ways to make this experience even more valuable and charge for commercial accounts.’ He would not be drawn on the level of charges.

Some interesting visualisations

Depressing Project of the Day: Stock Market, Set to Music with Microsoft Songsmith [Create Digital Music]
Thanks to Yves. The failing economy set to music.

Periodic Table of Typefaces on the Behance Network [behance.net]
“The Periodic Table of Typefaces is obviously in the style of all the thousands of over-sized Periodic Table of Elements posters hanging in schools and homes around the world. This particular table lists 100 of the most popular, influential and notorious typefaces today. As with traditional periodic tables, this table presents the subject matter grouped categorically. The Table of Typefaces groups by families and classes of typefaces: san-serif, serif, script, blackletter, glyphic, display, grotesque, realist, didone, garalde, geometric, humanist, slab-serif and mixed.”

The open web

What is the Open Platform? [guardian.co.uk]
“The Open Platform is the suite of services that make it possible for guardian.co.uk to build applications with the Guardian…” very nice, I hope others follow. I also wish the Beeb recognized it’s open projects (recognized internally that is).

RadioAunty feature update – twitter, scheduling and much more [whomwah]
RadioAunty is Mac app that allows you to listen to live and catchup BBC Radio. It’s a lovely app and is built on an open BBC platform :)

Monty Python DVD sales soar thanks to YouTube clips [guardian.co.uk]
“Within days of the launch of the official Monty Python YouTube channel, sales of the DVD box set had gone up by 16,000% on Amazon”

Designing for your least able user [BBC Radio Labs]
Michael’s mighty post on SEO, accessibility and the joy of links. Read it.

Follow

Get every new post delivered to your Inbox.

Join 1,236 other followers

%d bloggers like this: