I’ve really been neglecting this blog recently – apologies but my attention has been elsewhere recently. Anyway, while I get round to actually writing something here’s a presentation I gave at the Online Information Conference recently.
The presentation is largely based upon the article Michael and I wrote for Nodalities this time last year.
The BBC’s Natural History Unit is responsible for some of the BBC’s most loved TV and radio programming — unfortunately until now it’s only been accessible as part of the regular schedule or via iPlayer. I say until now because today we launched the first phase of a new project which brings clips from the best of the NHU’s programmes online.
Over the last few months we’ve been plundering the NHU’s archive to find the best bits — segmenting the TV programmes, tagging them (with DBpedia terms) and then aggregating them around URIs for the key concepts within the natural history domain; so that you can discover those programme segments via both the originating programme and via concepts within the natural history domain — species, habitats, adaptations and the like.
The segments/ clips ‘belong’ to their originating programme — and as a result we’ve been adding information, about a bunch of programmes from thearchive, to PIPs (the underlying database behind iPlayer and /programmes). The clip pages aren’t yet linked in with their owning episode, but they will be soon.
In addition to being able to discover these clips from within the context of the programme we are also providing URIs to aggregate information around the natural history domain, that is URIs for species, habitats, adaptations and ecozones.
Our hope is that by providing highly inter-linked, URIs we can help people gain a greater understanding of the natural world. For example, by being able to see the different animals and habitats that live within differentecozones you can gain an understanding of the diversity of of life in different parts of the world; or what different animals make up the Mammal or BirdClass; or more about a particular adaptation.
You might have noticed that the slugs for our URIs (the last bit of the URL) are the same as those used by Wikipedia and DBpedia that’s because I believe in the simple joy of webscale identifiers, you will also see that much like the BBC’s music site we are transcluding the introductory text from Wikipedia to provide background information for most things. This also means that we are creating and editing Wikipedia articles where they need improving (of course you are also more than welcome to improve upon the articles).
Adopting this approach means that we are able to contribute distinctive content to the Web while at the same time helping people find what is already there.
There is a lot more we need to do, including linking in with current programmes and making everything available as RDF, JSON and for mobile devices. That’s all on it’s way but in the meantime I hope you find what’s there useful, informative and entertaining.
Digital Revolution, a new BBC TV programme, was launched last Friday. Due to be broadcast next year, the programme will be looking back over the first 20 years of the web and considering what the future might hold. The show will be considering how the web has changed society and the implications for things like security, privacy and the economy.
Unlike — well probably every other TV programme I’ve ever come across — each programme will be influenced and debated on the web during it’s production. Some of rushes and interviews will be made available on the web (under permissive terms) so that anyone can contribute to the debate, helping to shape the final programme.
Anyway… the presentations were very cool, and while I tweeted the best bits on the day I thought I would write up a short post summing it all up. You know, contributing to the debate and all that.
The thing that struck me most were the discussions and points made around the way in which the web has provided a platform for creativity, and the risks to it’s future because of governments’ failure to understand it (OK, the failure to understand it is my interpretation, not the view expressed by the speakers).
To misquote TimBL: The web should be like paper. Government should be able to prosecute if you misuse it, but they shouldn’t limit what you are able to do with it. When you buy paper you aren’t limited in what can be written or drawn on it, the and like paper the Internet shouldn’t be set up in such a way as to constrain it’s use.
The reason this is important is because it helps to preserve the web’s generative nature. TimBL points out that people are creative, they simply need platform for that creativity, and if that platform is to be the Web then it needs to support everyone, anyone should be able to express that creativity and that means it needs to be open.
As an aside there was a discussion as to whether or not access to the Internet is a ‘human right’ — I’m not sure whether it is or not, but it’s worth considering whether or not if everyone had access to the Web whether it could be used to solve problems in the developing world. For example, by allowing communities to share information on how to dig wells and maintain irrigation systems, information on health care and generally providing educational material. It is very easy, for us in the West to think of the Web as synonymous with the content and services currently provided on it and whether they would be useful in developing countries. But the point really should be if anyone, anywhere in the world where able to create and share information what would they do with it? My hope would be that the services offered would reflect local needs — whether that be social networking in US colleges or water purification in East Africa.
Of course being open and free for all to use doesn’t mean that everything on the web will be wonderful, or indeed legal; no more so than paper ensures wonderful prose because it is open. Or as TimBL puts it:
Just because you can read everything out there doesn’t mean you should. If you found a piece of paper blowing in the wind you wouldn’t expect it to be edifying.
But what does open mean?
Personally I think that an open web is one that seeks to preserve it’s generative nature. But the discussion last Friday also focused on the implications for privacy and snooping.
Governments the world over, including to our shame the current UK Government, are seeking to limit the openness of the web; that is rather than addressing the specific activities that happen on the web, they are seeking to limit the very platform itself. ISPs around the world, at the behest of governments, are being asked to track and record what you do on the web, everything you do on the web. Elsewhere, content is being filtered, traffic shaped and sites blocked.
The sorts of information being collected can include your search terms (pinned to your IP address) and the sites you visit. Now for sure this might, sometime include a bunch of URIs that point to illegal and nefarious activity, but it might also include (indeed it’s more likely to include) URIs relating to a medical condition or legal advice or a hundred and one other, perfectly legal but equally personal bits of information.
Should a government, its agencies or an ISP be able to capture, store and analyses this data? Personally I think not. And should you think that I’m just being a scaremonger have a read of Bill’s post “The digital age of rights” about the French government’s HADOPI legislation.
On the day Bill Thompson (who, by the way, was on blinding form) summed up the reason why when he summed up his hopes for the web thus:
I hoped that the web would help us know our neighbours better, so that we didn’t go and kill them. That hasn’t happened but it does now mean it’s much harder to get away with it – the world will now know if you do kill them.
Governments know this, which is why some now try to lock down access to the Internet when there is civil unrest in their country. And it is also why the rest of the web tries to help them break though.
Few Western governments, would condone the activities of such Totalitarian states. But it is interesting to consider whether Western governments would support North Korea or Iran setting up the kinds of databases currently being debated in Europe and the States. Now they might point out that the comparison isn’t a fair one since they are nice, democratic governments not nasty oppressive ones. But isn’t that painfully myopic? How do they know who will be in power in the future? How do they know how future governments might seek to use the information they are gathering now?
Seeking to prevent snooping on the Internet aside there is another reason why the web should remain open, and it is the reason why it’s important to fight for One Web.
Susan Greenfield quite rightly pointed out that ‘Knowledge is to be found by creating context, links between facts; it’s the context that counts’. Although she was making the point in an attempt to take a swipe at the Web, trying to suggest that the web is no more than a collection of facts devoid of context, it seems to me that in fact the web is the ultimate context machine. (One sometimes wonders whether she has ever actually used any of the services she complains about, indeed I wonder if she uses the web at all).
The web is, as the name suggest, a set of interconnected links. Those URIs and the links between, as TimBL reminded us, are made by people, they are followed by people and as such you can legitimately think of the Web as humanity connected.
URIs are incredibly powerful, particularly when they are used to identify things in addition to documents. When they are used to identify things (dereferencing to the appropriate data or document format) they can lead to entirely new ways to access information. An example highlighted by TimBL is the impact they might have on TV channels and schedules.
He suggested that the concept of a TV channel was limited and that it would be replaced with complete random access. When anyone, anywhere in the world, can follow a URI to a persistent resource (note he didn’t say click on a link) then the TV channel as a means of discovery and recommendation will be replaced with a trust network. “My friends have watched this, most of them like it…” sort of thing.
Of course to get there we need to change the way we think about the web and the way in which we publish things. And here TimBL pointed to the history of the web, suggesting that the next digital revolution will operate in a similar fashion.
The web originally happened not because senior management thought it was a good idea – it happened because people who ‘got it’ thought it was cool, that it was the right thing and that they were lucky enough to have managers that didn’t get in the way. Indeed this is exactly what happened when TimBL wrote the first web server and client and then when the early web pioneers started publishing web pages. They didn’t do it because they were told to, they didn’t do it because there was any immediate benefit. They did it because they thought that by doing it it would enable cool things to happen. The last couple of years suggests that we are on the cusp of a similar revolution as people start to publish linked data which will in turn result in a new digital revolution.
It’s starting to feel like the world has suddenly woken up to the whole Linked Data thing — and that’s clearly a very, very good thing. Not only are Google (and Yahoo!) now using RDFa but a whole bunch of other things are going on, all rather exciting, below is a round up of some of the best. But if you don’t know what I’m talking about you might like to start off with TimBL’s talk at TED.
The BBC has announced a couple SPARQL endpoints, hosted by talis and openlink
Both platforms allow you to search and query the BBC data in a number of different ways, including SPARQL — the standard query language for semantic web data. If you’re not familiar with SPARQL, the Talis folk have published a tutorial that uses some NASA data.
A social semantic BBC?
Nice presentation from Simon and Ben on how social discovery of content could work… “show me the radio programmes my friends have listen to, show me the stuff my friends like that I’ve not seen” all built on people’s existing social graph. People meet content via activity.
PriceWaterhouseCooper’s spring technology forecast focuses on Linked Data [pwc.com]
“Linked Data is all about supply and demand. On the demand side, you gain access to the comprehensive data you need to make decisions. On the supply side, you share more of your internal data with partners, suppliers, and—yes—even the public in ways they can take the best advantage of. The Linked Data approach is about confronting your data silos and turning your information management efforts in a different direction for the sake of scalability. It is a component of the information mediation layer enterprises must create to bridge the gap between strategy and operations… The term “Semantic Web” says more about how the technology works than what it is. The goal is a data Web, a Web where not only documents but also individual data elements are linked.”
For the last five or so months I’ve been working on a new set of sites under the umbrella of “BBC Earth” — a programme of work aimed at giving everyone access to some of the best natural history content in the world. The project is made up of three complementary and interlinked projects, the first couple of which recently went live.
The first site to go live, “Out of the Wild” aims to bring you a view on the natural world from the perspective of our crews while on location; a sort of “From our correspondent” for the natural world. The stories — a mix of short video clips, slideshows and text based stories — are all grouped around the expeditions, the people on location and the originating programmes. Our hope is that you will enjoy this more personal view of the natural world brought to you from some of the most amazingpartoftheworld by the worlds best wildlife documentaries makers.
We then launched “Earth News” which does pretty much what is says on the tin — news about the natural world.
We’re both aggregating natural history news articles from elsewhere on the BBC news site as well as new articles (some unique) written for Earth News, such as the story of the adult king penguin which kidnapped a skua chick and then attempted to raise it.
The final part of BBC Earth will see us starting to open up the BBC archive in, what I hope, will be interesting and useful ways.
I’ve been doing a few talks recently – most recently at the somewhat confused OKCon (Open Knowledge) Conference. The audience was extremely diverse and so I tried to not only talk about what we’ve done but also introduce the concept of Linked Data and explain what it is.
Linked Data is a grassroots project to use web technologies to expose data on the web. It is for many people synonymous with the semantic web – and while this isn’t quite true. It does, as far as I’m concerned, represent a very large subset of the semantic web project. Interestingly, it can also be thought of as the ‘the web done right’, the web as it was originally designed to be.
But what is it?
Well it can be described with 4 simple rules.
1. Use URIs to identify things not only documents
The web was designed to be a web of things with documents making assertions about those real-world things. Just as a passport or driving license, in the real world, can be thought of as providing an identifier for a person making an assertion about who they are, so URIs can be thought of as providing identifiers for people, concepts or things on the web.
Minting URIs for things rather than pages helps make the web more human literate because it means we are identifying those things that people care about.
2. Use HTTP URIs – they are globally unique and anyone can dereference them
The beauty of the web is its ubiquitous nature – it is decentralised and able to function on any platform. This is because of TimBL’s key invention the HTTP URI.
URI’s are globally unique, open to all and decentralised. Don’t go using DOI or any other identifier – on the web all you need is an HTTP URI.
3. Provide useful information [in RDF] when someone looks up a URI
And obviously you need to provide some information at that URI. When people dereference it you need to give them some data – ideally as RDF as well as HTML. Providing the data as RDF means that machines can process that information for people to use. Making it more useful.
4. Include links to other URIs to let people discover related information
And of course you also need to provide links to other resources so people can continue their journey, and that means contextual links to other resources elsewhere on the web, not just your site.
And that’s it.
Pretty simple really and other than the RDF bit, I would argue that these principles should be followed for any website – they just make sense.
Before the Web people still networked their computers – but to access those computers you needed to know about the network, the routing and the computers themselves.
For those in their late 30s you’ll probably remember the film War Games – because this was written before the Web had been invented David and Jennifer the two ‘hackers’ had to find and connect directly to each computer; they had to know about the computer’s location.
The joy of the web is that it adds a level of abstraction – freeing you from the networking, routing and server location – it lets you focus on the document.
Following the principles of Linked Data allows us to add a further level of abstraction – freeing us from the document and letting us focus on the things, people and stuff that matters to people. It helps us design a system that is more human literate, and more useful.
This is possible because we are identifying real world stuff and the relationships between them.
Free information from data silos
Of course there are other ways of achieving this – lots of sites now provide APIs which is good just not great. Each of those APIs tend to be proprietary and specific to the site. As a result there’s an overhead every time someone wants to add that data source.
These APIs give you access to the silo – but the silo still remains. Using RDF and Linked Data means there is a generic method to access data on the web.
What are we doing at the BBC?
First up it’s worth pointing out the obvious: the BBC is a big place and so it would be wrong to assume that everything we’re doing online is following these principles. But there’s quite a lot of stuff going on that does.
We do have – BBC’s programme support, music discovery and, soon, natural history content all adopting these principles. In other words persistent HTTP URIs that can be dereferenced to HTML, RDF, JSON and mobile views for programmes, artists, species and habitats.
We want HTTP URIs for every concept, not HTML webpage – an individual page is made up of multiple resource, multiple concepts. So for example an artist page transcludes the resource ‘/:artist/news’ and ‘/:artist/reviews’ – but those resources also have their own URIs. If they didn’t they wouldn’t be on the web.
Also because there’s only one web we only have one URI for a resource but a number of different representation for that resource. So the URI for the proggramme ‘Nature’s Great Events’ is:
Through content negotiation we will able to server an HTML, RDF, or mobile document to represent that programme.
We then need to link all of this stuff up within the BBC. So that, for example, you can go from a tracklist on an episode page of Jo Whiley on the Radio 1 site to the U2 artist page and then from there to all episodes of Chris Evans which have played U2. Or from an episode of Nature’s Great Events to the page about Brown Bears to all BBC TV programmes about Brown Bears.
But obviously the BBC is only one corner of the web. So we also need to link with the rest of the web.
Because we’re now thinking on a webscale we’ve started to think about the web as a CMS.
Where URIs already exist to represent that concept we are using it rather than minting our own. The new music site transcludes and links back to Wikipedia to provide biographical information about an artist. Rather than minting our own URI for artist biographic info we use Wikipedia’s.
Likewise when we want to add music metadata to the music site we add MusicBrainz.
Last Friday saw the 20th anniversary of the Web — well if not the web as such then TimBL’s proposal for an information management system. To celebrate the occasision CERNhosted a celebration which I was honoured to be invited to speak at, by the big man no less! I’ll write up some more about the event itself, but in the meantime here are my slides.
I’ve also posted some photos of the event up on Flickr.