Category Work

Interesting stuff from around the web 2009-04-22

Amazing render job by Alessandro Prodan

The open web

Does OpenID need to be hard? [factoryjoe.com]
Chris considers “the big fat stinking elephant in the room: OpenID usability and the paradox of choice”. As usual, it’s a good read.

I wonder whether restricting the OpenID providers displayed based on visited links would help, i.e. hiding those that haven’t been visited? It clearly wouldn’t be perfect – Google isn’t my OpenID provider but I visit google.com lots – but it should cut down some of the clutter.

Security flaw leads Twitter, others to pull OAuth support [cnet.com]
The hole makes it possible for a hacker to use social-engineering tactics to trick users into exposing their data. The OAuth protocol itself requires tweaking to remove the vulnerability, and a source close to OAuth’s development team said that there have been no known violations, that it has been aware of it for a few days now, and has been coordinating responses with vendors. A solution should be announced soon.

Twitter and social networks

Relationship Symmetry in Social Networks: Why Facebook will go Fully Asymmetric [bokardo.com]
Asymmetric model better mimics how real attention works…and how it has always worked. Any person using Twitter can have a larger number of followers than followees, effectively giving them more attention than they give. This attention inequality is the foundation of the Twitter service… The IA of Facebook does not allow this. Facebook has designed a service that forces you to keep track of your friends, whether you want to or not. Facebook is modeling personal relationships, not relationships based on attention. That’s the crucial difference between Facebook and Twitter at the moment.

When Twitter Gets Weird… [Dave Gorman]
“The difference between following someone and replying to them is the difference between stopping to chat with someone in the street or giving them a badge declaring that you know them. One is actual interaction. The other is just something you can show your friends.” Blimey – Dave Gorman clearly has a much better grasp of life, the web and being a human than the two people who attacked him for not following them on Twitter. As Dave points out, he hopes that Twitter doesn’t descend into the MySpace “thanks for the add” nonsense. Me too.

Google profiles included in search results [googleblog]
A new “Profile results” section will appear at the bottom of a Google search page, when it finds a strong match in response to a name-based search. But only in the US. To help things along remember to use rel=me elsewhere (here’s how).

Shortlisted for a BAFTA, launch of clickable tracklistings and the start of BBC Earth

Look, look clickable tracklistings, w00t!
Few will ever know the pain it took to get this useful little (cross domain) feature live.

We’ve been shortlisted for an Interactive Innovation BAFTA
The /programmes aka Automated Programme Support project. So proud.

Out of the Wild [bbc.co.uk]
Our first tentative steps towards improving the BBC’s online natural history offering. Out of The Wild seeks to bring you stories from BBC crews on location. Eventually this should all form part of an integrated programme offer.

Stuff

Biological Taxonomy Vocabulary
An RDF vocabulary for the taxonomy of all forms of life.

On url shorteners [joshua.schachter.org]
Joshua Schachter considers the issues associated with URL shortening. Similar argument to the one I put forward in “The URL shortening antipattern” but with some useful recommendations: “One important conclusion is that services providing transit (or at least require a shortening service) should at least log all redirects, in case the shortening services disappear. If the data is as important as everyone seems to think, they should own it. And websites that generate very long URLs, such as map sites, could provide their own shortening services. Or, better yet, take steps to keep the URLs from growing monstrous in the first place.”

Linking bbc.co.uk to the Linked Data cloud

I’ve been doing a few talks recently – most recently at the somewhat confused OKCon (Open Knowledge) Conference. The audience was extremely diverse and so I tried to not only talk about what we’ve done but also introduce the concept of Linked Data and explain what it is.

Linked Data is a grassroots project to use web technologies to expose data on the web. It is for many people synonymous with the semantic web, and while this isn’t quite true it does, as far as I’m concerned, represent a very large subset of the semantic web project. Interestingly, it can also be thought of as ‘the web done right’, the web as it was originally designed to be.

But what is it?

Well it can be described with 4 simple rules.

1. Use URIs to identify things not only documents

The web was designed to be a web of things with documents making assertions about those real-world things. Just as a passport or driving license, in the real world, can be thought of as providing an identifier for a person making an assertion about who they are, so URIs can be thought of as providing identifiers for people, concepts or things on the web.

Minting URIs for things rather than pages helps make the web more human literate because it means we are identifying those things that people care about.

2. Use HTTP URIs – they are globally unique and anyone can dereference them

The beauty of the web is its ubiquitous nature – it is decentralised and able to function on any platform. This is because of TimBL’s key invention, the HTTP URI.

URIs are globally unique, open to all and decentralised. Don’t go using DOI or any other identifier – on the web all you need is an HTTP URI.

3. Provide useful information [in RDF] when someone looks up a URI

And obviously you need to provide some information at that URI. When people dereference it you need to give them some data – ideally as RDF as well as HTML. Providing the data as RDF means that machines can process that information for people to use, which makes it more useful.

4. Include links to other URIs to let people discover related information

And of course you also need to provide links to other resources so people can continue their journey, and that means contextual links to other resources elsewhere on the web, not just your site.

And that’s it.

Pretty simple really, and other than the RDF bit I would argue that these principles should be followed for any website – they just make sense.
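To make the four rules a bit more concrete, here is a minimal sketch of a toy, in-memory "web". It is only an illustration under stated assumptions: the `example.org` URIs, the `WEB` dictionary and the `dereference` and `discover` helpers are all made up for this example, and a real implementation would of course use HTTP requests rather than a dictionary lookup.

```python
# A toy, in-memory "web": every key is an HTTP URI naming a thing (rules 1 and 2).
# All URIs and fields below are invented for illustration.
WEB = {
    "http://example.org/artists/u2#artist": {
        "label": "U2",
        "links": ["http://example.org/programmes/jo-whiley#programme"],
    },
    "http://example.org/programmes/jo-whiley#programme": {
        "label": "Jo Whiley",
        "links": ["http://example.org/artists/u2#artist"],
    },
}


def dereference(uri):
    """Rule 3: looking up a URI returns useful information about the thing."""
    return WEB[uri]


def discover(start_uri):
    """Rule 4: follow links from one thing to find every related thing."""
    seen, queue = [], [start_uri]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue  # already visited, so links can safely form cycles
        seen.append(uri)
        queue.extend(dereference(uri)["links"])
    return seen
```

Calling `discover` on the artist URI walks the links outwards and returns every URI reachable from it, which is the "continue their journey" idea from rule 4 in miniature.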

But why?

Before the Web people still networked their computers – but to access those computers you needed to know about the network, the routing and the computers themselves.

Those in their late 30s will probably remember the film War Games. Because it was written before the Web had been invented, David and Jennifer, the two ‘hackers’, had to find and connect directly to each computer; they had to know about the computer’s location.

Phoning up another computer

War Games, 1983

The joy of the web is that it adds a level of abstraction – freeing you from the networking, routing and server location – it lets you focus on the document.

Following the principles of Linked Data allows us to add a further level of abstraction – freeing us from the document and letting us focus on the things, people and stuff that matters to people. It helps us design a system that is more human literate, and more useful.

This is possible because we are identifying real world stuff and the relationships between those things.

Free information from data silos

Of course there are other ways of achieving this – lots of sites now provide APIs, which is good, just not great. Each of those APIs tends to be proprietary and specific to the site. As a result there’s an overhead every time someone wants to add that data source.

These APIs give you access to the silo – but the silo still remains. Using RDF and Linked Data means there is a generic method to access data on the web.

What are we doing at the BBC?

First up it’s worth pointing out the obvious: the BBC is a big place and so it would be wrong to assume that everything we’re doing online is following these principles. But there’s quite a lot of stuff going on that does.

We do have the BBC’s programme support, music discovery and, soon, natural history content all adopting these principles. In other words, persistent HTTP URIs that can be dereferenced to HTML, RDF, JSON and mobile views for programmes, artists, species and habitats.

We want HTTP URIs for every concept, not just every HTML webpage – an individual page is made up of multiple resources, multiple concepts. So for example an artist page transcludes the resource ‘/:artist/news’ and ‘/:artist/reviews’ – but those resources also have their own URIs. If they didn’t they wouldn’t be on the web.

Also, because there’s only one web, we only have one URI for a resource but a number of different representations for that resource. So the URI for the programme ‘Nature’s Great Events’ is:

bbc.co.uk/programmes/b00ht655#programme

Through content negotiation we will be able to serve an HTML, RDF, or mobile document to represent that programme.
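As a rough illustration of content negotiation, the sketch below picks a representation for a single URI from the client's HTTP Accept header. It is a simplified sketch, not how bbc.co.uk does it: the `REPRESENTATIONS` table and the function names are assumptions made for this example, it ignores partial wildcards such as `text/*` and the "q=0 means never" rule, and it leaves out the mobile view, which would usually be chosen by other means.

```python
# Hypothetical mapping from media type to a representation we could serve.
REPRESENTATIONS = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
    "application/json": "json",
}


def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0  # the default when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by original position in the header.
        ranked.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def negotiate(accept_header, default="html"):
    """Pick one representation of a resource from the Accept header."""
    for media_type in parse_accept(accept_header):
        if media_type in REPRESENTATIONS:
            return REPRESENTATIONS[media_type]
        if media_type == "*/*":
            return default
    return default
```

The point is that the URI never changes: a browser sending `text/html` and a crawler sending `application/rdf+xml` both ask for the same resource and simply get different documents back.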

We then need to link all of this stuff up within the BBC. So that, for example, you can go from a tracklist on an episode page of Jo Whiley on the Radio 1 site to the U2 artist page and then from there to all episodes of Chris Evans which have played U2. Or from an episode of Nature’s Great Events to the page about Brown Bears to all BBC TV programmes about Brown Bears.

But obviously the BBC is only one corner of the web. So we also need to link with the rest of the web.

Because we’re now thinking on a webscale we’ve started to think about the web as a CMS.

Where a URI already exists to represent a concept we use it rather than minting our own. The new music site transcludes and links back to Wikipedia to provide biographical information about an artist. Rather than minting our own URI for artist biographic info we use Wikipedia’s.

Likewise when we want to add music metadata to the music site we add MusicBrainz.

Making computers human literate WWW@20

Last Friday saw the 20th anniversary of the Web – well, if not the web as such then TimBL’s proposal for an information management system. To mark the occasion CERN hosted a celebration which I was honoured to be invited to speak at, by the big man no less! I’ll write up some more about the event itself, but in the meantime here are my slides.

I’ve also posted some photos of the event up on Flickr.

Building coherence at bbc.co.uk

Michael and I have written an article for the latest edition [pdf] of Talis’s magazine Nodalities, reproduced below. If you are interested in the process behind this then I can’t recommend enough Michael’s awesome post “How we make websites” over on the BBC’s Radio Labs blog.

—-

Telling (non-linear) stories

For the past 86 years the BBC has plied its trade as a storytelling organisation. In the world of linear broadcasting we’ve even gotten very good at it. Guiding the audience through complex news storylines, explaining the natural world, and interleaving the narrative arcs and plotlines of drama have become our forte. But storytelling in a linear world is different from storytelling in the non-linear, hypertext world of the web.

Joining www.bbc.co.uk to the rest of the web (of course as @gkob points out those should be dbpedia URIs)

With the exception of BBC News Online (news.bbc.co.uk) the online world has often been seen as a supporting adjunct to the linear broadcast world. Over the years we’ve commissioned and built sites to provide online support for programmes; but we’ve too often taken our linear storytelling expertise and attempted to replicate the same techniques on the web – with mixed success. Unlike linear broadcast storylines the web doesn’t provide people with a predicted and controlled linear journey. Instead we dip in and out of any given website — following different journeys — to find the information we want at that time.

Many of our programme support sites have been commissioned and developed in isolation. So you see an Archers site and an Eastenders site and a Top Gear site which are internally coherent but which fail to link up other than via editorially determined cross promotions. Want to see who presents Top Gear? No problem, we can do that. Want to see what else those people present? Sorry, can’t do that. By developing self-contained microsites the BBC has produced some good stuff but it has also been unable to reach its full potential because it hasn’t managed to join up all of its resources. By failing to link up the content (on both a data and a user experience level) the stuff we publish can never become greater than the sum of its parts. Without these links we can’t make bbc.co.uk a coherent experience. As a user, it’s very difficult to find everything the BBC has published about any given subject, nor can you easily navigate across BBC domains following a particular semantic thread. For example, you can’t yet navigate from a page about a musician to a page with all the programmes that have played that artist.

So how do you tell stories on a web scale? We could stick with the easy option and try to control ‘user journeys’ across the site. Provide links to where we think the user should go next. But that’s little better than those flip a dice, go to page 30 dungeons and dragons books we all had as kids. We had to recognise that non-linear storytelling puts the narrative arc into the hands of the user. What to read, what to click, where to go next is really up to you. So storylines split and merge, meta-narratives emerge and fracture; ‘user journeys’ slip out of (editorial) control.

All of this comes from the power of the link – back to basics. But we can only provide precisely targeted links at the user experience level if those links exist at a data level. And that’s the difficult part. The organic growth of our sites has been mirrored in the organic growth of our content and data management systems. We currently have a range of systems across the business for managing different bits of content throughout the production chain. And like our public facing sites none of these speak the same language or share the same identifiers. A typical episode of Top Gear might have 6 separate identifiers on its way from scriptwriter to airwaves to archive. Once you’ve solved this problem you hit the problem of multiple identifiers for James May and once you’ve got one canonical James May you’re back to the problem of multiple identifiers for all the other programmes he’s presented…
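The identifier problem can be pictured with a small sketch. This is purely illustrative: the system names, the local identifiers, the canonical identifier and the `example.org` base URI below are all invented, and the real reconciliation work is far messier than a lookup table.

```python
# Hypothetical identifiers that different production systems might use
# for one and the same episode. None of these values are real.
ALIASES = {
    ("scheduling", "TG-0412"): "b0000001",
    ("playout", "884213"): "b0000001",
    ("archive", "LDN/7731"): "b0000001",
}


def canonical_id(system, local_id):
    """Map a system-specific identifier to the single shared identifier."""
    return ALIASES[(system, local_id)]


def canonical_uri(system, local_id, base="http://example.org/programmes/"):
    """Build the one persistent URI that every system can link to."""
    return base + canonical_id(system, local_id)
```

Once every system can resolve its own identifier to the same canonical URI, links made at the data level all point at the same thing, which is what makes targeted links at the user experience level possible.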

Solving these problems makes for a more linked, more coherent bbc.co.uk. But an internally coherent bbc.co.uk isn’t enough. bbc.co.uk needs to be woven into the rest of the web, not merely on the web. It needs to be linked in to all those other Top Gear / James May pages out there… Luckily the tips, tricks and techniques pioneered by the Linked Data community give us some clues here.

Add into this mix the fact that there’s some data the BBC can never hope to provide. So we know when an artist is played on radio or TV. But we can’t hope to know when they were born, or where they were born, or which bands they’ve been in, or who they’re married to etc. If we want to tell stories around music all this is important data. And we can only get it by tapping into the collective knowledge of the web.

BBC in the web of data

I’d like to claim that when we set out to develop /programmes we had the warm embrace of the semantic web in mind. But that would be a lie. We were however building on very similar philosophical foundations.

In the work leading up to bbc.co.uk/programmes we were all too aware of the importance of persistent web identifiers, permanent URIs and the importance of links as a way to build meaning. To achieve all this we broke with BBC tradition by designing from the domain model up rather than the interface down. The domain model provided us with a set of objects (brands, series, episodes, versions, ondemands, broadcasts etc) and their sometimes tangled interrelationships.

We were also convinced that the value in programme websites lay not in the implicit metadata of the domain model but rather in the way this domain model overlapped and intersected with other domains. As ever the links are more important than the nodes because that’s where the context lives: programmes:segment <features> music:track, programmes:segment <features> food:recipe etc. In this way we could weave new ‘user journeys’ into and out of /programmes, into and out of bbc.co.uk. From archive episodes no longer available online, to a recipe page, to a chef, to another recipe and back to a recent episode. Using well targeted content specific links we could not only escape the dead end content silos that characterised bbc.co.uk but point users back to programmes that would hopefully inform, educate and of course entertain.
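A journey like the one above (episode, to recipe, to chef, to another recipe and back to an episode) is really just a walk across links between domains. Here is a minimal sketch, assuming the data is held as simple (subject, predicate, object) triples. Only the "features" relationship comes from the text; the identifiers, the other predicate names and the helper functions are made up for illustration.

```python
# Each triple is (subject, predicate, object). The identifiers are invented;
# only the "features" relationship is taken from the text above.
TRIPLES = [
    ("episode:1", "has_segment", "segment:1"),
    ("segment:1", "features", "recipe:apple-pie"),
    ("recipe:apple-pie", "created_by", "chef:jane"),
    ("recipe:pear-tart", "created_by", "chef:jane"),
    ("segment:2", "features", "recipe:pear-tart"),
    ("episode:2", "has_segment", "segment:2"),
]


def objects(subject, predicate):
    """Follow a link forwards: everything `subject` points at via `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Follow a link backwards: everything that points at `obj` via `predicate`."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def other_episodes_via_chef(episode):
    """Episode -> segment -> recipe -> chef -> other recipe -> segment -> episode."""
    found = []
    for segment in objects(episode, "has_segment"):
        for recipe in objects(segment, "features"):
            for chef in objects(recipe, "created_by"):
                for other_recipe in subjects("created_by", chef):
                    for other_segment in subjects("features", other_recipe):
                        for other in subjects("has_segment", other_segment):
                            if other != episode and other not in found:
                                found.append(other)
    return found
```

Nothing in the walk knows or cares which site "owns" a recipe or a chef; the journey exists because the links exist in the data.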

Finally we believed in the merits of opening our data and building on top of other people’s open data. When we looked to rebuild bbc.co.uk/music we looked at a number of commercial providers of music metadata. They all did a similar job to MusicBrainz (musicbrainz.org) – similar models, similar data quality etc. But choosing to go with a commercial provider would have precluded our ability to provide any kind of machine friendly (API if you must) views. The decision to publish JSON or vanilla XML or RDF would have been a decision to give the 3rd party business model away. So we went with the open alternative – an open, public domain provider, one that is more in keeping with  our public service remit and one that represents better value for money for the license fee payer – which has to be a lesson to someone.

Without ever explicitly talking RDF we’d built a site that complied with Tim Berners-Lee’s four principles for Linked Data:

  1. Use URIs as names for things. – CHECK
  2. Use HTTP URIs so that people can look up those names. – CHECK
  3. When someone looks up a URI, provide useful information. – Well, if we’re only talking HTML, RSS, ATOM, JSON etc. CHECK
  4. Include links to other URIs so that they can discover more things. – Again, if we’re talking HTML only, CHECK

By keeping everything in its right place we’d also built a sane, maintainable, scalable, accessible site that search engines love and could be easily evolved to add new features and functionality. So to anyone considering how best to build websites we’d recommend you throw out the Photoshop and embrace Domain Driven Design and the Linked Data approach every time. Even if you never intend to publish RDF it just works.

Around this time we met by chance with some people from the Linking Open Data community and the two worlds collided. Obviously TBL wasn’t talking only HTML in the last 2 principles but aside from that the parallels were striking. We set about converting our programmes domain model into an RDF ontology, which we’ve since published under a Creative Commons License (www.bbc.co.uk/ontologies/programmes/). That took one person about a week. The trick here isn’t the RDF mapping – it’s having a well thought through and well expressed domain model. And if you’re serious about building web sites that’s something you need anyway. Using this ontology we began to add RDF views to /programmes (e.g. www.bbc.co.uk/programmes/b00f91wz.rdf). Again the work needed was minimal.
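To show why the mapping is the easy part once the domain model exists, here is a minimal sketch that serialises one domain object as N-Triples. It is an illustration only: the `Episode` class, its fields and the `example.org` vocabulary are assumptions made for this example and are not the terms used in the published programmes ontology.

```python
from dataclasses import dataclass

VOCAB = "http://example.org/vocab#"  # hypothetical vocabulary, not the BBC ontology
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Episode:
    uri: str
    title: str
    series_uri: str


def literal(text):
    """Quote a plain string as an N-Triples literal (basic escaping only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_ntriples(episode):
    """Serialise one domain object as a list of N-Triples lines."""
    subject = "<" + episode.uri + ">"
    return [
        f"{subject} <{RDF_TYPE}> <{VOCAB}Episode> .",
        f"{subject} <{VOCAB}title> {literal(episode.title)} .",
        f"{subject} <{VOCAB}partOfSeries> <{episode.series_uri}> .",
    ]
```

Each attribute of the object becomes one statement and each relationship becomes a link to another URI, so a clear domain model translates almost line for line.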

So for those considering the Linked Data approach we’d say that 95% of the work is work you should be doing just to build for the (non-semantic) web. Get the fundamentals right and the leap to the Semantic Web is really more of a hop.

Why bother with RDF?

For all the pages we’ve published we’ve only had a limited success at making this information available for others to use, to hack with and to build new services with. While we’ve not done a very good job of making bbc.co.uk a coherent experience for people the situation is worse for machines.

It is our belief that rather than publishing proprietary APIs it is better to use the ubiquitous technologies of URIs and HTTP. This approach supports the generative nature of the Web, making it easy for third parties to build with BBC metadata without learning BBC specific APIs and at the same time providing the BBC and its users with immediate benefits.

Services like Flickr, Twitter and the like have in many, many ways followed the same principles we adopted for programmes and music (or if they didn’t then the end results look pretty similar), and they are wonderful services. However, if as a third party developer you want to deal with the semantics, accessing the data via the Giant Global Graph to find everything about a certain person, place or topic, and you want to include data from Flickr, then you will need to deal with the specifics of Flickr. I suspect that it wouldn’t be that difficult for Flickr to add RDF representations – if they did then Flickr content would be part of a common way of doing things. We want BBC data to be part of a common way of doing things.

Our hope in making BBC data available as RDF is that we will make it as generative as possible – helping others to do interesting things with our data. The BBC has a public service remit, a remit that means it should look beyond its internal business needs to help create public value around useful technologies and around its content for others to benefit from. The longer term aim of this work is to not only expose BBC data but to ensure that it is contextually linked to the wider web. We have started along this path by linking to Wikipedia (DBpedia in the RDF view) and MusicBrainz from the artist pages but this could be extended for programmes and events.
