By a mile the highlight of last week or so was the 2nd Linked Data meet-up. Silver and Georgi did a great job of organising the day and I came away with a real sense that not only are we on the cusp of seeing a lot of data on the web but also that the UK is at the centre of this particular revolution. All very exciting.
In terms of Wildlife Finder there are a few things that I wanted to highlight:
If you’re interested in the RDF and how we’re modelling the data, we’ve documented the wildlife ontology here. In addition to the ontology itself, we’ve also included some background on why we modelled the information in the way we have.
If you want to get your hands on the RDF/XML, either add .rdf to the end of most of our URLs (more on this later) or configure your client to request RDF/XML – we’ve implemented content negotiation, so you’ll just get the data.
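Both routes can be tried from Python’s standard library. What follows is a minimal, offline sketch rather than anything we ship: `rdf_url` and `rdf_request` are hypothetical helper names, the species URL is illustrative, and the first route assumes the ‘.rdf’ suffix convention, which applies to most but not all of our URLs.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request

RDF_XML = "application/rdf+xml"


def rdf_url(page_url: str) -> str:
    """Route 1: append '.rdf' to the path (assumes the suffix convention)."""
    parts = urlsplit(page_url)
    # Fragments are never sent to the server, so drop any '#...' part.
    path = parts.path.rstrip("/") + ".rdf"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def rdf_request(page_url: str) -> Request:
    """Route 2: keep the URL and ask for RDF/XML via the Accept header."""
    return Request(page_url, headers={"Accept": RDF_XML})


# Illustrative URL only.
species = "https://www.bbc.co.uk/nature/species/Brown_bear"
print(rdf_url(species))
print(rdf_request(species).get_header("Accept"))
```

Passing either result to `urllib.request.urlopen` would make the actual HTTP call; the sketch stops short of that so it runs without a network connection.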
But… we’ve not implemented everything just yet. Specifically, the adaptations aren’t published as RDF – this is because we’re making a few changes to the structure of this information and I didn’t want to publish the data and then change it. Nor have we published information on the species’ conservation status; that’s simply because we’ve not finished yet (sorry).
It’s not all RDF – we are also marking up our taxa pages with the species microformat, which gives more structure to the common and scientific names.
I’ve really been neglecting this blog of late – apologies, but my attention has been elsewhere. Anyway, while I get round to actually writing something, here’s a presentation I gave recently at the Online Information Conference.
The presentation is largely based upon the article Michael and I wrote for Nodalities this time last year.
It’s starting to feel like the world has suddenly woken up to the whole Linked Data thing, and that’s clearly a very, very good thing. Not only are Google (and Yahoo!) now using RDFa, but a whole bunch of other, rather exciting things are going on; below is a round-up of some of the best. If you don’t know what I’m talking about, you might like to start off with TimBL’s talk at TED.
The BBC has announced a couple of SPARQL endpoints, hosted by Talis and OpenLink
Both platforms allow you to search and query the BBC data in a number of different ways, including SPARQL, the standard query language for semantic web data. If you’re not familiar with SPARQL, the Talis folk have published a tutorial that uses some NASA data.
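At the protocol level a SPARQL query is just an HTTP request, with the query text URL-encoded into a `query` parameter. A minimal sketch, assuming a placeholder endpoint address (use the one Talis or OpenLink publish) and assuming the BBC data describes brands with the Programmes Ontology’s `po:Brand` class and Dublin Core `dc:title`; `sparql_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder: substitute the endpoint address published by Talis or OpenLink.
ENDPOINT = "http://example.org/sparql"

# Assumes the data uses the Programmes Ontology (po:) and Dublin Core titles.
QUERY = """PREFIX po: <http://purl.org/ontology/po/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?brand ?title WHERE {
  ?brand a po:Brand ;
         dc:title ?title .
}
LIMIT 10"""


def sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL protocol GET request; the query goes in 'query'."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the W3C SPARQL XML results format.
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


req = sparql_request(ENDPOINT, QUERY)
print(req.full_url)
```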
A social semantic BBC?
Nice presentation from Simon and Ben on how social discovery of content could work: “show me the radio programmes my friends have listened to, show me the stuff my friends like that I’ve not seen”, all built on people’s existing social graph. People meet content via activity.
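The “stuff my friends like that I’ve not seen” idea boils down to set arithmetic over the social graph. A hypothetical, in-memory sketch (the function name and data are made up, and this is not how Simon and Ben’s proposal is implemented):

```python
from collections import Counter
from typing import Dict, List, Set


def friends_picks(user: str,
                  friends_of: Dict[str, Set[str]],
                  listened: Dict[str, Set[str]]) -> List[str]:
    """Items the user's friends have listened to but the user hasn't,
    most widely shared first (ties broken alphabetically)."""
    seen = listened.get(user, set())
    counts: Counter = Counter()
    for friend in friends_of.get(user, set()):
        # Set difference removes anything the user has already heard.
        for item in listened.get(friend, set()) - seen:
            counts[item] += 1
    return sorted(counts, key=lambda item: (-counts[item], item))


friends_of = {"alice": {"bob", "carol"}}
listened = {
    "alice": {"prog1"},
    "bob": {"prog1", "prog2", "prog3"},
    "carol": {"prog2", "prog4"},
}
print(friends_picks("alice", friends_of, listened))
```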
PricewaterhouseCoopers’ spring technology forecast focuses on Linked Data [pwc.com]
“Linked Data is all about supply and demand. On the demand side, you gain access to the comprehensive data you need to make decisions. On the supply side, you share more of your internal data with partners, suppliers, and—yes—even the public in ways they can take the best advantage of. The Linked Data approach is about confronting your data silos and turning your information management efforts in a different direction for the sake of scalability. It is a component of the information mediation layer enterprises must create to bridge the gap between strategy and operations… The term “Semantic Web” says more about how the technology works than what it is. The goal is a data Web, a Web where not only documents but also individual data elements are linked.”
I wonder whether restricting the OpenID providers displayed, based on visited links, would help, i.e. hiding those that haven’t been visited? It clearly wouldn’t be perfect (Google isn’t my OpenID provider but I visit google.com lots), but it should cut down some of the clutter.
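A minimal sketch of that filtering rule, assuming the page somehow already knows which domains the user has visited (the provider list and `providers_to_show` are hypothetical). It falls back to the full list when nothing matches, so nobody is left without a choice:

```python
from typing import List, Set


def providers_to_show(providers: List[str], visited: Set[str]) -> List[str]:
    """Keep only providers whose domain the user has visited; if none
    match, fall back to showing every provider."""
    shortlist = [domain for domain in providers if domain in visited]
    return shortlist or providers


providers = ["yahoo.com", "myopenid.com", "google.com", "aol.com"]
print(providers_to_show(providers, {"google.com", "bbc.co.uk"}))
```

With the visits shown, google.com survives the filter even though it may not be the user’s provider, which is exactly the imperfection mentioned above.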
Security flaw leads Twitter, others to pull OAuth support [cnet.com]
The hole makes it possible for a hacker to use social-engineering tactics to trick users into exposing their data. The OAuth protocol itself requires tweaking to remove the vulnerability, and a source close to OAuth’s development team said that there have been no known violations, that it has been aware of it for a few days now, and has been coordinating responses with vendors. A solution should be announced soon.
Twitter and social networks
Relationship Symmetry in Social Networks: Why Facebook will go Fully Asymmetric [bokardo.com]
Asymmetric model better mimics how real attention works…and how it has always worked. Any person using Twitter can have a larger number of followers than followees, effectively giving them more attention than they give. This attention inequality is the foundation of the Twitter service… The IA of Facebook does not allow this. Facebook has designed a service that forces you to keep track of your friends, whether you want to or not. Facebook is modeling personal relationships, not relationships based on attention. That’s the crucial difference between Facebook and Twitter at the moment.
When Twitter Gets Weird… [Dave Gorman]
“The difference between following someone and replying to them is the difference between stopping to chat with someone in the street or giving them a badge declaring that you know them. One is actual interaction. The other is just something you can show your friends.” Blimey – Dave Gorman clearly has a much better grasp of life, the web and being a human than the two people who attacked him for not following them on Twitter. As Dave points out, he hopes that Twitter doesn’t descend into the MySpace ‘thanks for the add’ nonsense. Me too.
Out of the Wild [bbc.co.uk]
Our first tentative steps towards improving the BBC’s online natural history offering. Out of the Wild seeks to bring you stories from BBC crews on location. Eventually this should all form part of an integrated programme offer.
On url shorteners [joshua.schachter.org]
Joshua Schachter considers the issues associated with URL shortening. Similar argument to the one I put forward in “The URL shortening antipattern” but with some useful recommendations: “One important conclusion is that services providing transit (or at least require a shortening service) should at least log all redirects, in case the shortening services disappear. If the data is as important as everyone seems to think, they should own it. And websites that generate very long URLs, such as map sites, could provide their own shortening services. Or, better yet, take steps to keep the URLs from growing monstrous in the first place.”
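Schachter’s advice to log every redirect amounts to keeping a local copy of the short-to-long mapping. A minimal sketch, assuming the actual lookup is supplied as a function (here a hypothetical in-memory stand-in rather than a real HTTP call); `RedirectLog` is an illustrative name:

```python
from typing import Callable, Dict


class RedirectLog:
    """Expand short URLs once and keep our own record of every mapping,
    so links can still be resolved if the shortening service disappears."""

    def __init__(self, resolve: Callable[[str], str]) -> None:
        # 'resolve' performs the actual lookup, e.g. by following the redirect.
        self._resolve = resolve
        self.mapping: Dict[str, str] = {}

    def expand(self, short_url: str) -> str:
        # Only hit the shortener the first time; afterwards we own the data.
        if short_url not in self.mapping:
            self.mapping[short_url] = self._resolve(short_url)
        return self.mapping[short_url]


# Hypothetical stand-in for a shortening service (no network access needed).
service = {"http://short.example/abc": "http://example.org/a/very/long/url"}
log = RedirectLog(service.__getitem__)
print(log.expand("http://short.example/abc"))
service.clear()  # The service "disappears"...
print(log.expand("http://short.example/abc"))  # ...but our log still resolves it.
```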
I’ve given a few talks lately, most recently at the somewhat confused OKCon (Open Knowledge) Conference. The audience was extremely diverse, so I tried not only to talk about what we’ve done but also to introduce the concept of Linked Data and explain what it is.
Linked Data is a grassroots project to use web technologies to expose data on the web. For many people it is synonymous with the semantic web, and while this isn’t quite true, it does, as far as I’m concerned, represent a very large subset of the semantic web project. Interestingly, it can also be thought of as ‘the web done right’, the web as it was originally designed to be.
But what is it?
Well, it can be described with four simple rules.
1. Use URIs to identify things not only documents
The web was designed to be a web of things, with documents making assertions about those real-world things. Just as a passport or driving licence can be thought of as providing an identifier for a person in the real world, and as making an assertion about who they are, so URIs can be thought of as providing identifiers for people, concepts or things on the web.
Minting URIs for things rather than pages helps make the web more human literate because it means we are identifying those things that people care about.
2. Use HTTP URIs – they are globally unique and anyone can dereference them
The beauty of the web is its ubiquitous nature – it is decentralised and able to function on any platform. This is because of TimBL’s key invention, the HTTP URI.
HTTP URIs are globally unique, open to all and decentralised. Don’t go using DOI or any other identifier – on the web all you need is an HTTP URI.
3. Provide useful information [in RDF] when someone looks up a URI
And obviously you need to provide some information at that URI. When people dereference it you need to give them some data – ideally as RDF as well as HTML. Providing the data as RDF means that machines can process that information for people to use, making it more useful.
4. Include links to other URIs to let people discover related information
And of course you also need to provide links to other resources so people can continue their journey, and that means contextual links to other resources elsewhere on the web, not just your site.
And that’s it.
Pretty simple really, and other than the RDF bit I would argue that these principles should be followed for any website – they just make sense.
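To make the four rules concrete, here is a minimal sketch of the server side as a plain Python function, with no web framework. Everything in it is an assumption for illustration: the `example.org` host, the `/id/` (thing) versus `/doc/` (document) URI pattern with a 303 redirect between them, and the single brown bear resource. The URIs are all HTTP (rule 2), the `/id/` URI names the animal rather than a page (rule 1), the `.rdf` document carries RDF/XML (rule 3), and both views link out to DBpedia (rule 4).

```python
from typing import Dict, Tuple

BASE = "http://example.org"  # hypothetical host
# One named thing, plus an outbound link to someone else's URI (rule 4).
THINGS = {
    "brown_bear": {"label": "Brown bear",
                   "same_as": "http://dbpedia.org/resource/Brown_bear"},
}

RDF_TEMPLATE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="{thing}">
    <rdfs:label>{label}</rdfs:label>
    <owl:sameAs rdf:resource="{same_as}"/>
  </rdf:Description>
</rdf:RDF>
"""

Response = Tuple[int, Dict[str, str], str]


def respond(path: str, accept: str) -> Response:
    """Return (status, headers, body) for a GET on 'path'."""
    kind, _, name = path.strip("/").partition("/")
    if kind == "id" and name in THINGS:
        # /id/... names the thing itself, so 303 to a document about it.
        # Naive negotiation (q-values ignored): RDF if the client asks for it.
        ext = "rdf" if "application/rdf+xml" in accept else "html"
        return 303, {"Location": f"{BASE}/doc/{name}.{ext}", "Vary": "Accept"}, ""
    stem, _, ext = name.rpartition(".")
    if kind == "doc" and stem in THINGS:
        data = THINGS[stem]
        if ext == "rdf":
            # Machine-readable view; these labels need no XML escaping.
            body = RDF_TEMPLATE.format(thing=f"{BASE}/id/{stem}", **data)
            return 200, {"Content-Type": "application/rdf+xml"}, body
        if ext == "html":
            # The human-readable view links out to the wider web too.
            body = f'<h1>{data["label"]}</h1><a href="{data["same_as"]}">More</a>'
            return 200, {"Content-Type": "text/html"}, body
    return 404, {}, ""


status, headers, _ = respond("/id/brown_bear", "application/rdf+xml")
print(status, headers["Location"])
```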
Before the Web people still networked their computers – but to access those computers you needed to know about the network, the routing and the computers themselves.
If you’re in your late 30s you’ll probably remember the film WarGames. Because it was written before the Web had been invented, David and Jennifer, the two ‘hackers’, had to find and connect directly to each computer; they had to know about the computer’s location.
The joy of the web is that it adds a level of abstraction – freeing you from the networking, routing and server location – it lets you focus on the document.
Following the principles of Linked Data allows us to add a further level of abstraction – freeing us from the document and letting us focus on the things, people and stuff that matter to people. It helps us design a system that is more human literate, and more useful.
This is possible because we are identifying real world stuff and the relationships between them.
Free information from data silos
Of course there are other ways of achieving this – lots of sites now provide APIs, which is good, just not great. Each of those APIs tends to be proprietary and specific to the site. As a result there’s an overhead every time someone wants to add that data source.
These APIs give you access to the silo – but the silo still remains. Using RDF and Linked Data means there is a generic method to access data on the web.
What are we doing at the BBC?
First up it’s worth pointing out the obvious: the BBC is a big place and so it would be wrong to assume that everything we’re doing online is following these principles. But there’s quite a lot of stuff going on that does.
We do have the BBC’s programme support, music discovery and, soon, natural history content all adopting these principles. In other words: persistent HTTP URIs that can be dereferenced to HTML, RDF, JSON and mobile views for programmes, artists, species and habitats.
We want HTTP URIs for every concept, not just every HTML web page – an individual page is made up of multiple resources, multiple concepts. So, for example, an artist page transcludes the resources ‘/:artist/news’ and ‘/:artist/reviews’, but those resources also have their own URIs. If they didn’t, they wouldn’t be on the web.
Also, because there’s only one web, we only have one URI for a resource but a number of different representations of that resource. So the URI for the programme ‘Nature’s Great Events’ is:
Through content negotiation we will be able to serve an HTML, RDF or mobile document to represent that programme.
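One URI, several representations: the server’s job is to read the client’s `Accept` header and pick the best match it can offer. A simplified sketch of that choice, assuming HTML, RDF/XML and JSON as the available types (a mobile view would simply be another entry; which media type identifies it is left open here). It honours q-values and lets the most specific matching range win, but ignores other media-type parameters; the function names are hypothetical:

```python
from typing import List, Optional, Tuple

# Representations the server could offer for one programme URI (illustrative).
AVAILABLE = ["text/html", "application/rdf+xml", "application/json"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs; q defaults to 1."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges


def specificity(media_range: str, media_type: str) -> int:
    """2 = exact match, 1 = 'type/*', 0 = '*/*', -1 = no match."""
    if media_range == media_type:
        return 2
    main, _, sub = media_range.partition("/")
    if sub == "*" and main == media_type.partition("/")[0]:
        return 1
    return 0 if media_range == "*/*" else -1


def negotiate(header: str, available: List[str]) -> Optional[str]:
    """Pick the representation with the highest q; None means 406."""
    if not header.strip():
        return available[0] if available else None  # No stated preference.
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for media_type in available:
        # The most specific matching range decides this type's q-value.
        spec, q = -1, 0.0
        for media_range, range_q in ranges:
            s = specificity(media_range, media_type)
            if s > spec:
                spec, q = s, range_q
        if q > best_q:  # Strict '>' keeps server order on ties.
            best, best_q = media_type, q
    return best


print(negotiate("text/html;q=0.9, application/rdf+xml", AVAILABLE))
```

Returning `None` corresponds to an HTTP 406 Not Acceptable response; on ties, the order of `AVAILABLE` acts as the server’s own preference.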
We then need to link all of this stuff up within the BBC. So that, for example, you can go from a tracklist on an episode page of Jo Whiley on the Radio 1 site to the U2 artist page and then from there to all episodes of Chris Evans which have played U2. Or from an episode of Nature’s Great Events to the page about Brown Bears to all BBC TV programmes about Brown Bears.
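Once every episode, artist and brand has its own URI, journeys like these are just hops across a graph of statements. A toy sketch with made-up identifiers and predicates (not the actual Programmes Ontology terms), storing the statements as (subject, predicate, object) triples:

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data; identifiers and predicate names are made up.
TRIPLES: List[Triple] = [
    ("jo_whiley_ep1", "brand", "jo_whiley"),
    ("jo_whiley_ep1", "played", "u2"),
    ("chris_evans_ep7", "brand", "chris_evans"),
    ("chris_evans_ep7", "played", "u2"),
    ("chris_evans_ep8", "brand", "chris_evans"),
    ("chris_evans_ep8", "played", "coldplay"),
]


def objects(subject: str, predicate: str) -> Set[str]:
    """Everything 'subject' points at via 'predicate'."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate: str, obj: str) -> Set[str]:
    """Everything that points at 'obj' via 'predicate'."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


# Hop 1: from the Jo Whiley episode's tracklist to the artists played.
artists = objects("jo_whiley_ep1", "played")
# Hop 2: from each artist to the Chris Evans episodes that played them.
for artist in sorted(artists):
    episodes = subjects("played", artist) & subjects("brand", "chris_evans")
    print(artist, sorted(episodes))
```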
But obviously the BBC is only one corner of the web. So we also need to link with the rest of the web.
Because we’re now thinking on a web scale, we’ve started to think about the web as a CMS.
Where URIs already exist to represent a concept, we use them rather than minting our own. The new music site transcludes and links back to Wikipedia to provide biographical information about an artist. Rather than minting our own URI for artist biographical info, we use Wikipedia’s.
Likewise, when we want to add music metadata to the music site, we use MusicBrainz.
Linked Data? Web of Data? Semantic Web? WTF? [Tom Heath]
“Think about HTML documents; when people started weaving these together with hyperlinks we got a Web of documents. Now think about data. When people started weaving individual bits of data together with RDF triples (that expressed the relationship between these bits of data) we saw the emergence of a Web of data. Linked Data is no more complex than this – connecting related data across the Web using URIs, HTTP and RDF.”
The Programmes Ontology [BBC]
Yves has updated the programmes ontology to handle “temporal annotations”, tracklistings, segments, outlets etc.
The Twitter Global Mind [Rocketboom]
Don’t understand what all the fuss about Twitter is? Watch this. Yes, it’s about social networking and communication, but it’s also about realtime search.
Periodic Table of Typefaces on the Behance Network [behance.net]
“The Periodic Table of Typefaces is obviously in the style of all the thousands of over-sized Periodic Table of Elements posters hanging in schools and homes around the world. This particular table lists 100 of the most popular, influential and notorious typefaces today. As with traditional periodic tables, this table presents the subject matter grouped categorically. The Table of Typefaces groups by families and classes of typefaces: san-serif, serif, script, blackletter, glyphic, display, grotesque, realist, didone, garalde, geometric, humanist, slab-serif and mixed.”
The open web
What is the Open Platform? [guardian.co.uk]
“The Open Platform is the suite of services that make it possible for guardian.co.uk to build applications with the Guardian…” Very nice; I hope others follow. I also wish the Beeb recognized its open projects (recognized internally, that is).
Last Friday saw the 20th anniversary of the Web – well, if not the web as such, then of TimBL’s proposal for an information management system. To mark the occasion CERN hosted a celebration, which I was honoured to be invited to speak at, by the big man no less! I’ll write up some more about the event itself, but in the meantime here are my slides.
I’ve also posted some photos of the event up on Flickr.