One for the Linked Data community this one – you can now get your BBC programme data as RDF and the current and next three programmes, per service, as plain text, XML, JSON or YAML.
Last May, at XTech, Nick and I gave a paper outlining our work to make data available for other development teams, outside the BBC, to use. At the time we didn’t have the RDF views launched – since then we’ve launched RDF for the new artist pages and today for programmes.
As usual all you need to do is add .rdf to the end of a URL for a brand, series, episode or version. For example:
To help you find programmes, no matter which station or channel they are broadcast on, we’ve started publishing schedules for all our genres (sport, music etc.). These views are being used as part of the BBC’s Olympics coverage – specifically to drive the TV schedule and individual sport pages. But as you might be able to guess a little bit of URL hacking gives you more. All you need to do is add /schedules to the end of any genre aggregation, so for example:
We (well Nick, Patrick, Tony, Deanna, Sacha and Guy) have recently been working on a major rewrite of the BBC’s music site [beta] – and since it’s just gone live as a public beta I thought it would be a good time to explain a bit about what we’ve done and what we hope to do with the site. I would also love to hear what you think about what we’ve done so far – especially as we move from public beta towards a replacement for the current site.
Our work so far has focused on providing core information for every artist the BBC plays on our daytime radio network shows. That’s shows like Annie Mac and Chris Moyles on Radio 1, or Steve Lamacq on 6Music. We’ve focused on these shows because we’ve integrated the site with the radio playout system, VCS dira!, which is basically a set of giant iPods in the basement of Broadcasting House. Unfortunately the specialist shows, as well as national and local radio shows, don’t use it, so we haven’t got track listing data in the right format for these programmes. But we have a plan and will be adding more shows in due course.
So what have we done so far?
For starters we decided that we wanted to have a single canonical page for every artist. We decided to do this because we want to aggregate everything we know about an artist at a single URL. But this means that we need unique, persistent, unambiguous URLs. Thankfully, as Michael has already discussed, Musicbrainz gives us unique identifiers that allow us to provide just such a set of URLs.
The core of the new site then is built around Musicbrainz. In addition to giving us web scale identifiers Musicbrainz is also being used to give core music metadata e.g. discographies, related artists and related links. Some of those related links are Wikipedia links and we are using those to go and fetch the introductory text for each artist’s biography from Wikipedia.
The approach we’re using to keep the Wikipedia data up to date is, I think, quite neat. Patrick has written a bot to monitor the Wikipedia IRC channel for updates – when it spots an update we fetch the new Wikipedia content. Oh and obviously all this data is rendered dynamically using the same MVC framework we are using for /programmes which means that updates happen almost instantaneously.
This brings me to the next major feature – integration with /programmes. As I’ve said we have integrated with VCS to give us track listing information for our daytime radio shows – we are matching this data with both our internal database of programme metadata and Musicbrainz. This lets us know which radio stations and shows have played which artists.
We also want to make this data available for others to use and so have designed the site to provide a RESTful API, following the principles of Linked Data:
…namely thinking of URIs as more than just locations for documents. Instead using them to identify anything, from a particular person to a particular programme. These resources in-turn have representations, which can be machine-processable (through the use of RDF, Microformats, RDFa, etc.), and these representations can hold links towards further web resources, allowing agents to jump from one data-set to another.
If you would like to have a play with these we have RDF, XML, JSON and YAML representations of the resources – just add .xml, .rdf, .json or .yaml to the end of the artist URL.
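As a rough illustration of the suffix convention described above, here is a minimal Python sketch that turns a resource URL into the URL of one of its machine-readable representations. The function name and the example.org URL are hypothetical, made up for this example; only the four suffixes come from the post.

```python
from urllib.parse import urlsplit, urlunsplit

# The four suffixes named in the post.
FORMATS = ("rdf", "xml", "json", "yaml")


def representation_url(resource_url: str, fmt: str) -> str:
    """Append a format suffix to the path of a resource URL.

    Hypothetical helper: it only builds the URL, it does not fetch anything.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    parts = urlsplit(resource_url)
    # Drop a trailing slash so we get ".../id.rdf" rather than ".../id/.rdf".
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme, parts.netloc, f"{path}.{fmt}", parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # Placeholder URL, not a real artist page.
    artist = "https://example.org/music/artists/some-artist-id"
    for fmt in FORMATS:
        print(representation_url(artist, fmt))
```

The same helper would apply to a brand, series, episode or version URL, since the post describes the same suffix convention for those.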
Nick, Michael and I have previously spoken about our plans for linking programmes, music, events, topics and users. Well this is our first foray into this world. Information about programmes and music is interesting, it’s useful; but it’s not as interesting nor as useful as when the two are intelligently linked. Joining the two worlds means that you can aggregate information about the programmes that have played an artist [as we’ve done], you can put track listings on episode pages, you can have charts of which artists are played most on all BBC Radio programmes, on Radio 1, by Zane Lowe. You can also aggregate all episodes that you can currently listen to that feature a given artist. Or show which programme first played a given artist and how often the BBC has played them since. Basically the interesting stuff happens at the joins between the nodes because that’s where the context lives.
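To make the "most played" idea concrete, here is a small Python sketch of how a chart could be computed once each play is recorded against an artist identifier, a station and a show. The `Play` record, the `chart` function and all of the sample data are invented for illustration; this is not how the site itself stores or queries its data.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Play(NamedTuple):
    """One track play, already matched to an artist (hypothetical shape)."""
    artist_id: str  # stands in for a Musicbrainz identifier
    station: str
    show: str


def chart(
    plays: Iterable[Play],
    station: Optional[str] = None,
    show: Optional[str] = None,
    top: int = 10,
) -> List[Tuple[str, int]]:
    """Count plays per artist, optionally narrowed to a station and/or show."""
    counts = Counter(
        p.artist_id
        for p in plays
        if (station is None or p.station == station)
        and (show is None or p.show == show)
    )
    return counts.most_common(top)


# Made-up sample data, for illustration only.
PLAYS = [
    Play("artist-a", "radio1", "zane-lowe"),
    Play("artist-a", "radio1", "annie-mac"),
    Play("artist-b", "radio1", "zane-lowe"),
    Play("artist-a", "6music", "steve-lamacq"),
    Play("artist-b", "radio1", "zane-lowe"),
]

if __name__ == "__main__":
    print(chart(PLAYS))                    # across all stations
    print(chart(PLAYS, station="radio1"))  # one station
    print(chart(PLAYS, show="zane-lowe"))  # one show
```

The point of the sketch is that all three charts fall out of the same joined records; only the filter changes.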
By exposing the information that is created by joining programmes and music we can provide context and serendipity. We can help you find out about the music you’ve just listened to and introduce you to new shows that also feature the music you like. So that’s what we’re working on.
We also need to provide more data about each artist – from both inside and outside the BBC. That means bringing the album reviews into the fold, hooking up external news feeds and the like. But whatever we do it’s worth bearing in mind that this is on a much larger scale than the current music site. This new site has in the order of 388,398 artist pages, 157,677 external links and 93,912 artist to artist relationships.
It would be great to hear what you think about what we’ve done and our plans for the future. You are welcome to leave a comment here, or via the Backstage mailing list.
If you head over to bbc.co.uk/radio/aod/availability/:network.xml then you will get an XML file (updated every 3 hours) with details about what’s available to listen to now and in the next 48 hours. So for example:
Will give you data about Radio 1 [obviously]. The file contains a bunch of metadata about the episodes including details of the stream URLs.
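For anyone scripting against this, here is a minimal Python sketch of filling in the :network part of the availability URL and of deciding when a cached copy is worth refreshing, given the three-hourly update. The http:// scheme, the network key "radio1" and both function names are assumptions made for the example; the post gives only the URL pattern and the update interval.

```python
from datetime import datetime, timedelta

# URL pattern from the post; the "http://" scheme is an assumption.
AVAILABILITY_TEMPLATE = "http://bbc.co.uk/radio/aod/availability/{network}.xml"

# The post says the file is updated every 3 hours.
UPDATE_INTERVAL = timedelta(hours=3)


def availability_url(network: str) -> str:
    """Fill in the :network placeholder (the network keys are not listed here)."""
    if not network.isalnum():
        raise ValueError(f"unexpected network key: {network!r}")
    return AVAILABILITY_TEMPLATE.format(network=network)


def is_stale(fetched_at: datetime, now: datetime) -> bool:
    """True once a cached copy is at least one update interval old."""
    return now - fetched_at >= UPDATE_INTERVAL


if __name__ == "__main__":
    # "radio1" is a guess at the key format, used purely as an example.
    print(availability_url("radio1"))
```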
What you will notice is that we’re not pointing you directly at the URLs for the audio; instead we’re directing you to our ‘Media Selector’, which we use to maintain the availability window. So if you follow the media selector link you will get back a lump of XML with details of the available media. By the way, you’ll want to use the media where /mediaSelection/media/@encoding = real. Ignore the MP3; that’s a ‘secure’ stream used in iPlayer.
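Here is a minimal Python sketch of picking out the media elements whose encoding is real from a media selector response. The sample XML is invented: only the mediaSelection and media element names and the encoding attribute come from the path quoted above, while the href attribute, the example URLs and the absence of XML namespaces are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Invented sample response; the real document may look quite different.
SAMPLE = """<mediaSelection>
  <media encoding="real" href="rtsp://example.invalid/stream.ra"/>
  <media encoding="mp3" href="http://example.invalid/secure.mp3"/>
</mediaSelection>"""


def real_media(xml_text: str) -> List[Dict[str, str]]:
    """Return the attributes of every <media> element with encoding="real"."""
    root = ET.fromstring(xml_text)
    if root.tag != "mediaSelection":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    # Equivalent to /mediaSelection/media[@encoding='real'] from the root.
    return [dict(m.attrib) for m in root.findall("media[@encoding='real']")]


if __name__ == "__main__":
    for media in real_media(SAMPLE):
        print(media)
```

If the real response declares a namespace, the element names passed to `findall` would need to be qualified accordingly.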
Programme schedules as XML, YAML, JSON and Text
Duncan has already written about his work to implement iCal views on the Radio Labs blog:
We’re busy doing a load of work on the music site right now which will be launching really soon. When we do, in addition to lots of HTML we’ll also be making the data available for machines – including RDF. But if you are into this sort of thing here’s a sneak peek at what will be released:
We would really welcome your feedback on any of this.
And finally a bit of URL hackery…
Our decision to use opaque IDs for our programmes [episodes, series and programme brands] means that we can provide persistent URLs – which is a good thing. The downside is that you can’t guess the URL. To fix this you can now enter URLs like this:
I’ve also written a post over at the BBC’s Radio Labs blog about the machine-readable serializations used to represent the concepts described within the ontology.
We have been following the Linked Data approach – namely thinking of URIs as more than just locations for documents. Instead using them to identify anything, from a particular person to a particular programme. These resources in-turn have representations, which can be machine-processable (through the use of RDF, Microformats, RDFa, etc.), and these representations can hold links towards further web resources, allowing agents to jump from one dataset to another.