Interesting BBC data to hack with

If you are interested in our ongoing drive to make our data available in interesting and useful ways, you might like to look at Nick, Patrick, Duncan and Sean’s recent work.

XML views of radio AOD availability

If you head over to bbc.co.uk/radio/aod/availability/:network.xml then you will get an XML file (updated every 3 hours) with details about what’s available to listen to now and in the next 48 hours. So for example:

http://www.bbc.co.uk/radio/aod/availability/radio1.xml

This will give you the data for Radio 1 [obviously]. The file contains a bunch of metadata about the episodes, including details of the stream URLs.

What you will notice is that we’re not pointing you directly at the URLs for the audio; instead we’re directing you to our ‘Media Selector’, which we use to maintain the availability window. So if you follow the media selector link you will get back a lump of XML with details of the available media. By the way, you’ll want to use the entry where /mediaSelection/media/@encoding = real. Ignore the MP3; that’s a ‘secure’ stream used in iPlayer.
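
As a rough sketch of those two steps in Python (standard library only): build the feed URL for a network, then pick the RealAudio entry out of a media selector response. The sample XML is a hypothetical stand-in, because the real response isn't reproduced in this post; only the /mediaSelection/media/@encoding path comes from the description above, and the `connection` element and `href` attribute are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

AOD_FEED = "http://www.bbc.co.uk/radio/aod/availability/{network}.xml"


def aod_feed_url(network):
    """Return the availability feed URL for a network such as 'radio1'."""
    return AOD_FEED.format(network=network)


def local_name(tag):
    """Strip any '{namespace}' prefix ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def real_media(media_selection_xml):
    """Return the <media> children whose encoding attribute is 'real'."""
    root = ET.fromstring(media_selection_xml)
    return [
        child for child in root
        if local_name(child.tag) == "media" and child.get("encoding") == "real"
    ]


# Hypothetical response: everything except mediaSelection/media/@encoding
# is invented for this example.
SAMPLE = """<mediaSelection>
  <media encoding="mp3"><connection href="http://example.invalid/secure"/></media>
  <media encoding="real"><connection href="http://example.invalid/stream.ram"/></media>
</mediaSelection>"""

print(aod_feed_url("radio1"))
for media in real_media(SAMPLE):
    print(media.get("encoding"), media[0].get("href"))
```

Matching on the local tag name is a deliberate choice here: it keeps the sketch working whether or not the real document declares a default namespace.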

Programme schedules as XML, YAML, JSON and Text

Duncan has already written about his work to implement iCal views on the Radio Labs blog:

iCalendar is a standard for calendar data exchange. It is most notably used by Apple’s iCal application, Microsoft’s Outlook and Google Calendar, to import and export calendar information. We thought that some of the views in /programmes would also work well in the iCal format, so we have exposed a few for you to subscribe to, and play with.

These are available for your regular schedule and for genres [upcoming cricket programmes for example] and for individual programmes. For example:

Upcoming episodes of Eastenders
webcal://bbc.co.uk/programmes/b006m86d/episodes/upcoming.ics
Upcoming (New) episodes of Eastenders
webcal://bbc.co.uk/programmes/b006m86d/episodes/upcoming/debut.ics
Episodes of Eastenders available to watch again
webcal://bbc.co.uk/programmes/b006m86d/episodes/player.ics
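
Calendar applications handle webcal:// links themselves, but a script will usually want a plain HTTP URL. A minimal sketch, assuming (as is conventional for webcal) that the same path is served over HTTP; the helper names are hypothetical and the URL patterns are simply the three listed above.

```python
from urllib.parse import urlsplit

# URL patterns copied from the examples above; 'pid' is the programme ID.
ICAL_VIEWS = {
    "upcoming": "webcal://bbc.co.uk/programmes/{pid}/episodes/upcoming.ics",
    "debut": "webcal://bbc.co.uk/programmes/{pid}/episodes/upcoming/debut.ics",
    "player": "webcal://bbc.co.uk/programmes/{pid}/episodes/player.ics",
}


def ical_url(pid, view="upcoming"):
    """Build the webcal URL for one of the views listed above."""
    return ICAL_VIEWS[view].format(pid=pid)


def webcal_to_http(url):
    """Swap a webcal:// scheme for http://; leave other URLs untouched."""
    parts = urlsplit(url)
    if parts.scheme != "webcal":
        return url
    return parts._replace(scheme="http").geturl()


print(webcal_to_http(ical_url("b006m86d")))
```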

In addition to iCal you can also get this data as plain text, XML, JSON or YAML. So the upcoming drama programmes as XML can be found here:

http://www.bbc.co.uk/programmes/genres/drama/schedules/upcoming.xml

Or the Radio 1 schedule as plain text:

http://www.bbc.co.uk/radio1/programmes/schedules.txt
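
If you want to switch between representations from a script, the two URL shapes above can be wrapped up along these lines. This is only a sketch: the function names are hypothetical, and the `.json` and `.yaml` extensions are assumptions (only `.xml` and `.txt` appear in the examples above).

```python
# File extension for each representation. 'json' and 'yaml' are assumed;
# 'xml' and 'txt' are taken from the example URLs above.
EXTENSIONS = {"text": "txt", "xml": "xml", "json": "json", "yaml": "yaml"}


def genre_upcoming_url(genre, fmt="xml"):
    """Upcoming programmes for a genre, e.g. genre_upcoming_url('drama')."""
    return "http://www.bbc.co.uk/programmes/genres/{}/schedules/upcoming.{}".format(
        genre, EXTENSIONS[fmt]
    )


def network_schedule_url(network, fmt="text"):
    """Schedule for a network, e.g. network_schedule_url('radio1')."""
    return "http://www.bbc.co.uk/{}/programmes/schedules.{}".format(
        network, EXTENSIONS[fmt]
    )


print(genre_upcoming_url("drama"))
print(network_schedule_url("radio1"))
```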

Artist pages as RDF

We’re busy doing a load of work on the music site right now, which will be launching really soon. When it does, in addition to lots of HTML we’ll also be making the data available for machines, including as RDF. But if you are into this sort of thing, here’s a sneak peek at what will be released:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf      = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs     = "http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf     = "http://xmlns.com/foaf/0.1/"
         xmlns:mo       = "http://purl.org/ontology/mo/"
         xmlns:mf       = "http://purl.org/ontology/mo/mf#"
         xmlns:owl      = "http://www.w3.org/2002/07/owl#"
         xmlns:time     = "http://www.w3.org/2006/time#"
         xmlns:dc       = "http://purl.org/dc/elements/1.1/"
         xmlns:timeline = "http://purl.org/NET/c4dm/timeline.owl#"
         xmlns:event    = "http://purl.org/NET/c4dm/event.owl#">

<rdf:Description rdf:about="/music/artists/cc197bad-dc9c-440d-a5b5-d52ba2e14234.rdf">
  <rdfs:label>Description of the artist Coldplay</rdfs:label>
</rdf:Description>

<mo:MusicGroup rdf:about="/music/artists/cc197bad-dc9c-440d-a5b5-d52ba2e14234#artist">
  <foaf:name>Coldplay</foaf:name>

  <mo:image rdf:resource="/music/images/artists/7col_in/cc197bad-dc9c-440d-a5b5-d52ba2e14234.jpg" />

  <mo:musicbrainz rdf:resource="http://musicbrainz.org/artist/cc197bad-dc9c-440d-a5b5-d52ba2e14234" />
  <mo:homepage rdf:resource="http://www.coldplay.com/" />
  <mo:fanpage rdf:resource="http://www.pleasureunit.com/coldplay/index.php" />
  <mo:wikipedia rdf:resource="http://en.wikipedia.org/wiki/Coldplay" />
  <mo:imdb rdf:resource="http://www.imdb.com/name/nm1095892/" />
  <mo:myspace rdf:resource="http://www.myspace.com/coldplay" />

  <mo:member rdf:resource="/music/artists/18690715-59fa-4e4d-bcf3-8025cf1c23e0#artist" />
  <mo:member rdf:resource="/music/artists/d156ceb2-fd90-4e82-baea-829bbdf1c127#artist" />
  <mo:member rdf:resource="/music/artists/6953c4db-7214-4724-a140-e87550bde420#artist" />
  <mo:member rdf:resource="/music/artists/98d1ec5a-dd97-4c0b-9c83-7928aac89bca#artist" />

</mo:MusicGroup>

<mo:SoloMusicArtist rdf:about="/music/artists/18690715-59fa-4e4d-bcf3-8025cf1c23e0#artist">
  <foaf:name>Guy Berryman</foaf:name>
</mo:SoloMusicArtist>
<mo:SoloMusicArtist rdf:about="/music/artists/d156ceb2-fd90-4e82-baea-829bbdf1c127#artist">
  <foaf:name>Jon Buckland</foaf:name>
</mo:SoloMusicArtist>
<mo:SoloMusicArtist rdf:about="/music/artists/6953c4db-7214-4724-a140-e87550bde420#artist">
  <foaf:name>Will Champion</foaf:name>
</mo:SoloMusicArtist>
<mo:SoloMusicArtist rdf:about="/music/artists/98d1ec5a-dd97-4c0b-9c83-7928aac89bca#artist">
  <foaf:name>Chris Martin</foaf:name>
</mo:SoloMusicArtist>

</rdf:RDF>
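
To give a feel for how this might be consumed, here is a small Python sketch that reads a trimmed excerpt of the document above and resolves each mo:member link to the member's name. It treats the file as plain XML using the standard library, which is an assumption that only holds while the serialisation keeps this shape; a proper RDF library would be the more robust choice.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "mo": "http://purl.org/ontology/mo/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Trimmed excerpt of the document above (two of the four members).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:mo="http://purl.org/ontology/mo/">
<mo:MusicGroup rdf:about="/music/artists/cc197bad-dc9c-440d-a5b5-d52ba2e14234#artist">
  <foaf:name>Coldplay</foaf:name>
  <mo:member rdf:resource="/music/artists/18690715-59fa-4e4d-bcf3-8025cf1c23e0#artist" />
  <mo:member rdf:resource="/music/artists/98d1ec5a-dd97-4c0b-9c83-7928aac89bca#artist" />
</mo:MusicGroup>
<mo:SoloMusicArtist rdf:about="/music/artists/18690715-59fa-4e4d-bcf3-8025cf1c23e0#artist">
  <foaf:name>Guy Berryman</foaf:name>
</mo:SoloMusicArtist>
<mo:SoloMusicArtist rdf:about="/music/artists/98d1ec5a-dd97-4c0b-9c83-7928aac89bca#artist">
  <foaf:name>Chris Martin</foaf:name>
</mo:SoloMusicArtist>
</rdf:RDF>"""


def group_members(rdf_xml):
    """Map each mo:MusicGroup name to the names of its mo:member artists."""
    root = ET.fromstring(rdf_xml)
    # Index every described solo artist by its rdf:about identifier.
    names = {
        artist.get(RDF_ABOUT): artist.findtext("foaf:name", namespaces=NS)
        for artist in root.findall("mo:SoloMusicArtist", NS)
    }
    result = {}
    for group in root.findall("mo:MusicGroup", NS):
        # Fall back to the raw resource URI if the member is not described.
        members = [
            names.get(m.get(RDF_RESOURCE), m.get(RDF_RESOURCE))
            for m in group.findall("mo:member", NS)
        ]
        result[group.findtext("foaf:name", namespaces=NS)] = members
    return result


print(group_members(SAMPLE))
```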

We would really welcome your feedback on any of this.

And finally a bit of URL hackery…

Our decision to use opaque IDs for our programmes [episodes, series and programme brands] means that we can provide persistent URLs, which is a good thing. The downside is that you can’t guess the URL. To fix this you can now enter URLs like this:

www.bbc.co.uk/programmes/eastenders and you will be redirected to www.bbc.co.uk/programmes/a-z/by/eastenders/all

That’s a disambiguation page for all programmes with Eastenders in the title. If, however, there’s just one programme with that title, for example www.bbc.co.uk/programmes/chrismoyles, then you will be redirected to that brand page (www.bbc.co.uk/programmes/b006wkqb).
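
If you are following these redirects from a script, you will want to know which of the two outcomes you landed on. A minimal Python sketch, assuming the two URL shapes shown above; the ID pattern is a guess based only on the IDs quoted in this post (such as b006wkqb), so an eight-character title starting with "b" would be misread as an ID.

```python
import re
from urllib.parse import urlsplit

# Heuristic inferred from the IDs quoted in this post; the real ID format
# may well be broader than 'b' followed by seven letters or digits.
PID = re.compile(r"/programmes/(b[0-9a-z]{7})/?$")
DISAMBIGUATION = re.compile(r"/programmes/a-z/by/([^/]+)/all/?$")


def classify_programmes_url(url):
    """Return ('brand', pid), ('disambiguation', title) or ('unknown', None)."""
    path = urlsplit(url).path
    match = DISAMBIGUATION.search(path)
    if match:
        return ("disambiguation", match.group(1))
    match = PID.search(path)
    if match:
        return ("brand", match.group(1))
    return ("unknown", None)


print(classify_programmes_url("http://www.bbc.co.uk/programmes/a-z/by/eastenders/all"))
print(classify_programmes_url("http://www.bbc.co.uk/programmes/b006wkqb"))
```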

Photo: Data storage - old and new, by Ian-s. Used under licence.

23 responses to “Interesting BBC data to hack with”

  1. XML views (http://www.bbc.co.uk/radio/aod/availability/radio1.xml) are great. However, are you considering adding feeds for some of the old Radio Player categories (eg “Comedy”, “Jazz” etc)?

  2. And, of course, there’s the second question: Is there a similar feed for the TV channels?

  3. That is great.
    Is there another XML that lists the available networks (to avoid hard-coding network names)?

  4. I tried emailing aodfeed@bbc.co.uk as given in the xml radio1 feed but it bounces.

    Could you post more details about the feeds or an email address where this may be obtained?

  5. […] Scott has some BBC data to hack with. Image from Ian-S on […]

  6. @rtsh : right now we’re only making these available for radio and only grouped by station. This is primarily because with the switch over to iPlayer we needed to ensure WiFi radio can continue to easily access our data. However, I’m keen to hear about how folk want to use / are using these so we can make subsequent releases that help meet your needs. Obviously that doesn’t mean we can do everything everyone wants but it does mean we’ll try to do the right thing.

    @Paul Webster – no there isn’t – but that’s a good idea.

    @Triode – sorry about that. The email should have been up and running. We will post more info – on the Radio Labs blog (bbc.co.uk/blogs/radiolabs/) shortly.

  7. @Tom: Ideally I’d like to see the radio ones grouped by genre as well. I’d like to see similar menu structures to the old aod pages if possible as without this page scraping is always going to give richer results. [ideally one genre per feed as this allows me to bookmark them in my application]

    I’ve found that several of the episodes only return the flash stream or possibly no streams at all. Is this work in progress?

    Anyway thanks for making them available – working well for me and much easier than web scraping…

  8. @Tom: I’ve discovered I can get the by genre details I am looking for in html – e.g. http://www.bbc.co.uk/radio/programmes/genres/entertainmentandcomedy/spoof/player

    Is there any chance all of these pages could be made available in machine readable format (xml or json)?

  9. @Tom – yes that would probably work well. Am I ok to assume that I can always get the media stream from:
    http://www.bbc.co.uk/mediaselector/4/mtis/stream/ of the episode?

    (that’s the downside of using these feeds, it does not provide a definitive link to the mediaselector)

    Does the schedule having -7 days mean it always gets all of the episodes currently available?

  10. @Tom – actually that doesn’t seem to work as I can’t take the pid and form a url for mediaSelector which returns the stream url. This may just be a bug in media selector as I can get the iPlayer page for a given pid which contains the stream (but it would be nice to avoid using this)

    [are you able to discuss this via email?]

  11. Also http://www.bbc.co.uk/radio/programmes/genres/childrens/entertainmentandcomedy/schedules.json only gives 1 day’s worth of schedule – is there a way to extend this for the previous 7 days (i.e. what is available to stream)?

  12. Looks like the week-end gremlins have crept in. The AOD XML listings have no significant content.
    e.g. BBC Radio 4

    BBC Audio On Demand Availability Schedule
    Usage of this feed does not authorise you to use items of BBC copyright or trademarks (eg. the BBC Logo, BBC Radio brands).
    Please email aodfeed@bbc.co.uk for full details – graphical assets and correct naming techniques can be provided once you agree to our brand usage and linking terms.
    aodfeed@bbc.co.uk
    British Broadcasting Corporation [c] 2008

  13. ah – the blog tool has removed all of the special XML … anyway – you should get the idea.

  14. @Paul Webster : Sorry about that – bug is in the process of being squashed.

  15. @Paul Webster – just to let you know we’ve fixed the bug. Everything should be working normally again.

  16. Thanks – looks like BBC Radio 4 has a bit more in it compared to last week so maybe some other things have been fixed in the background (e.g. Broadcasting House RealAudio now appears)

  17. […] Interesting BBC data to hack with [Derivadow.com]…including the first public release of at least one of these things. Nice. […]

  18. Things are broken again.
    The BBC Radio XML has not been updated since 8th August (now 14th).
    For example – from Radio 4 …
    schedule start_date="2008-08-08T12:00:14Z" updated="2008-08-08T12:01:06Z" end_date="2008-08-10T12:00:14Z" network="radio4"

    Can you put something in the process that checks and reports when things do not get published?

  19. ah – I see it has now been updated – timestamps show about 30 minutes after I posted the comment above.

  20. Links to RAM (Real Audio) material seem to have gone missing.
    Example from BBC Radio 4 XML from lunchtime today.
    Today’s Farming Today (appropriate for RAM) does not have one
    http://www.bbc.co.uk/mediaselector/4/mtis/stream/b00d6h62

    Also – can you post a link to a good place to report such issues?

  21. Re: And finally a bit of URL hackery…
    I noticed this with e.g. looseends by hacking around and then found your blog …

    For looseends
    http://www.bbc.co.uk/programmes/looseends
    does indeed redirect to the Brand Page
    http://www.bbc.co.uk/programmes/b006qjym

    But it falls down for some programmes so that this great trick cannot be generally applied e.g.
    http://www.bbc.co.uk/programmes/starttheweek unfortunately goes to
    http://www.bbc.co.uk/programmes/a-z/by/starttheweek/all
    which shows 3 entries. The last 2 entries are references to single programmes – i.e. they shouldn’t really be there – is this data likely to be cleared up in future?

