Apis and APIs: a wildlife ontology

By a mile, the highlight of the last week or so was the 2nd Linked Data meet-up. Silver and Georgi did a great job of organising the day, and I came away with a real sense that not only are we on the cusp of seeing a lot of data on the web, but also that the UK is at the centre of this particular revolution. All very exciting.

For my part I presented the work we’ve been doing on Wildlife Finder – how we’re starting to publish and consume data on the web. Ed Summers has a great write-up of what we’re doing, and I’ve also published my slides here:

I also joined Paul Miller, Jeni Tennison, Ian Davis and Timo Hannay on a panel session discussing Linked Data in the enterprise.

In terms of Wildlife Finder there are a few things that I wanted to highlight:

  1. If you’re interested in the RDF and how we’re modelling the data, we’ve documented the wildlife ontology here. In addition to the ontology itself we’ve also included some background on why we modelled the information in the way we have.
  2. If you want to get your hands on the RDF/XML then either add .rdf to the end of most of our URLs (more on this later) or configure your client to request RDF/XML – we’ve implemented content negotiation so you’ll just get the data.
  3. But… we’ve not implemented everything just yet. Specifically, the adaptations aren’t published as RDF – this is because we’re making a few changes to the structure of this information and I didn’t want to publish the data and then change it. Nor have we published information on the species conservation status – that’s simply because we’ve not finished yet (sorry).
  4. It’s not all RDF – we are also marking up our taxa pages with the species microformat, which gives more structure to the common and scientific names.
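To make point 2 concrete, here’s a minimal Python sketch of the two ways to get at the RDF/XML. The species URL is an assumed example rather than a documented endpoint; the Accept value is the standard RDF/XML media type.

```python
import urllib.request

# Assumed example URL -- substitute any Wildlife Finder resource URI
page = "http://www.bbc.co.uk/nature/species/Lion"

# Option 1: append .rdf to the URL to ask for RDF/XML directly
rdf_url = page + ".rdf"

# Option 2: content negotiation -- request the original URI but tell
# the server we want RDF/XML via the Accept header
req = urllib.request.Request(page, headers={"Accept": "application/rdf+xml"})

# Uncomment to actually fetch the data:
# data = urllib.request.urlopen(req).read()

print(rdf_url)                   # the .rdf variant of the URL
print(req.get_header("Accept"))  # application/rdf+xml
```

Either route should land you on the same data; content negotiation just keeps one URI per thing, with the format chosen by the client.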

Anyway I hope you find this useful.

Opening up the BBC’s natural history archive

The BBC’s Natural History Unit is responsible for some of the BBC’s most loved TV and radio programming — unfortunately until now it’s only been accessible as part of the regular schedule or via iPlayer. I say until now because today we launched the first phase of a new project which brings clips from the best of the NHU’s programmes online.

URIs for habitats, taxa and adaptations

Over the last few months we’ve been plundering the NHU’s archive to find the best bits — segmenting the TV programmes, tagging them (with DBpedia terms) and then aggregating them around URIs for the key concepts within the natural history domain; so that you can discover those programme segments via both the originating programme and via concepts within the natural history domain — species, habitats, adaptations and the like.

The segments/clips ‘belong’ to their originating programme — and as a result we’ve been adding information about a bunch of programmes from the archive to PIPs (the underlying database behind iPlayer and /programmes). The clip pages aren’t yet linked in with their owning episode, but they will be soon.

In addition to being able to discover these clips from within the context of the programme we are also providing URIs to aggregate information around the natural history domain, that is URIs for species, habitats, adaptations and ecozones.

URIs for species such as the Bush Elephant

Our hope is that by providing highly inter-linked URIs we can help people gain a greater understanding of the natural world. For example, by being able to see the different animals and habitats that live within different ecozones you can gain an understanding of the diversity of life in different parts of the world; or what different animals make up the Mammal or Bird class; or more about a particular adaptation.

Ovoviviparous - what it is, what animals do it and BBC archived content about it

Of course we are doing more than providing access to programme segments: we have also plundered our sound archive so you can hear what the different habitats and species sound like (and obviously those sounds are separately addressable), and we are aggregating content from the other ‘BBC Earth’ projects (Earth News and Out of the Wild) and elsewhere on the web.

It’s not just about BBC content.

You might have noticed that the slugs for our URIs (the last bit of the URL) are the same as those used by Wikipedia and DBpedia. That’s because I believe in the simple joy of webscale identifiers. You will also see that, much like the BBC’s music site, we are transcluding the introductory text from Wikipedia to provide background information for most things. This also means that we are creating and editing Wikipedia articles where they need improving (of course you are also more than welcome to improve upon the articles).
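A quick sketch of what shared, webscale slugs buy you. The slug value and the BBC path below are assumed examples for illustration, not exact URL schemes:

```python
# One slug, shared across sites, identifies the same concept everywhere.
# The slug and the BBC path are assumed examples for illustration.
slug = "African_Bush_Elephant"

wikipedia_url = "http://en.wikipedia.org/wiki/" + slug
dbpedia_uri = "http://dbpedia.org/resource/" + slug
bbc_url = "http://www.bbc.co.uk/nature/species/" + slug  # illustrative path

# Because the slugs agree, mapping between the sites is pure string
# manipulation -- no lookup table or reconciliation service required.
print(wikipedia_url)
print(dbpedia_uri)
```

That trivial mapping is the point: anyone who knows the Wikipedia slug for a thing can guess its DBpedia URI, and vice versa.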

We are also publishing data from a bunch of other organisations. Information about habitats, ecozones and species distribution is provided by WWF’s Wildfinder; the species conservation status comes from IUCN’s Red List of Threatened Species; and (where available) information about why a species is at threat comes from the Zoological Society of London’s EDGE of Existence programme. Finally, information about a species’ adaptations and behaviours is provided by Animal Diversity Web.

Adopting this approach means that we are able to contribute distinctive content to the Web while at the same time helping people find what is already there.

There is a lot more we need to do, including linking in with current programmes and making everything available as RDF, JSON and for mobile devices. That’s all on its way, but in the meantime I hope you find what’s there useful, informative and entertaining.

Humanity Connected

Digital Revolution, a new BBC TV programme, was launched last Friday. Due to be broadcast next year, the programme will look back over the first 20 years of the web and consider what the future might hold. It will examine how the web has changed society and the implications for things like security, privacy and the economy.

Tim Berners-Lee. Photograph by Documentally, some rights reserved.

Unlike — well, probably every other TV programme I’ve ever come across — each programme will be influenced and debated on the web during its production. Some of the rushes and interviews will be made available on the web (under permissive terms) so that anyone can contribute to the debate, helping to shape the final programme.

To kick all this off the BBC hosted a debate chaired by Aleks Krotoski with Tim Berners-Lee, Bill Thompson, Susan Greenfield and Chris Anderson. The audience was almost as impressive as the folks up on stage – a great mix of geeks and journalists – and luckily I managed to wangle an invite (probably because I’ve had a tiny, tiny role on the project).

Anyway… the presentations were very cool, and while I tweeted the best bits on the day I thought I would write up a short post summing it all up. You know, contributing to the debate and all that.

The thing that struck me most was the discussion around the way in which the web has provided a platform for creativity, and the risks to its future because of governments’ failure to understand it (OK, the failure to understand it is my interpretation, not the view expressed by the speakers).

I’ve written previously about how the web’s generative nature has helped enable an eruption of creativity, spawning a new economy in its wake; and how governments have failed to grasp that it’s the people that use the medium that need policing, not the medium itself. But as you might expect from such an illustrious bunch of people, the panel managed to nail the point much better than I ever could.

To misquote TimBL: the web should be like paper. Government should be able to prosecute if you misuse it, but it shouldn’t limit what you are able to do with it. When you buy paper you aren’t limited in what can be written or drawn on it, and likewise the Internet shouldn’t be set up in such a way as to constrain its use.

The reason this is important is that it helps to preserve the web’s generative nature. TimBL pointed out that people are creative; they simply need a platform for that creativity. If that platform is to be the Web then it needs to support everyone – anyone should be able to express that creativity – and that means it needs to be open.

As an aside, there was a discussion as to whether or not access to the Internet is a ‘human right’. I’m not sure whether it is or not, but it’s worth considering whether, if everyone had access to the Web, it could be used to solve problems in the developing world. For example, by allowing communities to share information on how to dig wells and maintain irrigation systems, information on health care, and generally providing educational material. It is very easy for us in the West to think of the Web as synonymous with the content and services currently provided on it, and to ask whether they would be useful in developing countries. But the real question should be: if anyone, anywhere in the world were able to create and share information, what would they do with it? My hope would be that the services offered would reflect local needs – whether that be social networking in US colleges or water purification in East Africa.

Of course, being open and free for all to use doesn’t mean that everything on the web will be wonderful, or indeed legal; no more than paper ensures wonderful prose because it is open. Or as TimBL puts it:

Just because you can read everything out there doesn’t mean you should. If you found a piece of paper blowing in the wind you wouldn’t expect it to be edifying.

But what does open mean?

Personally I think that an open web is one that seeks to preserve its generative nature. But the discussion last Friday also focused on the implications for privacy and snooping.

Governments the world over, including, to our shame, the current UK Government, are seeking to limit the openness of the web; that is, rather than addressing the specific activities that happen on the web, they are seeking to limit the very platform itself. ISPs around the world, at the behest of governments, are being asked to track and record what you do on the web – everything you do on the web. Elsewhere, content is being filtered, traffic shaped and sites blocked.

The sorts of information being collected can include your search terms (pinned to your IP address) and the sites you visit. For sure, this might sometimes include a bunch of URIs that point to illegal and nefarious activity, but it might also include (indeed it’s more likely to include) URIs relating to a medical condition or legal advice, or a hundred and one other perfectly legal but equally personal bits of information.

Should a government, its agencies or an ISP be able to capture, store and analyse this data? Personally I think not. And should you think that I’m just being a scaremonger, have a read of Bill’s post “The digital age of rights” about the French government’s HADOPI legislation.

On the day, Bill Thompson (who, by the way, was on blinding form) summed up the reason why when he described his hopes for the web thus:

I hoped that the web would help us know our neighbours better, so that we didn’t go and kill them. That hasn’t happened but it does now mean it’s much harder to get away with it – the world will now know if you do kill them.

Governments know this, which is why some now try to lock down access to the Internet when there is civil unrest in their country. And it is also why the rest of the web tries to help them break through.

Few Western governments would condone the activities of such totalitarian states. But it is interesting to consider whether Western governments would support North Korea or Iran setting up the kinds of databases currently being debated in Europe and the States. Now they might point out that the comparison isn’t a fair one, since they are nice, democratic governments, not nasty oppressive ones. But isn’t that painfully myopic? How do they know who will be in power in the future? How do they know how future governments might seek to use the information they are gathering now?

Snooping aside, there is another reason why the web should remain open, and it is the reason why it’s important to fight for One Web.

Susan Greenfield quite rightly pointed out that ‘knowledge is to be found by creating context, links between facts; it’s the context that counts’. Although she was making the point in an attempt to take a swipe at the Web, trying to suggest that the web is no more than a collection of facts devoid of context, it seems to me that the web is in fact the ultimate context machine. (One sometimes wonders whether she has ever actually used any of the services she complains about; indeed, I wonder if she uses the web at all.)

The web is, as the name suggests, a set of interconnected links. Those URIs and the links between them, as TimBL reminded us, are made by people and followed by people, and as such you can legitimately think of the Web as humanity connected.

URIs are incredibly powerful, particularly when they are used to identify things in addition to documents. When they are used to identify things (dereferencing to the appropriate data or document format) they can lead to entirely new ways to access information. An example highlighted by TimBL is the impact they might have on TV channels and schedules.

He suggested that the concept of a TV channel was limited and that it would be replaced with complete random access. When anyone, anywhere in the world, can follow a URI to a persistent resource (note he didn’t say click on a link), the TV channel as a means of discovery and recommendation will be replaced with a trust network. “My friends have watched this, most of them like it…” sort of thing.

Of course to get there we need to change the way we think about the web and the way in which we publish things. And here TimBL pointed to the history of the web, suggesting that the next digital revolution will operate in a similar fashion.

The web originally happened not because senior management thought it was a good idea – it happened because people who ‘got it’ thought it was cool, that it was the right thing, and they were lucky enough to have managers that didn’t get in the way. Indeed this is exactly what happened when TimBL wrote the first web server and client, and when the early web pioneers started publishing web pages. They didn’t do it because they were told to, and they didn’t do it because there was any immediate benefit. They did it because they thought that doing it would enable cool things to happen. The last couple of years suggest that we are on the cusp of a similar revolution, as people start to publish linked data, which will in turn result in a new digital revolution.