Apis, APIs and a wildlife ontology

By a mile the highlight of the last week or so was the 2nd Linked Data meet-up. Silver and Georgi did a great job of organising the day, and I came away with a real sense that not only are we on the cusp of seeing a lot of data on the web, but also that the UK is at the centre of this particular revolution. All very exciting.

For my part I presented the work we’ve been doing on Wildlife Finder – how we’re starting to publish and consume data on the web. Ed Summers has a great write-up of what we’re doing, and I’ve also published my slides here:

I also joined Paul Miller, Jeni Tennison, Ian Davis and Timo Hannay on a panel session discussing Linked Data in the enterprise.

In terms of Wildlife Finder there are a few things that I wanted to highlight:

  1. If you’re interested in the RDF and how we’re modelling the data we’ve documented the wildlife ontology here. In addition to the ontology itself we’ve also included some background on why we modelled the information in the way we have.
  2. If you want to get your hands on the RDF/XML then either add .rdf to the end of most of our URLs (more on this later) or configure your client to request RDF/XML – we’ve implemented content negotiation so you’ll just get the data (see the sketch after this list).
  3. But… we’ve not implemented everything just yet. Specifically, the adaptations aren’t published as RDF – this is because we’re making a few changes to the structure of this information and I didn’t want to publish the data and then change it. Nor have we published information on species conservation status – that’s simply because we’ve not finished it yet (sorry).
  4. It’s not all RDF – we are also marking up our taxa pages with the species microformat, which gives more structure to the common and scientific names.
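
Here’s that sketch – a minimal example using Python’s requests library. The URL is illustrative rather than a real Wildlife Finder address, so substitute whichever page you’re actually interested in.

    import requests

    # An illustrative Wildlife Finder-style URL - substitute a real page
    url = "http://www.bbc.co.uk/nature/species/Lion"

    # Option 1: content negotiation - ask for RDF/XML explicitly
    rdf = requests.get(url, headers={"Accept": "application/rdf+xml"})
    print(rdf.headers.get("Content-Type"))

    # Option 2: append .rdf to the URL
    rdf = requests.get(url + ".rdf")
    print(rdf.text[:500])

Either way you get back the same data; the first approach is the more webby of the two, the second is handy for a quick look in a browser.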

Anyway I hope you find this useful.

Lego, Wombles and Linked Data

As a child I loved Lego. I could let my imagination run riot, designing and building cars, space stations, castles and aeroplanes.

Blue lego brick

My brother didn’t like Lego, instead preferring to play with Action Men and toy cars. These sorts of toys did nothing for me, and from the perspective of an adult I can understand why. I couldn’t modify them, I couldn’t create anything new. Perhaps I didn’t have a good enough imagination because I needed to make my ideas real. I wanted to build things, I still do.

Then the most exciting thing happened. My dad bought a BBC micro.

Obviously computers such as the BBC Micro were in many, many ways different from today’s Macs and, if you must, PCs. They were several orders of magnitude less powerful than today’s computers but, importantly, they were designed to be programmed by the user – you were encouraged to do so. It was expected that that’s what you would do. So from a certain perspective they were more powerful.

BBC Micros didn’t come preloaded with word processors, spreadsheets and graphics editors, and they certainly weren’t WIMPs.

What they did come with was BBC BASIC and Assembly Language.

They also came with two thick manuals. One telling you how to set the computer up; the other how to programme it.

This was all very exciting: I suddenly had something with which I could build incredibly complex things. I could, in theory at least, build something that was more complex than the planes, spaceships and cars which I had modelled with Lego a few years before.

Like so many children of my age I cut my computing teeth on the BBC Micro – learnt to programme computers, and played a lot of games!

Unfortunately all was not well. You see I wasn’t very good at programming my BBC Micro. I could never actually build the things I had pictured in my mind’s eye – I just wasn’t talented enough.

You see, Lego hit a sweet spot which those early computers on the one hand, and Action Man on the other, missed.

What Lego provided was reusable bits.

When Christmas or my birthdays came around I would start off by building everything suggested by the sets I was given. But I would then dismantle the models and reuse those bricks to build something new, whatever was in my head. By reusing bricks from lots of different sets I could build different models. The more sets I got given, the more things I could build.

Action Man simply didn’t offer any of those opportunities; I couldn’t create anything new.

Early computers were certainly very capable of providing a creative platform, but they lacked the reusable bricks; it was more like being given an infinite supply of clay. And clay is harder to reuse than bricks.

Today, in the online world, we are in a similar place, but with digital bits and bytes rather than moulded plastic bits and bricks.

The Web allows people to create their own stories – it allows people to follow their nose to create threads through the information about the things that interest them, commenting on and discussing it along the way. But the Web also allows developers to reuse previously published information within new, different contexts to tell new stories.

But only if we build it right.

Most Lego bricks are designed to allow you to stick one brick to another. But not all bricks can be stuck to all others. Some can only be put at the top – these are the tiles and pointy bricks to build your spires, turrets and roofs. These bricks are important, but they can only be used at the end because you can’t build on top of them.

The same is true of the Web – we need to start by building the reusable bits, then the walls and only then the towers and spires and twiddly bits.

But this can be difficult – the shiny towers are seductive and the draw to start with them can be strong; only to find that you then need to knock them down and start again when you want to reuse the bits inside.

We often don’t give ourselves the best opportunity to womble with what we’ve got – to reuse what others make, to reuse what we make ourselves. Or to let others outside our organisations build with our stuff. If you want to take these opportunities then publish your data the webby way.

Humanity Connected

Digital Revolution, a new BBC TV programme, was launched last Friday. Due to be broadcast next year, the programme will be looking back over the first 20 years of the web and considering what the future might hold. The show will be considering how the web has changed society and the implications for things like security, privacy and the economy.

Tim Berners-Lee. Photograph by Documentally, some rights reserved.

Unlike — well, probably every other TV programme I’ve ever come across — each programme will be influenced and debated on the web during its production. Some of the rushes and interviews will be made available on the web (under permissive terms) so that anyone can contribute to the debate, helping to shape the final programme.

To kick all this off the BBC hosted a debate chaired by Aleks Krotoski with Tim Berners-Lee, Bill Thompson, Susan Greenfield and Chris Anderson. The audience was almost as impressive as the folks up on stage – a great mix of geeks and journalists – and luckily I managed to wangle an invite (probably because I’ve had a tiny, tiny role on the project).

Anyway… the presentations were very cool, and while I tweeted the best bits on the day I thought I would write up a short post summing it all up. You know, contributing to the debate and all that.

The thing that struck me most was the discussion and points made around the way in which the web has provided a platform for creativity, and the risks to its future because of governments’ failure to understand it (OK, the failure to understand it is my interpretation, not the view expressed by the speakers).

I’ve written previously about how the web’s generative nature has helped enable an eruption of creativity, spawning a new economy in its wake; and how governments have failed to grasp that it’s the people that use the medium that need policing, not the medium itself. But as you might expect from such an illustrious bunch of people the panel managed to nail the point much better than I ever could.

To misquote TimBL: the web should be like paper. Government should be able to prosecute if you misuse it, but they shouldn’t limit what you are able to do with it. When you buy paper you aren’t limited in what can be written or drawn on it, and, like paper, the Internet shouldn’t be set up in such a way as to constrain its use.

The reason this is important is that it helps to preserve the web’s generative nature. TimBL points out that people are creative; they simply need a platform for that creativity, and if that platform is to be the Web then it needs to support everyone – anyone should be able to express that creativity – and that means it needs to be open.

As an aside, there was a discussion as to whether or not access to the Internet is a ‘human right’ — I’m not sure whether it is or not, but it’s worth considering whether, if everyone had access to the Web, it could be used to solve problems in the developing world. For example, by allowing communities to share information on how to dig wells and maintain irrigation systems, information on health care, and educational material generally. It is very easy for us in the West to think of the Web as synonymous with the content and services currently provided on it, and to ask whether they would be useful in developing countries. But the point really should be: if anyone, anywhere in the world were able to create and share information, what would they do with it? My hope would be that the services offered would reflect local needs — whether that be social networking in US colleges or water purification in East Africa.

Of course being open and free for all to use doesn’t mean that everything on the web will be wonderful, or indeed legal; no more so than paper ensures wonderful prose because it is open. Or as TimBL puts it:

Just because you can read everything out there doesn’t mean you should. If you found a piece of paper blowing in the wind you wouldn’t expect it to be edifying.

But what does open mean?

Personally I think that an open web is one that seeks to preserve its generative nature. But the discussion last Friday also focused on the implications for privacy and snooping.

Governments the world over, including to our shame the current UK Government, are seeking to limit the openness of the web; that is, rather than addressing the specific activities that happen on the web, they are seeking to limit the very platform itself. ISPs around the world, at the behest of governments, are being asked to track and record what you do on the web – everything you do on the web. Elsewhere, content is being filtered, traffic shaped and sites blocked.

The sorts of information being collected can include your search terms (pinned to your IP address) and the sites you visit. Now for sure this might sometimes include a bunch of URIs that point to illegal and nefarious activity, but it might also include (indeed it’s more likely to include) URIs relating to a medical condition or legal advice or a hundred and one other perfectly legal but equally personal bits of information.

Should a government, its agencies or an ISP be able to capture, store and analyse this data? Personally I think not. And should you think that I’m just being a scaremonger, have a read of Bill’s post “The digital age of rights” about the French government’s HADOPI legislation.

On the day Bill Thompson (who, by the way, was on blinding form) summed up the reason why when he described his hopes for the web thus:

I hoped that the web would help us know our neighbours better, so that we didn’t go and kill them. That hasn’t happened but it does now mean it’s much harder to get away with it – the world will now know if you do kill them.

Governments know this, which is why some now try to lock down access to the Internet when there is civil unrest in their country. And it is also why the rest of the web tries to help them break through.

Few Western governments would condone the activities of such totalitarian states. But it is interesting to consider whether Western governments would support North Korea or Iran setting up the kinds of databases currently being debated in Europe and the States. Now they might point out that the comparison isn’t a fair one, since they are nice, democratic governments, not nasty oppressive ones. But isn’t that painfully myopic? How do they know who will be in power in the future? How do they know how future governments might seek to use the information they are gathering now?

Seeking to prevent snooping on the Internet aside, there is another reason why the web should remain open, and it is the reason why it’s important to fight for One Web.

Susan Greenfield quite rightly pointed out that ‘Knowledge is to be found by creating context, links between facts; it’s the context that counts’. Although she was making the point in an attempt to take a swipe at the Web, trying to suggest that the web is no more than a collection of facts devoid of context, it seems to me that in fact the web is the ultimate context machine. (One sometimes wonders whether she has ever actually used any of the services she complains about; indeed I wonder if she uses the web at all.)

The web is, as the name suggests, a set of interconnected links. Those URIs and the links between them, as TimBL reminded us, are made by people and followed by people, and as such you can legitimately think of the Web as humanity connected.

URIs are incredibly powerful, particularly when they are used to identify things in addition to documents. When they are used to identify things (dereferencing to the appropriate data or document format) they can lead to entirely new ways to access information. An example highlighted by TimBL is the impact they might have on TV channels and schedules.

He suggested that the concept of a TV channel was limited and that it would be replaced with complete random access. When anyone, anywhere in the world, can follow a URI to a persistent resource (note he didn’t say click on a link) then the TV channel as a means of discovery and recommendation will be replaced with a trust network. “My friends have watched this, most of them like it…” sort of thing.

Of course to get there we need to change the way we think about the web and the way in which we publish things. And here TimBL pointed to the history of the web, suggesting that the next digital revolution will operate in a similar fashion.

The web originally happened not because senior management thought it was a good idea – it happened because people who ‘got it’ thought it was cool, that it was the right thing, and they were lucky enough to have managers that didn’t get in the way. Indeed this is exactly what happened when TimBL wrote the first web server and client, and then when the early web pioneers started publishing web pages. They didn’t do it because they were told to, they didn’t do it because there was any immediate benefit. They did it because they thought that by doing it they would enable cool things to happen. The last couple of years suggest that we are on the cusp of something similar, as people start to publish linked data, which will in turn result in a new digital revolution.

Interesting semantic web stuff

It’s starting to feel like the world has suddenly woken up to the whole Linked Data thing — and that’s clearly a very, very good thing. Not only are Google (and Yahoo!) now using RDFa, but a whole bunch of other things are going on, all rather exciting; below is a round-up of some of the best. But if you don’t know what I’m talking about you might like to start off with TimBL’s talk at TED.

"Semantic Web Rubik's Cube" by dullhunk. Some rights reserved.
"Semantic Web Rubik's Cube" by dullhunk. Some rights reserved.

TimBL is working with the UK Cabinet Office (as an advisor) to make our information more open and accessible on the web [cabinetoffice.gov.uk]
The blog states that he’s working on:

  • overseeing the creation of a single online point of access and work with departments to make this part of their routine operations.
  • helping to select and implement common standards for the release of public data
  • developing Crown Copyright and ‘Crown Commons’ licenses and extending these to the wider public sector
  • driving the use of the internet to improve consultation processes.
  • working with the Government to engage with the leading experts internationally working on public data and standards

The Guardian has an article on the appointment.

Closer to home there have been a few interesting developments:

Media Meets Semantic Web – How the BBC Uses DBpedia and Linked Data to Make Connections [pdf]
Our paper at this year’s European Semantic Web Conference (ESWC2009) looking at how the BBC has adopted semantic web technologies, including DBpedia, to help provide a better, more coherent user experience. For which we won best paper in the in-use track – congratulations to Silver and Georgi.

The BBC has announced a couple of SPARQL endpoints, hosted by Talis and OpenLink
Both platforms allow you to search and query the BBC data in a number of different ways, including SPARQL — the standard query language for semantic web data. If you’re not familiar with SPARQL, the Talis folk have published a tutorial that uses some NASA data.
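
To give a flavour, here’s a rough sketch of querying one of these endpoints from Python using the requests library. The endpoint address and the choice of the Programmes Ontology are illustrative assumptions – check the Talis and OpenLink documentation for the real endpoint URLs and the vocabularies their stores actually contain.

    import requests

    # Illustrative endpoint address - substitute the real Talis or OpenLink one
    endpoint = "http://api.talis.com/stores/bbc-backstage/services/sparql"

    query = """
    PREFIX po: <http://purl.org/ontology/po/>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?programme ?title WHERE {
      ?programme a po:Brand ;
                 dc:title ?title .
    } LIMIT 10
    """

    response = requests.get(endpoint,
                            params={"query": query},
                            headers={"Accept": "application/sparql-results+json"})
    for row in response.json()["results"]["bindings"]:
        print(row["programme"]["value"], "-", row["title"]["value"])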

A social semantic BBC?
Nice presentation from Simon and Ben on how social discovery of content could work… “show me the radio programmes my friends have listened to, show me the stuff my friends like that I’ve not seen” – all built on people’s existing social graph. People meet content via activity.

PriceWaterhouseCooper’s spring technology forecast focuses on Linked Data [pwc.com]
“Linked Data is all about supply and demand. On the demand side, you gain access to the comprehensive data you need to make decisions. On the supply side, you share more of your internal data with partners, suppliers, and—yes—even the public in ways they can take the best advantage of. The Linked Data approach is about confronting your data silos and turning your information management efforts in a different direction for the sake of scalability. It is a component of the information mediation layer enterprises must create to bridge the gap between strategy and operations… The term “Semantic Web” says more about how the technology works than what it is. The goal is a data Web, a Web where not only documents but also individual data elements are linked.”

Including an interview with me!

You should also check out…

sameas.org – a service to help link up equivalent URIs
It helps you to find co-references between different data sets. Interestingly it’s also licensed under CC0, which means all copyright and related or neighboring rights are waived.
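
As a quick illustration, here’s a rough sketch of looking it up from Python – the lookup URL and the shape of the response are my assumptions about the service’s JSON interface, so check sameas.org for the exact details.

    import requests

    # Assumed JSON lookup interface - see sameas.org for the real details
    uri = "http://dbpedia.org/resource/Lion"
    response = requests.get("http://sameas.org/json", params={"uri": uri})

    # Inspect the bundle(s) of equivalent URIs that come back
    print(response.json())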

URL shortening – it’s nasty but it’s also unnecessary

URL shortening is just wrong, and it’s not just me that thinks so – Joshua Schachter thinks so too, and Simon Willison has a partial solution. The reason various folk are worried about URL shortening, and think that it’s largely evil, is because it breaks the web.

"The weakest link" by Darwin Bell. Some rights reserved.
"The weakest link" by Darwin Bell. Some rights reserved.

URLs need to be persistent and that’s not so likely when you use these services. But the ever-increasing popularity of Twitter, which imposes a 140-character limit on tweets, means that more and more URLs are getting shortened. The ridiculous thing is it isn’t even necessary.

In addition to the rev=”canonical” fix that Kellan proposed, Michael has also recently come across longurl.org, which

…could solve at least some of these problems. It provides a service to expand short urls from many, many providers into long urls

That’s cool because:

it caches the expansion so has a persistent store of short <> long mappings. They plan to expose these mappings on the web which would also solve [reliance on 3rd party – if they go out of business links break]

Of course what would be extra cool would be if, in addition to the source code being open sourced, the underlying database was too. That way if anything happened to longurl.org someone else could resurrect the service.

All good stuff. But the really ironic thing is that none of this should be necessary. The ‘in 140 characters or less’ thing isn’t true. As Michael points out:

if i write a tweet to the 140 limit that includes a link then <a href=”whatever”>whatever</a> will be added to the message. so whilst the visible part of the message is limited to 140 chars the message source isn’t. There’s no reason twitter couldn’t use the long url in the href whilst keeping the short url as the link text…

All Twitter really needs to do is provide their own shortening service – if you enter anything that starts “http://” it gets shortened in the visible message. Of course it doesn’t really need to provide a unique, hashed URL; it could convert the anchor text to “link” or the first few letters of the title of the target page while retaining the full-fat, canonical URL in the href.
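
Here’s a small sketch of that idea in Python – purely illustrative, not how Twitter actually renders tweets: the full URL stays in the href and only the visible link text gets truncated.

    import re

    def render_tweet(text, max_display=30):
        """Wrap URLs in anchors that keep the canonical URL but truncate the visible text."""
        def replace(match):
            url = match.group(0)
            display = url if len(url) <= max_display else url[:max_display] + "..."
            return '<a href="{0}">{1}</a>'.format(url, display)
        return re.sub(r'https?://\S+', replace, text)

    print(render_tweet("Good write-up of the Linked Data meet-up: "
                       "http://example.org/a/really/long/url/that/would/otherwise/get/shortened"))

No information is lost, nothing relies on a third-party shortener, and the link still dereferences to the canonical URL.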

Rich Snippets

As everyone knows, last night Google announced that they are now supporting RDFa and microformats to add ‘Rich Snippets’ to their search results page.

Rich Snippets give users convenient summary information about their search results at a glance. We are currently supporting data about reviews and people. When searching for a product or service, users can easily see reviews and ratings, and when searching for a person, they’ll get help distinguishing between people with the same name…

To display Rich Snippets, Google looks for markup formats (microformats and RDFa) that you can easily add to your own web pages.

That’s good, right? Google gets a higher click-through rate because, as their user testing shows, the more useful and relevant information people see on a results page, the more likely they are to click through; sites that support these technologies make their content more discoverable, and everyone else gets to what they need more easily. Brilliant, and to make life even better, because Google have adopted RDFa and microformats

…you not only make your structured data available for Google’s search results, but also for any service or tool that supports the same standard. As structured data becomes more widespread on the web, we expect to find many new applications for it, and we’re excited about the possibilities.

Those Google guys, they really don’t do evil. Well actually no, not so much. Actually Google are being a little bit evil here.

Doctor Evil

Here’s the problem. When Google went and implemented RDFa support they adopted the syntax but decided not to adopt the vocabularies – they went and reinvented their own. And as Ian points out, it’s the vocabularies that matter. What Google decided to do is only support those properties and classes defined at data-vocabulary.org, rather than supporting existing ontologies such as FOAF, vCard and vocab.org/review.

Now in some ways this doesn’t matter too much; after all, it’s easy enough to do this sort of thing:

rel=”foaf:name google:name”

And Google do need to make Rich Snippets work on their search results – they need to control which vocabularies to support so that webmasters know what to do and so they can render the data appropriately. But by starting off with a somewhat broken vocabulary they are providing a pretty big incentive for webmasters to implement a broken version of RDFa. And they will implement the broken version because Google Juice is so important to the success of their site.
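
To make the point concrete, here’s a minimal sketch (using Python’s rdflib) of saying the same thing in both Google’s vocabulary and FOAF, so the data works for Rich Snippets while staying useful to the rest of the RDF world. The data-vocabulary.org namespace URI and the example person are illustrative assumptions on my part.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import FOAF, RDF

    # Assumed namespace for Google's vocabulary - illustrative only
    GOOG = Namespace("http://rdf.data-vocabulary.org/#")

    g = Graph()
    person = URIRef("http://example.org/people/jane#me")  # illustrative URI

    # State the same facts in both vocabularies
    g.add((person, RDF.type, FOAF.Person))
    g.add((person, RDF.type, GOOG.Person))
    g.add((person, FOAF.name, Literal("Jane Example")))
    g.add((person, GOOG.name, Literal("Jane Example")))

    print(g.serialize(format="turtle"))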

Google have taken an open standard and inserted a slug of proprietary NIH into it, and that’s a shame – they could have done so much better. Indeed they could have supported RDFa as well as they support microformats.

Perhaps we shouldn’t be surprised: Google are a commercial operation – by adopting RDFa they get a healthy dose of “Google and the Semantic Web” press coverage while at the same time making their search results that bit better. And let’s be honest, the semweb community hasn’t done a great job of getting those vocabularies out into the mainstream, so Google’s decision won’t hurt its bottom line. Just don’t be fooled – this isn’t Google supporting RDFa, it’s Google adding Rich Snippets.

Daytum, I love you but please join the web

I’ve been lucky enough to have been a beta tester for daytum.com, a service for collecting and communicating personal data, and I love it. As you might expect from Ryan Case and Nicholas Felton it’s a lovely piece of interaction and graphic design. You can record and visualise all sorts of qualitative and quantitative data – personally I’m recording information about what I eat and drink, how much I sleep and how much I communicate (emails, blog posts, talks, tweets etc.) but others record the music they listen to, how far they run, gigs they’ve been to, books they’ve read. All sorts of things.

OK I probably drink too much coffee

And now you too can record and visualise whatever you want, because this weekend the service came out of beta. Now here’s the thing: as much as I love the service I wish it were more, well, born of the web. You see I have a few problems with daytum.

My main problem is that I can’t point to the stuff I’m recording. That graphic at the top of this post doesn’t have a URL, so I can’t link to it or the underlying data; and because I can’t point to it, it limits what can be done with it. If I can’t link to it, I can’t embed it elsewhere, I can’t link it to other data sources and mash it up. And that’s a problem because the only possible URI for this sort of information about me is locked away in the daytum interface. Why isn’t there a nice RESTful URL for each ‘display’? Something like:


Once everything has a URL, I want each of those resources to be made available in a variety of different representations – as JSON, RDF and ATOM for starters – that way the data can be used, not just visualised.

And finally I want to be able to use URIs to describe what I’m measuring, not just strings. I want to be able to point to stuff out there on the web and say “at this time I consumed another one of those”. I’m not suggesting that everything should have to be described like this, but if there’s a URI to represent something I want to be able to point to it so everyone knows what I’m talking about.

In other words I want daytum.com to follow the Linked Data principles rather than offer an Ajax-only interface.
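
To sketch what I mean – and this is purely hypothetical, daytum offers nothing like it, and every URL, route and property below is made up – a content-negotiated ‘display’ resource might look something like this:

    from flask import Flask, Response, jsonify, request

    app = Flask(__name__)

    # Toy in-memory data standing in for a daytum 'display'
    DISPLAYS = {
        ("someone", "coffee"): {"label": "Coffee consumed", "unit": "cups", "total": 412},
    }

    @app.route("/users/<user>/displays/<display>")
    def show_display(user, display):
        data = DISPLAYS.get((user, display))
        if data is None:
            return "Not found", 404

        # Content negotiation: one URL, several representations
        best = request.accept_mimetypes.best_match(
            ["application/json", "text/turtle", "text/html"])
        if best == "application/json":
            return jsonify(data)
        if best == "text/turtle":
            turtle = (
                "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
                "@prefix ex: <http://example.org/vocab#> .\n"
                '<> rdfs:label "{label}" ; ex:total {total} .\n'
            ).format(**data)
            return Response(turtle, mimetype="text/turtle")
        return "<h1>{label}: {total} {unit}</h1>".format(**data)

    if __name__ == "__main__":
        app.run()

Same data, one persistent URL, and anyone (including daytum’s own interface) can pick the representation they need.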

If you have a look at Felton’s own annual reports you will see that they group and aggregate all sorts of information, but to achieve something similar (conceptually if not visually) you will need a lot more from daytum than is currently being offered.

Felton Annual Report 2008

The other big gap is the lack of an API for updating information. Keeping daytum.com up to date is actually quite hard work, and collecting the sort of data Nicholas Felton does to put together his annual reports would be onerous to say the least – but it needn’t be.
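
Something as simple as this would do – to be clear, the endpoint below is imaginary, there is no such daytum API today, and the DBpedia URI is just an example of pointing at the thing being recorded rather than using a bare string:

    import datetime
    import requests

    entry = {
        "item": "http://dbpedia.org/resource/Coffee",  # a URI saying *what* was consumed
        "quantity": 1,
        "recorded_at": datetime.datetime.utcnow().isoformat() + "Z",
    }

    # Imaginary write API - daytum doesn't offer this (yet)
    response = requests.post("http://daytum.com/api/users/someone/entries", json=entry)
    print(response.status_code)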

If daytum.com provided an API that allowed me to post information from other services that would be a great start, but actually it’s not always necessary, nor even that desirable. The Web already knows quite a lot about us: for example, Fire Eagle and Dopplr know where I am and have been, delicious knows what I think is interesting on the web and how I describe those things, and Twitter and this blog know what I’m doing and thinking about; for others, Last.fm knows what music they are listening to. Daytum doesn’t need to replicate all of that data – indeed it shouldn’t – it could simply request that data when needed, to visualise it. (It shouldn’t store it because that makes it harder to manage access to it.)

The one thing I don’t want, however, is yet another social networking site – I don’t want social features to be part of daytum. I don’t want them because I don’t need them: there are already loads of places integrated into my social graph, whether that be Twitter, Flickr, Facebook or this blog. I really don’t want to have to import and then maintain another social graph. I do, however, want to be able to squirt the data I’m collecting or aggregating at daytum into my existing social graph; much as Fire Eagle adds location brokerage to existing services, I want a service that adds personal data to existing social networking sites.