Interesting semantic web stuff

2009 June 11
by Tom Scott

It’s starting to feel like the world has suddenly woken up to the whole Linked Data thing — and that’s clearly a very, very good thing. Not only are Google (and Yahoo!) now using RDFa but a whole bunch of other things are going on, all rather exciting, below is a round up of some of the best. But if you don’t know what I’m talking about you might like to start off with TimBL’s talk at TED.

"Semantic Web Rubik's Cube" by dullhunk. Some rights reserved.

"Semantic Web Rubik's Cube" by dullhunk. Some rights reserved.

TimBL is working with the UK Cabinet Office (as an advisor) to make our information more open and accessible on the web [cabinetoffice.gov.uk]
The blog states that he’s working on:

  • overseeing the creation of a single online point of access and work with departments to make this part of their routine operations.
  • helping to select and implement common standards for the release of public data
  • developing Crown Copyright and ‘Crown Commons’ licenses and extending these to the wider public sector
  • driving the use of the internet to improve consultation processes.
  • working with the Government to engage with the leading experts internationally working on public data and standards

The Guardian has an article on the appointment.

Closer to home there have been a few interesting developments

Media Meets Semantic Web – How the BBC Uses DBpedia and Linked Data to Make Connections [pdf]
Our paper at this years European Semantic Web Conference (ESWC2009) looking at how the BBC has adopted semantic web technologies, including DBpedia, to help provide a better, more coherent user experience. For which we won best paper of the in-use track – congratulations to Silver and Georgie.

The BBC has announced a couple SPARQL endpoints, hosted by talis and openlink [welcomebackstage.com]
Both platforms allow you to search and query the BBC data in a number of different ways, including SPARQL — the standard query language for semantic web data. If you’re not familiar with SPARQL, the Talis folk have published a tutorial that uses some NASA data.

A social semantic BBC? [slideshare]
Nice presentation from Simon and Ben on how social discovery of content could work… “show me the radio programmes my friends have listen to, show me the stuff my friends like that I’ve not seen” all built on people’s existing social graph. People meet content via activity.

PriceWaterhouseCooper’s spring technology forecast focuses on Linked Data [pwc.com]
“Linked Data is all about supply and demand. On the demand side, you gain access to the comprehensive data you need to make decisions. On the supply side, you share more of your internal data with partners, suppliers, and—yes—even the public in ways they can take the best advantage of. The Linked Data approach is about confronting your data silos and turning your information management efforts in a different direction for the sake of scalability. It is a component of the information mediation layer enterprises must create to bridge the gap between strategy and operations… The term “Semantic Web” says more about how the technology works than what it is. The goal is a data Web, a Web where not only documents but also individual data elements are linked.”

Including an interview with me!

You should also check out…

sameas.org a service to help link up equivalent URIs
It helps you to find co-references between different data sets. Interestingly it’s also licenced under CC0 which means all copyright and related or neighboring rights are waived.

URL shortening it’s nasty but it’s also unnecessary

2009 June 9
by Tom Scott

URL shortening is just wrong and it’s not just me that thinks so Joshua Schachter thinks so too and Simon Willison has a partial solution. The reason various folk are worried about URL shortening and think that it’s largely evil is because it breaks the web.

"The weakest link" by Darwin Bell. Some rights reserved.

"The weakest link" by Darwin Bell. Some rights reserved.

URLs need to be persistent and that’s not so likely when you use these services. But the ever increasing popularity of Twitter, who impose a 140 character limit on tweets, means that more and more URLs are getting shortened. The ridiculous thing is it isn’t even necessary.

In addition to the rev=”canonical” fix that Kellan proposed Michael has also recently come across longurl.org which

…could solve at least some of these problems. It provides a service to expand short urls from many, many providers into long urls

That’s cool because:

it caches the expansion so has a persistent store of short <> long mappings. They plan to expose these mappings on the web which would also solve [reliance on 3rd party – if they go out of business links break]

Of course what would be extra cool would be if, in addition to the source code being open sourced, so was the underlying database. That way if anything happened to longurl.org someone else could resurrect the service.

All good stuff. But the really ironic thing is that none of this should be neccessary. The ‘in 140 characters or less’ thing isn’t true. As Michael points out:

if i write a tweet to the 140 limit that includes a link then <a href=”whatever”>whatever</a> will be added to the message. so whilst the visible part of the message is limited to 140 chars the message source isn’t. There’s no reason twitter couldn’t use the long url in the href whilst keeping the short url as the link text…

All Twitter really needs to do is provide their own shortening service – if you enter anything that starts “http://” it gets shortened in the visable message. Of course it doesn’t really need to actually provide a unique, hashed URL, it could convert the anchor text to “link” or the first few letters of the title of the target page while retaining the full-fat, canonical URL in the href.

First steps towards a more coherent online natural history offer at the BBC

2009 May 19
by Tom Scott

For the last five or so months I’ve been working on a new set of sites under the umbrella of “BBC Earth” — a programme of work aimed at giving everyone access to some of the best natural history content in the world. The project is made up of three complementary and interlinked projects, the first couple of which recently went live.

Out of the wild

Kakpo -- Out of the wild

The first site to go live, “Out of the Wild” aims to bring you a view on the natural world from the perspective of our crews while on location; a sort of “From our correspondent” for the natural world. The stories — a mix of short video clips, slideshows and text based stories — are all grouped around the expeditions, the people on location and the originating programmes. Our hope is that you will enjoy this more personal view of the natural world brought to you from some of the most amazing part of the world by the worlds best wildlife documentaries makers.

We then launched “Earth News” which does pretty much what is says on the tin — news about the natural world.

We’re both aggregating natural history news articles from elsewhere on the BBC news site as well as new articles (some unique) written for Earth News, such as the story of the adult king penguin which kidnapped a skua chick and then attempted to raise it.

The final part of BBC Earth will see us starting to open up the BBC archive in, what I hope, will be interesting and useful ways.

We then, of course, need to make all of this available in nice machine representation so that others can start to hack with the data.

Rich Snippets

2009 May 13
tags:
by Tom Scott

As everyone knows last night Google announced that they are now supporting RDFa and microformats to add ‘Rich Snippets’ to their search results page.

Rich Snippets give users convenient summary information about their search results at a glance. We are currently supporting data about reviews and people. When searching for a product or service, users can easily see reviews and ratings, and when searching for a person, they’ll get help distinguishing between people with the same name…

To display Rich Snippets, Google looks for markup formats (microformats and RDFa) that you can easily add to your own web pages.

That’s good right? Google gets a higher click through rate because, as their user testing shows, the more useful and relevant information people see from a results page, the more likely they are to click through; sites that support these technologies make their content more discoverable and everyone else gets to what they need more easily. Brilliant, and to make life even better because Google have adopted RDFa and microformats

…you not only make your structured data available for Google’s search results, but also for any service or tool that supports the same standard. As structured data becomes more widespread on the web, we expect to find many new applications for it, and we’re excited about the possibilities.

Those Google guys, they really don’t do evil. Well actually no, not so much. Actually Google are being a little bit evil here.

Doctor Evil

Doctor Evil

Here’s the problem. When Google went and implemented RDFa support they adopted the syntax but decided not to adopt the vocabularies – they went and reinvented their own. And as Ian points out it’s the vocabularies that matters. What Google decided to do is little support those properties and classes defined at data-vocabulary.org rather than supporting the existing ontologies such as: FOAF, vCard and vocab.org/review.

Now in some ways this doesn’t matter too much, after all it’s easy enough to do this sort of thing:

rel=”foaf:name google:name”

And Google do need to make Rich Snippets work on their search results, they need to control which vocabularies to support so that webmaster know what to do and so they can render the data appropriatley. But by starting off with a somewhat broken vocabulary they are providing a pretty big incentive to Web Masters to implement a broken version of RDFa. And they will implement the broken version because Google Juice is so important to the success of their site.

Google have taken an open standard and inserted a slug of proprietary NIH into it and that’s a shame, they could have done so much better. Indeed they could have supported RDFa as well as they support microformats.

Perhaps we shouldn’t be surprised, Google are a commercial operation – by adopting RDFa they get a healthy dose of “Google and the Semantic Web” press coverage while at the same time making their search results that bit better. And lets be honest the semweb community hasn’t done a great job at getting those vocabularies out and into the mainstream so Google’s decision won’t hurt it’s bottom line. Just don’t be fooled this isn’t Google supporting RDFa, it’s Google adding Rich Snippets.

Interesting stuff from around the web 2009-04-22

2009 April 22
by Tom Scott
Amazing render job by Alessandro Prodan

Amazing render job by Alessandro Prodan

The open web

Does OpenID need to be hard? [factoryjoe.com]
Chris considers “the big fat stinking elephant in the room: OpenID usability and the paradox of choice” as usual it’s a good read.

I wonder whether restricting the OpenID providers displayed based on visited link would help? i.e. hide those that haven’t been visited? It clearly wouldn’t be perfect – Google isn’t my OpenID provider but I visit google.com lots, but it should cut down some of the clutter.

Security flaw leads Twitter, others to pull OAuth support [cnet.com]
The hole makes it possible for a hacker to use social-engineering tactics to trick users into exposing their data. The OAuth protocol itself requires tweaking to remove the vulnerability, and a source close to OAuth’s development team said that there have been no known violations, that it has been aware of it for a few days now, and has been coordinating responses with vendors. A solution should be announced soon.

Twitter and social networks

Relationship Symmetry in Social Networks: Why Facebook will go Fully Asymmetric [bokardo.com]
Asymmetric model better mimics how real attention works…and how it has always worked. Any person using Twitter can have a larger number of followers than followees, effectively giving them more attention than they give. This attention inequality is the foundation of the Twitter service… The IA of Facebook does not allow this. Facebook has designed a service that forces you to keep track of your friends, whether you want to or not. Facebook is modeling personal relationships, not relationships based on attention. That’s the crucial difference between Facebook and Twitter at the moment.

When Twitter Gets Weird… [Dave Gorman]
“The difference between following someone and replying to them is the difference between stopping to chat with someone in the street or giving them a badge declaring that you know them. One is actual interaction. The other is just something you can show your friends.” Blimey – Dave Gorman clearly has a much better grasp of life, the web and being a human than the two people who attacked him for not following them on Twitter. As Dave points out he hopes that Twiiter doesn’t descend into the MySpace “thanks for the add’ nonsense”. Me too.

Google profiles included in search results [googleblog]
A new “Profile results” section will appear at the bottom of a Google search page, when it finds a strong match in response to a name-based search. But only in the US. To help things along remember to use rel=me elsewhere (here’s how).

Shortlisted for a BAFTA, launch of clickable tracklistings and the start of BBC Earth

Look, look clickable tracklistings, w00t!
Few will every know the pain to get this useful little (cross domain) feature live.

We’ve been shortlisted for an Interactive Innovation BAFTA
The /programmes aka Automated Programme Support project. So proud.

Out of the Wild [bbc.co.uk]
Our first tentative steps towards improving the BBC’s online natural history offering. Out of The Wild seeks to bring you stories from BBC crews on location. Eventually this should all form part of an integrated programme offer.

Stuff

Biological Taxonomy Vocabulary
An RDF vocabulary for the taxonomy of all forms of life.

On url shorteners [joshua.schachter.org]
Joshua Schachter considers the issues associated with URL shortening. Similar argument to the one I put forward in “The URL shortening antipattern” but with some useful recommendations: “One important conclusion is that services providing transit (or at least require a shortening service) should at least log all redirects, in case the shortening services disappear. If the data is as important as everyone seems to think, they should own it. And websites that generate very long URLs, such as map sites, could provide their own shortening services. Or, better yet, take steps to keep the URLs from growing monstrous in the first place.”