As everyone knows, last night Google announced that they are now supporting RDFa and microformats to add ‘Rich Snippets’ to their search results pages.
Rich Snippets give users convenient summary information about their search results at a glance. We are currently supporting data about reviews and people. When searching for a product or service, users can easily see reviews and ratings, and when searching for a person, they’ll get help distinguishing between people with the same name…
To display Rich Snippets, Google looks for markup formats (microformats and RDFa) that you can easily add to your own web pages.
That’s good, right? Google gets a higher click-through rate because, as their user testing shows, the more useful and relevant information people see on a results page, the more likely they are to click through; sites that support these technologies make their content more discoverable; and everyone else gets to what they need more easily. Brilliant. And to make life even better, because Google have adopted RDFa and microformats…
…you not only make your structured data available for Google’s search results, but also for any service or tool that supports the same standard. As structured data becomes more widespread on the web, we expect to find many new applications for it, and we’re excited about the possibilities.
Those Google guys, they really don’t do evil. Well actually no, not so much. Actually Google are being a little bit evil here.
Here’s the problem. When Google implemented RDFa support they adopted the syntax but decided not to adopt the vocabularies – they went and reinvented their own. And as Ian points out, it’s the vocabularies that matter. What Google decided to do is support only those properties and classes defined at data-vocabulary.org, rather than supporting existing ontologies such as FOAF, vCard and vocab.org/review.
Now in some ways this doesn’t matter too much, after all it’s easy enough to do this sort of thing:
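For instance, a minimal sketch in Turtle (the prefixes are assumptions: it takes data-vocabulary.org’s terms to live in the `http://rdf.data-vocabulary.org/#` namespace and uses the usual FOAF and review namespaces) that declares Google’s terms equivalent to the established ones:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix v:    <http://rdf.data-vocabulary.org/#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rev:  <http://purl.org/stuff/rev#> .

# Map Google's vocabulary onto the established ontologies, so data
# published in one can be read as the other by any OWL-aware consumer.
v:Person owl:equivalentClass    foaf:Person .
v:name   owl:equivalentProperty foaf:name .
v:Review owl:equivalentClass    rev:Review .
```

A reasoner (or a simple rule in a crawler) can then treat a `v:Person` as a `foaf:Person` and vice versa.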
And Google do need to make Rich Snippets work on their search results: they need to control which vocabularies to support, so that webmasters know what to do and so they can render the data appropriately. But by starting off with a somewhat broken vocabulary they are providing a pretty big incentive for webmasters to implement a broken version of RDFa. And they will implement the broken version, because Google Juice is so important to the success of their sites.
Google have taken an open standard and inserted a slug of proprietary NIH into it and that’s a shame, they could have done so much better. Indeed they could have supported RDFa as well as they support microformats.
Perhaps we shouldn’t be surprised; Google are a commercial operation – by adopting RDFa they get a healthy dose of “Google and the Semantic Web” press coverage while at the same time making their search results that bit better. And let’s be honest, the semweb community hasn’t done a great job of getting those vocabularies out into the mainstream, so Google’s decision won’t hurt its bottom line. Just don’t be fooled: this isn’t Google supporting RDFa, it’s Google adding Rich Snippets.
The lovable Mr Stephen Fry recently noted [iTunes link] that the challenge isn’t to help people become “computer literate”; instead it is to make computers “human literate”. And when you think of the last 25 years, as an industry we’ve done a pretty good job.
1984 saw Apple launch the Macintosh and with it the world was introduced to the GUI. And then in 1989 TimBL invented the web and changed the world. I’m not suggesting for a moment that everything is OK in the world of interaction design, just that we have come a very long way.
The genius of the Web and the Macintosh is their ability to abstract information to make it more useful, as TimBL put it when talking about the Giant Global Graph:
[The Net] made it simpler because [instead] of having to navigate phone lines from one computer to the next, you could write programs as though the net were just one big cloud, where messages went in at your computer and came out at the destination one. The realization was, “It isn’t the cables, it is the computers which are interesting”. The Net was designed to allow the computers to be seen without having to see the cables.
Simpler, more powerful. Obvious, really.
And then with the development of the Web we could go one step further:
“It isn’t the computers, but the documents which are interesting”. Now you could browse around a sea of documents without having to worry about which computer they were stored on. Simpler, more powerful. Obvious, really.
That’s where we are, more or less, right now, except there’s a realisation that we can keep going, keep making the web more useful and easier to use because:
“It’s not the documents, it is the things they are about which are important”
To achieve this we need to be able to identify the things we’re interested in, and the relationships between them, in a way that sits above the level of documents; if we do this then we get reuse of data around the concept. That’s just what Linked Data is all about: allowing us to break free of the document layer by focusing on URIs.
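As a hypothetical sketch of what that looks like (the URI and title are made up, loosely modelled on BBC /programmes; the `po:` prefix is the Programmes Ontology’s):

```turtle
@prefix po: <http://purl.org/ontology/po/> .
@prefix dc: <http://purl.org/dc/terms/> .

# One URI for the programme itself, above any particular page about it.
# The desktop, mobile and data views are just representations of this
# one resource, served via content negotiation.
<http://www.bbc.co.uk/programmes/b00abc12#programme>
    a po:Episode ;
    dc:title "A Hypothetical Episode" .
```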
Thinking about the web as a web of (identifiers for) interconnected things, not a web of pages, means that when I watch a TV programme online it’s not the page on iPlayer (other players are available) that matters to me; it’s the URI of the programme, and it’s that URI that I bookmark. This means that whatever device I use – my iPhone, laptop or IP-enabled TV – it will use the device-appropriate view. And because we’re talking about URIs and HTTP, this isn’t just a different way of tuning into a set of presets; it also means, as Nicholas Negroponte puts it:
My VCR of the future will say to me when I come home, “Nicholas, I looked at five thousand hours of television while you were out and recorded six segments for you which total forty minutes. Your high school classmate was on the ‘Today’ show, there was a documentary on the Dodecanese Islands, etc…” It will do this by looking at the headers. The bits about the bits change broadcasting totally. They give you a handle by which to grab what interests you and provide the network with a means to ship them into any nook or cranny that wants them.
Designing the web in this way, by thinking about what real-world objects people care about, giving them all URIs and then linking them up and linking them to the rest of the web – building the web the linked data way – means you can use the network not only to deliver content but also to let people discover more content, and mash content together to create new stories.
This, as I see it, abstracting the problem above the document layer, is a very sensible way to help make computers more ‘human literate’, because people can stop thinking about webpages and instead start thinking about the stuff that matters to them – whether that be a TV programme, a music track, a book, a person, or a holiday. And whether they access that thing on their desktop computer, mobile phone or IP-enabled TV set.
However, while it is great news that Twitter will be implementing OAuth soon, they haven’t yet, and there are plenty of other services that don’t use it. It’s therefore worth pausing for a moment to consider how we got here and what the issues are, because while it will be great — right now — it’s a bit rubbish.
We shouldn’t assume that either Twitter or the developers responsible for the third-party apps (those requesting your credentials) are trying to do anything malicious — far from it — as Chris Messina explains:
The difference between run-of-the-mill phishing and password anti-pattern cases is intent. Most third parties implement the anti-pattern out of necessity, in order to provide an enhanced service. The vast majority don’t do it to be malicious or because they intend to abuse their customers — quite the contrary! However, by accepting and storing customer credentials, these third parties are putting themselves in a potentially untenable situation: servers get hacked, data leaks and sometimes companies — along with their assets — are sold off with untold consequences for the integrity — or safety — of the original customer data.
The folks at Twitter are very aware of the risks associated with their users giving out usernames and passwords. But they also have concerns about the fix:
The downside is that OAuth suffers from many of the frustrating user experience issues and phishing scenarios that OpenID does. The workflow of opening an application, being bounced to your browser, having to login to twitter.com, approving the application, and then bouncing back is going to be lost on many novice users, or used as a means to phish them. Hopefully in time users will be educated, particularly as OAuth becomes the standard way to do API authentication.
Another downside is that OAuth is a hassle for developers. BasicAuth couldn’t be simpler (heck, it’s got “basic” in the name). OAuth requires a new set of tools. Those tools are currently semi-mature, but again, with time I’m confident they’ll improve. In the meantime, OAuth will greatly increase the barrier to entry for the Twitter API, something I’m not thrilled about.
It also doesn’t change the fact that someone could sell OAuth tokens, although OAuth makes it easier to revoke credentials for a single application or site, rather than changing your password, which revokes credentials to all applications.
Digital identities exist to enable human experiences online, and if you store someone’s identity you have a relationship with them. So when you force third-party apps into collecting usernames, passwords (and any other snippet of someone’s identity) you force those users into having a relationship with that company — whether the individual or the company wants it or not.
With technology we tend not to enable trust in the way most people use the term. Trust is based on relationships. In close relationships we make frequent, accurate observations that lead to better understanding and closer relationships; this process, however, requires investment and commitment. That said, a useful, good relationship provides value for all parties. Jamie Lewis has suggested that there are three types of relationship (on the web):
Custodial Identities — identities are directly maintained by an organisation and a person has a direct relationship with the organisation;
Contextual Identities — third parties are allowed to use some parts of an identity for certain purposes;
Transactional Identities — credentials are passed for a limited time for a specific purpose to a third party.
Of course there are also some parts to identity which are shared and not wholly owned by any one party.
This mirrors how real-world identities work. Our banks, employers and governments maintain custodial identities; whereas a pub, validating your age before serving alcohol, need only have the yes/no question answered: are you over 18?
Twitter acts as a custodian for part of my online identity, and I don’t want third-party applications that use the Twitter API to also act as custodians; but the lack of OAuth support means that, whether I or they like it, they have to. They should only have my transactional identity. Forcing them to hold a custodial identity places both parties (me and the service using the Twitter API) at risk and places unnecessary costs on the third-party service (whether they realise it or not!).
But, if I’m honest, I don’t really want Twitter to act as Custodian for my Identity either — I would rather they held my Contextual Identity and my OpenID provider provided the Custodial Identity. That way I can pick a provider I trust to provide a secure identity service and then authorise Twitter to use part of my identity for a specific purpose, in this case micro-blogging. Services using the Twitter API then either use a transactional identity or reuse the contextual identity. I can then control my online identity, those organisations that have invested in appropriate security can provide Custodial Identity services and an ecosystem of services can be built on top of that.
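To make the difference concrete, here’s a rough Python sketch — not Twitter’s actual implementation, and all the credentials and URLs are made up — contrasting Basic Auth, where the third party must hold (and store) your password, with an OAuth 1.0a-style signed request, where it holds only a revocable token secret:

```python
import base64
import hashlib
import hmac
import urllib.parse


def basic_auth_header(username: str, password: str) -> str:
    # Basic Auth: the third party sends (and so must store) your actual
    # password with every request -- the password anti-pattern.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"


def oauth_signature(method: str, url: str, params: dict,
                    consumer_secret: str, token_secret: str) -> str:
    # OAuth 1.0a HMAC-SHA1 sketch: the third party signs each request with
    # a token secret you can revoke, and never sees your password.
    encoded = "&".join(
        f"{urllib.parse.quote(k, safe='')}={urllib.parse.quote(v, safe='')}"
        for k, v in sorted(params.items())
    )
    base_string = "&".join(
        urllib.parse.quote(part, safe="")
        for part in (method.upper(), url, encoded)
    )
    signing_key = f"{consumer_secret}&{token_secret}"
    digest = hmac.new(signing_key.encode(), base_string.encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


print(basic_auth_header("alice", "hunter2"))  # → "Basic YWxpY2U6aHVudGVyMg=="
print(oauth_signature("GET", "https://api.example.com/statuses",
                      {"oauth_token": "abc123"},
                      "consumersecret", "tokensecret"))
```

Revoking the Basic Auth credential means changing your password everywhere; revoking the OAuth token kills access for that one application only, which is exactly the custodial/transactional split above.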
Just wanted to correct a couple of mistakes, as pointed out by Chris, below:
1. Twitter was hacked with a dictionary attack against an admin’s account. Not from phishing, and not from a third-party’s database with Twitter credentials.
2. The phishing scam worked because it tricked people into thinking that they received a real email from Twitter.
Neither OpenID nor OAuth would have prevented this (although that’s not to say Twitter shouldn’t implement OAuth). Sorry about that.