Rich Snippets

As everyone knows, last night Google announced that they are now supporting RDFa and microformats to add ‘Rich Snippets’ to their search results pages.

Rich Snippets give users convenient summary information about their search results at a glance. We are currently supporting data about reviews and people. When searching for a product or service, users can easily see reviews and ratings, and when searching for a person, they’ll get help distinguishing between people with the same name…

To display Rich Snippets, Google looks for markup formats (microformats and RDFa) that you can easily add to your own web pages.

That’s good, right? Google gets a higher click-through rate because, as their user testing shows, the more useful and relevant information people see on a results page, the more likely they are to click through; sites that support these technologies make their content more discoverable; and everyone else gets to what they need more easily. Brilliant. And to make life even better, because Google have adopted RDFa and microformats…

…you not only make your structured data available for Google’s search results, but also for any service or tool that supports the same standard. As structured data becomes more widespread on the web, we expect to find many new applications for it, and we’re excited about the possibilities.

Those Google guys, they really don’t do evil. Well, actually, no, not so much. Google are being a little bit evil here.

Doctor Evil

Here’s the problem. When Google implemented RDFa support they adopted the syntax but decided not to adopt the vocabularies – they went and reinvented their own. And as Ian points out, it’s the vocabularies that matter. What Google have decided to do is only support those properties and classes defined at data-vocabulary.org, rather than supporting existing ontologies such as FOAF, vCard and vocab.org/review.

Now in some ways this doesn’t matter too much; after all, it’s easy enough to do this sort of thing:

rel="foaf:name google:name"
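
For example, a person could be marked up in Google’s vocabulary and FOAF at the same time. Here’s a rough sketch – the v: prefix stands in for data-vocabulary.org, and the exact property names are illustrative rather than gospel:

    <div xmlns:v="http://rdf.data-vocabulary.org/#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         typeof="v:Person foaf:Person">
      <!-- one attribute, two vocabularies: Google's and FOAF -->
      <span property="v:name foaf:name">Tom Scott</span>
      <a rel="v:url foaf:homepage" href="http://tomscott.name/">homepage</a>
    </div>

The extra markup costs next to nothing, but the data-vocabulary.org terms are the only ones Google says it looks for.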

And Google do need to make Rich Snippets work on their search results: they need to control which vocabularies they support so that webmasters know what to do and so they can render the data appropriately. But by starting off with a somewhat broken vocabulary they are providing a pretty big incentive for webmasters to implement a broken version of RDFa. And they will implement the broken version, because Google Juice is so important to the success of their sites.

Google have taken an open standard and inserted a slug of proprietary NIH into it, and that’s a shame; they could have done so much better. Indeed, they could have supported RDFa as well as they support microformats.

Perhaps we shouldn’t be surprised: Google are a commercial operation – by adopting RDFa they get a healthy dose of “Google and the Semantic Web” press coverage while at the same time making their search results that bit better. And let’s be honest, the semweb community hasn’t done a great job of getting those vocabularies out into the mainstream, so Google’s decision won’t hurt its bottom line. Just don’t be fooled: this isn’t Google supporting RDFa, it’s Google adding Rich Snippets.

Identity, relationships and why OAuth and OpenID matter

Twitter hasn’t had a good start to 2009: it was hacked via a phishing scam, and then there were concerns that your passwords were up for sale – not a good thing. There may be a silver lining to Twitter’s cloud, though, because the episode has also reopened the password anti-pattern debate and the case for OAuth as a solution to the problem. Indeed, it now looks like Twitter will be implementing OAuth as a result. W00t!

Day 68 :: touch by Meredith Farmer (Flickr). Some rights reserved.

However, while it is great news that Twitter will be implementing OAuth soon, they haven’t yet, and there are plenty of other services that don’t use it. It’s therefore worth pausing for a moment to consider how we got here and what the issues are, because while it will be great, right now it’s a bit rubbish.

We shouldn’t assume that either Twitter or the developers responsible for the third-party apps (those requesting your credentials) are trying to do anything malicious — far from it — as Chris Messina explains:

The difference between run-of-the-mill phishing and password anti-pattern cases is intent. Most third parties implement the anti-pattern out of necessity, in order to provide an enhanced service. The vast majority don’t do it to be malicious or because they intend to abuse their customers — quite the contrary! However, by accepting and storing customer credentials, these third parties are putting themselves in a potentially untenable situation: servers get hacked, data leaks and sometimes companies — along with their assets — are sold off with untold consequences for the integrity — or safety — of the original customer data.

The folks at Twitter are very aware of the risks associated with their users giving out usernames and passwords. But they also have concerns about the fix:

The downside is that OAuth suffers from many of the frustrating user experience issues and phishing scenarios that OpenID does. The workflow of opening an application, being bounced to your browser, having to login to twitter.com, approving the application, and then bouncing back is going to be lost on many novice users, or used as a means to phish them. Hopefully in time users will be educated, particularly as OAuth becomes the standard way to do API authentication.

Another downside is that OAuth is a hassle for developers. BasicAuth couldn’t be simpler (heck, it’s got “basic” in the name). OAuth requires a new set of tools. Those tools are currently semi-mature, but again, with time I’m confident they’ll improve. In the meantime, OAuth will greatly increase the barrier to entry for the Twitter API, something I’m not thrilled about.

Alex also points out that OAuth isn’t a magic bullet.

It also doesn’t change the fact that someone could sell OAuth tokens, although OAuth makes it easier to revoke credentials for a single application or site, rather than changing your password, which revokes credentials to all applications.

This doesn’t even begin to address the phishing threat that OAuth encourages – its own “anti-pattern”. Anyone confused about this would do well to read Lachlan Hardy’s blog post about this from earlier in 2008: http://log.lachstock.com.au/past/2008/4/1/phishing-fools/.

All these are valid points — and Ben Ward has written an excellent post discussing the UX issues and options associated with OAuth — but they all miss something very important: you can’t store someone’s identity without having a relationship.

Digital identities exist to enable human experiences online, and if you store someone’s identity you have a relationship with them. So when you force third-party apps into collecting usernames, passwords and any other snippets of someone’s identity, you force those users into having a relationship with that company — whether or not the individual or the company wants it.

With technology we tend not to enable trust in the way most people use the term. Trust is based on relationships. In close relationships we make frequent, accurate observations that lead to better understanding and closer relationships; this process, however, requires investment and commitment. That said, a useful, good relationship provides value for all parties. Jamie Lewis has suggested that there are three types of relationship on the web:

  1. Custodial Identities — identities are directly maintained by an organisation and a person has a direct relationship with the organisation;
  2. Contextual Identities — third parties are allowed to use some parts of an identity for certain purposes;
  3. Transactional Identities — credentials are passed for a limited time for a specific purpose to a third party.

Of course there are also some parts to identity which are shared and not wholly owned by any one party.

This mirrors how real-world identities work. Our banks, employers and governments maintain custodial identities; whereas a pub, validating your age before serving alcohol, need only have a yes/no question answered: are you over 18?

Twitter acts as a custodian for part of my online identity, and I don’t want third-party applications that use the Twitter API to also act as custodians; but the lack of OAuth support means that, whether I or they like it, they have to. They should only have my transactional identity. Forcing them to hold a custodial identity places both parties (me and the service using the Twitter API) at risk and places unnecessary costs on the third-party service (whether they realise it or not!).

But, if I’m honest, I don’t really want Twitter to act as custodian for my identity either — I would rather they held my contextual identity and my OpenID provider provided the custodial identity. That way I can pick a provider I trust to provide a secure identity service and then authorise Twitter to use part of my identity for a specific purpose, in this case micro-blogging. Services using the Twitter API would then either use a transactional identity or reuse the contextual identity. I could then control my online identity, those organisations that have invested in appropriate security could provide custodial identity services, and an ecosystem of services could be built on top of that.

UPDATE

Just wanted to correct a couple of mistakes, as pointed out by Chris, below:

1. Twitter was hacked with a dictionary attack against an admin’s account. Not from phishing, and not from a third-party’s database with Twitter credentials.
2. The phishing scam worked because it tricked people into thinking that they received a real email from Twitter.

Neither OpenID nor OAuth would have prevented this (although that’s not to say Twitter shouldn’t implement OAuth). Sorry about that.

Cloud computing going full circle

Richard Stallman, GNU’s founder, recently warned that Cloud Computing is a trap.

One reason you should not use web applications to do your computing is that you lose control, it’s just as bad as using a proprietary program. Do your own computing on your own computer with your copy of a freedom-respecting program. If you use a proprietary program or somebody else’s web server, you’re defenceless. You’re putty in the hands of whoever developed that software.

'IBM's $10 Billion Machine' by jurvetson. Used under licence.

Before we go any further I should probably try to explain what I mean by Cloud Computing, especially since Larry Ellison has described it as “complete gibberish”:

Maybe I’m an idiot, but I have no idea what anyone is talking about. What is it? It’s complete gibberish. It’s insane. When is this idiocy going to stop?

For starters, it’s important to understand that Cloud Computing isn’t about doing anything new; instead, it’s about applications that run on the web rather than on your desktop. There are four components that make up Cloud Computing. Moving down the stack from consumer-facing products, we have:

  1. Applications – stuff like GMail, Flickr and del.icio.us (yes, I know they’ve changed the name).
  2. Application environments – frameworks where you can deploy your own code, like Google’s App Engine and Microsoft’s Live Mesh.
  3. Infrastructure, including storage – lower-level services that let you run your own applications on virtualised servers, stuff like Amazon’s EC2 and S3.
  4. Clients – hardware devices that have been specifically designed to deliver cloud services, for example the iPhone and Google’s Android phones.

The reason Richard Stallman dislikes Cloud Computing is the same reason Steven Pemberton suggested, at this year’s XTech, that we should all have our own website.

There are inherent dangers for users of Web 2.0. For a start, by putting a lot of work into a Web site, you commit yourself to it, and lock yourself into their data formats. This is similar to data lock-in when you use a proprietary program. You commit yourself and lock yourself in. Moving comes at great cost.

…[Metcalfe’s law] postulates that the value of a network is proportional to the square of the number of nodes in the network. Simple maths shows that if you split a network into two, its value is halved. This is why it is good that there is a single email network, and bad that there are many instant messenger networks. It is why it is good that there is only one World Wide Web.

Web 2.0 partitions the Web into a number of topical sub-Webs, and locks you in, thereby reducing the value of the network as a whole.

So does this mean that user contributed content is a Bad Thing? Not at all, it is the method of delivery and storage that is wrong. The future lies in better aggregators.

But we’ve been here before, haven’t we? It certainly sounds similar to the pre-Web era. Initially with IBM, and then with closed networks like CompuServe and America Online, we had companies that retained complete control of the environment. Third-party developers had limited or no access to the platform, and users of the system stored all their data on someone else’s hardware. For sure, this model provided advantages: if something went wrong there was only one person you needed to contact to get it sorted, and someone else (who knew more about this stuff than you) could worry about keeping the system running, backing up your data and so on.

But there was a price to this convenience. You were effectively tied to one provider (or at the very least it was expensive to move to a different one), and there was very little innovation or development of new applications – you had email, forums and content; what more could you want? And of course there was censorship – if one of these networks didn’t like what was being said, it could pull it.

At the other end of the spectrum there were highly specialised appliances like the Friden Flexowriter. They were designed to do one job and one job only; they couldn’t be upgraded, but they were reliable and easy to learn. A bit like the iPhone.

Then along came the generalised PC – a computer that provided a platform anyone could own, anyone could write an application for and anyone could use to manage their data. And relatively soon after the advent of pre-assembled computers along came the Web: the ultimate generalised platform, one that provided an environment for anyone to build their own ideas on and exploit data in ways never before realised. But there was a problem: security and stability suffered.

PCs are a classic Disruptive Technology – in the early days they were pretty rubbish, but they let hobbyists tinker and play with the technology. Over time PCs got better (at a faster rate than people’s expectations grew), and soon you were able to do as much with a PC as you could with a mainframe, but with the added advantages of freedom and a much richer application ecosystem.

Another implication of Clayton Christensen’s Disruptive Technology theory is that as a technology evolves it moves through cycles. Initially a technology is unable to meet most people’s expectations, and as a result the engineers need to push the limits of what’s possible; the value is in the platform. But as the technology gets better and better, the engineers no longer need to push the limits of what’s possible, and the value switches from the platform to the components and to speed to market.

That is where we are now – the value is no longer with the platform, it’s with the components that run on the platform. And it’s no longer about functionality; it’s more about performance and reliability. And because the value is with the applications, it makes sense for application developers to use infrastructure or application environments supplied by others. And it makes sense for customers to use cloud computing applications, because they are reliable and they let you focus on what interests you. A bit like the companies that used IBM mainframes. But if we make that deal, I suspect we will end up in the same situation previous generations found themselves in – we won’t like the deal we’ve made, and we will move back to generalised, interoperable systems that let us retain control.

I don’t pay attention to that anymore…

I used to watch Lost – I don’t bother anymore. In fact there are loads of things that I used to pay attention to that I don’t anymore. My tastes change: what I once thought of as good I don’t anymore, and what was once good has just gone downhill.

APML, or Attention Profile Markup Language, is an open, non-proprietary file format that uses XML to encode a user’s interests in a single file.

… consolidated, structured descriptions of people’s interests and dislikes. The information about your interests and how much each means to you (ranking) is stored in a way so that computers and web-based services can easily read it, interpret it, process it and pass it on should you request and permit them to do so.

more…

What APML gives you, then, is a file expressing the relative amount of attention you have given various URLs and when you last looked at those URLs. The idea is that you can move this file from one location to the next; you can also (because it’s XML) edit the file if you don’t want your profile to include the fact that you lingered on something embarrassing.
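
To make that concrete, here’s roughly what such a file looks like – a sketch based on my reading of the APML 0.6 draft, so treat the exact element and attribute names as illustrative rather than definitive:

    <APML xmlns="http://www.apml.org/apml-0.6" version="0.6">
      <Head>
        <Title>My attention profile</Title>
        <DateCreated>2008-05-18T10:00:00Z</DateCreated>
      </Head>
      <Body defaultprofile="Web">
        <Profile name="Web">
          <ImplicitData>
            <Concepts>
              <!-- relative attention given to a concept, and when it was last observed -->
              <Concept key="lost" value="0.12" from="example-aggregator" updated="2008-05-18T09:55:00Z"/>
            </Concepts>
            <Sources>
              <!-- the same idea applied to a URL -->
              <Source key="http://www.bbc.co.uk/programmes" name="BBC Programmes" value="0.87" type="text/html" from="example-aggregator" updated="2008-05-18T09:55:00Z"/>
            </Sources>
          </ImplicitData>
        </Profile>
      </Body>
    </APML>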

But what I pay attention to changes over time, and therefore having a single file that describes what I pay attention to seems a bit wrong-headed.

My problem with APML is that it’s based on a view of file transfer and data sharing – one where you copy and move a file from one system to the next. I just don’t believe that is how the Web works. As Chris Messina puts it (in relation to dataportability.org):

In my mind, when the arena of application is the open, always-on, hyper-connected web, constructing best practices using an offline model of data is fraught with fundamental problems and distractions and is ultimately destined to fail, since the phrase is immediately obsolete, unable to capture in its essence contemporary developments in the cloud concept of computing (which consists of follow-your-nose URIs and URLs rather than discreet harddrives), and in the move towards push-based subscription models that are real-time and addressable.

Attention data is highly time and context sensitive – being able to download and share a file with another system seems all wrong. Instead I think that being able to stream data between (authorised) services is the way to go.

If you enabled data to be streamed then you could make your attention data available at meaningful URLs. For example, my attention for 2007 might be at something like tomscott.name/apml/2007, and for today at tomscott.name/apml/2008/05/18.

This approach would allow you to expose your attention data (using the APML schema if you wish) at meaningful URLs and in useful time slices. You could then combine it with other forms of linked data – like programmes – to give additional context and additional information to your attention data.

I’m all for making attention data accessible (via an appropriate, secure API), but making it available as a file to be downloaded and imported into another app leaves me a little cold.

Photo: What are you looking at?, by Banksy and 'No life before coffee'. Used under license.

My thoughts on XTech

I’ve just posted a piece on my thoughts about the first couple of days at last week’s XTech over at the BBC’s Internet blog.

Notebook and pen

As David Recordon of Six Apart noted in Wednesday morning’s plenary, open software and hardware have become hip and have given small groups of developers the chance to build interesting web apps – and, more importantly, the chance to get them adopted. This is a new wave of web companies which expose their data via APIs and consume others’ APIs. And what is interesting about these companies is that they are converging on common standards – in particular, OAuth and OpenID.

more…

There was a lot on data portability and Semantic Web stuff (including our presentation on the Programmes Ontology), both of which I’m really pleased to report are getting some practical adoption. And, as at the Social Graph Foo Camp, XMPP appears to be an important emergent technology. I just hope it can scale.

Photo: 19th February 2005, by Paul Watson. Used under licence.

URLs aren’t just for web pages

We’re all used to using URLs to point at web pages, but we too often forget that they can be used for other things too. They can address any resource, and that includes people, documents, images, services (e.g. “today’s weather report for London”), TV or radio programmes – in fact, any abstract concept or entity that can be identified, named and addressed.

Also, because these resources can have representations which can be processed by machines (through the use of RDF, Microformats, RDFa, etc.), you can do interesting things with that information. Some of the most interesting things you can do happen when URLs identify people.

Currently people are normally identified within web apps by their email address. I guess this sort of makes sense because email addresses are unique, just about everyone has one and it means the website can contact you. But URLs are better. URLs are better because they offer the right affordance.

If you have someone’s URL then you can go to that URL and find out stuff about that person – you can assess their provenance (by reading what they’ve said about themselves, and by seeing who’s in their social network via tools such as XFN, FOAF and Google’s Social Graph API), and you can discover how to contact them (or ask permission to do so).
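
To make that a bit more concrete, here’s a sketch of the sort of machine-readable markup you might find at someone’s URL – RDFa using FOAF, plus an XFN rel value on an ordinary link (the friend’s address is obviously made up):

    <div xmlns:foaf="http://xmlns.com/foaf/0.1/"
         about="http://tomscott.name/#me" typeof="foaf:Person">
      <span property="foaf:name">Tom Scott</span>
      <!-- the URL identifies the person; the page describes them -->
      <a rel="foaf:homepage" href="http://tomscott.name/">homepage</a>
      <!-- XFN: the relationship is declared on the link itself -->
      <a rel="friend met" href="http://example.org/people/alice">a friend</a>
    </div>

A crawler – Google’s Social Graph API did exactly this with XFN and FOAF – can follow those links and reconstruct who knows whom without ever seeing an email address.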

With e-mail the affordance is all the wrong way round – if I have your email address I can send you stuff, but I can’t check to see who you are, or even whether it is really you. Email addresses are for contacting people; they aren’t identifiers. By conflating the two we’ve got ourselves into trouble, because email addresses aren’t very good at identifying people, nor can they be shared publicly without exposing folk to spam and the like.

This is, in essence, the key advantage offered by OpenID, which uses URLs to provide digital identifiers for people. If we then add OAuth into the mix we can do all sorts of clever things.

The OAuth protocol can be used to authorise any request for information (for example, sending the person a message); the owner of the URL/OpenID decides whether or not to grant you that privilege. This means it doesn’t matter if someone gets hold of a URL identifier – unless the owner grants permission (on a per-instance basis) it is useless. This is in contrast to what happens with email identifiers – once I have one I can use it to contact you whether you like it or not.

Also, because I can give any service a list of my friends’ URLs without worrying that their contact details will get stolen, I can tip up at any web service and find which of my friends are using it without having to share their contact details. In other words, by using URLs to identify people I can share my online relationships without sharing or porting my or my friends’ contact data.

You retain control over your data, but we share the relationships (the edges) within our social graph. And that’s the way it should be – after all, that’s all it needs to be. If I have your URL I can find whatever information (email, home phone number, current location, bank details) you decide you want to make public, and I can ask you nicely for more if I need it – using OAuth you can give me permission and revoke it if you want.

Photo: Point!, by a2gemma. Used under licence.