Interesting stuff from around the web 2009-02-04

Hippos are more closely related to their whale cousins than they (hippos) are to anything else

Tree of Life – evolution interactive – Darwin 200 – Wellcome Trust
Want to know the concestor of two species? Then this is for you. And they have obviously spent time on the visual and interaction design, and it’s great that they have released it under a Creative Commons licence. But because they haven’t provided URLs for each of the taxa it’s lost to the web, which is such a shame.

Google Latitude – see where your friends are in realtime [Google]
A service for sharing your location (primarily via your mobile phone) with friends and family; as such it’s similar to BrightKite and FireEagle. If Google integrates this into its existing services – that is, if it becomes a service sitting behind Google search and maps – then it could be a bit of a killer, if only because that’s where people’s attention is. That said, FireEagle is a generative location-exchanging service.

How Twitter Was Born [140 Characters]
Interesting read about the birth and early days of Twitter.

Visualising our SVN commit history [whomwah]
Deeply cool.

Listen to Yourself [xkcd]
YouTube comments are a mess — this could be the answer, as might making the site about people and their videos rather than videos with some comments.

Permanent web IDs or making good web 2.0 citizens

These are the slides for a presentation I gave a little while ago in Broadcasting House at a gathering of radio types – both BBC and commercial radio – as part of James Cridland’s mission to “agree on technology, compete on content”.

The presentation is based on the thinking outlined in my previous post: web design 2.0 – it’s all about the resource and its URL.

Media companies should embrace the generative nature of the web

Generativity – the ability to remix different pieces of the web or deploy new code without gatekeepers, so that anyone can repurpose, remix or reuse the original content or service for a different purpose – is going to be at the heart of successful media companies.

Depth of field (Per Foreby)

As Jonathan Zittrain points out in The Future of the Internet (And How to Stop It), the web has succeeded largely because it is a generative platform.

The Internet is also a generative system to its very core, as is each and every layer built upon it. This means that anyone can build upon the work of those that went before them – this is why the Internet architecture, to this day, is still delivering decentralised innovation.

This is true at a technological level, for example, XMPP, OAuth and OpenID are all technologies that have been invented because the technology layers upon which they are built are open, adaptable and easy for others to reuse and master. It is also true at the content level – Wikipedia is only possible because it is built as a true web citizen, likewise blogging platforms and services such as MusicBrainz – these services allow anyone to create or modify content without the need for strict rules and controls.

But what has this got to do with the success or otherwise of any media company or any content publisher? After all just because the underlying technology stack is generative doesn’t mean that what you build must be generative. There are, after all, plenty of successful walled gardens and tethered appliances out there. The answer, in part, depends on what you believe the future of the Web will look like.

Tim Berners-Lee presents a pretty compelling view in his article on The Giant Global Graph. In it he explains how the evolution of the Internet has seen a move from a network of computers, through the Internet, to a web of documents, and we are now seeing a migration to a ‘web of concepts’.

[The Internet] made life simpler and more powerful. It made it simpler because, instead of having to navigate phone lines from one computer to the next, you could write programs as though the net were just one big cloud, where messages went in at your computer and came out at the destination one. The realization was, “It isn’t the cables, it is the computers which are interesting”. The Net was designed to allow the computers to be seen without having to see the cables. […]

The WWW increases the power we have as users again. The realization was “It isn’t the computers, but the documents which are interesting”. Now you could browse around a sea of documents without having to worry about which computer they were stored on. Simpler, more powerful. Obvious, really. […]

Now, people are making another mental move. There is realization now, “It’s not the documents, it is the things they are about which are important”. Obvious, really.

If you believe this – if you believe that there is a move from a web of documents to a web of concepts – then you can start to see why media companies will need to start publishing data the right way: publishing it so that they, and others, can help people find the things they are interested in. How does this happen? For starters we need a mechanism by which we can identify things and the relationships between them – at a level above that of the document. And that’s just what the semantic web technologies are for – they give different organisations a common way of describing the relationships between things. For example, the Programmes Ontology allows any media company to describe the nature of a programme; the Music Ontology any artist, release or label.
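
To make that concrete, here’s a minimal sketch of describing a programme as data using Python’s rdflib. The episode URL is a placeholder, and the ontology namespace and terms (po:Episode, dcterms:title) are assumptions worth checking against the published ontologies:

```python
# A minimal sketch of describing a programme with semantic web vocabularies,
# using rdflib. The episode URL is a placeholder and the Programmes Ontology
# namespace and terms are assumptions -- check them against the published spec.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

PO = Namespace("http://purl.org/ontology/po/")  # assumed Programmes Ontology URI

g = Graph()
episode = URIRef("http://example.org/programmes/some-episode")  # placeholder

g.add((episode, RDF.type, PO.Episode))                 # "this thing is an episode"
g.add((episode, DCTERMS.title, Literal("An example episode")))

# Serialise as Turtle so any other site can consume the same description.
print(g.serialize(format="turtle"))
```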

This implies a couple of different, but related, things. Firstly, it highlights the importance of links. Links are an expression of a person’s interests. I choose what to link to from this blog – which words, which subjects to link from and where to – and my choice of links gives you a view onto how I see the subject beyond what I write here. The links give you insight into who I trust and what I read. And of course they allow others to aggregate my content around those subjects.

It also implies that we need a common way of doing things – a way that allows others to build with, and on top of, the original publisher’s content. This isn’t about giving up your rights over your content; rather it is about letting it be connected to content from peer sites. It is about joining contextually relevant information from other sites and other applications. As Tim Berners-Lee points out, this is similar to the transition we had to make in going from interconnected computers to the Web.

People running Internet systems had to let their computer be used for forwarding other people’s packets, and connecting new applications they had no control over. People making web sites sometimes tried to legally prevent others from linking into the site, as they wanted complete control of the user experience, and they would not link out as they did not want people to escape. Until after a few months they realized how the web works. And the re-use kicked in. And the payoff started blowing people’s minds.

Because the Internet is a generative system, it has a different philosophy from most other data discovery systems and APIs (including some that are built with Internet technologies), as Ed Summers explains:

…which all differ in their implementation details and require you to digest their API documentation before you can do anything useful. Contrast this with the Web of Data which uses the ubiquitous technologies of URIs and HTTP plus the secret sauce of the RDF triple.

They also often require the owner of the service or API to give permission for third parties to use those services, usually mediated via API keys. This is bad: had the Web, or the Internet before it, adopted a similar approach rather than the generative approach it did take, we would not have seen the level of innovation we have, and as a result we would not have had the financial, social and political benefits we have derived from it.
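
To make the contrast concrete, here’s a rough sketch of what consuming the Web of Data looks like: no API key and no bespoke API documentation – just a URI, HTTP and content negotiation. The URL below is a placeholder, and the assumption is that the resource offers an RDF representation:

```python
# A sketch of the "web of data" approach: no API key, no bespoke API docs --
# just a URI, HTTP and content negotiation. The URL is a placeholder and the
# assumption is that the resource offers an RDF representation.
import urllib.request

request = urllib.request.Request(
    "http://example.org/programmes/some-episode",  # placeholder URI
    headers={"Accept": "application/rdf+xml"},     # ask for data, not a web page
)

with urllib.request.urlopen(request) as response:
    rdf = response.read().decode("utf-8")

print(rdf[:500])  # the same URI, now a machine-readable description
```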

Of course there are plenty of examples of people working with the web of documents – everything from 800lb gorillas like Google through to sites like After Our Time and Speechification – both of which provide users with a new and distinctive service while also helping to drive traffic and raise brand awareness for the BBC. Just think what would also be possible if transcripts, permanent audio and research notes were also made available not only as HTML but also as RDF, joining content inside and outside the BBC to create something that exhibits what Zittrain calls “a system’s capacity to produce unanticipated change through unfiltered contributions from broad and varied audiences.”

The mobile computing cloud needs OAuth

As Paul Miller notes Cloud Computing is everywhere – we are pushing more and more data and services into the cloud. Particularly when accessed from mobile devices this creates an incredibly powerful and useful user experience. I love it. The way that I can access all sorts of services from my iPhone means that an already wonderful appliance becomes way more powerful. But not all is well in the land of mobile-cloud computing; a nasty anti-pattern is developing. Thankfully there is a solution and it’s OAuth.

"Mobile phone Zombies" by Edward B. Used under licence.
"Mobile phone Zombies" by Edward B. Used under licence.

So what’s the problem then? Since Apple opened up the iPhone to third-party developers we have seen a heap of applications that connect you to your online services – there are apps that let you upload photos to Flickr, post to Twitter, see what’s going on in Facebook land – all sorts of stuff. The problem is the way some of them gain access to these services: by making you enter your credentials in the application itself rather than authorising the application from the service.

Probably the best way to explain what I mean is to look at how it should work. The Pownce app is an example of doing it right, as is Mobile Foto – these applications rely on OAuth. This is how it works: rather than entering your username and password in the application, you are sent over to Safari to log into the website, and from there you authorise (via OAuth) the application to do its thing.
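
For the curious, here’s a rough sketch of that three-legged OAuth 1.0a dance as a Python script using the requests-oauthlib library. Every endpoint URL, key and callback below is a placeholder – each service publishes its own:

```python
# A rough sketch of the three-legged OAuth 1.0a dance, using requests-oauthlib.
# Every URL, key and the callback below is a placeholder -- each service
# publishes its own endpoints and issues its own consumer credentials.
from requests_oauthlib import OAuth1Session

oauth = OAuth1Session(
    "CONSUMER_KEY",
    client_secret="CONSUMER_SECRET",
    callback_uri="myapp://oauth-callback",
)

# 1. The app asks the service for a temporary request token.
oauth.fetch_request_token("https://api.example.com/oauth/request_token")

# 2. The user is sent to the service's own site (Safari, in the iPhone case)
#    to log in and approve the app -- the app never sees the password.
print("Open in the browser:",
      oauth.authorization_url("https://api.example.com/oauth/authorize"))

# 3. The verifier handed back via the callback is swapped for an access token,
#    which is what the app stores and uses from then on. It can be revoked on
#    the service's site without changing your password.
tokens = oauth.fetch_access_token(
    "https://api.example.com/oauth/access_token",
    verifier="VERIFIER_FROM_CALLBACK",
)
print(tokens)
```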

This might not sound so great – you could argue that the user experience would be better if you were kept within the application. But that would mean that your login credentials would need to be stored on your ‘phone, and that means that you need to disclose those credentials to a third party (the folks that wrote the app).

By using OAuth you log into Flickr, Pownce etc. and from there authorise the application to use the site – your credentials are kept safe, and if your iPhone gets stolen you can visit the site and disable access. Everything is where it should be, and that means your login details are safe.

To be fair to the iPhone app developers, this type of delegated authorisation isn’t always possible. Twitter, for example, still hasn’t implemented OAuth, and as a result if you want to use one of the growing number of iPhone Twitter apps you need to give up your username and password. I find this incredibly frustrating – especially from a service like Twitter where (according to Biz Stone, Twitter’s co-founder) “the API… has easily 10 times more traffic than the website”.

RA DIOHEA_D / HOU SE OF_C ARDS

Radiohead are miles ahead of the pack when it comes to content innovation. First off, when they released their seventh album, In Rainbows, they let customers choose their own price, which, according to Thom Yorke, outstripped the combined profits from digital downloads of all of the band’s other studio albums. Then, with their single Nude, folk were given the opportunity to remix it. And now their new video for ‘House of Cards‘ has been made without any cameras, just lasers and data. And best of all they are giving you the chance to play with the data, under a licence that allows remixing.

More details are available at Google Code (http://code.google.com/creative/radiohead/), which explains that:

No cameras or lights were used. Instead two technologies were used to capture 3D images: Geometric Informatics and Velodyne LIDAR. Geometric Informatics scanning systems produce structured light to capture 3D images at close proximity, while a Velodyne Lidar system that uses multiple lasers is used to capture large environments such as landscapes. In this video, 64 lasers rotating and shooting in a 360 degree radius 900 times per minute produced all the exterior scenes.

The site also includes a short documentary showing how the video was made and the 3D plotting technologies behind it, a 3D viewer to explore the data visualisation and, best of all, the data itself with instructions on how to create your own visualisations. All very cool.
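
If you do grab the data, a first look needs only a few lines of code. This sketch assumes each frame is a CSV file of x, y, z (and intensity) rows – worth checking against the notes that ship with the archive:

```python
# A quick sketch for exploring one frame of the point-cloud data. It assumes
# each frame is a CSV of x, y, z, intensity rows -- check the archive's notes
# for the real layout before relying on this.
import csv

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 -- registers the 3D projection

xs, ys, zs = [], [], []
with open("1.csv") as frame:                 # one frame of the video's data
    for row in csv.reader(frame):
        x, y, z = (float(value) for value in row[:3])
        xs.append(x)
        ys.append(y)
        zs.append(z)

figure = plt.figure()
axes = figure.add_subplot(111, projection="3d")
axes.scatter(xs, ys, zs, s=1)                # each point is one laser return
plt.show()
```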

The URL shortening anti pattern

Along with others I’ve recently started to grok Twitter – it took a while – but I now find it a fantastic way to keep in touch with folk that I know or respect, or catch up on snippets of info from news services around the web. It’s great.

What makes Twitter particularly useful, as a way of keeping in touch with a large number of people, is the limit of 140 characters per ‘tweet’. That’s it: each tweet is 140 characters or less. But it also means that if you tweet about a URL, that URL eats up a lot of those 140 characters. To help solve this problem Twitter uses TinyURL to shorten the URL. This is a solution to the problem, but unfortunately it also creates a new one.

Example of poor URL design

URLs are important. They are at the very heart of the idea behind Linked Data, the semantic web and Web 2.0, because if you can’t point to a resource on the web then it might as well not exist – and that means URLs need to be persistent. URLs are also important because they tell you about the provenance of the resource, and that helps you decide how important or trustworthy a resource is likely to be.

URL shortening services such as TinyURL or RURL are very bad news because they break the web. They don’t provide stable references: they act as another level of indirection and so become a single point of failure. URL shortening services, then, are an anti-pattern:

In computer science, anti-patterns are specific repeated practices that appear initially to be beneficial, but ultimately result in bad consequences that outweigh the hoped-for advantages.

URL shortening services create opaque URLs – the ultimate destination of the URL is hidden from the user and from software. This might not sound like such a big deal, but it does mean that it’s easier to send people to spam or malware sites (which is why Qurl and jtty closed – breaking all their shortened URLs in the process). And that highlights the real problem – they introduce a dependency on a third party that might go belly up. If that third party closes down, all the URLs using that service break, and because they are opaque you’ve no idea where they originally pointed.
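
One partial mitigation – a sketch, not a fix for the underlying dependency – is to expand short URLs before you store or republish them, following the redirect chain and keeping the final, naked URL:

```python
# A sketch of "un-shortening": follow the redirect chain and keep the final,
# naked URL rather than the opaque short one. It mitigates the opacity, not
# the dependency on the shortening service staying alive.
import urllib.request


def expand(short_url: str) -> str:
    """Return the final URL that a shortened link redirects to."""
    with urllib.request.urlopen(short_url) as response:
        return response.geturl()  # urllib follows redirects by default


print(expand("http://tinyurl.com/example"))  # placeholder short URL
```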

And even if the service doesn’t shut down there would be nothing you could do if that service decided to censor content. For example the Chinese Communist Party might demand that TinyURL remap all the URLs it decided were inappropriate to state propaganda pages. You couldn’t stop them.

But of course we don’t need to invoke such Machiavellian scenarios to still have a problem. URL shortening services have a finite number of available URLs. Some shortening services, like RURL, use 3 characters (e.g. http://rurl.org/lbt), which means these more aggressive URL shortening services have about 250,000 possible unique three-character short URLs. Once they’ve all been used they either need to add more characters to their URLs or start to recycle old ones. And once you’ve started to recycle old URLs your karma really will suffer (TinyURL uses 6 characters so this problem will take a lot longer to materialise!)
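
The back-of-the-envelope arithmetic, assuming the usual case-sensitive alphanumeric alphabet of 62 characters per position, looks like this:

```python
# Back-of-the-envelope URL-space arithmetic, assuming the usual case-sensitive
# alphanumeric alphabet: 26 lowercase + 26 uppercase + 10 digits = 62 characters.
ALPHABET = 62

print(ALPHABET ** 3)  # 238,328 -- the "about 250,000" three-character URLs
print(ALPHABET ** 6)  # 56,800,235,584 -- why six characters last much longer
```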

There is an argument that on services such as Twitter the permanence of the URL isn’t such an issue – after all, the whole point of Twitter is to provide a transitory, short-lived announcement; Twitter isn’t intended to provide an archive. And the fact that the provenance of the URL is obfuscated maybe doesn’t matter too much either, since you know who posted the link. All that’s true, but it still causes a problem when TinyURL goes down, as it did last November, and it also reinforces the anti-pattern, and that is bad.

Bottom line: URLs should remain naked – providing this level of indirection is just wrong. The Internet isn’t supposed to work via such intermediate services; the Internet was designed to ensure there wasn’t a single point of failure that could so easily break such large parts of the web.

Of course simply saying don’t use these URL shortening services isn’t going to work, especially when using services such as Twitter, where there is a need for short URLs. However, what it does mean is that if you’re designing a website you need to think about URL design, and that includes the length of the URL. And if you’re linking to something on a forum, wiki, blog or anything that has permanence, please don’t shorten the URL – keep it naked. Thanks.

Photo: Example of poor URL design, by Frank Farm. Used under licence.

New prettier /programmes

Over the last few weeks the /programmes team, but mainly Jamie, have been working to bring the site in line with the BBC’s new online visual language (as rolled out on the BBC’s home page). Today we’ve launched the upgrade.


In addition to the new shinier, wider UI we’ve also linked up the schedules, which should be helpful. And we’ve fixed and tidied up a bunch of things under the hood, which has improved the application’s performance.

Episode Page

Hope you like it!