Lego, Wombles and Linked Data

As a child I loved Lego. I could let my imagination run riot, design and build cars, space stations, castles and airplanes.

Blue Lego brick

My brother didn’t like Lego, instead preferring to play with Action Men and toy cars. These sorts of toys did nothing for me, and from the perspective of an adult I can understand why. I couldn’t modify them, I couldn’t create anything new. Perhaps I didn’t have a good enough imagination because I needed to make my ideas real. I wanted to build things, I still do.

Then the most exciting thing happened. My dad bought a BBC micro.

Computers such as the BBC Micro were in many, many ways different from today’s Macs and, if you must, PCs. They were several orders of magnitude less powerful than today’s computers but, importantly, they were designed to be programmed by the user; you were encouraged to do so. It was expected that that’s what you would do. So from a certain perspective they were more powerful.

BBC Micros didn’t come preloaded with word processors, spreadsheets and graphics editors, and they certainly weren’t WIMPs.

What they did come with was BBC BASIC and Assembly Language.

They also came with two thick manuals: one telling you how to set the computer up, the other how to program it.

This was all very exciting, I suddenly had something with which I could build incredibly complex things. I could, in theory at least, build something that was more complex than the planes, spaceships and cars which I modelled with Lego a few years before.

Like so many children of my age I cut my computing teeth on the BBC Micro. I learnt to program computers, and played a lot of games!

Unfortunately all was not well. You see, I wasn’t very good at programming my BBC Micro. I could never actually build the things I had pictured in my mind’s eye; I just wasn’t talented enough.

You see Lego hit a sweet spot which those early computers on the one hand and Action Man on the other missed.

What Lego provided was reusable bits.

When Christmas or my birthdays came around I would start off by building everything suggested by the sets I was given. But I would then dismantle the models and reuse those bricks to build something new, whatever was in my head. By reusing bricks from lots of different sets I could build different models. The more sets I got given, the more things I could build.

Action Men simply didn’t offer any of those opportunities; I couldn’t create anything new.

Early computers were certainly capable of providing a creative platform, but they lacked the reusable bricks; it was more like being given an infinite supply of clay. And clay is harder to reuse than bricks.

Today, with the online world we are in a similar place but with digital bits and bytes rather than moulded plastic bits and bricks.

The Web allows people to create their own stories – it allows people to follow their nose to create threads through the information about the things that interest them, commenting on and discussing it along the way. But the Web also allows developers to reuse previously published information within new, different contexts to tell new stories.

But only if we build it right.

Most Lego bricks are designed to allow you to stick one brick to another. But not all bricks can be stuck to all others. Some can only be put at the top – these are the tiles and pointy bricks to build your spires, turrets and roofs. These bricks are important, but they can only be used at the end because you can’t build on top of them.

The same is true of the Web – we need to start by building the reusable bits, then the walls and only then the towers and spires and twiddly bits.

But this can be difficult – the shiny towers are seductive and the draw to start with them can be strong; only to find that you then need to knock them down and start again when you want to reuse the bits inside.

We often don’t give ourselves the best opportunity to womble with what we’ve got – to reuse what others make, to reuse what we make ourselves. Or to let others outside our organisations build with our stuff. If you want to take these opportunities then publish your data the webby way.

What does the history of the web tell us about its future?

Following my invitation to speak at the WWW@20 celebrations [my bit starts about 133 minutes into the video] – this is my attempt to squash the most interesting bits into a somewhat coherent 15 minute presentation.

20 years ago Tim Berners-Lee was working, as a computer scientist, at CERN. What he noticed was that, much like the rest of the world, sharing information between research groups was incredibly difficult. Everyone had their own document management solution, running on their own flavour of hardware over different protocols. His solution to the problem was a lightweight method of linking up existing (and new) stuff over IP – a hypertext solution – which he dubbed the World Wide Web and documented in a memo, “Information Management: A Proposal”.

Then for a year or so nothing happened. Nothing happened for a number of reasons, including the fact that IP, and the ARPANET before that, was popular in America but less so in Europe. Indeed senior managers at CERN had recently sent out a memo to all department heads reminding them that IP wasn’t a supported protocol – people were being told not to use it!

Also, because CERN was full of engineers, everyone thought they could build their own solution and do better than what was already there – no one wanted to play together. And of course CERN was there to do particle physics, not information management.

Then TimBL got his hands on a NeXT Cube – officially he was evaluating the machine not building a web server – but, with the support of his manager, that’s what he did — build the first web server and client. There then ensued a period of negotiation to get the idea out freely, for everyone to use, which happened in 1993. This coincided, more or less, with the University of Minnesota’s decision to charge a license fee for Gopher. Then the web took off especially in the US where IP was already popular.

The world’s first web server.

The beauty of TimBL’s proposal was its simplicity – it was designed to work on any platform and, importantly, with the existing technology. The team knew that to make it work it had to be as easy as possible. He only wanted people to do one thing: give their resources identifiers – links, URIs – so information could be linked and discovered.

This, then, is the key invention – the URL.

To make this work, URLs were designed to work with existing protocols – in particular they needed to work with FTP and Gopher. That’s why there’s a colon in the URL — so that URLs can be given for stuff that’s already available via other protocols. As an aside, TimBL has said his biggest mistake was the inclusion of // in the URL — the idea was that one slash meant the resource was on the local machine and two meant it was somewhere else on the web, but because everyone used http://foo.bar the second / is redundant. I love that this is TimBL’s biggest mistake.
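You can see that structure by pulling a few URLs apart. Here’s a minimal sketch using Python’s standard library (the FTP and Gopher addresses are purely illustrative): the scheme sits before the colon, the host follows the //, and the path points at the resource.

```python
# A minimal sketch: splitting URLs into scheme, host and path.
# The ftp:// and gopher:// example addresses are made up for illustration.
from urllib.parse import urlparse

for url in ("http://info.cern.ch/hypertext/WWW/TheProject.html",
            "ftp://ftp.example.org/pub/readme.txt",
            "gopher://gopher.example.org/about"):
    parts = urlparse(url)
    print(parts.scheme, "|", parts.netloc, "|", parts.path)
```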

He also implemented a quick tactical solution to get things up and running and demonstrate what he was talking about — HTML. HTML was originally just one of a number of supported doctypes – it wasn’t intended to be the doctype, but HTML took off because it was easy. Apparently the plan was to implement a mark-up language that worked a bit like the NeXT application builder, but they didn’t get round to it before Mosaic came along (TimBL’s first client was a browser-editor) and then it was all too late. And we’ve been left with something so ugly I doubt even its parents love it.

The curious thing, however, is that if you read the original memo — despite its simplicity — it’s clear that we’re still implementing it, we’re still working on the original spec. It’s just that we’ve tended to forget what it said, or decided to get sidetracked for a while with some other stuff. So forget about Web 2.0.

For example, the original Web was read-write. Not only that but it used style sheets and a WYSIWYG editing interface — no tags, no mark-up. They didn’t think anyone would want to edit the raw mark-up.

The first web site was read and write

You can also see that the URL is hidden; you get to it via a property dialog.

This is because the whole point of the web is that it provides a level of abstraction, allowing you to forget about the infrastructure, the servers and the routing. You only needed to worry about the document. For those who remember the film War Games — you will remember that they had to ‘phone up individual computers — they needed this networking information to access the computer, they needed to know its location before they could use it. The beauty of the Web and the URL is that the location shouldn’t matter to the end user.

URIs are there to provide persistent identifiers across the web — they’re not a function of ownership, branding, look and feel, platform or anything else for that matter.

The original team described CERN’s IT ecosystem as a zoo because there were so many different flavours of hardware, different operating systems and protocols in use. The purpose of the web was to be ubiquitous, to work on any machine, open to everyone. It was designed to work no matter what machine or operating system you’re running. This is, of course, achieved by having one identifier, one HTTP URI, and dereferencing it to the appropriate document based on the capabilities of that machine.

We should be adopting the same approach today when it comes to delivery to mobile, IPTV, connected devices etc. — we should have one URI for a resource and allow the client to request the document it needs, as Tim intended. The technology is there to do this — we just don’t use it very often.
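From the client’s side this is just HTTP content negotiation: one URI, plus an Accept header saying which representation the device can handle. A minimal sketch, assuming a server that honours Accept (the URI below is a placeholder, not a real service):

```python
# A minimal sketch of content negotiation: one URI, many representations.
# The URI is a placeholder; any server that honours the Accept header will do.
import requests

uri = "http://example.org/weather/london"   # one identifier for the resource

for mime in ("text/html", "application/rdf+xml", "application/json"):
    response = requests.get(uri, headers={"Accept": mime})
    print(mime, "->", response.headers.get("Content-Type"))
```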

The original memo also talked about linking people, documents, things and concepts, and data. But we are only now getting around to building it. Through technologies such as OpenID and FOAF we can give people identifiers on the web and describe their social graph, the relationships between those people. And through RDF we can publish information so that machines can process it, describing the nature of and the relationship between the different nodes of data.
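As a rough illustration of what that looks like in practice, here is a minimal sketch using the Python rdflib library: two made-up people, each given a URI, described with FOAF and linked by foaf:knows.

```python
# A minimal sketch: describing people and their relationships with FOAF.
# The URIs and names are made up for illustration.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, RDF

g = Graph()
g.bind("foaf", FOAF)

alice = URIRef("http://example.org/people/alice#me")
bob = URIRef("http://example.org/people/bob#me")

g.add((alice, RDF.type, FOAF.Person))
g.add((alice, FOAF.name, Literal("Alice")))
g.add((alice, FOAF.knows, bob))          # one edge of the social graph
g.add((bob, RDF.type, FOAF.Person))
g.add((bob, FOAF.name, Literal("Bob")))

print(g.serialize(format="turtle"))      # data other machines can process
```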

Information Management: A Proposal by Tim Berners-Lee

The original memo described, and the original server supported, link typing so that you could describe not only real world things but also the nature of the relationships between those things – just as RDF and HTML 5 do now, 20 years later. This focus on data is all a good idea because it lets you treat the web like a giant database. Making computers human literate by linking up bits of data means the tools, devices and apps connected to the web can do more of the work for you, making it easier to find the things that interest you.

The semantic web project – and TimBL’s original memo – is all about helping people access data in a standard fashion so that we can add another level of abstraction, letting people focus on the things that matter to them. This is what, I believe, we should be striving towards for the web’s future, because I agree with Dan Brickley: to understand the future of the web you first need to understand its origins.

Don’t think about HTML documents – think about the things and concepts that matter to people, give each its own identifier, its own URI, and then put in place the technology to dereference that URI to the document appropriate to the device, whether that be a desktop PC, a mobile device, an IPTV or a third-party app.

Google Chrome why?

The Internet is all abuzz with Google’s open source web browser, Chrome. But you have to ask why, and even whether it’s a big deal. Not why there’s all the interest, but why Google bothered to build their own browser. After all, they could have worked with Mozilla to add these features to Firefox – instead Google went and built their own.

Introducing Google Chrome

So, clearly, I don’t know, but I wonder whether Google just got a bit fed up of waiting for the features they wanted and went ahead and built their own browser, while leaving the door open to merge these features back into Firefox at a later date. Google are a big supporter of Firefox, the idea of a Google browser has been associated with Firefox in the past, and Sergey Brin has said he is keen to see Firefox and Chrome become more unified in the future.

“It is probably worth noting that they (Mozilla Corp) are across the street and they come over here for lunch,” Brin said of Mozilla employees’ visits to cafeterias at the Googleplex headquarters. “I hope we will have more and more unity over time.”

But what features are important to Google? After all, as Jon Hicks points out, from an interface point of view, Chrome brings nothing new – all the features are already available in existing browsers. But I don’t think that’s the point and I don’t think that’s why it’s important. Google want to offer much richer and, more importantly, faster web applications.

The current browsers, including Firefox, just can’t cut it: JavaScript isn’t fast enough (thereby limiting the UX), browsers are single threaded and they aren’t stable enough. If Google want to challenge Microsoft (or anyone else for that matter) in the desktop space they need a better platform. Of course others have sought to solve the same problem – notably Adobe with Air and Microsoft with Silverlight. Google’s solution is, I think, much neater – build an open source browser that supports multithreading and fast JavaScript execution, and stuff Google Gears into the back end so it works offline. Joel Spolsky suggested something similar a while back:

So if history repeats itself, we can expect some standardization of Ajax user interfaces to happen in the same way we got Microsoft Windows. Somebody is going to write a compelling SDK that you can use to make powerful Ajax applications with common user interface elements that work together. And whichever SDK wins the most developer mindshare will have the same kind of competitive stronghold as Microsoft had with their Windows API

Imagine, for example, that you’re Google with GMail, and you’re feeling rather smug. But then somebody you’ve never heard of, some bratty Y Combinator startup, maybe, is gaining ridiculous traction selling NewSDK, which combines a great portable programming language that compiles to JavaScript, and even better, a huge Ajaxy library that includes all kinds of clever interop features. Not just cut ‘n’ paste: cool mashup features like synchronization and single-point identity management (so you don’t have to tell Facebook and Twitter what you’re doing, you can just enter it in one place). And you laugh at them, for their NewSDK is a honking 232 megabytes … 232 megabytes! … of JavaScript, and it takes 76 seconds to load a page. And your app, GMail, doesn’t lose any customers.

But then, while you’re sitting on your googlechair in the googleplex sipping googleccinos and feeling smuggy smug smug smug, new versions of the browsers come out that support cached, compiled JavaScript. And suddenly NewSDK is really fast. And Paul Graham gives them another 6000 boxes of instant noodles to eat, so they stay in business another three years perfecting things.

Of course the big difference is that it’s Google that have gone and launched the new browser that supports cached, compiled JavaScript.

With the release of Chrome, Google can now release versions of their apps that are richer and more responsive. Chrome, then, isn’t targeted at Firefox; I think Chrome is more of a threat to Silverlight and Air. After all, if you can write a web app in JavaScript that’s just as rich and responsive as anything you can write in Silver-Air, why would you bother with the proprietary approach?

Chrome is, in effect, a way to deliver a Google OS to your desktop, one that lets you run fast JavaScript applications. And if you believe Sergey Brin, Firefox will, in time, adopt the same technologies as Chrome; which is of course just what Google want – maximum market penetration for the browsers that support their new rich web apps.

Interesting stuff for 2008-08-09

"Dawkins and Darwin" by Kaptain Kobold. Used under licence.
"Dawkins and Darwin" by Kaptain Kobold. Used under licence.

Darwin’s theory of evolution was simple, beautiful, majestic and awe-inspiring [Charlie Brooker – The Guardian]
“But because it contradicts the babblings of a bunch of made-up old books, it’s been under attack since day one. Had the Bible claimed gravity is caused by God pulling objects toward the ground with magic threads, we’d still be debating Newton with idiots.”

The Bible and the Quran agree: Insects have four legs [Dwindling In Unbelief]
Why do people believe this stuff?

BioNumbers – The Database of Useful Biological Numbers
If you’re interested in the number of prokaryotes in cattle rumen worldwide, this is the site for you. If not, it might not be. :)

On a different note, some interesting tech stuff:

Open Source implementation of Yahoo! Pipes with added semweb goodness [deri.org]
Inspired by Yahoo’s Pipes, DERI Web Data Pipes implement a generalization which can also deal with formats such as RDF (RDFa), Microformats and generic XML.

ActiveRDF – a library for accessing RDF data from Ruby [activerdf.org]
A library for accessing RDF data from Ruby programs. It can be used as a data layer in Ruby-on-Rails, similar to ActiveRecord (which provides an O/R mapping to relational databases).

Load Balancing & QoS with HAProxy [igvita.com]
The worst thing you can do is queue up another request behind an already long running process. To mitigate the problem HAProxy goes beyond a simple round-robin scheduler, and implements a very handy feature: intelligent request queuing!

RA DIOHEA_D / HOU SE OF_C ARDS

Radiohead are miles ahead of the pack when it comes to content innovation. First off, when they released their seventh album, In Rainbows, they let customers choose their own price, which, according to Thom Yorke, outstripped the combined profits from digital downloads of all of the band’s other studio albums. Then, with their single Nude, folk were given the opportunity to remix the track. And now their new video for ‘House of Cards’ has been made without any cameras, just lasers and data. And best of all, they are giving you the chance to play with the data, under a licence that allows remixing.

More details are available at Google Code (http://code.google.com/creative/radiohead/), which explains that:

No cameras or lights were used. Instead two technologies were used to capture 3D images: Geometric Informatics and Velodyne LIDAR. Geometric Informatics scanning systems produce structured light to capture 3D images at close proximity, while a Velodyne Lidar system that uses multiple lasers is used to capture large environments such as landscapes. In this video, 64 lasers rotating and shooting in a 360 degree radius 900 times per minute produced all the exterior scenes.

The site also includes a short documentary showing how the video was made and the 3D plotting technologies behind it.

There’s also a 3D viewer to explore the data visualization and, best of all, the data itself, with instructions on how to create your own visualizations. All very cool.
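If you fancy having a go, something along these lines should get you started. It’s only a sketch and assumes each frame of the release is a plain CSV file of x, y, z and intensity values; the file name is a placeholder, so check the download for the actual layout.

```python
# A rough sketch: plot one frame of the point-cloud data as a 3D scatter.
# Assumes a CSV of x, y, z, intensity per row; "frame.csv" is a placeholder.
import csv

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (enables the 3d projection)

xs, ys, zs, intensity = [], [], [], []
with open("frame.csv") as f:
    for row in csv.reader(f):
        x, y, z, i = map(float, row[:4])
        xs.append(x); ys.append(y); zs.append(z); intensity.append(i)

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.scatter(xs, ys, zs, c=intensity, s=1)   # colour each point by laser intensity
plt.show()
```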

URLs aren’t just for web pages

We’re all used to using URLs to point at web pages, but we too often forget that they can be used for other things too. They can address any resource, and that includes people, documents, images, services (e.g. “today’s weather report for London”), TV or radio programmes – in fact any abstract concept or entity that can be identified, named and addressed.

Also, because these resources can have representations which can be processed by machines (through the use of RDF, Microformats, RDFa, etc.), you can do interesting things with that information. Some of the most interesting things you can do happen when URLs identify people.

Currently people are normally identified within web apps by their email address. I guess this sort of makes sense because email addresses are unique, just about everyone has one and it means the website can contact you. But URLs are better. URLs are better because they offer the right affordance.

If you have someone’s URL then you can go to that URL and find out stuff about that person – you can assess their provenance (by reading what they’ve said about themselves and by seeing who’s in their social network via tools such as XFN, FOAF and Google’s Social Graph API), and you can discover how to contact them (or ask permission to do so).
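That affordance is easy to demonstrate. A minimal sketch using the Python rdflib library, assuming the person’s URL dereferences to a FOAF document (the URL here is a placeholder):

```python
# A minimal sketch: dereference someone's URL and see who they say they know.
# The URL is a placeholder for a real FOAF profile.
from rdflib import Graph
from rdflib.namespace import FOAF

g = Graph()
g.parse("http://example.org/people/alice#me")   # fetch and parse the profile

for person, friend in g.subject_objects(FOAF.knows):
    print(person, "knows", friend)
```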

With email the affordance is all the wrong way round – if I have your email address I can send you stuff, but I can’t check who you are, or even whether it is really you. Email addresses are for contacting people, they aren’t identifiers; by conflating the two we’ve got ourselves into trouble, because email addresses aren’t very good at identifying people, nor can they be shared publicly without exposing folk to spam and the like.

This is, in essence, the key advantage offered by OpenID, which uses URLs to provide digital identifiers for people. If we then add OAuth into the mix we can do all sorts of clever things.

The OAuth protocol can be used to authenticate any request for information (for example, sending the person a message); the owner of the URL/OpenID decides whether or not to grant you that privilege. This means that it doesn’t matter if someone gets hold of a URL identifier – unless the owner grants permission (on a per-instance basis) it is useless. This is in contrast to what happens with email identifiers – once I have one I can use it to contact you whether you like it or not.
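As a sketch of what such a request might look like from the asking side, here’s a minimal example using the Python requests_oauthlib library. Everything in it – the keys, the tokens and the URL – is a placeholder, and the request only succeeds once the owner has granted the tokens.

```python
# A minimal sketch of an OAuth 1.0-signed request for a protected resource.
# All keys, tokens and the URL are placeholders.
from requests_oauthlib import OAuth1Session

session = OAuth1Session(
    client_key="my-app-key",               # identifies the application asking
    client_secret="my-app-secret",
    resource_owner_key="granted-token",    # only exists if the owner said yes
    resource_owner_secret="granted-secret",
)

response = session.get("http://example.org/people/alice/contact-details")
print(response.status_code)                # stays 401/403 until permission is granted
```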

Also, because I can give any service a list of my friends’ URLs without worrying that their contact details will be exposed, I can tip up at any web service and find out which of my friends are using it without having to share those details. In other words, by using URLs to identify people I can share my online relationships without sharing or porting my or my friends’ contact data.

You retain control over your data, but we share the relationships (the edges) within our social graph. And that’s the way it should be; after all, that’s all it needs to be. If I have your URL I can find whatever information (email, home phone number, current location, bank details) you decide you want to make public, and I can ask you nicely for more if I need it – using OAuth you can give me permission and revoke it if you want.

Photo: Point!, by a2gemma. Used under licence.