URL shortening – it’s nasty but it’s also unnecessary

URL shortening is just wrong, and it’s not just me that thinks so: Joshua Schachter thinks so too, and Simon Willison has a partial solution. The reason various folk are worried about URL shortening, and think that it’s largely evil, is that it breaks the web.

"The weakest link" by Darwin Bell. Some rights reserved.
"The weakest link" by Darwin Bell. Some rights reserved.

URLs need to be persistent, and that’s not likely when you use these services. But the ever-increasing popularity of Twitter, which imposes a 140-character limit on tweets, means that more and more URLs are getting shortened. The ridiculous thing is that it isn’t even necessary.

In addition to the rev=”canonical” fix that Kellan proposed, Michael has also recently come across longurl.org which

…could solve at least some of these problems. It provides a service to expand short urls from many, many providers into long urls

That’s cool because:

it caches the expansion, so it has a persistent store of short <-> long mappings. They plan to expose these mappings on the web, which would also address the reliance on a third party (if they go out of business, links break).

Of course what would be extra cool would be if, in addition to the source code being open sourced, so was the underlying database. That way if anything happened to longurl.org someone else could resurrect the service.
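As an aside, expanding a short URL is really just a matter of following its redirect chain to the final location. Here is a minimal sketch in Python using the requests library; it talks plain HTTP rather than longurl.org’s own API, and the example short URL is made up:

```python
import requests


def expand(short_url: str) -> str:
    """Follow a shortened URL's redirect chain and return the final, long URL."""
    # A HEAD request is enough: we only care about the Location headers,
    # not the body of the target page.
    response = requests.head(short_url, allow_redirects=True, timeout=10)
    return response.url


# e.g. expand("http://tinyurl.com/example")  ->  the long, canonical URL
```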

All good stuff. But the really ironic thing is that none of this should be necessary. The ‘in 140 characters or less’ thing isn’t true. As Michael points out:

if i write a tweet to the 140 limit that includes a link then <a href=”whatever”>whatever</a> will be added to the message. so whilst the visible part of the message is limited to 140 chars the message source isn’t. There’s no reason twitter couldn’t use the long url in the href whilst keeping the short url as the link text…

All Twitter really needs to do is provide their own shortening service – if you enter anything that starts “http://” it gets shortened in the visible message. Of course it doesn’t really need to provide a unique, hashed URL; it could convert the anchor text to “link” or the first few letters of the title of the target page, while retaining the full-fat, canonical URL in the href.
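To illustrate the point, here is a hypothetical helper (not anything Twitter actually exposes) that trims the visible link text while leaving the href untouched:

```python
from html import escape
from typing import Optional


def render_link(url: str, label: Optional[str] = None, max_label: int = 30) -> str:
    """Render a link whose href keeps the full, canonical URL but whose text stays short."""
    text = label or url
    if len(text) > max_label:
        text = text[: max_label - 1] + "\u2026"  # trim the visible text and add an ellipsis
    # Only the label is shortened; the href keeps the full-fat URL.
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(text))


# render_link("http://example.org/a/very/long/path?with=a&query=string")
# shows roughly 30 characters of link text but keeps the whole URL in the href
```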

Recession’s silver lining is innovation

Following the dot.com boom of the late 1990s, when anyone and everybody who could code worked all hours to realise their ideas, there was a collapse and loads of people were unemployed. Prior to the collapse some people made loads of money; of course, some of them then went and lost it, and some people just worked long hours for no real benefit. But that’s not really the point. The point is that after the dot.com bubble burst we saw the emergence of new tech companies that laid the foundation for the whole web 2.0 thing.

"That was supposed to be going up wasn't it?" by rednuht. Some rights reserved.
"That was supposed to be going up wasn't it?" by rednuht. Some rights reserved.

During the late ’90s everyone was busy, busy, busy doing stuff for paying clients, and certainly in the early days there was some genuine innovation. But there was also a lot of dross, and all those client demands meant we didn’t always have the time to play with the medium and try out new ideas. Following the collapse in 2001, people were suddenly able to explore the medium and develop new ideas. That’s not to say that there weren’t economic pressures on those development teams — in many ways the pressures were more acute because there wasn’t a VC buffering your cash flow. And as one of those companies put it, you needed to get real:

Getting Real is about skipping all the stuff that represents real (charts, graphs, boxes, arrows, schematics, wireframes, etc.) and actually building the real thing. […]

Getting Real delivers just what customers need and eliminates anything they don’t.

But the lack of other people’s deadlines, and the massive number of unemployed geeks playing with stuff and working on what interested them rather than what was required for the next deadline, gave us an amazing range of technologies, including: blogging as we now know it, a rather popular photosharing site, the first social bookmarking site utilising a new design pattern, and a new MVC framework built in an obscure language. Indeed Ruby was itself created during Japan’s recession in the ’90s. And on a much smaller scale, Dom Sagolla’s recent post about how Twitter was born shows how similar forces within a company created similar results.

It seems that when we have unemployment among geeks we see true innovation: genuinely new ideas coming to market. During the good times, when employment is high, we tend to see a raft of “me-toos” commissioned by people who too often don’t really understand the medium and aren’t motivated to come up with new ideas, tending instead to focus on how to make the existing ideas bigger, better and faster because that’s lower risk. For sure this period has its advantages; unfortunately it seems innovation isn’t one of them.

The current economic depression is clearly bad news, potentially very bad news indeed, but what it might mean is that we’re in for another period of innovation as more and more geeks find themselves unemployed and start setting up on their own.

Identity, relationships and why OAuth and OpenID matter

Twitter hasn’t had a good start to 2009: it was hacked via a phishing scam, and then there were concerns that your passwords were up for sale, and that’s not a good thing. But there may be a silver lining to Twitter’s cloud, because it has also reopened the password anti-pattern debate and the use of OAuth as a solution to the problem. Indeed it does now look like Twitter will be implementing OAuth as a result. W00t!

Day 68 :: touch by Meredith Farmer (Flickr). Some rights reserved.

However, while it is great news that Twitter will be implementing OAuth soon, they haven’t yet, and there are plenty of other services that don’t use it. It’s therefore worth pausing for a moment to consider how we’ve got here and what the issues are, because while it will be great, right now it’s a bit rubbish.

We shouldn’t assume that either Twitter or the developers responsible for the third-party apps (those requesting your credentials) are trying to do anything malicious — far from it — as Chris Messina explains:

The difference between run-of-the-mill phishing and password anti-pattern cases is intent. Most third parties implement the anti-pattern out of necessity, in order to provide an enhanced service. The vast majority don’t do it to be malicious or because they intend to abuse their customers — quite the contrary! However, by accepting and storing customer credentials, these third parties are putting themselves in a potentially untenable situation: servers get hacked, data leaks and sometimes companies — along with their assets — are sold off with untold consequences for the integrity — or safety — of the original customer data.

The folks at Twitter are very aware of the risks associated with their users giving out usernames and passwords. But they also have concerns about the fix:

The downside is that OAuth suffers from many of the frustrating user experience issues and phishing scenarios that OpenID does. The workflow of opening an application, being bounced to your browser, having to login to twitter.com, approving the application, and then bouncing back is going to be lost on many novice users, or used as a means to phish them. Hopefully in time users will be educated, particularly as OAuth becomes the standard way to do API authentication.

Another downside is that OAuth is a hassle for developers. BasicAuth couldn’t be simpler (heck, it’s got “basic” in the name). OAuth requires a new set of tools. Those tools are currently semi-mature, but again, with time I’m confident they’ll improve. In the meantime, OAuth will greatly increase the barrier to entry for the Twitter API, something I’m not thrilled about.
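For anyone who hasn’t seen it, the ‘bounce out to the browser and back’ workflow described above is the standard three-legged OAuth flow. Here is a rough sketch in Python using the third-party requests_oauthlib library; the consumer key and secret are placeholders you would get when registering an application, and the endpoint URLs are indicative rather than definitive:

```python
from requests_oauthlib import OAuth1Session  # third-party OAuth 1.0a client

# Placeholder credentials, issued when you register your application
CONSUMER_KEY = "your-consumer-key"
CONSUMER_SECRET = "your-consumer-secret"

oauth = OAuth1Session(CONSUMER_KEY, client_secret=CONSUMER_SECRET)

# 1. Fetch a temporary request token from the provider
request_token = oauth.fetch_request_token("https://api.twitter.com/oauth/request_token")

# 2. Send the user to the provider to log in and approve the application
print("Authorise this application at:",
      oauth.authorization_url("https://api.twitter.com/oauth/authorize"))
verifier = input("Verifier / PIN shown after approval: ")

# 3. Exchange the approved request token (plus verifier) for an access token.
#    At no point does the application see the user's password.
oauth = OAuth1Session(
    CONSUMER_KEY,
    client_secret=CONSUMER_SECRET,
    resource_owner_key=request_token["oauth_token"],
    resource_owner_secret=request_token["oauth_token_secret"],
    verifier=verifier,
)
access_token = oauth.fetch_access_token("https://api.twitter.com/oauth/access_token")
```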

Alex also points out that OAuth isn’t a magic bullet.

It also doesn’t change the fact that someone could sell OAuth tokens, although OAuth makes it easier to revoke credentials for a single application or site, rather than changing your password, which revokes credentials to all applications.

This doesn’t even begin to address the phishing threat that OAuth encourages – its own “anti-pattern”. Anyone confused about this would do well to read Lachlan Hardy’s blog post about this from earlier in 2008: http://log.lachstock.com.au/past/2008/4/1/phishing-fools/.

All these are valid points — and Ben Ward has written an excellent post discussing the UX issues and options associated with OAuth — but they also miss something very important. You can’t store someone’s identity without having a relationship.

Digital identities exist to enable human experiences online, and if you store someone’s identity you have a relationship with them. So when you force third-party apps into collecting usernames and passwords (and any other snippet of someone’s identity), you force those users into having a relationship with that company — whether the individual or the company wants it or not.

With technology we tend not to enable trust in the way most people use the term. Trust is based on relationships. In close relationships we make frequent, accurate observations that lead to better understanding and closer relationships; this process, however, requires investment and commitment. That said, a useful, good relationship provides value for all parties. Jamie Lewis has suggested that there are three types of relationship on the web:

  1. Custodial Identities — identities are directly maintained by an organisation and a person has a direct relationship with the organisation;
  2. Contextual Identities — third parties are allowed to use some parts of an identity for certain purposes;
  3. Transactional Identities — credentials are passed for a limited time for a specific purpose to a third party.

Of course there are also some parts to identity which are shared and not wholly owned by any one party.

This mirrors how real-world identities work. Our banks, employers and governments maintain custodial identities; whereas a pub, validating your age before serving alcohol, need only have a yes/no question answered — are you over 18?

Twitter acts as a custodian for part of my online identity, and I don’t want third-party applications that use the Twitter API to also act as custodians; but the lack of OAuth support means that, whether I or they like it or not, they have to. They should only have my transactional identity. Forcing them to hold a custodial identity places both parties (me and the service using the Twitter API) at risk and places unnecessary costs on the third-party service (whether they realise it or not!).

But, if I’m honest, I don’t really want Twitter to act as Custodian for my Identity either — I would rather they held my Contextual Identity and my OpenID provider provided the Custodial Identity. That way I can pick a provider I trust to provide a secure identity service and then authorise Twitter to use part of my identity for a specific purpose, in this case micro-blogging. Services using the Twitter API would then either use a transactional identity or reuse the contextual identity. I can then control my online identity, those organisations that have invested in appropriate security can provide Custodial Identity services, and an ecosystem of services can be built on top of that.

UPDATE

Just wanted to correct a couple of mistakes, as pointed out by Chris, below:

1. Twitter was hacked with a dictionary attack against an admin’s account. Not from phishing, and not from a third-party’s database with Twitter credentials.
2. The phishing scam worked because it tricked people into thinking that they received a real email from Twitter.

Neither OpenID nor OAuth would have prevented this (although that’s not to say Twitter shouldn’t implement OAuth). Sorry about that.

Permanent web IDs or making good web 2.0 citizens

These are the slides for a presentation I gave a little while ago in Broadcasting House at a gathering of radio types – both BBC and commercial radio – as part of James Cridland’s mission to “agree on technology, compete on content”.

The presentation is based on the thinking outlined in my previous post: web design 2.0 it’s all about the resource and its URL.

UGC – it’s rude, it’s wrong and it misses the point

Despite recent reports that blogging is dead, traditional media companies are still rushing to embrace UGC – User Generated Content – and in many ways that’s great. Except User Generated Content is the wrong framing, and so risks failing to deliver the benefits it might. I also find it a rather rude term.

Graffiti

Newspapers and media companies are all trying to embrace UGC — they are blogging and letting folk comment on some of their articles — and if Adam Dooley of snoo.ws is right, with good reason: he suggests that UGC might be saving the newspapers.

I don’t think it’s coincidental that this [growth in] popularity has come since many papers have embraced both the Internet’s immediacy (real time news is the thing) and its ability to foster debate and discussion with readers. It’s also come since major papers such as the New York Times have taken the locks off their content making most or all of it free online.

But, depressingly, UGC is also seen by some as no more than a way to get content on the cheap from a bunch of mindless amateurs, geeks and attention seekers. This view, and indeed the very term itself, helps to create a dichotomy between professional journalists and the like on one side and everybody else on the other. As Scott Karp points out:

There is a revolution in media because people who create blogs and MySpace pages ARE publishers, and more importantly, they are now on equal footing with the “big,” “traditional” publishers. There has been a leveling of the playing field that renders largely meaningless the distinction between “users” and “publishers” — we’re all publishers now, and we’re all competing for the finite pie of attention. The problem is that the discourse on trends in online media still clings to the language of “us” and “them,” when it is all about the breakdown of that distinction.

Sure, most bloggers don’t have the audience of the online newspapers and media companies, and there are plenty of people who, as the New Scientist article points out, are simply attention seekers. But that still doesn’t make them ‘users’, nor does it mean that they’re ‘generating content’ any more than any other publisher – indeed, one might argue that they are less ‘content generators’ than professional journalists. As I sit here writing this post, am I a user? If I am, I have no idea what I’m using other than WordPress; and if I am, then journalists must be users of their CMS too. I know one thing for sure: I don’t think of myself as a user of someone’s site and I don’t create content for them. I suspect most people are the same.

Bloggers, those who contribute to Wikipedia, and others who publish content on the Web are amateur publishers — in the same way that amateur sportsmen and women are amateur athletes, whatever their ability — until they give up their day job. But that doesn’t necessarily make them any less knowledgeable about the subject they are writing about. Indeed, an ‘amateur publisher’ might well know much more about their subject than a professional journalist, because they have direct personal experience of their subject matter: whether that be a technical blog by someone who helps make the technology, a news story written on Wikinews or BreakingNewsOn by someone who was there and experienced the events being written about, or even the man who invented the Web. Are any of these people doing UGC? I don’t know what they think – but I know that when I write for this blog, or upload a photo to Flickr, I don’t think I’m generating user content; I’m not doing UGC.

It seems to me that newspapers and media companies need to work to understand how amateur publishers and others can contribute. Not that that is easy — the best bloggers know their subject inside out, more so than any professional journalist — but equally there is plenty of drivel out there, in both the amateur and professional spheres. For sure there are dreadful blogs, and YouTube is full of inane video and fatuous comments, but equally partisan news outlets like Fox News and the Daily Mail present biased, misleading and often downright inaccurate reporting. In the week of the US Presidential Election it is worth considering whether Barack Obama’s use of the Internet — including the role of amateur publishers, UGC if you like — helped dull the effect of such biased news reporting, which has historically had a significant role.

The trick, then, is to find the best content, whoever has written it, and bring it to the fore for people to read and debate; to understand what it is about the Web that makes it an effective communication medium and to harness that in whatever way makes sense for each context. Considering the Web in the same patronising fashion as the Culture and Media Secretary Andy Burnham does, that is as “…an excellent source of casual opinion”, fails to recognise the value that debate and discussion can bring to a subject.

Media companies should embrace the generative nature of the web

Generativity, the ability to remix different pieces of the web or deploy new code without gatekeepers (so that anyone can repurpose, remix or reuse the original content or service for a different purpose), is going to be at the heart of successful media companies.

Depth of field (Per Foreby)

As Jonathan Zittrain points out in The Future of the Internet (and how to stop it), the web’s success is largely down to it being a generative platform.

The Internet is also a generative system to its very core as is each and every layer built upon this core. This means that anyone can build upon the work of those that went before them – this is why the Internet architecture, to this day, is still delivering decentralized innovation.

This is true at a technological level: XMPP, OAuth and OpenID, for example, are all technologies that have been invented because the layers upon which they are built are open, adaptable and easy for others to reuse and master. It is also true at the content level – Wikipedia is only possible because it is built as a true web citizen, as are blogging platforms and services such as MusicBrainz – these services allow anyone to create or modify content without the need for strict rules and controls.

But what has this got to do with the success or otherwise of any media company or content publisher? After all, just because the underlying technology stack is generative doesn’t mean that what you build on it must be generative. There are plenty of successful walled gardens and tethered appliances out there. The answer, in part, depends on what you believe the future of the Web will look like.

Tim Berners-Lee presents a pretty compelling view in his article on The Giant Global Graph. In it he explains how the evolution of the Internet has seen a move from a network of computers, through the Internet, to a web of documents, and we are now seeing a migration to a ‘web of concepts’.

[The Internet] made life simpler and more powerful. It made it simpler because, [instead] of having to navigate phone lines from one computer to the next, you could write programs as though the net were just one big cloud, where messages went in at your computer and came out at the destination one. The realization was, “It isn’t the cables, it is the computers which are interesting”. The Net was designed to allow the computers to be seen without having to see the cables. […]

The WWW increases the power we have as users again. The realization was “It isn’t the computers, but the documents which are interesting”. Now you could browse around a sea of documents without having to worry about which computer they were stored on. Simpler, more powerful. Obvious, really. […]

Now, people are making another mental move. There is realization now, “It’s not the documents, it is the things they are about which are important”. Obvious, really.

If you believe this, if you believe that there is a move from a web of documents to a web of concepts, then you can start to see why media companies will need to start publishing data the right way: publishing it so that they, and others, can help people find the things they are interested in. How does this happen, then? For starters we need a mechanism by which we can identify things and the relationships between them, at a level above that of the document. And that’s just what the semantic web technologies are for – they give different organisations a common way of describing the relationships between things. For example, the Programmes Ontology allows any media company to describe the nature of a programme; the Music Ontology any artist, release or label.
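To make that slightly more concrete, here is a tiny sketch using Python’s rdflib that describes a made-up brand and episode with the Programmes Ontology. The identifiers are placeholders, and the class and property names (po:Brand, po:Episode, po:episode, dc:title) are my reading of the ontology rather than a definitive usage:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

PO = Namespace("http://purl.org/ontology/po/")      # Programmes Ontology
DC = Namespace("http://purl.org/dc/elements/1.1/")  # Dublin Core, for titles

g = Graph()
g.bind("po", PO)
g.bind("dc", DC)

# Made-up URIs standing in for real programme identifiers
brand = URIRef("http://example.org/programmes/b0000001#programme")
episode = URIRef("http://example.org/programmes/b0000002#programme")

g.add((brand, RDF.type, PO.Brand))
g.add((brand, DC.title, Literal("An Example Brand")))
g.add((episode, RDF.type, PO.Episode))
g.add((episode, DC.title, Literal("Episode 1")))
g.add((brand, PO.episode, episode))  # the brand has this episode

print(g.serialize(format="turtle"))
```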

This implies a couple of different, but related, things. Firstly, it highlights the importance of links. Links are an expression of a person’s interests. I choose what to link to from this blog – which words, which subjects to link from, and where to link to – and my choice of links provides you with a view of how I see the subject beyond what I write here. The links give you insight into who I trust and what I read. And of course they allow others to aggregate my content around those subjects.

It also implies that we need a common way of doing things: a way of doing things that allows others to build with, and on top of, the original publisher’s content. This isn’t about giving up your rights over your content; rather it is about letting it be connected to content from peer sites. It is about joining contextually relevant information from other sites and other applications. As Tim Berners-Lee points out, this is similar to the transition we had to make in going from interconnected computers to the Web.

People running Internet systems had to let their computer be used for forwarding other people’s packets, and connecting new applications they had no control over. People making web sites sometimes tried to legally prevent others from linking into the site, as they wanted complete control of the user experience, and they would not link out as they did not want people to escape. Until after a few months they realized how the web works. And the re-use kicked in. And the payoff started blowing people’s minds.

Because the Internet is a generative system, it has a different philosophy from most other data discovery systems and APIs (including some that are built with Internet technologies), as Ed Summers explains:

…which all differ in their implementation details and require you to digest their API documentation before you can do anything useful. Contrast this with the Web of Data which uses the ubiquitous technologies of URIs and HTTP plus the secret sauce of the RDF triple.

They also often require the owner of the service or API to give permission for third parties to use those services, often mediated via API keys. This is bad. Had the Web, or the Internet before it, adopted a similar approach rather than the generative approach it did take, we would not have seen the level of innovation we have; and as a result we would not have had the financial, social and political benefits we have derived from it.
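The contrast is easy to see in code. Consuming the Web of Data needs nothing more than a URI, an HTTP client and an Accept header; no API key and no provider-specific client library. A minimal sketch, with a hypothetical URI:

```python
import requests


def fetch_rdf(resource_uri: str) -> str:
    """Dereference a URI and ask, via content negotiation, for an RDF description of it."""
    response = requests.get(
        resource_uri,
        headers={"Accept": "application/rdf+xml"},  # ask for RDF rather than HTML
        timeout=10,
    )
    response.raise_for_status()
    return response.text


# e.g. fetch_rdf("http://example.org/programmes/b0000001")  # hypothetical resource
```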

Of course there are plenty of examples of people being able to work with the web of documents – everything from 800lb gorillas like Google through to sites like After Our Time and Speechification – the latter two providing users with a new and distinctive service while also helping to drive traffic and raise brand awareness for the BBC. Just think what would also be possible if transcripts, permanent audio and research notes were also made available, not only as HTML but also as RDF, joining content inside and outside the BBC to create a system which demonstrates, in Zittrain’s words, “a system’s capacity to produce unanticipated change through unfiltered contributions from broad and varied audiences.”

It’s been a long time coming – but finally we’re out of beta

Providing online Programme support has a long history at the BBC. Tom Coates (now with Yahoo! Brickhouse) announced the launch of the Radio 3 website in 2004. Then Gavin Bell (now with Nature), Matt Biddulph (Dopplr‘s peripatetic CTO) and Tom spoke [pdf] about Programme Information Pages (or PIPs) back in 2005 at ETech. At the time it was hoped that PIPs would be rolled out to all BBC programmes so that every programme the BBC broadcast had a permanent web presence. Things didn’t quite work out that way.

BBC 2 Schedule

For a bunch of reasons this early version of PIPs wasn’t going to scale across the entire BBC programming output. At the time the only solution available to the team was a static web publishing solution, and trying to collapse the entire graph down to a series of static webpages was, frankly, a nightmare. But this work did show the way forward and put in place much of the intellectual framework for what followed.

What followed was a new version of PIPs. This new version had, from certain perspectives, a much simpler brief: to provide a repository of programme metadata for all BBC programmes. Of course, from other perspectives it was a more complex brief, but that’s another story. What it left outstanding, however, was a public representation of this data.

iPlayer, of course, is one representation, but iPlayer is trying to solve a different problem. iPlayer is incredibly successful at delivering the BBC’s radio and TV content over IP. What it doesn’t solve is a permanent, persistent web presence for all BBC programmes, one that could support the archive and the existing BBC broadcast brands.

Last October we launched BBC Programmes with the aspiration to build a true web citizen: one that would enhance the BBC’s web presence, making it a more useful place for people using bbc.co.uk and, at the same time, providing a useful service for external developers.
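For example, assuming each programme page also offers a machine-readable representation (I’m using an .rdf suffix and the Programmes Ontology terms purely for illustration), an external developer could list a brand’s episodes with a few lines of rdflib and nothing else:

```python
from rdflib import Graph, Namespace

PO = Namespace("http://purl.org/ontology/po/")
DC = Namespace("http://purl.org/dc/elements/1.1/")

g = Graph()
# Hypothetical machine-readable view of a brand page; the pid and suffix are illustrative
g.parse("http://www.bbc.co.uk/programmes/b0000001.rdf")

# Print each episode of the brand along with its title
for episode in g.objects(predicate=PO.episode):
    for title in g.objects(episode, DC.title):
        print(episode, title)
```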

BBC Programmes at launch

The last year has seen the service grow and develop at quite a rate (we’ve tried to release updates every couple of weeks), which, especially given the modest size of the team, is very impressive. I have tried to chart the major functional changes here on this blog. But what I’ve not tried to report on is the work of other teams who have styled and integrated the service into the existing broadcast brands, such as Springwatch, Last Choir Standing and now the TV channels and radio stations. This most recent piece of work – integrating the service into the relaunched TV sites – has also seen the service come out of beta, which is truly fantastic.

An episode page for Maestro

As I mentioned, the team is small; however, it is also incredibly talented. I have learnt from them, and enjoyed working with them, more than I suspect they will ever truly know. Consider that six people have, in addition to designing and building the service, also designed and built a lightweight MVC framework and laid the foundations for a highly interlinked, modern web offering. The credit for the site lies with:

Paul Clifford [Lead Software Engineer]
Duncan Robertson [Software Engineer]
Dave Evans [Software Engineer]
Michael Smethurst [Information Architect]
Jamie Tetlow [Designer]
Stephen Butler [Project Manager]

Should you find yourself in a similar position, needing to design and develop a complex modern web service, then if I were you I would make sure your team is small and full of really smart, T-shaped people who understand the domain and care deeply about the quality of the product they are developing.

So where next? Although we’re now out of beta there is still much to be done. At a high level we will be working on two fronts:

Firstly, we will be making the pages at /programmes richer and the navigation between them more coherent and consistent: for example, adding schedules by format in addition to schedules by genre, and generally linking everything up. We’re also going to be adding the missing views – those where we have a view in one format but not in others.

Secondly, we are working to link between, and transclude data from, other domains: for example, tracklistings on episode pages, aggregation of programmes by artist, and more programme information on artist pages.