URL shortening it’s nasty but it’s also unnecessary

URL shortening is just wrong, and it’s not just me who thinks so: Joshua Schachter thinks so too, and Simon Willison has a partial solution. Various folk are worried about URL shortening, and think that it’s largely evil, because it breaks the web.

"The weakest link" by Darwin Bell. Some rights reserved.

URLs need to be persistent and that’s not so likely when you use these services. But the ever increasing popularity of Twitter, who impose a 140 character limit on tweets, means that more and more URLs are getting shortened. The ridiculous thing is it isn’t even necessary.

In addition to the rev="canonical" fix that Kellan proposed, Michael has also recently come across longurl.org, which

…could solve at least some of these problems. It provides a service to expand short urls from many, many providers into long urls

That’s cool because:

it caches the expansion so has a persistent store of short <> long mappings. They plan to expose these mappings on the web which would also solve [reliance on 3rd party – if they go out of business links break]

Of course what would be extra cool would be if, in addition to the source code being open sourced, so was the underlying database. That way if anything happened to longurl.org someone else could resurrect the service.
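To make the idea concrete, here is a minimal sketch of a cache of short-to-long mappings that can be exported and reloaded by someone else. It is not how longurl.org actually works: the `ExpansionCache` class, the `resolve` callback and the JSON export format are all hypothetical names and choices made for illustration.

```python
import json
from typing import Callable, Dict


class ExpansionCache:
    """Caches short -> long URL mappings so each short URL is resolved once."""

    def __init__(self, resolve: Callable[[str], str]) -> None:
        # `resolve` is a hypothetical callback that turns a short URL into its
        # long form, for example by following the shortener's redirect.
        self._resolve = resolve
        self._mappings: Dict[str, str] = {}

    def expand(self, short_url: str) -> str:
        # Only ask the shortener the first time; afterwards the stored mapping
        # is used, so the link survives even if the shortener disappears.
        if short_url not in self._mappings:
            self._mappings[short_url] = self._resolve(short_url)
        return self._mappings[short_url]

    def export(self) -> str:
        # Dump every known mapping as JSON so the data can be published.
        return json.dumps(self._mappings, indent=2, sort_keys=True)

    @classmethod
    def from_export(cls, data: str, resolve: Callable[[str], str]) -> "ExpansionCache":
        # Rebuild a cache from a published dump (the "resurrect the service" case).
        cache = cls(resolve)
        cache._mappings = dict(json.loads(data))
        return cache
```

The point of `export` and `from_export` is the one made above: if the mappings are open, anyone holding a copy of the dump can keep expanding links without the original service.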

All good stuff. But the really ironic thing is that none of this should be necessary. The ‘in 140 characters or less’ thing isn’t true. As Michael points out:

If I write a tweet to the 140 limit that includes a link then <a href="whatever">whatever</a> will be added to the message. So whilst the visible part of the message is limited to 140 chars the message source isn’t. There’s no reason twitter couldn’t use the long url in the href whilst keeping the short url as the link text…

All Twitter really needs to do is provide their own shortening service – if you enter anything that starts “http://” it gets shortened in the visible message. Of course it doesn’t really need to actually provide a unique, hashed URL; it could convert the anchor text to “link” or the first few letters of the title of the target page while retaining the full-fat, canonical URL in the href.
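As a rough sketch of what that could look like, the snippet below turns the plain text of a message into HTML where each link keeps its full URL in the href and only the visible text is shortened. This is my guess at the idea, not Twitter's code: `render_tweet`, `visible_label`, the 30 character visible limit and the simple URL pattern are all assumptions.

```python
import html
import re
from typing import Optional

# Assumption: anything starting with http:// or https:// up to the next
# whitespace is a link. Trailing punctuation is not handled in this sketch.
URL_PATTERN = re.compile(r"https?://\S+")

# Assumption: visible link text is capped at 30 characters, ellipsis included.
MAX_VISIBLE = 30


def visible_label(url: str, limit: int = MAX_VISIBLE) -> str:
    """Return the URL unchanged if it fits, otherwise a truncated form."""
    if len(url) <= limit:
        return url
    return url[: limit - 1] + "…"


def render_tweet(text: str, label: Optional[str] = None) -> str:
    """Render message text as HTML, keeping the long URL in every href.

    If `label` is given (for example "link") it is used as the anchor text
    for every link instead of the truncated URL.
    """
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        # Escape the plain text between links.
        parts.append(html.escape(text[last:match.start()]))
        url = match.group(0)
        anchor_text = label if label is not None else visible_label(url)
        parts.append(
            '<a href="{}" rel="nofollow" target="_blank">{}</a>'.format(
                html.escape(url, quote=True), html.escape(anchor_text)
            )
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Calling `render_tweet(message)` gives truncated URLs as link text, while `render_tweet(message, label="link")` gives the “link” variant. Either way the href carries the original, persistent URL and no third-party shortener is involved.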

12 responses to “URL shortening it’s nasty but it’s also unnecessary”

  1. I think a nice solution would be for Search Engines to index the underlying URL rather than the shortened URL for known popular URL shortening services.

    This would mean that people could continue to share short URLs via Twitter etc. but once it comes down to those URLs being picked up by Google, Yahoo etc. the full URL gets indexed.

    1. That’s true and that would help, but it wouldn’t fix the resultant broken links if the link shortener went bust, recycled the link or decided to censor the site.

  2. I fully agree with you, indeed, it went wrong at the very moment Twitter decided to not provide a url shortener themselves. That would have meant exit Twitter, exit shortened links. But the problem is bigger now Tweets themselves get indexed and stored on the web.

    If you’re a blogger, you can at least match the lifetime of links to your blog to the lifetime of your blog itself by installing the WordTwit plugin.
    This turns your blog engine into its own url shortener. Example: this link http://squio.nl/blog/Cd to my blog is expanded on the blog itself and links to the full post, “Use your blog as shortened URL service”.

    (Note: I’m not affiliated in any way with the WordTwit plugin).

    All nice, but I do agree that URL shorteners should not be necessary in the first place!

    1. That certainly helps – especially when combined with the rev=”canonical” solution as proposed by Kellan and implemented by Simon Willison: http://simonwillison.net/2009/Apr/11/revcanonical/

  3. Twitter have already started doing exactly this. eg see my tweet: http://twitter.com/frankieroberto/status/2040670517 – they’ve truncated the visible url to 30 characters (including “…” at the end).

    Unfortunately, they haven’t factored this truncation into the 140 character limit yet. But it’s surely only a matter of time…

  4. “[…]in addition to the source code being open sourced[…]” Safe.mn addresses the two main criticisms to URL shorteners: security and transparency. All links are thoroughly verified for viruses, malware, phishing, malicious content, session stealing, cross-site scripting attacks, etc. Any suspicious link gets flagged, and users are warned about it. Safe.mn is also the most transparent URL shortener service: all links generated by Safe.mn are publicly available, and updated regularly.

    The code is not open sourced, but the list of all shortened URLs is available to anybody.

    1. Julien, that’s good news that you are making the data available, but looking at your site I see that you are licensing the underlying data under a CC-no derivative works license which, when it comes to data, is seriously limiting. Put it this way, it would prevent anyone else from using your data to create a similar service should safe.mn decide to close the service, if you went bust etc. And that’s important because it goes to the very heart of the problem – shortening services make the web more brittle because they risk resulting in broken links.

      1. Tom, that is a good point. Which version of the CC would you prefer? I can certainly change it to a less restrictive version.

  5. @Julien basically you need to allow derivative works for data.

  6. Just to correct myself. When a tweet contains the shortened URL the source is actually:
    <a href="shorturl" rel="nofollow" target="_blank">shorturl</a>

    So if the message body is 140 characters long the message source is 140 + 46 characters of additional markup + the length of the short URL in the @href.

    The source of this tweet for example is 205 characters.

    I’m guessing twitter do the URL shortening (where they feel necessary) on tweet ingest and don’t store the original message / URL. Then when they display the tweet they add the link element.

    Obviously it would add huge overheads if they shortened URLs on publishing but there’s no reason (other than data storage) why they couldn’t additionally store the tweet as:
    <a href="longurl" rel="nofollow" target="_blank">shorturl</a>
    or even
    <a href="longurl" rel="nofollow" target="_blank">link</a>
    and use that version for the web and the shorturl version for elsewhere.

    Having said that, there’s always some reason why people do things that’s impossible for an outsider to see, so it’s difficult to second-guess their decisions!

    @frankie twitter only shorten where they feel necessary. Not really possible to work out the rules for this without spamming twitter with test messages. I think the point is it’s never necessary. Even if they have to use a shortener to shorten the visible message they don’t have to shorten the link @href on the web.

    @Julien as open as possible – in this case totally :-)
    http://wiki.creativecommons.org/CC0
    it gets timbl’s backing
    http://lists.w3.org/Archives/Public/public-lod/2009Jun/0091.html

    1. Twitter may well store the message as 205 characters – it’s not as if that would eat too much disk space. The only hard restriction comes in when they send it as a SMS text message.

      It would have been just as sensible for text-message recipients not to see the link at all (maybe just [] brackets around the link-word). They have to log on to a modern-day device to follow the link anyway, so they might as well visit Twitter when they do that…
