There’s only metadata and URIs

On the web I reckon there’s only metadata and URIs, or perhaps there’s no metadata and only data. Either way, the distinction between metadata and data/content isn’t helpful.

Linked Data allows you to bind HTTP URIs to an object and to information about that object. This matters because it’s more useful to talk about real-world things (people, places and events), the things that people actually think about. Despite this I have had numerous conversations, over many years, about what ‘metadata’ to use to describe a document. Typically what this really means is: “what keywords to use so that some technomagical solution can use that ‘metadata’ to personalise/recommend content”.

Self-portraiture + metadata by Saltatempo's. Some rights reserved

Beyond the obvious (keywords on their own are never going to achieve the sorts of solutions non-technical people imagine), it also forces an unhelpful schism. It makes people think in terms of their content and your metadata, or that metadata is somehow outwith the content they are creating. The trouble is that one person’s data is another person’s metadata. Is the title of a story metadata or content? Is a news story content or metadata about a real-world event? The answer depends on your perspective.

It seems to me that a more useful way to think about things is to have URIs to identify things and then have information/documents/data/metadata/whatever that makes assertions about those things. Sometimes those bits of information will be simple data points. For an album release, for example, they might include information/metadata about who performed or wrote the piece (obviously linking to URIs that identify the person who performed or wrote it, with appropriate predicates). Other bits of metadata might be more verbose, such as reviews of the album or the lyrics, and some might be media things (recordings of the album etc.).

And of course because we’re talking about a graph of data, those documents making assertions about a thing can in turn also have metadata/data/documents which make assertions about them, for example, who wrote it, comments about it etc.
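To make that concrete, here is a rough sketch of the album example as a handful of subject/predicate/object assertions. Everything in it is made up for illustration: the `example.org` URIs, the predicate names and the `about` helper are assumptions, not a real vocabulary or library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One assertion: a subject URI, a predicate URI and an object URI."""
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/"  # hypothetical namespace, for illustration only

ALBUM = EX + "album/1"
REVIEW = EX + "review/1"

graph = {
    # Assertions about the album: a simple data point, a verbose
    # document and a piece of media all hang off the same URI.
    Triple(ALBUM, EX + "performedBy", EX + "person/1"),
    Triple(ALBUM, EX + "hasReview", REVIEW),
    Triple(ALBUM, EX + "hasRecording", EX + "recording/1"),
    # The review is itself a thing with a URI, so it can have
    # assertions made about it in turn.
    Triple(REVIEW, EX + "writtenBy", EX + "person/2"),
}


def about(thing, triples):
    """Return every (predicate, object) pair asserted about `thing`."""
    return sorted((t.predicate, t.obj) for t in triples if t.subject == thing)


for predicate, obj in about(ALBUM, graph):
    print(predicate, obj)
```

Nothing in the structure marks one of these assertions as ‘data’ and another as ‘metadata’; the review is information about the album and, at the same time, a thing that has information of its own.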

Imagine what might happen if a news website took this approach. You would mint a URI for the event (or reuse one that already existed) and then write news stories about it, each with its own URL, each making assertions about that event. It would create a news service which was truly native to the Web, rather than a facsimile of the printed press. Imagine then what it would be like if we could link up all the news stories on the web which also made assertions about that event. As a user of such a site, or set of sites, I could find everything about a given thing (a person, event or place).
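As a rough sketch of that idea, suppose two news sites each publish assertions linking their story URLs to the URI of the thing the story is about. The site names, story URLs, the `ABOUT` predicate and the `index_by_thing` helper below are all hypothetical; the point is only that a shared identifier is what lets the stories be joined up.

```python
from collections import defaultdict

# Hypothetical identifiers, for illustration only.
ABOUT = "http://example.org/terms/about"
EVENT = "http://example.org/event/1"
PLACE = "http://example.org/place/1"

# Each site publishes (story URL, predicate, thing URI) assertions.
site_a = [
    ("http://news-a.example/stories/1", ABOUT, EVENT),
]
site_b = [
    ("http://news-b.example/stories/7", ABOUT, EVENT),
    ("http://news-b.example/stories/8", ABOUT, PLACE),
]


def index_by_thing(*sites):
    """Group story URLs from any number of sites by the thing they are about."""
    index = defaultdict(list)
    for site in sites:
        for story, predicate, thing in site:
            if predicate == ABOUT:
                index[thing].append(story)
    return index


index = index_by_thing(site_a, site_b)

# Every story about the event, regardless of which site wrote it.
for story in sorted(index[EVENT]):
    print(story)
```

The join only works because both sites point at the same event URI; if each site described the event with its own keywords instead, there would be nothing reliable to match on.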

Of course, as Dan Brickley put it:

concepts and events are still social and technological artefacts, but they are designed to help interconnect descriptions of butterflies, documents (and data) about butterflies, and people with interest or expertise relating to butterflies.

In other words what matters is a way of identifying things, a way of interconnecting them and a way of describing them — subdividing those ways of describing them into ‘data’ and ‘metadata’ is unhelpful, or at the very least adds nothing useful.

It is, however, useful to separate our concept of something from our conception of it. As Steven Pinker puts it:

…if you look up William Shakespeare in a dictionary it says “English playwright, lived in the 17th century, wrote Romeo and Juliet and Hamlet, etc.” Is that what the name William Shakespeare means, and is that what the concept William Shakespeare is? That sounds plausible, but it turns out not to be true. If we were to learn that William Shakespeare didn’t write any of the plays attributed to him — let’s say that we learned he didn’t even live in Stratford, that there was a clerical error and he really lived in Warwick. He would still be William Shakespeare, and we wouldn’t posthumously dub the real author of Shakespeare’s plays William Shakespeare. We would just say we were mistaken about what we believed about William Shakespeare.

So what is the concept of William Shakespeare, the meaning of the word William Shakespeare? Basically, when Mr. and Mrs. Shakespeare christened their son William, and the name stuck, and then everyone who knew him, and then who knew someone else, who knew someone else, and passed it down to us — that unbroken chain of transmission of the name from the moment of first dubbing is what gives William Shakespeare its meaning. There’s a sense in which to have a concept necessarily means to be connected to the world through this chain of transmission of a name going back to the moment of first dubbing.

So while I don’t think it’s helpful to separate data from metadata, it is helpful to separate concept from conception.

9 responses to “There’s only metadata and URIs”

  1. I’m not sure I understand what the last example is driving at, or your distinction between concept and conception. If WS is discovered not to have written the plays attributed to him (“English actor, lived in the 17th century, formerly believed to have written Romeo and Juliet and Hamlet, etc.”) – is the ‘concept’ the actual thing we’re talking about, in this case the dead dude called William whose history hasn’t been changed by this modern-day discovery, and the ‘conception’ our understanding of him, which is fundamentally changed?

    1. If some of the facts we currently ascribe to the person William Shakespeare (the concept) turn out not to be true then I don’t think most people would reject the William Shakespeare concept, nor would they wish to change that concept. However, our conceptualisation – the information we have about the concept William Shakespeare – has changed.

      Likewise both you and I would probably agree there’s a concept/a person known as William Shakespeare (or the place ‘London’ or the event ‘Wimbledon 2009’ etc.); however, we might disagree about how we conceptualise that person, place or event. That is, a concept isn’t defined by its conceptualisation.

      This is one reason why it’s important to separate identifiers for a thing, a concept, from the information about it.

  2. The news analogy is especially interesting to me – if you take a URI for an event, a URI for a story about an event, (and in that story, you link to all the URIs for concepts contained within the story, e.g. places, people, other events), *and* then a URI for the person who wrote the story, you potentially can start to build a web of people’s perspectives on events – and from that point, begin to draw interesting conclusions about prevailing opinions and interpretations of history and the world around us.

    1. Agreed, it would also be nice if you knew something about the author’s social graph, their beliefs, education, whether they have visited the country they are writing about (if relevant) etc. You could then find counter-arguments, supporting evidence etc.

      I guess the underlying point of my post was that until authors stop thinking of their story as the primary object and start thinking of the thing they are writing about as the primary object we’re never going to get there.

      1. Definitely. The story is a by-product (though that sounds a little harsh, to be fair) of the objects the journalist has identified, and the links he/she has drawn between them…

  3. Seems that the data/metadata distinction happens when the “data” is in a different *language* to the metadata. An image, for instance, is not metadata. Perhaps the rule is: if Google can pull it apart and understand the bits in it, then it’s metadata.

    To put it another way, data is stuff that is *atomic* (as far as the web is concerned). Your point, then, is that today’s atom is tomorrow’s molecule. True enough, I suppose. An image that is a photo of a celebrity is not just an opaque set of bits, but with enough technology becomes a mention of that celebrity.

    That being the case, “metadata” is simply stuff that fills in the gaps that the technology cannot do. One day, we will not need to accompany the photo with a caption saying “this is [President Barack Obama] at [the United Nations]” – the tech will extract it from the photo itself.

    Until then, we need metadata for photos.

    1. Not sure I agree – I think photos can be metadata.

      If I have a URI for a person then with regard to that person a photo of them is metadata (information about them), although some people might feel a little odd describing a person as data. Or (since I work at the BBC) a programme page might have a photo to illustrate that programme (or piece of video); again, from the perspective of the programme that photo is metadata about that programme.

      Photos can also have their own metadata – but that doesn’t mean that photos should only be thought of as data. It’s metadata all the way down! Or data all the way down.

  4. […] We often don’t give ourselves the best opportunity to womble with what we’ve got – to reuse what others make, to reuse what we make ourselves. Or to let others outside our organisations build with our stuff. If you want to take these opportunities then publish your data the webby way. […]

