So this is the question: do you always need separate URIs for non-information resources and the information resource? That is, do you need an identifier for both the document and the thing the document is about? Your answer to that question will depend a lot on your attitude to the semantic web project.

Until recently I would have said “yes, you do need both”, but lately I’ve been thinking that perhaps it’s not quite so black and white.

Before I get into why, I think it probably makes sense to backtrack a little and explain the background to the question. After all, for many people this question seems odd: why on earth would you need a URI for anything other than the web page, the document?

In the real world we give all sorts of things identifiers: people have passports and National Insurance Numbers; buildings get Post Codes; books get ISBNs, and so on. We do this because it’s useful to be able to unambiguously identify stuff: to be able to point at things, discuss them and share information about them.

On the Internet we have email addresses, and on the Web we have URIs. OpenID, for example, is predicated on the notion that a person can have a URI to identify themselves. And the Linked Data project gives URIs not just to people but to all sorts of things: places, animals, music and, through DBpedia, the myriad of things described in Wikipedia.

Once you have an identifier for a thing you can make assertions about that thing. How big it is, where it is (in the real world), when it was created, who owns it, anything. You can also describe how those things relate to other things – this person is friends with this person and works for this company, which is at this address etc.

Now many people will tell you (indeed I probably will too) that you need to distinguish the statements you make about the thing in the real world from the statements about the document. For example, a URI for me might return a document with some information about me, but the creation date for that document and the creation date for me are two different things. And because you don’t want to get confused it’s better to have a URI for the thing and another one for the document making assertions about the thing. Make sense?
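
To make that a bit more concrete, here is a rough Python sketch that keeps statements as plain (subject, predicate, object) tuples. Everything in it is an assumption for illustration: the example.org URIs, the predicate names and the dates are made up, and it isn’t tied to any real vocabulary or RDF library.

```python
# Hypothetical URIs: one for the person (the thing), one for the document about them.
PERSON = "http://example.org/people/alice#me"
DOCUMENT = "http://example.org/people/alice"

# Statements as (subject, predicate, object) tuples.
# The predicate names and dates are invented for this sketch.
statements = [
    (PERSON, "created", "1975-03-02"),    # when the person was born
    (DOCUMENT, "created", "2009-06-15"),  # when the page was written
    (DOCUMENT, "describes", PERSON),      # ties the document to the thing
]


def values(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]


print(values(PERSON, "created"))
print(values(DOCUMENT, "created"))
```

With two URIs the two creation dates never collide. If both statements hung off a single URI, a consumer would have no way of telling which date belongs to me and which to the page.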

For those that are interested, there are a couple of different ways of achieving this separation. For the purposes of this post it’s not important to know how to do this, but if you want the details have a look at this paper by Richard.

But here’s the thing: many people will tell you that this is all too complex and, frankly, unnecessary. Indeed, you may well be thinking the same thing right about now.

Some people will tell you that the whole non-information resource thing isn’t necessary – we have a web of documents and we just don’t need to worry about URIs for non-information resources; others will claim that everything is a thing and so every URL is, in effect, a non-information resource.

Michael, however, recently made a very good point (as usual): all the interesting assertions are about real world things, not documents. The only metadata, the only assertions, people talk about when it comes to documents are relatively boring: author, publication date, copyright details etc.

If this is the case then perhaps we should focus on using RDF to describe real world things, and not the documents about those things.

On the Web there are a number of different ways of making an assertion about a thing (as identified via a URI): you can state how it relates to other things, you can link it to a piece of data (e.g. RDF literals) or you can link it to a document which makes some statements about the thing (e.g. a news article).
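
As a small sketch of those three kinds of assertion, the Python below tags each statement by what sits on the far end of it. The `Uri` and `Literal` classes, the predicate names, the example.org addresses and the idea of spotting document links by predicate are all assumptions of mine for illustration, not part of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uri:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


THING = Uri("http://example.org/things/lion#thing")  # hypothetical URI for a thing

# Hypothetical predicates whose object is a document about the subject.
DOCUMENT_PREDICATES = {"describedBy"}

assertions = [
    (THING, "livesIn", Uri("http://example.org/places/africa#thing")),   # another thing
    (THING, "commonName", Literal("Lion")),                              # a piece of data
    (THING, "describedBy", Uri("http://example.org/news/lions.html")),   # a document
]


def kind(assertion):
    """Classify an assertion by what it links the subject to."""
    _, predicate, obj = assertion
    if isinstance(obj, Literal):
        return "data"
    if predicate in DOCUMENT_PREDICATES:
        return "document"
    return "thing"


for assertion in assertions:
    print(kind(assertion), assertion[2].value)
```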

The question is: is there much utility in defining non-information resources in this third scenario? Do you need URIs for the documents? Obviously they still need a URL so you can link to them, and you should make each document available in a variety of representations, but do you need a separate identifier for the non-information resource?

I think not.

This is how I’ve started to think about it: RDF is a great way of describing how (real world) things relate to each other, and for this you need URIs for non-information resources. And because you’re dealing with real world things (I know documents are real world things too, but going down this path is how we ended up with the confusion we have today) you will hopefully have interesting and useful links to other things, useful chunks of data and links to useful documents about that thing. Those documents could be in any format: an HTML document, a (Flash) movie, an MP3 file, even a CSV file. The point is that the documents decorate the tree: they are discoverable via the RDF graph, but they don’t need to be published as RDF themselves.
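
Here is a minimal sketch of what “discoverable via the graph” could look like, again in plain Python with made-up example.org URLs and a hypothetical `describedBy` predicate. The documents are ordinary files at ordinary URLs; only the thing gets its own identifier.

```python
from posixpath import splitext
from urllib.parse import urlparse

THING = "http://example.org/things/lion#thing"  # hypothetical URI for the thing

# A tiny graph of (subject, predicate, object) tuples; all names are illustrative.
graph = [
    (THING, "relatedTo", "http://example.org/things/tiger#thing"),
    (THING, "describedBy", "http://example.org/docs/lion.html"),
    (THING, "describedBy", "http://example.org/media/lion-roar.mp3"),
    (THING, "describedBy", "http://example.org/data/lion-sightings.csv"),
]


def documents_about(thing):
    """Follow the describedBy links from a thing to the documents about it."""
    return [o for s, p, o in graph if s == thing and p == "describedBy"]


def extension(url):
    """Return the file extension of the URL's path, e.g. '.html'."""
    return splitext(urlparse(url).path)[1]


for doc in documents_about(THING):
    print(extension(doc), doc)
```

None of the three documents is RDF, and none of them needs a second identifier: the graph simply points at wherever each one lives.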

An RDF graph of things is therefore a great way to discover documents, to make assertions and to share what we know about how those things relate to each other. Or, put another way, RDF is a way of building a vocabulary to describe how web resources relate to real world objects. I may, however, be wrong, and I would be interested to hear what others think.