Linked things

Library Parabola

So this is the question: do you always need separate URIs for non-information resources and the information resource? That is, do you need an identifier for both the document and the thing the document is about? Your answer to that question will depend a lot on your attitude to the semantic web project.

Until recently I would have said “yes, you do need both”, but lately I’ve been thinking that perhaps it’s not quite so black and white.

Before I get into why, I think it probably makes sense to backtrack a little and explain the background to the question. After all, for many people this question seems odd: why on earth would you need a URI for anything other than the web page, the document?

In the real world we give all sorts of things identifiers: people have passports and National Insurance Numbers; buildings get Post Codes; books ISBNs etc. We do this because it’s useful to be able to unambiguously identify stuff. To be able to point, discuss and share information about things.

On the Internet we have email addresses, and on the Web we have URIs. OpenID, for example, is predicated on the notion that a person can have a URI to identify themselves. And the Linked Data project gives URIs not just to people but to all sorts of things: places, animals, music and, through DBpedia, the myriad things described in Wikipedia.

Once you have an identifier for a thing you can make assertions about that thing. How big it is, where it is (in the real world), when it was created, who owns it, anything. You can also describe how those things relate to other things – this person is friends with this person and works for this company, which is at this address etc.

Now many people will tell you (indeed I probably will too) that you need to distinguish the statements you make about the thing in the real world from the statements about the document. For example, a URI for me might return a document with some information about me, but the creation date for that document and the creation date for me are two different things. And because you don’t want to get confused it’s better to have a URI for the thing and another one for the document making assertions about the thing. Make sense?
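To make the distinction concrete, here is a minimal sketch using plain Python tuples as triples. The URIs, property names and dates are all made up for illustration; they don't come from any real dataset or vocabulary.

```python
# Hypothetical URIs: one for the person (the thing), one for the page about them.
PERSON = "http://example.org/people/alice#me"
DOCUMENT = "http://example.org/people/alice"

# Each triple is (subject, predicate, object). All values are illustrative.
triples = [
    (PERSON, "created", "1970-01-01"),    # when the person was born
    (DOCUMENT, "created", "2010-07-08"),  # when the page was written
    (DOCUMENT, "describes", PERSON),      # the page is about the person
]

def statements_about(subject, graph):
    """Return the (predicate, object) pairs asserted about one subject."""
    return [(p, o) for s, p, o in graph if s == subject]

print(statements_about(PERSON, triples))
print(statements_about(DOCUMENT, triples))
```

With only one URI, both "created" dates would hang off the same subject and you couldn't tell which was which.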

For those that are interested there are a couple of different ways of achieving this separation. For the purposes of this post it’s not important to know how to do this, but if you’re interested have a look at this paper by Richard.
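As a rough illustration only: one commonly described approach is the "hash URI", where the thing gets a URI with a fragment and stripping the fragment gives the URL of the document. (Another is to answer requests for the thing's URI with an HTTP 303 redirect to the document.) The sketch below covers just the hash case, and the example URI is made up.

```python
from urllib.parse import urldefrag

def document_url_for(thing_uri):
    """Strip the fragment from a hash URI to get the URL a browser would fetch."""
    url, _fragment = urldefrag(thing_uri)
    return url

# Hypothetical URI identifying a person rather than a page.
thing = "http://example.org/people/alice#me"
print(document_url_for(thing))
```

The fragment is never sent to the server, so the document and the thing stay distinct without any extra server configuration.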

But here’s the thing: many people will tell you that this is all too complex and, frankly, unnecessary. Indeed, you may well be thinking the same thing right about now.

Some people will tell you that the whole non-information resource thing isn’t necessary – we have a web of documents and we just don’t need to worry about URIs for non-information resources; others will claim that everything is a thing and so every URL is, in effect, a non-information resource.

Michael, however, recently made a very good point (as usual): all the interesting assertions are about real world things not documents. The only metadata, the only assertions people talk about when it comes to documents are relatively boring: author, publication date, copyright details etc.

If this is the case then perhaps we should focus on using RDF to describe real world things, and not the documents about those things.

On the Web there are a number of different ways of making an assertion about a thing (as identified via a URI): you can state how it relates to other things, you can link it to a piece of data (e.g. RDF literals) or you can link it to a document which makes some statements about the thing (e.g. a news article).

The question is: is there much utility in defining non-information resources in this third scenario? Do you need URIs that identify the documents as things in their own right? Obviously each document still needs a URL so you can link to it, and you should make that document available in a variety of representations, but do you need a separate identifier for the document as a non-information resource?

I think not.

This is how I’ve started to think about it: RDF is a great way of describing how (real world) things relate to each other, and for this you need URIs for non-information resources. And because you’re dealing with real world things (I know documents are real world things too, but going down this path is how we ended up with the confusion we have today) you will hopefully have interesting and useful links to other things, useful chunks of data and links to useful documents about that thing. Those documents could be in any format: an HTML document, a (Flash) movie, an MP3 file, even a CSV file. The point is that the documents decorate the tree: they are discoverable via the RDF graph, but they don’t need to be published as RDF themselves.
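Here is a small sketch of what "documents decorate the tree" might look like, again with plain Python tuples standing in for RDF. Every URI, the `team` and `describedBy` property names, and both helper functions are assumptions made up for this example.

```python
# Hypothetical URIs for things; none of these are real identifiers.
MATCH = "http://example.org/matches/semi-final#match"
SPAIN = "http://example.org/teams/spain#team"
GERMANY = "http://example.org/teams/germany#team"

# (subject, predicate, object) triples. Documents appear only as objects:
# plain URLs in any format, with no RDF description of their own.
graph = [
    (MATCH, "team", SPAIN),
    (MATCH, "team", GERMANY),
    (MATCH, "describedBy", "http://example.org/news/match-report.html"),
    (MATCH, "describedBy", "http://example.org/audio/commentary.mp3"),
    (SPAIN, "describedBy", "http://example.org/data/spain-squad.csv"),
]

def documents_about(thing, graph, link="describedBy"):
    """Documents hanging directly off one thing."""
    return [o for s, p, o in graph if s == thing and p == link]

def documents_near(thing, graph, link="describedBy"):
    """Documents about the thing, plus documents about things it links to."""
    neighbours = [o for s, p, o in graph if s == thing and p != link]
    found = documents_about(thing, graph, link)
    for neighbour in neighbours:
        found.extend(documents_about(neighbour, graph, link))
    return found

for url in documents_near(MATCH, graph):
    print(url)
```

Starting from the match, you find the report and the audio directly and the squad list by way of the team, yet none of those three files has to be RDF or carry an identifier beyond its URL.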

An RDF graph of things is therefore a great way to discover documents, to make assertions and to share what we know about how those things relate to each other. Or, put another way, RDF is a way of building a vocabulary to describe how web resources relate to real world objects. I may, however, be wrong, and I would be interested to hear what others think.

23 responses to “Linked things”

  1. Tom,

    I agree with most of what you assert in this post, but it doesn’t cover the scenario where the “Subject” of a Description is a Web Information Resource (the chunk of Data that browsers present as a “Compound Document” to their users).

    The problem with the Web of Linked Data lies in the fact that information resources (Docs) are the native information projection mechanism. Thus, at some point you hit the inevitability that is the essence of this post: how do I describe a Thing of Type: Web Document?

    Beyond Web Information Resources, the issue is simpler, give a Name (using an HTTP URI based Name Reference) to the Subject of your Structured Description which ultimately gets published to the Web at an Address (URL).

    Existing Web Documents are Things, and the transition to a Web of Linked Data cannot ignore this reality as it leaves an uptake void i.e., how do we turn the existing Linked Documents on the Web into Structured Linked Data meshes while ignoring their metadata etc..

    One option I am mulling over goes like this: use the HTTP response data to make a provenance oriented graph that is associated with the data object collection (named entities and subject matter) referred to in the Web Doc. This deemphasizes the container resource (as per Linked Data abstraction essence) without creating the void mentioned above i.e., boring metadata isn’t lost.

    Nice post! Might have triggered an alternative solution for our proxy style linked data URIs for Web Documents :-)


    1. — Proxy style HTTP based URI for this blog post .


    1. Yes web documents are things, but I’m suggesting that their only relationship to Linked Data is the relationship they have with real world things.

      So, for example, a web document on a news site about last night’s semi-final in the World Cup between Spain and Germany would be linked to a URI for the match and to URIs for the two teams (each of which would have further links to the World Cup 2010, the two national sides, etc.), and nothing more, i.e. there isn’t a URI for the web document as a NIR.

      1. Yes, we use the doc URL as a cheap basis for the proxy URIs we generate :-) So for the BBC report on a World Cup match, we make a URI for the description of the match report.


        1. — URI for a BBC Web Site Report .

  2. Great post, Tom.

    As almost always, I keep finding myself agreeing strongly with Michael’s perspective on things. The assertions made about a document are often boring. On my own site, I have assertions in RDFa that I’m the author, that it’s a blog, and that it’s about me (narcissistic much?). At the moment, I’ve had to hand-wrap these assertions in my otherwise wonderfully automatic WordPress blog (“/<span rel="foaf: … " doesn't exactly trip off the fingers, does it?). And I'm looking forward to these kinds of assertions being generated by the tools I use.

    So they're boring, but I think the metadata about documents is still very important to have exposed as RDF. I think that's what you're saying by the documents being found from the RDF graph, right?

    But I don't need to publish my entire blog post (which is written for people to understand) as RDF. It wouldn't make sense. However, it'd be cool if I could more easily interweave graph data with human-readable data. So, in a recent post about Stephen Fry's new iPad app, it would have been cool if the tags I use as part of WordPress carried some additional meaning: "This post is about this topic, this thing is an iPad, the app mentioned is this one specifically, this person is 'Stephen Fry'", etc. These would then be tiny assertions as part of the graph, pulling in useful data from human-readable and intended documents. It might be nice, for example, to be able to name a link disassociating myself from something. For the same reason I no longer link to Daily Mail articles in my tweets (all traffic is good traffic), I'd like to be able to more accurately define my relationships with the real-world objects I'm linking to.

    I understand this is very much from a human, writerly perspective. But, I kind of see the interweaving of RDF assertions within documents as a potential big win.

    But there will be places where RDF is not a useful framework.

    I think you've captured this very well, Tom!

  3. I think I agree with the vast majority of all this. It’s certainly true that the claims we tend to make about documents are pretty generic and boring (dull but worthy, perhaps).

    The one thing I’m not so sure about is probably more where Zach says “But I don’t need to publish my entire blog post (which is written for people to understand) as RDF. It wouldn’t make sense.”

    The next bit is definitely true (interweaving graph data with human-readable data), but my hunch is that it *would* be quite cool to have an RDF representation of the content of a blog post. Yes, it wouldn’t make much sense to a person (especially one expecting text), but that doesn’t mean it’s not useful. I know I keep going on about it, but having what amounts to a small ball (constellation?) of URIs and links between them, which represent the content of the blog post, seems just the same to me as having a block of text which represents the content.

    The content of the blog post isn’t text. The content is the ideas & concepts. Text is just the form, just as you could have made a video or audio form expressing the same things. A ‘ball’ of URIs & links, likewise, would express the content in a way that is truly native to the Web, because it’s using the very building blocks of the Web.

    How we then visualise that ‘ball’ and allow people to traverse it is another matter, but I don’t think we should be abandoning attempts to explore the use of the fundamentals of the Web to describe concepts & ideas just because it’s not text… (which, I’m sure, neither you nor Zach is really saying…)

    I guess this particular post is really just reaffirming what we thought already – that we should concentrate on identifying and linking up ‘things’, and *then* think about documents later (if indeed at all).

    So yes, we probably don’t need to identify ‘non-information resources’ for documents – as you say, the claims we can make about them are (mostly) not that useful (though perhaps sometimes in historical enquiry they are, possibly), but recognising that the document is essentially a different form of a ‘ball’ of URIs & links is important too, I feel.

  4. @r4isstatic

    I think I know where you’re coming from, and I agree that it could be cool in some situations. But how do I map the concepts I’m talking about using URIs? English is densely semantic, and open to interpretation. As a human, you’re great at understanding this subconsciously.

    I’m of the opinion that most language expressed is only loosely encoded. We leave a lot open to interpretation in most of what we say. It leaves a lot of room for subtlety and means we can use small amounts of text (or utterance) to express a whole lot of semantics.

    So, I’m not entirely sure a mapping of URIs expressing the semantics of my blog post could do the job at all. It’s not what RDF is for, anyway, as it’s made to deal with structured data. Text can be structured, I suppose, but the RDF description of a blog post would miss out much of the intended meaning and all of the interpreted meaning (i.e. the semantics inferred by the reader or hearer).

    Having URIs to disambiguate some things, and having the graph system for explicitly naming relationships between these things, is interesting (i.e. broadly saying “I don’t like x” or “I just bought y” or “I’m referring to this kind of z”), but I don’t think it’s useful to try to RDFise everything there is, especially when it’s not structured. Also, it’d make the creation of articles cumbersome. The author would forever be trying to keep up with vocabularies and ontological subtleties: “Do I mean I foaf:know this person?” It raises the barrier to publishing data as RDF by making it overcomplicated at stages where it’s not necessary to be that structured.

  5. Agreeing with Zach about r4isstatic’s comment that “The content is the ideas & concepts. Text is just the form”. Even functional views of communication such as speech-act theory view communication as multi-layered: “It’s cold in here” may mean just that or “Please close the door”. The point of NOT saying the latter is to carry the message without being too blunt about it. Attempts to make this sort of thing explicit have failed spectacularly (Winograd and Flores’ Coordinator!). Rhetoric – in a purely semantic world irony too would go the same way as politeness … and just imagining the markup for aesthetic qualities, maybe Shakespeare in RDF? This said, in a blog post just before the general election I found I had to tell one commenter that the post was meant to be ironic, so maybe explicit markup would be useful :-/

    1. Exactly! Language is hugely complicated and far from—computationally—structured. The contexts are supremely complex and our ability to make sense of the semantics of language is barely understood despite hundreds of millennia of us using it all the time. I don’t think we need to worry about RDFising everything we write. I don’t think we could. I don’t think we should.

      Besides, this might pull attention away from the usefulness of structured data models, and the big wins of using RDF in the first place. This feels a little like the hype curve, with RDF-everything solving every problem we can throw at it.

      RDF is a great idea, and it offers the ability to organise and link to a whole web’s worth of information. I’d love to see tools making use of graph-like logic and for this to be something that helps us more and more. But it’s hardly the right tool for encoding everything we say. It’s more, in this context certainly, like the gesture we make while talking face to face: it’s over there… that one!

      1. I think the metadata about documents is still very important to have exposed as RDF. I think that’s what you’re saying by the documents being found from the RDF graph, right?

        Yes! The metadata is still important to have exposed as RDF, but only because it allows us to link it to an RDF graph of things – a graph that includes URIs for people, iPads, places and the like.

        I think that limiting RDF to resource discovery hopefully makes the world a little bit more simple, but also allows publishers to link together content using the things people think about.

  6. I’m not sure why OpenID didn’t make it. I mean, it had everything…people simply hated to type their username over and over again and wanted some simple system. I think it was partly because users didn’t trust them…if they could find some way to make sure people knew they were supported by such big companies that would have been a big credibility factor.

    Your thoughts on RDF are pretty insightful, can’t wait to see whether things like the semantic web will bring more of these concepts to the mainstream population.

    1. OpenID might yet make it… but I think people just didn’t understand how it could work! How can entering a URL work and what’s stopping someone else typing it in and accessing my account etc.

  7. […] Scott’s recent Linking Things post got me jotting down what I’ve been thinking lately about URIs, Linked Data and the Web. […]

  8. Great post Tom … it spurred me on to post something similar myself … although slightly different conclusions :-)

    1. Thanks, heading over there in a bit to carry on the conversation there :)

  9. I have a question about the notion that the metadata about the document isn’t important enough to warrant separating out the two resources. Retrieving all the metadata that the author R4isStatic has contributed about Doctor Who seems a definite use case. Being able to make determinations regarding perspectives – ‘this document was written before event foo’ – is another. What’s the best approach to take to achieve these?

    1. So I think those are useful use cases and ones that we need to support. But I think the way to do it is link the document to the URI for R4isStatic (author) and the URI for Doctor Who etc. I don’t think you also need to have a NIR for R4isStatic’s document.

  10. chris sizemore

    I used to think that “resource” was the least understood word in and around these themes (including by me). now I think it’s “document”! “document” is used in the post and comments here in at least 4 different senses, without anyone blinking. “ubiquitous language” alert!

    1. The problem with “Resource” is that it’s yet another example of horrific term overloading. A “Resource” is a physical artifact in a given realm. It was so long before the Web, and the term re. computing originates (I think) from early Apple Macintoshes via Mac OS Classic.

      In Semantic Web lingo, Real World Objects have become darn “Non Information Resources” (yuck!). Likewise, an actual “Resource” is now referred to as an “Information Resource”.

      All of the above comes down to people simply not understanding the URI abstraction which delivers: Name References (via multiple schemes including HTTP) and Address References (URLs also via multiple schemes including HTTP).

      Giving Name References to Real World Objects is essential to producing structured descriptions about them in any medium (Web included) :-)


    2. Chris,

      Re. my “I Think” comment, that sheds light on what a “Resource” is [1]. Sadly, Semantic Web parlance has led to overloaded mangling of this critical term.



  11. Richard Cyganiak and Leo Sauermann sorta define Document in Cool URIs for the Semantic Web:

    Like everything on the traditional Web, each of the pages mentioned above are Web documents. Every Web document has its own URI. Note that a Web document is not the same as a file: a single Web document can be available in many different formats and languages, and a single file, for example a PHP script, may be responsible for generating a large number of Web documents with different URIs. A Web document is defined as something that has a URI and can return representations (responses in a format such as HTML or JPEG or RDF) of the identified resource in response to HTTP requests. In technical literature, such as Architecture of the World Wide Web, Volume One [AWWW], the term Information Resource is used instead of Web document.

    On the traditional Web, URIs were used primarily for Web documents—to link to them, and to access them in a browser. The notion of resource identity was not so important on the traditional Web, a URL simply identified whatever we see when we type it into a browser.

    Document just seems to be a slightly more palatable way of saying Information Resource. Personally I prefer Resource and Representation from REST. So when people say that it’s important that they can get metadata for a Document, I think in my head that they’d like to have metadata for a Representation, which normally comes over the wire in HTTP response headers.

    1. Sorry, I meant to s/Document/Web Document/ in my last comment.

  12. Hi everyone,

    I just wanted to say that this is one of the best discussions on RDF I’ve read. I’ve been somewhat familiar with RDF for a while now but have basically had this nagging indecision as to what “level”/“depth” at which to apply assertions. Should I RDFise the document and make assertions about the high-level topics and included concepts, or should I make assertions on the actual content – the words and sentences contained within that document?
    I now see that this is still somewhat open to debate and interpretation. Thankfully, as a result of this discussion, I now have a much better understanding of some of the arguments surrounding this question.

    Many thanks everyone.

  13. Btw, the comment from Michael is simply based on Tim Berners-Lee’s statement “It’s not the documents — It’s the things” (see ;)
