Comments

22 Comments so far.
  1. Tom,

    I agree with most of what you assert in this post, but it doesn’t cover the scenario where the “Subject” of a Description is a Web Information Resource (the chunk of Data that browsers present as a “Compound Document” to their users).

    The problem with the Web of Linked Data lies in the fact that information resources (Docs) are the native information projection mechanism. Thus, at some point you hit the inevitability that is the essence of this post: how do I describe a Thing of Type: Web Document?

    Beyond Web Information Resources, the issue is simpler: give a Name (using an HTTP URI based Name Reference) to the Subject of your Structured Description, which ultimately gets published to the Web at an Address (URL).

    Existing Web Documents are Things, and the transition to a Web of Linked Data cannot ignore this reality, as doing so leaves an uptake void, i.e., how do we turn the existing Linked Documents on the Web into Structured Linked Data meshes while ignoring their metadata etc.?

    One option I am mulling over goes like this: use the HTTP response data to make a provenance-oriented graph that is associated with the data object collection (named entities and subject matter) referred to in the Web Doc. This deemphasizes the container resource (as per Linked Data abstraction essence) without creating the void mentioned above, i.e., boring metadata isn’t lost.

    Nice post! Might have triggered an alternative solution for our proxy-style linked data URIs for Web Documents :-)

    Links:

    1. http://linkeddata.uriburner.com/about/id/http/derivadow.com/2010/07/01/linked-things/ — Proxy style HTTP based URI for this blog post .

    Kingsley
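
One way to read the option mulled over above is as a small function that turns HTTP response headers into provenance triples about the Web document, and then hangs that document off the thing it describes. The sketch below is only an illustration under assumptions: the header-to-predicate mapping, the function names, and the example.org URIs are all made up for the example, and the choice of Dublin Core and FOAF terms is one possibility, not something the comment specifies.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Assumed mapping from HTTP response headers to Dublin Core terms.
# Headers that are not listed here are simply ignored.
HEADER_TO_PREDICATE = {
    "last-modified": "http://purl.org/dc/terms/modified",
    "content-type": "http://purl.org/dc/terms/format",
    "content-language": "http://purl.org/dc/terms/language",
}

FOAF_IS_PRIMARY_TOPIC_OF = "http://xmlns.com/foaf/0.1/isPrimaryTopicOf"


def provenance_triples(doc_url: str, headers: Dict[str, str]) -> List[Triple]:
    """Turn known HTTP response headers into triples about the document."""
    triples = []
    for name, value in headers.items():
        predicate = HEADER_TO_PREDICATE.get(name.lower())
        if predicate is not None:
            triples.append((doc_url, predicate, value))
    return triples


def describe(thing_uri: str, doc_url: str, headers: Dict[str, str]) -> List[Triple]:
    """Link a named thing to its container document, keeping the 'boring' metadata."""
    graph = [(thing_uri, FOAF_IS_PRIMARY_TOPIC_OF, doc_url)]
    graph.extend(provenance_triples(doc_url, headers))
    return graph


# Hypothetical URIs and header values, for illustration only.
example_graph = describe(
    "http://example.org/id/linked-things",
    "http://example.org/2010/07/01/linked-things/",
    {"Last-Modified": "Thu, 01 Jul 2010 10:00:00 GMT", "Server": "nginx"},
)
```

The thing keeps its own Name, while the document's metadata stays attached to the document's Address, so nothing from the HTTP response is thrown away.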

  2. Great post, Tom.

    As almost always, I keep finding myself agreeing strongly with Michael’s perspective on things. The assertions made about a document are often boring. On my own site, I have assertions in RDFa that I’m the author, that it’s a blog, and that it’s about me (narcissistic much?). At the moment, I’ve had to hand-wrap these assertions in my otherwise wonderfully automatic WordPress blog (“/<span rel="foaf: … " doesn't exactly trip off the fingers, does it?). And I'm looking forward to these kinds of assertions being generated by the tools I use.

    So they're boring, but I think the metadata about documents is still very important to have exposed as RDF. I think that's what you're saying by the documents being found from the RDF graph, right?

    But I don't need to publish my entire blog post (which is written for people to understand) as RDF. It wouldn't make sense. However, it'd be cool if I could more easily interweave graph data with human-readable data. So, in a recent post about Stephen Fry's new iPad app, it would have been cool if the tags I use as part of WordPress carried some additional meaning: "This post is about this topic, this thing is an iPad, the app mentioned is this one specifically, this person is 'Stephen Fry'", etc. These would then be tiny assertions as part of the graph, pulling in useful data from human-readable and intended documents. It might be nice, for example, to be able to name a link disassociating myself from something. For the same reason I no longer link to Daily Mail articles in my tweets (all traffic is good traffic), I'd like to be able to more accurately define my relationships with the real-world objects I'm linking to.

    I understand this is very much from a human, writerly perspective. But, I kind of see the interweaving of RDF assertions within documents as a potential big win.

    But there will be places where RDF is not a useful framework.

    I think you've captured this very well, Tom!

  3. r4isstatic,

    I think I agree with the vast majority of all this. It’s certainly true that the claims we tend to make about documents are pretty generic and boring (dull but worthy, perhaps).

    The one thing I’m not so sure about is probably more where Zach says “But I don’t need to publish my entire blog post (which is written for people to understand) as RDF. It wouldn’t make sense.”

    The next bit is definitely true (interweaving graph data with human-readable data), but my hunch is that it *would* be quite cool to have an RDF representation of the content of a blog post. Yes, it wouldn’t make much sense to a person (especially one expecting text), but that doesn’t mean it’s not useful. I know I keep going on about it, but having what amounts to a small ball (constellation?) of URIs and links between them, which represent the content of the blog post, seems just the same to me as having a block of text which represents the content.

    The content of the blog post isn’t text. The content is the ideas & concepts. Text is just the form, just as you could have made a video or audio form expressing the same things. A ‘ball’ of URIs & links, likewise, would express the content in a way that is truly native to the Web, because it’s using the very building blocks of the Web.

    How we then visualise that ‘ball’ and allow people to traverse it is another matter, but I don’t think we should be abandoning attempts to explore the use of the fundamentals of the Web to describe concepts & ideas just because it’s not text…(which, I’m sure, neither you nor Zach is really saying…)

    I guess this particular post is really just reaffirming what we thought already – that we should concentrate on identifying and linking up ‘things’, and *then* think about documents later (if indeed at all).

    So yes, we probably don’t need to identify ‘non-information resources’ for documents – as you say, the claims we can make about them are (mostly) not that useful (though perhaps sometimes in historical enquiry they are, possibly), but recognising that the document is essentially a different form of a ‘ball’ of URIs & links is important too, I feel.

  4. @r4isstatic

    I think I know where you’re coming from, and I agree that it could be cool in some situations. But how do I map the concepts I’m talking about using URIs? English is densely semantic, and open to interpretation. As a human, you’re great at understanding this subconsciously.

    I’m of the opinion that most language expressed is only loosely encoded. We leave a lot open to interpretation in most of what we say. It leaves a lot of room for subtlety and means we can use small amounts of text (or utterance) to express a whole lot of semantics.

    So, I’m not entirely sure a mapping of URIs expressing the semantics of my blog post could do the job at all. It’s not what RDF is for, anyway, as it’s made to deal with structured data. Text can be structured, I suppose, but the RDF description of a blog post would miss out much of the intended meaning and all of the interpreted meaning (i.e. the semantics inferred by the reader or hearer).

    Having URIs to disambiguate some things, and having the graph system for explicitly naming relationships between these things is interesting (i.e. broadly saying “I don’t like x” or “I just bought y” or “I’m referring to this kind of z”), but I don’t think it’s useful to try to RDFise everything there is, especially when it’s not structured. Also, it’d make the creation of articles cumbersome. The author would forever be trying to keep up with vocabularies and ontological subtleties: “Do I mean I foaf:know this person?” It raises the barrier to publishing data as RDF by making it overcomplicated at stages where it’s not necessary to be that structured.

  5. Agreeing with Zach about r4isstatic’s comment that “The content is the ideas & concepts. Text is just the form”. Even functional views of communication such as speech-act theory view communication as multi-layered: “It’s cold in here” may mean just that, or “Please close the door”. The point of NOT saying the latter is to carry the message without being too blunt about it. Attempts to make this sort of thing explicit have failed spectacularly (Winograd and Flores’ Coordinator!). Rhetoric – in a purely semantic world irony too would go the same way as politeness … and just imagining the markup for aesthetic qualities, maybe Shakespeare in RDF? This said, in a blog post just before the general election I found I had to tell one commenter that the post was meant to be ironic, so maybe explicit markup would be useful :-/

    • Exactly! Language is hugely complicated and far from—computationally—structured. The contexts are supremely complex and our ability to make sense of the semantics of language is barely understood despite hundreds of millennia of us using it all the time. I don’t think we need to worry about RDFising everything we write. I don’t think we could. I don’t think we should.

      Besides, this might pull attention away from the usefulness of structured data models, and the big wins of using RDF in the first place. This feels a little like the hype curve, with RDF-everything solving every problem we can throw at it.

      RDF is a great idea, and it offers the ability to organise and link to a whole web’s worth of information. I’d love to see tools making use of graph-like logic and for this to be something that helps us more and more. But it’s hardly the right tool for encoding everything we say. It’s more, in this context certainly, like the gesture we make while talking face to face: it’s over there… that one!

      • I think the metadata about documents is still very important to have exposed as RDF. I think that’s what you’re saying by the documents being found from the RDF graph, right?

        Yes! The metadata is still important to have exposed as RDF, but only because it allows us to link it to an RDF graph of things – a graph that includes URIs for people, iPads, places and the like.

        I think that limiting RDF to resource discovery hopefully makes the world a little bit more simple, but also allows publishers to link together content using the things people think about.

  6. I’m not sure why OpenID didn’t make it. I mean, it had everything…people simply hated to type their username over and over again and wanted some simple system. I think it was partly because users didn’t trust them…if they could find some way to make sure people knew they were supported by such big companies that would have been a big credibility factor.

    Your thoughts on RDF are pretty insightful; can’t wait to see whether things like the semantic web will bring more of these concepts to the mainstream population.

    • OpenID might yet make it… but I think people just didn’t understand how it could work! How can entering a URL work, and what’s stopping someone else typing it in and accessing my account, etc.?

  7. Great post Tom … it spurred me on to post something similar myself … although slightly different conclusions :-)

  8. I have a question about the notion that the metadata about the document isn’t important enough to warrant separating out the two resources. Retrieving all the metadata that the author R4isStatic has contributed about Doctor Who seems a definite use case. Being able to make determinations regarding perspectives – ‘this document was written before event foo’ is another. What’s the best approach to take to achieve these?

    • So I think those are useful use cases and ones that we need to support. But I think the way to do it is link the document to the URI for R4isStatic (author) and the URI for Doctor Who etc. I don’t think you also need to have a NIR for R4isStatic’s document.
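
As a rough illustration of the reply above, both use cases can be answered from a graph in which the document is linked directly to the URI for its author and the URI for its topic, with no separate NIR for the document. This is a minimal sketch under assumptions: every example.org URI, the dates, and the helper names are hypothetical, and the Dublin Core predicates are just one plausible vocabulary choice.

```python
from datetime import date

# Dublin Core terms used as predicates (an assumed vocabulary choice).
CREATOR = "http://purl.org/dc/terms/creator"
SUBJECT = "http://purl.org/dc/terms/subject"
CREATED = "http://purl.org/dc/terms/created"

# Hypothetical URIs for things (a person, a programme) and for documents.
R4ISSTATIC = "http://example.org/people/r4isstatic#me"
SOMEONE_ELSE = "http://example.org/people/someone-else#me"
DOCTOR_WHO = "http://example.org/programmes/doctor-who#programme"
DOC_A = "http://example.org/blog/doc-a"
DOC_B = "http://example.org/blog/doc-b"

# The documents are linked straight to the things; dates are made up.
GRAPH = [
    (DOC_A, CREATOR, R4ISSTATIC),
    (DOC_A, SUBJECT, DOCTOR_WHO),
    (DOC_A, CREATED, "2010-06-15"),
    (DOC_B, CREATOR, SOMEONE_ELSE),
    (DOC_B, SUBJECT, DOCTOR_WHO),
    (DOC_B, CREATED, "2010-07-20"),
]


def documents_by(graph, author, topic):
    """Documents that the given author wrote about the given topic."""
    authored = {s for s, p, o in graph if p == CREATOR and o == author}
    about = {s for s, p, o in graph if p == SUBJECT and o == topic}
    return sorted(authored & about)


def written_before(graph, doc, event_day):
    """True if the document has a creation date earlier than event_day."""
    for s, p, o in graph:
        if s == doc and p == CREATED:
            return date.fromisoformat(o) < event_day
    return False  # no creation date recorded, so nothing can be asserted
```

For example, `documents_by(GRAPH, R4ISSTATIC, DOCTOR_WHO)` picks out only the first document, and `written_before(GRAPH, DOC_A, date(2010, 7, 1))` answers the "written before event foo" question from the document's own creation date.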

  9. chris sizemore,

    I used to think that “resource” was the least understood word in and around these themes (including by me). Now I think it’s “document”! “Document” is used in the post and comments here in at least 4 different senses, without anyone blinking. “Ubiquitous language” alert!

    • The problem with “Resource” is that it’s yet another example of horrific term overloading. A “Resource” is a physical artifact in a given realm. It was so long before the Web, and the term re. computing originates (I think) from early Apple Macintoshes via Mac OS Classic.

      In Semantic Web lingo, Real World Objects have become darn “Non Information Resources” (yuck!). Likewise, an actual “Resource” is now referred to as an “Information Resource”.

      All of the above comes down to people simply not understanding the URI abstraction which delivers: Name References (via multiple schemes including HTTP) and Address References (URLs also via multiple schemes including HTTP).

      Giving Name References to Real World Objects is essential to producing structured descriptions about them in any medium (Web included) :-)

      Kingsley

    • Chris,

      Re my “I think” comment: folklore.org has a story that sheds light on what a “Resource” is [1]. Sadly, Semantic Web parlance has led to overloaded mangling of this critical term.

      Link:

      1. http://www.folklore.org/StoryView.py?project=Macintosh&story=The_Grand_Unified_Model.txt&sortOrder=Sort%20by%20Date&detail=medium&search=resource

  10. Richard Cyganiak and Leo Sauermann sorta define Document in Cool URIs for the Semantic Web:

    Like everything on the traditional Web, each of the pages mentioned above are Web documents. Every Web document has its own URI. Note that a Web document is not the same as a file: a single Web document can be available in many different formats and languages, and a single file, for example a PHP script, may be responsible for generating a large number of Web documents with different URIs. A Web document is defined as something that has a URI and can return representations (responses in a format such as HTML or JPEG or RDF) of the identified resource in response to HTTP requests. In technical literature, such as Architecture of the World Wide Web, Volume One [AWWW], the term Information Resource is used instead of Web document.

    On the traditional Web, URIs were used primarily for Web documents—to link to them, and to access them in a browser. The notion of resource identity was not so important on the traditional Web, a URL simply identified whatever we see when we type it into a browser.

    Document just seems to be a slightly more palatable way of saying Information Resource. Personally I prefer Resource and Representation from REST. So when people say that it’s important that they can get metadata for a Document I think in my head that they’d like to have metadata for a Representation–which normally comes over the wire in HTTP response headers.
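
The Resource and Representation split described above can be sketched as a toy content negotiation: one URI, several formats, and the metadata about whichever Representation is chosen travelling in the response headers. This is an illustration only, under stated assumptions: the function names and the sample bodies are invented, and the `Accept` parsing deliberately ignores q-values and partial wildcards, which a real server would honour.

```python
# One resource, available as several representations (sample bodies are invented).
REPRESENTATIONS = {
    "text/html": "<html><body>Linked things</body></html>",
    "application/rdf+xml": "<rdf:RDF/>",
}


def negotiate(accept, available):
    """Pick the first acceptable media type, in the order the client listed them.

    Simplification: q-values are ignored, and only the full wildcard is handled.
    """
    for item in accept.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return next(iter(available))  # first representation we have
    return None


def respond(accept):
    """Return (status, headers, body) for a request with the given Accept header."""
    media_type = negotiate(accept, REPRESENTATIONS)
    if media_type is None:
        return 406, {}, ""  # 406 Not Acceptable
    # Metadata about the chosen Representation goes in the response headers.
    headers = {"Content-Type": media_type, "Vary": "Accept"}
    return 200, headers, REPRESENTATIONS[media_type]
```

A browser sending `Accept: text/html` and an RDF client sending `Accept: application/rdf+xml` hit the same Web document and receive different Representations, each described by its own `Content-Type` header.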

  11. Sam Shiles,

    Hi everyone,

    I just wanted to say that this is one of the best discussions on RDF I’ve read. I’ve been somewhat familiar with RDF for a while now but have basically had this nagging indecision as to the “level”/“depth” at which to apply assertions. Should I RDFise the document and make assertions about the high-level topics and included concepts, or should I make assertions on the actual content – the words and sentences contained within that document?
    I now see that this is still somewhat open to debate and interpretation. Thankfully, as a result of this discussion, I now have a much better understanding of some of the arguments surrounding this question.

    Many thanks everyone.

  12. zazi,

    Btw, the comment from Michael is simply based on Tim Berners-Lee’s statement “It’s not the documents — It’s the things” (see http://www.w3.org/DesignIssues/Abstractions.html) ;)
