Digital soup – separating information storage, retrieval and presentation

I’ve written before about some of the issues related to the two copy file systems. Recently, however, I’ve read a couple of articles by Nick Santilli over at Lifehacker and The Apple Blog that highlight a further issue:

“I’m fairly certain your hard drive is teeming with files, most likely in your Documents folder, just like mine. I’ll venture further, to guess that within that Documents folder you’ve got several more folders – maybe a ‘Letters’ folder for all your correspondences, a ‘txt’ folder for all your txt notes, a ‘work’ folder for business stuff, and so on and so forth. Maybe those folders have more nested within them…you get the idea. You’ve got the typical folder hierarchy that every semi-organized computer user has had since the computer shipped with a hard drive larger than a megabyte.

So what happens when you’ve got a letter that you wrote for a business proposal? Do you file it in your ‘Letters’ folder, or your ‘work’ folder? Or maybe a ‘Letters’ folder within your ‘work’ folder… How do you remember which place you put it in?”

Nick goes on to explain how to use metadata in conjunction with Spotlight, or preferably Quicksilver, to manage your documents more effectively. In essence, Nick is highlighting the same issue Alan Cooper addressed when he introduced the idea of digital soup:

“Users don’t give a hoot what the storage system is. All they care about is how the retrieval system works. Like a coat check at the theater, we don’t care much about which organizational system they use, as long as, after the show, when we hand them our token, they give us back the right coat. When we search for that elusive bit of e-mail, we don’t care how the system stores it away, as long as the process of finding it and bringing it back to us is successful.

… a system that – like a library separated the storage and retrieval… It would use indices for retrieval, and use, well, whatever, for the storage system. We can imagine the storage facility as a sort of digital soup where we could put our records. This soup would accept any record we dumped into it, regardless of its size, length, type or contents. Whenever a record was entered, the program would return a magic cookie; a token that can be used to retrieve the record (those of you with a culinary bent who take exception to mixing sweet with savory may prefer to think of it as a magic crouton for your digital soup). All we have to do is give it back that magic cookie, and the soup will instantly return our record. This is just our storage system, however, and we still need a retrieval system that manages all of those magic cookies for us.”
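To make the storage half of Cooper's idea concrete, here is a minimal sketch in Python. The class name `DigitalSoup` and its `put` and `get` methods are my own inventions for illustration, not anything from Cooper's article. The only point is that the store accepts anything and hands back a token that carries no meaning of its own.

```python
import uuid


class DigitalSoup:
    """Storage only: accepts any record and hands back an opaque token.

    Hypothetical illustration; the names are assumptions, not Cooper's.
    """

    def __init__(self):
        self._records = {}

    def put(self, record):
        # The "magic cookie": a random token that says nothing about
        # the record's type, size or contents.
        cookie = uuid.uuid4().hex
        self._records[cookie] = record
        return cookie

    def get(self, cookie):
        # Hand the cookie back, get the record back.
        return self._records[cookie]


soup = DigitalSoup()
letter_cookie = soup.put("Dear client, please find our proposal attached.")
image_cookie = soup.put(b"\x00\x01\x02")  # any type of record is accepted
```

Nothing here helps you find a record if you have lost its cookie, which is exactly why a separate retrieval system is still needed.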

Both file system and database retrieval tools mirror their underlying storage technology, even though there is no need for this constraint – as demonstrated by Nick Santilli and elsewhere by the Sugar UI team.

I believe the same argument is just as pertinent to a lot of web design. A resource should be able to exist at multiple points within a hierarchy (i.e. a polyhierarchical taxonomy [pdf]) – much like the letter about the business proposal in Nick’s example above – so that users can browse by meaning to find the information they need. However, because existing paradigms conflate retrieval and storage into the idea of a ‘Page’, websites are too often designed as mono-hierarchical structures where a resource only ‘lives’ in one location and where there is only one way to navigate to it.

Now I’m not suggesting a web page should have multiple (canonical) URLs – but I am suggesting that a user should be able to navigate to a resource by more than one route and that resources should be reused within multiple views, if appropriate. This of course requires a shift such that a ‘page’ is seen as a ‘view’ across multiple resources, each of which can be reused, rather than as part of a hierarchy of published webpages. In other words, we need to separate the presentation from the retrieval.
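As a rough sketch of what I mean by a ‘view’ across reusable resources, consider the following Python. The `Resource` and `View` classes, the `routes_to` helper and the example URL are all hypothetical names made up for this illustration. The resource keeps a single canonical URL, yet it can be reached through more than one view.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    url: str    # the single canonical URL (made-up example value below)
    title: str


@dataclass
class View:
    name: str
    resources: list = field(default_factory=list)


proposal = Resource(
    url="/resources/business-proposal-letter",
    title="Business proposal letter",
)

# The same resource is reused in two views; it is not copied or moved.
letters = View("Letters", [proposal])
work = View("Work", [proposal])


def routes_to(resource, views):
    """Return the name of every view through which a resource can be reached."""
    return [view.name for view in views if resource in view.resources]
```

Calling `routes_to(proposal, [letters, work])` lists both views, while the resource itself still has only one URL.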

Below I have tried to outline a model for how information storage, retrieval and presentation could work.

Digital Soup IA model

The model considers five elements or layers, namely:

Resources – potentially useful information objects existing in various formats and structures, in known but perhaps disparate locations. The individual information objects may come in a variety of formats, e.g. PDF, Word, Quark, Excel, ASCII, and structures, e.g. different XML schema, document structures, table formats.

This layer is the ‘digital soup‘ and is only about storage.

Model – the agreed domain model or schema for describing the resources. This might take the form of a controlled vocabulary, Topic Map, ontology or indexing strategy. In essence, this is a description of how the resources will be described.

Classification – the model is used to interpret, relate, contextualise, classify, metatag, or otherwise assign meaning to the resources in a machine-readable and human-understandable way, so that the resources are findable and readable centrally using a common interpretative function.

The model and classification layers are about retrieval – they ‘know’ nothing about how the resources are stored, only how to classify them and how to retrieve them.

Architecture – because a common language now exists to describe the resources they can be configured into (multiple) frames that address the information needs of a user – and as such the configuration is goal-directed.

Interaction – the interface the user interacts with to search and browse for information.
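The middle layers listed above could be sketched roughly as follows. This is only an illustration under my own assumptions: the vocabulary terms, the token strings and the function names `classify` and `frame` are all made up, and a real model might just as well be a Topic Map or an ontology as a flat set of terms.

```python
# Model layer: an agreed controlled vocabulary (terms are made up).
VOCABULARY = {"letter", "work", "proposal"}

# Classification layer: token -> terms, held outside the resource itself.
classification = {}


def classify(token, terms):
    """Attach vocabulary terms to a stored resource, rejecting unknown terms."""
    unknown = set(terms) - VOCABULARY
    if unknown:
        raise ValueError(f"terms not in the model: {sorted(unknown)}")
    classification.setdefault(token, set()).update(terms)


def frame(term):
    """Architecture layer: the tokens of every resource classified under a term."""
    return sorted(token for token, terms in classification.items() if term in terms)


# The tokens stand in for whatever the storage layer handed back.
classify("token-1", ["letter", "work", "proposal"])
classify("token-2", ["letter"])
```

Here `frame("letter")` and `frame("work")` both include `token-1`, so the business proposal letter no longer has to ‘live’ in just one place. Neither function knows anything about where or how the resources are stored.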

The separation of presentation from both storage and retrieval allows the information to be easily re-cut and presented in different formats or in different contexts, and therefore allows the user to navigate to the information in the way they understand the problem. This is analogous to the library system Alan Cooper discusses in his article:

“When the library adds a book, it is given a unique, identifying number. The book is placed on a shelf in sequence based on that number. That is the storage system (True, the Dewey Decimal number has some inherent meaning, and it results in placing the book near related topics*, but for our sake it could be nothing fancier than a plain, serial number). The retrieval system, however, is not only more complex, but it is completely separate from the shelf-based storage system, and it uses a radically different technology. Index cards are typed up that represent the book in three indices: Author, Title and Subject. One author, one title, but there could be several subject entries. The cards are then inserted into their respective indices – little wooden drawers – in alphabetical order. The book’s serial number is recorded on every index card, representing the connection between the retrieval system and the storage system.

When you walk into the library seeking a book, you don’t have to remember its identifying number nor do you have to remember the shelf it was stored on. If it weren’t for the separate retrieval system, you’d have to remember at least one of these to find it, just like in a database system. Instead, in the library, you merely look up the title, author or subject in the card catalog, then follow the pointer to the book on the shelf. The retrieval values are stored external to the stored object. There does not have to be a separate card catalog for paperbacks or videotapes, nor does the lookup key in the card catalog have to be present verbatim on the pages of the stored book.”

* Note: The architecture layer in the model above is analogous to the instantiation of the Dewey Decimal system which results in books about related topics being placed near to each other.
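The card catalogue Cooper describes translates quite directly into code. The sketch below is a hypothetical illustration (the `CardCatalog` class, the `shelf` dictionary and the book details are all made up): three indices hold nothing but serial numbers, and the shelf is keyed only by serial number.

```python
from collections import defaultdict


class CardCatalog:
    """Retrieval only: indices that point at serial numbers, not at shelves."""

    def __init__(self):
        self.by_author = defaultdict(set)
        self.by_title = defaultdict(set)
        self.by_subject = defaultdict(set)

    def add_cards(self, serial, author, title, subjects):
        # One author card, one title card, several subject cards;
        # each one records the book's serial number.
        self.by_author[author].add(serial)
        self.by_title[title].add(serial)
        for subject in subjects:
            self.by_subject[subject].add(serial)


shelf = {}  # storage: serial number -> book
catalog = CardCatalog()

shelf[1] = "full text of an example book"
catalog.add_cards(
    1,
    author="A. Author",          # made-up details
    title="An Example Title",
    subjects=["design", "software"],
)

# Look up by subject, then follow the pointer to the shelf.
found = [shelf[serial] for serial in catalog.by_subject["software"]]
```

Note that the lookup keys live in the catalogue and not in the stored book, so the shelf could be reorganised or replaced without touching a single card.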
