Some thoughts on working out who to trust online

The deplorable attempts to use social media (and much of the mainstream media’s response) to find the bombers of the Boston marathon and then the tweets coming out of the Social Media Summit in New York got me thinking again about how we might get a better understanding of who and what to trust online.

When it comes to online trust I think there are two related questions we should be asking ourselves as technologists:

  1. Can we help people better evaluate the accuracy, trustworthiness or validity of a given news story, tweet, blogpost or other publication?
  2. Can we use social media to better filter those publications, to find the most trustworthy sources or articles?

This second point is also relevant in scientific publishing (a thing I’m trying to help out with these days), where there is keen interest in ‘altmetrics’ as a mechanism to help readers discover and filter research articles.

In academic publishing the need for altmetrics has been driven in part by the rise in the number of articles published, which in turn is being fuelled by the uptake of Open Access publishing. However, I would like to think that we could apply similar lessons to mainstream media output.

MEDLINE literature growth chart

Historically a publisher’s brand has, at least in theory, helped its readers to judge the value and trustworthiness of an article. If I see an article published in Nature, the New York Times or broadcast by the BBC the chances are I’m more likely to trust it than an article published in say the Daily Mail.

Academic publishing has even gone so far as to codify this in a journal’s Impact Factor (IF), an idea that Larry Page later used as the basis for his PageRank algorithm.

The premise behind the Impact Factor is that you can identify the best journals, and therefore the best content, by measuring how frequently the average article in that journal has been cited in a particular year or period.
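As a rough sketch of the standard two-year calculation (the figures below are made up purely for illustration):

```python
# A minimal sketch of the standard two-year Impact Factor calculation.
# The numbers are invented for illustration only.
def impact_factor(citations_this_year, citable_items_previous_two_years):
    """Citations received this year to items published in the previous two
    years, divided by the number of citable items published in those years."""
    return citations_this_year / citable_items_previous_two_years

# e.g. 3,000 citations in 2012 to the 750 items published in 2010-11
print(impact_factor(3000, 750))  # 4.0
```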

Simplistically then, a journal can improve its Impact Factor by ensuring it only publishes the best research. ‘Good journals’ can then act as trusted guides to their readership – pre-filtering the world’s research output to bring their readers only the best.

Obviously this can go wrong: good research is published outside of high Impact Factor journals, journals can publish poor research, and the mainstream media is so rife with published piffle that the likes of Ben Goldacre can make a career out of exposing it.

As is often noted the web has enabled all of us to be publishers. It scarcely needs saying that it is now trivially easy for anyone to broadcast their thoughts or post a video or photograph to the Web.

This means that social media is now able to ‘break’ a story before the mainstream media. However, it also presents a problem: how do you know if it’s true? Without brands (or Impact Factors) to guide you, how do you judge whether a photo, tweet or blogpost should be trusted?

There are plenty of services out there that aggregate tweets, comments, likes, +1s etc. to help you find the most talked-about story. Indeed, most social media services themselves let you find what’s ‘hot’ or most talked about. All these services, however, seem to assume that there is wisdom in crowds – that the more talked about something is, the more trustworthy it is. But as Oliver Reichenstein pointed out:

“There is one thing crowds have a flair for, and it is not wisdom, it’s rage.”

Relying on point data (most tweeted, most commented etc.) to filter content or evaluate its trustworthiness – whether that content is social media or mainstream media – seems to me to be foolish.

It seems to me that a better solution would be to build a ‘trust graph’ which in turn could be used to assign a score to each person for a given topic based on their network of friends and followers. It could work something like this…

If a person is followed by a significant number of people who have published peer-reviewed papers on a given topic, or if they have published in that field themselves, then we should trust what that person says about that topic more than we would the average person.

Equally, if a person has posted a large number of photos, tweets etc. over a long period of time from a given city, and they are followed by other people from that city (people who themselves have a history of posts from that city over a period of time), then we might conclude that their photographs really are from that city when they say they are.

Or if a person is retweeted by someone you trust for other reasons (e.g. because you know them), then that might give you more confidence that their comments and posts are truthful and accurate.

PageRank is Google’s link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents, with the purpose of “measuring” its relative importance within the set.

Whatever the specifics, the point I’m trying to make is that rather than relying on a single number or count, we should try to build a directed graph where each person can be assigned a trust or knowledge score based on the strength of their network in that subject area. This is somewhat analogous to Google’s PageRank algorithm.
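To make that concrete, here is a minimal sketch – a toy follower graph with invented names, a hand-picked damping factor and a fixed number of iterations, so an illustration of the idea rather than a production algorithm – of how a per-topic trust score could be propagated around such a graph, PageRank-style:

```python
# A toy 'trust graph'. Each entry lists who follows that person; a couple of
# people are seeded as credible on the topic (say, they have published
# peer-reviewed work on it) and trust then flows along the follow edges.
followers = {
    "reporter": ["dr_smith", "dr_jones", "random_account"],
    "dr_jones": ["dr_smith"],
    "dr_smith": [],
    "random_account": [],
}
seed = {"dr_smith": 1.0, "dr_jones": 1.0}   # published on the topic

def topic_trust(followers, seed, damping=0.85, iterations=50):
    people = list(followers)
    # how many people each person follows (so prolific followers count for less)
    follows = {p: sum(p in flist for flist in followers.values()) for p in people}
    scores = {p: seed.get(p, 0.0) for p in people}
    for _ in range(iterations):
        scores = {
            p: (1 - damping) * seed.get(p, 0.0)
               + damping * sum(scores[f] / max(follows[f], 1) for f in followers[p])
            for p in people
        }
    return scores

# The reporter, followed by both researchers, ends up the most trusted on this
# topic; the account followed by no credible person scores zero.
print(topic_trust(followers, seed))
```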

Before Google, search engines effectively counted the frequency of a given word on a Webpage to assign it a relevancy score – much as we do today when we count the number of comments, tweets etc. to help filter content.

What Larry Page realised was that by assigning a score based on the number and weight of inbound links for a given keyword, he and Sergey Brin were able to design and build a much better search engine – one that relies not just on what the publisher tells us, nor simply on the number of links, but on the quality of those links. A link from a trusted source is worth more than a link from an average webpage.
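For reference, the canonical PageRank formulation captures exactly that idea: a page’s score depends on the scores of the pages linking to it, each diluted by how many links the linking page makes. In its commonly quoted, normalised form (a standard statement of the algorithm, not a quotation from the original paper):

```latex
PR(p) = \frac{1 - d}{N} + d \sum_{q \,\in\, \mathrm{links\ to}\ p} \frac{PR(q)}{L(q)}
```

where d is the damping factor, N the total number of pages and L(q) the number of outbound links on page q. The trust graph sketched above simply swaps ‘links to’ for ‘follows’ and scopes the whole calculation to a topic.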

Building a trust graph along similar lines – where we evaluate not just the frequency of (re)tweets, comments, likes and blogposts but also consider who those people are, who’s in their network and what their network of followers thinks of them – could help us filter and evaluate content, whether it be social or mainstream media, and minimise the damage done by those who don’t tweet responsibly.

Interesting stuff from around the web 2009-04-22

Amazing render job by Alessandro Prodan

The open web

Does OpenID need to be hard? [factoryjoe.com]
Chris considers “the big fat stinking elephant in the room: OpenID usability and the paradox of choice”. As usual it’s a good read.

I wonder whether restricting the list of OpenID providers displayed based on visited links would help, i.e. hiding those that haven’t been visited? It clearly wouldn’t be perfect – Google isn’t my OpenID provider but I visit google.com lots – but it should cut down some of the clutter.

Security flaw leads Twitter, others to pull OAuth support [cnet.com]
The hole makes it possible for a hacker to use social-engineering tactics to trick users into exposing their data. The OAuth protocol itself requires tweaking to remove the vulnerability, and a source close to OAuth’s development team said that there have been no known violations, that it has been aware of it for a few days now, and has been coordinating responses with vendors. A solution should be announced soon.

Twitter and social networks

Relationship Symmetry in Social Networks: Why Facebook will go Fully Asymmetric [bokardo.com]
Asymmetric model better mimics how real attention works…and how it has always worked. Any person using Twitter can have a larger number of followers than followees, effectively giving them more attention than they give. This attention inequality is the foundation of the Twitter service… The IA of Facebook does not allow this. Facebook has designed a service that forces you to keep track of your friends, whether you want to or not. Facebook is modeling personal relationships, not relationships based on attention. That’s the crucial difference between Facebook and Twitter at the moment.

When Twitter Gets Weird… [Dave Gorman]
“The difference between following someone and replying to them is the difference between stopping to chat with someone in the street or giving them a badge declaring that you know them. One is actual interaction. The other is just something you can show your friends.” Blimey – Dave Gorman clearly has a much better grasp of life, the web and being a human than the two people who attacked him for not following them on Twitter. As Dave points out, he hopes that Twitter doesn’t descend into the MySpace “thanks for the add” nonsense. Me too.

Google profiles included in search results [googleblog]
A new “Profile results” section will appear at the bottom of a Google search page, when it finds a strong match in response to a name-based search. But only in the US. To help things along remember to use rel=me elsewhere (here’s how).

Shortlisted for a BAFTA, launch of clickable tracklistings and the start of BBC Earth

Look, look clickable tracklistings, w00t!
Few will ever know the pain it took to get this useful little (cross-domain) feature live.

We’ve been shortlisted for an Interactive Innovation BAFTA
The /programmes aka Automated Programme Support project. So proud.

Out of the Wild [bbc.co.uk]
Our first tentative steps towards improving the BBC’s online natural history offering. Out of The Wild seeks to bring you stories from BBC crews on location. Eventually this should all form part of an integrated programme offer.

Stuff

Biological Taxonomy Vocabulary
An RDF vocabulary for the taxonomy of all forms of life.

On url shorteners [joshua.schachter.org]
Joshua Schachter considers the issues associated with URL shortening. Similar argument to the one I put forward in “The URL shortening antipattern” but with some useful recommendations: “One important conclusion is that services providing transit (or at least require a shortening service) should at least log all redirects, in case the shortening services disappear. If the data is as important as everyone seems to think, they should own it. And websites that generate very long URLs, such as map sites, could provide their own shortening services. Or, better yet, take steps to keep the URLs from growing monstrous in the first place.”

Identity, relationships and why OAuth and OpenID matter

Twitter hasn’t had a good start to 2009: it was hacked via a phishing scam and then there were concerns that your passwords were up for sale, and that’s not a good thing. There may be a silver lining to Twitter’s cloud, though, because all this has also reopened the password anti-pattern debate and the use of OAuth as a solution to the problem. Indeed it now looks like Twitter will be implementing OAuth as a result. W00t!

Day 68 :: touch by Meredith Farmer (Flickr). Some rights reserved.

However, while it is great news that Twitter will be implementing OAuth soon, it hasn’t yet, and there are plenty of other services that don’t use it. It’s therefore worth pausing for a moment to consider how we’ve got here and what the issues are, because while it will be great, right now it’s a bit rubbish.

We shouldn’t assume that either Twitter or the developers responsible for the third-party apps (those requesting your credentials) are trying to do anything malicious — far from it — as Chris Messina explains:

The difference between run-of-the-mill phishing and password anti-pattern cases is intent. Most third parties implement the anti-pattern out of necessity, in order to provide an enhanced service. The vast majority don’t do it to be malicious or because they intend to abuse their customers — quite the contrary! However, by accepting and storing customer credentials, these third parties are putting themselves in a potentially untenable situation: servers get hacked, data leaks and sometimes companies — along with their assets — are sold off with untold consequences for the integrity — or safety — of the original customer data.

The folks at Twitter are very aware of the risks associated with their users giving out usernames and passwords. But they also have concerns about the fix:

The downside is that OAuth suffers from many of the frustrating user experience issues and phishing scenarios that OpenID does. The workflow of opening an application, being bounced to your browser, having to login to twitter.com, approving the application, and then bouncing back is going to be lost on many novice users, or used as a means to phish them. Hopefully in time users will be educated, particularly as OAuth becomes the standard way to do API authentication.

Another downside is that OAuth is a hassle for developers. BasicAuth couldn’t be simpler (heck, it’s got “basic” in the name). OAuth requires a new set of tools. Those tools are currently semi-mature, but again, with time I’m confident they’ll improve. In the meantime, OAuth will greatly increase the barrier to entry for the Twitter API, something I’m not thrilled about.
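For anyone unfamiliar with the ‘bounce to the browser and back’ dance described in that quote, here is a rough sketch of the three-legged OAuth 1.0a flow against Twitter’s endpoints. It assumes the Python requests_oauthlib library (which post-dates this post) and placeholder keys – an illustration of the workflow, not official sample code.

```python
# A sketch of the three-legged OAuth 1.0a 'dance' described above, using the
# requests_oauthlib library and placeholder credentials.
from requests_oauthlib import OAuth1Session

CONSUMER_KEY = "your-app-key"        # issued when you register the application
CONSUMER_SECRET = "your-app-secret"

# 1. The application asks Twitter for a temporary request token.
oauth = OAuth1Session(CONSUMER_KEY, client_secret=CONSUMER_SECRET)
request_token = oauth.fetch_request_token("https://api.twitter.com/oauth/request_token")

# 2. The user is bounced to their browser to log in to twitter.com and approve
#    the application, then comes back with a verifier code.
print("Authorise here:", oauth.authorization_url("https://api.twitter.com/oauth/authorize"))
verifier = input("Verifier: ")

# 3. The application exchanges the approved request token for an access token,
#    which it stores instead of the user's password.
oauth = OAuth1Session(
    CONSUMER_KEY,
    client_secret=CONSUMER_SECRET,
    resource_owner_key=request_token["oauth_token"],
    resource_owner_secret=request_token["oauth_token_secret"],
    verifier=verifier,
)
access_token = oauth.fetch_access_token("https://api.twitter.com/oauth/access_token")
```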

Alex also points out that OAuth isn’t a magic bullet.

It also doesn’t change the fact that someone could sell OAuth tokens, although OAuth makes it easier to revoke credentials for a single application or site, rather than changing your password, which revokes credentials to all applications.

This doesn’t even begin to address the phishing threat that OAuth encourages – its own “anti-pattern”. Anyone confused about this would do well to read Lachlan Hardy’s blog post about this from earlier in 2008: http://log.lachstock.com.au/past/2008/4/1/phishing-fools/.

All these are valid points — and Ben Ward has written an excellent post discussing the UX issues and options associated with OAuth — but they also miss something very important: you can’t store someone’s identity without having a relationship.

Digital identities exist to enable human experiences online, and if you store someone’s identity you have a relationship with that person. So when you force third-party apps into collecting usernames, passwords (and any other snippet of someone’s identity), you force those users into having a relationship with that company — whether or not the individual or the company wants it.

With technology we tend not to enable trust in the way most people use the term. Trust is based on relationships. In close relationships we make frequent, accurate observations that lead to better understanding and closer relationships; this process, however, requires investment and commitment. That said, a useful, good relationship provides value for all parties. Jamie Lewis has suggested that there are three types of relationship (on the web):

  1. Custodial Identities — identities are directly maintained by an organisation and a person has a direct relationship with the organisation;
  2. Contextual Identities — third parties are allowed to use some parts of an identity for certain purposes;
  3. Transactional Identities — credentials are passed for a limited time for a specific purpose to a third party.

Of course there are also some parts to identity which are shared and not wholly owned by any one party.

This mirrors how real-world identities work. Our banks, employers and governments maintain custodial identities; whereas a pub, validating your age before serving alcohol, needs only a yes/no question answered — are you over 18?
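A toy sketch of that yes/no exchange – the class and method names are invented, the point is simply that the custodian answers the question without handing over the underlying data:

```python
# A minimal sketch of a 'transactional' identity check: the pub gets a yes/no
# answer, never the date of birth itself. Names are illustrative only.
from datetime import date

class IdentityProvider:
    """Custodian of the full identity (e.g. a bank or government service)."""
    def __init__(self, date_of_birth):
        self._dob = date_of_birth          # held by the custodian, never shared

    def is_over(self, years, on):
        # True if the person is at least `years` old on the given date
        had_birthday = (on.month, on.day) >= (self._dob.month, self._dob.day)
        return (on.year - self._dob.year - (0 if had_birthday else 1)) >= years

# The pub asks a single yes/no question and learns nothing else.
provider = IdentityProvider(date(1990, 6, 1))
print(provider.is_over(18, date(2009, 1, 20)))   # True
```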

Twitter acts as a custodian for part of my online identity, and I don’t want third-party applications that use the Twitter API to also act as custodians; but the lack of OAuth support means that, whether they or I like it, they have to. They should only hold my transactional identity. Forcing them to hold a custodial identity places both parties (me and the service using the Twitter API) at risk and places unnecessary costs on the third-party service (whether they realise it or not!).

But, if I’m honest, I don’t really want Twitter to act as Custodian for my Identity either — I would rather they held my Contextual Identity and my OpenID provider provided the Custodial Identity. That way I can pick a provider I trust to provide a secure identity service and then authorise Twitter to use part of my identity for a specific purpose, in this case micro-blogging. Services using the Twitter API then either use a transactional identity or reuse the contextual identity. I can then control my online identity, those organisations that have invested in appropriate security can provide Custodial Identity services and an ecosystem of services can be built on top of that.

UPDATE

Just wanted to correct a couple of mistakes, as pointed out by Chris, below:

1. Twitter was hacked with a dictionary attack against an admin’s account. Not from phishing, and not from a third-party’s database with Twitter credentials.
2. The phishing scam worked because it tricked people into thinking that they received a real email from Twitter.

Neither OpenID nor OAuth would have prevented this (although that’s not to say Twitter shouldn’t implement OAuth). Sorry about that.

Online communities are about people, stupid

Flickr, Twitter and Facebook all work because they are primarily about people. Photos, status updates, messages and comments are all secondary; they are the social glue that helps make the community work. And if you doubt me then consider this – Heather Powazek Champ, the Director of Community at Flickr, has reported that:

People have fallen in love on Flickr. Some have proposed over Flickr. It’s just a delightful thing for so many people, and I get to spend my days with them.

Liverpool Street station crowd blur. By David Sims, some rights reserved.

Flickr is about the social nature of photography. Strangers meet online to comment on each others’ photography, form and join groups based on common interests and share photos that document and categorize the visible world. Likewise Twitter isn’t simply a stream of the world’s consciousness, it’s a semi-overlapping stream of activity – some public, some private and some semi-public.

It seems to me that it is the semi-public, semi-overlapping aspects that make services like Flickr and Twitter work so well, because they help reinforce the social. Consider the alternative: YouTube, for all its success as a video uploading and publishing service, is a mess when it comes to its community. In fact there’s no community, there are just banal comments which often don’t get much better than “LOL”.

Flickr on the other hand doesn’t try to be an all purpose photo publishing service, it’s a photo-sharing service primarily aimed at sharing photos with your friends, family and others with a common interest. That’s not to say that there isn’t also a public sharing aspect to Flickr; indeed most of the photos on this blog (including the one used in this post) are from Flickr, and in the main, from people I don’t know. There is a public aspect to Flickr, just as there is a public aspect to Twitter, but these aren’t the primary use cases. The primary use cases are those associated with the semi-public: finding and connecting to friends; sharing photos, ideas and your thoughts with friends, that sort of thing.

The semi-public nature of these services also means that the community can, and does, develop and enforce community rules. With Flickr these are site-wide rules, as Heather Powazek Champ puts it:

“We don’t need to be the photo-sharing site for all people. We don’t need to take all comers. It’s important to me that Flickr was built on certain principles.” And so they’re defended — and evaluated — constantly.

With Twitter the rules are more personal, more contextual and as a result so are the communities. You get to choose who you follow and only those people are then part of your timeline. If you don’t follow someone then you won’t be bothered with their updates (and they can’t direct message you).

This shouldn’t be surprising since this is pretty much what happens in the real world. You have networks of friends whose conversations overlap, and whose conversations are sometimes held in private and sometimes semi-public.

So what does all this mean? Well, for one thing it means that unless you want banal comments and no real community, you need to build people into your service as primary objects, rather than treating their comments, content and stuff as the primary objects. You also need to work out how to allow semi-overlapping activity streams. It also probably means that you shouldn’t design for ‘user generated content’, since this will tend to make you think about the user’s content rather than the people and their community.
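A minimal sketch of the difference – entirely made-up structure, not a description of how Flickr or Twitter are actually built – where people are the primary objects, content hangs off them, and each person’s timeline is the semi-overlapping stream of the people they chose to follow:

```python
# People as the primary objects: content is attached to a person, and a
# timeline is just the semi-overlapping stream of the people you chose to
# follow. Names and structure are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    following: list = field(default_factory=list)   # who this person chose to follow
    posts: list = field(default_factory=list)       # (text, public?) pairs

    def post(self, text, public=True):
        self.posts.append((text, public))

    def timeline(self):
        # Only the people you follow appear, and only their public or
        # semi-public posts; everyone else's updates never bother you.
        return [(p.name, text) for p in self.following
                for text, public in p.posts if public]

alice, bob, carol = Person("alice"), Person("bob"), Person("carol")
alice.following = [bob]                 # alice follows bob, but not carol
bob.post("hello from bob")
carol.post("shouting into the void")
print(alice.timeline())                 # [('bob', 'hello from bob')]
```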

Media companies should embrace the generative nature of the web

Generativity – the ability to remix different pieces of the web or deploy new code without gatekeepers, so that anyone can repurpose, remix or reuse the original content or service for a different purpose – is going to be at the heart of successful media companies.

Depth of field (Per Foreby)

As Jonathan Zittrain points out in The Future of the Internet (and how to stop it) the web’s success is largely because it is a generative platform.

The Internet is also a generative system to its very core as is each and every layer built upon this core. This means that anyone can build upon the work of those that went before them – this is why the Internet architecture, to this day, is still delivering decentralized innovation.

This is true at a technological level: XMPP, OAuth and OpenID, for example, are all technologies that have been invented because the technology layers upon which they are built are open, adaptable and easy for others to reuse and master. It is also true at the content level – Wikipedia is only possible because it is built as a true web citizen, as are blogging platforms and services such as MusicBrainz – these services allow anyone to create or modify content without the need for strict rules and controls.

But what has this got to do with the success or otherwise of any media company or any content publisher? After all just because the underlying technology stack is generative doesn’t mean that what you build must be generative. There are, after all, plenty of successful walled gardens and tethered appliances out there. The answer, in part, depends on what you believe the future of the Web will look like.

Tim Berners-Lee presents a pretty compelling view in his article on The Giant Global Graph. In it he explains how the evolution of the Internet has seen a move from a network of computers, through the Internet, to a web of documents, and that we are now seeing a migration to a ‘web of concepts’.

[The Internet] made life simpler and more powerful. It made it simpler because instead of having to navigate phone lines from one computer to the next, you could write programs as though the net were just one big cloud, where messages went in at your computer and came out at the destination one. The realization was, “It isn’t the cables, it is the computers which are interesting”. The Net was designed to allow the computers to be seen without having to see the cables. […]

The WWW increases the power we have as users again. The realization was “It isn’t the computers, but the documents which are interesting”. Now you could browse around a sea of documents without having to worry about which computer they were stored on. Simpler, more powerful. Obvious, really. […]

Now, people are making another mental move. There is realization now, “It’s not the documents, it is the things they are about which are important”. Obvious, really.

If you believe this – if you believe that there is a move from a web of documents to a web of concepts – then you can start to see why media companies will need to start publishing data the right way: publishing it so that they, and others, can help people find the things they are interested in. How does this happen then? For starters we need a mechanism by which we can identify things and the relationships between them – at a level above that of the document. And that’s just what the semantic web technologies are for – they give different organisations a common way of describing the relationships between things. For example, the Programmes Ontology allows any media company to describe the nature of a programme; the Music Ontology any artist, release or label.
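As a hedged illustration of what that looks like in practice – using Python’s rdflib library, the Programmes Ontology (po:) namespace and an entirely made-up programme – describing the thing rather than the page might look something like this:

```python
# A small sketch of publishing 'things, not documents': an RDF description of
# a (made-up) brand and one of its episodes using the Programmes Ontology.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

PO = Namespace("http://purl.org/ontology/po/")
DC = Namespace("http://purl.org/dc/elements/1.1/")

brand = URIRef("http://example.org/programmes/b0000001#programme")    # illustrative URIs
episode = URIRef("http://example.org/programmes/b0000002#programme")

g = Graph()
g.bind("po", PO)
g.bind("dc", DC)
g.add((brand, RDF.type, PO.Brand))
g.add((brand, DC.title, Literal("A Natural History Series")))
g.add((episode, RDF.type, PO.Episode))
g.add((brand, PO.episode, episode))     # the relationship between the two things

print(g.serialize(format="turtle"))
```

Because the vocabulary is shared, anyone else can merge this graph with data about the same things published elsewhere, without first reading anyone’s API documentation.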

This implies a couple of different, but related, things. Firstly, it highlights the importance of links. Links are an expression of a person’s interests. I choose what to link to from this blog – which words, which subjects to link from and where to – and my choice of links provides you with a view onto how I see the subject beyond what I write here. The links give you insight into who I trust and what I read. And of course they allow others to aggregate my content around those subjects.

It also implies that we need a common way of doing things – a way of doing things that allows others to build with, and on top of, the original publisher’s content. This isn’t about giving up your rights over your content; rather it is about letting it be connected to content from peer sites. It is about joining contextually relevant information from other sites and other applications. As Tim Berners-Lee points out, this is similar to the transition we had to make in going from interconnected computers to the Web.

People running Internet systems had to let their computer be used for forwarding other people’s packets, and connecting new applications they had no control over. People making web sites sometimes tried to legally prevent others from linking into the site, as they wanted complete control of the user experience, and they would not link out as they did not want people to escape. Until after a few months they realized how the web works. And the re-use kicked in. And the payoff started blowing people’s minds.

Because the Internet is a generative system it means it has a different philosophy from most other data discovery systems and APIs (including some that are built with Internet technologies), as Ed Summers explains:

…which all differ in their implementation details and require you to digest their API documentation before you can do anything useful. Contrast this with the Web of Data which uses the ubiquitous technologies of URIs and HTTP plus the secret sauce of the RDF triple.

They also often require the owner of the service or API to give permission for third parties to use those services, often mediated via API keys. This is bad: had the Web, or the Internet before it, adopted a similar approach rather than the generative approach it did take, we would not have seen the level of innovation we have, and as a result we would not have enjoyed the financial, social and political benefits we have derived from it.

Of course there are plenty of examples of people who have been able to work with the web of documents – everything from 800lb gorillas like Google through to sites like After Our Time and Speechification – both of which provide users with a new and distinctive service while also helping to drive traffic and raise brand awareness for the BBC. Just think what would also be possible if transcripts, permanent audio and research notes were also made available, not only as HTML but also as RDF, joining content inside and outside the BBC to create a system which demonstrates, in Zittrain’s words, “a system’s capacity to produce unanticipated change through unfiltered contributions from broad and varied audiences.”

Interesting stuff from around the web 2008-09-21

Eadweard Muybridge’s 1878 investigation into whether horses’ feet were actually all off the ground at once during a trot.

Born To Run – Human Evolution [DISCOVER Magazine]
Biomechanical research reveals a surprising key to the survival of our species: Humans are built to outrun nearly every other animal on the planet over long distances.

Prisoner’s Dilemma Visualisation [James Alliban]
Nice visualisation of the Prisoner’s Dilemma (a classic example of game theory) using Processing.

More Google news

Google Visualization API [Google Code]
The Google Visualization API lets you access multiple sources of structured data that you can display, choosing from a large selection of visualizations.

GAudi – Google’s new audio index [Official Google Blog]
It’s currently in Google Labs and is restricted to content from political sources but it still looks interesting. In addition to being able to search for terms you can also jump directly to the point in the video where the keyword is mentioned.

The social web: All about the small stuff [Official Google Blog]
The promise of the social web is about making it easy to share the small stuff – to make it effortless and rebuild that feeling of connectedness that comes from knowing the details.

More background on Matt’s hack: streaming content to iTunes

Things to do with /programmes #431: iTunes! [BBC Radio Labs]
Matt’s write up of his work on streaming iPlayer content through iTunes.

Very surprised the blogosphere hasn’t picked up on the implications of this hack, but there you go.

Formally modelling a trust network – a sign of hubris?

“Interactivity. Many-to-many communications. Pervasive networking. These are cumbersome new terms for elements in our lives so fundamental that, before we lost them, we didn’t even know to have names for them.” Clever man Douglas Adams. You see he wrote this in 1999 and clearly understood, even then, the nature of the web so much better than most of us do even today.

"Camp fire on the beach" by joaquimb. Used under license.
Camp fire on the beach, by joaquimb. Used under license.

As Douglas Adams points out, the Internet is still novel – it’s very easy to forget that despite its incredible uptake the world has only had the Web since 1991. That’s really not very long. We are still getting used to it, still working out how to use it. But back in 1999 Douglas Adams clearly understood that one thing you shouldn’t be trying to do is model human trust, and that’s because our brains do the job so much better.

Working out the social politics of who you can trust and why is, quite literally, what a very large part of our brain has evolved to do.

Although the Internet is a new technology, it is in many ways a return to a more traditional form of entertainment. The sit-back-and-consume world of 20th-century entertainment is the abnormality; TV, radio and the cinema are the aberrations because they aren’t interactive. All other forms of entertainment up until the early 20th century (and an increasing amount of entertainment since) are ‘interactive’ – it’s just that we didn’t call them interactive entertainment because that would be silly – “a game of interactive cricket anyone?”

Unfortunately we are currently looking at the Internet from the perspective of the non-interactive entertainment world of TV and radio. And that perspective isn’t helpful. As Adams puts it:

Newsreaders still feel it is worth a special and rather worrying mention if, for instance, a crime was planned by people ‘over the Internet’. They don’t bother to mention when criminals use the telephone or the M4, or discuss their dastardly plans ‘over a cup of tea,’ though each of these was new and controversial in their day.

Possibly because people see interactive entertainment as new and different, they believe that they therefore need to build policies and models to express human trust into their web apps. The trouble is it just isn’t necessary – worse, it doesn’t work. Our brains are great at working out who and what to trust – you just need to expose enough information so we can make the decisions. Attempting to formally model a trust network, on the other hand, seems to me to be a sign of hubris.

Of course you can’t trust what people tell you on the web anymore than you can trust what people tell you on megaphones, postcards or in restaurants… For some batty reason we turn off this natural scepticism when we see things in any medium which require a lot of work or resources to work in, or in which we can’t easily answer back – like newspapers, television or granite. Hence “carved in stone.” What should concern us is not that we can’t take what we read on the internet on trust – of course you can’t, it’s just people talking – but that we ever got into the dangerous habit of believing what we read in the newspapers or saw on the TV – a mistake that no one who has met an actual journalist would ever make. One of the most important things you learn from the internet is that there is no “them” out there. It’s just an awful lot of “us”.

What you need, then, is not a model of trust; instead you need a mechanism to answer back. Actually you need a bit more than that – you need a mechanism to identify a person online, ideally wherever they appear on the web, via OpenID and FOAF for example. You also want to know who their friends are, or more specifically who claims to be friends with them. So, for example, if I can see that someone is a friend of a friend, I’m more likely to trust them than if neither I nor my friends have a connection with that person.

I also want to be able to read what they say and do online. If I can read their blog, look at their comments, check out their del.icio.us feed or twitter stream etc. then all the better. And since we are talking about online social networks this shouldn’t be too unreasonable.

Our brains are very good at processing this kind of social relationship information so we can assess whether or not we should trust a person, or more importantly to assess when and in which context to trust a person. I would trust Nick’s advice on say how to build my own home brew radio (in a lunch box) but not which pet to buy.

I remember Dan talking about the social graph and saying how he felt uncomfortable about the way XFN encouraged you to assert the nature of the relationship: “nope, you’re not my ‘friend’, you’re an ‘acquaintance’ or ‘co-worker’ etc.” Which is why FOAF just has ‘knows’. This might just be because Dan is a nice bloke, but I have to agree it is a bit weird categorising the nature of your relationships the XFN way. More pragmatically, it’s also just not that helpful to model this information. All you really need is a mechanism to assert that there is a relationship and a URI to identify the person; you can then go and dereference the resource to work out whether you should trust that person or not in a given context.
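To end with a sketch of that last point (rdflib again, with made-up URIs): assert only that the person exists at a URI and that a relationship exists, and leave the judgement of trust to whoever dereferences it.

```python
# Assert only that a relationship exists: a single foaf:knows triple between
# two people identified by URIs (both URIs invented for this example).
from rdflib import Graph, Literal, Namespace, RDF, URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

me = URIRef("http://example.org/me#i")
dan = URIRef("http://example.org/dan#i")

g = Graph()
g.bind("foaf", FOAF)
g.add((me, RDF.type, FOAF.Person))
g.add((me, FOAF.name, Literal("Example Person")))
g.add((me, FOAF.knows, dan))   # no 'friend' vs 'acquaintance' - just 'knows'
print(g.serialize(format="turtle"))
```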