Some thoughts on working out who to trust online

The deplorable attempts to use social media (and much of the mainstream media’s response) to find the bombers of the Boston marathon and then the tweets coming out of the Social Media Summit in New York got me thinking again about how we might get a better understanding of who and what to trust online.

When it comes to online trust I think there are two related questions we should be asking ourselves as technologists:

  1. can we help people better evaluate the accuracy, trustworthiness or validity of a given news story, tweet, blogpost or other publication?
  2. can we use social media to better filter those publications and find the most trustworthy sources or articles?

This second point is also relevant in scientific publishing (a thing I’m trying to help out with these days) where there is keen interest in ‘altmetrics’ as a mechanism to help readers discover and filter research articles.

In academic publishing the need for altmetrics has been driven in part by the rise in the number of articles published which in turn is being fuelled by the uptake of Open Access publishing. However, I would like to think that we could apply similar lessons to mainstream media output.

MEDLINE literature growth chart

Historically a publisher’s brand has, at least in theory, helped its readers to judge the value and trustworthiness of an article. If I see an article published in Nature or the New York Times, or broadcast by the BBC, the chances are I’m more likely to trust it than an article published in, say, the Daily Mail.

Academic publishing has even gone so far as to codify this in a journal’s Impact Factor (IF), an idea that Larry Page later used as the basis for his PageRank algorithm.

The premise behind the Impact Factor is that you can identify the best journals, and therefore the best content, by measuring the frequency with which the average article in that journal has been cited in a particular year or period.

Simplistically then, a journal can improve its Impact Factor by ensuring it only publishes the best research. ‘Good journals’ can then act as trusted guides for their readership, pre-filtering the world’s research output to bring their readers only the best.
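To make the arithmetic concrete, here is a minimal sketch of the standard two-year Impact Factor calculation; the numbers are made up purely for illustration.

```python
# Rough sketch of the standard two-year Impact Factor calculation.
# All of the numbers below are made up, purely for illustration.

def impact_factor(citations_this_year: int, citable_items_prior_two_years: int) -> float:
    """IF for year Y = citations received in Y to items published in Y-1 and Y-2,
    divided by the number of citable items published in Y-1 and Y-2."""
    return citations_this_year / citable_items_prior_two_years

# A journal that published 200 articles over the previous two years, and whose
# articles picked up 850 citations this year, has an Impact Factor of 4.25.
print(impact_factor(850, 200))  # 4.25
```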

Obviously this can go wrong: good research is published outside of high Impact Factor journals, journals can publish poor research, and the mainstream media is so rife with examples of published piffle that the likes of Ben Goldacre can make a career out of exposing it.

As is often noted the web has enabled all of us to be publishers. It scarcely needs saying that it is now trivially easy for anyone to broadcast their thoughts or post a video or photograph to the Web.

This means that social media is now able to ‘break’ a story before the mainstream media. However, it also presents a problem: how do you know if it’s true? Without brands (or IF) to help guide you, how do you judge whether a photo, tweet or blogpost should be trusted?

There are plenty of services out there that aggregate tweets, comments, likes, +1s and the like to help you find the most talked-about story. Indeed, most social media services themselves let you find ‘what’s hot’ or most talked about. All these services, however, seem to assume that there is wisdom in crowds – that the more talked about something is, the more trustworthy it is. But as Oliver Reichenstein pointed out:

“There is one thing crowds have a flair for, and it is not wisdom, it’s rage.”

Relying on point data (most tweeted, most commented and so on) to help filter content or evaluate its trustworthiness, whether that content be social media or mainstream media, seems to me to be foolish.

It seems to me that a better solution would be to build a ‘trust graph’ which in turn could be used to assign a score to each person for a given topic based on their network of friends and followers. It could work something like this…

If a person is followed by a significant number of people who have published peer-reviewed papers on a given topic, or if they have published in that field themselves, then we should trust what that person says about that topic more than we would the average person.

Equally, if a person has posted a large number of photos, tweets and the like from a given city over a long period of time, and they are followed by other people from that city (as defined by people who themselves have a history of posts from that city), then we might reasonably conclude that their photographs are from that city if they say they are.

Or if a person is retweeted by someone you trust for other reasons (e.g. because you know them), then that might give you more confidence that their comments and posts are truthful and accurate.
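To make that a little more concrete, here is a toy sketch of the first heuristic. Everything in it (the names, the data structures, the hypothetical topic_trust function and the weights) is an illustrative assumption rather than a real service or API: a person’s score for a topic combines their own publication record with the expertise of the people who choose to follow them.

```python
# Toy sketch only: who follows whom, and who has published peer-reviewed
# papers on which topics. All names, counts and weights are made up.

followers = {
    "alice": {"bob", "carol", "dave"},   # people who follow alice
    "bob": {"carol"},
}

papers_on_topic = {
    "genomics": {"alice": 1, "bob": 4, "carol": 7, "dave": 0},
}

def topic_trust(person: str, topic: str) -> float:
    """Score a person on a topic by their own papers plus how many of their
    followers have themselves published on that topic."""
    papers = papers_on_topic.get(topic, {})
    own_papers = papers.get(person, 0)
    expert_followers = sum(
        1 for f in followers.get(person, set()) if papers.get(f, 0) > 0
    )
    # Arbitrary weights: own publications count double an expert follower.
    return 2.0 * own_papers + 1.0 * expert_followers

print(topic_trust("alice", "genomics"))  # 2*1 + 2 expert followers = 4.0
```

The weights are arbitrary; the point is simply that the score depends on who the followers are, not merely on how many of them there are.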

PageRank is Google’s link analysis algorithm; it assigns a numerical weighting to each element of a hyperlinked set of documents, with the purpose of “measuring” its relative importance within the set.

Whatever the specifics, the point I’m trying to make is that rather than relying on a single number or count, we should try to build a directed graph where each person can be assigned a trust or knowledge score based on the strength of their network in that subject area. This is somewhat analogous to Google’s PageRank algorithm.

Before Google, search engines effectively counted the frequency of a given word on a Webpage to assign it a relevancy score – much as we do today when we count the number of comments, tweets etc. to help filter content.

What Larry Page realised was that by assigning a score based on the number and weight of inbound links for a given keyword, he and Sergey Brin were able to design and build a much better search engine – one that relies not just on what the publisher tells us, nor simply on the number of links, but on the quality of those links. A link from a trusted source is worth more than a link from an average webpage.
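For a flavour of what that looks like in code, here is a minimal power-iteration sketch of the PageRank idea applied to a follower graph rather than to web pages. The graph, the damping factor and the assumption that following someone counts as an endorsement are all illustrative; this is a sketch of the general technique, not of Google’s actual implementation.

```python
# Minimal power-iteration PageRank over a small follower graph.
# An edge u -> v means "u follows (and so, we assume, endorses) v".
# The graph and the damping factor are illustrative.

def pagerank(edges: dict[str, set[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    nodes = set(edges) | {v for targets in edges.values() for v in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for u, targets in edges.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # A node that follows nobody spreads its rank evenly.
                for v in nodes:
                    new_rank[v] += damping * rank[u] / len(nodes)
        rank = new_rank
    return rank

follows = {
    "alice": {"carol"},
    "bob": {"carol"},
    "carol": {"alice"},
    "dave": {"carol"},
}
print(pagerank(follows))  # carol scores highest: three people endorse her
```

A trust graph would weight those edges by topic (as in the earlier sketch) rather than treating every follow as equal, but the underlying recursion is the same: you are trusted if trusted people trust you.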

Building a trust graph along similar lines – where we evaluate not just the frequency of (re)tweets, comments, likes and blogposts but also consider who those people are, who’s in their network and what their network of followers think of them – could help us filter and evaluate content, whether it be social or mainstream media, and minimise the damage done by those who don’t tweet responsibly.

UGC: it’s rude, it’s wrong and it misses the point

Despite recent reports that blogging is dead, traditional media companies are still rushing to embrace UGC – User Generated Content – and in many ways that’s great. Except that User Generated Content is the wrong framing, and so risks failing to deliver the benefits it might. I also find it a rather rude term.

Graffiti

Newspapers and media companies are all trying to embrace UGC — they are blogging and letting folk comment on some of their articles — and if Adam Dooley of snoo.ws is right, with good reason: he suggests that UGC might be saving the newspapers.

I don’t think it’s coincidental that this [growth in] popularity has come since many papers have embraced both the Internet’s immediacy (real time news is the thing) and its ability to foster debate and discussion with readers. It’s also come since major papers such as the New York Times have taken the locks off their content making most or all of it free online.

But depressingly UGC is also seen by some as no more than a way to get content on the cheap from a bunch of mindless amateurs, geeks and attention seekers. This view and indeed the very term itself helps to create a dichotomy between professional journalists and the like on one side and everybody else on the other. As Scott Karp points out:

There is a revolution in media because people who create blogs and MySpace pages ARE publishers, and more importantly, they are now on equal footing with the “big,” “traditional” publishers. There has been a leveling of the playing field that renders largely meaningless the distinction between “users” and “publishers” — we’re all publishers now, and we’re all competing for the finite pie of attention. The problem is that the discourse on trends in online media still clings to the language of “us” and “them,” when it is all about the breakdown of that distinction.

Sure, most bloggers don’t have the audience of the online newspapers and media companies, and there are plenty of people who, as the New Scientist article points out, are simply attention seekers. But that still doesn’t make them ‘users’, nor does it mean that they’re ‘generating content’ any more than any other publisher – indeed one might argue that they are less ‘content generators’ than professional journalists. As I sit here writing this post, am I a user? If I am, I have no idea what I’m using other than WordPress; and if I am, then journalists must be users of their CMS too. I know one thing for sure: I don’t think of myself as a user of someone’s site, and I don’t create content for them. I suspect most people are the same.

Bloggers, those who contribute to Wikipedia, and others who publish content on the Web are amateur publishers — in the same way that amateur sportsmen and women are amateur athletes, whatever their ability — until they give up their day job. But that doesn’t necessarily make them any less knowledgeable about the subject they are writing about. Indeed, an ‘amateur publisher’ might well know much more about their subject than a professional journalist because they have direct personal experience of it – whether that be a technical blog by someone who helps make the technology, a news story written on Wikinews or BreakingNewsOn by someone who was there and experienced the events being written about, or even the man who invented the Web. Are any of these people doing UGC? I don’t know what they think – but I know that when I write for this blog, or upload a photo to Flickr, I don’t think I’m generating user content; I’m not doing UGC.

It seems to me that newspapers and media companies need to work to understand how amateur publishers and others can contribute. Not that that is easy — the best bloggers know their subject inside-out, more so than any professional journalist — but equally there is plenty of drivel out there, in both the amateur and professional spheres. For sure there are dreadful blogs, and YouTube is full of inane video and fatuous comments, but equally partisan news outlets like Fox News and the Daily Mail present biased, misleading and often downright inaccurate reporting. In the week of the US Presidential Elections it is worth considering whether Barack Obama’s use of the Internet — including the role of amateur publishers, UGC if you like — helped dull the effect of such biased news reporting, which has historically had a significant role.

The trick then is to find the best content, whoever has written it, and bring it to the fore for people to read and debate; to understand what it is about the Web that makes it an effective communication medium and to harness that in whatever way makes sense for each context. Considering the Web in the same patronising fashion as the Culture and Media Secretary Andy Burnham does, that is as “…an excellent source of casual opinion”, fails to recognise the value that debate and discussion can bring to a subject.