Category Linked Data

Some thoughts on working out who to trust online

The deplorable attempts to use social media (and much of the mainstream media’s response) to identify the Boston Marathon bombers, followed by the tweets coming out of the Social Media Summit in New York, got me thinking again about how we might better understand who and what to trust online.

Trust by Christian Scheja. Some Rights Reserved.

When it comes to online trust I think there are two related questions we should be asking ourselves as technologists:

  1. Can we help people better evaluate the accuracy, trustworthiness or validity of a given news story, tweet, blogpost or other publication?
  2. Can we use social media to better filter those publications to find the most trustworthy sources or articles?

This second point is also relevant in scientific publishing (a thing I’m trying to help out with these days), where there is keen interest in ‘altmetrics’ as a mechanism to help readers discover and filter research articles.

In academic publishing the need for altmetrics has been driven in part by the rise in the number of articles published, which in turn is being fuelled by the uptake of Open Access publishing. However, I would like to think that we could apply similar lessons to mainstream media output.

MEDLINE literature growth chart

Historically a publisher’s brand has, at least in theory, helped its readers to judge the value and trustworthiness of an article. If I see an article published in Nature or the New York Times, or broadcast by the BBC, the chances are I’m more likely to trust it than an article published in, say, the Daily Mail.

Academic publishing has even gone so far as to codify this in a journal’s Impact Factor (IF), an idea that Larry Page later used as the basis for his PageRank algorithm.

The premise behind the Impact Factor is that you can identify the best journals, and therefore the best content, by measuring how often the average article in a journal has been cited in a particular year or period.
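The commonly used two-year Impact Factor makes “how often the average article has been cited” precise: citations received in year Y by the items a journal published in years Y-1 and Y-2, divided by the number of citable items it published in those two years. A minimal sketch, with made-up figures for a hypothetical journal:

```python
def impact_factor(citations: int, citable_items: int) -> float:
    """Two-year Impact Factor of a journal for year Y.

    citations: citations received during year Y by items the journal
        published in years Y-1 and Y-2.
    citable_items: number of citable items (research articles and reviews)
        the journal published in years Y-1 and Y-2.
    """
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations / citable_items


# Made-up figures for a hypothetical journal, not real data.
published_2011, published_2012 = 180, 220
citations_in_2013 = 1000  # citations in 2013 to those 400 items
print(impact_factor(citations_in_2013, published_2011 + published_2012))  # 2.5
```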

Simplistically, then, a journal can improve its Impact Factor by ensuring it only publishes the best research. ‘Good journals’ can then act as trusted guides for their readership – pre-filtering the world’s research output to bring their readers only the best.

Obviously this can go wrong. Good research is published outside high-impact-factor journals, journals can publish poor research, and mainstream media is so rife with examples of published piffle that the likes of Ben Goldacre can make a career out of exposing it.

As is often noted, the Web has enabled all of us to be publishers. It scarcely needs saying that it is now trivially easy for anyone to broadcast their thoughts or post a video or photograph to the Web.

This means that social media is now able to ‘break’ a story before the mainstream media. However, it also presents a problem: how do you know if it’s true? Without brands (or an IF) to guide you, how do you judge whether a photo, tweet or blogpost should be trusted?

There are plenty of services out there that aggregate tweets, comments, likes, +1s etc. to help you find the most talked-about story. Indeed, most social media services themselves let you find what’s ‘hot’ or most talked about. All these services, however, seem to assume that there is wisdom in crowds – that the more talked about something is, the more trustworthy it is. But as Oliver Reichenstein pointed out:

“There is one thing crowds have a flair for, and it is not wisdom, it’s rage.”

Relying on point data (most tweeted, most commented etc.) to filter content or evaluate its trustworthiness, whether in social media or mainstream media, seems to me to be foolish.

It seems to me that a better solution would be to build a ‘trust graph’, which in turn could be used to assign each person a score for a given topic based on their network of friends and followers. It could work something like this…

If a person is followed by a significant number of people who have published peer-reviewed papers on a given topic, or if they have published in that field themselves, then we should trust what that person says about that topic more than we would trust the average person.
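The first heuristic could be sketched as a score that grows with the number of a person’s followers who have published on the topic, with a boost if the person has published there too. Everything here is an assumption for illustration: the inputs, the saturating formula and the `half_point` parameter are not part of any existing system.

```python
def topic_trust(person: str, followers: set, topic_authors: set, half_point: int = 10) -> float:
    """Heuristic trust score in [0, 1) for `person` on one topic.

    followers: accounts that follow `person`.
    topic_authors: accounts known to have published peer-reviewed papers on
        the topic (hypothetical input; matching accounts to authors is a
        separate problem).
    half_point: number of expert followers that yields a score of 0.5
        (an arbitrary tuning parameter).
    """
    if half_point <= 0:
        raise ValueError("half_point must be positive")
    evidence = len(followers & topic_authors)
    if person in topic_authors:
        # Having published in the field counts as much as `half_point`
        # expert followers; this weighting is an assumption.
        evidence += half_point
    # Saturating curve: more expert followers always helps, with diminishing returns.
    return evidence / (evidence + half_point)


authors = {"alice", "bob", "carol"}
print(topic_trust("dave", {"alice", "bob", "eve"}, authors, half_point=2))  # 0.5
print(topic_trust("carol", {"alice"}, authors, half_point=2))               # 0.6
print(topic_trust("eve", {"frank"}, authors, half_point=2))                 # 0.0
```

A saturating curve is used rather than a raw count so that a celebrity with a handful of expert followers does not swamp a specialist, but the exact shape is a free design choice.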

Equally, if a person has posted a large number of photos, tweets etc. from a given city over a long period of time, and they are followed by other people from that city (defined the same way: someone with a number of posts from that city over a period of time), then we might conclude that their photographs really are from that city when they say they are.
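The location heuristic can be sketched the same way: an account counts as “local” to a city if it has posted from there often enough over a long enough span, and a location claim is plausible if the poster is local and enough of their followers are local too. The thresholds, the data shapes and the `history` helper that fabricates synthetic posting histories are all hypothetical.

```python
from datetime import date, timedelta


def is_local(posts, city, min_posts=20, min_days=180):
    """True if an account posted at least `min_posts` times from `city`
    over a span of at least `min_days` days.

    posts: list of (city, date) pairs for one account. Both thresholds are
    arbitrary assumptions.
    """
    dates = [d for c, d in posts if c == city]
    if not dates or len(dates) < min_posts:
        return False
    return (max(dates) - min(dates)).days >= min_days


def location_claim_plausible(person, city, posts_by_user, followers, min_local_followers=3):
    """Trust `person`'s claim that a photo is from `city` if they are local
    and enough of their followers are local by the same definition."""
    if not is_local(posts_by_user.get(person, []), city):
        return False
    local = [f for f in followers.get(person, ()) if is_local(posts_by_user.get(f, []), city)]
    return len(local) >= min_local_followers


def history(city, n, every_days=10, start=date(2012, 1, 1)):
    """Synthetic posting history: n posts from `city`, one every `every_days` days."""
    return [(city, start + timedelta(days=every_days * i)) for i in range(n)]


posts = {u: history("Boston", 21) for u in ["a", "b", "c", "d"]}
posts["e"] = history("Boston", 5)  # too few posts to count as local
followers = {"d": ["a", "b", "c"], "e": ["a", "b", "c"]}
print(location_claim_plausible("d", "Boston", posts, followers))  # True
print(location_claim_plausible("e", "Boston", posts, followers))  # False
```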

Or if a person is retweeted by someone that for other reasons you trust (e.g. because you know them) then that might give you more confidence their comments and posts are truthful and accurate.

PageRank is Google’s link analysis algorithm, which assigns a numerical weighting to each element of a hyperlinked set of documents with the purpose of “measuring” its relative importance within the set.

Whatever the specifics, the point I’m trying to make is that rather than relying on a single number or count, we should try to build a directed graph in which each person can be assigned a trust or knowledge score based on the strength of their network in that subject area. This is somewhat analogous to Google’s PageRank algorithm.

Before Google, search engines effectively counted the frequency of a given word on a Webpage to assign it a relevancy score – much as we do today when we count the number of comments, tweets etc. to help filter content.

What Larry Page realised was that by assigning a score based on the number and weight of inbound links for a given keyword, he and Sergey Brin were able to design and build a much better search engine – one that relies not just on what the publisher tells us, nor simply on the number of links, but on the quality of those links. A link from a trusted source is worth more than a link from an average webpage.
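To make the contrast with counting concrete, here is a minimal power-iteration PageRank alongside a plain count of inbound links. The toy graph is invented for illustration and 0.85 is the commonly used damping factor. Page X has one inbound link, from a well-linked page T; page Y has two, from pages nobody links to.

```python
from collections import Counter


def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Pages with no outbound links spread their rank evenly over all pages.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for src in nodes:
            targets = links.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: redistribute its rank uniformly.
                for t in nodes:
                    new[t] += damping * rank[src] / n
        rank = new
    return rank


# Toy web: six pages link to T; T links to X; two obscure pages link to Y.
links = {f"s{i}": ["T"] for i in range(1, 7)}
links.update({"T": ["X"], "L1": ["Y"], "L2": ["Y"]})

inbound = Counter(t for targets in links.values() for t in targets)
ranks = pagerank(links)
print(inbound["X"], inbound["Y"])  # 1 2  (counting favours Y)
print(ranks["X"] > ranks["Y"])     # True (PageRank favours X)
```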

Building a trust graph along similar lines – where we evaluate not just the frequency of (re)tweets, comments, likes and blogposts but also consider who those people are, who’s in their network and what their network of followers think of them – could help us filter and evaluate content, whether social or mainstream media, and minimise the damage done by those who don’t tweet responsibly.
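One way to sketch such a trust graph is a personalized PageRank: endorsements (follows, retweets) are weighted edges from endorser to endorsed, and the random jump always lands on a seed set of accounts already trusted on the topic, for instance people who have published on it. The edge weights, the seed set and the account names are assumptions for illustration only.

```python
def topic_trust_scores(endorsements, seeds, damping=0.85, iterations=100):
    """Personalized PageRank over an endorsement graph.

    endorsements: (endorser, endorsed, weight) triples, e.g. a follow or a retweet.
    seeds: accounts already trusted on this topic; every random jump lands on
        one of them, so trust flows outward from the seeds.
    """
    if not seeds:
        raise ValueError("need at least one seed account")
    nodes = sorted({a for a, _, _ in endorsements} | {b for _, b, _ in endorsements} | set(seeds))
    out = {v: [] for v in nodes}
    for a, b, w in endorsements:
        out[a].append((b, w))
    jump = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    score = dict(jump)
    for _ in range(iterations):
        new = {v: (1.0 - damping) * jump[v] for v in nodes}
        for v in nodes:
            total = sum(w for _, w in out[v])
            if total == 0:
                # Endorses nobody: hand this account's score back to the seeds.
                for s in nodes:
                    new[s] += damping * score[v] * jump[s]
            else:
                for b, w in out[v]:
                    new[b] += damping * score[v] * w / total
        score = new
    return score


FOLLOW, RETWEET = 1.0, 2.0  # hypothetical weights: a retweet counts double
edges = [
    ("prof_a", "reporter", FOLLOW),
    ("prof_b", "reporter", RETWEET),
    ("anon1", "rumour_acct", RETWEET),
    ("anon2", "rumour_acct", RETWEET),
    ("anon3", "rumour_acct", FOLLOW),
]
scores = topic_trust_scores(edges, seeds={"prof_a", "prof_b"})
print(scores["reporter"] > scores["rumour_acct"])  # True
```

The rumour account collects more endorsements than the reporter, but none of them trace back to a trusted seed, so its score stays at zero.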

Some thoughts on rNews

IPTC are working on an ontology known as rNews, which aims to standardise (and encourage the adoption of) the use of RDFa in news articles.

This is a very, very good idea – it should allow for better content discovery, new ways to aggregate news stories about people, places or subjects and generally allow computers to help people process some of the structured information behind a story.

Newspaper by Luc De Leeuw

rNews is still in draft. At the time of writing the published spec is at version 0.1, but there are clearly ambitions to build on this work and it will be interesting to see where it goes.

Although I’m sure much of this has been thought about before, I thought I would jot down my initial thoughts on this early draft.

More URIs please

The current spec makes extensive use of xsd:string and xsd:double to assign attributes to a class. For example, the Location Class includes attributes for longitude, latitude and altitude but no URIs for places.

Using URIs to name places (and people, subjects, organisations etc.) would allow for much more interesting things to be done with the data.

It would make it easier to aggregate content from more than one news outlet and generally link things together by location, person and area of interest.
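As a sketch of why shared URIs help: if two outlets both point at the same DBpedia URI for Boston, aggregating their stories is a simple lookup, whereas two sets of latitude/longitude literals for the same city will rarely be exactly equal. Triples are plain Python tuples here; the outlet URLs and the `aboutPlace` predicate are invented for illustration and are not taken from the rNews draft.

```python
from collections import defaultdict

# DBpedia resource URIs used as shared names for places.
BOSTON = "http://dbpedia.org/resource/Boston"
NYC = "http://dbpedia.org/resource/New_York_City"
# Hypothetical predicate, not part of rNews.
ABOUT_PLACE = "http://example.org/news#aboutPlace"

triples = [
    ("http://outlet-a.example/story/1", ABOUT_PLACE, BOSTON),
    ("http://outlet-b.example/story/77", ABOUT_PLACE, BOSTON),
    ("http://outlet-b.example/story/78", ABOUT_PLACE, NYC),
]


def stories_by_place(triples):
    """Group story URIs by the place URI they are about."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == ABOUT_PLACE:
            index[obj].append(subject)
    return dict(index)


print(stories_by_place(triples)[BOSTON])
# ['http://outlet-a.example/story/1', 'http://outlet-b.example/story/77']

# Two outlets recording approximate coordinates for the same city rarely match exactly:
print((42.3601, -71.0589) == (42.36, -71.06))  # False
```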

There’s obviously an issue here – there needs to be a good source of URIs for places – but in reality there are lots of candidates out there, from DBpedia to GeoNames.

Greater reuse of existing vocabularies

There are existing vocabularies that describe some of the classes in rNews – notably FOAF and Dublin Core.

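I would prefer rNews to reuse those vocabularies, or at least to link to them (with owl:equivalentClass or owl:equivalentProperty, since owl:sameAs is meant for individuals rather than classes and properties).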
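If rNews terms were declared equivalent to Dublin Core and FOAF terms, a consumer could normalise data from any outlet onto the shared vocabularies. A minimal sketch, assuming a placeholder rNews namespace and hypothetical rNews-style term names; the Dublin Core Terms, FOAF and rdf:type URIs are the real ones.

```python
RNEWS = "http://example.org/rnews#"  # placeholder, not the real rNews namespace
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Equivalences that owl:equivalentProperty / owl:equivalentClass statements
# in the vocabulary could declare (the rNews-side names are hypothetical).
EQUIVALENTS = {
    RNEWS + "headline": DCTERMS + "title",
    RNEWS + "creator": DCTERMS + "creator",
    RNEWS + "name": FOAF + "name",
    RNEWS + "Person": FOAF + "Person",
}


def normalise(triples, equivalents=EQUIVALENTS):
    """Rewrite predicates, and the classes in rdf:type statements, to their
    shared-vocabulary equivalents; anything unmapped is left untouched."""
    result = []
    for s, p, o in triples:
        p = equivalents.get(p, p)
        if p == RDF_TYPE:
            o = equivalents.get(o, o)
        result.append((s, p, o))
    return result


story = [
    ("ex:story1", RNEWS + "headline", "Example headline"),
    ("ex:story1", RNEWS + "creator", "ex:jane"),
    ("ex:jane", RDF_TYPE, RNEWS + "Person"),
]
for triple in normalise(story):
    print(triple)
```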

I’m not a fan of tags

I don’t really like “tagging”: it lacks semantics and is extremely ambiguous.

If I tag a news story am I claiming it’s primarily about that thing, features that thing, also about that thing, what? And whatever you think it means I guarantee I can find someone else who disagrees!

I would rather see more precisely defined predicates, such as primarilyAbout. I recognise this would add a bit of complexity, but it would also increase the utility of the vocabulary.

If the intention is to aid discoverability through categorisation then use SKOS.
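A sketch of the alternative: explicit predicates say how a story relates to a concept, and a SKOS-style hierarchy (skos:broader, whose transitive closure SKOS names skos:broaderTransitive) lets a query for a broad category find stories filed under narrower ones. The concept IDs and the `mentions` predicate are made up for illustration; `primarilyAbout` is the predicate suggested above, not an existing rNews term.

```python
# Illustrative concept scheme: each concept maps to its skos:broader concepts.
BROADER = {
    "boston-marathon-bombing": ["terrorism"],
    "terrorism": ["crime"],
    "marathon-running": ["athletics"],
}

# Explicit predicates instead of bare tags (predicate names are hypothetical).
STATEMENTS = [
    ("story1", "primarilyAbout", "boston-marathon-bombing"),
    ("story2", "mentions", "terrorism"),
    ("story3", "primarilyAbout", "marathon-running"),
]


def broader_transitive(concept):
    """All ancestors of `concept`, i.e. its skos:broaderTransitive closure."""
    seen, stack = set(), [concept]
    while stack:
        for parent in BROADER.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def stories_about(concept, predicate="primarilyAbout"):
    """Stories linked by `predicate` to `concept` or to any narrower concept."""
    return sorted(
        story for story, p, c in STATEMENTS
        if p == predicate and (c == concept or concept in broader_transitive(c))
    )


print(stories_about("crime"))              # ['story1']
print(stories_about("crime", "mentions"))  # ['story2']
print(stories_about("athletics"))          # ['story3']
```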

Explicit predicates for source materials

I think it’s really important to link explicitly to source material, especially for science and medicine (which is why Nature News has always done so).

A simple set of predicates for the DOI, abstract URI, scientist/researcher of the original research and/or a URI for the raw data should suffice.
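Something like the following would cover the four pieces of source material listed above. The namespace and the predicate names (`reportsResearch`, `researchAbstract`, `researcher`, `researchData`) are hypothetical, as are the example identifiers; the only real convention used is that a DOI becomes a resolvable URI when prefixed with https://doi.org/.

```python
from dataclasses import dataclass
from typing import Optional

NS = "http://example.org/sources#"  # hypothetical namespace for the predicates


@dataclass
class SourceMaterial:
    doi: Optional[str] = None             # DOI of the original paper
    abstract_uri: Optional[str] = None    # URI of the abstract
    researcher_uri: Optional[str] = None  # URI identifying the researcher
    data_uri: Optional[str] = None        # URI of the raw data


def source_triples(story_uri: str, src: SourceMaterial) -> list:
    """Emit one triple per piece of source material the story links to."""
    triples = []
    if src.doi:
        # A DOI becomes a resolvable URI via the doi.org resolver.
        triples.append((story_uri, NS + "reportsResearch", "https://doi.org/" + src.doi))
    for predicate, value in [
        ("researchAbstract", src.abstract_uri),
        ("researcher", src.researcher_uri),
        ("researchData", src.data_uri),
    ]:
        if value:
            triples.append((story_uri, NS + predicate, value))
    return triples


# Made-up identifiers, for illustration only.
src = SourceMaterial(doi="10.1234/example.5678", data_uri="http://data.example.org/set/42")
for t in source_triples("http://news.example/story/9", src):
    print(t)
```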

Again, it would also help if there was a handy source of URIs for scientists.

Should the story be at the heart of the ontology?

I’ve always thought of news stories as metadata about real world events.

If you reframe the problem in this way, then what you really want are predicates to describe the relationship of the story (article, photo, video) to the event. You then also want links between people and places and those events (which could be inferred through the various news stories).
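A minimal sketch of that event-centred view: stories point at the events they report on, and links between people and events are inferred through the stories that feature them. All identifiers and both relationships are hypothetical.

```python
from collections import defaultdict

# Hypothetical statements: which event each story reports on,
# and which people each story features.
REPORTS_ON = [
    ("outlet-a/story1", "event/city-marathon-2013"),
    ("outlet-b/story9", "event/city-marathon-2013"),
    ("outlet-a/story2", "event/tech-summit-2013"),
]
FEATURES = [
    ("outlet-a/story1", "person/alice"),
    ("outlet-b/story9", "person/bob"),
    ("outlet-a/story2", "person/alice"),
]


def stories_for_event(event):
    """All stories, from any outlet, about one real-world event."""
    return sorted(story for story, e in REPORTS_ON if e == event)


def infer_person_events():
    """Infer person -> events links: a person featured in a story is
    linked to every event that story reports on."""
    events_of = defaultdict(set)
    for story, event in REPORTS_ON:
        events_of[story].add(event)
    inferred = defaultdict(set)
    for story, person in FEATURES:
        inferred[person] |= events_of[story]
    return {person: sorted(events) for person, events in inferred.items()}


print(stories_for_event("event/city-marathon-2013"))
# ['outlet-a/story1', 'outlet-b/story9']
print(infer_person_events()["person/alice"])
# ['event/city-marathon-2013', 'event/tech-summit-2013']
```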

Building the ontology this way round would allow for some very powerful analysis and discovery of stories.

Anyway – I’ll be really interested to see how the ontology develops and how widely it gets adopted.

Science ontology — take three

Paul, Michael and Silver have done a bit more work refining the nascent science ontology — unfortunately I was caught up doing something a lot less interesting so this version is all their work and not mine, and it is all the better for it.

The big change in this version is the removal of much of the publication-specific material, since this is handled elsewhere; otherwise it should look like a fairly obvious evolution of the previous versions.

Version 3 of the science domain model

And here’s an N3 serialisation of the model. There’s still lots to do: we need to check what happens when multiple ranges are given for a property, write proper definitions, add namespaces, look for existing ontologies to reuse, etc.

# Science Ontology - first version! Still to do:
#   declare namespaces (the so: namespace below is only a placeholder)
#   define ontology (name, author etc.)
#   finish definitions
#   look for existing ontologies for reuse etc.
#   publish!

@prefix so:   <http://example.org/science-ontology#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Classes

so:Observation a owl:Class;
	rdfs:label "Observation";
	rdfs:comment "Definition goes here" .

so:Hypothesis a owl:Class;
	rdfs:label "Hypothesis";
	rdfs:comment "Definition goes here" .

so:Experiment a owl:Class;
	rdfs:label "Experiment";
	rdfs:comment "Definition goes here" .

so:Equipment a owl:Class;
	rdfs:label "Equipment";
	rdfs:comment "Definition goes here" .

so:Method a owl:Class;
	rdfs:label "Method";
	rdfs:comment "Definition goes here" .

so:Collaboration a owl:Class;
	rdfs:label "Collaboration";
	rdfs:comment "Definition goes here" .

so:ExperimentalObservation a owl:Class;
	rdfs:label "Experimental Observation";
	rdfs:comment "Definition goes here";
	rdfs:subClassOf so:Observation .

so:Data a owl:Class;
	rdfs:label "Data";
	rdfs:comment "Definition goes here" .

so:Analysis a owl:Class;
	rdfs:label "Analysis";
	rdfs:comment "Definition goes here" .

so:Publication a owl:Class;
	rdfs:label "Publication";
	rdfs:comment "Definition goes here" .

so:Theory a owl:Class;
	rdfs:label "Theory";
	rdfs:comment "Definition goes here" .

so:Prediction a owl:Class;
	rdfs:label "Prediction";
	rdfs:comment "Definition goes here" .

so:Agent a owl:Class;
	rdfs:label "Agent";
	rdfs:comment "Definition goes here";
	rdfs:subClassOf foaf:Agent .

# Properties

so:inspiredBy a owl:ObjectProperty;
	rdfs:label "inspiredBy";
	rdfs:comment "Definition goes here. Hypotheses can be inspired by Observations, Theories and Predictions; separate rdfs:range statements would require the object to be all three at once, so the range is the union of the three classes.";
	rdfs:domain so:Hypothesis;
	rdfs:range [ a owl:Class; owl:unionOf (so:Observation so:Theory so:Prediction) ] .

so:makes a owl:ObjectProperty;
	rdfs:label "makes";
	rdfs:comment "definition goes here";
	rdfs:domain so:Theory;
	rdfs:range so:Prediction .

so:tests a owl:ObjectProperty;
	rdfs:label "tests";
	rdfs:comment "definition goes here";
	rdfs:domain so:Experiment;
	rdfs:range so:Hypothesis .

so:equipment a owl:ObjectProperty;
	rdfs:label "equipment";
	rdfs:comment "Relates a piece of equipment to an experiment it is used in.";
	rdfs:domain so:Experiment;
	rdfs:range so:Equipment .

so:method a owl:ObjectProperty;
	rdfs:label "method";
	rdfs:comment "Relates a method to an experiment it was used in.";
	rdfs:domain so:Experiment;
	rdfs:range so:Method .

so:experimentalObservation a owl:ObjectProperty;
	rdfs:label "experimental observation";
	rdfs:comment "Relates an observation made as a result of an experiment to the experiment it was made in.";
	rdfs:domain so:Experiment;
	rdfs:range so:ExperimentalObservation .

so:captures a owl:ObjectProperty;
	rdfs:label "captures";
	rdfs:comment "Relates data to an experimental observation it was captured in.";
	rdfs:domain so:ExperimentalObservation;
	rdfs:range so:Data .

so:analyses a owl:ObjectProperty;
	rdfs:label "analyses";
	rdfs:comment "Definition goes here";
	rdfs:domain so:Analysis;
	rdfs:range so:Data .

so:published a owl:ObjectProperty;
	rdfs:label "published";
	rdfs:comment "Relates an Analysis to a Publication it was published in.";
	rdfs:domain so:Analysis;
	rdfs:range so:Publication .

# Analysis to Theory

so:establishes a owl:ObjectProperty;
	rdfs:label "establishes";
	rdfs:comment "Definition goes here";
	rdfs:domain so:Analysis;
	rdfs:range so:Theory .

so:validates a owl:ObjectProperty;
	rdfs:label "validates";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Analysis;
	rdfs:range so:Theory .

so:modifies a owl:ObjectProperty;
	rdfs:label "modifies";
	rdfs:comment "Definition goes here. Applies to both Theories and Hypotheses, so it is declared once with their union as its range rather than twice with separate ranges.";
	rdfs:domain so:Analysis;
	rdfs:range [ a owl:Class; owl:unionOf (so:Theory so:Hypothesis) ] .

so:contradicts a owl:ObjectProperty;
	rdfs:label "contradicts";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Analysis;
	rdfs:range so:Theory .

# Analysis to Hypothesis

so:supports a owl:ObjectProperty;
	rdfs:label "supports";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Analysis;
	rdfs:range so:Hypothesis .

# so:modifies (declared above) also relates an Analysis to a Hypothesis.

so:disproves a owl:ObjectProperty;
	rdfs:label "disproves";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Analysis;
	rdfs:range so:Hypothesis .

# Agent properties

so:proposes a owl:ObjectProperty;
	rdfs:label "proposes";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Agent;
	rdfs:range so:Hypothesis .

so:collaborates a owl:ObjectProperty;
	rdfs:label "collaborates";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Agent;
	rdfs:range so:Collaboration .

so:funds a owl:ObjectProperty;
	rdfs:label "funds";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Agent;
	rdfs:range so:Experiment .

so:performs a owl:ObjectProperty;
	rdfs:label "performs";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Agent;
	rdfs:range so:Experiment .

so:observes a owl:ObjectProperty;
	rdfs:label "observes";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Agent;
	rdfs:range so:Observation .

so:forms a owl:ObjectProperty;
	rdfs:label "forms";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Agent;
	rdfs:range so:Analysis .

so:creates a owl:ObjectProperty;
	rdfs:label "creates";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Agent;
	rdfs:range so:Publication .

so:creditedWith a owl:ObjectProperty;
	rdfs:label "credited with";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Agent;
	rdfs:range so:Theory .

so:participates a owl:ObjectProperty;
	rdfs:label "participates";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Agent;
	rdfs:range so:Agent .

so:collaboratesOn a owl:ObjectProperty;
	rdfs:label "collaborates on";
	rdfs:comment "Definition goes here.";
	rdfs:domain so:Collaboration;
	rdfs:range [ a owl:Class; owl:unionOf (so:Experiment so:Hypothesis) ] .
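The question of multiple ranges for a property has a definite answer in RDFS semantics, through entailment rule rdfs3: if p has rdfs:range C and the triple s p o holds, then o has rdf:type C. Each rdfs:range statement applies independently, so listing several makes every object an instance of all of them (their intersection) rather than of any one of them. A small forward-chaining sketch of just that rule, using compact names for readability:

```python
RDF_TYPE = "rdf:type"
RANGE = "rdfs:range"


def rdfs3_closure(triples):
    """Apply RDFS entailment rule rdfs3 until nothing new is inferred:
    if (p rdfs:range C) and (s p o), then infer (o rdf:type C)."""
    graph = set(triples)
    while True:
        ranges = [(p, c) for p, rel, c in graph if rel == RANGE]
        new = {(o, RDF_TYPE, c) for p, c in ranges for _, p2, o in graph if p2 == p}
        if new <= graph:
            return graph
        graph |= new


# so:inspiredBy declared with three separate rdfs:range statements.
model = {
    ("so:inspiredBy", RANGE, "so:Observation"),
    ("so:inspiredBy", RANGE, "so:Theory"),
    ("so:inspiredBy", RANGE, "so:Prediction"),
}
data = {("ex:hypothesis1", "so:inspiredBy", "ex:observation1")}

inferred = rdfs3_closure(model | data)
print(sorted(c for s, p, c in inferred if s == "ex:observation1" and p == RDF_TYPE))
# ['so:Observation', 'so:Prediction', 'so:Theory']
```

The single observation ends up typed as a Theory and a Prediction as well, which is why “any one of these classes” is normally written in OWL as a single range whose class is an owl:unionOf.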