Notes from a fascinating workshop hosted at the National Audit Office in April.

Nick Halliday opened the workshop by introducing the data analytics framework in NAO.
Start by understanding what you have and who owns it: are you actually in a position to publish or republish?
Does anyone really know all the data that exists around their organisation?
The NAO ran an internal data hack, using examples to draw out more data sets and to help people think through what might be done with them.
The whole process helped generate more ideas about doing different things with their data.
Question about appetite. Transparency has generated more interest and there is a growing demand from journalists. Often people are interested in the raw data that sits behind the tables that orgs publish eg in annual reports.
Question about formats: across government there are an awful lot of PDFs, but we’re seeing a gradual move to more open formats.

Highlights:

  • what story are you trying to tell
  • what are the key messages, and is the central message clear? It takes an experienced journalistic approach – plus tools and training – to look at a lot of rough data and see the story
  • what do users actually want – are you publishing just because you want to share or have seen a cool tool

Other refs:
ODI blog: Five stages of data grief.
Data journalism handbook – free open source reference book

Next presenter: Nick Bryant, head of design at ONS
He shared their experiences in developing infographics. First, a definition: a self-contained visual story presenting information, data or knowledge clearly, with meaning and context, without bias.

If you search Google, you get over 20 million results. There are a lot out there, which could indicate appetite, but also possible saturation.
To make yours stand out, you need to work hard and look at using different channels to disseminate them.
The story for ONS started in 2011 (back when a search only returned half a million results).
An article in the Independent, “Hot Data: the art of the infographic”, mentioned some of the pioneers: David McCandless (Information is Beautiful), the Guardian data team, etc.
Need to trust your design instincts.
There was possibly an element of distrust – you have to make sure the data is reliable and that the graphic won’t hide or mislead.
They tested the water in 2012 with a couple of simple ones, starting to think about a house style and consistency, while still experimenting with different models.
By 2013 it was starting to take off. This raised visibility across the office, which led to questions about differentiation between infographics and other info products. Do they add value? Better not just to duplicate: find a new angle where the infographic is the best way to tell the story.

Looked outwards. The BBC Global Experience Language was a good pointer: evidence that having clear guidance that people follow means outputs will look more professional. Consistency is key.
ONS published their Infographics guidelines.
Included all parts of the process, including getting the right people involved in the team from the start, getting the story clear, and agreeing sign off routes. Reinforced the need to be consistent with design elements: accessible colours, consistent use of fonts.

Ref Design Council publication: Leading Business by Design.

Martin Nicholls (@martyandbells) heads up the editorial team in digital publishing at ONS.
First point: content is king. If the data doesn’t contain the story, don’t try and force it.
Collaboration is queen. Editorial and designers can only produce material when working closely with statisticians and data experts. Need them to make sure the data is not being misinterpreted.

Golden rules
Everything created has to be for people: they want people to engage with it, understand it. How? Use the vocabulary real people use. Recognise that content has to be crafted, can’t simply be harvested. Don’t just cut and paste text others have produced.
Aim to be interesting, but sensation free.
Add human elements, and look for the international context.
Apply news value – what is being reported at the time.
Agree objectives and target audience with the business areas before you start to create.

Correlation doesn’t imply causation (ref the chart of numbers of storks and babies being born in Oldenburg!); don’t always aim for the biggest number. [Since then, I have seen the fascinating site Spurious Correlations, which provides many illustrations of this.]
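As a tiny illustration of how easily this happens (my own sketch, not from the talk), two series that merely trend in the same direction will show a high correlation coefficient even though neither causes the other:

```python
import random

# Two unrelated series that both happen to trend upwards over ten years:
# hypothetical counts of storks and of births, with a little noise added.
random.seed(42)
storks = [100 + 5 * year + random.gauss(0, 3) for year in range(10)]
births = [2000 + 40 * year + random.gauss(0, 25) for year in range(10)]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no libraries needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Prints a value close to 1, despite there being no causal link at all.
print(round(pearson(storks, births), 3))
```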

Take care with headlines. Basic stats publications tend to have dry factual titles. Be alert to how infographics can highlight slightly odd angles.

How to measure success?
Syndication is a good marker: if others use your infographic, that is seen as a success.
Social engagement: how are people reacting on Twitter? It’s either love or hate – people who are indifferent don’t tend to comment.
Internally, if people want to work with you again, that is a good measure. Especially among statisticians, who tend to be divas! (They think their data is already cool.)

Recognise different needs of different audiences. Those who engage with statistical releases are very different to public with a passing interest.

Recognise there is no silver bullet. It is impossible to have a short checklist which will guarantee great content every time.

Next presenter was Will Moy from FullFact.org
Recognise they are on a journey. Stories that aren’t good enough don’t get retold. Brevity is key. Don’t waste people’s time.
Shared the example where they were asked to live fact-check the debates between Farage and Clegg.
Pressure helps you develop skills in clarity when people have an agenda to push. Often their [Full Fact’s] role is to explain that things aren’t quite as simple as people may want you to believe. Not contrasting right and wrong, more about showing shades of grey.
The debate gave them the chance to test a new toy: as the video of the debate runs, each fact-check explanation pops up.

New tech means they can be much more engaging, but they also tend to keep things extremely basic, illustrating a single fact or definition. They draw out the full picture. The things that make headlines are dramatic changes which, if you then look more closely, are often explained by seasonal peaks. That tells you that you should always look at trends rather than individual steps. It is a good idea to add keys and notes to charts.

Key point: infographics don’t have to be hugely complex things. Can be just the right data presented in the right way to get the right message across to the right people.

Reinforced point made by earlier speakers about using the vocabulary that the audience is using. Retail price index or cost of living?

Ref book: The Tiger That Isn’t: Seeing Through a World of Numbers, by Andrew Dilnot and Michael Blastland.

Talks about clusters, which can always be found if you look, but don’t necessarily lead to facts.

Mentioned the danger in averages: the average person only has one testicle…
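A quick worked illustration of that point (mine, with made-up figures): a mean computed over a roughly two-or-none population describes no actual individual, while the median at least corresponds to a real case.

```python
# Hypothetical population: roughly half have two of something, half have none.
values = [2] * 49 + [0] * 51

mean = sum(values) / len(values)            # close to 1 – true of no individual
median = sorted(values)[len(values) // 2]   # 0 – at least describes a real case

print(f"mean={mean:.2f}, median={median}")
```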

The challenge is in finding the stories that people want to tell their friends.
The key, though, is trustworthiness. A challenge is, for example, when a fact is provided and says ‘source: ONS’ but doesn’t send you directly to the specific fact or data set.

Ref balance between time spent on publication and time spent on conversation. Eg spending a lot of time on preparing the perfect infographic compared to thinking time around answering the questions people are actually asking when you publish something.

Alan Smith (@theboysmithy), ONS Data Visualisation Centre.
Together with Rob Fry, talked about interactive infographics.

Started with a quote from Simon Rogers, formerly of the Guardian data blog: “ONS has incredibly useful data on its website, but also has the world’s worst website…”

Andrew Dilnot talks a lot about citizen users. They are different from traditional users: people like the Treasury and the banks would work out what they needed and how to get it.
Visualisations are not really aimed at specialists; they are for the more casual visitors. The best visualisations cause you to see something you weren’t expecting.
Shared the example of a recent New York Times interactive infographic around dialect and vocabulary, which resulted in a map.
It was the most popular NYT item that year, even though it was only released on 21 December.
Why? It was visual, personal, and social. Immediately you got a response to your actions, you could share it.

Talked through some interactives that they produced, the first of which was around 2011 census data. It allowed comparisons between cities and ways of comparing the data by overlaying or showing scale.
Similarly, the team used the data to tell their own story, and added value by including context.

Another interactive was on the Annual Survey of Hours and Earnings. The statistical release looks at overall trends, gaps between male and female earnings, etc. But what about other stories? For the team, there were geographic examples, where plotting the numbers on a map showed some obvious pockets. It was much more dramatic when the map was skewed to show where the jobs actually are; however, it was then much harder to understand what you were looking at. Their solution was to show both maps side by side.

Question of skills – how do you learn them?
The team in Hampshire are hosting a conference on the Graphical Web, supported by the W3C, on 27–30 August. The theme is visual storytelling.

Tom Smith (@_datasmith), Oxford Consultants for Social Inclusion (OCSI).
Talked about data for social good

Concept of open data: Government publishes increasing amounts of open data which is available for reuse. There is a common belief that this is a one way street, lots of publishing with no sense that it might deliver benefits, and a reliance on an army of armchair hackers who may or may not actually make something of it.
BUT there are already some really good examples of good things being done with data.
The UK is probably leading the world in open data (or at least is up among the leaders), with the Open Data Institute and the Open Data User Group doing good work. ODUG recently published a set of case studies.

Shared the case study of Community Insight – a tool based on open data for housing associations to start basing service decisions on data. OCSI worked with the housing associations to find out what their needs were.

It had to be simple, to be used by housing officers. It contained lots of maps, not sticking to government boundaries – they needed their own definitions, for example. It also needed to be able to generate reports.

Then he talked about closed data.
Sometimes this is for legitimate reasons, but there are issues around how the data is used, perhaps about allowing limited access. It’s not necessarily always about publishing the data: departments who hold the data could use it to answer questions without giving out the actual data itself. An example might be the percentage of people whose circumstances changed after a particular intervention.

Referenced the ONS Virtual Microdata Laboratory: controlled access is allowed for academics and other authenticated users to the raw data that ONS holds.
There are conditions of use: has to be lawful, support public benefit and what you pull out has to be non-sensitive.

Ministry of Justice did something similar to allow access to data on re-offending. The potential of closed data is a good counter balance to the power of open.

Dan Collins (@dpcollins101) from GDS
Data, information and the user.

Dan is one of two data scientists at GDS, and introduced the main work areas of GDS: GOV.UK, transformation exemplars, assisted digital, user research, IT reform, performance and delivery.

He sits in the latter, focusing on measurement and analytics.
So what does a data scientist do?
Estimating probabilities, statistical learning theory, data visualisation and task automation.
In reality, most of the job is data collection and cleaning.
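To give a flavour of what that looks like in practice – a generic sketch with invented column names, not GDS’s actual pipeline – much of the effort is coercing types, tidying labels and removing duplicates before any analysis can start:

```python
import pandas as pd

# Hypothetical raw export from a back-office system: duplicates, strings where
# numbers should be, inconsistent labels and a missing date.
raw = pd.DataFrame({
    "date":    ["2014-01-03", "2014-01-03", "2014-01-04", None],
    "volume":  ["120", "120", "95", "88"],
    "channel": ["online", "online", "Phone ", "phone"],
})

clean = (
    raw.drop_duplicates()                                   # remove repeated rows
       .assign(date=lambda d: pd.to_datetime(d["date"], errors="coerce"),
               volume=lambda d: pd.to_numeric(d["volume"], errors="coerce"),
               channel=lambda d: d["channel"].str.strip().str.lower())
       .dropna(subset=["date"])                             # drop rows with no usable date
)

print(clean)
```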

Introduced the Performance Platform. The aim is to give simple and clear access to the performance of services. It gives real-time info to service owners, but is also transparent and available to all.
Aim is to combine data sources, from back office systems, from call centres, from web stats and social media.

Raised the question of whether data needs a narrator. Subject matter experts know their data, but they are not necessarily the best people to talk about it to others.

He is currently working on DCLG data – not just on the data specifically, but looking at what skills are needed in the department to do this sort of thing, and what technical blockers there are.
He shared an example which allowed for a lot of filtering and displaying of London Fire Brigade data. This would otherwise just be a massive spreadsheet in which it would be virtually impossible to spot patterns.
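As a rough sketch of the kind of filtering and summarising that makes such a spreadsheet readable (the columns here are invented, not the real incident schema), a simple group-by turns thousands of rows into an at-a-glance table, and a filter pulls out just the slice you care about:

```python
import pandas as pd

# Hypothetical incident records – invented columns, not the real Fire Brigade schema.
incidents = pd.DataFrame({
    "borough": ["Camden", "Camden", "Hackney", "Hackney", "Camden", "Hackney"],
    "type":    ["fire", "false alarm", "fire", "fire", "false alarm", "false alarm"],
    "hour":    [23, 14, 2, 22, 9, 16],
})

# Counts by borough and incident type: an at-a-glance summary of what would
# otherwise be thousands of individual rows.
summary = incidents.groupby(["borough", "type"]).size().unstack(fill_value=0)
print(summary)

# A simple filter: fires that started at night.
night = (incidents["hour"] >= 21) | (incidents["hour"] < 6)
print(incidents[(incidents["type"] == "fire") & night])
```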

Nick Smith (@geckoboard and @nickwsmith)

Was originally going to talk about building better dashboards, but this evolved into how to use dashboards more effectively. (The focus is on using the Geckoboard products.)

Geckoboard is a startup which aims to bring data from different systems together into a dashboard. Their dashboards pull together data sources and display data in real time. Must be simple to use.

Shared five insights:

  • First, need to understand why. Eg what are you trying to achieve by using data to tell a story. Maybe it’s an issue about accessing up to date information, or data is lost in lots of different places.
  • Second, decide what matters. Don’t just communicate “because I can”. You need to gather and share metrics that contribute to overall objectives; all else is vanity metrics.
  • Third, try to kill vanity metrics, they are not actionable.
  • Fourth, good stories evolve, as do good dashboards. Organisations don’t stand still, people come and go, objectives evolve.
  • Finally, ignore him! Sometimes it’s right to trust gut instincts, work out what is valid and valuable for your own organisation.

Martin Stabe (@martinstabe) Interaction team at Financial Times
Martin closed the session with a highly engaging talk – introducing this topic as a weird new sub-genre of journalism.
Described FT as a typical news organisation not famous for depth of statistical knowledge.
A data journalism team needs three types of people: a computer-assisted reporter; a data visualisation specialist, e.g. a graphic designer who works with numbers; and a web developer, who probably works elsewhere, not in the newsroom. The aim is to bring those people together and get them working on specific projects.
This is not a new thing – journalists do dig into statistics to find stories. It has been going on for longer in the US and Scandinavia, as the tradition of access to public data has a longer history there.
An early example, shown from pre-computer days, was a story illustrating racial distribution in Atlanta compared with banks’ lending data. In that story, the map was a tiny part; data journalism is about rigorous reporting based on data.
Pretty pictures are not necessarily the aim. The best reporting using statistical analysis may include just a couple of clear charts to illustrate the story that has been discovered.

So, what is new?
In the UK in particular, it’s access to data. The FOI Act of 2000 was the start of an acceleration. Also, the evolution of the web – being able to publish content that is truly useful to readers. This has supported a range of new ways of telling the story.
Traditionally the choice was either explanatory or exploratory; now both can be offered. The martini glass narrative structure: big picture, then we walk you through a narrow channel, then we turn the whole database over to you.
Can do both near and far views, national and local.
Opportunity for personal relevance – eg extracting your school from the national stats.
Integration with social media – story can be shared with friends.
Again, different from traditional view that news is tomorrow’s fish wrapper. Digital products are reusable and have longer lifespan.

Shared a slightly more light-hearted example, which used mortality data to calculate the likelihood that you might live to see King George VII.
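The arithmetic behind that sort of calculator is straightforward: given annual mortality rates by age, the chance of surviving n more years is the product of the annual survival probabilities. A sketch with invented rates (the real interactive would use published life tables):

```python
# Hypothetical annual mortality rates q_x for ages 40 onwards (NOT real life-table data).
mortality = {age: 0.001 * 1.09 ** (age - 40) for age in range(40, 100)}

def survival_probability(current_age, years):
    """Probability of surviving `years` more years = product of (1 - q_x)."""
    p = 1.0
    for age in range(current_age, current_age + years):
        p *= 1 - mortality.get(age, 1.0)   # treat ages beyond the table as certain death
    return p

# e.g. the chance a 40-year-old is still around in 35 years' time
print(f"{survival_probability(40, 35):.1%}")
```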

Another example was a calculation to work out the value of Twitter just before it launched on the stock exchange. It hid most of the tricky stuff, but gave people a couple of variables to tweak. And there was a similar exercise to work out what your personal data was worth.

In order to do their job, they need high quality open public data, that is free to use. They have to be able to access it fast, and it needs to be analysable, openable and reusable.

Note, data journalists are weird. They don’t want tidy tables, they don’t want to read the stuff you release; they want raw data that they can load into a tool to manipulate, e.g. they prefer CSV. NB they also need the lookup files which help to understand the data.
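A minimal example of why those lookup files matter (the file contents here are invented for illustration): the raw release often carries only codes, and it is the lookup that turns them into something readable.

```python
import io
import pandas as pd

# Hypothetical raw release: rows carry area codes, not names (stand-in for a real CSV file).
data = pd.read_csv(io.StringIO(
    "area_code,median_pay\nE0001,27500\nE0002,24100\n"))

# The accompanying lookup file maps codes to readable names.
lookup = pd.read_csv(io.StringIO(
    "area_code,area_name\nE0001,Camden\nE0002,Hackney\n"))

# Joining on the code is what makes the raw file analysable.
merged = data.merge(lookup, on="area_code", how="left")
print(merged)
```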

What next?
UK data explorer: a set of tools for exploring UK public data. Mass-produced interactives – scripts written once, so any new versions of the data can simply be uploaded.

If you are just updating a time series, you could have automated stories (which would leave the journalist free to do proper analysis). An example was shown from the Washington Post and job statistics every month. The Los Angeles Times has a similar scraper which takes data from the USGS earthquake notification service and writes a basic story on the data. It can produce something virtually immediately after the data is available.
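A toy version of the idea – not the Post’s or the LA Times’ actual code, and the figures are just illustrative – shows how little is needed once the template is written; each new release only has to supply the latest numbers:

```python
# Illustrative monthly jobs figures; in practice these would be read from the new release.
latest = {"month": "April 2014", "jobs_added": 288_000, "unemployment_rate": 6.3}
previous_rate = 6.7

direction = "fell" if latest["unemployment_rate"] < previous_rate else "rose"

# Fill a fixed sentence template with the latest numbers.
story = (
    f"The economy added {latest['jobs_added']:,} jobs in {latest['month']}. "
    f"The unemployment rate {direction} to {latest['unemployment_rate']}% "
    f"from {previous_rate}% the month before."
)
print(story)
```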

And that was it – a fascinating afternoon with a wide range of interesting speakers. Data and visualisation is a topic that is really causing a buzz at the moment – and these speakers combined to show that doing it right, rather than doing it for the sake of it, is key. And it’s not as easy as perhaps the simple output might indicate.

If you are interested to see any of the slidesets, ONS have published them.