Part of a series: Development Data Challenges – and that’s probably the best message to take away. This was not one event, not one single opportunity to get things done, but a whole community, with lots of questions and lots of events where different people come together, work on something, and make progress through a combination of major breakthroughs, lightbulb moments and tiny steps, plus of course lots of frustration. Unfortunately, despite the huge progress of IATI, the leitmotiv continues to be data availability and data quality.

In classic data hack day style, the event started with a freeform discussion of ideas, and while there was a wall of paper:

Challenges for developers  - pictures on a wall

they were not just the result of a brainstorm among the people in the room, but had also come in via a website soliciting questions. These included:

  • Who are the biggest funders of primary education in Africa
  • How can we produce data about IATI data and how can that help us improve the data
  • How do we know what’s missing
  • How would you demonstrate transparency contributing to (or hindering) accountability and good governance
  • Is aid serving disabled people
  • Promised v actual spending
  • How can we use open data to improve the quality of data-driven journalism
  • How predictable are aid flows
  • Can we map conditions attached to IFI loans
  • Can we map photos of development projects in every country
  • What % of EU aid is spent via financial institutions like the World Bank
  • What % of DFID aid is spent evaluating projects and what % of recipients are given formal opportunities to provide feedback – and how many do

and there were many more.

Over the weekend, the group went through them and agreed to work on a set of questions which broadly grouped around data quality, the impact of media coverage on funding, access to water, traceability and geolocation.

The following were presented [nb these are transcribed from my scribbled notes and what I understood from the presentations – so if I’ve mis-described your work or given a false link, please let me know via the comments]

Mark Herringer presented a konekta project: Geolocation for community development. His idea is that if community events could be indexed in a common form on a database, with geolocation data, then there could be a simple interface on a mobile phone where users could ask “where is my nearest x” – where x might be a clinic offering vaccinations, or health checks.

On transparency, there was no clever app to show, but Simon Whitehouse outlined the range of things the team had looked at. All were practical, aiming to track finance and project outcomes. There were two approaches: one worked from the donor starting point, trying to track funding from taxes through to beneficiaries; the other looked from a recipient point of view. Challenges included that IATI data is incomplete, or is good only until you get to the delivery organisation layer. Much of their work over the weekend was to try to map out what data they would LIKE to have, to be able to do more on this sort of traceability. One model they described sounded familiar – going beyond the financial data to include a formal data structure, along the lines of a webpage per project, which would ultimately aggregate all material around a project, including allowing feedback from all stages of the process.

To quality-check the data, one team worked on a process for testing data for errors – from something as simple as “is there a title” through to more complex data fields. Their calculations gave a % quality mark for IATI data. They have run around 40 tests so far and identified around 100; they need more, and also need to work out a way of weighting tests, to identify which fields are more valid/useful. The presenter tweeted a site where the tests will run: Test site (not sure if this is the one they showed in the demo though).
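
To make the idea of a weighted % quality mark concrete, here is a minimal sketch in Python. It is not the team’s code: the field names (`title`, `description`, `start_date`, `end_date`), the three checks and their weights are all assumptions made up for illustration, and real IATI records are XML with a much richer schema.

```python
from typing import Callable, Iterable

# A check takes one activity record (a plain dict here, purely for
# illustration) and returns True if the record passes.
Check = Callable[[dict], bool]


def has_title(activity: dict) -> bool:
    """The simplest test mentioned in the talk: is there a title?"""
    return bool(str(activity.get("title", "")).strip())


def has_description(activity: dict) -> bool:
    return bool(str(activity.get("description", "")).strip())


def has_valid_dates(activity: dict) -> bool:
    """Both dates present and in order (ISO date strings sort correctly)."""
    start, end = activity.get("start_date"), activity.get("end_date")
    return bool(start) and bool(end) and start <= end


# Hypothetical weights: a higher weight means the field matters more.
WEIGHTED_CHECKS: list[tuple[Check, float]] = [
    (has_title, 1.0),
    (has_description, 1.0),
    (has_valid_dates, 2.0),
]


def quality_score(activities: Iterable[dict],
                  checks: list[tuple[Check, float]] = WEIGHTED_CHECKS) -> float:
    """Return the weighted share of passed checks as a percentage."""
    activities = list(activities)
    possible = sum(weight for _, weight in checks) * len(activities)
    if possible == 0:
        return 0.0
    passed = sum(weight
                 for activity in activities
                 for check, weight in checks
                 if check(activity))
    return 100.0 * passed / possible


# Made-up sample records, not real IATI data.
SAMPLE = [
    {"title": "Primary schools", "description": "Build classrooms",
     "start_date": "2011-01-01", "end_date": "2012-12-31"},
    {"title": "", "description": "Water points",
     "start_date": "2012-06-01", "end_date": "2011-06-01"},
]

if __name__ == "__main__":
    print(f"Quality mark: {quality_score(SAMPLE):.1f}%")
```

Keeping the checks as a plain list of (function, weight) pairs means new tests can be added, or weights argued over, without touching the scoring logic.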

Watermap used newly available data from UNDP on South Sudan – looking at mapping settlements and their distance from water sources (which may be a well, or a river/waterway), plus overlaying geo data, eg the presence of aquifers. They also tried to map where funding was spent: exactly what it was spent on, and exactly where. A further overlay showed population density, although when questioned, there were no figures as to the exact sizes of settlements. That gave rise to questions as to how you could use a map like this to prioritise: eg the community furthest from any water source (one was over 10km) looked as though it was in an aquifer-rich area, so it’s possible that they all had informal access, eg wells dug in their gardens. As the presenter said – at least it’s a start – it’s a way of seeing where to look deeper and what other questions to ask.
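
For anyone wanting to try the distance part themselves, a rough sketch follows. It assumes settlements and water sources are available as latitude/longitude points and uses the standard haversine great-circle formula; the coordinates, names and the 10km threshold are invented for illustration and are not taken from the UNDP data or from the Watermap code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_source_km(settlement: dict, sources: list[dict]) -> float:
    """Distance from a settlement to its closest water source."""
    return min(
        haversine_km(settlement["lat"], settlement["lon"], s["lat"], s["lon"])
        for s in sources
    )


def settlements_beyond(settlements: list[dict], sources: list[dict],
                       threshold_km: float) -> list[tuple[str, float]]:
    """Settlements further than threshold_km from any source, furthest first."""
    far = []
    for settlement in settlements:
        distance = nearest_source_km(settlement, sources)
        if distance > threshold_km:
            far.append((settlement["name"], distance))
    return sorted(far, key=lambda item: item[1], reverse=True)


# Invented points for illustration only.
SETTLEMENTS = [
    {"name": "Settlement A", "lat": 5.00, "lon": 31.00},
    {"name": "Settlement B", "lat": 5.20, "lon": 31.00},
]
SOURCES = [{"name": "Well 1", "lat": 5.01, "lon": 31.00}]

if __name__ == "__main__":
    for name, km in settlements_beyond(SETTLEMENTS, SOURCES, threshold_km=10):
        print(f"{name}: {km:.1f} km from nearest mapped source")
```

This only measures distance to mapped sources, so it has the same blind spot raised in the questions: informal wells that are not in the data will make a settlement look worse off than it is.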

One team attempted to map media influence on fundraising. They referred to the Guardian Datablog, which has an illustration of the Somalia famine and funding – its coverage data is based on print sources.
This team looked at 5 natural disasters: the tsunami, Haiti, the Japan earthquake, the floods in Pakistan and the Horn of Africa famine. They looked at both public and private sources of funding, the number of people affected, and minutes of media coverage – they used mainstream broadcast media, but agreed it is hard to capture. They produced an interesting visualisation – but how far does it really go towards proving the concept? There are lots of variables: early data from the tsunami may not be available, the Japanese government didn’t ask for funding, and it is difficult to capture exact/accurate figures, even for something you would expect to be commonly available, eg the number of people affected/displaced.

They explained that sometimes, even when data is available, it requires a lot of manual intervention to do what you want with it – eg tagging funds as to whether they came from NGOs, foundations or donors – as you could start with the premise that private donations are influenced by media, while those from more formal channels (donors) are not.

Common thread again – a LOT of time was spent just trying to get hold of the data.
They commented that FTS [?] is a good service, but you have to download XLS files; there is no API.

They also showed another hack, worked on before this weekend: Hack Malawi – a simple map, with aid by sector from a range of donors plotted onto an outline map of the country. Hover over a circle and it shows the amount, project status and donor – but not the recipient.

The next DDC will be in Helsinki – part of OK Festival in September. Also mentioned were the Disaster 2.0 hackathon, to be held in Warwick in September, and the DataKind event (formerly Data Without Borders). It was also mentioned that the Guardian will be running a data visualisation competition throughout September, focused on aid data and with cash prizes – details to be shared soon.

Shared on screen was a link to resources for this and future DDC events – produced by the Open Knowledge Foundation.

Happy to add/change any links if people let me know via Twitter (@juliac2) or in the comments.

This DDC event should be written up on the Guardian Datablog. They also covered it via Storify, and the hashtag #ddc2012 gathers tweets together.

Impressions I was left with: lots of people want to do things with data. The skills are there, but the data isn’t always available. So the need is to continue to engage: those who want data need to keep asking questions of those they think have it, so that those who have it will make what is known available in the format developers need and, if they know the questions, will do more work to identify what else could be released or, where there are gaps, where more/different data should be collected.

The open data movement in the UK has started well, but we need to recognise this is not a time to sit back: if the model was only to continue to deliver those datasets already identified, in the format we use at the moment, that would not be good enough.

Update: edited to include the correct hashtag: search for #ddc2012

The Guardian team have now published their (much prettier) write-up of the event, which also includes proper links to all the data used.