Published by the IEEE Computer Society
0018-9162/09/$26.00 © 2009 IEEE
Open Source Data Collection in the Developing World
In the developed world, data is relatively easy to collect. Be it population demographics, embedded traffic sensors, or even popular Internet services, the ability to easily tap and synthesize raw data enables individuals and organizations to make decisions. Examples of such synthesis include earthquake sensing via Twitter, traffic mapping in Google Maps, and disease-oriented websites like PatientsLikeMe and Google Flu Trends.
In the developing world, the lack of reliable infrastructure, ubiquitous connectivity, and adequate expertise makes data collection difficult. Currently, most organizations collect data on paper forms despite inefficiencies such as the physical collection of completed forms, data transcription errors, and long delays before the data is available.
This problem is exacerbated by the data’s critical nature. If, for example, you don’t know how far villagers are from a stagnant water source, it’s difficult to know how many mosquito nets to deploy; and, if deployment information isn’t connected to malaria cases at local clinics, it’s impossible to know whether the nets have made a difference.
The exponential growth of cell phone usage and infrastructure in developing regions has aroused great excitement for using mobile devices to address current gaps in data gathering. In addition to the variety of data (text, photos, location, audio, video, barcode scans) that can be gathered, mobile devices have proven to be dramatically faster at both collecting the data and making it available to decision makers. Moreover, deploying mobile devices can be less expensive and less error prone than using pen and paper.
While several systems currently exist for simple data collection in developing regions, they’re often difficult to deploy, hard to use, complicated to scale, and rarely customizable or extensible.
Current offerings like Pendragon Forms, Frontline Forms, and Nokia Data Gathering are inflexible because they’re closed source and based on closed standards. Others like JavaRosa, RapidSMS, FrontlineSMS, and EpiHandy are more flexible but primarily collect textual data.
Moreover, many of the devices that run this software have limited processing power and restricted storage, and they often lack cellular connectivity. Input often comes in the form of a stylus or numerical keypad, while output must fit on minuscule screens, a combination that results in poor usability. In addition, developers only have limited access to the phone’s resources, making it difficult to include essential inputs like the phone’s unique identifier, GPS location, or captured photos with the data.
There’s also a need to develop more and better server-side tools. Ideally, these tools should be as service-oriented as e-mail has become. In the same way that consumers no longer need to configure and maintain mail servers, organizations that collect data need “e-mail easy” solutions that let them ignore the hidden costs of server infrastructure: power, connectivity, maintenance, security. And just like e-mail, it must be easy to move the data across various systems.
OPEN DATA KIT
To help fill this gap, we are developing Open Data Kit (http://code.google.com/p/open-data-kit), a suite of tools that enables users to collect their own rich data. ODK is designed to let users own, visualize, and share data without the difficulties of setting up and maintaining servers. The tools are easy to use, deploy, and scale. They also go beyond open source: they’re based on open standards and supported by a larger community.
Yaw Anokwa, Carl Hartung, Waylon Brunette, and Gaetano Borriello,
University of Washington
Massachusetts Institute of Technology
Open Data Kit enables timely and efficient data collection on cell phones, a much-needed service in the developing world.