Anne Njuguna Nyambura Kimathi University College of Technology, Nyeri, Kenya firstname.lastname@example.org Oyugi Steve Ouma Kimathi University College of Technology, Nyeri, Kenya email@example.com Lishba Mose Naisinkoi Kimathi University College of Technology, Nyeri, Kenya firstname.lastname@example.org
ABSTRACT
This paper provides an overview of public health, epidemiology and data mining, and of how the Internet in particular can be used to collect and analyze data to reveal patterns that the health industry can use to improve its services. Kenya is the area of study. The paper proposes a website that shall gather data and apply various statistical and data mining tools to provide accurate information for that purpose. The site shall make use of Web 2.0 tools such as those provided by PHP, which has become one of the most widely used dynamic web scripting languages worldwide. PHP shall be used to facilitate interactions with a database that stores all the required data. The same website shall connect all the medical centres available in the country, which will provide manual input of data. Analysis of the data shall be done using relevant data mining methods and techniques.
INTRODUCTION
Health is the cornerstone of all societies and the apex of all social and economic systems the world over. There is no doubt that for an economy to succeed, its working population must consist of healthy individuals: healthy in mind, body and soul. This paper studies a major field in health: epidemiology. Epidemiology is the study of patterns of health and illness, and of their associated factors, at the population level. It is a field widely credited with establishing evidence-based medicine and the medical practices used to identify risk factors for disease, and it is also used to determine optimal treatment approaches in clinical practice and preventive medicine.
Epidemiologists rely on scientific disciplines such as biostatistics, geospatial information science and social science to better understand disease processes, obtain the current raw information available, store data and map disease patterns, and better understand the proximate and distal risk factors that affect the spread of major diseases. Health currently faces great challenges, and the integration of data mining methodologies will greatly reduce the effort spent on data collection and analysis.
BACKGROUND INFORMATION
In Kenya, some of the major diseases affecting the population include malaria, HIV, cholera, typhoid, cancer and polio. The hindrances to the provision of health services include lack of adequate access to information, lack of adequate VCT centres, insufficient provision of treated mosquito nets, poor hygiene and poverty. This brings us to the use of the Internet as a means of propagating information. A great percentage of the Kenyan population (about 50%) own mobile telephony equipment; in essence, this same population could have access to the Internet. We intend to use this to our advantage by employing online facilities that capture data from the Kenyan Diaspora on issues revolving around health. This shall be done through medical websites that seek and gather data from other sites on the various variables of study.
PREVIOUS RESEARCH
Information on epidemiology in Kenya has been collected in the recent past, most of it specific to particular health conditions, especially HIV, malaria, bacterial infections and typhoid, by the Centre for Viral Research under the Kenya Medical Research Institute (KEMRI), Nairobi, Kenya, and through other initiatives actively undertaken by KEMRI. These efforts have, however, been heavily paper based, relying on traditional data collection techniques such as paper questionnaires. Our methodology intends to cover a larger field, the whole of Kenya, and would be entirely digital. This will be cheaper to implement, can be done faster and would eradicate factors such as bias.
PRELIMINARY FINDINGS
The beauty of Web 2.0 is its ability to integrate virtually any technology under its huge wings. PHP has emerged as a force to be reckoned with in the creation of dynamic sites whose content changes every fraction of a second.
The current Kenyan population, as earlier stated, consists of young and tech-savvy individuals with a mind and an eye for technology. It is often joked that Kenyans live off Facebook: they board taxis while on Facebook, are constantly on chat, and currently 2go seems so much better than regular cell phone service. This gets even more interesting. A regular blogger, Moses Kemibaro, used Alexa, a website ranking service, to determine the most visited websites in Kenya and came up with these results, from first to tenth: Yahoo, Google, Facebook, Windows Live Search, Hi5, YouTube, MSN, Blogger, RapidShare and Wikipedia, with others such as MySpace, the Daily and Sunday Nation and the East African Standard ranking close behind.
Trends in world Internet usage are leaning towards the production of website usage statistics and site ratings, which are used most of the time for economic marketing purposes. This highlights the key to the success of this research. Basically, it shall involve the creation of a main website specifically dedicated to data collection and analysis. This data shall be stored in a central database. The site shall rely on other websites, such as those frequented by Kenyans, through the use of advertisements and posts, questionnaires and other data collection mechanisms. The same site shall be used by health centres in the country for data collection in the form of regular data uploads. To facilitate this venture, all the health centres in the country shall be linked to each other, allowing comparisons and further analysis of the same data. This idea brings us to the implementation of this research, which shall rely greatly on this phenomenon. Site visitation data shall include statistics on the health topics and forums most frequented, medicines most purchased, family health histories, lifestyle statistics, geographical factors, genomic factors, ethnic/tribal/race factors, diseases most clicked on, prescriptions most filled and other variables of research. The data collected shall seek to answer some of the following medical practice concerns: What factors affect the onset of disease within a population? What is the likelihood that a patient will require follow-up treatment or hospitalization, or that the case will worsen? Are there particular clusters of patients that are likely to develop certain diseases? Which geographical regions in the country are likely to be affected by some diseases and which ones are not susceptible? How often is a case misdiagnosed? What particular treatments and drug combinations are most likely to cure the diseases?
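As a rough illustration of the central database and the regular uploads from health centres described above, the following is a minimal sketch only. The paper proposes PHP for the site itself; Python with an in-memory SQLite store is used here purely for brevity. The table name `case_reports`, its columns and the helper functions `build_store`, `upload_reports` and `cases_by_region` are hypothetical and are not part of any existing system, and the rows in the usage example are placeholder values, not collected data.

```python
import sqlite3


def build_store():
    """Create an in-memory stand-in for the central database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE case_reports (
            id            INTEGER PRIMARY KEY,
            health_centre TEXT NOT NULL,
            region        TEXT NOT NULL,
            disease       TEXT NOT NULL,
            reported_on   TEXT NOT NULL
        )
        """
    )
    return conn


def upload_reports(conn, rows):
    """Store one batch of (health_centre, region, disease, reported_on) rows."""
    conn.executemany(
        "INSERT INTO case_reports (health_centre, region, disease, reported_on) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def cases_by_region(conn, disease):
    """Count reported cases of one disease per region, largest count first."""
    cursor = conn.execute(
        "SELECT region, COUNT(*) FROM case_reports "
        "WHERE disease = ? GROUP BY region "
        "ORDER BY COUNT(*) DESC, region",
        (disease,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    store = build_store()
    # Placeholder rows for illustration only; these are not real reports.
    upload_reports(
        store,
        [
            ("Centre A", "Nyeri", "Malaria", "2010-01-05"),
            ("Centre B", "Kisumu", "Malaria", "2010-01-06"),
        ],
    )
    print(cases_by_region(store, "Malaria"))
```

A per-region count of this kind is the simplest possible starting point for the question of which geographical regions are affected by a given disease; the data mining techniques discussed under METHODS would operate on far richer records than this sketch holds.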
Other variables shall include patient and family histories, treatment facilities, drug interactions, geographical factors, genomic influences, lifestyle influences and treatment personnel. The data shall be classified according to the following criteria: environment, in terms of exposure, location, job risks and diet; genetics, in terms of genetic markers present; clinical, in terms of blood and other diagnostic data; familial, in terms of other family members; history, in terms of past illnesses; socio-economics, in terms of job, marriages, education, age and gender; lifestyle, in terms of exercise, smoking patterns and alcohol consumption; and ethnicity, tribal factors and geographical placement.
METHODS
There are various data mining techniques that shall be utilized. A few of them are: classification, which shall seek to find the attributes that best describe the dependent variables; clustering, which shall seek to group various attributes in the entire population being investigated; association, which shall determine whether one event affects another; attribute importance, which shall provide information on how strongly one attribute affects a dependent variable; and the lift model, which shall seek to find out how well a variable can identify a required target. There exist several algorithms that can be used to analyze the data, and their use shall depend on their functionality and resource requirements. Methods such as Naïve Bayes would be useful when dealing with new data. This is because Naïve Bayes assumes that the presence or absence of a particular feature of a class is unrelated to the presence or absence of any other feature of the same class. It considers all of the properties to contribute independently to the probability that a variable is what it is. An example would be the classification of an organism as a bacterium provided it has common bacterial properties. Another possibility would be the use of Bayesian probability. It is similar to Naïve Bayes and likewise probabilistic in approach, but it depends heavily on the availability of prior knowledge about the variable being covered. The assumptions that were used to formulate policies based on that knowledge are also considered, provided that they exist. It is very fast in analysis, and where data is unavailable, cross validation and bootstrapping are implemented. Cross validation shall use subsequently submitted parts of the data to generate patterns, while bootstrapping shall use random samples of the submitted data, taken later, in place of the prior ones. The data shall be collected periodically, after which analysis shall be conducted and the results released on all available media of communication.
CONCLUSIONS
The collection and analysis of this data on epidemiology shall result in the provision of better medical services throughout the country. The success of this venture shall be a great leap for ICT in medicine in the country.
It shall highlight the significance of Information Technology in the Kenyan Diaspora as a cheap and efficient alternative. Data mining as a field will be credited with facilitating the analysis of this data and the provision of accurate and timely information.
ACKNOWLEDGEMENTS
We would like to acknowledge the work of the A.G Director and Chairman of the School of Computer Science, Mr. Nicholas Gachui, for his dedication towards our research work. We also appreciate the support of our university, Kimathi University College of Technology, in this noble venture. Mistakes and omissions remain ours.
REFERENCES
1. Peter Lucas, Bayesian Analysis, Pattern Analysis and Data Mining in Health Care, October 2004, 399-403
2. Scott A. Rappoport, Data Mining and Epidemiology, OCP MTS Technologies, Oracle World 2003
3. Ramoni M, Sebastian P, Cohen P, Bayesian Clustering by Dynamics, Machine Learning, 2002; 47; 91-121
4. Cousin J, The New Maths of Clinical Trials, Science 303; 2004; 784-786
5. Beaumont MA, Rannala B, The Bayesian Revolution in Genetics, Nature Reviews Genetics 5; 2004; 251-261
6. Greg Rogers, Ellen Joyner, Mining your Data for Health Care Quality Improvement, 1997; 3-5
7. Online Data Mining Projects, http://www.ultragem.com/projxmpl.html
8. SAS Institute Inc., SAS Communications, Data Mining Reveals Diamonds in your Database, Third Quarter, 1995
9. Eric V. Siegel, Predictive Analytics with Data Mining, DM Review Magazine, February 2005
10. James A. Berkley, Brett S. Lowe, KEMRI, Bacteremia Among Children Admitted to a Rural Hospital in Kenya, January 2005
11. ME Parise, JG Ayisi, Prevention of Placental Malaria in an Area of Kenya with a High Prevalence of Malaria and Human Immunodeficiency Virus Infection, November 1998