
Chapter 1

Introduction
1.1 Organization Background

WHO-IPD (Programme for Immunization Preventable Diseases), formerly known as Polio Eradication Nepal (PEN), provides technical support to the Ministry of Health and Population (MoHP) for vaccine preventable diseases (VPDs) in Nepal. Since its establishment in 1998, IPD has been supporting the Government of Nepal's endeavor to strengthen the surveillance of acute flaccid paralysis (AFP, for polio), measles, neonatal tetanus, and acute encephalitis syndrome (AES, for Japanese encephalitis), as well as the routine immunization programme. IPD supports these activities in close collaboration with the Child Health Division, the Epidemiology and Disease Control Division, and the National Public Health Laboratory under the Department of Health Services of the Ministry of Health and Population. IPD currently has 11 field offices with 15 surveillance medical officers. All IPD field offices operate in close coordination with the Regional Health Directorates and the District (Public) Health Offices to carry out surveillance and immunization-related activities.

1.1.1 IPD's Core Activities

1. Surveillance of VPDs
2. Surveillance support for other infectious diseases
3. Support for routine and supplementary immunization activities
4. Support in policy formulation and strategy development for the National Immunization Programme (NIP)
5. Research, publication and dissemination of surveillance information and guidelines
6. Social mobilization
7. Coordination with partners
8. Technical support to MoHP for laboratory diagnosis of VPDs

1.2 Schedules for Internship

The schedule chart covered the following weeks:

April: Week 3 (16-22), Week 4 (23-30)
May: Week 1 (1-8), Week 2 (9-16), Week 3, Week 4
June: Week 1, Week 2, Week 3, Week 4

Activities marked on the chart: assist in day-to-day activities; develop a conversion software.

S.N  Learning objectives
1.   Learn about designing forms and modifying the database
2.   Understand data verification by checking for inconsistencies
3.   Generate analysis tools such as graphs and maps
4.   Know about importing and exporting data between various file formats and the database
5.   Understand querying on single and multiple relations
6.   Know about backup, restore and recovery

1.3 Tasks Performed During Internship

1.3.1 Data Entry

Since the advent of computers, and indeed since the beginning of typing, the need to collect and neatly present documents has required data entry. Data entry is the act of transcribing data from one form into another, usually into a computer program. The data transcribed may include handwritten documents, information from spreadsheets in another program, sequences of numbers, letters and symbols that make up a program, or simple data such as names and addresses.

Disease surveillance is the routine, ongoing collection, analysis and dissemination of health data. It includes the detection and notification of health events, investigation and confirmation of cases or outbreaks, creation of reports, provision of feedback, and feed-forward to higher levels for public health interventions. To direct these interventions, the surveillance team must provide detailed epidemiological information.

1.3.2 Data Verification and Validation

Data verification is a process that ensures the completeness, correctness, and compliance of a data set against the applicable needs or specifications. It involves double-checking the collected data against the information actually gathered, so that human errors can be corrected. Its purpose is to ensure that stored data can be located easily whenever it is searched for, irrespective of technical specifications, location, or source, which in turn helps to accelerate organizational processes.

Data validation is the process of ensuring that a program operates on clean, correct and useful data. It uses routines, often called "validation rules" or "check routines", that check the correctness, meaningfulness, and security of data that are input to the system. The rules may be implemented through the automated facilities of a data dictionary, or by the inclusion of explicit validation logic in the application program. The simplest data validation verifies that the characters provided come from a valid set. For example, telephone numbers should include digits and possibly the plus, minus, and bracket characters. A more sophisticated validation routine would check that the user had entered a valid country code, i.e., that the number of digits entered matched the convention for the country or area specified.

Validation methods

Allowed character checks: Check that only expected characters are present in a field. For example, a numeric field may allow only the digits 0-9, the decimal point, and perhaps a minus sign or commas.

Consistency checks: Check that the data in related fields correspond, e.g., if District Code = "KTM", then District Name = "Kathmandu".

Data type checks: Check the data type of the input and give an error message if the input does not match the chosen data type, e.g., in an input box accepting numeric data, typing the letter 'O' instead of the number zero produces an error message.

Format or picture check: Checks that the data are in a specified format (template), e.g., dates have to be in the format DD/MM/YYYY. Regular expressions should be considered for this type of validation.

Limit check: Unlike a range check, the data are checked against one limit only, upper or lower, e.g., a value should not be greater than 2 (<= 2).

Logic check: Checks that an input does not yield a logical error, e.g., in the case of measles, the date of the patient visit should be later than the date of rash onset.

Presence check: Checks that important data are actually present and have not been missed out, e.g., the patient name must always be specified.

Range check: Checks that the data lie within a specified range of values, e.g., the month of a person's date of birth should lie between 1 and 12.

Spelling and grammar check: Looks for spelling and grammatical errors.

Uniqueness check: Checks that each value is unique. This can be applied to several fields but is mostly applied to primary keys.

Table lookup check: Takes the entered data item and compares it to a list of valid entries stored in a database table, e.g., to validate the names of villages or municipalities.
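A few of the checks listed above can be sketched in C#. This is only a minimal illustration under stated assumptions: the class name CaseRecordValidator, the method names, and the choice of fields are hypothetical and are not taken from the software described in this report. The district pair and the date rule are the examples given in the list.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Illustrative only: CaseRecordValidator and its parameters are hypothetical names.
public static class CaseRecordValidator
{
    // Consistency check data: district code -> expected district name
    // (the single example pair given in the text).
    private static readonly Dictionary<string, string> DistrictNames =
        new Dictionary<string, string> { { "KTM", "Kathmandu" } };

    // Format check: accepts only dates written as DD/MM/YYYY.
    public static bool TryParseDate(string text, out DateTime date)
    {
        return DateTime.TryParseExact(text, "dd/MM/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
    }

    // Runs the checks and returns one message per failed check (empty list = valid).
    public static List<string> Validate(string patientName, string ageYears,
        string districtCode, string districtName, string rashOnset, string visitDate)
    {
        var errors = new List<string>();

        // Presence check: the patient name must be supplied.
        if (string.IsNullOrWhiteSpace(patientName))
            errors.Add("Patient name is missing.");

        // Data type check: age must consist of digits only (the letter 'O' is not zero).
        int age;
        if (!int.TryParse(ageYears, NumberStyles.None, CultureInfo.InvariantCulture, out age))
            errors.Add("Age must be numeric.");

        // Consistency check: a known district code must carry its matching name.
        string expected;
        if (DistrictNames.TryGetValue(districtCode ?? "", out expected) && expected != districtName)
            errors.Add("District name does not match district code.");

        // Format check on both dates, then logic check on their order.
        DateTime onset, visit;
        bool onsetOk = TryParseDate(rashOnset, out onset);
        bool visitOk = TryParseDate(visitDate, out visit);
        if (!onsetOk || !visitOk)
            errors.Add("Dates must be in DD/MM/YYYY format.");
        else if (visit <= onset)
            errors.Add("Visit date must be later than rash onset date.");

        return errors;
    }
}
```

Returning a list of messages, rather than stopping at the first failure, lets a data entry operator see every problem in a record at once.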

1.3.3 Data Cleaning

Data cleansing, data cleaning, or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting this dirty data. After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been caused originally by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores. Data cleansing differs from data validation in that validation almost invariably means that data are rejected from the system at entry time, whereas cleansing is performed on batches of data.
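As a rough sketch of detecting and correcting entry errors in a batch, the following C# helper normalises spacing and letter case against a list of accepted spellings and flags anything it cannot match. The class name NameCleaner and its methods are hypothetical, and the approach (exact match after normalisation) is an assumption chosen for brevity.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: NameCleaner is a hypothetical helper.
public static class NameCleaner
{
    // Collapses repeated whitespace and trims both ends.
    public static string Normalize(string raw)
    {
        var parts = (raw ?? "").Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", parts);
    }

    // Returns a cleaned copy of the input; the original collection is left untouched.
    // Values that match a known name (ignoring case and spacing) take the accepted
    // spelling; values with no match are collected in 'unmatched' for manual review.
    public static List<string> Clean(IEnumerable<string> rawValues,
        IEnumerable<string> knownNames, out List<string> unmatched)
    {
        var canonical = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var name in knownNames)
            canonical[Normalize(name)] = name;

        var cleaned = new List<string>();
        unmatched = new List<string>();
        foreach (var raw in rawValues)
        {
            string key = Normalize(raw);
            string match;
            if (canonical.TryGetValue(key, out match))
            {
                cleaned.Add(match);
            }
            else
            {
                cleaned.Add(key);    // keep the normalised value
                unmatched.Add(key);  // and flag it for review
            }
        }
        return cleaned;
    }
}
```

Because the method builds a new list, the raw values remain available for comparison after cleaning.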

The actual process of data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities. The validation may be strict (such as rejecting any address that does not have a valid postal code) or fuzzy (such as correcting records that partially match existing, known records). A rule of computer-assisted reporting is that all data is dirty or inconsistent, which means that all data must be cleaned. Data cleaning can be performed with update queries, find and replace, and lookup tables. We should always work on a copy of the table, so that the original data are preserved in their raw form.

1.3.4 Data Backup

Backup, or the process of backing up, is making copies of data that may be used to restore the original after a data loss event. Backups have two distinct purposes. The primary purpose is to recover data after its loss, whether by deletion or corruption; data loss is a common experience for computer users. The secondary purpose is to recover data from an earlier time, according to a user-defined data retention policy, typically configured within a backup application, that states how long copies of data are required. Although backups are popularly seen as a simple form of disaster recovery, and should be part of a disaster recovery plan, backups alone should not be considered disaster recovery.

1.3.5 Data Analysis

Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.

Types of data

Data can be of several types:

- Quantitative data: the data is a number.
  - Often this is a continuous decimal number to a specified number of significant digits.
  - Sometimes it is a whole counting number.
- Categorical data: the data is one of several categories.
- Qualitative data: the data is a pass/fail or the presence or lack of a characteristic.

Data analysis and report generation can be performed by pivoting and charting.

1.3.6 Performing Queries

Queries can be used to quickly analyze and sort the information in an Access database. A query presents a question to the database by specifying criteria. Queries allow us to specify:

- The table fields that appear in a query.
- The order of the fields in a query.
- Filter and sort criteria for each field in a query.

Queries have two views:

- Design view
- Datasheet view

In the Design view, we specify which fields we want to see, which tables they come from, and the criteria that records have to meet in order to appear in the resulting datasheet. Criteria are tests that records have to pass. In the Datasheet view, we view the records that meet the criteria.

When we run a query, Access pulls data out of the tables and puts the data in a datasheet for us to see. The original table and the datasheet stay connected, so that if we make changes to the data in the datasheet, the results of the query also change. A select query can be used to select certain data from a table or tables. It filters and sorts the data and can perform simple calculations, such as summing and averaging.
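The filter, sort, and calculate steps of a select query can be illustrated outside Access with LINQ over in-memory records. This is a sketch of the same logic, not of the Access query designer: the CaseRow type, its fields, the age threshold, and the table name in the SQL comment are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: CaseRow is a hypothetical stand-in for one table row.
public class CaseRow
{
    public string District { get; set; }
    public int AgeYears { get; set; }
}

public static class SelectQuerySketch
{
    // Roughly equivalent Access SQL (assumed table and field names):
    //   SELECT District, COUNT(*) AS Cases, AVG(AgeYears) AS MeanAge
    //   FROM Cases WHERE AgeYears < 15
    //   GROUP BY District ORDER BY District;
    public static List<Tuple<string, int, double>> CasesUnder15ByDistrict(IEnumerable<CaseRow> rows)
    {
        return rows
            .Where(r => r.AgeYears < 15)   // criteria: the test a record must pass
            .GroupBy(r => r.District)      // one result row per district
            .OrderBy(g => g.Key)           // sort by district
            .Select(g => Tuple.Create(g.Key, g.Count(), g.Average(r => r.AgeYears)))
            .ToList();
    }
}
```

Each tuple in the result holds the district, the number of matching cases, and their mean age.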

1.4 Statement of Problem and Objectives

WHO-IPD receives surveillance data from 11 field offices and 15 medical officers across the country. The data are received in Microsoft Excel format and are entered manually into the Access database. The problems with this approach are:

1. It is time consuming.
2. It is prone to human error during data insertion.
3. It adds resource overhead.

Hence, it was determined that an automated system to update the database with the periodic surveillance data would benefit the organization. The task of developing prototype software for this purpose was given to me during my internship.

1.5 Literature Review and Methodology

A number of tools are available for converting spreadsheet files into an Access database, and Microsoft Access itself has an option to import data from Excel files. However, WHO-IPD requires a customized converter because some fields depend on others, some fields need to be removed, some need to be added, and the data need to be validated before insertion into the database. The conversion of an .xls file to an .mdb file is a two-step process:

1. Convert the .xls file to .csv (comma-separated values) format. We make use of ExcelDataReader v2.0.1.0, a lightweight and fast library written in C# for reading Microsoft Excel files ('97-2007). The rows read from the source Excel file are written to a CSV file in this first phase.
2. Bulk insert the data from the CSV file into the Access database.
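The first step can be sketched as follows, assuming the sheet has already been read into a System.Data.DataTable (for example, one table of the DataSet that a reader library returns). The class CsvSketch is hypothetical and is not the converter described in this report; unlike a plain join, it quotes fields that contain commas, quotes, or line breaks so that they survive the round trip. The second step, the insert into Access, is not shown.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Text;

// Illustrative only: CsvSketch is a hypothetical helper.
public static class CsvSketch
{
    // Quotes a field when it contains a comma, quote, or line break,
    // doubling any embedded quotes.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
            return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Turns every row of an in-memory sheet into one CSV line ending in '\n'.
    public static string ToCsv(DataTable sheet)
    {
        var sb = new StringBuilder();
        foreach (DataRow row in sheet.Rows)
        {
            // Convert.ToString yields an empty string for null and DBNull cells.
            var fields = row.ItemArray.Select(v => Escape(Convert.ToString(v)));
            sb.Append(string.Join(",", fields)).Append('\n');
        }
        return sb.ToString();
    }
}
```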


Chapter 2
System Analysis
System development can generally be thought of as having two major components: systems analysis and systems design. System analysis emphasizes understanding the details of an existing or proposed system, and then deciding whether the proposed system is desirable and whether the existing system needs improvement. Thus, system analysis is the process of investigating a system, identifying problems, and using the information to recommend improvements to the system. During system analysis, two types of requirement analysis are performed:

2.1 Functional Requirement

Functional requirements explain what has to be done by identifying the necessary task, action or activity that must be accomplished. They serve as the top-level functions for functional analysis. For this project, the functional requirement is that the system should be able to convert flat Excel files to database files.

2.2 Non-Functional Requirement

Non-functional requirements specify criteria that can be used to judge the operation of a system, rather than specific behavior. They are the quality requirements that describe how well the software performs. The non-functional requirements are specified as follows:

Performance

Performance is a measure of response time. It is concerned with short response time for a given piece of work, high throughput (rate of processing work), low utilization of computing resources, and high availability of the computing system or application.

Reliability

Reliability is the extent to which a program performs with the required precision. The system developed should be highly reliable and free of errors. Reliability is often measured as probability of failure, frequency of failures, or in terms of availability.

Usability

The application should be user friendly and should require the least possible effort to operate.

Flexibility

Flexibility is the effort required to modify an operational program. The whole application should be built from independent modules, so that changes made in one module do not affect the others and new modules can be added easily to increase functionality.


Chapter 3
3. System Design
3.1 DFD (Data Flow Diagram)

A DFD is a picture of the movement of data between external entities and the processes and data stores within a system.

[Figure 1 shows the external entity USER; the data stores Excel File, CSV file and Access Database; and six processes: 1 Select Excel File, 2 Select Sheet, 3 Specify Output filename and folder, 4 Convert File, 5 Select CSV file, 6 Insert into database. The labelled data flows are Filename, Selected file, Sheet, File and folder name, Csv file, file name, and Datarecords.]

Figure 1: DFD of the system to convert excel to access format.


3.2 Use Case Diagram


A Use case is a list of steps, typically defining interactions between a role/ actor and a system, to achieve a goal. The actor can be a human or an external system.

3.2.1 Excel to CSV conversion module

Actor: USER

Use cases:
- Select Excel file
- Select the sheet
- Specify output folder and file name
- Convert Excel file to CSV

3.2.2 Module to insert into database

Actor: USER

Use cases:
- Select CSV file
- Specify database
- Insert into database

3.3 External Design:


The goal of external design is to create a description of all elements of the application which interact with users or external systems.

3.3.1 User Interface


User Interface Design is concerned with how users add information to the system and with how the system presents information back to them.


3.4 Internal Design

3.4.1 Database Design


Database design is the process of producing a detailed data model of a database. This logical data model contains all the needed logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language, which can then be used to create a database. A fully attributed data model contains detailed attributes for each entity.

Field Name          Data Type
CaseID              TEXT
OutbreakID          TEXT
Source              TEXT
SourceName          TEXT
PatientName         TEXT
COUNTRY             TEXT
Region              TEXT
DISTRICT            TEXT
VDC/Muni            TEXT
Ward                NUMBER
DNOT                DATE/TIME
DOI                 DATE/TIME
DOB                 DATE/TIME
SEX                 TEXT
Ageyr               NUMBER
Agem                NUMBER
AGRP                TEXT
Vaccinated          TEXT
Hospitalized        TEXT
OUTCOME             TEXT
DONSET              DATE/TIME
RASH                TEXT
FEVER               TEXT
COUGH               TEXT
CORYZA              TEXT
CONJUNCTIVITIS      TEXT
VitaminA            TEXT
AnySpecimen         TEXT
DateBlood           DATE/TIME
LabID               TEXT
Serum               TEXT
Rubella             TEXT
DateUrine           DATE/TIME
UrineResult         TEXT
DateThroatSwab      DATE/TIME
ThroatSwabResult    TEXT
CLASS               TEXT
Fup                 TEXT
DFUP                TEXT
Travel              TEXT

Chapter 4
4. Implementation

The implementation phase follows the system analysis and system design phases. It includes:

1. Coding
2. Testing
3. Installation
4. Documentation
5. Maintenance

4.1 Coding

During coding, the physical design is converted into a program written in a programming language. Coding is the process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages. Coding and testing were carried out in parallel during the project implementation phase. The software was developed using the .NET Framework, with C# for coding and OLE DB for database access.

4.2 Testing

System testing is the stage of implementation aimed at ensuring that the system works accurately and efficiently before live operation commences. Testing is vital to the success of the system. System testing makes the logical assumption that if all the parts of the system are correct, then the goal will be successfully achieved. A series of tests is performed before the system is ready for user acceptance testing. The following types of testing were used:

1. Unit Testing
2. Integration Testing
3. Validation Testing


Unit Testing

Procedure-level testing is performed first. Improper inputs are given, and the errors that occur are noted and eliminated. Initially, the CSV conversion module is tested to verify that Excel files are correctly converted to CSV format, and the insertion module is tested to verify that the CSV records are correctly inserted into the database.

Integration Testing

Testing is done for each module. After all the modules have been tested, they are integrated and the final system is tested with test data specially designed to show that the system will operate successfully under all conditions. Thus, system testing is a confirmation that all is correct and an opportunity to show the user that the system works.
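A procedure-level test with proper and improper inputs can be sketched as below. Both the procedure under test (RowConverter.ToCsvLine) and the hand-rolled test runner are hypothetical and are not the tests that were run for this project; no test framework is assumed.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: RowConverter and RowConverterTests are hypothetical.
public static class RowConverter
{
    // Procedure under test: joins the cells of one row into a CSV line.
    // Empty (null) cells become empty fields; a missing row is rejected.
    public static string ToCsvLine(object[] cells)
    {
        if (cells == null) throw new ArgumentNullException("cells");
        var fields = new List<string>();
        foreach (var cell in cells)
            fields.Add(Convert.ToString(cell));
        return string.Join(",", fields);
    }
}

public static class RowConverterTests
{
    // Runs each case and returns the names of the cases that failed.
    public static List<string> Run()
    {
        var failures = new List<string>();

        // Proper input.
        if (RowConverter.ToCsvLine(new object[] { "A1", 5 }) != "A1,5")
            failures.Add("proper row");

        // Improper input: an empty cell must not break the line.
        if (RowConverter.ToCsvLine(new object[] { "A1", null }) != "A1,")
            failures.Add("empty cell");

        // Improper input: a missing row must be rejected with an exception.
        try
        {
            RowConverter.ToCsvLine(null);
            failures.Add("null row accepted");
        }
        catch (ArgumentNullException) { }

        return failures;
    }
}
```

An empty list returned by RowConverterTests.Run() means that every case passed.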

Validation Testing

The final step is validation testing, which determines whether the software functions as the user expects. The end user, rather than the system developer, conducts this test. Most software developers use a process called alpha and beta testing to uncover errors that only the end user seems able to find. The completion of the entire project is based on the full satisfaction of the end users. In the project, validation testing is done in various forms. In the question entry form, only the correct answer is accepted in the answer box; answers other than the four given choices are not accepted.

4.3 Installation

Installation is the act of making the program ready for execution.

4.4 Documentation

Software documentation or source code documentation is written text that accompanies computer software. It either explains how the software operates or how to use it, and may mean different things to people in different roles.

Documentation is an important part of software engineering. Types of documentation include:

1. Requirements - Statements that identify attributes, capabilities, characteristics, or qualities of a system. This is the foundation for what shall be or has been implemented.
2. Architecture/Design - Overview of software. Includes relations to an environment and construction principles to be used in design of software components.
3. Technical - Documentation of code, algorithms, interfaces, and APIs.
4. End User - Manuals for the end user, system administrators and support staff.
5. Marketing - How to market the product and analysis of the market demand.

4.4.1 End User Documentation

Step 1: Select the Excel file by clicking the Browse button.

Step 2: Select the sheet from the drop-down list, which displays the names of the sheets in the selected Excel file.

Step 3: Specify the output filename and output folder, then click the Convert button. A "File Converted" message box is displayed. The Excel file has now been converted to a CSV file.

Step 4: Select the CSV file to be inserted and specify the database name, then click the INSERT button. A message box confirms that the data were successfully inserted.

Chapter 5
Limitations and Future Enhancements

In testing with other data, the software successfully converted other Excel files to Access files. In the case of the VPDIFA database, however, the Excel file must be pre-processed before use to match the field types defined in VPDIFA, which is based on MS Access. The pre-processing tasks are:

- Remove the header information from the Excel file. Once the header is removed, the outbreak code and the reported outbreak date (if any) are no longer accessible. Because the outbreak data span multiple cells, they cannot be extracted automatically and must be dealt with manually.
- Reorder the columns in the Excel file to match their positions in the database file.
- Separate the VDC/Municipality name and the ward number.
- Identify the age group.

Limitations of this software:

- It assumes that the Source column always contains only the numbers 1, 2, 3, 4 and 5, representing SMO, AS, RU, OS and RRT respectively.
- It cannot check the spelling of VDC and Municipality names. The name of a VDC/Municipality is required to be the same as previously defined, so the program may throw an error if the spelling does not match.
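One possible way to soften the first limitation, sketched here as an assumption rather than as part of the delivered software, is to look the Source code up in a table and report unknown codes instead of assuming they never occur. The class SourceCodes and the method TryDescribe are hypothetical names; the code-to-label pairs are the ones listed above.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: SourceCodes is a hypothetical helper.
public static class SourceCodes
{
    // Code-to-label pairs as listed in the limitation above.
    private static readonly Dictionary<string, string> Labels =
        new Dictionary<string, string>
        {
            { "1", "SMO" }, { "2", "AS" }, { "3", "RU" }, { "4", "OS" }, { "5", "RRT" }
        };

    // Returns true and the "code - label" text for a known code;
    // returns false for anything else so that the caller can report the row.
    public static bool TryDescribe(string code, out string described)
    {
        string label;
        string key = (code ?? "").Trim();
        if (Labels.TryGetValue(key, out label))
        {
            described = key + " - " + label;
            return true;
        }
        described = null;
        return false;
    }
}
```

The same lookup pattern could be applied to VDC/Municipality names, given a list of the accepted spellings.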


Conclusion
The work I did during my internship gave me a lot of knowledge in the field of database management, including data validation and verification, cleaning, analysis, and performing queries, along with the design and implementation of forms. The file format conversion software I developed during this period gave me insight into various aspects of programming and the MS Access database.
In review, this internship has been an excellent and rewarding experience.


References
http://exceldatareader.codeplex.com/
www.whoipd.org
http://en.wikipedia.org
http://stackoverflow.com


Annexure
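1. Source Code

// Namespaces required by the code below.
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.OleDb;
using System.IO;
using System.Text;
using System.Windows.Forms;
using Excel; // ExcelDataReader 2.x namespace

// Holds the sheets read from the selected Excel file.
private DataSet result;

// Code to read Excel files.
private void getExcelData(string file)
{
    // The using block closes the file stream even if reading fails.
    using (FileStream stream = File.Open(file, FileMode.Open, FileAccess.Read))
    {
        IExcelDataReader excelReader = null;
        if (file.EndsWith(".xlsx", StringComparison.OrdinalIgnoreCase))
            excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
        else if (file.EndsWith(".xls", StringComparison.OrdinalIgnoreCase))
            excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
        if (excelReader == null)
            return; // unsupported file extension

        result = excelReader.AsDataSet();
        excelReader.Close();
    }

    // Show the sheet names in the drop-down list.
    List<string> items = new List<string>();
    for (int i = 0; i < result.Tables.Count; i++)
        items.Add(result.Tables[i].TableName);
    comboBox1.DataSource = items;
}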


// Code to convert into CSV format.
private void convertToCsv(int ind)
{
    DataTable sheet = result.Tables[ind];
    StringBuilder csv = new StringBuilder();

    foreach (DataRow row in sheet.Rows)
    {
        for (int i = 0; i < sheet.Columns.Count; i++)
        {
            // Columns 6 and 8 are skipped.
            if (i == 6 || i == 8)
                continue;

            string value = row[i].ToString();
            csv.Append(value);

            // Column 2 holds the source code; append its label.
            if (i == 2)
            {
                switch (value)
                {
                    case "1": csv.Append(" - SMO"); break;
                    case "2": csv.Append(" - AS"); break;
                    case "3": csv.Append(" - RU"); break;
                    case "4": csv.Append(" - OS"); break;
                    case "5": csv.Append(" - RRT"); break;
                }
            }

            // Every field, including the last, is followed by a comma;
            // the insert code relies on this trailing comma.
            csv.Append(",");
        }
        csv.Append("\n");
    }

    // Write the CSV text to <output folder>\<file name>.csv.
    string output = Path.Combine(textBox1.Text, textBox2.Text + ".csv");
    using (StreamWriter writer = new StreamWriter(output, false))
    {
        writer.Write(csv.ToString());
    }
    MessageBox.Show(" File Converted");

    // Reset the form for the next conversion.
    txt_browse.Text = "";
    textBox2.Text = "";
    textBox1.Text = "";
    comboBox1.DataSource = null;
}

// Code to modify the attribute values and insert them into the database.
private void button1_Click(object sender, EventArgs e)
{
    string text;
    using (StreamReader sr = new StreamReader(Txt_SelectToInsert.Text))
    {
        text = sr.ReadToEnd();
    }

    // One CSV line per record; every line ends with a trailing comma.
    string[] lines = text.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);

    string constr = @"Provider=Microsoft.Jet.OLEDB.4.0; Data Source = " + txt_db.Text;
    using (OleDbConnection conn = new OleDbConnection(constr))
    using (OleDbCommand cmd = conn.CreateCommand())
    {
        try
        {
            conn.Open();
        }
        catch (Exception ex)
        {
            MessageBox.Show("Error try again" + "\n" + ex);
            return;
        }

        foreach (string line in lines)
        {
            try
            {
                // Drop the trailing comma and prefix the case ID (first field) with "MSLNEP".
                string[] values = ("MSLNEP" + line.Substring(0, line.Length - 1)).Split(',');

                // Quote every value, doubling apostrophes so they cannot break the statement.
                for (int i = 0; i < values.Length; i++)
                    values[i] = "'" + values[i].Replace("'", "''") + "'";

                cmd.CommandText = "insert into final_tbl values(" + string.Join(",", values) + ")";
                cmd.ExecuteNonQuery();
            }
            catch (Exception ex)
            {
                // Report the failed row and continue with the next one.
                MessageBox.Show("Error try again" + "\n" + ex);
            }
        }
    }

    Txt_SelectToInsert.Text = " ";
    txt_db.Text = " ";
}

private string District_check(string district)
{
    switch (district)
    {
        // code for returning district name based on district codes
        default:
            return " ";
    }
}
