EXCEL PRACTICAL DATA ANALYSIS

Introduction

Excel has an add-in programme called Data Analysis. This programme enables you to carry out a number of statistical analyses. The purpose of this practical is to give you the opportunity to apply some of the statistical tests mentioned in the notes. The database we are going to use comes from the clinical records of a large equine hospital, and the data were recorded over 10 years. These data have been set up as a master table, as mentioned in your notes.

Task 1
1.
2. Create a new worksheet and call it Task 1.
3. Copy from the master table the columns Patient Number (H), Patient Names (I), Birth Date (J), Age (K) and Sex (L) and paste them into your worksheet Task 1.
4. Open your Tools menu and find the Data Analysis menu.
5. Select Descriptive Statistics and, for the input range, select the column Age.
6. Call the output worksheet Dstat and press OK to generate a table of descriptive statistics for the column Age.
7. Format your results to 2 decimal places.
8. Notice that the mean age of the horses is a fair bit higher than the median age. Why do you think that is?
9. What do you think is the best measure of central tendency for age?
10. What do you think about the range of age groups?

Task 2
1. Create a new worksheet and call it Task 2.
2. Copy from the master table the columns Patient Number, Sex, Haematology HT (AW), Haematology WCC (AX) and Haematology Nmat (AY) and paste them into the Task 2 worksheet. (HT = haematocrit, WCC = white cell count and Nmat = mature neutrophils.)
3. Select all 5 columns and sort your data by haematocrit (HT) so that all the records that have NA (NA = not available) are at the bottom of the columns. The sort option can be found under the Data menu.
4. Separate the NA records from the others and delete all the NA records.
5. Now select all 5 columns and sort the data by sex so that all the female records are on top.
6. Analyse your data using the t-test assuming equal variances to see if there is a significant difference between female and male HT levels. That is, go to Data Analysis and select the t-test assuming equal variances. For the variable 1 range select all the HT data corresponding to females, and for the variable 2 range select all the HT data corresponding to males.
7. Name your new worksheet output t-tests.
8. Press OK and examine the results of your analysis. Is there a significant difference in mean HT between males and females? What level of significance did you use to come to this conclusion?
9. Now repeat the exercise by comparing the mean WCC for males and females. This time, select as your output a cell reference in the worksheet t-tests, below your results for the HT comparison of means. Does the mean WCC differ significantly between males and females?
10. Repeat the exercise again, comparing the mean Nmat counts for males and females. Are these significantly different?
11. Why did we select a t-test assuming equal variances?
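The calculation behind the equal-variance t-test in Task 2 can be sketched in Python as a cross-check on the Excel output. This is a minimal sketch, not the method Excel uses internally: the function name `pooled_t_test` and the HT values are hypothetical and are not taken from the hospital records.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for a two-sample
    t-test that assumes both groups share the same variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their
    # degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    # Standard error of the difference between the two means.
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(sample_a) - mean(sample_b)) / std_err
    return t_stat, df


# Hypothetical HT values for illustration only (not real records).
female_ht = [36.0, 38.0, 40.0, 42.0, 44.0]
male_ht = [34.0, 36.0, 38.0, 40.0, 42.0]

t_stat, df = pooled_t_test(female_ht, male_ht)
print(f"t = {t_stat:.3f}, df = {df}")
```

With these made-up values the t statistic is 1.0 on 8 degrees of freedom, which is below the two-tailed 5% critical value of about 2.306, so the difference in means would not be called significant. The same comparison of the t statistic against the critical value applies to the table that Excel produces.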

Task 3
1. Create a new worksheet and call it Task 3.
2. Copy from the master table the two columns Clinician and Clinical Examination/Respiration (breaths/min) and paste them into the worksheet called Task 3.
3. Select both columns and sort the data by Respiration to get the NA records at the bottom of the column.
4. Delete all the NA records.
5. Select both columns and sort the data by Clinician.
6. Create a column for each of the 16 clinicians, putting a name at the top of each of the columns (C-U).
7. Copy and paste the relevant respiratory data for each clinician into that clinician's column, so that you end up with 16 columns of respiratory data next to each other, one for each clinician.
8. Do an ANOVA to see if the mean respiratory counts differ between the clinicians. That is, go to Tools, Data Analysis, select Anova: Single Factor, select your 16 columns of data as the input range, name your new worksheet output ANOVA and press OK.
9. Can we show that there is a difference in mean respiratory counts between clinicians?
10. Why did we choose a single factor ANOVA?

Task 4
1. Create a new worksheet and call it Task 4.
2. Copy from the master table the column Patient Number and the following six haematology columns: Haematology HT (AW), Haematology WCC (AX), Haematology Nmat (AY), Haematology Nimm (AZ), Haematology Lymphocytes (BA) and Haematology Monocytes (BB), and paste them into the Task 4 worksheet.
3. Select all the columns and sort by HT to get all the NA records at the bottom; delete the NA records. Repeat the process but sort by WCC and remove the remaining NA records. Continue with this process, sorting by Nmat, then Nimm, then Lymphocytes and finally Monocytes, until all the NA records have been removed from your columns.
4. You now want to see if there is a correlation between the different types of white cell counts. Select Tools, Data Analysis and then Correlation. Your input range will be all the columns (except Patient Number). Call your output worksheet Corr and press OK.
5. Examine the results of your correlation and explain what they mean. Which two parameters have a strong positive correlation (i.e. as one increases, the other also increases)?
6. What does a negative sign mean?

Task 5
1. Create a new worksheet and call it Task 5.
2. Copy the columns Haematology WCC and Haematology Nmat from your Task 4 worksheet into the Task 5 worksheet.
3. You now want to see if there is a linear relationship between WCC and Nmat, and you want to model this relationship. To do this we need to fit a linear regression model.
4. Select Tools, Data Analysis, Regression.
5. Select the Nmat as your dependent Y input and the WCC as your independent X input. Name your new worksheet output Regression and press OK.
6. Examine your results and note the coefficients. Remember the linear regression equation is Y = b0 + b1X, where b0 is the intercept and b1 is the slope (the regression coefficient, in this case for WCC). The regression coefficient tells you by how much Y (in this case the Nmat) is going to change for every unit change in X (in this case the WCC).
7. Put the intercept (b0) and slope (b1) into the regression equation and work out what Y (the Nmat count) will be if X (the WCC) is 9.8. You can now use your WCC to predict your mature neutrophil count.

Task 6
1. Create a new worksheet and call it Task 6.
2. Copy the columns Before referral/antibiotics (AA) and Before referral/antiinflammatories (AB) into worksheet Task 6.
3. Use the =COUNTIF(AA2:AA936,"yes") function to count the number of yes records for each column.
4. Do the same to count the number of no records for each column.
5. You can now use this count data to work out the odds of a horse receiving anti-inflammatories relative to antibiotics, and the Chi-square to see if there is a significant difference in the proportion of horses receiving antibiotics compared to horses receiving anti-inflammatories.
6. Do this by opening EpiInfo.
7. Select the menu Utilities.
8. Select StatCalc.
9. Select 2x2 tables.
10. Enter your count data from your Excel spreadsheet for the horses receiving antibiotics and the horses receiving anti-inflammatories.
11. Press F4 and note the results, which show both the odds ratios and the Chi-square tests.
12. Is there a significant difference between horses receiving antibiotics and anti-inflammatories?
13. How much greater are the odds of a horse receiving anti-inflammatories than antibiotics?

You have now completed all the tasks. Please sign the class list to show that you have completed them.
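The 2x2 table analysis in Task 6 can also be checked by hand. The sketch below computes the odds ratio and the uncorrected Pearson Chi-square from four counts; it is an illustration under stated assumptions, not a description of how EpiInfo calculates its results. The function name `two_by_two` and all the counts are hypothetical and do not come from the hospital records.

```python
def two_by_two(a, b, c, d):
    """Odds ratio and Pearson Chi-square (no continuity correction)
    for the 2x2 table:

                           yes   no
        anti-inflammatories  a    b
        antibiotics          c    d
    """
    n = a + b + c + d
    # Odds of "yes" in the first row divided by odds of "yes" in the second.
    odds_ratio = (a * d) / (b * c)
    # Shortcut formula for the Chi-square of a 2x2 table (1 degree of freedom).
    chi_square = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    return odds_ratio, chi_square


# Hypothetical counts standing in for the COUNTIF results.
anti_inflam_yes, anti_inflam_no = 30, 10
antibiotic_yes, antibiotic_no = 20, 20

odds_ratio, chi_square = two_by_two(
    anti_inflam_yes, anti_inflam_no, antibiotic_yes, antibiotic_no
)
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi_square:.2f}")
```

With these made-up counts the odds ratio is 3.0 and the Chi-square is about 5.33, which exceeds 3.84, the 5% critical value for 1 degree of freedom. EpiInfo also reports corrected versions of the Chi-square, so its figures may differ slightly from this uncorrected value.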