THE UNIVERSITY OF QUEENSLAND
Excel Data Analysis

Course objectives:
• Import data
• Use statistical functions in Excel
• Create histograms
• Gain insights from your data

Student Training and Support
Web: https://web.library.uq.edu.au/library-services/training

Staff Training (Bookings)
Web: http://staff.uq.edu.au/staffdevelopment

Staff may contact their trainer with enquiries and feedback related to training content. Please contact Staff Development for booking enquiries. Contact your local I.T. support for general technical enquiries.

Reproduced or adapted from original content provided under a Creative Commons license by The University of Queensland Library.

UQ Library Staff and Student I.T. Training

Table of Contents
Importing External Data
  Exercise 1. Importing External Data
  Exercise 2. Importing data from a file
Descriptive Statistics
  Exercise 3. Using Descriptive Statistics
Statistical Functions
  Exercise 4. Using basic statistical functions in Excel
Using Variance and Standard Deviation in Excel
  Exercise 5. Variance and Standard deviation
Histograms and Frequency
  Exercise 6. Creating histograms
Correlation and Linear Regression
  Exercise 7. Calculate Correlation Coefficient
  Exercise 8. Create Chart and Linear Regression
  Exercise 9. Forecasting
  Exercise 10. Significance tests
ANOVA: Analysis of Variance
  Exercise 11. ANOVA: Analysis of Variance
Rank and Percentiles
  Exercise 12. Obtaining your Rank

Exercise document: Go to https://web.library.uq.edu.au/library-services/training/training-resources and click on Data Analysis (ZIP, 40.9 KB) to download. Save these files on your H: drive, your local machine or a USB drive.

Statistical function definitions can be found in Microsoft's "Statistical functions (reference)" article on the Microsoft Office support site.

Microsoft Excel: Data Analysis

Importing External Data
Data located in compatible external files can be imported into Excel without the need to retype all the information. Depending on the format of the data you would like to import, different methods can be used, including opening and saving in Excel, linking to data, importing data, and copying and pasting data into Excel.

Exercise 1. Importing External Data
Open the spreadsheet Data Analysis_Exercises.xlsx (found under the Excel section on the Library Training Resources page). The External Data Link sheet is selected.

Importing Data from websites
Data from websites and other sources can be imported into Excel if it is in an appropriate format.
1. Copy the URL of the web page with the data you want to import, e.g. World University Rankings on Wikipedia (which can be found in cell A1 of the External Data Link sheet): https://en.wikipedia.org/wiki/QS_World_University_Rankings
Note: For this exercise, ignore From Web in the Get External Data group. It will bring in the entire web page and not just a selected table.
2. Navigate to the Data tab
3. Click on New Query (in the Get & Transform group)
4. From the drop-down menu, select From Other Sources > From Web
This opens the dialogue box for you to enter the URL of the web page with the data you want to import.
5. Paste the URL in the From Web dialogue box and click OK
The Navigator pane will open with a list of data that can be imported into Excel.
6. Select the required data set (QS World University Rankings - Top 50) on the left pane of the Navigator to preview it
NB: You can use the Edit button to clean the data before importing.
7. Select QS World University Rankings - Top 50
8. Click on Load
9. Click on any cell within the data table
10. Click on the Data tab
11. Select Refresh All
NB: Refresh All will refresh all connections in the workbook. If you want to refresh data on a single sheet, click Refresh.
NB: You may get a Microsoft Excel Security Notice about connections to external data sources. You can safely click OK, but see the section on Considerations when importing data into Excel below for further information.

Considerations when importing data into Excel
Malware / Macros: Unfortunately, there are ways to hide malware inside Excel files. This is usually done via "macros", which are little programs that are typically created to do complex or repetitive tasks. Because hackers have exploited these tools, Microsoft has disabled macros by default in Excel. In fact, when you open an Excel file from an untrusted source, you will get a security warning. If you are working on data from an unknown or untrusted source, use caution before "Enabling Editing".
Some hackers have even learned to use social engineering techniques to try to trick users into turning macros back on. For example, there may be an image in the file that appears blurred, with a note saying it is blurred for security reasons. The goal is to get you to enable macros so that you can "see" the image when, in reality, enabling the macro allows the virus to run.
Of course, if you have good antivirus / anti-malware programs installed, they will go a long way towards mitigating that threat.

References within a file or sheet to external data
You can refer to the contents of cells in another Excel workbook by creating an external reference. An external reference (also called a link) is a reference to a cell or range on a worksheet in another Excel workbook, or a reference to a defined name in another workbook. If your data is coming from a source beyond your immediate control, you may find that these links are broken. If you don't have access to the workbooks/worksheets where the underlying data lives, you won't be able to use it via the link in the spreadsheet you are currently working on.

Exercise 2. Importing data from a file
Open exercise files and enable content:
1. Open the exercise file Data Analysis_Exercises.xlsx and select the Importing Data & Histograms worksheet.
2. Click on the Enable Content button on the Security Warning (if necessary)
3. If you get a Security Warning dialog box, click on Yes
Note: In Office 365 (Windows version), Microsoft removed the Text Import Wizard as an option when using the steps below. It forces you to use the Power Query window, which does not have the "Treat consecutive delimiters as one" option. You can get around this by opening the text file directly in Excel, which will launch the wizard below.
Import data from text file:
4. Click the Data tab
5. Click From Text (in the Get External Data group)
6. Locate data_analysis.txt
7. Click on Import (on Mac: Get Data)
8. Click on the Delimited option
9. Click Next
10. Tick the following options:
Tab
Space
Treat consecutive delimiters as one
11. Click Next
12. Ensure the General option is selected
13. Click Finish
14. Assign data to $A$1 in the existing worksheet
15. Click OK

Descriptive Statistics
Descriptive statistics is the discipline of quantitatively (expressed as numbers) describing the main features of a collection of data. Excel's Analysis ToolPak add-in offers a variety of features to undertake statistical computations and graphing. Descriptive Statistics is included to provide statistical averages (mean, mode, median), standard error, standard deviation, sample variance, kurtosis and confidence level of sample data.

Exercise 3. Using Descriptive Statistics
Mac users may need to add the Analysis ToolPak:
Data tab > far right-hand side > click the Analysis Tools button
Tick the box next to Analysis ToolPak
Choose OK
The Data Analysis button will now be visible.
1. Click Data Analysis (at the far right of the ribbon) on the Data tab
2. Click Descriptive Statistics
3. Click OK
4. Highlight cells $A$1:$D$201 for Input Range
5. Select Grouped by columns
6. Click the Labels in first row box
7. Click Output Range
8. Highlight cell $G$1 for Output Range
9. Select Summary statistics
10. Click OK
NB: To obtain descriptive statistics for one group, ensure that only one column is selected.

Statistical Functions
Exercise 4. Using basic statistical functions in Excel
To use basic statistical functions:
1. Ensure you are on the Basic Statistics worksheet
2. Select the Home tab
3. Click in cell C14
4. Click AutoSum and check the range is (C5:C11)
5. Press Enter. The cell now contains =SUM(C5:C11)
6. Use AutoFill to calculate the sum for the remaining weeks
7. Calculate with statistical functions:
Total = SUM, e.g. =SUM(C5:C11)
Sample size = COUNT, e.g. =COUNT(C5:C11)
Mean = AVERAGE, e.g. =AVERAGE(C5:C11)
Minimum value = MIN, e.g. =MIN(C5:C11)
Maximum value = MAX, e.g. =MAX(C5:C11)
Note: Mean and average are different terms for the same thing when dealing with statistics.
8. Select cells C14 to C18
9. AutoFill across to fill cells in the remaining weeks
NB: For a quick statistical reference, look at the status bar after highlighting a selection of values. Adjust the options on the status bar by right-clicking on it and selecting items.

Using Variance and Standard Deviation in Excel
Variance is a measure of the average of the squared differences from the mean. Here is how it is defined manually:
• Subtract the mean from each value in the data. This gives you a measure of the distance of each value from the mean.
• Square each of these distances (so that they are all positive values), and add all of the squares together.
• Divide the sum of the squares by the number of values in the data set. (If calculating variance for a sample, subtract 1 from the number of values.)
[Figure: worked example for the daily values, showing 1. Subtract the mean, 2. Square the result, 3. Add (sum the squares), 4. Divide by the number of values, next to the result of the variance function.]

The standard deviation (σ) is the square root of the variance and is a measure of how close the values are to the average. A smaller number means the values are bunched together, whilst a larger number indicates values that are spread out.

Exercise 5. Variance and Standard deviation
To use the Variance function on a sample:
1. Click in cell C21
2. Click the fx (Insert Function) button in the formula bar
3. Change category to Statistical
4. Click on the VAR.S function
5. Select range (C5:C11)
6. Click on OK
To use the Standard Deviation function on a sample:
1. Click in cell C22
2. Click the fx (Insert Function) button in the formula bar
3. Change category to Statistical
4. Click on the STDEV.S function
5. Select range (C5:C11)
6. Click on OK
Repeat the steps above for the entire population using range (C5:I11):
• Click cell C25: Overall Average: =AVERAGE(C5:I11)
• Click cell C26: Overall Variance: =VAR.P(C5:I11)
• Click cell C27: Overall Std Deviation: =STDEV.P(C5:I11)
• Click cell C33: Overall Sum

To find the Weekly Total as a percentage of the Overall Total:
1. Go to cell C34
2. Enter =C14/C33 in the formula bar
3. Press function key F4
Note: This will change the cell reference C33 to the absolute reference $C$33, so it does not shift when the formula is copied across.
4. Press Enter
5. AutoFill across (D34:I34)

Histograms and Frequency
A histogram is used to display tabulated frequencies of data in graphical form. It shows the proportion of data that fits into specific categories or bins. For example, we may want to find out how many items were of a particular length, e.g. 100mm. Excel provides a Histogram tool, which is available via the Analysis ToolPak add-in.
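To make the binning rule concrete, here is a minimal Python sketch of counting values into bins. It assumes the Histogram tool follows the same rule as Excel's FREQUENCY function: a value lands in a bin if it is greater than the previous bin value and less than or equal to the current one, with an extra "More" bin for anything above the last bin value. The bin values 0, 50, ..., 500 mirror the bin column built in Exercise 6; the function name and the sample weights are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def frequency(data: Sequence[float], bins: Sequence[float]) -> List[int]:
    """Count how many values fall into each bin (hypothetical helper).

    `bins` holds the upper edge of each bin and must be sorted ascending.
    counts[0] counts values <= bins[0]; counts[i] counts values with
    bins[i-1] < value <= bins[i]; the extra last element ("More") counts
    values greater than bins[-1].
    """
    edges = list(bins)
    if edges != sorted(edges):
        raise ValueError("bins must be sorted in ascending order")
    counts = [0] * (len(edges) + 1)
    for value in data:
        # bisect_left gives the first edge that is >= value, which is the
        # bin the value belongs to (or len(edges) for the "More" bin).
        counts[bisect_left(edges, value)] += 1
    return counts


# Bin edges 0, 50, ..., 500, like the bin column in the exercise.
bin_edges = list(range(0, 501, 50))
weights = [0, 12, 50, 51, 499, 500, 620]  # hypothetical sample values
labels = [str(edge) for edge in bin_edges] + ["More"]
for label, count in zip(labels, frequency(weights, bin_edges)):
    print(f"{label:>5}: {count}")
```

Because each bin value is an inclusive upper edge, a weight of exactly 50 is counted in the 50 bin rather than the 100 bin.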
With the latest versions of Excel there is now also a Histogram chart available in the Statistics chart options.

Exercise 6. Creating histograms
Use worksheet "Importing Data & Histograms"
Using the tool in Data Analysis
Prepare data for a histogram of weights:
1. Go to cell F19
2. Type "Bin"
3. Go to cell F20
4. Type 0
5. Go to cell F21
6. Type 50
7. Select F20 and F21
8. AutoFill to display a value of 500 in cell F30
Input Range: This is the data that you want to analyse using the Histogram tool.
Bin Range: This represents the intervals that you want the Histogram tool to use for measuring the input data in the data analysis.
9. Click Data Analysis (at the far right of the ribbon) on the Data tab
10. Click on Histogram
11. Click OK
Complete the dialog box as follows:
• Input Range = $A$1:$A$201
• Bin Range = $F$19:$F$30
• Tick Labels
• Output Range: $I$21
• Tick Chart Output
12. Click OK
To display the frequencies in the histogram:
1. Click on the histogram in the worksheet
2. Click Data Labels on the Add Chart Element button
3. Select Outside End
NB: A table with Bin and Frequency headings will appear along with the histogram graph. Resize the graph as required.

Using the Statistics Chart - Histogram option
Select the data range A1:A201
Insert tab > Charts > Statistic Chart > Histogram
The histogram will appear.
Select the axis to launch the Format Axis panel on the right of the screen.
Choose Axis Options and expand the Axis Options section:
• Set the Bin Width to 25
• Set the Overflow bin to 200
• Set the Underflow bin to 50
Mac:
Right mouse click the blue data series columns
Choose Format Data Series...
Expand the Data Series Options (if necessary)
Change Bins - Auto to Bin Width
Set the Bin Width to 25
Set the Overflow bin to 200
Set the Underflow bin to 50

Correlation and Linear Regression
A correlation is a number between -1 and +1 that summarizes the relationship between two variables. A correlation close to +1 is strong and positive, whereas a correlation close to -1 is strong but negative. A correlation of zero means there is no linear relationship between the variables.
Linear regression is a statistical approach to modelling the relationship between a scalar dependent variable (Y) and one or more explanatory variables denoted X. It can be used for prediction or forecasting.

Exercise 7. Calculate Correlation Coefficient
Select worksheet "Correlation & Linear Regression"
Name cells to find correlation:
1. Select cells (B4:B14)
2. Click Define Name (near the middle of the ribbon) on the Formulas tab
3. Check the name is "Year"
4. Click on OK
5. Select cells (C4:C14)
6. Click Define Name on the Formulas tab
7. Check the name is "Tuition_Fees"
8. Click on OK
To calculate the correlation coefficient:
1. Go to cell B17
2. Click the fx button in the formula bar
3. Select the CORREL function
4. In Array1, type Year (or press F3 for the Paste Name dialog box; choose the name Year and press OK)
5. In Array2, type Tuition_Fees
6. Click on OK
7. Format cell B17 to 2 decimal places
Note: You will be presented with a strong positive correlation of +0.89 between Year and Tuition Fee increases.

Exercise 8. Create Chart and Linear Regression
Create a chart:
1. Select cells (B4:C14)
2. Insert tab > Charts group > Recommended Charts
3. Select Scatter
Add the regression line:
1. Click the Add Chart Element button > Trendline > Linear Trendline
2. The Trendline will appear on the chart
Format Trendline:
3. Right-click the Trendline
4. Choose Format Trendline
5. Within Trendline Options...
6. Select the checkbox "Display Equation on chart"
7. Select the checkbox "Display R-squared value on chart"
Note: The equation and R-squared value will appear towards the top right of the chart. If the formulas are obscured by the Trendline, you can move them by selecting the text box with the formulas and dragging it to where you want.

To find the Regression Summary:
1. Click on Data Analysis on the Data tab (far right on the ribbon)
2. Select Regression
3. Click on OK
4. Input Y Range: select C4:C14
5. Input X Range: select B4:B14
6. Output Range: select A22
7. Click on OK
Note: You will be presented with a Summary Output, which includes the regression analysis.
Interpreting results:
A strong positive correlation is demonstrated.
Equation (y = mx + c): y = 308.63x + 4018.1
This matches the coefficients in the regression summary.
Intercept indicates the predicted cost of tuition in the year 2000. This is the line of best fit value, not the actual value (the line of best fit value for Y when X = 0).
X Variable indicates the average increase in tuition fees ($) from year to year, approximately $308.63.

Forecasting
Forecasting is estimating the likelihood of an event taking place in the future, based on available data. Statistical forecasting concentrates on using the past to predict the future by identifying trends, patterns and business drivers within the data to develop a forecast.

Exercise 9. Forecasting
Use worksheet "Correlation & Linear Regression"
In Excel, the FORECAST function takes the known x and y values behind the trendline and an input x value (the independent variable), and returns the predicted y value (the dependent variable) on the linear trendline.
1. Click in cell C20
2. Click the Insert Function button
3. Select FORECAST from the list of functions (search for Forecast in the search box if you cannot see it)
4. X: select B20
5. Known_y's: select C4:C14 (the range name Tuition_Fees will appear)
6. Known_x's: select B4:B14 (the range name Year will appear)
7. Note how the indicated answer matches the intercept value of the regression analysis (B20 is still empty, so X is treated as 0)
8. Click OK
9. In cell B20, type 20 to forecast the cost of tuition fees in year 20

T Tests
T Tests are performed when you have two sets of measurements or results from given populations and you would like to compare them to see if they are significantly different.
For example, you may have two lists of measurements from the same set of people. The first set of measurements may have been taken in the morning and the second set in the afternoon. This type of T Test is known as a related T Test or a paired T Test, because you have tested the same population twice.
Alternatively, if you had two sets of measurements taken from two different sets of people, with one set taken in the morning and the other in the afternoon, you would have an unpaired or independent T Test. This is because you have tested two different populations.
If you are sure about the direction of the differences, for example that the morning measurements are faster than the afternoon ones, then you perform a one-tail test. If you are unsure about the direction of the difference between the values, perform a two-tail test.
A result is called "statistically significant" if the probability returned by the t test is below .05. This probability is often referred to as the P value.

Exercise 10. Significance tests
On the T-Test spreadsheet are two series of measurements (Morning and Afternoon). These measurements are paired, as they are from the same population but taken at different times.
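As an illustration of what a paired, two-tailed T Test computes (the case Excel's T.TEST handles with Type 1), here is a minimal Python sketch. It takes the differences between paired measurements, forms the t statistic from their mean and sample standard deviation, and converts it to a P value with the Student t distribution, written via the standard regularized incomplete beta function so that only the standard library is needed. This is the textbook paired t test, not Excel's own implementation; the function names and the morning/afternoon numbers are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple

_TINY = 1e-300  # guards against division by zero in the continued fraction


def _beta_continued_fraction(a: float, b: float, x: float,
                             max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def paired_t_test(first: Sequence[float], second: Sequence[float],
                  tails: int = 2) -> Tuple[float, float]:
    """Return (t statistic, P value) for a paired t test."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero spread; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    df = n - 1
    # Two-tailed P value of Student's t: I_{df/(df+t^2)}(df/2, 1/2)
    p_two = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, p_two if tails == 2 else p_two / 2.0


morning = [9.8, 10.1, 9.6, 10.4, 9.9, 10.2, 9.7, 10.0]        # hypothetical
afternoon = [10.3, 10.2, 10.1, 10.6, 10.4, 10.1, 10.2, 10.5]  # hypothetical
t_stat, p_value = paired_t_test(morning, afternoon, tails=2)
print(f"t = {t_stat:.3f}, P = {p_value:.4f}")
```

For a one-tailed test the sketch simply halves the two-tailed P value, which is the usual convention for the symmetric t distribution.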
1. Select cell B12
2. Using the Insert Function button, search for and locate the T.TEST function.
Note: The TTEST function is still available for compatibility with Excel 2007 and earlier.
In the T.TEST Function Arguments dialog box:
Array1 and Array2 are the cell ranges containing the two columns of measurements, in this case B3:B10 and C3:C10.
Tails can be either 1 or 2.
Use 1 if you are sure about the direction of the differences.
Use 2 if you are unsure about the direction of the differences.
Type can be either 1, 2 or 3.
Use 1 if your data is from a paired population.
Use 2 if your data is from an unpaired population with an equal variance.
Use 3 if your data is from an unpaired population with an unequal variance.

ANOVA: Analysis of Variance
In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are all equal. The ANOVA test is the initial step in identifying factors that are influencing a given data set. ANOVA should be performed on 3 or more groups of data.

Exercise 11. ANOVA: Analysis of Variance
Use worksheet "ANOVA - Rank & percentile"
To conduct the one-way ANOVA:
1. Click on Data Analysis on the Data tab (far right on the ribbon)
2. Select Anova: Single Factor
3. Click OK
4. Select the input range (A1:C13) (references are automatically made absolute)
5. Click the "Labels in first row" option
6. Select Output Range (A16)
7. Click OK
Note: Descriptive statistics and the ANOVA summary table are displayed on screen.
Interpreting results: In the summary section we can see the mean exam results for each class. But are these differences statistically significant?
There are two types of hypotheses: null (negative) or alternative (positive).
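To see where the F value in the Anova: Single Factor table comes from, the single-factor calculation can be sketched in Python as below. It computes the between-groups and within-groups sums of squares, their degrees of freedom and mean squares, and F as the ratio of the two mean squares. The sketch stops at F: the P-value and F crit columns come from the F distribution (Excel's F.DIST.RT and F.INV.RT), which are not reimplemented here. The function name and the three score lists are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def one_way_anova(groups: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Single-factor ANOVA table values: SS, df, MS for each source, and F."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: spread of the values around their own group mean
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means))
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    if ms_within == 0:
        raise ValueError("no variation within groups; F is undefined")
    return {
        "SS_between": ss_between, "df_between": df_between, "MS_between": ms_between,
        "SS_within": ss_within, "df_within": df_within, "MS_within": ms_within,
        "F": ms_between / ms_within,
    }


lectures = [68, 72, 75, 61, 80]  # hypothetical exam scores
online = [70, 65, 78, 74, 69]
video = [62, 71, 66, 73, 68]
result = one_way_anova([lectures, online, video])
print(f"F({result['df_between']}, {result['df_within']}) = {result['F']:.3f}")
```

A larger F means the group means differ more than the spread within groups would suggest; F is then compared with the critical F for the same degrees of freedom.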
It is best practice to use null hypotheses so that no personal opinions creep into the testing statement.
A null hypothesis is a default position and can never be proven. Statistically, results can only reject or fail to reject the null hypothesis.
Null hypotheses are always phrased as a negative statement, e.g. There is no real difference between the effectiveness of lectures, online delivery and video delivery.
The test result shows F = 0.83, and the critical F (at the .05 significance level) is 3.285. Therefore, since the F statistic is smaller than the critical value, we fail to reject the null hypothesis. Remember from before that a P value is statistically significant if it is below .05; the P value reported in the ANOVA table is well above this.
So, we fail to reject the null hypothesis that there is no difference between the effectiveness of lectures, online delivery and video delivery. These values may be explained by the small sample size; a larger sample of data may give more statistically significant results. The differences we saw in this sample may simply be due to random sampling error.

Rank and Percentiles
Percentile rank means the percentage of scores that fall "at or below" a certain number. Percentiles are most often used for determining the relative standing of an individual in a population or the rank position of the individual. Percentiles measure position from the bottom.

Exercise 12. Obtaining your Rank
Use worksheet "ANOVA - Rank & percentile"
1. Click Data Analysis on the Data tab (far right on the ribbon)
2. Click Rank and Percentile
3. Click OK
Complete the dialog box:
4. Highlight cells $A$1:$C$13 for Input Range
NB: In this instance, do not merely click on the column A header, as the program will process every row in the spreadsheet.
5. In Grouped By, select Columns
6. Click Labels in first row
7. Select Output Range $M$1
8. Click OK
Interpreting results:
Point: The location of the value within the original list. This can be used to quickly sort the output table back into the order of the original list.
Value column: This is the column containing the original values. It has the same name as the heading of the original list, since we used labels in the first row.
Rank: This is the rank of the corresponding number in the list.
Percent: This is the number's percentage rank within the list. This percentage indicates the proportion of the list which is below this given number.

Source: https://web.library.uq.edu.au/files/142204/20210831_Excel_Data_Analysis.pdf
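The Rank and Percentile output can be approximated with the short Python sketch below. It assumes the Rank column ranks the largest value as 1 with tied values sharing the same rank, and that Percent is the share of the other values lying strictly below each value (count below divided by n - 1), which is how Excel's PERCENTRANK.INC treats values that appear in the list. Treat it as an illustration of those conventions rather than a reproduction of the ToolPak's exact output; the function name and the scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_and_percentile(values: Sequence[float]) -> List[Tuple[int, float, int, float]]:
    """Return (point, value, rank, percent) rows ordered from rank 1 downwards.

    point   - 1-based position of the value in the original list
    rank    - 1 for the largest value; tied values share the same rank
    percent - proportion of the other values strictly below this value,
              computed as count_below / (n - 1)
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    rows = []
    for point, value in enumerate(values, start=1):
        rank = 1 + sum(1 for other in values if other > value)
        percent = sum(1 for other in values if other < value) / (n - 1)
        rows.append((point, value, rank, percent))
    # Sort by rank; Python's sort is stable, so ties keep their original order.
    rows.sort(key=lambda row: row[2])
    return rows


scores = [72, 85, 64, 85, 90]  # hypothetical exam scores
print("Point  Value  Rank  Percent")
for point, value, rank, percent in rank_and_percentile(scores):
    print(f"{point:>5} {value:>6} {rank:>5} {percent:>8.0%}")
```

The Point column makes it easy to map each ranked row back to its place in the original list, which is the same role it plays in the ToolPak output.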
