Data Analysis

Outline
* What is data analysis
* Goal of an analysis
* Role of statistics in research
* Nature of the data: nominal, ordinal, interval, ratio
* Tools of data analysis
  * Descriptive statistics
  * Inferential statistics
  * Statistical software packages
Data analysis: The Concept

* An approach to breaking data, information, and factual elements down into their parts to answer research questions
* A method of putting together facts and figures to solve a research problem
* A systematic process of using data to address research questions
* Breaking down research issues by using controlled data and factual information
Goal of an analysis
* To explain cause-and-effect phenomena
* To relate research to real-world events
* To predict or forecast real-world phenomena based on research
* To find answers to a particular problem
* To draw conclusions about real-world events based on the problem
* To learn a lesson from the problem
Processing operations
* Editing: detecting errors
* Coding: assigning numerals or other symbols
* Classification: grouping data on the basis of common characteristics
* Tabulation: summarising raw data in a compact form (statistical tables) for further analysis
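The four processing operations can be sketched in a few lines of Python. This is only an illustration: the raw responses and the `CODES` mapping are made-up assumptions, chosen to match the 0/1 coding of gender used later in these slides.

```python
from collections import Counter

# Hypothetical raw survey responses (illustrative data, not from the slides).
raw_responses = ["male", "Female", "female ", "MALE", "female", "unknown"]

# Assumed coding scheme: males coded as 0, females as 1.
CODES = {"male": 0, "female": 1}

# Editing: normalise spelling and detect entries that cannot be coded.
cleaned = [r.strip().lower() for r in raw_responses]
errors = [r for r in cleaned if r not in CODES]
valid = [r for r in cleaned if r in CODES]

# Coding: assign a numeral to each valid response.
coded = [CODES[r] for r in valid]

# Classification and tabulation: group by category and count.
table = Counter(valid)

print("Rejected entries:", errors)
print("Coded data:", coded)
for category, count in sorted(table.items()):
    print(f"{category:<8}{count}")
```

The resulting frequency table is the compact form that later analysis steps work from.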
Role of Statistics in Research

With statistics, we can summarize large bodies of data, make predictions about future trends, and determine when different experimental treatments have led to significantly different outcomes.
Statistics are among the most powerful tools in the researcher's toolbox.
Considering the Nature of the Data
* Continuous or discrete
* Nominal, ordinal, interval or ratio scale
Continuous versus Discrete Variables

Continuous data: takes on any value within a finite or infinite interval. You can count, order and measure continuous data. Example: height, weight, temperature, the amount of sugar in an orange, the time required to run a mile.

Discrete data: the values or observations are distinct and separate, i.e. they can be counted (1, 2, 3, ...). Example: the number of kittens in a litter; the number of patients in a doctor's surgery; the number of flaws in one metre of cloth; gender (male, female); blood group (O, A, B, AB).
Nominal Data
* The numbers are simply labels. You can count, but not order or measure, nominal data.
* Classification data, e.g. m/f
* No ordering, e.g. it makes no sense to state that M > F
* Arbitrary labels, e.g. m/f, 0/1, etc.
Example: males could be coded as 0, females as 1; the marital status of an individual could be coded as Y if married, N if single.
Ordinal Data
* Ordered, but the differences between values are not meaningful
* e.g. Likert scales: rank your degree of satisfaction on a scale of 1..5. The difference in enjoyment expressed by giving a rating of 2 rather than 1 might be much less than the difference in enjoyment expressed by giving a rating of 4 rather than 3.
You can count and order, but not measure, ordinal data.
Interval Data
* Ordered, constant scale, but no natural zero
* Differences make sense, but ratios do not
* e.g. temperature: 30° - 20° = 20° - 10°, but 20° is not twice as hot as 10°
* e.g. dates: the time interval between the starts of years 1981 and 1982 is the same as that between 1983 and 1984, namely 365 days. The zero point, year 1 AD, is arbitrary; time did not begin then.
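A short calculation shows why ratios are not meaningful on an interval scale. The sketch below assumes the temperatures in the example are in degrees Celsius (the slides do not name the unit) and re-expresses them in kelvin, a scale with a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvin, a scale with a true zero."""
    return celsius + 273.15

# Differences are meaningful on an interval scale.
diff_high = 30 - 20
diff_low = 20 - 10

# The ratio depends on where the arbitrary zero sits.
ratio_celsius = 20 / 10
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print("Differences equal:", diff_high == diff_low)
print("Ratio in Celsius:", ratio_celsius)
print("Ratio in kelvin:", round(ratio_kelvin, 4))
```

The Celsius ratio of 2 disappears once the zero point is moved, so "twice as hot" is not a statement the Celsius scale supports.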
Ratio Data
* Like interval data, but with a true zero
* Ordered, constant scale, natural zero
* e.g. height, weight, age, length
Data analysis tools

* Descriptive statistics
  * Measures of central tendency
  * Measures of variability
* Inferential statistics
  * Testing hypotheses
  * Meta-analysis
* Statistical software packages
Descriptive Statistics
Descriptive statistics describe data:
* Points of central tendency
* Amount of variability
* Relation of different variables to each other
Measure of Central Tendency

* Mode: the most frequently occurring score is identified. Used with data on nominal, ordinal, interval and ratio scales.
* Median: the midpoint of the data. Used with data on ordinal, interval and ratio scales.
* Arithmetic mean: all scores are added and the sum is divided by the number of scores. Used with data on interval and ratio scales.
* Geometric mean: all scores are multiplied together, and the nth root of their product is computed. Used with data on ratio scales.
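The four measures can be computed with Python's standard `statistics` module. The scores below are made-up illustrative values, chosen so the results are easy to check by hand.

```python
from statistics import geometric_mean, mean, median, mode

# Hypothetical ratio-scale scores (illustrative only).
scores = [2, 4, 4, 8]

most_frequent = mode(scores)            # most frequently occurring score
midpoint = median(scores)               # midpoint of the ordered data
arithmetic = mean(scores)               # sum of scores / number of scores
geometric = geometric_mean(scores)      # nth root of the product of scores

print("Mode:", most_frequent)
print("Median:", midpoint)
print("Arithmetic mean:", arithmetic)
print("Geometric mean:", round(geometric, 4))
```

Here the product is 2 x 4 x 4 x 8 = 256 and its fourth root is 4, which is why the geometric mean falls below the arithmetic mean of 4.5.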
Measures of Variability
How great is the spread?
* Range = highest score - lowest score
* Quartiles: the pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
  * The 50th percentile = median, M
  * The 25th percentile = first quartile, Q1
  * The 75th percentile = third quartile, Q3
* Interquartile range = Q3 - Q1
Example: 13 13 16 19 21 21 23 23 24 26 26 27 27 27 28 28 30 30. M = ?, Q1 = ?, Q3 = ?
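The example can be worked through with a short script. The sketch below assumes the "median of each half" convention for quartiles; other conventions (such as interpolated percentiles) can give slightly different values for Q1 and Q3. The helper name `quartiles` is introduced here for illustration.

```python
from statistics import median

scores = [13, 13, 16, 19, 21, 21, 23, 23, 24,
          26, 26, 27, 27, 27, 28, 28, 30, 30]

def quartiles(values):
    """Return (Q1, M, Q3), taking Q1 and Q3 as the medians of each half."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # With an odd number of observations the middle value is left out of both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

q1, m, q3 = quartiles(scores)
data_range = max(scores) - min(scores)
iqr = q3 - q1

print("Q1 =", q1, " M =", m, " Q3 =", q3)
print("Range =", data_range, " Interquartile range =", iqr)
```

With 18 observations the median is the average of the 9th and 10th values (24 and 26), so M = 25; the lower and upper halves of nine values each give Q1 = 21 and Q3 = 27.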
Measure of Relationship: Correlation

Correlation indicates the strength and direction of a linear relationship between two variables.
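One common measure of linear correlation is the Pearson coefficient r, which ranges from -1 to +1. The slides do not name a specific coefficient, so treat this as one possible choice; the data are made-up illustrative values.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations (illustrative only).
hours_studied = [1, 2, 3, 4, 5]
test_scores = [2, 4, 5, 4, 5]
falling = [10, 8, 6, 4, 2]

r_positive = pearson_r(hours_studied, test_scores)
r_negative = pearson_r(hours_studied, falling)

print("Positive relationship: r =", round(r_positive, 4))
print("Perfect negative relationship: r =", round(r_negative, 4))
```

The sign of r gives the direction of the relationship and its magnitude gives the strength.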
Notes about Correlation

* A substantial correlation between two characteristics requires reasonable validity and reliability in measuring them.
* Correlation does not indicate causation.
We use sample statistics as estimates of population parameters. The quality of all statistical analysis depends on the quality of the sample data.
Sample
* Random sampling: every unit in the population has an equal chance of being chosen.
* A random sample should represent the population well, so sample statistics from a random sample should provide reasonable estimates of population parameters.
Population
* Estimate a population parameter from a random sample
* Statistically test hypotheses
Inferential Statistics: Estimate a Population Parameter from Sample
All sample statistics have some error in estimating population parameters

Example: estimate the mean height of 10-year-old boys in Chicago from a sample of 200 boys. How close is the sample mean to the population mean? We don't know, but we do know:
* The means from an infinite number of samples form a normal distribution.
* The population mean equals the average (mean) of all the sample means.
* The standard deviation of the sampling distribution (the standard error) is directly related to the standard deviation of the characteristic in question in the overall population.
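These properties can be checked with a small simulation. The sketch below is illustrative only: the population mean and standard deviation are made-up values, the population is assumed to be normal, and a finite number of samples stands in for the "infinite number" mentioned above. It compares the spread of the sample means with the usual formula, population standard deviation divided by the square root of the sample size.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of heights in cm (assumed values, not from the slides).
POP_MEAN = 140.0
POP_SD = 7.0
SAMPLE_SIZE = 200      # boys per sample, as in the example
NUM_SAMPLES = 2000     # finite stand-in for "infinitely many" samples

def draw_sample_mean():
    """Draw one random sample and return its mean."""
    total = sum(random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE))
    return total / SAMPLE_SIZE

sample_means = [draw_sample_mean() for _ in range(NUM_SAMPLES)]

mean_of_means = mean(sample_means)
observed_se = stdev(sample_means)
theoretical_se = POP_SD / sqrt(SAMPLE_SIZE)

print("Mean of sample means:", round(mean_of_means, 2))
print("Observed standard error:", round(observed_se, 3))
print("Theoretical standard error:", round(theoretical_se, 3))
```

The mean of the sample means lands close to the population mean, and the observed standard error is close to the theoretical one.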
Testing Hypothesis
In any study we start with a certain assumption about the population from which the sample is drawn. This assumption, about the population or about the parameters of the population, is called a hypothesis. (A parameter describes a population.)
Example
To determine whether the mean nicotine content of a brand of cigarettes is greater than the advertised value of 1.4 milligrams, a health advocacy group takes a sample of 500 cigarettes and measures the amount of nicotine in the sample. Sample values:
* The sample average of nicotine = 1.51 mg
* The standard deviation = 1.016 mg
The estimated amount of nicotine is 1.51 mg, based on the sample values. The standard error of the sample average is S.E. = s.d./sqrt(n-1) = 0.045.
Is there an actual difference between the sample value (1.51 mg) and the advertised value (1.4 mg)? Or is it just due to sampling error?
To answer this question we need a test of significance:
Stating the hypotheses
The null hypothesis H0 expresses the idea that the observed difference is due to chance. It is a statement of no effect or no difference, and is expressed in terms of the population parameter.
Let μ denote the true average amount of nicotine. H0: μ = 1.4 mg

The alternative hypothesis Ha represents the idea that the difference is real. It is expressed as the statement we hope or suspect is true instead of the null hypothesis. The alternative hypothesis states that the cigarettes contain a higher amount of nicotine, that is: Ha: μ > 1.4 mg
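The slides do not say which test of significance is used. With a sample this large, one common choice is a one-sided z-test, sketched below using the sample values from the example and the same standard-error formula; the 0.05 threshold in the last line is a conventional assumption, not a value from the slides.

```python
from math import sqrt
from statistics import NormalDist

# Sample values from the nicotine example.
n = 500
sample_mean = 1.51      # mg
sample_sd = 1.016       # mg
advertised_mean = 1.4   # mg, the value under H0

# Standard error as given in the example: s.d. / sqrt(n - 1).
standard_error = sample_sd / sqrt(n - 1)

# Test statistic: how many standard errors the sample mean lies above 1.4 mg.
z = (sample_mean - advertised_mean) / standard_error

# One-sided p-value for Ha: true mean > 1.4 mg, under a standard normal distribution.
p_value = 1 - NormalDist().cdf(z)

print("Standard error:", round(standard_error, 3))
print("z statistic:", round(z, 2))
print("One-sided p-value:", round(p_value, 4))
print("Reject H0 at the 0.05 level:", p_value < 0.05)
```

A small p-value means a difference this large would rarely arise from sampling error alone if H0 were true.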
General comments on stating hypotheses

* It is not easy to state the null and the alternative hypotheses!
* The hypotheses are statements about the population values.
* The alternative hypothesis Ha is often called the researcher hypothesis, because it is the hypothesis we are interested in.
* A significance test is a test against the null hypothesis.
* Often we set Ha first, and then H0 is defined as the opposite statement!
Meta-Analysis
Meta-analysis refers to the analysis of analyses: the statistical analysis of a large collection of results from individual studies for the purpose of integrating the findings.
* Conduct a fairly extensive search for relevant studies
* Identify appropriate studies to include in the meta-analysis
* Convert each study's results to a common statistical index
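As an illustration of the last step, the sketch below converts two studies to a common index and combines them. Everything here is an assumption made for the example: the slides do not name an index, so Cohen's d (a standardised mean difference) is used as one common choice, the study figures are made up, and the sample-size weighting is a deliberate simplification of how meta-analyses weight studies in practice.

```python
from math import sqrt

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardised mean difference between a treatment and a control group."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / sqrt(pooled_var)

# Hypothetical studies reported on different raw scales (illustrative only).
studies = [
    {"mean_t": 105, "mean_c": 100, "sd_t": 10, "sd_c": 10, "n_t": 50, "n_c": 50},
    {"mean_t": 53, "mean_c": 50, "sd_t": 4, "sd_c": 4, "n_t": 20, "n_c": 20},
]

# Step 1: convert each study's result to the common index.
effects = [cohens_d(**study) for study in studies]

# Step 2: combine the effects, weighting each study by its total sample size.
weights = [study["n_t"] + study["n_c"] for study in studies]
combined = sum(w * d for w, d in zip(weights, effects)) / sum(weights)

print("Effect sizes:", [round(d, 3) for d in effects])
print("Weighted combined effect:", round(combined, 3))
```

Once every study is expressed on the same scale, results measured in different units can be compared and integrated.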
Using Statistical Software Packages

* SPSS
* SAS
* Matlab Statistics toolbox
* SYSTAT, Minitab, StatView, Statistica
SPSS
SPSS is a computer program used for survey authoring and deployment (IBM SPSS Data Collection), data mining (IBM SPSS Modeler), text analytics, statistical analysis, and collaboration and deployment (batch and automated scoring services).
SPSS stands for Statistical Package for the Social Sciences.
About SPSS
Developer(s): IBM Corporation
Initial release: 1968
Stable release: 20.0 (August 16, 2011)
Operating system: Windows, zLinux, Linux / UNIX & Mac
Platform: Java
Type: Statistical analysis, Data Mining, Text Analytics, Data Collection, Collaboration & Deployment
License: Proprietary software
Website: www01.ibm.com/software/analytics/spss/
SPSS (originally, Statistical Package for the Social Sciences) was released in its first version in 1968 after being developed by Norman H. Nie, Dale H. Bent and C. Hadlai Hull; IBM acquired it in 2009. SPSS is among the most widely used programs for statistical analysis in social science. It is used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations and others.
In addition to statistical analysis, data management (case selection, file reshaping, creating derived data) and data documentation are features of the base software.
Statistics included in the base software:

* Descriptive statistics: cross tabulation, frequencies, descriptives, explore, descriptive ratio statistics
* Bivariate statistics: means, t-test, ANOVA, correlation (bivariate, partial, distances), nonparametric tests
* Prediction for numerical outcomes: linear regression
* Prediction for identifying groups: factor analysis, cluster analysis (two-step, K-means, hierarchical), discriminant
