
Quantitative Techniques

2012

Project Report On Quantitative Statistics

Submitted to: Prof. Venkatesh Shekhar Course In-charge Managerial Statistics

Submitted by: Arya Pradhan MBA (F&B), Batch III, Term I

Roll No-P301311CMG216

NIIT University, Neemrana Rajasthan


Table of Contents

Objective
Introduction
Measure of Central Tendency
Frequency Distribution
Probability on the Curve
Sample Probability
Hypothesis Testing for Single Population
Single Factor ANOVA
Hypothesis Testing for 2 Populations
F-test
Conclusion


Objective
This project uses statistics as a tool to explore data collected on the time spent on Moodle during May and June 2012. It aims to summarise and interpret the data in the correct perspective using statistical models and formulae. The inferences drawn from the statistical results are intended to build an understanding of the following concepts:
- Measure of Central Tendency
- Measure of Dispersion
- Concept of Outliers
- Frequency Distribution
- Probability on the Curve
- Sample Probability
- Hypothesis Testing for Single Population
- Hypothesis Testing for 2 Populations
- Single Factor ANOVA
- F-test


Introduction
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments. A statistician is someone who is particularly well versed in the ways of thinking necessary for the successful application of statistical analysis. Such people have often gained this experience through working in any of a wide number of fields. There is also a discipline called mathematical statistics that studies statistics mathematically. Statistical methods can be used for summarizing or describing a collection of data; this is called descriptive statistics. This is useful in research, when communicating the results of experiments. In addition, patterns in the data may be modelled in a way that accounts for randomness and uncertainty in the observations, and are then used for drawing inferences about the process or population being studied; this is called inferential statistics. Inference is a vital element of scientific advance, since it provides a means for drawing conclusions from data that are subject to random variation.


POPULATION DATA SET
The population data consists of observations of the time spent on Moodle on each day, with each observation serving as one member of the overall group. In short, it is the complete set of data on which the statistical analysis is conducted. The table below shows the data collected from 1 May to 19 June 2012.

Date      Time spent on Moodle (minutes)
01-May    0.0
02-May    0.0
03-May    0.0
04-May    0.0
05-May    2.5
06-May    5.0
07-May    7.5
08-May    2.7
09-May    6.9
10-May    7.0
11-May    0.0
12-May    15.5
13-May    2.5
14-May    2.0
15-May    2.7
16-May    8.6
17-May    0.0
18-May    0.0
19-May    2.5
20-May    15.0
21-May    5.3
22-May    4.4
23-May    16.8
24-May    6.7
25-May    0.0
26-May    7.1
27-May    17.5
28-May    0.0
29-May    10.5
30-May    4.9
31-May    6.9
01-Jun    8.1
02-Jun    11.2
03-Jun    6.9
04-Jun    7.3
05-Jun    0.0
06-Jun    16.8
07-Jun    8.1
08-Jun    19.9
09-Jun    8.1
10-Jun    6.9
11-Jun    16.8
12-Jun    7.0
13-Jun    5.3
14-Jun    10.6
15-Jun    0.0
16-Jun    6.7
17-Jun    0.0
18-Jun    3.9
19-Jun    0.0

DESCRIPTIVE STATISTICS: Descriptive statistics summarise the population data by describing what was observed, either numerically or graphically. Numerical descriptors include the mean and standard deviation for continuous data (like heights or weights), while frequency and percentage are more useful for describing categorical data. The table below presents the descriptive analysis of the population data; the inferences follow.

Statistic                   Value
Mode                        0.000
Median                      6.003
Mean                        6.086
Qmin                        0.000
Q1                          0.508
Q2                          6.003
Q3                          8.090
Qmax                        19.926
Variance                    30.627
Standard Deviation          5.534
Mean Absolute Deviation     4.337
Coefficient Of Variation    91%
Skewness                    0.817625
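The descriptive measures in the table can be recomputed from the daily values. The sketch below is illustrative only: the function name `describe` is hypothetical, and it assumes sample (n - 1) formulas for variance and standard deviation, inclusive quartile interpolation, and the adjusted Fisher-Pearson skewness formula. Because the listed data are rounded to one decimal place, the results should be close to the tabulated values but need not match them exactly.

```python
import statistics

# Daily time spent on Moodle (minutes), 1 May to 19 June 2012, as listed in the report.
times = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9, 8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1,
    6.9, 16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]


def describe(values):
    """Return the descriptive measures used in the report (hypothetical helper)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation (n - 1)
    # Assumption: quartiles interpolated inclusively between data points.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Assumption: adjusted Fisher-Pearson coefficient of skewness.
    skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    return {
        "mode": statistics.mode(values),
        "median": q2,
        "mean": mean,
        "q1": q1,
        "q3": q3,
        "range": max(values) - min(values),
        "variance": sd ** 2,
        "sd": sd,
        "mean_abs_dev": sum(abs(x - mean) for x in values) / n,
        "coeff_of_variation": sd / mean,
        "skewness": skewness,
    }


summary = describe(times)
for name, value in summary.items():
    print(f"{name:20s} {value:.3f}")
```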

INFERENCES

Range - In descriptive statistics, the range is the length of the smallest interval which contains all the data. It is calculated by subtracting the smallest observation (sample minimum) from the greatest (sample maximum) and provides an indication of statistical dispersion. In our case the range of time spent on Moodle is 19.9 minutes.

Mean - For a data set, the mean is the sum of the values divided by the number of values. The mean of a set of numbers x1, x2, ..., xn is typically denoted by x̄, pronounced "x bar". This mean is a type of arithmetic mean. If the data set were based on a series of observations obtained by sampling a statistical population, this mean is termed the "sample mean" to distinguish it from the "population mean". In our case the population mean is 6.086 minutes, which is the average daily time spent on Moodle.

Median - The median of a set of data values is the middle value of the data set when it has been arranged in ascending order, that is, from the smallest value to the highest value. In our case the median is 6.003 minutes.

Outlier - An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs. Outliers can occur by chance in any distribution, but they are often indicative either of measurement error or that the population has a heavy-tailed distribution. In the former case one wishes to discard them or use statistics that are robust to outliers, while in the latter case they indicate that the distribution has high kurtosis and that one should be very cautious in using tools or intuitions that assume a normal distribution.
Q min Q1 Q2 Q3 Q max

Quartile

0.0

0.508

6.003

8.090

19.926
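The report does not state a rule for flagging outliers. One common convention, used here purely as an assumption, is Tukey's rule: a value is flagged if it lies more than 1.5 times the interquartile range below Q1 or above Q3. The sketch below applies that rule to the quartile values above; the function name `tukey_fences` is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences; k = 1.5 is the conventional multiplier (an assumption here)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartile values taken from the report's quartile table.
q_min, q1, q3, q_max = 0.0, 0.508, 8.090, 19.926

lower_fence, upper_fence = tukey_fences(q1, q3)
min_is_outlier = q_min < lower_fence
max_is_outlier = q_max > upper_fence

print(f"fences: [{lower_fence:.3f}, {upper_fence:.3f}]")
print(f"minimum flagged: {min_is_outlier}, maximum flagged: {max_is_outlier}")
```

Under this assumed rule the maximum lies just beyond the upper fence, while the minimum does not lie below the lower fence.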

Standard Deviation - In statistics, standard deviation (represented by the symbol σ) shows how much variation or "dispersion" exists from the average (mean, or expected value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data points are spread out over a large range of values. The standard deviation of 5.534 minutes is the measure of dispersion in our data.

Skewness - It is a measure of the asymmetry of the probability distribution of a real-valued random variable. The skewness value can be positive or negative, or even undefined. Qualitatively, a negative skew indicates that the tail on the left side of the probability density function is longer than the right side and the bulk of the values lie to the right of the mean. A positive skew indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean. A zero value indicates that the values are relatively evenly distributed on both sides of the mean, typically but not necessarily implying a symmetric distribution. In this case the data is positively skewed (0.817).

Frequency Distribution - In statistics, a frequency distribution is an arrangement of the values that one or more variables take in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way the table summarizes the distribution of values in the sample. Frequency distributions are used for both qualitative and quantitative data. From the histogram we can infer that the largest number of daily values of time spent on Moodle fall within the 0 - 0.5 minutes bucket.
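A frequency distribution of this kind can be built by counting how many daily values fall into each bucket. The sketch below is a minimal illustration: the bucket width of 0.5 minutes is assumed from the "0 - 0.5 minutes" bucket mentioned above, each bucket is taken to include its lower bound and exclude its upper bound, and the function name `frequency_distribution` is hypothetical.

```python
from collections import Counter

# Daily time spent on Moodle (minutes), as listed in the report.
times = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9, 8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1,
    6.9, 16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]

BUCKET_WIDTH = 0.5  # assumption: inferred from the "0 - 0.5 minutes" bucket


def frequency_distribution(values, width):
    """Count values per bucket [lower, upper); returns {(lower, upper): count}."""
    counts = Counter(int(v // width) for v in values)
    return {(i * width, (i + 1) * width): counts[i] for i in sorted(counts)}


distribution = frequency_distribution(times, BUCKET_WIDTH)
for (low, high), count in distribution.items():
    print(f"{low:5.1f} - {high:5.1f}  {'#' * count} ({count})")
```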


[Figure: Frequency Distribution histogram of time spent on Moodle; y-axis: Frequency]

From this we can infer that, most of the time, Moodle has been used only for downloading the study material.

Random Sampling - A random sample is one chosen by a method involving an unpredictable component. Random sampling can also refer to taking a number of independent observations from the same probability distribution, without involving any real population. The random sample drawn in this case is:

Random Sample
No.    Value
1      6.71
2      2.50
3      2.70
4      7.04
5      7.14
6      7.14
7      2.70
8      4.93
9      0.00
10     0.00
11     0.00
12     6.71
13     7.30
14     16.81
15     8.09
16     8.09
17     0.00
18     17.50
19     7.14
20     0.00
21     0.00
22     0.00
23     6.89
24     11.19
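A random sample of this size could be drawn from the population as sketched below. This is not the procedure the report used, which is not described. Sampling with replacement is an assumption, suggested only by the fact that some values repeat in the listed sample; the function name `draw_sample` and the seed are hypothetical, so the values drawn will differ from those in the table.

```python
import random

# Daily time spent on Moodle (minutes), as listed in the report.
population = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9, 8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1,
    6.9, 16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]


def draw_sample(values, size, seed=None):
    """Draw `size` observations with replacement (assumption) using a seeded generator."""
    rng = random.Random(seed)
    return rng.choices(values, k=size)


# The seed is arbitrary; it only makes the illustration repeatable.
sample = draw_sample(population, size=24, seed=2012)
print(sample)
```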


Probability for the Population and Interval Estimates: Let us consider an example of time spent on Moodle. The probability of spending less than 4 minutes is 43.15%. We now estimate the population mean from the sample mean using the confidence interval approach; the details are as follows.

ESTIMATING POPULATION MEAN FROM SAMPLE
Confidence Interval                95%
t Value                            2.07
Sample Mean                        5.44
Sample Size                        24
Point Estimator                    5.44
Interval Estimate (Upper value)    7.66
Interval Estimate (Lower value)    3.22
Population Mean                    6.09

HYPOTHESIS TESTING ABOUT SINGLE POPULATION
H0: µ = 4.80
Ha: µ ≠ 4.80
t calculated                          -0.60
Confidence Interval                   95%
t Critical Value (Two tailed test)    ±2.069
Hypothesis cannot be rejected
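The interval estimate and the single-population test can be sketched from the 24 sampled values. This is a minimal illustration under stated assumptions: the standard error is taken as the sample standard deviation divided by the square root of n, and the critical value 2.069 is the one quoted in the report rather than computed. The report does not state which standard error it used, so the bounds and the t statistic obtained here may differ somewhat from the tabulated figures.

```python
import math
import statistics

# Random sample of 24 values listed in the report.
sample = [
    6.71, 2.50, 2.70, 7.04, 7.14, 7.14, 2.70, 4.93, 0.00, 0.00, 0.00, 6.71,
    7.30, 16.81, 8.09, 8.09, 0.00, 17.50, 7.14, 0.00, 0.00, 0.00, 6.89, 11.19,
]

T_CRITICAL = 2.069      # two-tailed, 95%, 23 degrees of freedom (quoted in the report)
MU_0 = 4.80             # hypothesized population mean under H0
POPULATION_MEAN = 6.09  # population mean reported earlier

n = len(sample)
sample_mean = statistics.mean(sample)
# Assumption: standard error based on the sample standard deviation (n - 1).
std_err = statistics.stdev(sample) / math.sqrt(n)

# 95% confidence interval for the population mean.
lower = sample_mean - T_CRITICAL * std_err
upper = sample_mean + T_CRITICAL * std_err

# One-sample t statistic for H0: mu = MU_0.
t_stat = (sample_mean - MU_0) / std_err
reject_null = abs(t_stat) > T_CRITICAL

print(f"sample mean = {sample_mean:.2f}, interval = ({lower:.2f}, {upper:.2f})")
print(f"population mean inside interval: {lower <= POPULATION_MEAN <= upper}")
print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```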

The population mean (6.09 minutes) lies within the confidence interval of 3.22 minutes to 7.66 minutes.

Hypothesis Test (Single Population): Let us consider the null hypothesis to be µ = 4.80. The alternate hypothesis is µ ≠ 4.80. Since the sample size is less than 30, we have used the t-distribution. For this test we have considered the random sample of 24 values and found its sample mean, which is 5.44 minutes. Using the t-distribution, the calculated t value is -0.60, whose magnitude is less than the t critical value of ±2.069 for a two-tailed test (confidence level = 95%, degrees of freedom = 23). Since the calculated t lies within the acceptance region, we cannot reject the null hypothesis (µ = 4.80).

2 Population Tests: A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution. Because of the central limit theorem, many test statistics are approximately normally distributed for large samples. For each significance level, the Z-test has a single critical value (for example, 1.96 for 5% two-tailed), which makes it more convenient than the Student's t-test, which has separate critical values for each sample size. Therefore, many statistical tests can be conveniently performed as approximate Z-tests if the sample size is large or the population variance is known. If the population variance is unknown (and therefore has to be estimated from the sample itself) and the sample size is not large, the Student's t-test may be more appropriate.

We now consider the two-sample test, for which we have taken samples from the populations of time spent by Devesh and Dennis. The Z-distribution is used to test whether there is any difference in the time spent on Moodle by the two persons. Let us consider the null hypothesis to be Ho: µ1 = µ2. The alternate hypothesis is Ha: µ1 ≠ µ2.

z-Test: Two Sample for Means
                                DENNIS    ARYA
Mean                            5.09      3.52
Known Variance                  26.56     21.32
Observations                    31.00     31.00
Hypothesized Mean Difference    0.00
z                               1.26
P(Z<=z) one-tail                0.10
z Critical one-tail             1.64
P(Z<=z) two-tail                0.21
z Critical two-tail             1.96

HYPOTHESIS TEST ABOUT THE DIFFERENCE IN TWO MEANS (Population variances known)
Alpha = 5%
Ho: µ1 = µ2
Ha: µ1 ≠ µ2
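The z statistic for two means with known variances can be recomputed from the summary figures in the z-test table. The sketch below is illustrative: the function name `two_sample_z` is hypothetical, and the inputs are the rounded means, variances and sample sizes from the table, so the results agree with the tabulated values only up to rounding.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (z, two-tailed p-value) for a two-sample z-test with known variances."""
    std_err = math.sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2 - hypothesized_diff) / std_err
    p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tail


# Summary figures taken from the report's z-test table.
z, p_two_tail = two_sample_z(5.09, 26.56, 31, 3.52, 21.32, 31)

ALPHA = 0.05
z_critical = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed critical value
reject_null = abs(z) > z_critical

print(f"z = {z:.2f}, two-tail p = {p_two_tail:.2f}, z critical = {z_critical:.2f}")
print(f"reject H0: {reject_null}")
```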

Because the observed value of z (1.26) is not in the rejection region (|z| > 1.96), we cannot reject the null hypothesis. Implication: There is no significant difference in the average time spent on Moodle by the two persons.

ANOVA Test: ANOVA is a collection of statistical models, and their associated procedures, in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation. Doing multiple two-sample t-tests would result in an increased chance of committing a type I error. For this reason, ANOVAs are useful in comparing two, three, or more means. Based on the above samples, we test the following hypotheses.
Ho: µ1 = µ2 = µ3
Ha: At least one of the population means differs from the others.

Anova: Single Factor

SUMMARY
Groups    Count    Sum       Average    Variance
Dennis    31       157.90    5.09       28.26
Devesh    31       109.18    3.52       26.30
Arya      31       236.39    7.62       28.80

ANOVA
Source of Variation    SS         df    MS        F       P-value    F crit
Between Groups         265.77     2     132.88    4.78    0.0106     3.098
Within Groups          2501.42    90    27.79
Total                  2767.20    92
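The F statistic in the ANOVA table can be rebuilt from the group counts, averages and variances in the SUMMARY table. The sketch below is a minimal illustration with a hypothetical function name, `one_way_anova`. It uses the rounded summary figures, so the sums of squares differ slightly from the tabulated ones, and it takes the critical value 3.098 from the report instead of computing it.

```python
def one_way_anova(counts, means, variances):
    """Single-factor ANOVA from group summaries; variances are sample variances (n - 1)."""
    k = len(counts)
    n_total = sum(counts)
    grand_mean = sum(n * m for n, m in zip(counts, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(counts, means))
    ss_within = sum((n - 1) * v for n, v in zip(counts, variances))
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Group summaries from the report, in the order Dennis, Devesh, Arya.
result = one_way_anova(
    counts=[31, 31, 31],
    means=[5.09, 3.52, 7.62],
    variances=[28.26, 26.30, 28.80],
)

F_CRITICAL = 3.098  # quoted in the report for alpha = 5%, df = (2, 90)
reject_null = result["f"] > F_CRITICAL

print(f"F = {result['f']:.2f}, reject H0: {reject_null}")
```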

Since F calculated (4.78) is greater than F critical (3.098), we reject the null hypothesis. This means that the mean time spent on Moodle is not the same for all three persons.

F-Test: An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fit to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact F-tests mainly arise when the models have been fit to the data using least squares.

F-TEST TWO-SAMPLE FOR VARIANCES
                       Dennis    Devesh
Mean                   3.52      7.63
Variance               26.31     28.81
Observations           31.00     31
df                     30.00     30
F                      0.91
P(F<=f) one-tail       0.40
F Critical one-tail    0.54


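Since F calculated (0.91) is greater than the lower-tail F critical value (0.54), and the one-tail p-value (0.40) is above 0.05, the F statistic does not fall in the rejection region. Hence the null hypothesis of equal variances, Ho: σ1² = σ2², cannot be rejected.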
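The F statistic for two variances is simply their ratio. The sketch below recomputes it from the variances in the F-test table and compares it with the tabulated critical value. It assumes that, because the ratio is below 1, the critical value 0.54 is a lower-tail value and the null hypothesis would be rejected only if F fell below it; that reading is an interpretation of the table, not something the report states. The function name `variance_ratio` is hypothetical.

```python
def variance_ratio(var1, var2):
    """F statistic for the two-sample test of equal variances."""
    return var1 / var2


# Sample variances from the report's F-test table (30 degrees of freedom each).
f_stat = variance_ratio(26.31, 28.81)

F_CRITICAL_LOWER = 0.54  # quoted in the report; assumed to be a lower-tail critical value
reject_null = f_stat < F_CRITICAL_LOWER

print(f"F = {f_stat:.2f}, reject H0 (equal variances): {reject_null}")
```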

CONCLUSION: We have carried out a statistical analysis of the time spent on Moodle by the members of the group. The time-spent patterns of the members are out of sync, as the measures of dispersion are very wide.
