
A statistic (singular) is a single measure of some attribute of a sample (e.g., its arithmetic mean value).
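As a rough illustration of a statistic as a function applied to sample values, the sketch below computes two statistics from one small sample. The data and the variable names are made up for the example.

```python
import statistics

# Hypothetical sample: made-up heights (cm) of five people.
sample = [162.0, 175.5, 168.2, 181.0, 170.3]

# Each statistic is a single number obtained by applying one function
# to all of the values in the sample.
mean_height = statistics.mean(sample)      # arithmetic mean
median_height = statistics.median(sample)  # middle value once sorted

print(f"mean:   {mean_height:.1f} cm")
print(f"median: {median_height:.1f} cm")
```

Different functions applied to the same sample give different statistics, each summarizing a different attribute of the sample.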

A statistic is calculated by applying a function (statistical algorithm) to the values of the items of the sample, which are known together as a set of data.

Statistics [plural] (informal: stats): information based on a study of the number of times something happens or is present, or other numerical facts. Statistics is also the science of using information discovered from studying numbers, and a statistic is a fact in the form of a number that shows information about something.

A statistical population is a set of entities concerning which statistical inferences are to be drawn, often based on a random sample taken from the population. For example, if we are interested in making generalizations about all crows, then the statistical population is the set of all crows that exist now, ever existed, or will exist in the future. Since in this case and many others it is impossible to observe the entire statistical population, due to time constraints, constraints of geographical accessibility, and constraints on the researcher's resources, a researcher would instead observe a statistical sample from the population in order to learn something about the population as a whole.

In statistics and quantitative research methodology, a data sample is a set of data collected and/or selected from a statistical population by a defined procedure.

A population is a collection of data whose properties are analyzed. The population is the complete collection to be studied; it contains all subjects of interest. A sample is a part of the population of interest, a sub-collection selected from a population. A parameter is a numerical measurement that describes a characteristic of a population, while a statistic is a numerical measurement that describes a characteristic of a sample. In general, we will use a statistic to infer something about a parameter.

The word "statistic" can mean:

1. A numerical datum.
2. A numerical value, such as standard deviation or mean, that characterizes the sample or population from which it was derived.
3. One viewed as a nameless item of statistical information.

There are two main forms of statistics, descriptive and inferential. Descriptive statistics is used to describe or summarize data. Average, standard deviation, frequency, and percentage are all used in descriptive statistics. Inferential statistics draws inferences from the data. This includes estimations, correlations, and predictions that are made using the data.

Levels of Measurement

• Nominal
• Ordinal
• Interval

In addition, there are two other categories that are often used:

• Dichotomous (often treated as Nominal)
• Ratio (often treated as Interval)

Dichotomous variables are variables that have only two values. This level of measurement may be treated as nominal, although sometimes an ordinal quality may exist. Some examples:

• Gender - male, female
• Race - black, white
• Agreement - yes, no
• T/F - true, false
• Value - high, low
• and others less easy to name
  o war, no war
  o vote, no vote

Nominal Variables

Nominal variables are those which can be named, but not quantified. Examples include:

• Religion (Protestant, Catholic, Jewish, Buddhist, etc.)
• Race (Caucasian, African-American, Hispanic, Asian, etc.)
• Linguistic Group
• Marital Status (Married, Single, Divorced)

Nominal variables may be coded with numbers, but the magnitude of the number assigned is arbitrary. Changing the coding scheme will not change the inference.
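A minimal sketch of why the coding is arbitrary, using made-up marital-status responses and two hypothetical coding schemes: the category frequencies are the same under either scheme, while an arithmetic mean of the codes changes with the scheme and so carries no meaning.

```python
from collections import Counter

# Hypothetical survey responses (made-up data).
responses = ["Married", "Single", "Single", "Divorced", "Married", "Single"]

# Two arbitrary numeric coding schemes for the same categories.
coding_a = {"Married": 1, "Single": 2, "Divorced": 3}
coding_b = {"Married": 30, "Single": 10, "Divorced": 20}


def category_counts(data, coding):
    """Code the data numerically, count the codes, then map back to labels."""
    decode = {code: label for label, code in coding.items()}
    coded = [coding[label] for label in data]
    return {decode[code]: count for code, count in Counter(coded).items()}


counts_a = category_counts(responses, coding_a)
counts_b = category_counts(responses, coding_b)

# The mean of the codes depends entirely on the scheme chosen.
mean_a = sum(coding_a[r] for r in responses) / len(responses)
mean_b = sum(coding_b[r] for r in responses) / len(responses)

print(counts_a == counts_b)  # frequencies agree under both schemes
print(mean_a, mean_b)        # means of the codes do not
```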

Ordinal Variables

With ordinal variables, there is a rough quantitative sense to their measurement, but the differences between scores are not necessarily equal. The values are thus in order, but the distances between them are not fixed. Examples include:

• Rankings (1st, 2nd, 3rd, etc.)
• Grades (A, B, C, D, F)
• Evaluations
  o High, Medium, Low
  o Likert Scales
    ▪ 5 pt (Strongly Agree, Agree, Neither Agree nor Disagree, Disagree, Strongly Disagree)
    ▪ 7 pt liberalism scale (Strongly Liberal, Liberal, Weakly Liberal, Moderate, Weakly Conservative, Conservative, Strongly Conservative)
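Because ordinal values are ordered but not evenly spaced, a summary that uses only the order (such as the median) is a natural fit. The sketch below finds the median response on the 5-point Likert scale listed above; the responses are made up, and the rank lookup is just one possible way to encode the order.

```python
# The 5-point Likert categories, from lowest to highest agreement.
LIKERT = [
    "Strongly Disagree",
    "Disagree",
    "Neither Agree nor Disagree",
    "Agree",
    "Strongly Agree",
]
RANK = {label: position for position, label in enumerate(LIKERT)}

# Hypothetical responses (made-up data, odd count so the median is one item).
responses = [
    "Agree",
    "Strongly Agree",
    "Disagree",
    "Agree",
    "Neither Agree nor Disagree",
]

# Sort by rank order only; no distances between categories are assumed.
ordered = sorted(responses, key=lambda label: RANK[label])
median_response = ordered[len(ordered) // 2]

print(median_response)
```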

Interval Variables

Interval variables are variables or measurements where the difference between values is measured on a fixed scale. Examples include:

• Money
• People
• Education (in years)

Ratio variables, on the other hand, are at the other end of the scale. Ratio variables are numbers with a true zero as their base value, so that ratios between values are meaningful. Percentages are perhaps the best example.

Methods of Probability Sampling

There are a number of different methods of probability sampling, including:

1. Random Sampling

Random sampling is the method that most closely defines probability sampling. Each element of the sample is picked at random from the given population, such that the probability of picking that element can be calculated by simply dividing the frequency of the element by the total number of elements in the population. In this method, all elements are equally likely to be picked if they have the same frequency.

2. Systematic Sampling

Systematic sampling is the method that involves arranging the population in a given order and then picking every nth element from the ordered list of all the elements in the population. The probability of picking any given element can be calculated, but is not likely to be the same for all elements in the population, regardless of whether they have the same frequency.

3. Stratified Sampling

Stratified sampling involves dividing the population into groups (strata) and then sampling from each of those groups according to a set of criteria. For example, a class might be divided into boys and girls, and then from each of those two groups you pick those who fall into the specific category that you intend to study with your sample.

4. Cluster Sampling

Cluster sampling involves dividing the population into clusters and assigning each element to one and only one cluster; in other words, an element cannot appear in more than one cluster. The sample is then formed by randomly selecting whole clusters.

5. Multistage Sampling

Multistage sampling involves the use of more than one probability sampling method and more than one stage of sampling. For example, you might use the stratified sampling method in the first stage and then the random sampling method in the second stage, and so on until you achieve the sample that you want.

6. Probability Proportional to Size Sampling

Under probability proportional to size sampling, the sample is chosen in proportion to the total size of the population. It is a form of multistage sampling in which, in stage one, you cluster the entire population and then, in stage two, you randomly select elements from the different clusters, with the number of elements selected from each cluster proportional to the size of the population of that cluster.

Statistics Notation

This appendix describes how symbols are used on the Stat Trek web site to represent numbers, variables, parameters, statistics, etc.

Capitalization

In general, capital letters refer to population attributes (i.e., parameters), and lower-case letters refer to sample attributes (i.e., statistics). For example,

• P refers to a population proportion; and p, to a sample proportion.
• X refers to a set of population elements; and x, to a set of sample elements.
• N refers to population size; and n, to sample size.
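Using N for the population size and n for the sample size as above, three of the probability sampling methods described earlier (random, systematic, and stratified) might be sketched as follows. The function names, the population of the integers 1 to 100, and the even/odd strata are all hypothetical choices made for this illustration, and the systematic version assumes a random starting point within the first interval.

```python
import random


def simple_random_sample(population, n, rng):
    """Pick n distinct elements, each equally likely to be chosen."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick every k-th element of the ordered list after a random start."""
    N = len(population)
    k = N // n                # sampling interval
    start = rng.randrange(k)  # random start within the first interval
    return [population[start + i * k] for i in range(n)]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group elements into strata, then draw a random sample from each."""
    strata = {}
    for element in population:
        strata.setdefault(stratum_of(element), []).append(element)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


rng = random.Random(42)           # seeded so the sketch is reproducible
population = list(range(1, 101))  # hypothetical population, N = 100
n = 10

srs = simple_random_sample(population, n, rng)
systematic = systematic_sample(population, n, rng)
stratified = stratified_sample(
    population, lambda x: "even" if x % 2 == 0 else "odd", 5, rng
)

print(srs)
print(systematic)
print(stratified)
```

Cluster and multistage designs could be built from the same pieces, for example by sampling whole groups first and then sampling elements within the chosen groups.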

Greek vs. Roman Letters

Like capital letters, Greek letters refer to population attributes. Their sample counterparts, however, are usually Roman letters. For example,

• μ refers to a population mean; and x̄, to a sample mean.
• σ refers to the standard deviation of a population; and s, to the standard deviation of a sample.

Population Parameters

By convention, specific symbols represent certain population parameters. For example,

• μ refers to a population mean.
• σ refers to the standard deviation of a population.
• σ² refers to the variance of a population.
• P refers to the proportion of population elements that have a particular attribute.
• Q refers to the proportion of population elements that do not have a particular attribute, so Q = 1 - P.
• ρ is the population correlation coefficient, based on all of the elements from a population.
• N is the number of elements in a population.

Sample Statistics

By convention, specific symbols represent certain sample statistics. For example,

• x̄ refers to a sample mean.
• s refers to the standard deviation of a sample.
• s² refers to the variance of a sample.
• p refers to the proportion of sample elements that have a particular attribute.
• q refers to the proportion of sample elements that do not have a particular attribute, so q = 1 - p.
• r is the sample correlation coefficient, based on all of the elements from a sample.
• n is the number of elements in a sample.
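To make the parameter and statistic symbols concrete, the sketch below computes several of them for a tiny made-up population and a sample drawn from it. It assumes the common convention that the population variance divides by N while the sample variance divides by n - 1, which the text above does not state; the "attribute" used for the proportions (being an even number) is likewise an arbitrary choice.

```python
import statistics

# Hypothetical population and a sample taken from it (made-up numbers).
population = [4, 8, 15, 16, 23, 42]
sample = [8, 15, 42]

N, n = len(population), len(sample)

# Population parameters.
mu = statistics.mean(population)
sigma2 = statistics.pvariance(population)  # divides by N
sigma = statistics.pstdev(population)

# Sample statistics.
x_bar = statistics.mean(sample)
s2 = statistics.variance(sample)           # divides by n - 1 (assumed convention)
s = statistics.stdev(sample)

# Proportions of elements having the attribute "is even".
P = sum(1 for v in population if v % 2 == 0) / N
Q = 1 - P
p = sum(1 for v in sample if v % 2 == 0) / n
q = 1 - p

print(f"mu={mu}, sigma^2={sigma2:.3f}, sigma={sigma:.3f}, P={P:.3f}, Q={Q:.3f}")
print(f"x_bar={x_bar:.3f}, s^2={s2:.3f}, s={s:.3f}, p={p:.3f}, q={q:.3f}")
```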

Simple Linear Regression

• β0 is the intercept constant in a population regression line.
• β1 is the regression coefficient (i.e., slope) in a population regression line.
• R² refers to the coefficient of determination.
• b0 is the intercept constant in a sample regression line.
• b1 refers to the regression coefficient in a sample regression line (i.e., the slope).
• sb1 refers to the standard error of the slope of a regression line.
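The sample quantities in this list can be computed with ordinary least squares. The sketch below is one way to do so on made-up data; the function name `fit_line` is hypothetical, and the standard-error formula assumes the usual n - 2 degrees of freedom for the residuals.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, R2, sb1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx            # slope of the sample regression line
    b0 = y_bar - b1 * x_bar   # intercept of the sample regression line

    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot  # coefficient of determination

    # Standard error of the slope, using n - 2 residual degrees of freedom.
    sb1 = sqrt(ss_res / (n - 2)) / sqrt(sxx)
    return b0, b1, r2, sb1


# Hypothetical data (made-up numbers).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, R2, sb1 = fit_line(x, y)
print(f"b0={b0:.3f}, b1={b1:.3f}, R2={R2:.4f}, sb1={sb1:.4f}")
```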

Probability

• P(A) refers to the probability that event A will occur.
• P(A|B) refers to the conditional probability that event A occurs, given that event B has occurred.
• P(A') refers to the probability of the complement of event A.
• P(A ∩ B) refers to the probability of the intersection of events A and B.
• P(A ∪ B) refers to the probability of the union of events A and B.
• E(X) refers to the expected value of random variable X.
• b(x; n, P) refers to binomial probability.
• b*(x; n, P) refers to negative binomial probability.
• g(x; P) refers to geometric probability.
• h(x; N, n, k) refers to hypergeometric probability.
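For three of the distribution notations in this list, the sketch below uses the textbook formulas. The argument meanings are assumptions, since the list does not define them: x successes in n trials with success probability P for the binomial; the first success falling on trial x for the geometric; and x successes in a sample of n drawn without replacement from N items, k of which are successes, for the hypergeometric.

```python
from math import comb


def binomial(x, n, P):
    """b(x; n, P): exactly x successes in n independent trials."""
    return comb(n, x) * P ** x * (1 - P) ** (n - x)


def geometric(x, P):
    """g(x; P): first success occurs on trial x (assumed convention)."""
    return P * (1 - P) ** (x - 1)


def hypergeometric(x, N, n, k):
    """h(x; N, n, k): x successes in n draws without replacement."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


print(binomial(2, 4, 0.5))          # two heads in four fair coin flips
print(geometric(3, 0.5))            # first head on the third flip
print(hypergeometric(1, 10, 3, 4))  # one success among three draws
```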

Counting

• n! refers to the factorial value of n.
• nPr refers to the number of permutations of n things taken r at a time.
• nCr refers to the number of combinations of n things taken r at a time.
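These three counting quantities are available in Python's standard `math` module (version 3.8 or later for `perm` and `comb`). The values n = 5 and r = 2 below are arbitrary, chosen only to show how the quantities relate to one another.

```python
import math

n, r = 5, 2  # arbitrary example values

n_factorial = math.factorial(n)  # n!
n_p_r = math.perm(n, r)          # nPr = n! / (n - r)!
n_c_r = math.comb(n, r)          # nCr = nPr / r!

print(n_factorial, n_p_r, n_c_r)

# The relationships between the three quantities hold exactly.
assert n_p_r == n_factorial // math.factorial(n - r)
assert n_c_r == n_p_r // math.factorial(r)
```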

Set Theory

• A ∩ B refers to the intersection of events A and B.
• A ∪ B refers to the union of events A and B.
• {A, B, C} refers to the set of elements consisting of A, B, and C.
• ∅ refers to the null set.

Hypothesis Testing

• H0 refers to a null hypothesis.
• H1 or Ha refers to an alternative hypothesis.
• α refers to the significance level.
• β refers to the probability of committing a Type II error.

Random Variables

• Z or z refers to a standardized score, also known as a z score.
• zα refers to the standardized score that has a cumulative probability equal to 1 - α.
• tα refers to the t score that has a cumulative probability equal to 1 - α.
• fα refers to an f statistic that has a cumulative probability equal to 1 - α.
• fα(v1, v2) is an f statistic with a cumulative probability of 1 - α, and v1 and v2 degrees of freedom.
• χ² refers to a chi-square statistic.
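The definition of zα can be checked numerically with the standard normal distribution in Python's `statistics` module (3.8 or later). The helper name `z_alpha` is hypothetical; the t and f counterparts would need a library beyond the standard one and are left out of this sketch.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_alpha(alpha):
    """Standardized score whose cumulative probability equals 1 - alpha."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha)


z_05 = z_alpha(0.05)
z_025 = z_alpha(0.025)

print(f"z for alpha = 0.05:  {z_05:.4f}")
print(f"z for alpha = 0.025: {z_025:.4f}")

# Round trip: the cumulative probability at z_alpha is 1 - alpha.
print(f"cdf at z_05: {STANDARD_NORMAL.cdf(z_05):.4f}")
```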

Special Symbols

Throughout the site, certain symbols have special meanings. For example,

• Σ is the summation symbol, used to compute sums over a range of values.
• Σx or Σxi refers to the sum of a set of n observations. Thus, Σxi = Σx = x1 + x2 + . . . + xn.
• sqrt refers to the square root function. Thus, sqrt(4) = 2 and sqrt(25) = 5.
• Var(X) refers to the variance of the random variable X.
• SD(X) refers to the standard deviation of the random variable X.
• SE refers to the standard error of a statistic.
• ME refers to the margin of error.
• DF refers to the degrees of freedom.
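As one possible illustration of Σ, sqrt, SE, ME, and DF working together, the sketch below estimates a mean from made-up data and reports a range of values around it. It assumes two conventions not spelled out above: that the standard error of a sample mean is s / sqrt(n), and that the margin of error is a critical value times the standard error. A normal critical value is used for simplicity; with a sample this small a t score with n - 1 degrees of freedom would normally be preferred.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample (made-up measurements).
x = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
n = len(x)

sum_x = sum(x)       # Σx = x1 + x2 + ... + xn
x_bar = sum_x / n    # sample mean

# Sample standard deviation, dividing by n - 1.
s = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))

DF = n - 1           # degrees of freedom for the sample standard deviation
SE = s / sqrt(n)     # standard error of the mean (assumed formula)

z = NormalDist().inv_cdf(0.975)  # critical value for a 95% interval
ME = z * SE                      # margin of error (assumed convention)

lower, upper = x_bar - ME, x_bar + ME
print(f"mean = {x_bar:.3f}, SE = {SE:.3f}, ME = {ME:.3f}, DF = {DF}")
print(f"approximate 95% interval: ({lower:.3f}, {upper:.3f})")
```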

Applied statistical inference
Statistical theory provides the basis for a number of data analytic methods that are common across scientific and social research. Interpreting data is an important objective of statistical research. Some of these methods are:

• Estimating parameters
• Testing statistical hypotheses
• Providing a range of values instead of a point estimate
• Regression analysis

Many of the standard methods for these tasks rely on certain statistical assumptions (made in the derivation of the methodology) actually holding in practice. Statistical theory studies the consequences of departures from these assumptions. In addition, it provides a range of robust statistical techniques that are less dependent on assumptions, and it provides methods for checking whether particular assumptions are reasonable for a given data set.