Terminologies in Eda: Population

TERMINOLOGIES IN EDA
1. Sample - a subset of the population.

2. Parameter - a numerical measurement that describes some characteristic of a
population.
3. Statistic - a numerical measurement that describes some characteristic of a sample.
4. Ordinal level - this involves data that may be arranged in some order, but differences
between data values either cannot be determined or are meaningless.
5. Nominal level - this is characterized by data that consist of names, labels, or categories only.
The data cannot be arranged in an ordering scheme.
6. Interval level – is the same as the ordinal level, with the additional property that we can
determine meaningful amounts of difference between data values. Data at this level lack
an inherent zero starting point.
7. Ratio level – is the interval level modified to include an inherent zero starting point.
Both differences and ratios of data are meaningful. This is also the highest level of measurement.
8. Random sampling – this is done by using chance methods or random numbers.
9. Systematic sampling – this is done by numbering each subject of the population and then
selecting every kth number.
10. Stratified sampling - if a population has distinct groups, it is possible to divide the
population into these groups, called strata, and to draw simple random samples (SRSs) from
each of them. Strata are designed so that the members of each stratum are more homogeneous,
that is, more similar to each other. The results are then grouped together to form the sample.
This technique is particularly useful in populations that can be stratified into groups by
gender, race, or geography.
11. Cluster sampling – this method uses intact groups called clusters.
12. Sigma (Σ) is used to denote the sum of all values, or summation.
13. When observations are sorted into classes of single values, the result is called a frequency
distribution for ungrouped data.
14. When observations are sorted into classes of more than one value, the result is called a
frequency distribution for grouped data.
15. Strata - the distinct groups into which a population is divided so that simple random
samples (SRSs) can be drawn from each group.
16. Frequency distribution - a collection of observations sorted into classes, showing the
frequency (number of occurrences) in each class.
17. Class boundaries - these are used to separate the classes so that there are no gaps in the
frequency distribution.
18. Class width - the difference between two consecutive lower class limits.
19. Mean - a commonly used measure of central tendency, found by dividing the sum of the
values by the number of values.
20. Median - it is the midpoint of the data array.
21. Mode - it is the value that occurs most often in the data set.
22. Standard deviation - it is the positive square root of the variance.
23. Lower class limit - the smallest data value that can be included in the class.
24. Upper class limit - the largest data value that can be included in the class.
25. Cumulative frequency - the sum of the frequencies for that class and all preceding
classes.
26. Population frequency - the total number of observations in a population.
27. Sample frequency - the total number of observations in the sample.
28. An unbiased estimator of the population variance is a statistic whose expected value
equals the population variance.
29. The modal class is the class with the largest frequency.
30. To find the median class of the data set, divide the total frequency by 2 to get the halfway
point, then look for the first class whose cumulative frequency reaches or exceeds that point.
31. The class midpoint (or class mark) is a specific point in the center of the bins (categories) in
a frequency distribution table.
32. The standard deviation measures absolute variability and not relative variability. It can only
be used to compare two samples that have the same units of measure.
33. A statistic that allows us to compare two data sets that have different units of
measurement is called the coefficient of variation.
34. A measure used to determine the skewness of a distribution is called the Pearson coefficient
of skewness.
35. When the distribution is symmetrical, the coefficient is zero.
36. When the distribution is positively skewed, the coefficient is positive.
37. When the distribution is negatively skewed, the coefficient is negative.
38. Even if the curves of distributions have the same coefficient of skewness, these curves may
still differ in the sharpness of their peaks. This property of curves can be described using the
measure of kurtosis.
39. The symmetrical curves have three types: the normal or mesokurtic curves; the leptokurtic
curves which are more peaked; and the platykurtic curves which are flat-topped curves.
40. Mesokurtic curves have an excess kurtosis of zero, that is, a kurtosis of 3, so extreme, rare,
or outlier data occur about as often as under the normal distribution. Mesokurtic distributions
match the shape of the normal distribution, or normal curve, also known as a bell curve. (K = 3)
41. Leptokurtic curves are statistical curves with kurtosis greater than three. They can be
described as having a sharper peak with fatter tails, resulting in a greater chance of extreme
positive or negative events. (K > 3)
42. Platykurtic curves refer to statistical distributions in which the excess kurtosis value is
negative. For this reason, a platykurtic curve or distribution will have thinner tails than a
normal distribution, resulting in fewer extreme positive or negative events. (K < 3)
43. A z score measures the distance between an observation and the mean, measured in units
of standard deviation.
44. The standard score is obtained by subtracting the mean from the value/observation and
dividing the result by the standard deviation.
45. If the z score is positive, the score is above the mean.
46. If the z score is 0, the score is the same as the mean.
47. If the z score is negative, the score is below the mean.
48. A quartile is a measure of relative standing.
49. The first quartile, Q1, is the value of x that exceeds one-fourth of the measurements and
is less than the remaining three-fourths.
50. The third quartile, Q3, is the value of x that exceeds three-fourths of the measurements and
is less than the remaining one-fourth.
51. When the positions of Q1 and Q3 are not integers, the quartiles are found by interpolation.
52. Percentiles are position measures used in educational and health-related fields to indicate
the position of an individual in a group. They are symbolized by P1, P2, P3, …, P99 and divide
the distribution into 100 groups.
53. Deciles divide the distribution into tenths, or 10 equal parts. A data set has nine deciles,
which are denoted by D1, D2, D3, …, D9.
54. The first decile, D1, is the number that divides the bottom 10% of the data from the top
90%. To obtain the deciles, divide the data set into tenths and then determine the numbers
dividing the tenths.
55. Note that the second quartile (Q2), fifth decile (D5), and fiftieth percentile (P50) of a data
set are all the same and all equal to the median.
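
Several of the measures defined above (mean, median, mode, variance, standard deviation, coefficient of variation, Pearson coefficient of skewness, z score, and quartiles) can be illustrated with a short Python sketch. This is only a minimal example under stated assumptions: the data set is made up for illustration, the helper names are hypothetical, the population standard deviation is used so the numbers come out evenly, and the skewness formula 3(mean - median)/standard deviation is one common form of the Pearson coefficient that the list above does not spell out.

```python
import statistics

# Made-up data set, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # midpoint of the sorted data array
mode = statistics.mode(data)      # value that occurs most often

# Population variance divides by N; the sample variance divides by n - 1,
# which makes it an unbiased estimator of the population variance.
pop_variance = statistics.pvariance(data)
sample_variance = statistics.variance(data)

# Standard deviation is the positive square root of the variance.
# Assumption: the population form is used in the helpers below.
pop_sd = statistics.pstdev(data)


def coefficient_of_variation(values):
    """Hypothetical helper: standard deviation as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


def pearson_skewness(values):
    """Hypothetical helper: 3 * (mean - median) / standard deviation.

    Assumption: this is one common form of the Pearson coefficient.
    """
    gap = statistics.mean(values) - statistics.median(values)
    return 3 * gap / statistics.pstdev(values)


def z_score(x, values):
    """Hypothetical helper: distance of x from the mean in standard deviations."""
    return (x - statistics.mean(values)) / statistics.pstdev(values)


# quantiles(..., n=4) returns [Q1, Q2, Q3]; the default "exclusive" method
# interpolates between neighbouring values when a position is not an integer.
q1, q2, q3 = statistics.quantiles(data, n=4)

print("mean:", mean, "median:", median, "mode:", mode)
print("population variance:", pop_variance, "sample variance:", sample_variance)
print("coefficient of variation (%):", coefficient_of_variation(data))
print("Pearson skewness:", pearson_skewness(data))
print("z score of 9:", z_score(9, data))
print("Q1, Q2, Q3:", q1, q2, q3)
```

In this sketch the mean (5) is larger than the median (4.5), so the skewness coefficient is positive, and Q2 coincides with the median. Swapping `statistics.pstdev` for `statistics.stdev` would give the sample versions of the same measures.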
