A population comprises all the data points in a domain. A data point, or datum, is also called a
record or an observation. Identifying the population for your business problem is an essential first
step in data analysis.
A sample is a subset of data selected from the population; a good sample is representative of the population.
The following comparison will help you better understand the characteristics of a
population and a sample.
Population: It provides true insights about your problem, assumptions or opinions.
Sample: It has some margin of error in the insights about your problems, assumptions or opinions.
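The margin of error of a sample can be illustrated with a small simulation. The sketch below is a minimal example under stated assumptions: the population is a hypothetical list of 10,000 order values generated at random, and the helper name sample_mean_error is illustrative only.

```python
import random
import statistics

def sample_mean_error(population, sample_size, seed=0):
    """Return (population_mean, sample_mean, absolute_error) for one random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # simple random sample, no replacement
    pop_mean = statistics.mean(population)
    samp_mean = statistics.mean(sample)
    return pop_mean, samp_mean, abs(pop_mean - samp_mean)

# Hypothetical population: order values for 10,000 customers (made-up data).
rng = random.Random(42)
population = [rng.gauss(100, 20) for _ in range(10_000)]

pop_mean, samp_mean, error = sample_mean_error(population, sample_size=200)
print(f"population mean={pop_mean:.2f}, sample mean={samp_mean:.2f}, error={error:.2f}")
```

The population mean is the true value; the sample mean usually lands close to it but not exactly on it, and that gap is the error a sample introduces. Taking the whole population as the "sample" makes the error vanish.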
Following are some key terms that are fundamental for data analysis.
The measurement of variables can take on different forms or types, which can be broadly divided
into four categories as shown below.
Another dimension of data is whether they are discrete or continuous. Discrete data have distinct
values, that is, each value can be clearly separated from the others. All nominal, categorical and
ordinal data are discrete.
Continuous data can take any value within a range, so a single distinct value cannot be isolated. For
instance, the number of cars pulling into a gas station at an exact instant cannot be determined,
although it can be counted for a particular time interval.
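The gas-station example can be sketched in a few lines of Python. The arrival times below are hypothetical measurements in minutes after opening, and count_in_interval is an illustrative helper, not part of any library. Time is treated as continuous, while the count of arrivals within an interval is a discrete value.

```python
def count_in_interval(arrival_times, start, end):
    """Count arrivals t with start <= t < end."""
    return sum(1 for t in arrival_times if start <= t < end)

# Hypothetical arrival times in minutes after opening (continuous measurements).
arrivals = [0.7, 3.25, 3.9, 7.1, 12.48, 14.0]

print(count_in_interval(arrivals, 0, 5))        # first five minutes -> 3
print(count_in_interval(arrivals, 14.0, 14.0))  # a single instant has zero width -> 0
```

An interval of zero width always yields a count of zero, which is why the question only becomes meaningful once a time interval is specified.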
Cross-sectional data are observations of variables across different sources of the same kind at one
point in time. Data collected over time are called time-series data.
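The same set of records can be viewed either way. The sketch below assumes a hypothetical list of monthly sales records for three stores; the field names (store, month, sales) and the two helper functions are illustrative only.

```python
# Hypothetical monthly sales records (made-up data).
records = [
    {"store": "A", "month": "2023-01", "sales": 120},
    {"store": "B", "month": "2023-01", "sales": 95},
    {"store": "C", "month": "2023-01", "sales": 143},
    {"store": "A", "month": "2023-02", "sales": 130},
    {"store": "B", "month": "2023-02", "sales": 90},
    {"store": "A", "month": "2023-03", "sales": 128},
]

def cross_section(records, month):
    """Cross-sectional view: many sources (stores), one point in time."""
    return {r["store"]: r["sales"] for r in records if r["month"] == month}

def time_series(records, store):
    """Time-series view: one source (store), ordered over time."""
    rows = [r for r in records if r["store"] == store]
    return [(r["month"], r["sales"]) for r in sorted(rows, key=lambda r: r["month"])]

print(cross_section(records, "2023-01"))
print(time_series(records, "A"))
```

Fixing the month and varying the store gives a cross-section; fixing the store and varying the month gives a time series.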
The following table will help you get a clear distinction between cross-sectional data and time-series
data.
The most common types of data issues are missing, outdated, invalid and unreliable data.
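Three of these issues can often be screened for with simple rules. The sketch below is one possible approach, not a standard method: the record fields (age, updated), the valid age range of 0 to 120 and the one-year freshness threshold are all assumptions chosen for illustration. Unreliable data usually cannot be caught by rules like these and needs knowledge of how the data were collected.

```python
from datetime import date

def find_issues(record, today, max_age_days=365):
    """Return a list of issue labels for one record (hypothetical fields)."""
    issues = []

    age = record.get("age")
    if age is None:
        issues.append("age: missing")
    elif not 0 <= age <= 120:          # assumed valid range
        issues.append("age: invalid")

    updated = record.get("updated")
    if updated is None:
        issues.append("updated: missing")
    elif (today - updated).days > max_age_days:  # assumed freshness threshold
        issues.append("updated: outdated")

    return issues

today = date(2024, 6, 1)
print(find_issues({"age": 200, "updated": date(2020, 1, 1)}, today))
print(find_issues({"age": 35, "updated": date(2024, 5, 1)}, today))
```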
You also learnt that outliers are legitimate values, but they need to be treated with care so that
they do not unduly influence a model's results.
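One common way to spot candidate outliers is the interquartile range (IQR) rule; this is an assumption here, since the text does not name a specific method. The sketch below flags values lying more than 1.5 IQRs outside the quartiles and leaves the decision about how to treat them to the analyst. The data and the function name iqr_outliers are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical daily order counts with one unusually large day.
orders = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(orders))  # flags 95 for review; it is not deleted
```

Flagging rather than dropping keeps the legitimate value available, so you can decide whether to cap it, transform it or model it separately.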