
Summary

Getting To Know Your Data

In this session, you learnt how to identify the:


● Amount of data needed to analyse a business challenge
● Characteristics of your data to understand the insights that can be generated
● Structure of your data to gain a better understanding of the insights that can be generated
● Issues in your data to make your data reliable

How Much Information Is Presented?

A population comprises all the data points in a domain. A data point, or datum, is also called a
record or an observation. Identifying the population for your business problem is an essential first
step in data analysis.

A sample is a selection of data points from the population and should be representative of it.

The following distinction will help you get a better understanding of the characteristics of a
population and a sample.

Population | Sample

It is a complete set of data points. | It is a subset of population.

It provides true insights about your problem, assumptions or opinions. | It has some margin of error in the insights about your problems, assumptions or opinions.

Measurable qualities or insights are called parameters. | Measurable qualities or insights are called statistics.
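The distinction between a parameter and a statistic can be illustrated with a small sketch. The population below is made up for illustration (randomly generated order values), and the sample size of 200 is an arbitrary assumption; the point is only that the sample mean approximates the population mean with some error.

```python
import random
import statistics

# Hypothetical population: order values for every customer in the domain.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.uniform(10, 500) for _ in range(10_000)]

# Parameter: a measure computed from the complete population.
population_mean = statistics.mean(population)

# Statistic: the same measure computed from a simple random sample.
sample = rng.sample(population, 200)  # sample size is an assumption
sample_mean = statistics.mean(sample)

# The gap between the two is the error for this particular sample.
sampling_error = abs(sample_mean - population_mean)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
print(f"sampling error:              {sampling_error:.2f}")
```

Drawing a different sample would give a slightly different statistic, while the parameter stays fixed for a given population.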

© Copyright UpGrad Education Pvt. Ltd. All rights reserved


What Type of Information Is Presented?

Following are some key terms that are fundamental for data analysis.

The measurement of variables can take on different forms or types, which can be broadly divided
into four categories as shown below.

Another dimension to data is whether they are discrete or continuous. Discrete data have distinct
values, that is, one can say that a value is different from another. All nominal, categorical and ordinal
data are discrete.

Continuous data can take any value within a range, so the individual values cannot be listed one by
one. For instance, the exact time at which a car pulls into a gas station is continuous, whereas the
number of cars that arrive during a particular time interval can be counted.
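The gas-station example can be sketched in a few lines. The arrival times and the 5-minute interval width below are made-up values for illustration: each arrival time is treated as a continuous measurement, and counting the arrivals in each interval yields discrete data.

```python
from collections import Counter

# Hypothetical arrival times at a gas station, in minutes after opening.
# Each time is continuous: it can take any value within the range.
arrival_times = [0.7, 3.25, 4.9, 5.0, 11.6, 12.1, 14.99, 21.3]

INTERVAL_MINUTES = 5  # assumed interval width


def count_arrivals(times, width):
    """Count arrivals in each half-open interval [k*width, (k+1)*width).

    Intervals with no arrivals are omitted from the result.
    """
    counts = Counter(int(t // width) for t in times)
    return {(k * width, (k + 1) * width): n for k, n in sorted(counts.items())}


# The counts are discrete: each one is a whole number of cars.
arrivals_per_interval = count_arrivals(arrival_times, INTERVAL_MINUTES)
for (start, end), n in arrivals_per_interval.items():
    print(f"[{start}, {end}) minutes: {n} car(s)")
```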



How Is The Information Presented?

Cross-sectional data are observations of several subjects of the same kind, collected at a single
point in time. Observations of the same subject collected over time are called time-series data.

The following table will help you get a clear distinction between cross-sectional data and time-series
data.

Factors | Time-Series Data | Cross-Sectional Data

Definition | A type of data comprising observations of a single subject at multiple time intervals | A type of data comprising observations of many subjects at the same point in time

Main Focus | Focusses on the same variable over a period of time | Focusses on several variables at the same point in time

Example | Sales of an organisation over a period of 5 years | Rainfall in several cities on a single day

A combination of cross-sectional and time-series data is known as panel data.
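The relationship between the three structures can be sketched with a small made-up data set. The city names, dates and rainfall figures below are hypothetical; the helper functions `cross_section` and `time_series` are illustrative names, not part of any library.

```python
# Hypothetical panel data: rainfall (mm) for several cities over several days.
panel = [
    {"city": "Pune", "day": "2024-07-01", "rainfall_mm": 12.0},
    {"city": "Pune", "day": "2024-07-02", "rainfall_mm": 8.5},
    {"city": "Pune", "day": "2024-07-03", "rainfall_mm": 0.0},
    {"city": "Mumbai", "day": "2024-07-01", "rainfall_mm": 30.2},
    {"city": "Mumbai", "day": "2024-07-02", "rainfall_mm": 41.0},
    {"city": "Mumbai", "day": "2024-07-03", "rainfall_mm": 5.5},
    {"city": "Delhi", "day": "2024-07-01", "rainfall_mm": 0.0},
    {"city": "Delhi", "day": "2024-07-02", "rainfall_mm": 2.1},
    {"city": "Delhi", "day": "2024-07-03", "rainfall_mm": 7.4},
]


def cross_section(records, day):
    """Many subjects at one point in time: rainfall per city on one day."""
    return {r["city"]: r["rainfall_mm"] for r in records if r["day"] == day}


def time_series(records, city):
    """One subject over time: rainfall for one city, ordered by day."""
    rows = [r for r in records if r["city"] == city]
    return [(r["day"], r["rainfall_mm"]) for r in sorted(rows, key=lambda r: r["day"])]


single_day = cross_section(panel, "2024-07-01")
single_city = time_series(panel, "Pune")
print(single_day)
print(single_city)
```

Fixing the day gives a cross-section, fixing the city gives a time series, and the full list of records is the panel.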

How Clean Is The Information Presented?

The most common types of data issues are missing, outdated, invalid and unreliable data.

You also learnt that outliers can be legitimate values, but they need to be treated with care to ensure that
they do not unduly influence a model's results.
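A few of these checks can be sketched as follows. The records, the valid age range of 0 to 120 and the 1.5 × IQR rule for flagging outliers are all assumptions made for illustration; the session itself does not prescribe a particular detection method. Outdated and unreliable data usually need domain knowledge to detect and are not covered by this sketch.

```python
import statistics

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # missing
    {"id": 3, "age": -5},    # invalid: outside the assumed valid range
    {"id": 4, "age": 29},
    {"id": 5, "age": 41},
    {"id": 6, "age": 38},
    {"id": 7, "age": 31},
    {"id": 8, "age": 97},    # legitimate but unusual value
    {"id": 9, "age": 36},
    {"id": 10, "age": 33},
]

MIN_AGE, MAX_AGE = 0, 120  # assumed validity rule


def is_valid(age):
    return age is not None and MIN_AGE <= age <= MAX_AGE


missing = [r["id"] for r in records if r["age"] is None]
invalid = [r["id"] for r in records if r["age"] is not None and not is_valid(r["age"])]

# Flag outliers among the valid values with the common 1.5 * IQR rule.
valid_ages = [r["age"] for r in records if is_valid(r["age"])]
q1, _, q3 = statistics.quantiles(valid_ages, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [
    r["id"] for r in records
    if is_valid(r["age"]) and not lower <= r["age"] <= upper
]

print("missing:", missing)
print("invalid:", invalid)
print("outliers:", outliers)
```

Flagged outliers are reported rather than dropped, since a legitimate value may still belong in the analysis.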




