
Subunit -3

Data Acquisition
Preprocessing Stage for Data Acquisition
● Before data acquisition, the following parameters are taken care of:
○ Data features needed - A data feature is an individual measurable property or
characteristic of the data object being recorded or stored.
○ Source of data - Data should be gathered from a reliable source, e.g. interviews,
surveys, etc.
○ Type of data - Data can be of two types : Categorical or Numerical
■ Categorical Data or Qualitative Data - Categorical data consists of categorical
variables that represent characteristics such as a person’s gender or hometown.
Categorical measurements are expressed as natural language descriptions rather
than as numbers.
■ Numerical Data or Quantitative Data - Numerical data consists of numbers or
numerical values, such as the height, weight, or age of a person.
○ Frequency of collection of data - How often the data should be collected.
It is important to collect the data at regular intervals, at least as often as the
process is expected to change.
○ Size of data sample - The sample should be large enough and should represent the
factors that the project requires.
○ Type of analysis of data - Data analysis may be done quantitatively or
qualitatively.
■ Quantitative Analysis - It is done when the data is numeric in nature. Tools
like mean, mode, median, variance, etc. are used.
■ Qualitative Analysis - It is done when the data is categorical in nature. Tools
like content analysis, narrative analysis, etc. are used.
○ Validation of the data - It is done to ensure that the data entered is sensible and
reasonable.
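The checklist above can be illustrated with a minimal Python sketch. It assumes a small made-up set of records (the field names `age` and `hometown`, the values, and the 0-120 validity range are all hypothetical). It first validates the data, then applies quantitative tools to the numerical feature and a simple count to the categorical feature.

```python
import statistics
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"age": 14, "hometown": "Pune"},
    {"age": 15, "hometown": "Delhi"},
    {"age": 15, "hometown": "Pune"},
    {"age": -3, "hometown": "Delhi"},  # not sensible: negative age
    {"age": 16, "hometown": "Pune"},
]

def is_valid(record):
    """Validation: accept only ages in an assumed sensible range of 0 to 120."""
    return 0 <= record["age"] <= 120

valid = [r for r in records if is_valid(r)]

# Numerical (quantitative) feature: summarise with mean, median, mode, variance.
ages = [r["age"] for r in valid]
summary = {
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "variance": statistics.pvariance(ages),  # population variance
}

# Categorical (qualitative) feature: count how often each category occurs.
hometown_counts = Counter(r["hometown"] for r in valid)

print(summary)
print(hometown_counts)
```

Population variance is used here for simplicity; `statistics.variance` would give the sample variance instead.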
Data Acquisition
● Data can be defined as a representation of facts or instructions about some
entity (students, school, sports, business, animals, etc.) that can be processed
or communicated by humans or machines.
● Data is a collection of facts, such as numbers, words, pictures, audio clips,
videos, maps, measurements, observations or even just descriptions of
things.
● Data may be represented with the help of characters such as letters (A-Z,
a-z), digits (0-9) or special characters (+, -, /, *, <, >, =, etc.)
Dataset
● A Data set is a set or collection of data.
● This set is normally presented in a tabular pattern.
● Every column describes a particular variable.
● And each row corresponds to a given member of the data set, as per the given question. This is a part of
data management.
● A data set holds the values of each variable, such as the height, weight, temperature, or
volume of an object, or the values of random numbers. Each individual value in the set is known as a datum.
● The data set contains data for one or more members, with one row per member.
● Data sets are divided into two parts:
○ Train Data - A training data set is a set of examples used during the learning process
to fit the model's parameters. The largest part of the data set is used as training data (usually 80%).
○ Test Data - A test set is a set of examples used only to assess the performance of the fully specified
classifier. A small part of the data set is used as test data (usually 20%).
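One possible way to produce such an 80/20 split is sketched below in plain Python. The function name `train_test_split`, the `test_fraction` parameter, and the fixed `seed` are assumptions made for this illustration, not part of the notes. The rows are shuffled first so that the test data is not simply the first or last rows of the table.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)                   # copy, so the input is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for a repeatable split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))                      # stand-in for 10 data set rows
train, test = train_test_split(rows)
print(len(train), len(test))
```

With 10 rows and the default fraction, 8 rows go to training and 2 to testing, and no row appears in both parts.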
Data Features
● A measurable piece of data that can be used for analysis.
● In CSV and Excel files, features appear as columns.
● Features are also sometimes referred to as “variables” or “attributes.”
● Depending on what we're trying to analyze, the features we include in our
dataset can vary widely.
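To make the link between features and columns concrete, here is a small sketch that reads CSV text with Python's standard `csv` module. The column names and values are invented for the example. The header row gives the feature names, and each following row is one member of the data set.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from a file.
csv_text = """name,height_cm,weight_kg
Asha,150,42
Ravi,160,50
"""

reader = csv.DictReader(io.StringIO(csv_text))
features = reader.fieldnames    # the column headers are the features
rows = list(reader)             # each row is one member of the data set

# Pick out a single feature (column) and convert it to numbers.
heights = [float(row["height_cm"]) for row in rows]

print(features)
print(heights)
```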
Sources of Data Acquisition
Surveys
● A research method used for collecting data from a predefined group of
respondents to gain information and insights into various topics of interest.
Cameras
● A camera captures a visual image.
● A device for recording visual images in the form of photographs, film, or
video signals.
● Can be used to collect data for computer vision (CV) projects.
Web Scraping
● Web scraping is the process of collecting structured web data in an automated fashion.
It’s also called web data extraction.
● Some of the main use cases of web scraping include price monitoring, price
intelligence, news monitoring, lead generation, and market research among many
others.
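As a rough illustration of price monitoring, the sketch below extracts prices from a snippet of HTML using Python's standard `html.parser` module. The HTML string, the `price` class name, and the `PriceParser` class are all hypothetical. A real scraper would first download the page and would have to match that site's actual markup and terms of use.

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Collects the text inside every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

# Hypothetical page content standing in for a downloaded web page.
page = (
    '<ul><li>Pen <span class="price">10</span></li>'
    '<li>Book <span class="price">250</span></li></ul>'
)

parser = PriceParser()
parser.feed(page)
parser.close()
print(parser.prices)
```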
Observation
● Some data can be acquired through monitoring and close inspection.
Sensors
● A device which detects or measures a physical property and records, indicates, or
otherwise responds to it. E.g. - Temperature Sensors, Humidity Sensors, Pressure
Sensors, Proximity Sensors, Level Sensors, Accelerometers, Gyroscopes, Infrared
Sensors, etc.
Application Programming Interface
● An API is a software intermediary that allows two applications to talk to each other.
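The two sides of that conversation can be sketched without making a network call: one application builds a request, and the other replies with structured data, often JSON. Everything specific here is an assumption for illustration: the endpoint `https://api.example.com/weather`, its `city` and `units` parameters, and the shape of the sample response.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/weather"  # hypothetical endpoint

def build_request_url(city, units="metric"):
    """Build the URL that the client application would request."""
    return f"{BASE_URL}?{urlencode({'city': city, 'units': units})}"

def parse_response(body):
    """Decode a JSON reply and pull out the fields we want as data."""
    data = json.loads(body)
    return data["city"], data["temperature"]

url = build_request_url("New Delhi")

# Made-up reply, standing in for what the server application might send back.
sample_body = '{"city": "New Delhi", "temperature": 31.5}'
city, temperature = parse_response(sample_body)

print(url)
print(city, temperature)
```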
