Data Preparation
Gather Data
This is the first step for any business. In this phase, data is collected
from various sources. The sources can be of any type, such as data
catalogs, and ad-hoc sources can be added as well.
Discover Data
The next step is discovering the data. Here, it is very important to
understand the data and categorize it into different datasets. Because
the collection of datasets can be huge, filtering it in this step might
take a long time.
Storing Data
This is the final step, after all of the above processes are complete.
Once the data is cleaned, it is stored so that it can be offered to
third-party tools, such as business intelligence tools, for analysis.
1. Variable Identification
2. Univariate Analysis
3. Bi-variate Analysis
4. Missing Values Treatment
5. Outlier Treatment
6. Variable Transformation
7. Variable Creation
Variable Identification
In this step, first identify the input (predictor) and output (target)
variables. Then identify the data type and category of each variable.
Let's make this concrete with a real-world example.
Suppose a school wants to predict student results (pass or fail). Here,
you need to identify the predictor variables, the target variable, and
the data type and category of each variable.
Below, the variables are grouped into different categories:
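As a rough illustration of this step, the sketch below tags each column of a small, made-up student table with its role, data type, and category. The column names, the records, and the helper `identify_variables` are all hypothetical, and treating every numeric column as continuous is a simplifying assumption.

```python
# Hypothetical student records; all names and values are illustrative only.
students = [
    {"gender": "F", "prev_grade": 78.5, "hours_studied": 12, "passed": True},
    {"gender": "M", "prev_grade": 52.0, "hours_studied": 3, "passed": False},
    {"gender": "F", "prev_grade": 64.0, "hours_studied": 7, "passed": True},
]

TARGET = "passed"  # the output variable: pass or fail


def identify_variables(rows, target):
    """Return the role, data type and category of every column."""
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        summary[name] = {
            "role": "target" if name == target else "predictor",
            "dtype": type(values[0]).__name__,
            # Simplification: any numeric column counts as continuous.
            "category": "continuous" if is_numeric else "categorical",
        }
    return summary


variable_summary = identify_variables(students, TARGET)
for name, info in variable_summary.items():
    print(f"{name:14} {info['role']:10} {info['dtype']:6} {info['category']}")
```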
Univariate Analysis
In univariate analysis, variables are explored one-by-one. This method
depends on whether a variable type is categorical or continuous:
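A minimal sketch of that distinction, using only the Python standard library: a continuous variable is summarized by its central tendency and spread, while a categorical variable is summarized by the share of each category. The sample values and both helper functions are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median, stdev


def summarize_continuous(values):
    """Central tendency and spread of one continuous variable."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
    }


def summarize_categorical(values):
    """Relative frequency of each category of one categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {label: count / total for label, count in counts.items()}


# Hypothetical observations.
hours_studied = [2, 4, 4, 6, 9]
gender = ["F", "M", "F", "F"]

continuous_summary = summarize_continuous(hours_studied)
categorical_summary = summarize_categorical(gender)
print(continuous_summary)
print(categorical_summary)
```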
Bi-variate Analysis
Bivariate analysis is the analysis of bivariate data. It is used to find
out whether there is a relationship between two sets of values, usually
denoted by the variables X and Y.
Examples of Bi-Variate Analysis:
Scatter Plots
Regression Analysis
Correlation Coefficients
Bivariate scatter example
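Two of the techniques listed above can be sketched in a few lines: the Pearson correlation coefficient and a simple least-squares regression line. The paired values for X (hours studied) and Y (exam score) are invented for illustration, and the function names are not taken from any particular library.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired observations: X = hours studied, Y = exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

r = pearson_r(hours, score)
slope, intercept = fit_line(hours, score)
print(f"r = {r:.3f}, score = {slope:.2f} * hours + {intercept:.2f}")
```

A value of r close to 1 or -1 suggests a strong linear relationship between X and Y, while a value near 0 suggests little or no linear relationship.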
Outlier Treatment
An outlier is a data point that lies far away from the other points.
Such outliers should be removed from the dataset. They can often be
identified directly by looking at the data table or worksheet.
Cleaning outliers
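Looking through a table by eye becomes impractical for larger datasets, so a programmatic rule is often used instead. The sketch below applies the common 1.5 × IQR rule, which is a technique swapped in here for illustration and not one described in this article. The scores and the helper `split_outliers` are hypothetical.

```python
from statistics import quantiles


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using the k * IQR fence rule."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers


# Hypothetical exam scores with one suspicious entry.
scores = [48, 50, 51, 52, 53, 54, 55, 120]

cleaned, outliers = split_outliers(scores)
print("kept:", cleaned)
print("outliers:", outliers)
```

The multiplier `k` controls how strict the rule is. Whichever rule is used, it is worth inspecting the flagged points before dropping them, since an extreme value may be a data-entry error or a genuine observation.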
Variable Transformation
Data does not always come in a form that is immediately suitable for
analysis. We often have to change variables before analysis. A
transformation re-expresses the data by applying a function or some
other mathematical operation to each observation.
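For illustration, here are two common transformations, each applied to every observation in turn: a base-10 logarithm, which compresses a wide range of values, and min-max scaling, which maps values onto the interval [0, 1]. The income figures and function names are assumptions chosen for the example.

```python
from math import log10


def log_transform(values):
    """Apply a base-10 logarithm to each observation (values must be > 0)."""
    return [log10(v) for v in values]


def min_max_scale(values):
    """Rescale each observation to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant variable")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical incomes spanning several orders of magnitude.
incomes = [1_000, 10_000, 100_000]

log_values = log_transform(incomes)
scaled = min_max_scale(incomes)
print(log_values)
print(scaled)
```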
Conclusion
It is clear from the above discussion that by using the right tools, an
organization can easily detect and present data effectively. However, as
with anything, having a plan and focus yields the best results.
You can use this detailed information on data discovery and data
preparation before you start analyzing data.