Chapter 1: Data
2. Qualitative/Quantitative
1. Quantitative data: Data that can be described using
numbers and on which basic mathematical operations,
such as addition and subtraction, can be performed.
b) Square Transform
• The square transform is a type of transformation in which the square of the data is
used instead of the original data.
• The square function is applied to every single observation, and the squared
values form the final transformed data.
• The transformation is: $x' = x^2$, where x is an attribute in the dataset.
c) Square Root Transform
• In this transform, the square root of every observation is calculated.
• This transform works well on right-skewed data with non-negative values and
can bring such data closer to a normal distribution.
• The transformation is: $x' = \sqrt{x}$, where x is an attribute in the dataset.
d) Reciprocal Transform
• In this transform, the reciprocal of every observation is used.
• This transformation can only be used for non-zero values.
• The transformation is: $x' = 1/x$, where x is an attribute in the dataset.
e) Custom Transform
• The log and square root transforms cannot be used on every dataset, since
each dataset can have different patterns and complexity.
• Based on domain knowledge of the data, custom transformations can be
applied to bring the data closer to a normal distribution.
• A custom transform can be any function, such as sin, cos, tan,
cube, or cube root.
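The function transforms above can be compared on a small example. The following is a minimal sketch, not a definitive implementation: the attribute values in `x` are made up for illustration, `skewness` is a hypothetical helper (population skewness), and the cube root stands in for one possible custom transform.

```python
import math

def skewness(values):
    """Population skewness: mean of the cubed standardized deviations."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n

# Hypothetical right-skewed attribute: mostly small values, a few large ones.
x = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 9.0, 16.0, 25.0]

squared = [v ** 2 for v in x]        # square transform
rooted = [math.sqrt(v) for v in x]   # square root transform (needs v >= 0)
reciprocal = [1.0 / v for v in x]    # reciprocal transform (needs v != 0)
# One possible custom transform: a sign-preserving cube root.
cube_root = [math.copysign(abs(v) ** (1 / 3), v) for v in x]

print("original:", skewness(x))
print("square root:", skewness(rooted))
print("square:", skewness(squared))
```

On this right-skewed sample the square root reduces the skewness while the square increases it, which is why the choice of transform depends on the direction of the skew.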
2) Power Transformers
• Power transformation techniques are data transformation techniques in
which a power is applied to the data observations in order to
transform the data.
• There are two types of Power Transformation techniques:
a) Box-Cox Transform
b) Yeo-Johnson Transform
a) Box-Cox Transform
• This technique transforms the data observations by raising them to a
power.
• The power applied to the data observations is denoted by lambda (λ).
• The transform has two cases, depending on whether lambda equals zero
or is not equal to zero.
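The two lambda cases can be sketched in code. This assumes the standard Box-Cox definition, which is $(x^λ - 1)/λ$ when λ ≠ 0 and $\ln x$ when λ = 0, and which is defined only for strictly positive values. The function name `box_cox` and the sample values are illustrative assumptions.

```python
import math

def box_cox(values, lam):
    """Apply the Box-Cox transform with power parameter lam."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox is defined only for strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]         # lambda equals zero
    return [(v ** lam - 1.0) / lam for v in values]  # lambda not equal to zero

x = [1.0, 2.0, 4.0, 8.0]   # hypothetical positive observations
print(box_cox(x, 0))       # natural log of each value
print(box_cox(x, 0.5))     # behaves like a scaled square root
print(box_cox(x, 1))       # each value shifted down by 1
```

In practice λ is usually estimated from the data rather than chosen by hand; for example, `scipy.stats.boxcox` estimates it when no λ is supplied.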
• After sampling the data we can get a balanced dataset for both majority and
minority classes. So, when both classes have a similar number of records
present in the dataset, we can assume that the classifier will give equal
SMOTE (Synthetic Minority Oversampling Technique): This is another
technique for oversampling the minority class. Simply adding duplicate
records of the minority class often doesn't add any new information to the model.
In SMOTE, new instances are synthesized from the existing data. SMOTE
takes a minority class instance, uses k-nearest neighbors to select one of
its nearest minority neighbors at random, and creates a synthetic instance at
a random point between the two in feature space.
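The interpolation step can be sketched as follows. This is a simplified illustration of the idea, not a reference implementation: the function name `smote`, its parameters, and the sample points are assumptions, and the nearest neighbors are found by brute force.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Random position along the segment from base to neighbor.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Hypothetical minority class instances with two features each.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
new_points = smote(minority, n_new=4)
print(new_points)
```

Because each synthetic point lies on the line segment between two existing minority instances, it is a new record rather than a duplicate.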
Time series data
• A time series is a sequence of observations of a single entity recorded over
time, usually at regular intervals.
• It is a type of data that tracks the evolution of a variable over time, such as
sales, stock prices, temperature, or heart rate.
• The regular time intervals can be daily, weekly, monthly, quarterly, or
annually, and the data is often represented as a line graph or time-series
plot.
• Time series data is commonly used in fields such as economics, finance,
weather forecasting, and operations management, among others, to analyze
trends and patterns, and to make predictions or forecasts.
• Time series forecasting is a technique that predicts a target
value based solely on the known history of that value. One
specialized form of regression used for this is the autoregressive model.
• Example of a time-series dataset: a CSV file containing the
monthly balance of a user's bank account from
January 1973 to September 1977.
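As a rough illustration of forecasting from history alone, the sketch below fits a first-order autoregressive model, x[t] = intercept + phi * x[t-1], by ordinary least squares. The function name `fit_ar1` and the balance values are hypothetical and generated for the example; they are not taken from any real dataset.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = intercept + phi * x[t-1]."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    intercept = mean_curr - phi * mean_prev
    return intercept, phi

# Hypothetical monthly balances generated by x[t] = 10 + 0.5 * x[t-1].
balances = [100.0]
for _ in range(11):
    balances.append(10.0 + 0.5 * balances[-1])

intercept, phi = fit_ar1(balances)
forecast = intercept + phi * balances[-1]   # one-step-ahead forecast
print(intercept, phi, forecast)
```

The only inputs are past values of the series itself, which is what distinguishes this from a regression on other explanatory variables.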
Components of a time series data:
• A time series can be analyzed in detail by breaking it down into its primary
components. This process is called time series decomposition. Time
series data is composed of Trend, Seasonality, Cyclic and Residual
components.
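A simple additive decomposition can be sketched as below. This is a simplification under stated assumptions: the series is treated as trend + seasonal + residual, the period is odd so that a centered moving average is well defined, and any cyclic component is not separated out but stays in the trend. The function name `decompose_additive` and the sample series are hypothetical.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal, and residual parts (odd period only)."""
    half = period // 2
    n = len(series)
    # Trend: centered moving average over one full period; undefined at the edges.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period
    # Seasonal: average detrended value at each position within the period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period   # center the seasonal effects on zero
    seasonal = [means[t % period] - offset for t in range(n)]
    # Residual: what remains after removing trend and seasonality.
    residual = [None if trend[t] is None else series[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual

# Hypothetical series: a linear trend plus a repeating pattern of period 3.
pattern = [2.0, -1.0, -1.0]
series = [10.0 + 0.5 * t + pattern[t % 3] for t in range(12)]
trend, seasonal, residual = decompose_additive(series, period=3)
print(trend[1:4], seasonal[:3], residual[1:4])
```

For this noise-free series the residuals are essentially zero; with real data they would hold whatever the trend and seasonal parts do not explain.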