
1. Time-Series Analysis

It is a method of analyzing a collection of data points over a period of time. Instead of recording data points intermittently or randomly, time series analysts record data points at consistent intervals over a set period.
Significance of TSA:
• Analysing historical datasets and their patterns.
• Helps organizations understand the underlying causes of trends
or systemic patterns over time.
• Using data visualizations, business users can see seasonal trends
and dig deeper into why these trends occur.
Main goals of time series analysis:
(I) Identifying the nature of the phenomenon represented by the sequence
of observations
(II) Forecasting (predicting future values of the time series variable).
(III) Classification
(IV) Descriptive analysis
(V) Intervention analysis
Time series examples:
Weather records, economic indicators and patient health evolution metrics are all time series data.
Time series data could also be server metrics, application performance monitoring, network data, sensor data, events, clicks and many other types of analytics data.
Steps of TSA:
(1) building a model that represents a time series
(2) validating the proposed model
(3) using the model to predict (forecast) future values and/or impute missing values.
Applications:
It has many practical applications, including:
• weather forecasting, climate forecasting, economic forecasting, healthcare forecasting, engineering forecasting, finance forecasting, retail forecasting, business forecasting, environmental studies forecasting, social studies forecasting, and more.
Time Series Modelling Techniques:
• Autoregression (AR): Models the relationship between a data point and its lagged values. Use cases: stock prices, temperature forecasting.
• Autoregressive Integrated Moving Average (ARIMA): A statistical analysis model that uses time series data to either better understand the data set or to predict future trends. ARIMA works on univariate time series; to handle multiple variables, VARIMA is used. Use cases: financial time series, economic data.
• Exponential Smoothing: Assigns exponentially decreasing weights to past observations. Use cases: sales forecasting, demand planning.
• Seasonal Autoregressive Integrated Moving Average (SARIMA): Extends ARIMA by incorporating seasonality. Use cases: retail sales with strong seasonal patterns.
• Long Short-Term Memory (LSTM): A type of recurrent neural network (RNN) for sequence modeling. Use cases: natural language processing, stock prediction.
• Gated Recurrent Unit (GRU): A simplified version of LSTM, also used for sequence modeling. Use cases: speech recognition, fraud detection.
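As a minimal illustration of one technique from the list, simple exponential smoothing can be sketched in a few lines of plain Python; the `sales` series below is made-up data, and `alpha` is the usual smoothing factor between 0 and 1:

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: each smoothed value is a weighted
    average of the current observation and the previous smoothed value,
    so past observations receive exponentially decreasing weights."""
    smoothed = [series[0]]  # initialise with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [10, 12, 11, 15, 14, 18]  # hypothetical monthly sales
print(exponential_smoothing(sales, alpha=0.5))
# [10, 11.0, 11.0, 13.0, 13.5, 15.75]
```

A larger `alpha` reacts faster to recent observations; a smaller one smooths more aggressively.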
Components of TSA
Understanding the components is essential for effective TSA, as it helps in choosing appropriate modeling techniques and
making informed decisions based on time-dependent data.

• Trend: Increasing or decreasing behavior of a variable with time, with no fixed interval; the trend can be positive, negative or null. Example: upward trend in monthly sales data.
• Seasonality: Reflects regular, repeating patterns at fixed intervals. Example: higher sales during the holiday season.
• Cyclical Patterns: Not tied to the calendar; no fixed interval, with uncertainty in movement and pattern. Example: economic recessions and expansions.
• Irregularity: Unexpected situations/events and spikes in a short time span. Example: sudden stock price spikes unrelated to trends.
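The trend component can be estimated with a centred moving average, which smooths out seasonality and irregular spikes; a minimal sketch in plain Python (the `monthly` series is made-up data):

```python
def moving_average_trend(series, window):
    """Estimate the trend component with a centred moving average of
    odd width `window`; short-term fluctuations are averaged away."""
    half = window // 2
    trend = []
    for i in range(half, len(series) - half):
        window_vals = series[i - half : i + half + 1]
        trend.append(sum(window_vals) / window)
    return trend

monthly = [12, 14, 13, 16, 15, 18, 17]  # hypothetical monthly values
print(moving_average_trend(monthly, window=3))
```

Note the output is shorter than the input: the first and last `window // 2` points have no full window around them.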
Data types of TSA: The two major types are stationary and non-stationary.

Stationary: certain statistical attributes of the data do not change over time, e.g. the mean, variance and covariance.
Non-stationary: values and associations between and among variables do vary with time. In finance, many processes are non-stationary, and so must be handled appropriately.
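A common way to make a trended (non-stationary) series stationary is first-order differencing; a minimal sketch:

```python
def difference(series, lag=1):
    """First-order differencing: subtract each value from the one
    `lag` steps later. A standard transform for removing a trend."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# A series with a linear upward trend: its mean changes over time,
# but the differenced series is constant, hence stationary.
trended = [3, 5, 7, 9, 11]
print(difference(trended))  # [2, 2, 2, 2]
```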

Limitations of TSA:
• Missing values are not supported
• Data points must be linear in their relationship
• Data transformation is mandatory, so expensive
• Models mostly work on univariate data.
2. Linear systems analysis and nonlinear dynamics

These are two different approaches used in the study of dynamic systems, i.e. systems that evolve or change over time. They are commonly applied in various fields, including physics, engineering, biology, economics, and more, to understand and analyze the behavior of systems.

A. Linear Systems Analysis (LSA):

• Focuses on systems that exhibit linear behavior.
• A system is considered linear if it satisfies the principles of superposition and homogeneity. In other words, when you apply a linear transformation (e.g., scaling, addition) to the input, the output undergoes the same transformation.
Key characteristics of linear systems analysis:

- Superposition: If you have multiple inputs acting on a system simultaneously, the system's
response is the sum of the responses to each individual input.

- Homogeneity: Scaling the input signal scales the output signal proportionally.

- Linearity: The system's equations are linear, which means they involve only addition,
subtraction, and multiplication by constants.

- Time-Invariance: The system's behavior remains the same over time; its properties do not
change with time.

• LSA uses mathematical tools such as differential equations, Laplace transforms, transfer
functions, and eigenvalues/eigenvectors to analyze and solve linear systems.
• It is a well-established & widely used approach in engineering disciplines like control systems
engineering & signal processing.
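The superposition and homogeneity properties above can be checked numerically. The sketch below uses a hypothetical linear, time-invariant system y[n] = 2·x[n] - x[n-1] (a simple linear filter, chosen only for illustration):

```python
def linear_system(x):
    """A simple linear, time-invariant system: y[n] = 2*x[n] - x[n-1],
    with x[-1] taken as 0. Only scaling and addition are involved."""
    return [2 * x[n] - (x[n - 1] if n > 0 else 0) for n in range(len(x))]

def superposition_holds(system, x1, x2, a=3, b=-2):
    """Check that the response to a*x1 + b*x2 equals
    a*S(x1) + b*S(x2), i.e. superposition plus homogeneity."""
    combined_in = [a * u + b * v for u, v in zip(x1, x2)]
    lhs = system(combined_in)
    y1, y2 = system(x1), system(x2)
    rhs = [a * u + b * v for u, v in zip(y1, y2)]
    return lhs == rhs

print(superposition_holds(linear_system, [1, 2, 3], [0, 1, 0]))  # True
# A squaring system violates linearity, so the same check fails:
print(superposition_holds(lambda x: [v * v for v in x], [1, 2, 3], [0, 1, 0]))  # False
```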
B. Nonlinear Dynamics

• Deals with systems that do not satisfy the linearity assumptions.
• Characterized by complex interactions & behaviors that can't be easily analyzed using linear techniques.

Key characteristics of nonlinear dynamics:

- Nonlinearity: The system's equations involve nonlinear terms, making them more
challenging to solve and analyze.

- Complex Behavior: nonlinear systems can exhibit a wide range of behaviors, including chaos, bifurcations, limit cycles, and more.

- Sensitivity to Initial Conditions: Small changes in initial conditions of a nonlinear system can
lead to dramatically different outcomes over time, also known as the butterfly effect.
Nonlinear dynamics employs techniques such as phase space analysis, bifurcation analysis, and numerical simulations to understand and predict the behavior of nonlinear systems.

This field is crucial for studying complex phenomena like weather patterns, population dynamics, neural networks, and chaotic systems.
Both approaches are valuable in their respective domains, and the choice
between them depends on the nature of the system under study and the specific
questions being addressed.
3. RULE INDUCTION
Process of Rule Induction:
1. Data Collection: The first step is to collect a dataset that contains both input features & corresponding labels or target values. This data is used to train the RI algorithm.
2. Feature Selection/Extraction: Depending on the nature of the data, relevant features need to be selected, or meaningful information extracted, to reduce the dimensionality and improve the efficiency of the RI process.
3. RI Algorithm: Some popular algorithms include: Decision Trees, Rule-Based Systems, Association Rule Mining, CART (Classification and Regression Trees), FP-growth Algorithm.
RI algorithms are used to discover rules or patterns in data, especially for classification and
decision-making tasks. These algorithms take tabular data as input, where each row
represents an observation or data point, and columns represent attributes or features.
The table below illustrates how a RI algorithm might work on a small dataset:

Consider a binary classification problem where we want to predict whether a student will pass or fail an exam based on two features: Study Hours and Previous Exam Score.

Student  Study Hours  Previous Exam Score  Pass/Fail
A        2            60                   Pass
B        1.5          55                   Fail
C        3            75                   Pass
D        4            80                   Pass
E        2.5          70                   Pass
F        1            45                   Fail
G        3.5          65                   Pass
H        2            50                   Fail

Rule 1: If Study Hours ≥ 3 AND Previous Exam Score ≥ 70, THEN Pass.
- a pattern that students who study for 3 or more hours and scored 70 or higher on the previous exam tend to pass.

Rule 2: If Study Hours < 2 AND Previous Exam Score < 60, THEN Fail.
- a pattern that students who study for less than 2 hours and scored below 60 on the previous exam tend to fail.

Rule 3: Otherwise, it's not possible to make a clear prediction based on these rules.
This rule serves as a catch-all for cases that do not match Rule 1 or Rule 2.
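The three rules can be encoded directly as a small classifier. Note that several students (e.g. those who studied exactly 2 hours) match neither Rule 1 nor Rule 2 and fall through to the catch-all:

```python
def predict(study_hours, prev_score):
    """Apply the induced rules from the example above."""
    if study_hours >= 3 and prev_score >= 70:
        return "Pass"     # Rule 1
    if study_hours < 2 and prev_score < 60:
        return "Fail"     # Rule 2
    return "Unknown"      # Rule 3: catch-all, no clear prediction

students = {
    "A": (2, 60), "B": (1.5, 55), "C": (3, 75), "D": (4, 80),
    "E": (2.5, 70), "F": (1, 45), "G": (3.5, 65), "H": (2, 50),
}
for name, (hours, score) in students.items():
    print(name, predict(hours, score))
```

This also illustrates why rule induction is considered interpretable: every prediction can be traced to an explicit, human-readable condition.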
Applications of Rule Induction:
- Classification: RI is commonly used for classification tasks, such as spam email detection, medical diagnosis, and customer churn prediction.

- Regression: It can also be adapted for regression tasks, where rules are used to predict
continuous numeric values, like predicting house prices or stock prices.

- Association Rule Mining: In retail and market basket analysis, it is used to discover patterns in
customer purchase behavior.

- Expert Systems: RI can be used to build expert systems that mimic human decision-making by
encoding domain-specific knowledge into rules.

RI is a valuable technique for knowledge discovery and interpretable ML, as the resulting rules are often easy to understand and explain, making it a useful tool in domains where transparency and interpretability are important.
4. SUPERVISED LEARNING AND UNSUPERVISED LEARNING

a) Supervised Learning:
It is a type of ML where the algorithm learns from a labeled dataset, meaning it is provided with input data along with the corresponding correct output or target values during training.
Goal: to learn a mapping or relationship from inputs to outputs, enabling the model to make
predictions or classifications on new, unseen data.

Examples:
1. Classification: Given a dataset of images, each labeled with the object it contains (e.g.,
cat, dog, car), the algorithm learns to recognize and classify new images into these
predefined categories.

2. Regression: In housing price prediction, the model is trained on historical data that
includes features like the number of bedrooms, square footage, and location, along with the
actual sale prices. It learns to predict the price of a new house based on its features.
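The housing example can be sketched as the simplest supervised regressor, a one-feature least-squares line fit; the square-footage/price pairs below are made-up training data:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, learned
    from labelled (x, y) pairs: the essence of supervised regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: square footage vs. sale price ($1000s)
sqft = [1000, 1500, 2000, 2500]
price = [200, 300, 400, 500]
slope, intercept = fit_line(sqft, price)
print(slope * 1800 + intercept)  # predict an unseen 1800 sq ft house: 360.0
```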
b) Unsupervised Learning:

• It is an ML paradigm where the algorithm works with unlabeled data, meaning it doesn't have access to explicit output or target values during training. Instead, the algorithm seeks to discover patterns, structures, or relationships within the data on its own.

• Often used for data exploration, clustering, dimensionality reduction, and feature extraction.
Examples:

Clustering: Given a dataset of customer purchase histories without any predefined categories,
the algorithm groups customers with similar buying behavior into clusters, such as "frequent
shoppers" or "occasional buyers."
Dimensionality Reduction: In high-dimensional data, like images or genomic data,
unsupervised learning techniques like Principal Component Analysis (PCA) can be used to
reduce the number of features while retaining important information.
Anomaly Detection: When monitoring network traffic for cybersecurity, unsupervised learning can identify unusual patterns or potential attacks by flagging data points that deviate significantly from the norm.
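A minimal unsupervised anomaly detector needs no labels at all: flag any point far from the data's own mean, measured in standard deviations. The `traffic` values below are made-up:

```python
def flag_anomalies(values, threshold=2.0):
    """Flag points more than `threshold` standard deviations from the
    mean. Unsupervised: the 'norm' comes from the data itself."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if abs(v - mean) > threshold * std]

traffic = [100, 102, 98, 101, 99, 500]  # one unusual spike
print(flag_anomalies(traffic))  # [500]
```

Real systems would use robust statistics or density-based methods, but the principle (deviation from learned structure, no labels) is the same.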
In summary, supervised learning uses labeled data to teach the model specific tasks, while
unsupervised learning explores the data's inherent structure without the need for labeled
examples. Both approaches have their unique applications and are essential in the field of ML.
