
Machine Learning for Signal Processing
Laboratory Assignment 3
Bayes' Classifier

Kushagra Parmeshwar

20EE38011

Contents

Signal representation as a vector
Feature extraction of signal
Bayes Classifier
Dataset
Observations
    Audio Data Loading and Pre-processing
    Segmentation for Feature Extraction
    Dimensionality Reduction and Visualization
    Data Preparation and Handling
    Statistical Analysis and Model Preparation
Code and Plots

1. Bayes’ Classifier
Signal representation as a vector
The audio sequence is denoted as {s[n]} = {s[0], s[1], ..., s[N_D − 1]}, where N_D represents the total number of digital samples in the audio. In an alternative representation, the signal is split into vectors s_i ∈ R^D×1, with no samples overlapping between consecutive vectors.

We additionally depict the collection of all these N vectors as a matrix S ∈ R^D×N, created by stacking every s_i column-wise:

S = [s_1  s_2  ...  s_N]

Here we denote y_i to be the label for s_i, so that the row vector of all labels satisfies Y ∈ Z^1×N.
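
A minimal NumPy sketch of this reshaping (the function and variable names are illustrative, not taken from the assignment code):

```python
import numpy as np

def signal_to_matrix(s: np.ndarray, D: int) -> np.ndarray:
    """Stack non-overlapping length-D frames of s as the columns of S (shape D x N)."""
    N = len(s) // D                    # number of complete, non-overlapping frames
    frames = s[: N * D].reshape(N, D)  # drop trailing samples; one frame per row
    return frames.T                    # S has s_i as its i-th column
```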

Feature extraction of signal


The Discrete Cosine Transform (DCT) is a mathematical technique widely used in signal and image processing, with applications in computer vision and digital watermarking, among others. It transforms signals from the time domain to the frequency domain: it takes a series of values, such as those representing a digital audio or image signal, and converts them into a set of coefficients that represent the contributions of different frequency components. This allows for signal compression by removing high-frequency elements that may not be perceptually significant.

Compared to the Fourier Transform (FT), which uses both sine and cosine functions,
the DCT exclusively employs cosine functions. This characteristic makes the DCT
particularly efficient for processing real-valued signals, such as audio and images, as
it produces real-valued output. To apply the DCT, the input signal is typically
segmented into smaller blocks, and the transform is applied to each block
individually. For a signal s_0 of length D, the k-th DCT coefficient x_0(k) can be expressed as:

x_0(k) = Σ_{n=0}^{D−1} s_0[n] cos( (π/D)(n + 1/2) k ),   k = 0, 1, ..., D − 1
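
As a quick sanity check, this sum matches SciPy's type-II DCT up to a factor of 2; a small illustrative sketch:

```python
import numpy as np
from scipy.fft import dct

D = 8
s0 = np.random.randn(D)  # any real-valued length-D frame

# Direct evaluation of the sum above for every k.
n = np.arange(D)
x0 = np.array([np.sum(s0 * np.cos(np.pi / D * (n + 0.5) * k)) for k in range(D)])

# scipy's type-II DCT computes the same sum scaled by 2.
assert np.allclose(2 * x0, dct(s0, type=2))
```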
Let x_i denote the feature vector: a D-dimensional column vector of real values, x_i ∈ R^D×1.

Additionally, we represent the collection of all N samples as a matrix X, created by arranging all the x_i vectors column by column; thus, X ∈ R^D×N.

Likewise, let y_i denote the label for x_i, so that the labels for all samples are combined into a row vector Y ∈ Z^1×N.

Bayes Classifier
Consider a scenario involving K classes labeled C_1, C_2, ..., C_K, and a dataset containing n training examples (x_1, y_1), ..., (x_n, y_n). Here, x_i ∈ R^d×1 represents a d-dimensional feature vector, and y_i ∈ {1, 2, ..., K} indicates the corresponding class label. The prior probability of class C_k, denoted P(C_k), is the probability that a randomly selected example belongs to class C_k. Moreover, let μ_k and Σ_k denote the mean vector and covariance matrix of class C_k. To classify a new example x, we compute the posterior probability of each class C_k given x using Bayes' rule:

P(C_k | x) = p(x | C_k) P(C_k) / p(x)

where the class-conditional likelihood is modeled as a multivariate Gaussian:

p(x | C_k) = (2π)^(−d/2) |Σ_k|^(−1/2) exp( −(1/2) (x − μ_k)ᵀ Σ_k^(−1) (x − μ_k) )

Here |Σ_k| represents the determinant of the covariance matrix Σ_k, while Σ_k^(−1) signifies its inverse. Finally, the class with the highest posterior probability is chosen as the predicted class for the given example x:

ŷ = argmax_{k ∈ {1, ..., K}} P(C_k | x)
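
A minimal sketch of this decision rule, assuming the class priors, means, and covariances have already been estimated (all names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def bayes_predict(x, means, covs, priors):
    """Pick the class k maximizing the log posterior; p(x) is constant across classes."""
    scores = [
        np.log(priors[k]) + multivariate_normal.logpdf(x, mean=means[k], cov=covs[k])
        for k in range(len(priors))
    ]
    return int(np.argmax(scores))
```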

Dataset
Observations
Audio Data Loading and Pre-processing:

The code begins by loading an audio file using librosa.load, specifying a sample rate
of 44100 Hz. This initial step is essential for ensuring consistent data quality and
format for subsequent analysis.

Next, it divides the audio data into two classes ('voice' and 'non-voice') by splitting
the first 60 seconds into two 30-second segments. This segmentation is fundamental
for supervised learning tasks, where each segment is labeled according to its class.
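
A sketch of these two steps, assuming a mono recording whose first 30 seconds are voice and the next 30 are non-voice; the file name is a placeholder:

```python
import librosa

SR = 44100
audio, sr = librosa.load("recording.wav", sr=SR)  # placeholder path

# First 60 seconds split into two 30-second class segments.
voice = audio[: 30 * SR]               # labeled 'voice'
non_voice = audio[30 * SR : 60 * SR]   # labeled 'non-voice'
```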

Segmentation for Feature Extraction:

The Segment class from seglearn.transform is utilized to partition the audio data into
non-overlapping segments based on a predetermined frame duration (20 milliseconds
in this case). This is a crucial step in transforming the time-series audio data into a
structured format suitable for machine learning models.

Following segmentation, the Discrete Cosine Transform (DCT) is applied to the segments to extract frequency-domain features. The DCT is chosen for its efficiency in representing signal energy compactly in a few coefficients, which is advantageous for audio analysis.
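
The assignment uses seglearn's Segment for the framing; the sketch below reproduces the same non-overlapping 20 ms framing with plain NumPy and applies a DCT-II to each frame (the names and the norm="ortho" choice are assumptions):

```python
import numpy as np
from scipy.fft import dct

SR = 44100
WIDTH = SR * 20 // 1000  # 20 ms frame at 44100 Hz -> 882 samples

def frame_and_dct(signal: np.ndarray, width: int = WIDTH) -> np.ndarray:
    """Cut the signal into non-overlapping frames and return DCT-II features, one row per frame."""
    n_frames = len(signal) // width
    frames = signal[: n_frames * width].reshape(n_frames, width)
    return dct(frames, type=2, norm="ortho", axis=1)

# e.g. voice_feats = frame_and_dct(voice) for the segments from the loading sketch
```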
Dimensionality Reduction and Visualization:

The code utilizes Principal Component Analysis (PCA) to reduce the dimensionality
of the DCT-transformed data. By projecting the data onto the first two principal
components, it enables a visual understanding of the data's structure and the
separability of classes in a reduced dimensional space.

The visualization step with PCA is particularly informative, as it reveals the distribution and overlap between the two classes, offering a visual cue about the complexity of the classification task.
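
A sketch of this projection, using random placeholder arrays in place of the real DCT features so the snippet runs standalone:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Placeholders standing in for the real per-class DCT feature rows.
rng = np.random.default_rng(0)
voice_feats = rng.normal(size=(150, 882))
non_voice_feats = rng.normal(size=(150, 882))

X = np.vstack([voice_feats, non_voice_feats])
y = np.array([0] * len(voice_feats) + [1] * len(non_voice_feats))

proj = PCA(n_components=2).fit_transform(X)  # project onto the first two PCs
for label, name in [(0, "voice"), (1, "non-voice")]:
    plt.scatter(*proj[y == label].T, s=5, label=name)
plt.xlabel("PC 1"); plt.ylabel("PC 2"); plt.legend(); plt.show()
```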

Data Preparation and Handling:

The code adopts a systematic approach to preparing the training data, including
shuffling to ensure randomness and prevent any bias induced by the order of data
samples. This practice enhances model robustness and reliability.

A function is defined to stack vectors by their corresponding labels, facilitating the organization of data for class-specific analysis, such as computing mean vectors and covariance matrices. This structured approach is crucial for methods that rely on class-specific statistics, such as Linear Discriminant Analysis (LDA) or Gaussian Mixture Models (GMMs).
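
A sketch of the shuffling and label-wise stacking, again with placeholder data (the helper name stack_by_label is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 16))    # placeholder feature rows
y = rng.integers(0, 2, size=200)  # placeholder binary labels

# Shuffle features and labels together to remove any ordering bias.
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

def stack_by_label(X: np.ndarray, y: np.ndarray) -> dict:
    """Group feature rows by class label, ready for class-wise statistics."""
    return {label: X[y == label] for label in np.unique(y)}

grouped = stack_by_label(X, y)
```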

Statistical Analysis and Model Preparation:

The code calculates mean vectors and covariance matrices for each class, which are
fundamental statistical measures in various classification algorithms. These measures
describe the data's distribution within each class and are essential for probabilistic
models like Naive Bayes or Gaussian Mixture Models.
Additionally, it checks for the positive definiteness of the covariance matrices,
ensuring that the multivariate normal distributions used in likelihood calculations are
well-defined. This step demonstrates careful attention to the mathematical
assumptions underlying the statistical models being prepared.
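
A sketch of these statistics, operating on the grouped dictionary from the previous snippet; the Cholesky-based check and the small diagonal regularization fallback are assumptions, not necessarily the assignment's exact approach:

```python
import numpy as np

def class_statistics(grouped: dict) -> dict:
    """Per-class mean vector and covariance matrix, with a positive-definiteness check."""
    stats = {}
    for label, rows in grouped.items():
        mu = rows.mean(axis=0)
        sigma = np.cov(rows, rowvar=False)  # rows are samples, columns are features
        try:
            np.linalg.cholesky(sigma)       # succeeds iff sigma is positive definite
        except np.linalg.LinAlgError:
            sigma = sigma + 1e-6 * np.eye(sigma.shape[0])  # assumed regularization fallback
        stats[label] = (mu, sigma)
    return stats
```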
Code and Plots:
