
Electric Guitar Playing Style Feature Fusion using Filter Banks


-Aryan S Nair (CH.EN. U4ECE22010), Priyanshu Kumar Singh (CH.EN. U4ECE22036),
Sankeerthana Sanikommu (CH.EN. U4ECE22061)

Abstract
In this project, we aim to develop an effective and versatile CNN model for classifying
electric guitar playing styles into nine different categories. The model is trained on a dataset
of audio recordings, where the audio files are converted into various types of spectrograms,
such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Frequency Cepstral
Coefficients (LFCC), and Triangular Filter Banks. These spectrograms serve as input features
to the CNN model, which learns patterns in the audio data that are characteristic of the
different playing styles. We report evaluation metrics such as accuracy, loss, the confusion
matrix, and F1 scores. Additionally, the project explores combining Mel Spectrogram + Filter
Bank and LFCC + MFCC features to potentially improve classification accuracy.

Introduction:
Guitar playing style classification is a challenging task due to the diverse range of techniques
employed by guitarists. The ability to automatically categorize guitar playing styles has
practical applications not only in music analysis, genre identification, and music
recommendation systems, but also in other domains that rely on the spectral analysis of
sound, for example detecting machine malfunctions from their noise, or identifying
respiratory issues through cough analysis in healthcare. Environmental applications include
recognizing gunshots or animal sounds for security or wildlife monitoring. Automatic
instrument recognition can even analyse background noise to optimize audio settings on
smart devices.

In this project, we use Convolutional Neural Networks (CNNs) to tackle this classification
problem. CNNs have proven to be highly effective in various domains, including image and
audio processing, due to their ability to capture local patterns and extract relevant features
from raw data. We sort the audio into nine distinct classes:

• Picking Styles: Alternate picking creates a smooth, even sound, while sweep picking
delivers a rapid flow of notes.
• Legato: Notes are connected smoothly, transitioning from one to the next without a
separate attack on each note.
• Tapping: Both hands are used on the fretboard, producing a percussive sound.
• Vibrato: The note contains subtle, repeated pitch fluctuations.

We therefore analyse the spectral characteristics of these styles to understand and classify
how they differ from one another.

Plotting the mean filter bank of each class gives a useful first perspective on how the styles
differ; a minimal sketch of how this could be computed is shown below.
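The following is a minimal sketch of how such a per-class mean filter bank could be computed and plotted with librosa and matplotlib; the helper name, the mel filter-bank parameters, and the files_by_class mapping are illustrative assumptions, not taken from the project code.

```python
import numpy as np
import librosa
import matplotlib.pyplot as plt

def mean_filter_bank(files, sr=22050, n_mels=40):
    # Average the log mel filter-bank energies over time for every clip in one
    # playing-style class, then average across clips.
    banks = []
    for path in files:
        y, _ = librosa.load(path, sr=sr)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        banks.append(np.mean(librosa.power_to_db(mel), axis=1))  # mean over time frames
    return np.mean(banks, axis=0)

# files_by_class maps a class name (e.g. "tapping") to a list of its audio paths.
# for name, files in files_by_class.items():
#     plt.plot(mean_filter_bank(files), label=name)
# plt.xlabel("Mel band"); plt.ylabel("Mean log energy (dB)"); plt.legend(); plt.show()
```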

LFCC and MFCC calculation by Librosa:
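Below is a hedged sketch of how both feature types could be computed in Python. librosa provides MFCCs directly; it has no built-in LFCC function, so a linearly spaced triangular filter bank and the DCT step are constructed manually here. All parameter values (sample rate, FFT size, filter and coefficient counts) are assumptions for illustration.

```python
import numpy as np
import librosa
import scipy.fftpack

def mfcc_and_lfcc(path, sr=22050, n_fft=1024, hop=512, n_filters=40, n_ceps=20):
    y, sr = librosa.load(path, sr=sr)

    # MFCC: mel filter bank -> log compression -> DCT, all handled by librosa.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_ceps, n_fft=n_fft, hop_length=hop)

    # LFCC: same pipeline, but with linearly spaced triangular filters built by hand.
    spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    edges = np.linspace(0, sr / 2, n_filters + 2)      # linear (not mel-warped) band edges
    fbank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, ctr, hi = edges[i], edges[i + 1], edges[i + 2]
        fbank[i] = np.clip(np.minimum((freqs - lo) / (ctr - lo),
                                      (hi - freqs) / (hi - ctr)), 0, None)
    log_lin = np.log(fbank @ spec + 1e-10)
    lfcc = scipy.fftpack.dct(log_lin, axis=0, norm="ortho")[:n_ceps]

    return mfcc, lfcc
```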


Literature Survey:

Author(s): Alexandros Mitsou, Antonia Petrogianni, Eleni Amvrosia Vakalaki, Christos Nikou, Theodoros Psallidas, Theodoros Giannakopoulos
Title: A multimodal dataset for electric guitar playing technique recognition
Methodology/Parameters used: The research introduces an audio + video dataset for recognising electric guitar playing styles, comprising 549 mp4 files and their respective audio files. Recordings were made with an Android smartphone in a home studio setup, featuring three different instruments, and simulated amplifiers were utilised to extract the sound. The dataset also includes MuseScore files for educational purposes and future expansion. Support Vector Machine (SVM) and Convolutional Neural Network (CNN) models were employed to classify the files into the different styles.
Findings: The collected data addresses a notable gap in comprehensive datasets for recognising different playing styles, offering diverse techniques beyond conventional ones. Its inclusion of video and audio data enables the exploration of new deep learning architectures. Machine learning experiments using SVM and CNN models showed promising performance, spanning 67.8% to 84.2% for SVM and 76.5% to 81.1% for CNN. However, limitations exist due to the reliance on recordings from a single player, suggesting potential improvements through diversification of samples from multiple players to capture broader playing styles.

Author(s): Vincent Lostanlen, Joakim Andén, Mathieu Lagrange
Title: Extended playing techniques: the next milestone in musical instrument recognition
Methodology/Parameters used: The research by Lostanlen, Andén, and Lagrange focused on advancing the automatic identification of instrumental playing techniques (IPTs) in music. They conducted a benchmark study utilizing the Studio On Line (SOL) dataset, which contains samples of extended IPTs for 16 different instruments, and implemented machine listening systems for query-by-example browsing, evaluating 143 extended IPTs. Three key conditions for improving performance beyond the traditional mel-frequency cepstral coefficient (MFCC) baseline were identified: incorporation of second-order scattering coefficients, integration of long-range temporal dependencies, and utilization of metric learning with large-margin nearest neighbors (LMNN) to reduce intra-class variability. The methodology involved comparing the performance of different feature sets, including MFCCs and scattering transforms, while varying parameters such as the time scale for amplitude modulation and the distance metric.
Findings: The findings revealed significant advancements in both instrument and IPT recognition tasks. Evaluation on the SOL dataset demonstrated remarkable precision, with a precision at rank 5 of 99.7% for instrument recognition and 61.0% for IPT recognition. The incorporation of second-order scattering coefficients, consideration of long-range temporal dependencies, and application of metric learning using LMNN contributed to substantial improvements in accuracy compared to the traditional MFCC baseline. These findings indicate promising opportunities for enhancing machine listening systems, particularly in the context of music information retrieval and recommendation systems. Overall, the study underscores the importance of considering gesture and playing techniques in music analysis, offering valuable insights for future research in the field.

Author(s): Qinyang Xi, Rachel M Bittner, Johan Pauwels, Xuzhou Ye, Juan P Bello
Title: GuitarSet: A dataset for guitar transcription
Methodology/Parameters used: Xi et al. introduce GuitarSet, a comprehensive dataset designed for guitar transcription research, addressing the challenges posed by the instrument's polyphonic nature and varied playing styles. The dataset comprises high-quality recordings of acoustic guitar excerpts played by six experienced guitarists, utilizing a hexaphonic pickup to capture individual string recordings and to automate the annotation process. Detailed annotations include string and fret positions, chords, beats, downbeats, and playing style, enabling a wide range of analysis tasks beyond transcription, such as performance analysis and chord estimation. The methodology involves careful data collection using hexaphonic pickups and condenser microphones, coupled with an automated annotation process leveraging tools like pYIN-note for pitch estimation and custom annotation tools for precise note-level transcription.
Findings: The baseline experiments demonstrate the effectiveness of GuitarSet in evaluating algorithm performance for guitar transcription tasks. Evaluation of note transcription accuracy using the Deep Salience algorithm reveals an overall accuracy of approximately 46%, with variations observed across genres, recording modes, tempos, and players. Similarly, chord recognition performance, assessed against state-of-the-art algorithms, shows variations in accuracy across genres, indicating challenges associated with chord complexity and variations in playing style. Beat and downbeat detection algorithms evaluated on GuitarSet exhibit substantial differences in performance across players, highlighting the impact of individual playing characteristics on algorithmic accuracy. Overall, GuitarSet serves as a valuable resource for advancing guitar transcription research, offering researchers a standardized dataset with rich annotations to explore various aspects of guitar playing and to inform the development of robust transcription algorithms.

Author(s): Christian Kehling, Jakob Abeßer, Christian Dittmar, Gerald Schuller
Title: Automatic Tablature Transcription of Electric Guitar Recordings by Estimation of Score- and Instrument-Related Parameters
Methodology/Parameters used: The paper introduces a new algorithm designed for the automatic analysis, transcription, and feature extraction of polyphonic instrument audio. The algorithm focuses on retrieving both score-related information, such as note onset, duration, and pitch, and instrument-specific details, such as the plucked string and the applied playing techniques. To achieve this, the researchers adapted state-of-the-art techniques for string recognition, multipitch estimation, feature extraction, and onset and offset detection. They also investigated a partial-tracking algorithm that can handle inharmonicity, extract a large number of features, and use instrument-based knowledge to make more accurate predictions. The method demonstrated remarkable precision, achieving 98% in onset and offset detection and multipitch estimation; it also showed strong performance on instrument-related characteristics, obtaining 82% for string number, 93% for plucking style, and 83% for expression style.
Findings: The proposed algorithm represents a significant advancement in guitar transcription technology, demonstrating outstanding accuracy in automatically extracting the parameters essential for creating guitar scores from audio recordings. Through meticulous experimentation and optimization, the system achieved remarkable results, with accuracy values exceeding 98% for onset and offset detection and multipitch estimation. Moreover, the algorithm displayed strong performance in estimating instrument-related parameters, with accuracy values surpassing 80% for string number, plucking style, and expression style. These results confirm the robustness of the approach in accurately transcribing complex guitar recordings, paving the way for applications in music education software, music games, and expressive performance analysis. Additionally, the creation of a novel dataset facilitates further research and evaluation in the field of guitar transcription, highlighting the paper's contributions to advancing the state of the art in music information retrieval for guitar recordings.

Methodology:
Data Preprocessing:
Audio files are converted to different types of spectrograms (MFCC, Triangular Filter Banks)
using the librosa library. The spectrograms are then resized to a fixed input shape for the
CNN model.
Feature Extraction:
The input features comprise mel spectrograms extracted from 1-second audio segments using
a 50 ms STFT window with no overlap. The mel spectrograms are computed with 128
mel-frequency bins, resulting in a two-dimensional frame of 20 × 128 (time steps × frequency
bins) for each 1-second sound segment.
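As a minimal sketch, assuming a 22050 Hz sample rate, one such 20 × 128 frame could be produced with librosa as follows; the function name and parameter choices are illustrative, not taken from the project code.

```python
import librosa

def mel_frame(segment, sr=22050):
    # 50 ms window with no overlap: hop length equals the window length, so a
    # 1-second segment yields 1 / 0.05 = 20 frames (center=False avoids padding).
    win = int(0.050 * sr)
    mel = librosa.feature.melspectrogram(y=segment, sr=sr, n_fft=win,
                                         hop_length=win, n_mels=128, center=False)
    log_mel = librosa.power_to_db(mel)   # log-compress the filter-bank energies
    return log_mel.T                     # shape (20, 128): time steps x mel bins

# Example: split a recording into non-overlapping 1-second segments.
# y, sr = librosa.load("sample.wav", sr=22050)
# frames = [mel_frame(y[i:i + sr], sr) for i in range(0, len(y) - sr + 1, sr)]
```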
Model Architecture:
The CNN model is made up of four convolutional layers followed by a three-layer linear
classifier, which maps the extracted features to a 9-dimensional output representing the
different electric guitar playing styles. Every convolutional layer uses 5 × 5 kernels with a
padding of 2 and a stride of 1 × 1 to preserve the spatial dimensions, and the number of
channels doubles with each layer, from 32 up to 256. After every convolutional layer, Batch
Normalization and a LeakyReLU activation are applied, followed by 2D Max Pooling with a
2 × 2 kernel that halves the spatial dimensions. The resulting 2048-dimensional feature
vector is mapped to the final 9-dimensional output through three linear layers, each
consisting of a linear mapping followed by a LeakyReLU activation, with output dimensions
of 1048, 256, and 9, in that order.
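A minimal sketch of this architecture in Keras, assuming a 20 × 128 × 1 input; the framework choice and the softmax output (added here to pair with the sparse categorical cross-entropy loss used for training) are assumptions made for illustration, not a transcript of the project code.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(input_shape=(20, 128, 1), n_classes=9):
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for ch in (32, 64, 128, 256):                    # channel count doubles per block
        x = layers.Conv2D(ch, kernel_size=5, strides=1, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU()(x)
        x = layers.MaxPooling2D(pool_size=2)(x)      # halves both spatial dimensions
    x = layers.Flatten()(x)                          # 256 channels x 1 x 8 = 2048 features
    for units in (1048, 256):                        # hidden sizes as stated in the report
        x = layers.Dense(units)(x)
        x = layers.LeakyReLU()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)  # assumed softmax head
    return tf.keras.Model(inputs, outputs)
```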

Model Training:
The CNN model is trained using the Adam optimizer and sparse categorical cross-entropy
loss, and the training and validation metrics (loss, accuracy) are monitored.
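A minimal training sketch under the same Keras assumption; X_train, y_train, X_val, y_val and the learning rate, batch size, and epoch count are placeholders rather than values from the project.

```python
import tensorflow as tf

# build_model is the architecture sketch above; labels are integer class indices 0-8.
model = build_model()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=50, batch_size=32)
```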
Model Evaluation:
The trained model is then evaluated on a test dataset, computing the accuracy, loss,
confusion matrix, and F1 scores.
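One way these metrics could be computed, assuming scikit-learn is available; X_test and y_test are placeholder names.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

test_loss, test_acc = model.evaluate(X_test, y_test)     # loss and accuracy on the test set
y_pred = np.argmax(model.predict(X_test), axis=1)        # predicted class indices
cm = confusion_matrix(y_test, y_pred)                    # 9 x 9 confusion matrix
macro_f1 = f1_score(y_test, y_pred, average="macro")     # F1 averaged across the 9 classes
print(f"accuracy={test_acc:.4f}  loss={test_loss:.4f}  macro-F1={macro_f1:.4f}")
```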
Concatenations:
MFCC and Filter Bank features are combined as input features, a separate CNN model is
trained on the combined features, and the concatenated model's performance is evaluated.
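A sketch of how two per-segment feature matrices could be concatenated along the feature axis before being fed to the CNN; the axis choice and the shapes are assumptions for illustration.

```python
import numpy as np

def fuse_features(mfcc_feats, fbank_feats):
    # mfcc_feats: (20, n_mfcc), fbank_feats: (20, n_mels) for the same 1-second segment.
    # Stacking along the last axis lets each time step carry both feature views.
    fused = np.concatenate([mfcc_feats, fbank_feats], axis=-1)
    return fused[..., np.newaxis]        # add a channel dimension for the CNN input

# fused = fuse_features(mfcc_frame, fbank_frame)   # shape (20, n_mfcc + n_mels, 1)
```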
Visualisations:
Graphs are plotted for each method, and the spectrograms (MFCC, Filter Banks) are
visualised for sample audio files.
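For example, a single MFCC matrix could be visualised as follows; the variable mfcc is assumed to come from the feature-extraction step above.

```python
import librosa.display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
img = librosa.display.specshow(mfcc, x_axis="time", ax=ax)   # MFCC matrix of one sample clip
fig.colorbar(img, ax=ax)
ax.set(title="MFCC of a sample audio file")
plt.show()
```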

Conclusion:
Data Split   Method                                      Accuracy (%)   F1 score (%) [avg]
1            MFCC                                        92.77          90.66
2            Triangular filter-bank + Mel Spectrogram    91.18          90.55
3            LFCC                                        90.57          88.66
4            LFCC + MFCC                                 91.79          90.22

The Electric Guitar Playing Style Classification by Filter Bank Feature Fusion project
successfully developed a CNN model capable of classifying electric guitar playing styles into
nine different categories. The model used a dataset of audio recordings, where the audio files
were converted into various types of spectrograms, such as MFCC and Triangular Filter
Banks.

Through extensive experimentation, we demonstrated the effectiveness of using spectrograms
as input features to the CNN model, enabling it to identify patterns in the audio data that are
characteristic of different playing styles.

The project also explored the potential of combining Mel Spectrograms, Filter Banks, and
LFCCs to create an ensemble model, which showed promising results in the accuracy tests
by leveraging the slightly complementary information provided by the two feature
representations. While Mel spectrograms represent the energy distribution across different
frequencies (mimicking human auditory perception), filter banks provide a more detailed
representation of the spectral envelope. By concatenating these two features, the model can
potentially learn from the complementary information they provide, leading to a more
comprehensive representation of the audio signal. Concatenating multiple features can also
make the model more robust to variations in the input data: if one type of feature fails to
capture certain aspects of the audio signal due to noise or other factors, the other feature type
may still provide useful information, thereby increasing the overall robustness of the model.
It is worth noting that the effectiveness of concatenated features can vary depending on the
specific task and dataset. In some cases, using only one type of feature (e.g., the Mel
spectrogram or the filter bank) may be sufficient, or even better than using concatenated
features, as turned out to be the case here.

The trained model achieved satisfactory performance, as evidenced by the reported metrics
such as accuracy, loss, confusion matrix, and F1 scores. Additionally, the project included
visualizations of the spectrograms and mean Filter Banks for each class, providing insights
into the underlying patterns learned by the model.

Overall, this project contributes to the field of music information retrieval and paves the way
for various applications, such as automatic genre classification, music recommendation
systems, and assisting musicians in refining their techniques.
