Abstract
In this project, we aim to develop an effective and versatile CNN model for classifying electric guitar playing styles into nine different categories. The model is trained on a dataset of audio recordings in which the audio files are converted into several types of spectrogram features, such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Frequency Cepstral Coefficients (LFCC), and Triangular Filter Banks. These spectrograms serve as input features to the CNN model, which identifies patterns in the audio data that are characteristic of different playing styles. We report performance metrics such as accuracy, loss, the confusion matrix, and F1 scores. Additionally, the project explores combining Mel Spectrogram + Filter Banks and LFCC + MFCC features to potentially improve classification accuracy.
Introduction:
Guitar playing style classification is a challenging task due to the diverse range of techniques employed by guitarists. The ability to automatically categorize guitar playing styles has practical applications not only in music analysis, genre identification, and music recommendation systems, but also in other domains that rely on spectral analysis of sound: detecting machine malfunctions from their sound, or identifying respiratory issues in healthcare through cough analysis. Environmental applications include recognizing gunshots or animal sounds for security or wildlife monitoring. Automatic instrument recognition can even analyse background noise to optimize audio settings on smart devices.
In this project, we use Convolutional Neural Networks (CNNs) to tackle this classification problem. CNNs have proven to be highly effective in various domains, including image and audio processing, due to their ability to capture local patterns and extract relevant features from raw data. We sort the audio into nine distinct classes, covering techniques such as:
• Picking styles: Alternate picking creates a smooth sound, while sweep picking delivers a rapid flow of notes.
• Legato: Notes transition smoothly into one another without separate picking.
• Tapping: Both hands are used on the fretboard for a percussive sound.
• Vibrato: A sustained note contains subtle pitch fluctuations.
We therefore analyse the spectral characteristics of these classes to understand how they vary from each other and to classify them accordingly. Viewing the mean filter bank for each class gives a useful perspective on these differences; a sketch of how such a view can be computed follows:
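The following is a minimal sketch of how a per-class mean filter bank curve could be produced with librosa. The directory layout, sample rate, and number of mel bands are illustrative assumptions, not the project's actual configuration.

import glob
import numpy as np
import librosa

def mean_filter_bank(paths, sr=22050, n_mels=40):
    """Average log mel filter-bank energies over time, then over all files of a class."""
    per_file = []
    for path in paths:
        y, _ = librosa.load(path, sr=sr)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        log_mel = librosa.power_to_db(mel)           # shape: (n_mels, n_frames)
        per_file.append(log_mel.mean(axis=1))        # average over time
    return np.mean(per_file, axis=0)                 # average over the class's files

# Hypothetical usage, one curve per playing style:
# vibrato_curve = mean_filter_bank(glob.glob("data/vibrato/*.wav"))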
Methodology:
Step: Data Preprocessing
Description: Audio files are converted into different types of spectrograms (MFCC, Triangular Filter Banks) using the librosa library.
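A minimal sketch of this preprocessing step, paired with a small CNN classifier, is given below. The parameter values and the architecture are illustrative assumptions; the report does not fix the project's exact configuration.

import librosa
import numpy as np
from tensorflow.keras import layers, models

def extract_features(path, sr=22050, n_mfcc=13, n_mels=40):
    """Convert one audio file into MFCC and triangular (mel) filter-bank features."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)           # (n_mfcc, n_frames)
    fbank = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))   # (n_mels, n_frames)
    return mfcc, fbank

def build_cnn(input_shape, n_classes=9):
    """A small CNN that treats a (freq, time, 1) spectrogram as an image."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model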
Conclusion:
Data Split   Method                                      Accuracy (%)   F1 score (%) [avg]
1            MFCC                                        92.77          90.66
2            Triangular filter bank + Mel Spectrogram    91.18          90.55
3            LFCC                                        90.57          88.66
4            LFCC + MFCC                                 91.79          90.22
The Electric Guitar Playing Style Classification by Filter Bank Feature Fusion project successfully developed a CNN model capable of classifying electric guitar playing styles into nine different categories. The model was trained on a dataset of audio recordings in which the audio files were converted into various types of spectrograms, such as MFCC and Triangular Filter Banks.
The project also explored the potential of combining Mel Spectrograms, Filter Banks, and LFCCs into fused feature sets, which showed promising results in the accuracy tests by leveraging the complementary information provided by the different feature representations. While Mel spectrograms represent the energy distribution across different frequencies (mimicking human auditory perception), filter banks provide a more detailed representation of the spectral envelope. By concatenating these two features, the model can potentially learn from the complementary information they provide, leading to a more comprehensive representation of the audio signal. Concatenating multiple features can also make the model more robust to variations in the input data: if one type of feature fails to capture certain aspects of the audio signal due to noise or other factors, the other feature type may still provide useful information, thereby increasing the overall robustness of the model. A sketch of this fusion step appears below.
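This is a minimal sketch of feature fusion by concatenation, assuming both feature matrices are computed from the same audio with identical frame settings so their time axes align; the file name is a placeholder.

import numpy as np
import librosa

y, sr = librosa.load("clip.wav", sr=22050)    # "clip.wav" is a placeholder path

# Both features use librosa's default hop length, so their frame counts match.
fbank = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40))
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Stack along the feature axis: shape (40 + 13, n_frames). The CNN then sees
# a single "image" carrying both representations.
fused = np.concatenate([fbank, mfcc], axis=0)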
It is worth noting that the effectiveness of concatenated features can vary depending on the specific task and dataset. In some cases, using only one type of feature (e.g., Mel spectrogram or filter bank) may be sufficient, or even better than using concatenated features; indeed, in our results MFCC alone achieved the highest accuracy.
The trained model achieved satisfactory performance, as evidenced by the reported metrics: accuracy, loss, the confusion matrix, and F1 scores (a sketch of how these can be computed is shown below). Additionally, the project included visualizations of the spectrograms and the mean filter banks for each class, providing insights into the underlying patterns learned by the model.
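For completeness, a minimal sketch of computing these metrics with scikit-learn; the label arrays are placeholders standing in for the model's test-set output.

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Placeholder labels standing in for true and predicted test-set classes.
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

print(accuracy_score(y_true, y_pred))              # fraction of correct predictions
print(confusion_matrix(y_true, y_pred))            # rows: true class, cols: predicted class
print(f1_score(y_true, y_pred, average="macro"))   # F1 averaged over classes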
Overall, this project contributes to the field of music information retrieval and paves the way
for various applications, such as automatic genre classification, music recommendation
systems, and assisting musicians in refining their techniques.