
MUSIC NOTE RECOGNITION USING FFT

E.N.V. Purna Chandra Rao 1, I. Narasimha Rao 2

1 Professor & HOD, 2 Assistant Professor
Department of Electronics and Communication Engineering
CMR College of Engineering & Technology, Kandlakoya, Medchal Road, Hyderabad-501401

Abstract- Signals and sound waves are part of our everyday life. However, music is a distinct type of signal: musical sounds each have a certain pitch that we can differentiate as notes. A song basically contains two things, vocals and background music. Reading a song on sheet music and then playing it on an instrument is an easy task for any musician. In this century, computer software has also been designed to do just this: programs can create audio files (music we can hear) from sheet music very effectively for a whole range of instruments. A major problem is that the reverse task, listening to or recording audible music and then generating the sheet music for that piece, is much more difficult to complete for computers and talented musicians alike. Extracting the characteristics of a song becomes important for various objectives like learning, teaching, and composing. The idea of this project is to develop a program that takes an audio input (a song) and processes it in order to give musical notes as an output.
Keywords: Time-Frequency Analysis; Musical Note; Sampling Frequency; Recording; Extraction.
I. Introduction
Songs play a vital role in our day-to-day life. A song contains two things: vocals and background music. The characteristics of the voice depend on the singer, while the background music involves a mixture of different musical instruments like piano, guitar, drums, etc. Extracting the characteristics of a song becomes important for various objectives like learning, teaching, and composing. This project takes a song as input, extracts its features, and detects and identifies the notes, each with a duration. First, the song is recorded and digital signal processing algorithms are used to identify its characteristics. The experiment is first done with several piano songs whose notes are already known, and the identified notes are compared with the original notes until the detection rate improves. Then the experiment is done with piano songs with unknown notes, using the proposed algorithm. The ability to derive the relevant musical information from a live or recorded performance is relatively easy for a trained listener but highly non-trivial for a learner or a computer. For several practical applications, it would be desirable to obtain this information in a quick, error-free, automated fashion. This paper discusses the design of a software system that accepts as input a digitized waveform representing an acoustical music signal and attempts to derive the notes from the signal so that a musical score could be produced. The signal processing algorithm involves event detection, i.e., finding precisely where within the signal the various notes begin and end, and pitch extraction, i.e., the identification of the pitches being played in each interval. Event detection is carried out using time-domain analysis of the signal, where difficulty arises when notes are played at different speeds. Pitch detection (nothing but frequency identification) is more complicated because of a situation we call harmonic ambiguity; this occurs when one pitch's fundamental frequency is an integer multiple of another's. The problem is solved by careful signal processing of both the time-domain and frequency-domain signals. The main objective of this project is to create an aid tool for learning for musicians, producers, composers, DJs, remixers, teachers, and music students. The project can be treated as a box: you give any song as input and get the features of the song out. This project aims to propose methods to analyze and describe a signal from which the musical parameters can be easily and objectively obtained, in a sensible manner. A common limitation found in the musical literature is that the way such parameters are obtained is intuitively satisfactory but, in our view, not very sound from a signal processing perspective.
Fig 1: Note durations

Every song has a tempo, the speed at which the music is to be played. Tempo is given in beats per minute, where a beat is usually defined to be a particular note length. All note lengths are then given a value, such as a quarter or a half; this value determines how many beats the note should last. Interestingly enough, a beat is usually defined to be one quarter note, and thus a quarter note is 1 beat, a half note is 2 beats, and an eighth note is half a beat.
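To make the arithmetic concrete, the following Python sketch converts a tempo and a note value into a duration in seconds. The function name and the 120 BPM example are ours; the quarter-note-gets-the-beat convention follows the text.

```python
# Sketch: converting tempo and note values to durations in seconds,
# assuming (as in the text) that the quarter note gets the beat.

def note_duration_seconds(tempo_bpm, note_value):
    """note_value: 1.0 for whole, 0.5 for half, 0.25 for quarter, 0.125 for eighth."""
    beat_seconds = 60.0 / tempo_bpm   # one beat (a quarter note) in seconds
    beats = note_value / 0.25         # quarter = 1 beat, half = 2, eighth = 0.5
    return beats * beat_seconds

if __name__ == "__main__":
    # At 120 BPM: quarter = 0.5 s, half = 1.0 s, eighth = 0.25 s
    for name, value in [("quarter", 0.25), ("half", 0.5), ("eighth", 0.125)]:
        print(name, note_duration_seconds(120, value))
```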
Fig 2: Frequency table

II. Literature Survey
Scales are fundamental to all music. In Ancient Greece, scales were referred to as modes. Some commonly known scales are the Major scale, the Minor scale, the jazz scale, the blues scale, etc. A Raga can be derived from a scale; it is a unique personality or distinct flavor with no fixed rules about precisely which combination of notes is used. The ascending and descending note sequences in Indian classical music are called Arohana and Avarohana. The home notes are named Griha Swara, the dominant is called Vaadi, and the subdominant is called Samvadi. The dissonant is called Vivadi. The landing or resting notes are called Nyasa Swara. A Raga must consist of at least five notes in an octave; Ragas sung with only three or four notes in an octave are very rarely performed. The root note is "Sa", and a Raga must use "Ma" or "Pa" by default; either one of them or both can be used in the same Raga. Other notes are exceptions. Various combinations can be performed using the octaves.

To gain expertise in these concepts and to develop an algorithm, some further references were reviewed and are discussed below. In the paper Musical Notes Identification using Digital Signal Processing [1], the input is taken as an audio file and processed to extract the music features needed to identify the notes of the song. Digital signal processing techniques are used to identify the characteristics of the song, and their use is explained. Only piano songs are allowed as input; piano songs are used because their notes are already known, and the identified notes are compared with the original notes until a higher detection rate is reached. The method used there for the identification of notes is more optimized than previous methods. Results can be obtained by varying parameters such as the threshold values and width, along with the time duration of each note. Thus, it can be used as a tool for learning the notes of a song.
pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis [8] presents an open-source Python library that provides a wide range of audio analysis procedures, including feature extraction, classification of audio signals, and supervised and unsupervised segmentation. pyAudioAnalysis is licensed under the Apache License. The paper presents the implemented methodologies along with their theoretical background and an evaluation of the methods on several metrics. Several audio analysis research applications use pyAudioAnalysis: speech emotion recognition, depression classification based on audiovisual features, smart-home functionalities through audio event detection, music segmentation, multimodal content-based movie recommendation, and health applications such as monitoring eating habits. SVM regression maps the audio features extracted in the previous steps to one or more supervised variables. The library also provides a semi-supervised silence removal functionality. Based on this literature survey, future enhancements have been drafted.

2.1 System Model
A sound can be characterized by the following three quantities:
(i) Pitch.
(ii) Quality.
(iii) Loudness.
Pitch is the frequency of a sound as perceived by the human ear. A high frequency gives rise to a high-pitched note and a low frequency produces a low-pitched note. A pure tone is the sound of only one frequency, such as that given by a tuning fork or an electronic signal generator. The fundamental note has the greatest amplitude and is heard predominantly because it has a larger intensity. The other frequencies, such as 2f0, 3f0, and 4f0, are called overtones or harmonics, and they determine the quality of the sound. Loudness is a physiological sensation. It depends mainly on sound pressure but also on the spectrum of the harmonics and the physical duration.
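The overtone relation above is easy to state in code; a minimal Python sketch, with middle C (261.63 Hz) chosen here as an example fundamental:

```python
# Sketch: a fundamental f0 and its first overtones (2*f0, 3*f0, 4*f0), which
# the text says determine the quality (timbre) of the sound.
# f0 = 261.63 Hz (middle C) is our own example, not a value from the paper.

f0 = 261.63
overtones = [k * f0 for k in range(2, 5)]
print(overtones)   # [523.26, 784.89, 1046.52]
```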
2.2 Musical Notes
Humans can hear signal frequencies ranging from 20 Hz to 20 kHz. From this wide range, some parts are associated with the piano, and different pianos have different ranges. Each tone of the piano has one particular fundamental frequency and is represented by a note such as C, D, etc., as shown in Fig 1. The next C is 12 half steps away from the previous one and has double the fundamental frequency. Hence this portion (from one C to the immediately following C) is called one octave. Different octaves are differentiated as C1, C2, etc.
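The half-step relation above implies the standard equal-temperament formula f = f_ref * 2^(n/12), where n counts half steps. The Python sketch below converts between notes and frequencies; the A4 = 440 Hz reference and the MIDI-style note numbering are conventional assumptions, not values given in the paper.

```python
import math

# Sketch of the "12 half steps per octave" relation described above.
# Each half step multiplies frequency by 2**(1/12); 12 steps double it.
# Assumes the common A4 = 440 Hz reference (MIDI note 69).

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_to_frequency(semitones_from_a4):
    return 440.0 * 2 ** (semitones_from_a4 / 12.0)

def frequency_to_note(freq_hz):
    # Round to the nearest half step relative to A4.
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

print(note_to_frequency(3))        # C5 ≈ 523.25 Hz
print(frequency_to_note(261.63))   # "C4"
```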

Fig 3: An octave of a piano


III. Proposed Methodology

3.1 Digital Signal Processing for Music


When a sound wave is created by your voice (or a musical instrument), it is an analog wave of changing air pressure. However, for a computer to store a sound wave, it needs to record discrete values at discrete time intervals. The process of recording values at discrete time instants is called sampling, and the process of recording discrete pressure values is called quantizing. Recording studios use a standard sampling frequency of 48 kHz, while CDs use a rate of 44.1 kHz. Signals should be sampled at no less than twice the highest frequency present in the signal. Humans can hear frequencies from approximately 20 Hz to 20,000 Hz, which explains why common sampling frequencies are in the 40 kHz range.
of 44.1 kHz. Signals should be sampled at of audio processing is in a frame, so before

twice the highest frequency present in the the feature extraction, the original audio data

signal. Humans can hear frequencies from needs to be pre-processed: into a unified

approximately 20-20,000 Hz, which format, pre-emphasis, segmentation, and

explains why common sampling frequencies windowing framing.

are in the 40 kHz range. The envelope of a sound can be measured in
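The pre-processing chain described above can be sketched in a few lines of NumPy. The pre-emphasis coefficient (0.97) and the 25 ms frame with a 10 ms hop are conventional defaults assumed here, not values taken from the paper.

```python
import numpy as np

# Sketch of the pre-processing chain: pre-emphasis, segmentation into frames,
# then windowing. Assumes the signal is at least one frame long.

def preprocess(signal, fs, alpha=0.97, frame_ms=25, hop_ms=10):
    # Pre-emphasis: boost high frequencies with a first-order difference.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    # Segmentation: split into overlapping frames.
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])

    # Windowing: taper each frame to reduce spectral leakage.
    return frames * np.hamming(frame_len)
```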


The envelope of a sound can be described in four stages:
1. Attack – The attack is the portion of the envelope that represents the time taken for the amplitude to reach its maximum level. Essentially it is the initial build-up of a sound.
2. Decay – Decay is the progressive reduction in the amplitude of a sound over time. The decay phase starts as soon as the attack phase has reached its peak. In the decay phase, the signal level drops until it reaches the sustain level.
3. Sustain – The sustain is the period during which the sound is held before it begins to fade out. Many instruments do not contain a sustain phase, while others produce a sound with long sustain (and no attack or decay).
4. Release – The release is the final fade or reduction in amplitude over time.
Fig 5: Attack, decay, sustain, release graph

Every sound is different. Percussion sounds start suddenly, then decay and release quickly, as no more energy is being applied to sustain the sound. A bowed string, on the other hand, may build up with a slow attack, sustain for a short period and then release. The concept of hearing envelopes relies upon the root-mean-square (RMS) values of amplitude and not peak-to-peak amplitudes. High peaks in the signal will not necessarily make an instrument sound loud unless the amplitude is sustained for some time. Short peaks tend to contribute to the character (timbre) of the sound rather than to the loudness.
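Since perceived loudness tracks RMS rather than peak amplitude, a frame-wise RMS envelope is the natural quantity to compute; a minimal NumPy sketch follows (the frame length is our own choice):

```python
import numpy as np

# Sketch: a frame-wise RMS envelope, following the point above that perceived
# loudness tracks RMS amplitude rather than short peaks.

def rms_envelope(signal, frame_len=1024):
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt(np.mean(frames ** 2, axis=1))
```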
Edge Detection: We need to determine the edges of the audio wave to identify whether a note was played or not. Edge detection includes a variety of mathematical methods that aim at identifying edges. There are many methods for edge detection, but most of them can be grouped into two categories: search-based and zero-crossing based. The search-based methods detect edges by first computing a measure of edge strength, usually a first-order derivative expression such as the gradient magnitude, and then searching for local directional maxima of the gradient magnitude using a computed estimate of the local orientation of the edge, usually the gradient direction. The zero-crossing-based methods search for zero crossings in a second-order derivative expression computed from the signal, usually the zero-crossings of the Laplacian or of a non-linear differential expression. As a pre-processing step to edge detection, a smoothing stage, typically Gaussian smoothing, is almost always applied.
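As one illustration of the search-based category applied to audio, the sketch below differentiates a smoothed RMS envelope and keeps local maxima above a threshold as candidate note onsets. The threshold and smoothing length are tuning parameters assumed for illustration, not values from the paper.

```python
import numpy as np

# Sketch of a search-based edge detector in one dimension: compute a
# first-order difference of the RMS envelope (the "edge strength"), then keep
# local maxima above a threshold as candidate note onsets.

def detect_onsets(envelope, threshold=0.1, smooth_len=5):
    # A simple moving average stands in for the Gaussian smoothing
    # pre-processing step mentioned above.
    kernel = np.ones(smooth_len) / smooth_len
    smooth = np.convolve(envelope, kernel, mode="same")

    strength = np.diff(smooth)   # first-order derivative expression
    onsets = []
    for i in range(1, len(strength) - 1):
        if (strength[i] > threshold
                and strength[i] >= strength[i - 1]
                and strength[i] >= strength[i + 1]):
            onsets.append(i)     # local directional maximum
    return onsets
```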
Convolution: Convolution is a mathematical operation on two functions (f and g) that produces a third function (f*g) expressing how the shape of one is modified by the other. The term convolution refers both to the result function and to the process of computing it. It is defined as the integral of the product of the two functions after one is reversed and shifted; the integral is evaluated for all values of shift, producing the convolution function. Convolution provides a way of multiplying together two arrays of numbers, generally of different sizes but of the same dimensionality, to produce a third array of numbers of the same dimensionality. Convolution also expresses the relationship between an input signal, the output signal, and the impulse response of a linear time-invariant (LTI) system. An impulse response is the response of a system when an impulse signal (a signal that contains all possible frequencies) is applied to it. A linear time-invariant system is a system that (a) behaves linearly and (b) is time-invariant (a shift in time at the input causes a corresponding shift in time in the output).

Fig 6: Convolution Process
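A minimal NumPy illustration of discrete convolution of two arrays of different sizes, as described above:

```python
import numpy as np

# Sketch: discrete convolution of two arrays of different sizes. NumPy
# reverses and shifts one array and sums the products at each lag.

x = np.array([1.0, 2.0, 3.0, 4.0])   # input signal
h = np.array([0.5, 0.5])             # impulse response (2-point average)

y = np.convolve(x, h)                # output of the LTI system
print(y)                             # [0.5 1.5 2.5 3.5 2. ]
```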
Fast Fourier Transform (FFT): The Fast Fourier Transform (FFT) is an important measurement method in audio and acoustics. It converts a signal into its individual spectral components and thereby provides frequency information about the signal. FFTs are used for fault analysis, quality control, and condition monitoring of machines or systems. This section explains how an FFT works, the relevant parameters, and their effects on the measurement result. Strictly speaking, the FFT is an optimized algorithm for the implementation of the Discrete Fourier Transform (DFT): it computes the DFT of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain (often time or space) to a representation in the frequency domain and vice versa.
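A minimal sketch of the core idea, recovering pitch from the magnitude spectrum of an FFT; a synthetic 440 Hz tone stands in here for a recorded piano frame:

```python
import numpy as np

# Sketch: using the FFT to recover the dominant frequency (the pitch) of a
# frame. The 48 kHz rate matches the studio rate mentioned earlier.

fs = 48000
t = np.arange(0, 0.5, 1 / fs)
frame = np.sin(2 * np.pi * 440 * t)            # synthetic test tone

spectrum = np.abs(np.fft.rfft(frame))          # magnitude spectrum
freqs = np.fft.rfftfreq(len(frame), d=1 / fs)  # frequency of each bin
peak = freqs[np.argmax(spectrum)]
print(f"Detected pitch: {peak:.1f} Hz")        # ≈ 440.0 Hz
```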
Fig 7: View of a signal in the time and frequency domain

In the first step, a section of the signal is scanned and stored in memory for further processing. Two parameters are relevant:
● The sampling rate or sampling frequency fs of the measuring system (e.g., 48 kHz). This is the average number of samples obtained in one second (samples per second).
● The selected number of samples, the block length BL. In the FFT this is always an integer power of two (e.g., 2^10 = 1024 samples).
From the two basic parameters fs and BL, further parameters of the measurement can be determined.
Bandwidth fn (= Nyquist frequency): this value indicates the theoretical maximum frequency that can be determined by the FFT:
fn = fs / 2
For example, at a sampling rate of 48 kHz, frequency components up to 24 kHz can theoretically be determined. In the case of an analog system, the practically achievable value is usually somewhat below this due to analog filters, e.g., at 20 kHz.
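The derived parameters can be computed directly from fs and BL. In the sketch below, the frequency resolution df = fs / BL is standard FFT practice, though the text states only the bandwidth relation explicitly:

```python
# Sketch: measurement parameters derived from fs and BL, as described above.

fs = 48000      # sampling rate in Hz
BL = 2 ** 10    # block length: 1024 samples (a power of two)

fn = fs / 2     # bandwidth / Nyquist frequency -> 24000.0 Hz
df = fs / BL    # frequency resolution (assumed, standard FFT practice) -> 46.875 Hz per bin
T = BL / fs     # duration of one measurement block -> ~0.0213 s

print(fn, df, T)
```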
IV. Simulation Results

Fig 8: Various steps involved

Fig 9: Data set songs recording output information (a), (b), (c).

Fig 10: Spectral analysis of the output for database audio music.

V. Conclusion

In this paper, we successfully designed and implemented an effective and user-friendly music note recognition system with the FFT. The target users of the system are not only people practicing music or learning to play a musical instrument but also professional musicians who cannot waste their time figuring out the notes of an audio sample. We have achieved high precision in hit detection for reasonably paced songs up to eighth notes, a high success rate in note detection when single notes were played at a time, and very precise note-length or tempo determination for all notes which were detected and analysed. However, there are a few shortcomings in our project, such as poor note-hit detection in busy or fast songs, overlapping note-hits causing single, held notes to appear as multiple shorter notes, and faults in an instrument and non-ideal behaviour that can cause anomalies in the output. Finally, our system is not real-time: we cannot run our musical note recognition process while the instrument is playing. In fact, through our MATLAB-based code, we first input the sound file, and the calculation takes around 60 seconds. This computation was performed on a 1.3 GHz computer with ample RAM. Thus, we may need to improve efficiency for real-time use.

REFERENCES

[1] Yappy Sazaki, Rosa Ayuni, and S. Kom, "Musical Note Recognition Using Minimum Spanning Tree Algorithm," in 2014 8th International Conference on Telecommunication Systems Services and Applications (TSSA).
[2] Frederico Malheiro and Sofia Cavaco, "Automatic Musical Instrument and Note Recognition," in INForum Conference, January 2011.
[3] Ansam Nazar Younis and Fawziya Mahmood Ramo, "A new parallel bat algorithm for musical note recognition," International Journal of Electrical and Computer Engineering, February 2019.
[4] Allabakash Isak Tamboli and Rajendra D. Kokate, "An Effective Optimization-Based Neural Network for Musical Note Recognition," Journal of Intelligent Systems, published online June 30, 2017.
[5] D. G. Bhalke, C. B. Rama Rao, and D. S. Bormane, "Dynamic time warping technique for musical instrument recognition for isolated notes," in Proceedings of the International Conference on Emerging Trends in Electrical and Computer Technology (ICETECT), 2011.
[6] K. Youssef and P. Y. Woo, "Music Note Recognition Based on Neural Network," in 2008 Fourth International Conference on Natural Computation, IEEE.
[7] Jay K. Patel and E. S. Gopi, "Musical Notes Identification using Digital Signal Processing," in Proceedings of the 3rd International Conference on Recent Trends in Computing, 2015.
[8] Yanfang Wang, "Research on handwritten note recognition in digital music classroom based on deep learning," 2006.
[9] Surekha B. Puri and S. P. Mahajan, "Automatic Note and Chord Recognition for Harmonium Music: A Deep Learning Approach," 2020.
[10] Nisha B. Dervaliya and Dipesh G. Kamdar, "Software implementation of reproducing music from musical notes (Mozart)," 2018.
[11] M. Akbari, A. T. Targhi, and M. M. Dehshibi, "TeMu-app: Music Character Recognition using HOG and SVM," IEEE, 2015.
[12] A. Pacha and H. Eidenberger, "Towards a Universal Music Symbol Classifier using Deep Learning," IEEE, 2017.
[13] K. Chin Lee and C. Y. Ting, "A comparison of HMM, Naïve Bayesian, and Markov model in exploiting knowledge content in digital ink: A case study on handwritten music notation recognition," IEEE, 2010.
[14] Ian VonSeggern, "Note Recognition in Python," 2020.
[15] Adarsh Baghel (IIT), "Music Note Detection," 2019.
