Basics of Audio File Processing

Taposh Dutta Roy, Oct 20, 2019

In today's day and age, digital audio has become part and parcel of our lives. One can talk to Siri, Alexa, or "Ok Google" to search for information. Siri, Alexa, or Google knows who is asking, can search for what was asked, and can reply contextually.

The idea of this post is to provide basic information on audio processing using R as the programming language. Before we turn to R, however, let's walk through some basics of sound and digital audio.

What is Sound?

Sound is a pressure change of air molecules created by a vibrating object. This pressure change creates a wave, and a wave is how sound propagates. Sound is thus a mechanical wave that results from the back-and-forth vibration of the particles of the medium through which the sound wave is moving [4]. Sound is processed through our ears via the auditory sense, so sound can also be called audio.

Audio processing is a heavily researched domain, and many very good papers cover it. In this post we cover only very basic but helpful information, enough to develop an intuitive understanding.

[Figure: Parts of an audio signal, showing wavelength, amplitude, and velocity of propagation]

Sound waves can be described by the number of waves per second and the size of the waves. The number of waves per second is the frequency of the sound; the closer together successive high (or low) points are, the higher the frequency. As shown in the figure above, the horizontal distance between any two successive equivalent points on the wave is called the wavelength. Thus, the wavelength is the horizontal length of one cycle of the wave. The period of a wave is the time required for one complete cycle of the wave to pass by a point.
So, the period is the amount of time it takes for a wave to travel a distance of one wavelength. This understanding is important for the analysis of sound, as we will move from the time domain to the frequency domain. Thus:

    frequency of sound = velocity of propagation / wavelength

[Figure: Low frequency vs. high frequency]

The size of each wave is described by its amplitude. Amplitude determines how loud or soft a sound will be.

Components of Sound or Audio

Sound can be divided into multiple components depending on how you want to analyze it. For the purpose of this article, we classify sound into two main components: amplitude and frequency. The frequency components can be further divided into pitch, formant, bandwidth, sampling rate, and others (overtones, harmonics, etc.).

[Figure: Components of Sound]

Amplitude: As noted, amplitude determines how loud or soft a sound will be. Loudness is a measure of sound wave intensity. Intensity is the amount of energy a sound has over an area. The same sound is more intense if you hear it in a smaller area. In general, we call sounds with a higher intensity louder. Amplitude is thus a measure of energy. The more energy a wave has, the higher its amplitude. As amplitude increases, intensity also increases.

Some Basic Frequency Components: The literature describes a variety of frequency components. For the purpose of this article we will talk about pitch, sampling rate, formant, and bandwidth.

Sample rate (or sampling frequency) is the number of samples per second in a sound. For example, if the sampling rate is 4000 hertz, a recording with a duration of 5 seconds will contain 20,000 samples.

Pitch is the frequency of the fundamental component in the sound, that is, the frequency with which the waveform repeats itself. Pitch depends on the frequency of a sound wave. Frequency is the number of wavelengths that fit into one unit of time.

Formant is a concentration of acoustic energy around a particular frequency in the speech wave.
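The sample-count arithmetic and the frequency relation above can be checked with a few lines of base R. This is only a minimal sketch: the speed of sound (343 m/s, roughly its value in air at room temperature) and the 440 Hz test tone are illustrative assumptions and do not come from the article's audio file.

```r
# Number of samples = sampling rate (Hz) * duration (s)
samp_rate <- 4000                  # samples per second, as in the example above
duration  <- 5                     # seconds
n_samples <- samp_rate * duration  # 20000

# frequency = velocity of propagation / wavelength
velocity   <- 343                  # m/s, ASSUMED speed of sound in air
wavelength <- velocity / 440       # metres, wavelength of an ASSUMED 440 Hz tone
frequency  <- velocity / wavelength
period     <- 1 / frequency        # seconds needed for one complete cycle

# Synthesize the tone: amplitude scales loudness, frequency sets the pitch
amplitude <- 0.5
t    <- (0:(n_samples - 1)) / samp_rate   # time stamp of every sample, in seconds
tone <- amplitude * sin(2 * pi * frequency * t)
```

Raising `amplitude` makes the synthetic tone louder without changing its pitch, while raising `frequency` shortens both the period and the wavelength.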
Formants are thus the peaks observed in the spectrum envelope.

Bandwidth is the range of frequencies within a given band, in particular the band used for transmitting a signal.

There are a few packages in R for audio analysis. The key ones that we have seen are tuneR, wrassp, and audio. The code below also loads the readr package, although the wave file itself is read with tuneR's readWave().

tuneR Package

Documentation: https://www.rdocumentation.org/packages/tuneR/versions/1.3.3

Before we go ahead and analyze the entire training data set, let's analyze a single wave.

    # Read a wave file
    library(readr)
    library(tuneR)

    # Path of the file
    file_audio_path <- 'audio_file.wav'

    # Read the file
    train_audio <- readWave(file_audio_path)

    # Let's see the structure of the audio
    str(train_audio)

[Figure: output of str(train_audio)]

Observations: The wave file has one channel (@left) containing 18593 sample points. Given the sample rate (train_audio@samp.rate = 4000), this corresponds to a duration of about 4.6 s:

    18593 / train_audio@samp.rate   # approximately 4.6 s

Our wave file has a 16-bit depth (train_audio@bit), which means that the sound pressure values are mapped to integer values that can range from -2^15 to (2^15)-1. We can convert our sound array to floating point values ranging from -1 to 1 as follows:

    s1 <- train_audio@left               # samples of the single (left) channel
    s1 <- s1 / 2^(train_audio@bit - 1)

Plotting the Wave: A time representation of the sound can be obtained by plotting the pressure values against the time axis.
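The duration and normalization arithmetic described above can be tried without any audio file. The sketch below uses base R only and a synthetic stand-in for `train_audio@left`; the variable names and the random sample values are assumptions made for illustration, not data from the heartbeat recording.

```r
set.seed(1)
samp_rate <- 4000     # same value as train_audio@samp.rate above
bit_depth <- 16       # same value as train_audio@bit above
n         <- 18593    # number of sample points reported above

# ASSUMED stand-in for train_audio@left: random 16-bit signed integers
left <- sample(-2^(bit_depth - 1):(2^(bit_depth - 1) - 1), n, replace = TRUE)

# Duration in seconds = number of samples / sample rate
duration <- n / samp_rate

# Dividing by 2^(bit depth - 1) maps [-2^15, 2^15 - 1] into [-1, 1)
s1 <- left / 2^(bit_depth - 1)
range(s1)
```

Because the largest 16-bit value is 2^15 - 1, the normalized signal never quite reaches +1, while the most negative value maps exactly to -1.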
To do so, we first need to create an array containing the time points:

    timeArray <- (0:(18593 - 1)) / train_audio@samp.rate

    # Plot the wave (timeArray is in seconds)
    plot(timeArray, s1, type = 'l', col = 'black', xlab = 'Time (s)', ylab = 'Amplitude')

[Figure: waveform plot]

Advanced: The R package tuneR also provides more complex frequency-domain analysis functions such as melfcc [8, 9, 10], audspec, etc. There are some good articles that talk about MFCC, such as "The dummy's guide to MFCC" and "Mel Frequency Cepstral Coefficient (MFCC) tutorial".

    # tuneR
    m2 <- melfcc(train_audio, numcep = 9, usecmp = TRUE, modelorder = 8,
                 spec_out = TRUE, frames_in_rows = FALSE)
    str(m2)

Output:

    List of 4
     $ cepstra  : num [1:9, 1:463] 2.1454 -0.2319 -0.3497 -0.0713 -0.313 ...
     $ aspectrum: num [1:40, 1:463] 0.938 0.938 1.699 3.518 5.285 ...
     $ pspectrum: num [1:64, 1:463] 246780 193267 1879763 1036173 275925 ...
     $ lpcas    : num [1:9, 1:463] 0.117 0.0271 8.0301 9.0108 0.0208 ...

wrassp Package

Documentation: https://ips-lmu.github.io/The-EMU-SDMS-Manual/chap-wrassp.html

The package wrassp is capable of more than just reading and writing specific signal file formats. One can use wrassp to calculate the formant values, their corresponding bandwidths, the fundamental frequency contour, and the RMS energy contour of the audio file.

    library(wrassp)

    # Create path to wav file
    file_audio_path <- 'audio_file.wav'

    # Read audio file
    au <- read.AsspDataObj(file_audio_path)
    str(au)

[Figure: Output of wrassp package, au]

Observations: Review the similarities and differences between the output of tuneR and wrassp. Both provide the same sample rate, number of bits, etc. However, wrassp provides a class object, AsspDataObj, for further use.

    # (only plot every 10th element to accelerate plotting)
    plot(seq(0, numRecs.AsspDataObj(au) - 1, 10) / rate.AsspDataObj(au),
         au$audio[c(TRUE, rep(FALSE, 9))],
         type = 'l', xlab = 'time (s)', ylab = 'Audio samples')

[Figure: Output of audio file using wrassp package]

Calculate Formant and Bandwidth: In the initial part of the tutorial we talked about the frequency components pitch, formant, and bandwidth. Let's compute the formants and bandwidths with wrassp.

    # Calculate formants and corresponding bandwidth values
    fmBwVals <- forest(file_audio_path, toFile = FALSE)
    str(fmBwVals)

Output:

    List of 2
     $ fm: int [1:930, 1:3] 755 811 803 77 58 56 57 64 900 974 ...
     $ bw: int [1:930, 1:3] 372 381 445 120 56 52 69 121 216 328 ...
     - attr(*, "trackFormats")= chr [1:2] "INT16" "INT16"
     - attr(*, "sampleRate")= num 200
     - attr(*, "origFreq")= num 4000
     - attr(*, "startTime")= num 0.0025
     - attr(*, "startRecord")= int 1
     - attr(*, "endRecord")= int 930
     - attr(*, "class")= chr "AsspDataObj"
     - attr(*, "fileInfo")= int [1:2] 20 2

[Figure: Formant and bandwidth]

    # Plot the first 100 F1 values over time
    plot(fmBwVals$fm[1:100, 1], type = 'l')

    # Plot all the formant values against time.
    # The arguments after the x values were cut off in the source;
    # fmBwVals$fm and type = 'l' are assumed here.
    matplot(seq(0, numRecs.AsspDataObj(fmBwVals) - 1) / rate.AsspDataObj(fmBwVals) +
              attr(fmBwVals, 'startTime'),
            fmBwVals$fm, type = 'l')

[Figure: Formant frequencies vs time]

    # Plot bandwidth
    plot(fmBwVals$bw)

    # Plot the formants
    plot(fmBwVals$fm)

[Figure: bandwidth and formant plots]

Advanced Functions: There are a lot of advanced functions that can be explored, such as dftSpectrum, the RMS energy contour (rmsana), acfana, rfcana, etc., to add as features to your model.

Feature Set Development

Let's use both the tuneR and wrassp R packages to develop an initial set of features for our audio signal.

    library(tuneR)
    library(wrassp)
    library(tools)   # provides file_path_sans_ext()

    extract_audio_features <- function(x) {
      # tuneR: load file
      tr <- readWave(x)
      # print(tr@left)
      ar <- read.AsspDataObj(x)

      # File name without directory or extension
      fname <- file_path_sans_ext(basename(x))

      # Feature: number of samples
      num_samples <- numRecs.AsspDataObj(ar)

      # Calculate formants and corresponding bandwidth values
      fmBwVals <- forest(x, toFile = FALSE)
      fmVals <- fmBwVals$fm
      bwVals <- fmBwVals$bw

      # Feature: sample rate
      sample_rate <- tr@samp.rate
      left <- tr@left
      range_audio <- range(tr@left)

      # Feature: min_amplitude_range
      min_range <- range_audio[1]
      # Feature: max_amplitude_range
      max_range <- range_audio[2]

      # Normalize the samples to floating point values between -1 and 1
      normvalues <- left / 2^(tr@bit - 1)
      normal_range <- range(normvalues)

      # Feature: normalized_min_amplitude_range
      normal_min_ampl_range <- normal_range[1]
      # Feature: normalized_max_amplitude_range
      normal_max_ampl_range <- normal_range[2]

      mylist <- c(fname = fname, num_samples = num_samples, sample_rate = sample_rate,
                  min_range = min_range, max_range = max_range,
                  normal_min_ampl_range = normal_min_ampl_range,
                  normal_max_ampl_range = normal_max_ampl_range,
                  fmVals = fmVals, bwVals = bwVals)
      return(as.data.frame(mylist))
    }

    file_audio_path <- './audio_file.wav'
    output <- extract_audio_features(file_audio_path)
    head(output, 10)

[Figure: output of head(output, 10)]

Note: I have used a heartbeat audio file from Kaggle for this tutorial. Finally, use this data for any processing you might need to do. I have shared this code and the audio file in my GitHub account. Let me know your thoughts.

Author's Note

Taposh Dutta Roy leads the Innovation Team of KPInsight at Kaiser Permanente. These are his thoughts based on his personal research. These thoughts and recommendations are not those of Kaiser Permanente, and Kaiser Permanente is not responsible for the content. If you have questions, Mr. Dutta Roy can be reached via LinkedIn.

References

Heartbeat Sounds (Kaggle dataset)
https://cran.r-project.org/web/packages/seewave/vignettes/seewave_IO.pdf
Individual sound files for each selection (warbleR function)
The dummy's guide to MFCC
http://www.cs.toronto.edu/~gpenn/csc401/soundASR.pdf
https://theproaudiofiles.com/understanding-soun
What are the differences between audio and sound?
Sound as a Mechanical Wave
https://www.nde-ed.org/EducationResources/HighSchool/Sound/components.htm
https://arxiv.org/pdf/1207.5104.pdf
https://medium.com/prathena/the-dummys-guide-to-mfcc-aceab2450fd
https://towardsdatascience.com/getting-to-know-the-mel-spectrogram-31bca3e2d9d0
http://www.practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/
https://blogs.rstudio.com/tensorflow/posts/2019-02-07-audio-background/
http://practicalcryptography.com/miscellaneous/machine-learning/guide-
