Abstract - Computerized detection of voice disorders has attracted considerable academic and clinical interest in the hope of providing an effective screening method for voice diseases before endoscopic confirmation. The goal of this paper is to apply neural networks and machine learning techniques to detect pathological voice and classify three disordered categories from acoustic waveforms collected on a mobile phone. From a health science perspective, a pathological status of the human voice can substantially reduce quality of life and occupational performance, which results in considerable costs for both the patient and society.

The paper summarizes the various techniques and feature engineering processes that we have applied to the voice data collected for classification of voice disorders. We have used Mel-scaled spectrograms and MFCC components as audio features to train various neural network architectures. We have trained a 5-layer plain network, a 5-layer CNN and an RNN. We discuss the challenges faced and solutions to improve model performance, model parameter tuning and model evaluation.

…patient if located in a remote setting where the availability and spread of doctors is very limited. Having a simple mobile diagnostic tool to detect the voice condition before endoscopic confirmation makes the treatment very patient-friendly. The goal of the paper is to develop a mobile diagnostic voice disorder app.

In order to perform audio classification, we need to extract the most appropriate and informative acoustic parameters. Traditionally, pitch, jitter and shimmer have been used for this purpose. In recent years, the Mel Frequency Cepstral Coefficient (MFCC) has gained popularity as a successful parameter for audio classification. We are using MFCC [1] to classify between healthy and pathological patient audio.
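To make the Mel-scale mapping concrete, the core of MFCC extraction can be sketched in plain NumPy: frame the signal, take the power spectrum, apply a triangular mel filterbank, take the log, then a DCT. The frame size, hop, and filter counts below are illustrative assumptions, not the settings used in our experiments; in practice we compute these features with librosa.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    # Frame the signal with a Hann window and take the power spectrum
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    # Triangular mel filterbank spanning 0 Hz .. Nyquist
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[i, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    logmel = np.log(spec @ fbank.T + 1e-10)
    # DCT-II over the mel axis decorrelates the filterbank energies
    basis = np.cos(np.pi / n_mels * (np.arange(n_mels)[:, None] + 0.5)
                   * np.arange(n_mfcc)[None, :])
    return logmel @ basis  # shape: (num_frames, n_mfcc)
```

librosa.feature.mfcc wraps these steps (plus liftering and normalization refinements) in a single call, which is what the feature-extraction code later in the paper relies on.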
1 THE PROBLEM OF LIMITED HEALTH CARE COVERAGE VOICE DISORDER TREATMENT [TRANSCRIPT] - https://blog.asha.org/the-problem-of-limited-health-care-coverage-voice-disorder-treatment-transcript/
2 Chennai doctors help patients find their voice - https://timesofindia.indiatimes.com/city/chennai/Chennai-doctors-help-patients-find-their-voice/articleshow/16886456.cms
logspec = logspec.T.flatten()[:, np.newaxis].T  # Transposed list of MFCCs
# print('logspec final:', len(logspec))
feature.extend(logspec)
melspec = librosa.feature.melspectrogram(y=signal, n_mels=bands)
logspec = librosa.amplitude_to_db(melspec)
logspec = logspec.T.flatten()[:, np.newaxis].T
feature.extend(logspec)

A. Neural Network with a 5-layer architecture

We started with a plain neural network with a 5-layer architecture [9]. Tuning the hyperparameters of the model improved the accuracy by 10%. Each parameter was tuned over the following values:

• regularization_rate = 0.1, 0.01, 0.001
• activation = 'tanh', 'relu'
• Number of hidden nodes = 64, 40, 100
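As a sketch of how such a network can be assembled with these tuning knobs exposed (written against the current tf.keras API; the defaults below are assumptions, not our final configuration):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_plain_net(n_features, n_classes, hidden=64, activation='relu', reg=0.001):
    """5-layer fully connected network; `hidden`, `activation` and `reg`
    correspond to the hyperparameters swept above."""
    model = models.Sequential([layers.Input(shape=(n_features,))])
    for _ in range(4):  # four hidden layers, then a softmax output layer
        model.add(layers.Dense(hidden, activation=activation,
                               kernel_regularizer=regularizers.l2(reg)))
    model.add(layers.Dense(n_classes, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

The sweep then amounts to calling build_plain_net once per combination of the listed values and comparing validation accuracy.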
# Layer 2 - Convolution with 64 filters + Maxpooling
model.add(Convolution2D(64, (filter_size, filter_size)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(LeakyReLU(alpha=0.01))

# Layer 3 - Convolution with 64 filters
model.add(Convolution2D(64, (filter_size, filter_size), border_mode='valid'))
model.add(LeakyReLU(alpha=0.01))

2) Parameters

Below are the parameter values experimented with:

Parameters                                          Accuracy   ROC
Convolution filter size 2x2                         76%        0.939
Convolution filter size 3x3                         82%        0.959
Max pooling filter size 2x2                         84%        0.976
Learning rate = 0.01                                69%        0.918
SGD momentum = 0.9                                  90%        0.986
Adam optimizer                                      93%        0.994
Tanh for all layers                                 94%        0.994
Tanh for 1st layer & ReLU for rest of layers        94%        0.996
Tanh for 1st layer & LeakyReLU for rest of layers   93%        0.99
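The ROC column above is the area under the ROC curve of the healthy-vs-pathological classifier. As a minimal sketch, AUC can be computed directly from score ranks (the Mann-Whitney formulation; this simplified version assumes no tied scores):

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC = probability that a random positive outranks a random negative."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)  # 1-based ranks by score
    pos = np.asarray(y_true) == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A perfect ranking gives 1.0 and a reversed ranking gives 0.0; in practice sklearn.metrics.roc_auc_score, which also handles ties, is the standard choice.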
a result, LSTM can capture longer-term dependencies in a sequence than a plain RNN.

The architecture used for the LSTM is a 4-layer network. We used the below parameter values to tune our RNN to an accuracy of 90%:

• recurrent_dropout = 0.35
• dropout = 0.5
• optimizer = Adam
• learning rate = 0.001

We achieved a sensitivity of 8% and a specificity of 97.7% on the RNN prediction evaluation. The CNN performed better than the RNN with respect to these metrics.

1) Leaky ReLU

Leaky ReLU is used to solve the dying ReLU problem; it is an improved version of the ReLU function. The leak increases the range of the function: the range of Leaky ReLU is (-infinity, infinity). The leak coefficient is usually a small value such as 0.01; when the coefficient is chosen randomly, the function is called Randomized ReLU. Leaky ReLU functions are monotonic, and so are their derivatives.

For the standard ReLU function the gradient is 0 for x < 0, which makes neurons die for activations in that region. Leaky ReLU addresses this problem: instead of defining the function as 0 for x < 0, we define it as a small linear component of x. The gradient for negative inputs is then non-zero, so we no longer encounter dead neurons in that region. This also speeds up the training of the neural network.

In the parameterized ReLU function, y = ax for x < 0, and the network learns the value of the parameter 'a' for faster and more optimal convergence. The parameterized ReLU function is used when Leaky ReLU still fails to solve the problem of dead neurons and the relevant information is not successfully passed to the next layer.
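The activations discussed above fit in a few lines of NumPy (alpha = 0.01 as for Leaky ReLU; in parameterized ReLU the same negative-side slope would be a learned parameter rather than a constant):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Small linear component for x < 0 instead of a hard zero
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # The non-zero slope on the negative side is what prevents dead neurons
    return np.where(x > 0, 1.0, alpha)
```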
Code (4-layer LSTM network):

model = Sequential()
model.add(LSTM(units=128, return_sequences=True, recurrent_dropout=0.35, input_shape=(bands, frames)))
model.add(Dropout(0.5))
model.add(LSTM(units=128, return_sequences=True, recurrent_dropout=0.35))
model.add(Dropout(0.5))
model.add(LSTM(units=128, return_sequences=True, recurrent_dropout=0.35))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(num_labels, activation='softmax'))
# sgd = SGD(lr=0.001, momentum=0.9, decay=0.0, nesterov=False)
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer=adam)

TABLE IV. MODEL EVALUATION RESULTS

NETWORK ARCHITECTURES          SENSITIVITY   SPECIFICITY
5-LAYER PLAIN NEURAL NETWORK   12%           96%
5-LAYER CNN                    96%           18%
4-LAYER RNN                    8%            97.7%

D. Architecture Deployment

The 5-layer CNN is deployed into the mobile application by freezing the trained model and converting it to TFLite.
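In current TensorFlow the freeze-and-convert steps collapse into the TFLite converter; a minimal sketch of the conversion (the model and file name here are placeholders):

```python
import tensorflow as tf

def convert_to_tflite(model, path="voice_disorder.tflite"):
    """Serialize a trained Keras model to a TFLite flatbuffer for the mobile app."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional weight quantization
    with open(path, "wb") as f:
        f.write(converter.convert())
    return path
```

The resulting .tflite file is bundled into the mobile app and executed on-device with the TFLite Interpreter.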
ACKNOWLEDGMENT

We sincerely thank the team at Far Eastern Memorial Hospital (FEMH) for providing the valuable voice data without which the development of the neural networks would have been impossible. We acknowledge and sincerely credit the support of FEMH. Additionally, we thank the management of Hanumayamma Innovations and Technologies, Inc., for the active support and resources they provided for working on the challenge.

REFERENCES