
Image Pre-Processing

In computer-aided systems, pre-processing is a valuable step because it improves the quality of the original images by removing irrelevant parts of them, and it makes regions of interest easier to detect, for example during border detection. Image pre-processing is the term for operations on images at the lowest level of abstraction. The aim of pre-processing is an improvement of the image data that suppresses unwanted distortions or enhances image features important for further processing. Geometric transformations of images (e.g., rotation, scaling, translation) are also classified among pre-processing methods here, since similar techniques are used.

Image Pre-Processing Techniques

1) Pixel Brightness Transformations


Brightness transformations modify pixel brightness, and the transformation depends on the properties of the pixel itself. In pixel brightness transformations, an output pixel's value depends only on the corresponding input pixel's value.
There are two types of brightness transformations:
I. Brightness corrections
II. Grayscale transformations
The most common pixel brightness transformation operations are:
I. Gamma correction (power-law transform): a non-linear adjustment to individual pixel values.
II. Sigmoid stretching: the sigmoid function is a continuous nonlinear function whose name comes from its "S" shape; statisticians call it the logistic function.
III. Histogram equalization: a well-known contrast-enhancement technique, owing to its performance on almost all types of images.
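For instance, a minimal gamma correction (power-law transform) on 8-bit intensities can be sketched in plain Python as follows; the function name and the example gamma value are our own choices:

```python
def gamma_correct(pixels, gamma):
    """Power-law transform s = 255 * (r / 255) ** gamma,
    applied to a list of 8-bit pixel intensities."""
    return [round(255 * (p / 255) ** gamma) for p in pixels]

# gamma < 1 brightens the mid-tones, gamma > 1 darkens them
print(gamma_correct([0, 64, 128, 255], 0.5))   # [0, 128, 181, 255]
```

With gamma = 1 the transform is the identity; values below 1 lift dark regions, which is why gamma correction is often used to compensate for display non-linearity.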
2) Geometric Transformations
Geometric transforms permit the elimination of the geometric distortion that occurs when an image is captured. The usual geometric transformation operations are rotation, scaling, and distortion (or undistortion) of images.
There are two basic steps in geometric transformations:
I. Spatial transformation: the physical rearrangement of pixels in the image.
II. Grey-level interpolation: the assignment of grey levels to the transformed image; this can change the perspective of a given image or video to give better insight into the required information.
Interpolation Methods:
After the transformation, the new point coordinates (x′, y′) are obtained. The brightness interpolation problem is usually expressed in a dual way: for each pixel of the output image, the corresponding point in the input image is found and its brightness is interpolated from its neighbours.
Different types of interpolation methods are:
I. Nearest-neighbour interpolation: the simplest technique, which resamples by assigning each output pixel the value of the closest input pixel.
II. Linear interpolation: explores the four points neighbouring the point (x, y) and assumes that the brightness function is linear in this neighbourhood.
III. Bicubic interpolation: improves the model of the brightness function by approximating it locally by a bicubic polynomial surface.
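As a concrete sketch, nearest-neighbour interpolation for resizing a 2-D image stored as a list of rows might look like this (the index-rounding scheme used to pick the source pixel is one common choice, assumed here):

```python
def resize_nearest(img, new_h, new_w):
    """Resize a 2-D image (list of rows) by nearest-neighbour
    interpolation: each output pixel copies the closest input pixel."""
    h, w = len(img), len(img[0])
    return [[img[min(h - 1, int(y * h / new_h))]
                [min(w - 1, int(x * w / new_w))]
             for x in range(new_w)]
            for y in range(new_h)]

src = [[1, 2],
       [3, 4]]
# Upscaling 2x2 -> 4x4 simply replicates each pixel into a 2x2 block
print(resize_nearest(src, 4, 4))
```

Nearest-neighbour is fast but produces blocky results; linear (bilinear) and bicubic interpolation trade speed for smoother brightness estimates.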
3) Image Filtering and Segmentation
The goal of using filters is to modify or enhance image properties and/or to extract valuable
information from the pictures such as edges, corners, and blobs. Some of the basic filtering
techniques are
I. Low-pass filtering (smoothing): a low-pass filter is the basis for most smoothing methods. An image is smoothed by decreasing the disparity between pixel values, averaging nearby pixels.
II. High-pass filtering (edge detection, sharpening): a high-pass filter can be used to make an image appear sharper.
III. Directional filtering: a directional filter is an edge detector that can be used to compute the first derivatives of an image.
IV. Laplacian filtering: the Laplacian filter is an edge detector used to compute the second derivatives of an image, measuring the rate at which the first derivatives change.
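A low-pass (smoothing) filter of the kind described in point I can be sketched as a 3×3 mean filter; the border handling (replicating edge pixels by clamping indices) is an assumption of ours:

```python
def box_blur(img):
    """3x3 mean (low-pass) filter over a 2-D image given as a list
    of rows; out-of-bounds neighbours are handled by clamping
    indices to the border (replicate padding)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(h - 1, max(0, y + dy))
                    xx = min(w - 1, max(0, x + dx))
                    acc += img[yy][xx]
            out[y][x] = acc / 9   # average of the 3x3 neighbourhood
    return out
```

Replacing the mean with a weighted kernel gives Gaussian smoothing; negating the centre-surround weights instead gives a high-pass (sharpening) kernel.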
Image Segmentation
Image segmentation is a commonly used technique in digital image processing and analysis to
partition an image into multiple parts or regions, often based on the characteristics of the pixels in
the image. Image segmentation could involve separating foreground from background, or
clustering regions of pixels based on similarities in color or shape.
Image segmentation is mainly used in face detection, medical imaging, machine vision, and autonomous driving.
There are two types of image segmentation techniques.
1. Non-contextual thresholding: thresholding is the simplest non-contextual segmentation technique. With a single threshold, it transforms a greyscale or colour image into a binary image considered as a binary region map. The types of thresholding techniques are:
I. Simple thresholding
II. Adaptive thresholding
III. Color thresholding
2. Contextual segmentation: contextual segmentation can be more successful in separating individual objects because it accounts for the closeness of pixels that belong to an individual object. The types of contextual segmentation are:
A. Pixel connectivity
B. Region similarity
C. Region growing
D. Split-and-merge segmentation
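To make the simplest of these techniques concrete, global (simple) thresholding of a greyscale image into a binary region map can be sketched as follows; the convention that pixels at or above the threshold map to 1 is our choice:

```python
def threshold(img, t):
    """Simple (global) thresholding: produce a binary region map
    where 1 marks pixels whose intensity is at or above t."""
    return [[1 if p >= t else 0 for p in row] for row in img]

# Pixels >= 128 are labelled foreground (1), the rest background (0)
print(threshold([[10, 200], [90, 128]], 128))   # [[0, 1], [0, 1]]
```

Adaptive thresholding follows the same pattern but computes t locally, from a neighbourhood around each pixel, instead of using one global value.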
4) Fourier Transform

The Fourier Transform is an important image processing tool which is used to decompose an image
into its sine and cosine components. The output of the transformation represents the image in the
Fourier or frequency domain, while the input image is the spatial domain equivalent. In the Fourier
domain image, each point represents a particular frequency contained in the spatial domain image.
The Fourier Transform is used in a wide range of applications, such as image analysis, image
filtering, image reconstruction and image compression. The Discrete Fourier Transform is the
sampled Fourier Transform and therefore does not contain all frequencies forming an image, but
only a set of samples which is large enough to fully describe the spatial domain image. The number
of frequencies corresponds to the number of pixels in the spatial domain image.

i.e., the image in the spatial domain and its Fourier-domain representation are of the same size. For a square image of size N×N, the two-dimensional DFT is given by

F(k, l) = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} f(i, j) · e^{−i·2π·(ki/N + lj/N)}

where f(i, j) is the image in the spatial domain and the exponential term is the basis function corresponding to each point F(k, l) in the Fourier space.
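The 2-D discrete Fourier transform of a small N×N image can be evaluated directly in plain Python; this naive O(N⁴) sketch is for illustration only, and real systems use the FFT:

```python
import cmath

def dft2(img):
    """Naive 2-D DFT of an N x N image:
    F(k, l) = sum_i sum_j f(i, j) * exp(-2j*pi*(k*i + l*j)/N)."""
    n = len(img)
    return [[sum(img[i][j] * cmath.exp(-2j * cmath.pi * (k * i + l * j) / n)
                 for i in range(n) for j in range(n))
             for l in range(n)]
            for k in range(n)]

spec = dft2([[1, 0],
             [0, 1]])
# F(0, 0) is the DC component: the sum of all pixel values
print(round(abs(spec[0][0]), 6))   # 2.0
```

Note that the output has exactly N×N coefficients, matching the statement above that the spatial-domain and Fourier-domain images are the same size.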
Audio Pre-processing
Many real-world applications rely on machine learning. One of the important applications of
machine learning is audio processing. Audio processing aims to extract meaningful information
(descriptions or explanations) from audio, such as the type of a sound event, the content of a speech
or the artist of music. Preprocessing audio data includes tasks like resampling audio files to a consistent
sample rate, removing regions of silence, and trimming audio to a consistent duration. Audio is high-dimensional and contains redundant and often unnecessary information. Historically, mel-frequency
cepstral coefficients and low-level features, such as the zero-crossing rate and spectral shape descriptors,
have been the dominant features derived from audio signals for use in machine learning systems. Machine
learning systems trained on these features are computationally efficient and typically require less training
data.
In the development of an automatic sound recognition system, preprocessing is considered the first phase of speech recognition; it differentiates voiced from unvoiced signals and creates the feature vectors. Preprocessing adjusts or modifies the audio signal, x(n), so that it is more suitable for feature-extraction analysis. The major factor to consider in audio signal processing is whether the speech x(n) is corrupted by some background or ambient noise, d(n), for example as an additive disturbance:

x(n) = s(n) + d(n)

where s(n) is the clean speech signal. In noise reduction, there are different methods that can be adopted to perform the task on a noisy speech signal.

Techniques of pre-processing of audio


1) Normalization
Normalization is a technique used to adjust the volume of audio files to a standard level; if this is not done, the volume can differ greatly from word to word, and the file can end up unable to be processed clearly. At this stage, the aim is to normalize the speech energy E(l). The normalization of energy is performed by finding the maximum energy value Emax over the spoken words,

Emax = max_l E(l),

and scaling the energy of each frame by it.
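A minimal peak-normalization sketch in the amplitude domain illustrates the same idea: the whole signal is scaled so its loudest sample reaches a target level (the target of 1.0 is an assumed convention, not taken from the text):

```python
def normalize_peak(samples, target=1.0):
    """Peak normalization: scale the signal so that its maximum
    absolute amplitude equals `target`."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)   # silent signal: nothing to scale
    return [s * target / peak for s in samples]
```

Because every sample is multiplied by the same factor, relative loudness within the utterance is preserved; only the overall level changes.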
2) Pre-emphasis

Pre-emphasis is done before starting feature extraction. We boost only the signal's high-frequency components while leaving the low-frequency components in their original state. This is done to compensate for the high-frequency part of the spectrum, which is naturally suppressed when humans produce sounds: a spoken audio signal may have frequency components that fall off at high frequencies. High-frequency components are emphasized and low-frequency components are attenuated; this is quite a standard preprocessing step. By pre-emphasis we mean the application of a high-pass filter, usually a first-order FIR of the form

H(z) = 1 − a·z⁻¹, with a typically between 0.9 and 1.0.

Normally this single-coefficient digital filter, known as the pre-emphasis filter, is applied in the time domain as

y(n) = x(n) − a·x(n − 1).
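A sketch of such a first-order pre-emphasis filter; the coefficient a = 0.97 is a common choice we assume here, and passing the first sample through unchanged is our convention for handling n = 0:

```python
def pre_emphasis(x, a=0.97):
    """First-order FIR pre-emphasis filter y(n) = x(n) - a * x(n-1),
    i.e. H(z) = 1 - a * z**-1; a is typically between 0.9 and 1.0.
    The first sample has no predecessor and is passed through."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]
```

On a constant (low-frequency) input the output is nearly zero after the first sample, while rapid sample-to-sample changes pass through largely intact, which is exactly the high-pass behaviour described above.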

3) Background/Ambient Noise Removal


Ambient noise is any signal other than the signal being monitored; it is a form of noise pollution or interference. Background noise is in fact an important concept in setting noise levels in automatic sound recognition systems: the performance of speech recognition systems degrades drastically when training and testing are carried out at different noise levels. The signal-to-noise ratio (SNR) is the ratio of the power of the correct signal to that of the noise, and is usually measured in decibels (dB):

SNR = 20 · log10(Vsignal / Vnoise)

where Vsignal is the voltage of the correct signal and Vnoise is the voltage of the noise. Background or ambient noise is normally produced by the sounds of air-conditioning systems, fans, fluorescent lamps, typewriters, computer systems, background conversation, footsteps, traffic, alarms, birds, and the opening and closing of doors. The filter adopted to remove background or ambient noise is based on the log energy of each block of samples:

Es = log(ϵ + Σ_{n=1}^{N} s(n)²)

where Es is the log energy of a block of N samples, s(n) is the nth speech sample in the block, and ϵ is a small positive constant added to prevent computing the log of zero.
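Both quantities from this subsection can be computed directly. The text does not give the exact form of the block-energy expression, so the unaveraged sum of squares below is an assumption, as is the default value of the ϵ constant:

```python
import math

def snr_db(v_signal, v_noise):
    """Signal-to-noise ratio in decibels:
    SNR = 20 * log10(Vsignal / Vnoise)."""
    return 20 * math.log10(v_signal / v_noise)

def log_energy(block, eps=1e-10):
    """Log energy of a block of N samples:
    Es = log(eps + sum of s(n)**2); eps prevents log(0) on silence."""
    return math.log(eps + sum(s * s for s in block))

# A signal 10x the noise voltage corresponds to 20 dB SNR
print(snr_db(10.0, 1.0))   # 20.0
```

Blocks of pure silence give a very negative Es (driven by ϵ), so a threshold on Es cleanly separates silent blocks from speech.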
4) Voice Activity Detection /Speech Word Detection

Locating the endpoints of the speech within an audio signal is a major problem for the speech recognizer, and inaccurate endpoint detection decreases the recognizer's performance. Although detecting the endpoints of a speech utterance may seem relatively trivial, it has been found to be very difficult in practice in speech recognition systems. When a proper SNR is given, the work of developing an automatic sound recognition system is made easier.
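A deliberately simple, energy-threshold endpoint detector illustrates the idea (real voice-activity detectors are far more robust; the per-frame energies and threshold in the example are hypothetical):

```python
def detect_endpoints(frame_energies, threshold):
    """Return the (first, last) frame index whose energy exceeds
    `threshold`, or None if every frame is below it (all silence)."""
    active = [i for i, e in enumerate(frame_energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]

# Frames 2 and 3 carry speech energy; the rest is silence
print(detect_endpoints([0.1, 0.2, 5.0, 6.0, 0.1], 1.0))   # (2, 3)
```

This is exactly where a good SNR helps: the larger the gap between speech-frame and silence-frame energies, the easier it is to place the threshold.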

5) Framing or Frame Blocking

Framing is the process of breaking the continuous stream of audio samples into segments of constant length to facilitate block-wise processing of the signal. Speech can be thought of as a quasi-stationary signal, one that is stationary only over short periods of time. The signal is therefore divided, or blocked, into frames of typically 20-30 ms. Adjacent frames normally overlap each other by 30-50%; this is done so as not to lose any vital information of the speech signal due to the windowing.
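Frame blocking with overlap can be sketched as follows; as an illustrative choice of ours, at a 16 kHz sample rate a 25 ms frame with a 10 ms hop would give frame_len = 400 and hop = 160:

```python
def frame_signal(x, frame_len, hop):
    """Split signal x into frames of `frame_len` samples; consecutive
    frames start `hop` samples apart, so they overlap by
    frame_len - hop samples. Trailing samples that do not fill a
    whole frame are dropped."""
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, hop)]

# frame_len=4, hop=2 gives 50% overlap between consecutive frames
print(frame_signal(list(range(10)), 4, 2))
```

Setting hop to half or two-thirds of frame_len yields the 30-50% overlap mentioned above.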

6) Windowing

Once the signal has been framed into segments, each frame is multiplied by a window function w(n) of length N, where N is the length of the frame. Windowing is the process of multiplying a speech-signal segment by a time window of a given shape, to stress pre-defined characteristics of the signal. To reduce the discontinuities of the audio signal at the beginning and end of each frame, the signal is tapered to zero or close to zero, minimizing the mismatch. This is achieved by windowing each frame of the signal, which also increases the correlation of the Mel-Frequency Cepstral Coefficient (MFCC) spectral estimates between consecutive frames.
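A common concrete choice of window function for this step is the Hamming window; the text does not name a specific window, so this choice is our assumption:

```python
import math

def hamming(n):
    """Hamming window w(k) = 0.54 - 0.46 * cos(2*pi*k / (N - 1)),
    for k = 0 .. N-1 (requires n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1))
            for k in range(n)]

def window_frame(frame):
    """Taper a frame towards zero at both ends to reduce the
    discontinuities introduced by framing."""
    w = hamming(len(frame))
    return [s * wk for s, wk in zip(frame, w)]
```

The Hamming window keeps the frame ends at 0.08 rather than exactly zero, a deliberate trade-off that lowers the nearest spectral side lobe; a Hann window tapers fully to zero instead.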
