
LAB - 1

Name: Paayas P
Roll No: CH.EN.U4AIE20046
Subject: 21AIE312 - Deep Learning for Signal and Image Processing

1. Investigate and understand the working functionalities of Deep learning
libraries and toolbox used in python/matlab.
With the recent growth of the AI industry, machine learning and deep learning have
become more popular, and the early adopters of this technology are starting to reap the benefits.
Several programming languages can be used for AI, ML and DL work. Python has long been the
go-to choice for machine learning and artificial intelligence developers because of its flexibility
and features.
Some of the most widely used Python libraries for deep learning and related tasks are:
1. TensorFlow: TensorFlow is a fast, flexible and scalable open-source machine learning library
for research and production. It is one of the most widely used libraries for working with ML and DL in
Python. Developed by Google, TensorFlow makes ML model building easy for beginners and
professionals alike.
Using TensorFlow, you can create and train ML models and run them not just on computers but also
on mobile devices and servers, using TensorFlow Lite and TensorFlow Serving, which offer the same
benefits for mobile platforms and high-performance servers.
Some of the essential areas in ML and DL where TensorFlow shines are:
● Handling deep neural networks
● Natural Language Processing
● Partial Differential Equation
● Abstraction capabilities
● Image, Text, and Speech recognition
● Effortless collaboration of ideas and code
Core Task: Build Deep Learning models

2. Keras: It is one of the most popular open-source neural network libraries for Python.
Initially designed by a Google engineer for ONEIROS, short for Open-ended Neuro-Electronic
Intelligent Robot Operating System, Keras was soon supported in TensorFlow's core library,
making it accessible on top of TensorFlow. Keras features several of the building blocks and
tools necessary for creating a neural network, such as:
● Neural Layers
● Activation and cost functions
● Objectives
● Batch Normalization
● Dropout
● Pooling
Keras extends the usability of TensorFlow with these additional features for ML and DL
programming. With a helpful community and a dedicated Slack channel, getting support is easy.
Convolutional and recurrent neural networks are supported alongside standard neural
networks.
Core Task: Build Deep Learning models
3. PyTorch: Developed by Facebook, PyTorch is one of the most widely used machine learning libraries for
Python. Apart from Python, PyTorch also offers a C++ interface. Considered among the top
contenders for the best machine learning and deep learning framework, PyTorch faces tough
competition from TensorFlow. Some of the
vital features that set PyTorch apart from TensorFlow are:
● Tensor computing with the ability for accelerated processing via Graphics Processing
Units
● Easy to learn, use and integrate with the rest of the Python ecosystem
● Support for neural networks built on a tape-based auto diff system
The various modules PyTorch comes with, that help create and train neural networks:
● Tensors — torch.Tensor
● Optimizers — torch.optim module
● Neural Networks — nn module
● Autograd
Core task: Developing and training deep learning models.
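The idea behind a tape-based autodiff system can be illustrated with a minimal pure-Python sketch. This is not PyTorch's implementation: the `Tape` and `Var` classes and the `backward` function are hypothetical names introduced only for illustration, and only scalar addition and multiplication are supported.

```python
class Tape:
    """Records one backward step per operation, in forward order."""

    def __init__(self):
        self.steps = []


class Var:
    """A scalar value that accumulates its gradient in `grad`."""

    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)

        def step():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        self.tape.steps.append(step)
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)

        def step():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad

        self.tape.steps.append(step)
        return out


def backward(output):
    """Replay the tape in reverse to propagate gradients to the inputs."""
    output.grad = 1.0
    for step in reversed(output.tape.steps):
        step()


tape = Tape()
x = Var(3.0, tape)
y = Var(4.0, tape)
z = x * y + x      # z = 15, dz/dx = y + 1, dz/dy = x
backward(z)
print(z.value, x.grad, y.grad)
```

Recording operations as they execute and replaying them in reverse is what lets the graph be defined by ordinary Python control flow.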

4. Scikit-learn: Scikit-learn is another actively used machine learning library for Python. It
includes easy integration with different ML programming libraries like NumPy and Pandas.
Scikit-learn comes with the support of various algorithms such as:
● Classification
● Regression
● Clustering
● Dimensionality Reduction
● Model Selection
● Preprocessing
Built around the idea of being easy to use yet flexible, Scikit-learn is focused on data
modeling and not on other tasks such as loading, handling, manipulation and visualization of
data. It is considered sufficient to serve as an end-to-end ML library, from the research phase
to deployment.
Core Task: Modeling
5. Pandas: Pandas is a Python data analysis library used primarily for data manipulation
and analysis. It comes into play before the dataset is prepared for training. Pandas makes working
with time series and structured multidimensional data effortless for machine-learning
programmers. Some of the great features of Pandas when it comes to handling data are:
● Dataset reshaping and pivoting
● Merging and joining of datasets
● Handling of missing data and data alignment
● Various indexing options such as Hierarchical axis indexing, Fancy indexing
● Data filtration options
Pandas is built around the DataFrame object, a two-dimensional, labeled
representation of tabular data.
Core task: Data manipulation and analysis
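A few of the features listed above (missing-data handling, pivoting, merging and filtering) can be sketched on a toy table. The column names and values below are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one score per patient and visit, with one missing value.
readings = pd.DataFrame({
    "patient": ["p1", "p1", "p2", "p2"],
    "visit": ["v1", "v2", "v1", "v2"],
    "score": [0.8, np.nan, 0.5, 0.7],
})
patients = pd.DataFrame({"patient": ["p1", "p2"], "age": [34, 51]})

# Handling of missing data: replace NaN with the mean of the observed scores.
filled = readings.fillna({"score": readings["score"].mean()})

# Reshaping and pivoting: one row per patient, one column per visit.
wide = filled.pivot(index="patient", columns="visit", values="score")

# Merging and joining: attach each patient's age to every reading.
merged = filled.merge(patients, on="patient", how="left")

# Data filtration: keep only readings from patients older than 40.
older = merged[merged["age"] > 40]

print(wide)
print(older)
```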
6. NLTK: NLTK stands for Natural Language Toolkit and is a Python library for
natural language processing. It is considered one of the most popular libraries for working with
human language data. NLTK offers simple interfaces along with a wide array of lexical
resources such as FrameNet, WordNet, Word2Vec and several others. Some of
the highlights of NLTK are:
● Searching keywords in documents
● Tokenization and classification of texts
● Recognition on voice and handwriting
● Lemmatization and Stemming of words
NLTK and its suite of packages are considered a reliable choice for students, engineers,
researchers, linguists and industries that work with language.
Core Task: Text processing
7. NumPy: The NumPy library for Python concentrates on handling large multi-dimensional
data and the intricate mathematical functions operating on the data. NumPy offers speedy
computation and execution of complicated functions working on arrays. A few of the points in
favor of NumPy are:
● Support for mathematical and logical operations
● Shape manipulation
● Sorting and Selecting capabilities
● Discrete Fourier transformations
● Basic linear algebra and statistical operations
● Random simulations
● Support for n-dimensional arrays
NumPy works on an object-oriented approach and has tools for integrating C, C++ and Fortran
code, and this makes NumPy highly popular amongst the scientific community.
Core task: Data cleaning and manipulation
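Several of the capabilities listed above can be shown in a few lines. This is a small illustrative sketch on toy arrays, not a complete tour of the library.

```python
import numpy as np

a = np.arange(12)                    # 1-D array holding 0..11
m = a.reshape(3, 4)                  # shape manipulation: 3 rows, 4 columns
col_sums = m.sum(axis=0)             # statistical operation along an axis
evens = a[a % 2 == 0]                # logical operation used for selection
descending = np.sort(a)[::-1]        # sorting

# Discrete Fourier transform of a constant signal: all energy in bin 0.
spectrum = np.fft.fft(np.ones(4))

# Basic linear algebra: solve A @ x = b.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

# Random simulation with a seeded generator for reproducibility.
rng = np.random.default_rng(0)
samples = rng.normal(size=1000)

print(col_sums, evens, x)
```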

2. Investigation on publicly available dataset for medical image classification,
segmentation and natural image classification problems (minimum 10 dataset
information in each domain).
Datasets for Medical Image Classification:
1. Chest X-Ray Images (Pneumonia): The dataset is organized into 3 folders (train, test, val)
and contains subfolders for each image category (Pneumonia/Normal). There are 5,863 X-Ray
images (JPEG) and 2 categories (Pneumonia/Normal).
Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric
patients of one to five years old from Guangzhou Women and Children's Medical Center,
Guangzhou. All chest X-ray imaging was performed as part of patients' routine clinical care.
For the analysis of chest x-ray images, all chest radiographs were initially screened for quality
control by removing all low quality or unreadable scans. The diagnoses for the images were then
graded by two expert physicians before being cleared for training the AI system. In order to
account for any grading errors, the evaluation set was also checked by a third expert.
Reference: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia
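A dataset laid out as split folders (train, test, val) with one subfolder per category can be inspected with a short helper. This is a sketch under the assumption that the layout is exactly `root/split/category/file`; the function name `count_images` and the root folder name used in the usage note are hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_images(root, extensions=(".jpeg", ".jpg", ".png")):
    """Return {(split, category): number of image files} for root/split/category/file."""
    counts = Counter()
    for path in Path(root).glob("*/*/*"):
        if path.is_file() and path.suffix.lower() in extensions:
            split, category = path.parts[-3], path.parts[-2]
            counts[(split, category)] += 1
    return dict(counts)
```

For example, `count_images("chest_xray")` would report how many images each split holds per category, which is a quick way to check class balance before training.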
2. Blood Cell Images: This dataset contains 12,500 augmented images of blood cells (JPEG)
with accompanying cell type labels (CSV). There are approximately 3,000 images for each of 4
different cell types grouped into 4 different folders (according to cell type). The cell types are
Eosinophil, Lymphocyte, Monocyte, and Neutrophil. This dataset is accompanied by an
additional dataset containing the original 410 images (pre-augmentation) as well as two
additional subtype labels (WBC vs WBC) and also bounding boxes for each cell in each of these
410 images (JPEG + XML metadata). More specifically, the folder 'dataset-master' contains 410
images of blood cells with subtype labels and bounding boxes (JPEG + XML), while the folder
'dataset2-master' contains 2,500 augmented images as well as 4 additional subtype labels (JPEG
+ CSV). There are approximately 3,000 augmented images for each class of the 4 classes as
compared to 88, 33, 21, and 207 images of each in the folder 'dataset-master'.
Reference: https://www.kaggle.com/datasets/paultimothymooney/blood-cells
3. COVID-19 Radiography Database: A team of researchers from Qatar University, Doha,
Qatar, and the University of Dhaka, Bangladesh along with their collaborators from Pakistan and
Malaysia in collaboration with medical doctors have created a database of chest X-ray images for
COVID-19 positive cases along with Normal and Viral Pneumonia images. This COVID-19,
normal and other lung infection dataset is released in stages. The first release contained
219 COVID-19, 1341 normal and 1345 viral pneumonia chest X-ray (CXR) images. The first
update increased the COVID-19 class to 1200 CXR images. The 2nd update
increased the database to 3616 COVID-19 positive cases along with 10,192 Normal, 6012 Lung
Opacity (Non-COVID lung infection) and 1345 Viral Pneumonia images. The authors plan to continue
updating this database as new X-ray images of COVID-19 pneumonia patients become available.
Reference: https://www.kaggle.com/datasets/tawsifurrahman/covid19-radiography-database
4. Breast Ultrasound Images Dataset: Breast cancer is one of the most common causes of
death among women worldwide. Early detection helps in reducing the number of early deaths.
The data reviews the medical images of breast cancer using ultrasound scan. Breast Ultrasound
Dataset is categorized into three classes: normal, benign, and malignant images. Breast
ultrasound images can produce great results in classification, detection, and segmentation of
breast cancer when combined with machine learning. The data collected at baseline include
breast ultrasound images from women between 25 and 75 years old. The data was
collected in 2018 from 600 female patients. The dataset consists of 780
images with an average image size of 500×500 pixels. The images are in PNG format, and the
ground truth images are presented with the original images.
Reference: https://www.kaggle.com/datasets/aryashah2k/breast-ultrasound-images-dataset
5. Retinal OCT Images (optical coherence tomography): The dataset is organized into 3
folders (train, test, val) and contains subfolders for each image category
(NORMAL, CNV, DME, DRUSEN). There are 84,495 OCT images (JPEG) and 4 categories
(NORMAL, CNV, DME, DRUSEN). Images are labeled as (disease)-(randomized patient ID)-(image
number by this patient) and split into 4 directories: CNV, DME, DRUSEN, and
NORMAL. Optical coherence tomography (OCT) images (Spectralis OCT, Heidelberg
Engineering, Germany) were selected from retrospective cohorts of adult patients from the
Shiley Eye Institute of the University of California San Diego, the California Retinal Research
Foundation, Medical Center Ophthalmology Associates, the Shanghai First People's Hospital,
and Beijing Tongren Eye Center between July 1, 2013 and March 1, 2017. Before training, each
image went through a tiered grading system consisting of multiple layers of trained graders of
increasing expertise for verification and correction of image labels. Each image imported into
the database started with a label matching the most recent diagnosis of the patient. The first tier
of graders consisted of undergraduate and medical students who had taken and passed an OCT
interpretation course review. This first tier of graders conducted initial quality control and
excluded OCT images containing severe artifacts or significant image resolution reductions. The
second tier of graders consisted of four ophthalmologists who independently graded each image
that had passed the first tier. The presence or absence of choroidal neovascularization (active or
in the form of subretinal fibrosis), macular edema, drusen, and other pathologies visible on the
OCT scan were recorded. Finally, a third tier of two senior independent retinal specialists, each
with over 20 years of clinical retina experience, verified the true labels for each image. The
dataset selection and stratification process is displayed in a CONSORT-style diagram in Figure
2B. To account for human error in grading, a validation subset of 993 scans was graded
separately by two ophthalmologist graders, with disagreement in clinical labels arbitrated by a
senior retinal specialist.
Reference: https://www.kaggle.com/datasets/paultimothymooney/kermany2018
6. Ocular Disease Recognition: Ocular Disease Intelligent Recognition (ODIR) is a structured
ophthalmic database of 5,000 patients with age, color fundus photographs from left and right
eyes, and doctors' diagnostic keywords. This dataset is meant to represent a "real-life"
set of patient information collected by Shanggong Medical Technology Co., Ltd. from
different hospitals/medical centers in China. In these institutions, fundus images are captured by
various cameras on the market, such as Canon, Zeiss and Kowa, resulting in varied image
resolutions. Annotations were labeled by trained human readers with quality control
management. They classify patients into eight labels:
● Normal (N),
● Diabetes (D),
● Glaucoma (G),
● Cataract (C),
● Age related Macular Degeneration (A),
● Hypertension (H),
● Pathological Myopia (M),
● Other diseases/abnormalities (O)
Reference: https://www.kaggle.com/datasets/andrewmvd/ocular-disease-recognition-odir5k

7. Leukemia Classification: Acute lymphoblastic leukemia (ALL) is the most common type of
childhood cancer and accounts for approximately 25% of the pediatric cancers. These cells have
been segmented from microscopic images and are representative of images in the real-world
because they contain some staining noise and illumination errors, although these errors have
largely been fixed in the course of acquisition. The task of identifying immature leukemic blasts
from normal cells under the microscope is challenging due to morphological similarity and thus
the ground truth labels were annotated by an expert oncologist. In total there are 15,135 images
from 118 patients with two labeled classes: Normal cell and Leukemia blast.
Reference: https://www.kaggle.com/datasets/andrewmvd/leukemia-classification
8. NIH Chest X-rays: Chest X-ray exams are one of the most frequent and cost-effective
medical imaging examinations available. However, clinical diagnosis of a chest X-ray can be
challenging and sometimes more difficult than diagnosis via chest CT imaging. The lack of large
publicly available datasets with annotations means it is still very difficult, if not impossible, to
achieve clinically relevant computer-aided detection and diagnosis (CAD) in real world medical
sites with chest X-rays. One major hurdle in creating large X-ray image datasets is the lack of
resources for labeling so many images. Prior to the release of this dataset, Openi was the largest
publicly available source of chest X-ray images with 4,143 images available. This NIH Chest X-
ray Dataset consists of 112,120 X-ray images with disease labels from 30,805 unique patients.
To create these labels, the authors used Natural Language Processing to text-mine disease
classifications from the associated radiological reports. The labels are expected to be >90%
accurate and suitable for weakly-supervised learning. The original radiology reports are not
publicly available but you can find more details on the labeling process in this Open Access
paper: "ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-
Supervised Classification and Localization of Common Thorax Diseases." (Wang et al.)
Reference: https://www.kaggle.com/datasets/nih-chest-xrays/data
9. Tuberculosis (TB) Chest X-ray Database: A team of researchers from Qatar University,
Doha, Qatar, and the University of Dhaka, Bangladesh along with their collaborators from
Malaysia in collaboration with medical doctors from Hamad Medical Corporation and
Bangladesh have created a database of chest X-ray images for Tuberculosis (TB) positive cases
along with Normal images. In the current release, 700 TB images are publicly accessible,
2800 more TB images can be downloaded from the NIAID TB portal by signing an agreement, and
there are 3500 normal images.
Reference: https://www.kaggle.com/datasets/tawsifurrahman/tuberculosis-tb-chest-xray-dataset
10. Dermnet: The data consists of images of 23 types of skin diseases taken from
http://www.dermnet.com/dermatology-pictures-skin-disease-pictures. The total number of
images is around 19,500, of which approximately 15,500 are in the training set
and the remainder in the test set.
Reference: https://www.kaggle.com/datasets/shubhamgoel27/dermnet
Datasets for Segmentation:
1. Liver Tumor Segmentation: Liver cancer is the fifth most commonly occurring cancer in
men and the ninth most commonly occurring cancer in women. There were over 840,000 new
cases in 2018. The liver is a common site of primary or secondary tumor development. Due to
their heterogeneous and diffusive shape, automatic segmentation of tumor lesions is very
challenging. In light of that, we encourage the development of automatic segmentation
algorithms to segment liver lesions in contrast-enhanced abdominal CT scans. The data and
segmentations are provided by various clinical sites around the world.
This dataset was extracted from LiTS – Liver Tumor Segmentation Challenge (LiTS17)
organized in conjunction with ISBI 2017 and MICCAI 2017.
Reference: https://www.kaggle.com/datasets/andrewmvd/liver-tumor-segmentation
2. Aerial Semantic Segmentation Drone Dataset: The Semantic Drone Dataset focuses on
semantic understanding of urban scenes for increasing the safety of autonomous drone flight and
landing procedures. The imagery depicts more than 20 houses from nadir (bird's eye) view
acquired at an altitude of 5 to 30 meters above ground. A high resolution camera was used to
acquire images at a size of 6000x4000px (24Mpx). The training set contains 400 publicly
available images and the test set is made up of 200 private images.
Reference: https://www.kaggle.com/datasets/bulentsiyah/semantic-drone-dataset
3. MRI Hippocampus Segmentation: Hippocampus segmentation on magnetic resonance
imaging is of key importance for the diagnosis, treatment decision and investigation of
neuropsychiatric disorders. Automatic segmentation is an active research field, with many recent
models using deep learning. The dataset has two directories, 'label' and 'original'. Inside
'label', 100 label images are kept in one directory and 35 label images in another.
The overall dataset is about 167 MB.
Reference: https://www.kaggle.com/datasets/sabermalek/mrihs
4. Brain Tumor Segmentation(BraTS2020): All BraTS multimodal scans are available as
NIfTI files (.nii.gz) and describe a) native (T1) and b) post-contrast T1-weighted (T1Gd), c) T2-
weighted (T2), and d) T2 Fluid Attenuated Inversion Recovery (T2-FLAIR) volumes, and were
acquired with different clinical protocols and various scanners from multiple (n=19) institutions,
mentioned as data contributors here. All the imaging datasets have been segmented manually, by
one to four raters, following the same annotation protocol, and their annotations were approved
by experienced neuro-radiologists. Annotations comprise the GD-enhancing tumor (ET — label
4), the peritumoral edema (ED — label 2), and the necrotic and non-enhancing tumor core
(NCR/NET — label 1), as described both in the BraTS 2012-2013 TMI paper and in the latest
BraTS summarizing paper. The provided data are distributed after their pre-processing, i.e., co-
registered to the same anatomical template, interpolated to the same resolution (1 mm^3) and
skull-stripped.
Reference: https://www.kaggle.com/datasets/awsaf49/brats2020-training-data
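The label values described above (1, 2 and 4) can be turned into per-region voxel counts with NumPy. The sketch below uses a synthetic array in place of a real segmentation volume; loading the NIfTI files would need an additional library and is not shown. The name `region_voxel_counts` is a hypothetical helper. Because the data are resampled to 1 mm^3 voxels, each count can also be read as a volume in mm^3.

```python
import numpy as np

# Label values as given in the dataset description.
LABELS = {"NCR/NET": 1, "ED": 2, "ET": 4}


def region_voxel_counts(segmentation):
    """Count the voxels carrying each tumor sub-region label."""
    seg = np.asarray(segmentation)
    return {name: int((seg == value).sum()) for name, value in LABELS.items()}


# Synthetic 4x4x4 segmentation used only to exercise the function.
toy = np.zeros((4, 4, 4), dtype=np.uint8)
toy[0, 0, :2] = 1       # 2 voxels of necrotic / non-enhancing core
toy[1, :2, :2] = 2      # 4 voxels of peritumoral edema
toy[2, 0, 0] = 4        # 1 voxel of enhancing tumor

counts = region_voxel_counts(toy)
print(counts)
```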
5. People Clothing Segmentation: The dataset contains 1000 images and 1000 corresponding
semantic segmentation masks each of size 825 pixels by 550 pixels in PNG format. The
segmentation masks belong to 59 classes, the first being the background of individuals, and the
rest belong to 58 clothing classes such as shirt, hair, pants, skin, shoes, glasses and so on. A CSV
file containing the list of 59 classes is included in the dataset. The dataset contains data in both
JPEG and PNG formats. Note that JPEG is lossy, while PNG is lossless and preserves the
original pixel values.
Reference: https://www.kaggle.com/datasets/rajkumarl/people-clothing-segmentation
6. Semantic segmentation of aerial imagery: The dataset consists of aerial imagery of Dubai
obtained by MBRSC satellites and annotated with pixel-wise semantic segmentation in 6 classes.
The total volume of the dataset is 72 images grouped into 6 larger tiles. The classes are:
● Building: #3C1098
● Land (unpaved area): #8429F6
● Road: #6EC1E4
● Vegetation: #FEDD3A
● Water: #E2A929
● Unlabeled: #9B9B9B
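Since the classes are encoded as hex colors, an RGB mask has to be converted to integer class indices before training. A minimal sketch follows, assuming the masks are RGB arrays of shape (height, width, 3). The index order and the helper names `hex_to_rgb` and `mask_to_class_ids` are assumptions made for illustration.

```python
import numpy as np

# Hex colors as listed in the dataset description; the index order is an assumption.
CLASS_COLORS = {
    "Building": "#3C1098",
    "Land": "#8429F6",
    "Road": "#6EC1E4",
    "Vegetation": "#FEDD3A",
    "Water": "#E2A929",
    "Unlabeled": "#9B9B9B",
}


def hex_to_rgb(code):
    """Convert '#RRGGBB' into an (R, G, B) tuple of integers."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def mask_to_class_ids(rgb_mask):
    """Map an (H, W, 3) RGB mask to an (H, W) array of class indices (-1 if no match)."""
    rgb_mask = np.asarray(rgb_mask)
    class_ids = np.full(rgb_mask.shape[:2], -1, dtype=np.int64)
    for index, hex_code in enumerate(CLASS_COLORS.values()):
        matches = np.all(rgb_mask == np.array(hex_to_rgb(hex_code)), axis=-1)
        class_ids[matches] = index
    return class_ids
```

Exact color matching like this only works on losslessly stored masks; compression artifacts would leave pixels unmatched (index -1).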
Reference: https://www.kaggle.com/datasets/humansintheloop/semantic-segmentation-of-aerial-imagery
7. Wine Customer Segmentation: These data are the results of a chemical analysis of wines
grown in the same region in Italy but derived from three different cultivars. The analysis
determined the quantities of 13 constituents found in each of the three types of wines.
Reference: https://www.kaggle.com/datasets/sadeghjalalian/wine-customer-segmentation
8. Segmentation Full Body MADS Dataset: The dataset contains a total of 1192 images across the
collages, images and masks directories. The overall size of the dataset is 487 MB.
Reference: https://www.kaggle.com/datasets/tapakah68/segmentation-full-body-mads-dataset
9. Nails segmentation: This small dataset can be used as a starting point for a nails segmentation
model. It has two directories containing the images and the labels. The size of the dataset is 27
MB.
Reference: https://www.kaggle.com/datasets/vpapenko/nails-segmentation
10. Semantic Segmentation for Self Driving Cars: This dataset provides images and
labeled semantic segmentations captured via the CARLA self-driving car simulator. The data was
generated as part of the Lyft Udacity Challenge. This dataset can be used to train ML algorithms
to perform semantic segmentation of cars, roads, etc. in an image. The data has 5 sets of 1000
images and corresponding labels.
Reference: https://www.kaggle.com/datasets/kumaresanmanickavelu/lyft-udacity-challenge
Datasets for Natural image classification:
1. Intel Image Classification: This Data contains around 25k images of size 150x150
distributed under 6 categories.
{'buildings' -> 0,
'forest' -> 1,
'glacier' -> 2,
'mountain' -> 3,
'sea' -> 4,
'street' -> 5 }

The Train, Test and Prediction data are provided in separate zip files. There are around 14k images in
Train, 3k in Test and 7k in Prediction.
Reference: https://www.kaggle.com/datasets/puneet6060/intel-image-classification
2. English Handwritten Characters: This dataset contains 3,410 images of handwritten
characters in English. This is a classification dataset that can be used for Computer Vision tasks.
It contains 62 classes with 55 images of each class. The 62 classes are 0-9, A-Z and a-z.
Reference: https://www.kaggle.com/datasets/dhruvildave/english-handwritten-characters-dataset
3. Arthropod Taxonomy Orders Object Detection Dataset: The dataset consists of images of
arthropods in jpeg format and object boundary boxes in json format. There are between one and
50 objects per image. This dataset is actively maintained, and new orders will be added on a
regular basis. Currently, the following orders are covered with at least 2000 objects per order.
Reference: https://www.kaggle.com/datasets/mistag/arthropod-taxonomy-orders-object-detection-dataset
4. NumtaDB: Bengali Handwritten Digits: The dataset is a combination of six datasets that
were gathered from different sources and at different times. However, each of them was checked
rigorously under the same evaluation criterion so that all digits were at least legible to one human
being without any prior knowledge. The collection methodology, image segmentation and
extraction, and image formats of these datasets are
described at https://bengali.ai/datasets. The sources are labeled from 'a' to 'f'. The training and
testing sets have separate subsets depending on the source of the data (training-a, testing-a, etc.).
All the datasets have been partitioned into training and testing sets so that handwriting from the
same subject/contributor is not present in both. Dataset-f had no corresponding metadata for
contributors, so all of it was added to the testing set (testing-f). The metric for the
competition is selected to be the Unweighted Average Accuracy (UAA). Starter codes for the
competition are available at https://github.com/BengaliAI.
Reference: https://www.kaggle.com/datasets/BengaliAI/numta
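The Unweighted Average Accuracy can be sketched as follows, assuming it means the plain mean of the accuracies computed separately on each testing subset (testing-a, testing-b, and so on), so that every subset counts equally regardless of its size. That reading is an assumption; the competition's own definition should be checked. The function name and the toy labels are illustrative only.

```python
def unweighted_average_accuracy(results):
    """results maps a subset name to a (true_labels, predicted_labels) pair."""
    accuracies = []
    for y_true, y_pred in results.values():
        correct = sum(t == p for t, p in zip(y_true, y_pred))
        accuracies.append(correct / len(y_true))
    # Every subset contributes equally, whatever its number of samples.
    return sum(accuracies) / len(accuracies)


# Toy predictions: subset accuracies are 3/4 and 1/2.
toy_results = {
    "testing-a": ([1, 2, 3, 4], [1, 2, 3, 0]),
    "testing-b": ([5, 6], [5, 0]),
}
uaa = unweighted_average_accuracy(toy_results)
print(uaa)
```

In this toy case the pooled accuracy over all six samples would be 4/6, whereas the unweighted average of 0.75 and 0.5 is 0.625, which shows how the metric keeps a large subset from dominating.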
5. Visual Question Answering- Computer Vision & NLP: VQA is a multimodal task wherein,
given an image and a natural language question related to the image, the objective is to produce a
natural language answer correctly as output. It involves understanding the content of the image
and correlating it with the context of the question asked. Because we need to compare the
semantics of information present in both modalities, the image and the natural language
question related to it, VQA entails a wide range of sub-problems in both CV and NLP (such as
object detection and recognition, scene classification, counting, and so on). Thus, it is considered
an AI-complete task. The size of the dataset is 431 MB.
Reference: https://www.kaggle.com/datasets/bhavikardeshna/visual-question-answering-computer-vision-nlp
6. Beauty Classification- Beautiful or Average: The data set consists of a training set, a test
set, a validation set and a consolidated set. Each of these sets contains two folders labeled as
Beautiful and Average. For the training set there are 2000 images in the Beautiful folder and
2000 images in the Average folder. For the test set there are 150 images in the beautiful folder
and 150 images in the Average folder. The validation set is similarly divided into 150 images of
each type. The consolidated folder contains 2300 images in the Beautiful folder and 2300 images in the
Average folder. The consolidated folder combines the images from the training, test and
validation sets into a single set. This is convenient for users that want to create their own splits of
training, test and validation images. All images are 224 X 224 X 3 color jpg images. Images have
been cropped to only show the face. All data sets were run through a duplicate image detector
and all duplicate images were removed to prevent leakage between the data sets.
Reference: https://www.kaggle.com/datasets/gpiosenka/beauty-detection-data-set
7. Coffee Bean Dataset Resized: The coffee beans were roasted at "Bona Coffee," JJ Mall
Jatujak. There are four roasting levels. The green (un-roasted) and the lightly roasted beans
are both Laos Typica Bolaven (Coffea arabica). Doi Chaang (Coffea arabica) is medium roasted,
whereas Brazil Cerrado (Coffea arabica) is dark roasted. The coffee bean photos are captured
with an iPhone 12 Mini with a 12-megapixel back camera, Ultra-wide, and Wide camera. The
camera is set at a location with a plane parallel to the object's path when photographs are
being captured. Images of roasted coffee beans are captured in a variety of settings to
validate a wide range of roasted coffee bean image inputs. This experiment employs both LED
light from a light box and natural light to shoot the dataset; then the image's noise is
enhanced by putting each variety of coffee beans in a container. Images are automatically
collected and saved in PNG format. Each bean image is 3024x3032 pixels in size. There are
4800 photos in total, classified into 4 degrees of roasting, with 1200 photos under each
degree.
Reference: https://www.kaggle.com/datasets/gpiosenka/coffee-bean-dataset-resized-224-x-224
8. American Sign Language: The dataset contains coloured images of hand signs representing
different American sign language alphabets. The size of the dataset is 5 GB.
Reference: https://www.kaggle.com/datasets/ayuraj/asl-dataset
9. Scene Classification: This dataset contains about 25k images from a wide range of natural
scenes from all around the world. The task is to identify which kind of scene an image can be
categorized into. The size of the dataset is 387 MB. It is a 6-class problem:
● Buildings
● Forests
● Mountains
● Glacier
● Street
● Sea
10. Metastatic Tissue Classification - PatchCamelyon: The PatchCamelyon benchmark
(PCAM) consists of 327,680 color images (96 x 96px) extracted from histopathologic scans of
lymph node sections. Each image is annotated with a binary label indicating presence of
metastatic tissue. Fundamental machine learning advancements are predominantly evaluated on
straightforward natural-image classification datasets. Medical imaging is becoming one of
the major applications of ML and thus deserves a spot on the list of go-to ML datasets, both to
challenge future work and to steer developments into directions that are beneficial for this
domain.
Reference: https://www.kaggle.com/datasets/andrewmvd/metastatic-tissue-classification-patchcamelyon

3. Identify the research problems in signal and image classification domain
(minimum 5 research problem statements) – it may be from the medical
domain or in any other domains.
i) Brain tumor detection from MRI images using deep learning techniques
The healthcare sector differs from other industries: it is a high-priority sector, and
people expect the highest level of care and services regardless of cost. After the success of deep
learning in other real-world applications, it is also providing exciting solutions with good
accuracy for medical imaging and is a key method for future applications in the health sector.
The brain is the organ that controls the activities of all parts of the body. Automated recognition of
brain tumors in magnetic resonance imaging (MRI) is a difficult task because tumors vary widely in size
and location. In this research, statistical analysis, morphological and thresholding
techniques are proposed to process MRI images for brain tumor detection.
A feed-forward backpropagation neural network will be used to classify the
tumor regions of the image. The results produced by this approach are expected to increase accuracy and
reduce the number of iterations.
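The thresholding and morphological steps can be sketched with NumPy alone. This is an illustrative toy, not the cited paper's pipeline: the fixed threshold, the 3x3 structuring element and the function names are assumptions, and a synthetic array stands in for an MRI slice.

```python
import numpy as np


def _neighborhoods(mask):
    """Stack the nine 3x3-shifted views of a boolean mask (zero padded)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    return np.stack([padded[r:r + h, c:c + w] for r in range(3) for c in range(3)])


def erode(mask):
    """A pixel survives only if its whole 3x3 neighborhood is set."""
    return _neighborhoods(mask).all(axis=0)


def dilate(mask):
    """A pixel is set if any pixel in its 3x3 neighborhood is set."""
    return _neighborhoods(mask).any(axis=0)


def segment_bright_region(image, threshold):
    """Threshold the image, then apply a morphological opening (erode, then dilate)."""
    mask = np.asarray(image) > threshold
    return dilate(erode(mask))


# Synthetic slice: a 3x3 bright blob plus one isolated bright noise pixel.
image = np.zeros((7, 7))
image[2:5, 2:5] = 200
image[0, 6] = 255

cleaned = segment_bright_region(image, threshold=100)
print(cleaned.astype(int))
```

The opening removes the isolated noise pixel while keeping the 3x3 blob, which is the kind of candidate region a classifier would then label.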
Reference: https://iopscience.iop.org/article/10.1088/1757-899X/1055/1/012115
ii) Design a Deep Neural Network with Simulated Data to Detect WLAN Router
Impersonation
In order to mislead network users into connecting to it, a malicious agent will try to
impersonate a genuine router in a technique known as router impersonation. Simple digital
identifiers used in security identification systems, such as MAC addresses, IP addresses, and SSIDs,
are ineffective at spotting such an attack because these identifiers are simple to fake. As a result, a
more secure method uses additional data alongside these simple digital IDs,
including the radio link's RF signature. At the receiver, a wireless transmitter-receiver pair
generates a distinctive RF signature that combines RF and channel impairments. The practice of
identifying transmitting radios in a shared spectrum using these fingerprints is known as RF
fingerprinting. The researchers created a deep learning (DL) network that recognises the
transmitting radio from raw baseband in-phase/quadrature (IQ) samples. If the RF impairments
are prevalent or the channel profile remains constant during the operation duration, the network

15 | P a g e
can identify the sending radios. When the receiver position is also fixed, most WLAN networks
have fixed routers that produce a static channel profile. By comparing the received signal's RF
fingerprint and MAC address pair to those of the recognised routers in this situation, the deep
learning network can spot router impersonators.
Reference: https://in.mathworks.com/help/comm/ug/design-a-deep-neural-network-with-simulated-data-to-detect-wlan-router-impersonation.html
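The comparison of the (MAC address, RF fingerprint) pair described above can be sketched as a
simple decision rule. This is only an illustration under stated assumptions: the router
database, the label names and the function classify_transmitter are hypothetical, and the
fingerprint label is assumed to come from a separately trained network that is not shown.

```python
# Hypothetical database of known routers: MAC address -> label that the
# (assumed) RF fingerprinting network outputs for that router.
KNOWN_ROUTERS = {
    "00:11:22:33:44:01": "router_1",
    "00:11:22:33:44:02": "router_2",
}

def classify_transmitter(mac_address, predicted_label, known_routers=KNOWN_ROUTERS):
    """Compare the claimed MAC address with the RF fingerprint label.

    Returns "known" when both agree with the database, "impersonator" when a
    known MAC address arrives with a fingerprint that does not match it, and
    "unknown" when the MAC address is not in the database at all.
    """
    expected_label = known_routers.get(mac_address)
    if expected_label is None:
        return "unknown"
    if predicted_label == expected_label:
        return "known"
    return "impersonator"

# Usage with made-up classifier outputs:
status_ok = classify_transmitter("00:11:22:33:44:01", "router_1")
status_spoofed = classify_transmitter("00:11:22:33:44:01", "router_2")
status_new = classify_transmitter("AA:BB:CC:DD:EE:FF", "router_1")
```

The point of the rule is that a spoofed MAC address alone is not enough: the fingerprint
predicted from the IQ samples must also match the router that owns that address.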
iii) Online Handwritten Signature Verification System
Online (dynamic) signature verification uses pressure-sensitive tablets to capture the
dynamic features of a signature in addition to its shape. Dynamic features, which make a
signature more distinctive and harder to forge, include the number and order of strokes, the
overall speed of the signature, the pen pressure at each point, etc. Users are first enrolled in
an online signature verification system by supplying sample signatures (reference signatures).
When a user later presents a signature (test signature) claiming to be a particular person, the
test signature is compared with that person's reference signatures. The user is rejected if the
dissimilarity exceeds a predetermined threshold. During verification, the test signature is
compared with every signature in the reference set, yielding several distance values. To
determine how much the test signature differs from the reference set, these distance values must
be combined into a single number that is compared with a threshold. The minimum, maximum, or
average of all the distances can serve as this single dissimilarity value; a verification system
typically picks one of these and ignores the others. The false rejection rate (FRR) of genuine
signatures and the false acceptance rate (FAR) of forged signatures are the two key measures of
how well a signature verification system performs. Because these two error rates are inversely
related, the equal error rate (EER), the point at which FAR equals FRR, is frequently reported.
Reference:
https://www.diva-portal.org/smash/get/diva2:1177227/FULLTEXT01.pdf
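The decision rule and the error rates described above can be sketched in a few lines of Python.
This is a minimal illustration, not the method of the cited work: the function names and the
sample scores are assumptions, and the distance values are taken as already computed by some
signature-matching step that is not shown.

```python
from statistics import mean

# The three ways of combining distances that the text mentions.
COMBINERS = {"min": min, "max": max, "mean": mean}

def dissimilarity(distances, mode="min"):
    """Combine the distances to all reference signatures into one value."""
    return COMBINERS[mode](distances)

def verify(distances, threshold, mode="min"):
    """Accept the test signature unless its dissimilarity exceeds the threshold."""
    return dissimilarity(distances, mode) <= threshold

def far_frr(genuine_scores, forgery_scores, threshold):
    """Return (FAR, FRR) for dissimilarity scores at a given threshold."""
    frr = sum(s > threshold for s in genuine_scores) / len(genuine_scores)
    far = sum(s <= threshold for s in forgery_scores) / len(forgery_scores)
    return far, frr

def equal_error_rate(genuine_scores, forgery_scores):
    """Approximate the EER by scanning observed scores as candidate thresholds."""
    candidates = sorted(set(genuine_scores) | set(forgery_scores))

    def gap(t):
        far, frr = far_frr(genuine_scores, forgery_scores, t)
        return abs(far - frr)

    best = min(candidates, key=gap)
    far, frr = far_frr(genuine_scores, forgery_scores, best)
    return (far + frr) / 2, best

# Made-up dissimilarity scores for illustration only.
genuine = [0.1, 0.2, 0.3, 0.6]
forgery = [0.4, 0.7, 0.8, 0.9]
eer, eer_threshold = equal_error_rate(genuine, forgery)
```

Raising the threshold accepts more signatures, which lowers FRR but raises FAR; this is the
inverse relationship that makes the EER a convenient single summary figure.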

iv) Face Recognition for Real Time Application
In today's high-risk environments, such as the military and other organizations,
protection and surveillance mechanisms are of critical importance. Face recognition is a crucial
step towards more thorough and accurate surveillance. Face images occupy a high-dimensional
space, whose dimensionality needs to be reduced. After face features are extracted from every
image in the database, the pattern recognition method tries to match them. Feature extraction
and pattern recognition are therefore the two main issues. To improve the system's overall
recognition rate, all of the faces must be registered beforehand. These factors encourage
researchers to look for novel approaches to each of these issues and to integrate them into a
fully functional system with high accuracy. The problem statement for face recognition in
real-time applications is as follows: increase the speed (frames/sec) and perform face
recognition at high camera resolution so that recognition runs in real time.
Reference:
https://www.grin.com/document/380686#:~:text=The%20main%20problem%20of%20face,images%20present%20in%20the%20database.
v) Real-time traffic sign recognition system
Traffic sign recognition (TSR) is one of the most important research areas for enabling
autonomous vehicle driving systems. Autonomous driving systems need intelligent, real-time
situational analysis; there is no time for elaborate transformations or advanced image
processing methods, so they must handle input data differently. In a city-like setting, where
there are many traffic signs, advertisements, parked cars, pedestrians, and other moving or
background objects, this requirement becomes harder to meet. This work aims to create a CNN
model that can efficiently classify traffic signs in real time using OpenCV. The model was
designed to keep computational cost low. The paper also includes an experiment in which several
parameter combinations are varied to improve the performance of the model.
Reference: https://www.researchgate.net/publication/224255280_Real-time_traffic_sign_recognition_system
