
SWARM AND EVOLUTIONARY COMPUTING

CO-423

Title: Denoising Dirty Documents using Genetic Algorithm

Team Members:
1. Harshit Jindal (2K17/CO/133)
2. Ayush Malik (2K17/CO/087)

Topic: Image Segmentation

Objective:
Given a dataset of dirty document images with various types of noise, such as
coffee stains, bleed-through, folding marks, and wrinkles, our aim is to remove the
noise and binarize the documents using genetic algorithms. Noise is undesirable
data that degrades image quality. Documents, especially historical manuscripts,
often suffer from degradation due to coffee or ink stains, bleed-through, faded ink,
folding marks, and wrinkles, which makes optical character recognition a challenging
task. Document image binarization, the task of classifying each pixel as foreground
or background, is an important pre-processing step in document image analysis. The
foreground consists of the document text, represented by black (low-intensity)
pixels, whereas the background is represented by white (high-intensity) pixels.
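
As a small illustration of the foreground/background classification described above, the following sketch binarizes a grayscale patch with a fixed threshold. The function name `binarize`, the sample patch, and the threshold value of 100 are assumptions made for illustration only; they are not taken from the project.

```python
def binarize(gray, threshold):
    """Map pixels below the threshold to 0 (foreground text), all others to 255 (background)."""
    return [[0 if px < threshold else 255 for px in row] for row in gray]


# Hypothetical 3x4 grayscale patch: dark ink (~30), a stain (~150), clean paper (~230).
patch = [
    [230, 30, 35, 228],
    [150, 28, 148, 231],
    [229, 152, 32, 227],
]

# With a threshold of 100 the stain pixels fall on the background side.
binary = binarize(patch, 100)
for row in binary:
    print(row)
```

The result depends entirely on the threshold: a value above 150 would classify the stain pixels as text, which is why the choice of threshold matters.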

Idea:
Thresholding is a type of image segmentation in which each pixel is assigned to one
of two classes according to its intensity, making the image easier to analyse. It
converts a colour or grayscale image into a binary image, i.e., one that is simply
black and white. Classical thresholding techniques perform poorly when the
grey-level distributions of the foreground and background overlap. We aim to use a
genetic algorithm (fitness evaluation, selection, crossover, and mutation) to
binarize the document image and remove noise. The genetic algorithm searches for
the optimum threshold value for binarization. The proposed algorithm will be tested
on a set of images, and the results will be compared with classical thresholding
techniques using evaluation metrics such as PSNR and MSE. The approach is not
restricted to document images and is expected to generalise to other images.
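
The search loop described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the proposal does not specify a fitness function, so Otsu's between-class variance is used as a stand-in, the chromosome is assumed to be a single 8-bit threshold, and tournament selection, single-point bit crossover, bit-flip mutation, elitism, and all parameter values are illustrative choices. MSE and PSNR are computed against a clean reference image, which is assumed to be available for evaluation.

```python
import math
import random


def histogram(gray):
    """Count how many pixels have each grey level 0..255."""
    hist = [0] * 256
    for row in gray:
        for px in row:
            hist[px] += 1
    return hist


def fitness(threshold, hist):
    """Between-class variance (Otsu's criterion); an assumed fitness function."""
    total = sum(hist)
    w0 = sum(hist[:threshold])          # pixels below the threshold
    w1 = total - w0
    if w0 == 0 or w1 == 0:
        return 0.0
    mu0 = sum(i * hist[i] for i in range(threshold)) / w0
    mu1 = sum(i * hist[i] for i in range(threshold, 256)) / w1
    return (w0 / total) * (w1 / total) * (mu0 - mu1) ** 2


def tournament(pop, scores, rng, k=3):
    """Selection: return the fittest of k randomly chosen individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """Single-point crossover on the 8-bit encodings of two thresholds."""
    mask = (1 << rng.randint(1, 7)) - 1
    return (a & ~mask) | (b & mask)


def mutate(t, rng, rate=0.05):
    """Flip each of the 8 bits independently with probability `rate`."""
    for bit in range(8):
        if rng.random() < rate:
            t ^= 1 << bit
    return t


def ga_threshold(gray, pop_size=20, generations=30, seed=0):
    """Evolve a population of candidate thresholds and return the fittest."""
    rng = random.Random(seed)
    hist = histogram(gray)
    pop = [rng.randint(0, 255) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(t, hist) for t in pop]
        best = pop[max(range(pop_size), key=lambda i: scores[i])]
        children = [best]               # elitism: carry the best forward
        while len(children) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    return max(pop, key=lambda t: fitness(t, hist))


def mse(a, b):
    """Mean squared error between two images of equal size."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


# Usage on a hypothetical image: dark text (30) on the left, bright paper (220) on the right.
gray = [[30] * 4 + [220] * 4 for _ in range(4)]
clean = [[0] * 4 + [255] * 4 for _ in range(4)]   # assumed clean reference
t = ga_threshold(gray)
binary = [[0 if px < t else 255 for px in row] for row in gray]
print("threshold:", t, "MSE:", mse(binary, clean), "PSNR:", psnr(binary, clean))
```

A single global threshold keeps the chromosome small, at the cost of handling uneven stains less well than a locally adaptive scheme would. The same MSE and PSNR functions can score a classical thresholding result against the reference, which gives the comparison the proposal describes.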
