
During my PhD I worked with text data, which is inherently unstructured. It had to be cleaned and formatted before the algorithms could consume it. I did the cleaning with regular expressions written as each requirement arose, along with some basic Python functions. The steps included removing HTML tags and stop words, converting the text to a single case, and removing punctuation and apostrophes.
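
The cleaning steps described above could be sketched roughly as follows. This is only a minimal illustration, not the code used in the PhD work: the function name `clean_text`, the tiny stop-word set, the choice of lower case as the single case, and the simple tag pattern are all assumptions made for the example.

```python
import re
import string

# Hypothetical stop-word list for illustration; the original list is not given.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "and", "or", "of", "to", "in", "it"}

# Matches anything that looks like an HTML tag, e.g. <p> or </div>.
TAG_RE = re.compile(r"<[^>]+>")

# Matches ASCII punctuation plus typographic apostrophes/quotes.
PUNCT_RE = re.compile(f"[{re.escape(string.punctuation)}‘’]")


def clean_text(text: str) -> str:
    """Strip tags, lower-case, drop punctuation/apostrophes, remove stop words."""
    text = TAG_RE.sub(" ", text)       # replace tags with a space so words stay apart
    text = text.lower()                # convert everything to one case (assumed: lower)
    text = PUNCT_RE.sub("", text)      # remove punctuation and apostrophes
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_text("<p>The cat's toy is RED, and it's new!</p>"))
```

A regex-based tag pattern like this is adequate for simple markup; for messy real-world HTML a proper parser would be a safer choice.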
