Assignment No.
Q.1 What are the 7 practices of text analytics?
i) Search and information retrieval:
Storage and retrieval of text documents, including search engines and keyword search.
ii) Document clustering:
Grouping and categorizing terms, snippets, paragraphs, or documents using data mining clustering methods.
iii) Document classification:
Grouping and categorizing snippets, paragraphs, or documents using data mining classification methods, based on models trained on labeled examples.
iv) Information extraction:
Identification and extraction of relevant facts and relationships from unstructured text; the process of making structured data from unstructured and semi-structured text.
Stemming: Reduces words to their root form. Less accurate, as it may result in non-dictionary words.
Lemmatization: Reduces words to their canonical or base form based on their meaning. More accurate, as it considers the dictionary form of words.
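The difference between cutting a word down to a root and mapping it to a dictionary form can be sketched in Python. This is only a toy illustration: the suffix list and the small lemma dictionary are assumptions made up for the example, not a real stemming algorithm or lexicon.

```python
# Toy comparison of root-form reduction and dictionary-form reduction.
# SUFFIXES and LEMMAS are hypothetical, hand-picked for illustration only.
SUFFIXES = ("ing", "ies", "es", "ed", "s")
LEMMAS = {"studies": "study", "studying": "study", "better": "good", "ran": "run"}


def stem(word):
    """Chop the first matching suffix; the result may not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Look the word up in a dictionary of known forms; fall back to the word."""
    return LEMMAS.get(word, word)


for w in ("studies", "studying", "better"):
    print(w, "| stem:", stem(w), "| lemma:", lemmatize(w))
```

Here "studies" is cut to "stud", which is not a dictionary word, while the lookup returns "study".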
of text analytics.
Text mining, also known as text analytics or text data mining, is the process of extracting useful insights and information from unstructured textual data. It involves techniques from NLP, ML, and computational linguistics to analyze and interpret large volumes of text.
FINOLEX ACADEMY OF MANAGEMENT AND TECHNOLOGY, RATNAGIRI
Practice areas of text analysis:
a) Search and Information Retrieval:
This area involves efficiently storing and retrieving text documents using search engines and keyword-based search queries. It often focuses on matching user queries with relevant documents from large collections such as web pages, databases, or document repositories.
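Matching a keyword query against a collection can be sketched with a small inverted index. The three documents and the function names `build_index` and `search` are assumptions for illustration; a real search engine adds ranking, stemming, and much more.

```python
from collections import defaultdict

# Hypothetical mini-collection standing in for web pages or a document repository.
docs = {
    1: "text analytics extracts insights from text",
    2: "search engines match user queries with documents",
    3: "keyword search retrieves relevant text documents",
}


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(docs)
print(sorted(search(index, "text documents")))  # only document 3 has both words
```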
b) Document Clustering:
Document clustering involves grouping and categorizing text documents or snippets based on their similarity. It employs data mining clustering methods such as k-means clustering or hierarchical clustering to organize documents into meaningful clusters, which can aid in exploratory data analysis, document organization, and topic discovery.
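A minimal k-means sketch over word-count vectors shows how documents end up grouped by similarity. The four sample documents and the hand-picked initial centers are assumptions for the example; library implementations choose initial centers more carefully.

```python
def vectorize(texts):
    """Turn each text into a word-count vector over a shared, sorted vocabulary."""
    vocab = sorted({w for t in texts for w in t.split()})
    return [[t.split().count(w) for w in vocab] for t in texts]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_ids, iterations=10):
    """Plain k-means; init_ids picks which vectors serve as the starting centroids."""
    centroids = [list(vectors[i]) for i in init_ids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical documents: two about pets, two about markets.
texts = [
    "cat dog pet",
    "dog pet cat cat",
    "stock market price",
    "market price stock trade",
]
labels = kmeans(vectorize(texts), init_ids=[0, 2])
print(labels)  # pet documents share one label, market documents the other
```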
c) Document Classification:
Document classification is the process of automatically assigning predefined categories or labels to text documents, snippets, or paragraphs. It utilizes data mining classification methods, often based on ML algorithms trained on labeled examples, to classify documents into relevant categories. It is commonly used in spam detection, news categorization, and sentiment analysis.
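As one possible illustration of a classifier trained on labeled examples, the sketch below uses a small multinomial naive Bayes model with add-one smoothing for spam detection. Naive Bayes is only one of many suitable algorithms, and the four training messages are invented for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled examples ("ham" means not spam).
train = [
    ("win free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]


def fit(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = fit(train)
print(predict(model, "free prize offer"))
print(predict(model, "project meeting agenda"))
```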
d) Web Mining:
Web mining focuses on extracting knowledge and insights from data available on the WWW. It encompasses text and data mining techniques tailored for the unique characteristics of web data, such as the scale, interconnectedness, and heterogeneity of web resources. Web mining includes tasks
TF("cat"):
TF("cat", Document 1) = 1/4 = 0.25
TF("cat", Document 2) = 0
TF("cat", Document 3) = 1/7 ≈ 0.143
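A term-frequency calculation of this kind can be sketched as follows, assuming TF is the number of times the term occurs divided by the total number of words in the document. The three documents are hypothetical stand-ins, since the original documents are not reproduced here.

```python
def term_frequency(term, document):
    """Occurrences of the term divided by the total word count of the document."""
    words = document.lower().split()
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


# Hypothetical documents chosen only to illustrate the arithmetic.
documents = [
    "the cat sat down",               # 4 words, "cat" appears once
    "dogs bark loudly",               # "cat" does not appear
    "a cat and a dog play together",  # 7 words, "cat" appears once
]

for i, doc in enumerate(documents, start=1):
    print(f'TF("cat", Document {i}) = {term_frequency("cat", doc):.3f}')
```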
Hybrid Approaches:
Combine multiple techniques to improve accuracy, for example, combining lexicon-based methods with machine learning algorithms.
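One way such a combination might look is a weighted blend of a lexicon score and a model's predicted probability. Everything here is an assumption for illustration: the word lists, the `weight` parameter, and the idea that a trained model supplies `model_prob_positive` from elsewhere.

```python
# Hypothetical sentiment lexicon.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}


def lexicon_score(text):
    """Score in [-1, 1]: (positive hits - negative hits) / total hits."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def hybrid_sentiment(text, model_prob_positive, weight=0.5):
    """Blend the lexicon score with a model's probability of 'positive'.

    model_prob_positive is assumed to come from a separately trained classifier.
    """
    model_score = 2 * model_prob_positive - 1  # map [0, 1] onto [-1, 1]
    combined = weight * lexicon_score(text) + (1 - weight) * model_score
    return "positive" if combined >= 0 else "negative"


print(hybrid_sentiment("great phone but poor battery", model_prob_positive=0.8))
print(hybrid_sentiment("bad service", model_prob_positive=0.4))
```

In the first call the lexicon is undecided (one positive and one negative word), so the model's probability tips the result; in the second, both signals agree.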
Q.9 List and explain the steps in text analysis.
Stage 1: Data gathering
In this stage, you gather text data from internal or external sources.
Internal data:
Internal data is text content that is internal to your business and is readily available, e.g. emails, chats, invoices.
External data:
You can find external data in sources such as social media posts, online reviews, and news articles. It is harder to acquire external data because it is beyond your control. You might need to use web scraping tools or integrate with third-party solutions to extract external data.
Tokenization:
Tokenization is segregating the raw text into multiple parts that make semantic sense. For example, the phrase "text analytics benefits businesses" tokenizes to the words text, analytics, benefits, and businesses.
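A minimal word tokenizer can be sketched with a regular expression. The pattern below is an assumption that suits simple English text; production tokenizers handle hyphens, abbreviations, and other languages more carefully.

```python
import re


def tokenize(text):
    """Split raw text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


print(tokenize("Text analytics benefits businesses."))
```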
Part-of-Speech tagging:
Part-of-speech tagging assigns grammatical tags to the tokenized text. For example, applying this step to the previously mentioned tokens results in text: Noun; analytics: Noun; benefits: Verb; businesses: Noun.
Parsing:
Parsing establishes meaningful connections between the tokenized words with English grammar. It helps the text analysis software visualize the relationship between words.
Lemmatization:
Lemmatization is a linguistic process that simplifies words into their dictionary form, or lemma. For example, the dictionary form of visualizing is visualize.
Stop word removal:
Stop words are words that offer little or no semantic context to a sentence, such as and, or, and for. Depending on the use case, the software might remove them from the structured text.
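Filtering stop words from a token list can be sketched as below. The `STOP_WORDS` set is a short, hypothetical list; real stop word lists are longer and depend on the language and the use case.

```python
# Hypothetical, deliberately short stop word list.
STOP_WORDS = {"and", "or", "for", "the", "a", "of"}


def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


print(remove_stop_words(["tools", "for", "text", "and", "data", "analysis"]))
```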
Stage 3: Text analysis
Text analysis is the core part of the process, in which text analysis software processes the text by using different methods.
Text classification:
Classification is the process of assigning tags to the text data that are based on rules or ML-based systems.
Text extraction:
Extraction involves identifying the presence of specific keywords in the text and associating them with tags.
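Keyword-based extraction with tags can be sketched as a lookup over the words of a text. The `KEYWORD_TAGS` mapping and its tag names are invented for the example.

```python
# Hypothetical mapping from keywords to tags.
KEYWORD_TAGS = {
    "refund": "billing",
    "invoice": "billing",
    "crash": "bug",
    "error": "bug",
}


def extract_tags(text):
    """Return (keyword, tag) pairs for known keywords, in order of appearance."""
    found = []
    for word in text.lower().split():
        word = word.strip(".,!?;:")  # drop punctuation attached to the word
        if word in KEYWORD_TAGS:
            found.append((word, KEYWORD_TAGS[word]))
    return found


print(extract_tags("The app shows an error after the refund."))
```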
Stage 4: Visualization