
Running Head: FIVE QUESTIONS 1

Five Questions

Student’s Name

Institutional Affiliation

Professor’s Name

Course Date

This study source was downloaded by 100000831236167 from CourseHero.com on 10-08-2022 01:46:52 GMT -05:00

https://www.coursehero.com/file/66078618/BI-assignmentdocx/

1. Explain the relationship among data mining, text mining, and sentiment analysis.

To understand the relationship among the three, it helps to first recall their definitions. Text mining refers to the set of processes needed to convert unstructured text into valuable structured data. It relies on both statistical and linguistic techniques to analyze unstructured formats and to combine the extracted information with actionable metadata (Aggarwal & Zhai, 2012). Data mining is the process of applying algorithms to extract and analyze useful content from data; it can be used to discover hidden relationships and patterns in datasets (Aggarwal & Zhai, 2012). Sentiment analysis is the process of computationally identifying and categorizing the opinions expressed in a piece of text (Aggarwal & Zhai, 2012). The three are therefore nested: text mining is the application of data mining to unstructured text, and sentiment analysis is a specialized form of text mining that recognizes patterns of opinion in that text (Aggarwal & Zhai, 2012).
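As a minimal sketch of how sentiment analysis sits on top of text mining, the Python example below first turns raw text into tokens and then scores those tokens against a word list. The lexicon and the function names are hypothetical and chosen only for illustration; they do not come from the cited sources, and real systems rely on much larger lexicons or trained models.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def sentiment_score(text: str) -> int:
    """Return the number of positive words minus the number of negative words."""
    # Text mining step: convert unstructured text into lowercase word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def classify(text: str) -> str:
    """Map the numeric score onto a coarse opinion category."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in ["I love this great phone", "Terrible battery and poor screen"]:
        print(review, "->", classify(review))
```

Counting words in this way ignores negation and sarcasm, which is one reason practical sentiment analysis needs the more sophisticated linguistic techniques mentioned above.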

2. In your own words, define text mining, and discuss its most popular applications.

In my own words, text mining is the process of transforming unstructured text data into meaningful, actionable information (Aggarwal & Zhai, 2012). Its most popular applications are knowledge management, risk management, spam filtering, business intelligence, social media analysis, contextual advertising, fraud detection through claims investigation, cybercrime prevention, and customer care (Aggarwal & Zhai, 2012).

3. What does it mean to induce structure into text-based data? Discuss the alternative

ways of inducing structure into them.

To induce structure into text-based data means applying and adapting mining algorithms, through an iterative process of selection and substitution, so that the text is presented in a structured form built around the terms of interest (Younis, 2015). There are several ways of inducing structure. The first is isolating keywords: tokenization splits the body of the text into individual words, each of which is treated as a token. The second is determining topics: the text is categorized by its main subject matter, and the suitable categories depend on the data source. The third is measuring sentiment: sentiment analysis is used to measure the tone of the text (Younis, 2015).
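The first approach, isolating keywords through tokenization, can be sketched in a few lines of Python. The example is an illustration under simple assumptions (lowercasing, splitting on non-alphanumeric characters) and the function names are hypothetical; it builds a document-term matrix, which is one common structured representation of text.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] is the count of
    vocabulary[j] in docs[i]."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    # The vocabulary is every distinct token, sorted for a stable column order.
    vocabulary = sorted(set().union(*counts))
    rows = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, rows


if __name__ == "__main__":
    docs = ["The service was slow", "Slow service, slow refund"]
    vocabulary, rows = document_term_matrix(docs)
    print(vocabulary)
    for row in rows:
        print(row)
```

Each row is now a fixed-length numeric record, so the free text can be handed to ordinary data mining algorithms such as clustering or classification.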

4. What is the role of NLP in text mining? Discuss the capabilities and limitations of NLP

in the context of text mining.

The role of NLP in text mining is to perform the linguistic analysis that helps the computer read the text. Its capabilities are as follows. First, because it draws on several methodologies, it can decipher ambiguities in human language. Second, it can perform entity extraction, automatic summarization, and disambiguation. For these to be effective, NLP requires a consistent knowledge base. The main limitation of NLP in text mining is the variety and ambiguity of text: people use language creatively, so the same words can carry different meanings in different contexts (Younis, 2015).
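Both the disambiguation capability and its dependence on a knowledge base can be illustrated with a small Python sketch. It follows the general idea of overlap-based word sense disambiguation (a simplified, Lesk-style approach, swapped in here for illustration and not taken from the cited sources). The `SENSES` dictionary is a hypothetical stand-in for a real knowledge base.

```python
import re

# Hypothetical miniature knowledge base: each sense of a word is described
# by a few context words that tend to appear near it.
SENSES = {
    "bank": {
        "financial institution": {"money", "loan", "account", "deposit"},
        "river edge": {"river", "water", "shore", "fishing"},
    }
}


def disambiguate(word, sentence):
    """Pick the sense whose context words overlap most with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    senses = SENSES[word]
    # On a tie (including zero overlap) max() returns the first sense listed.
    return max(senses, key=lambda sense: len(senses[sense] & context))


if __name__ == "__main__":
    print(disambiguate("bank", "She opened an account at the bank to deposit money"))
    print(disambiguate("bank", "We went fishing on the river bank"))
```

When a sentence contains none of the listed context words, the function simply falls back to the first sense, which shows in miniature how an incomplete knowledge base limits what NLP can resolve.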

5. Go to explore the sections on applications as well as software. Find names of at least

three additional packages for data mining and text mining.

The packages are quanteda, text2vec, and tidytext, all of which are R packages for text mining.

References

Aggarwal, C. C., & Zhai, C. (Eds.). (2012). Mining text data. Springer Science & Business Media.

Younis, E. M. (2015). Sentiment analysis and text mining for social media microblogs using open source tools: An empirical study. International Journal of Computer Applications, 112(5).
