
Overview

In this problem, you will use the data from the chapter assigned for this week, specifically
Problem 20.6, "Online Discussions on Autos and Electronics," in which the task is to develop a model
to classify documents as either auto-related or electronics-related.
In R, your job is to:

• Load the above file into R and create a label vector.
• Preprocess the documents. Explain what would be different if you did not perform the “stemming” step.
• Use the lsa package in R to create 10 concepts. Explain how the concept matrix differs from the
  TF-IDF matrix.
• Using the concept matrix, fit a predictive model (different from the model presented in the chapter
  illustration) to classify documents as autos or electronics. Compare its performance to that of the
  model presented in the chapter illustration.

The Chapter 20 R code file is attached for this assignment.
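To make the TF-IDF versus concept matrix distinction concrete, here is a minimal base-R sketch on a toy corpus. It is not the chapter's code: the six documents, the two-concept setting, and the names `docs`, `tdm`, `tfidf`, and `concepts` are all hypothetical, and the TF-IDF weighting is a simple variant that may differ in detail from what the tm package computes. The truncated SVD is meant only to illustrate the idea behind the `lsa()` function, whose `dk` component plays the role of `concepts` here.

```r
# Toy corpus: hypothetical stand-ins for the auto and electronics posts.
docs <- c(
  "car engine oil brake",
  "engine brake tire car",
  "tire oil car dealer",
  "laptop battery screen chip",
  "chip screen phone battery",
  "phone laptop charger chip"
)
label <- c(1, 1, 1, 0, 0, 0)  # 1 = autos, 0 = electronics

# Term-document count matrix: one row per term, one column per document.
tokens <- strsplit(docs, " ")
terms <- sort(unique(unlist(tokens)))
tdm <- vapply(
  tokens,
  function(tok) as.numeric(table(factor(tok, levels = terms))),
  numeric(length(terms))
)
rownames(tdm) <- terms

# Simple TF-IDF variant (assumption): raw term count times log2(N / document frequency).
doc_freq <- rowSums(tdm > 0)
idf <- log2(ncol(tdm) / doc_freq)
tfidf <- tdm * idf  # idf is recycled down the rows, so row i is scaled by idf[i]

# Truncated SVD: keep the first k right singular vectors as document-by-concept scores.
k <- 2
decomp <- svd(tfidf)
concepts <- decomp$v[, 1:k, drop = FALSE]

dim(tfidf)     # terms x documents
dim(concepts)  # documents x concepts
```

The TF-IDF matrix has one row per term and grows with the vocabulary, while the concept matrix has one row per document and only `k` columns, which is the form a classifier can take as predictors alongside `label`. For the assignment itself, `k` would be 10 and the matrix would come from the lsa package rather than a hand-rolled SVD.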
