1. Explain the “bag of words” model for text representation.
2. Different sources can introduce outliers into a dataset. Discuss what outliers are. (https://www.scribbr.com/statistics/outliers/)
3. Describe the tf-idf score of a word. Explain why it is (mostly) used as the weight of the word in similarity-score computation. (https://towardsdatascience.com/tf-idf-for-document-ranking-from-scratch-in-python-on-real-world-dataset-796d339a)
4. Describe autoregressive models. Why are these models applicable to a large class of FMCG and drug items?
5. Justify the statement: “The moving average method of forecasting relies on a window of past k observations.”
6. Define a time series. Describe the necessary and sufficient conditions for a stationary time series.
7. Give the mathematical derivation of the ARIMA model as a combination of AR and MA models.
8. Differentiate between collaborative and content-based filtering techniques for recommendation systems.
9. Justify the statement: “Forecasting a time series requires (1) decomposition, followed by (2) prediction of the decomposed components, and (3) reconstruction of the time series from the predicted components.”
10. Give a quantitative analysis of the statement: “In the span of a few years, customers could have instant access to half a million movies on a streaming service, millions of products on an e-commerce platform, and millions of news articles and pieces of user-generated content; thus recommendation systems are critically required.”
11. Justify the statement: “An Intrusion Detection System (IDS) is a special class of Anomaly Detection System (ADS) that is critically required in almost every software security model.”
12. Compare and contrast distance-based and density-based outlier detection.
13. Define the exponential smoothing process for time series forecasting. Also explain the significance of the smoothing parameter in the forecast values.
14. Explain, in brief, the fundamental components of a time series.
15. Justify the statement, in terms of exponential smoothing of a time series: “The Feb 2022 forecast depends not only on the actual Jan 2022 values but also on the previously forecast Jan 2022 values.”
16. Explain the terms Term Frequency (TF) and Inverse Document Frequency (IDF). What role does TF-IDF play in the formulation of a similarity index between two paragraphs?
17. Compare and contrast neighborhood-based and latent matrix factorization-based methods for collaborative filtering.
18. Comment on the statement: “Any supervised classification or regression predictive model can be used to forecast a time series too, if the time series data are transformed into a particular format with a target label and input variables.”
19. Outline the importance of the word cloud as a means of interpreting a text corpus in graphical form.
20. In text processing, explain the process of stop-word removal. Also explain the importance of this technique. (https://towardsdatascience.com/text-pre-processing-stop-words-removal-using-different-libraries-f20bac19929a)
21. Explain how windowing can be used to transform a time series forecasting problem into a supervised machine learning problem, with a dataset consisting of a set of input variables and a target variable.
22. Comment on the statement: “Any supervised classification or regression predictive model can be used to forecast a time series too, if the time series data are transformed into a particular format with a target label and input variables.”
23. Describe the document vector. How can it be used to score the similarity between two documents?
24. For quantitative analysis, a time series should be decomposed into three basic components. Justify this statement and explain the processes of additive and multiplicative decomposition of a time series.
25. Give a brief overview of neighborhood-based collaborative filtering for recommendation systems.
26. Compare and contrast “moving average” and “weighted moving average” forecasting of time series.
27. Identify the text preprocessing operations that are critical for almost all text analytics models. Explain each of them. (https://blog.eduonix.com/artificial-intelligence/text-preprocessing-natural-language-processing/)
28. In the context of text processing, explain why words with a high IDF are important words of the corpus.
29. Explain how the convolution operation can be used as a method of feature detection. Hence, explain why CNNs are efficient at classifying images/handwriting.
30. Explain the following statement: “A corpus is essentially a bag of words, wherein the unique words correspond to the dimensions of a vector space.”
31. Detail the significance of the “trend” component of a time series. Compare the trend component with the seasonality component.
32. Using a suitable example, explain how supervised learning techniques can be applied for content-based filtering in recommendation engines.
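The questions on bag of words, tf-idf, and document vectors can be tied together in one minimal sketch. The corpus below is a toy example invented for illustration; tf is taken as the relative term frequency and idf as log(N/df), one common convention among several.

```python
import math
from collections import Counter

# Toy corpus: each document is treated as a bag of words (order is ignored).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})  # dimensions of the vector space
N = len(docs)

def tf_idf_vector(doc):
    """Map a tokenized document to its tf-idf document vector over the vocabulary."""
    counts = Counter(doc)
    vec = []
    for w in vocab:
        tf = counts[w] / len(doc)                 # term frequency (relative)
        df = sum(1 for d in tokenized if w in d)  # document frequency
        idf = math.log(N / df) if df else 0.0     # inverse document frequency
        vec.append(tf * idf)                      # tf-idf acts as the word's weight
    return vec

def cosine(u, v):
    """Cosine similarity between two document vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vectors = [tf_idf_vector(d) for d in tokenized]
```

Because the first two documents share words that the third lacks, their cosine score is higher, which is why tf-idf weights drive the similarity ranking.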
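The forecasting questions (moving average, weighted moving average, exponential smoothing) can likewise be sketched side by side. The series below is a hypothetical monthly demand series; the recursion in `exponential_smoothing` shows why each month's forecast depends both on the latest actual value and on the previous forecast.

```python
# Hypothetical monthly demand values, for illustration only.
series = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 136.0]

def moving_average_forecast(xs, k):
    """Forecast the next value as the unweighted mean of the past k observations."""
    window = xs[-k:]
    return sum(window) / len(window)

def weighted_moving_average_forecast(xs, weights):
    """Forecast with per-observation weights (oldest first); weights sum to 1."""
    window = xs[-len(weights):]
    return sum(w * x for w, x in zip(weights, window))

def exponential_smoothing(xs, alpha):
    """Simple exponential smoothing: each forecast blends the latest actual value
    with the previous forecast, so the Feb forecast depends on both the Jan
    actual and the Jan forecast."""
    forecast = xs[0]  # initialise with the first actual value
    for x in xs[1:]:
        forecast = alpha * x + (1 - alpha) * forecast
    return forecast
```

Note the contrast: the moving average weights the last k observations equally, the weighted variant lets recent observations dominate, and exponential smoothing implicitly weights the whole history with geometrically decaying weights controlled by alpha.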
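The windowing questions can be answered with a short sketch of the transformation itself: sliding a window over the series turns each block of past values into an input row and the next observation into its target label, after which any supervised regressor applies.

```python
def make_supervised(series, window):
    """Slide a window of length `window` over the series: each window of past
    values becomes one input row X, and the next observation becomes its
    target label y."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return X, y

ts = [10, 20, 30, 40, 50, 60]
X, y = make_supervised(ts, window=3)
# X = [[10, 20, 30], [20, 30, 40], [30, 40, 50]]
# y = [40, 50, 60]
```

Each row of X plays the role of the input variables and each entry of y the target, which is exactly the "particular format" the comment questions refer to.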