1. Explain the “bag of words” model for text representation.
2. Different sources can introduce outliers into a dataset. Discuss what outliers are. (https://www.scribbr.com/statistics/outliers/)
3. Describe the tf-idf score of a word. Explain why it is (mostly) used as the weight of the word in similarity-score computation. (https://towardsdatascience.com/tf-idf-for-document-ranking-from-scratch-in-python-on-real-world-dataset-796d339a)
4. Describe autoregressive models. Why are these models applicable to a large class of FMCG and drug items?
5. Justify the statement: “The moving average method of forecasting relies on a window of past k observations.”
6. Define a time series. Describe the necessary and sufficient conditions for a stationary time series.
7. Give the mathematical derivation of the ARIMA model as a combination of AR and MA models.
8. Differentiate between collaborative and content-based filtering techniques for recommendation systems.
9. Justify the statement: “Forecasting a time series requires (1) decomposition, followed by (2) prediction of the decomposed components, and (3) reconstruction of the time series from the predicted components.”
10. Give a quantitative analysis of the statement: “In the span of a few years, customers could have instant access to half a million movies on a streaming service, millions of products on an e-commerce platform, and millions of news articles and pieces of user-generated content; thus recommendation systems are critically required.”
11. Justify the statement: “An Intrusion Detection System (IDS) is a special class of Anomaly Detection System (ADS) that is critically required in almost every software security model.”
12. Compare and contrast distance-based and density-based outlier detection.
13. Define the exponential smoothing process for time series forecasting. Also explain the significance of the smoothing parameter in the forecast values.
14. Explain, in brief, the fundamental components of a time series.
15. Justify the statement, in terms of exponential smoothing of a time series: “The Feb 2022 forecast depends not only on the actual Jan 2022 values but also on the previously forecast Jan 2022 values.”
16. Explain the terms Term Frequency (TF) and Inverse Document Frequency (IDF). What role does TF-IDF play in the formulation of a similarity index between two paragraphs?
17. Compare and contrast neighborhood-based and latent matrix factorization-based methods for collaborative filtering.
18. Comment on the statement: “Any supervised classification or regression predictive model can be used to forecast a time series too, if the time series data are transformed into a particular format with a target label and input variables.”
19. Outline the importance of the word cloud as a means of interpreting a text corpus in graphical form.
20. In text processing, explain the process of stop-word removal. Also explain the importance of this technique. (https://towardsdatascience.com/text-pre-processing-stop-words-removal-using-different-libraries-f20bac19929a)
21. Explain how windowing can be used to transform a time series forecasting problem into a supervised machine learning problem, with a dataset consisting of a set of input variables and a target variable.
22. Comment on the statement: “Any supervised classification or regression predictive model can be used to forecast a time series too, if the time series data are transformed into a particular format with a target label and input variables.”
23. Describe the document vector. How can it be used to score the similarity between two documents?
24. For quantitative analysis, a time series should be decomposed into three basic components. Justify this statement and explain the processes of additive and multiplicative decomposition of a time series.
25. Give a brief overview of neighborhood-based collaborative filtering for recommendation systems.
26. Compare and contrast “moving average” and “weighted moving average” forecasting of time series.
27. Identify the text preprocessing operations that are critical for almost all text analytics models. Explain each of them. (https://blog.eduonix.com/artificial-intelligence/text-preprocessing-natural-language-processing/)
28. In the context of text processing, explain why words with a high IDF are important words of the corpus.
29. Explain how the convolution operation can be used as a method of feature detection. Hence, explain why CNNs are efficient at classifying images/handwriting.
30. Explain the following statement: “A corpus is essentially a bag of words, wherein the unique words correspond to the dimensions of a vector space.”
31. Detail the significance of the “trend” component of a time series. Compare the trend component with the seasonality component.
32. Using a suitable example, explain how supervised learning techniques can be applied for content-based filtering in recommendation engines.
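The questions on bag of words, tf-idf, and document vectors can be tied together in one minimal sketch. The corpus below is a toy example invented for illustration; tf is taken as the relative term frequency and idf as log(N/df), one common convention among several.

```python
import math
from collections import Counter

# Toy corpus: each document is treated as a bag of words (order is ignored).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})  # dimensions of the vector space
N = len(docs)

def tf_idf_vector(doc):
    """Map a tokenized document to its tf-idf document vector over the vocabulary."""
    counts = Counter(doc)
    vec = []
    for w in vocab:
        tf = counts[w] / len(doc)                 # term frequency (relative)
        df = sum(1 for d in tokenized if w in d)  # document frequency
        idf = math.log(N / df) if df else 0.0     # inverse document frequency
        vec.append(tf * idf)                      # tf-idf acts as the word's weight
    return vec

def cosine(u, v):
    """Cosine similarity between two document vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vectors = [tf_idf_vector(d) for d in tokenized]
```

Because the first two documents share words that the third lacks, their cosine score is higher, which is why tf-idf weights drive the similarity ranking.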
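The forecasting questions (moving average, weighted moving average, exponential smoothing) can likewise be sketched side by side. The series below is a hypothetical monthly demand series; the recursion in `exponential_smoothing` shows why each month's forecast depends both on the latest actual value and on the previous forecast.

```python
# Hypothetical monthly demand values, for illustration only.
series = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 136.0]

def moving_average_forecast(xs, k):
    """Forecast the next value as the unweighted mean of the past k observations."""
    window = xs[-k:]
    return sum(window) / len(window)

def weighted_moving_average_forecast(xs, weights):
    """Forecast with per-observation weights (oldest first); weights sum to 1."""
    window = xs[-len(weights):]
    return sum(w * x for w, x in zip(weights, window))

def exponential_smoothing(xs, alpha):
    """Simple exponential smoothing: each forecast blends the latest actual value
    with the previous forecast, so the Feb forecast depends on both the Jan
    actual and the Jan forecast."""
    forecast = xs[0]  # initialise with the first actual value
    for x in xs[1:]:
        forecast = alpha * x + (1 - alpha) * forecast
    return forecast
```

Note the contrast: the moving average weights the last k observations equally, the weighted variant lets recent observations dominate, and exponential smoothing implicitly weights the whole history with geometrically decaying weights controlled by alpha.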
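The windowing questions can be answered with a short sketch of the transformation itself: sliding a window over the series turns each block of past values into an input row and the next observation into its target label, after which any supervised regressor applies.

```python
def make_supervised(series, window):
    """Slide a window of length `window` over the series: each window of past
    values becomes one input row X, and the next observation becomes its
    target label y."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return X, y

ts = [10, 20, 30, 40, 50, 60]
X, y = make_supervised(ts, window=3)
# X = [[10, 20, 30], [20, 30, 40], [30, 40, 50]]
# y = [40, 50, 60]
```

Each row of X plays the role of the input variables and each entry of y the target, which is exactly the "particular format" the comment questions refer to.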