DAV - Viva QnA
Semester : 6
Branch : AI-DS/ML - Engineering
Curated by Doubtly.in
Instagram : @mydoubtly
😃 Do follow us on Instagram
Important Links :
https://www.doubtly.in/semester-6-study-material-aids-ml/
https://www.doubtly.in/cloud-computing-viva-questions-with-answers-sem-6-ai-ds-ml/
https://www.doubtly.in/ml-viva-questions-with-answer-sem-6-ai-ds-ml/
https://www.doubtly.in/dav-viva-questions-with-answer-sem-6-ai-ds-ml/
Important Questions :
https://www.doubtly.in/semester-6-study-material-aids-ml/
Module 1: Introduction to Data Analytics and Lifecycle
Q1: What are the key roles required for a successful analytics project?
A1: Several key roles are vital for the success of an analytics project. Data scientists are
responsible for analyzing data and extracting insights. Data engineers manage data
pipelines and infrastructure. Domain experts provide context and understanding of the
business domain. Project managers coordinate tasks and resources. Business stakeholders
provide guidance and make decisions based on analytics outcomes.
Q2: Describe the Discovery phase of the Data Analytics Lifecycle.
A2: The Discovery phase involves understanding the business domain, defining the problem, identifying stakeholders,
interviewing sponsors, and formulating initial hypotheses. This phase is crucial for setting the
direction of the project and establishing a solid foundation for subsequent stages.
Q3: What happens during the Data Preparation phase?
A3: The Data Preparation phase focuses on getting the data ready for analysis. This
includes setting up the analytic environment, performing data extraction, transformation, and
loading (ETL), exploring and understanding the data, cleaning and formatting it, and creating
visualizations to gain insights into its characteristics and quality.
Q4: Explain the Model Planning phase in the Data Analytics Lifecycle.
A4: The Model Planning phase is where the analytics team decides on the approach to
analyze the data. It involves exploring different variables, selecting relevant features, and
choosing appropriate models for analysis. This phase lays the groundwork for building
predictive or descriptive models that will be used to derive insights from the data.
Q5: What are the common tools used during the Model Planning phase?
A5: Common tools for the Model Planning phase include statistical software such as R or
Python, along with libraries like scikit-learn or TensorFlow. These tools provide functionalities
for data exploration, feature selection, model training, and evaluation, helping analysts in the
decision-making process.
Q6: Describe the Model Building phase in the Data Analytics Lifecycle.
A6: In the Model Building phase, the team develops and trains the models planned earlier, evaluating
their accuracy and effectiveness. This phase requires iterative testing and refinement to
ensure the models meet the project objectives.
Q8: What is the purpose of the Operationalize phase?
A8: The Operationalize phase focuses on putting analytics models into production to deliver business
outcomes. It involves deploying the developed models into production systems, integrating
them with existing workflows, and establishing mechanisms for monitoring model
performance and updating them as needed. This phase ensures that analytics solutions
deliver value in real-world scenarios.
Q9: Why is it essential to involve key stakeholders during the Discovery phase?
A9: Involving key stakeholders during the Discovery phase ensures alignment between
analytics objectives and business goals. It helps in gathering relevant domain knowledge,
clarifying requirements, and identifying potential challenges early in the project lifecycle.
Stakeholder involvement fosters collaboration and ensures that the analytics solution meets
the needs of the organization.
Q10: How does data visualization contribute to the Data Analytics Lifecycle?
A10: Data visualization plays a crucial role in the Data Analytics Lifecycle by making
complex data more accessible and understandable. It helps analysts explore data patterns,
identify trends, and communicate insights effectively to stakeholders. Visualization
techniques such as charts, graphs, and dashboards facilitate decision-making by providing
visual representations of analytical findings.
Module 2: Regression Models
Q1: How does Simple Linear Regression work?
A1: Simple Linear Regression models the relationship between a single predictor and a continuous outcome by fitting a straight-line
equation to the data, calculating fitted values and residuals, and minimizing the sum of
squared residuals through the method of least squares.
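As a rough illustration, the least-squares estimates can be computed directly with NumPy (the data below is made up):
# Example of fitting a simple linear regression by least squares (illustrative data)
import numpy as np
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # predictor
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])    # outcome
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)               # observed minus fitted values
print(b0, b1, np.sum(residuals ** 2))       # intercept, slope, sum of squared residuals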
Q2: How is Multiple Linear Regression different from Simple Linear Regression?
A2: Multiple Linear Regression extends the concept of Simple Linear Regression to include
multiple predictor variables, modeling their combined effect on a continuous outcome.
Q3: What is Logistic Regression, and when is it used?
A3: Logistic Regression is a statistical method for modeling the probability of a binary
outcome. It is particularly useful when the dependent variable is categorical and has only two
possible outcomes, such as “yes” or “no,” “success” or “failure.”
Q4: Describe the Logistic Response function and its significance in Logistic
Regression.
A4: The Logistic Response function, also known as the sigmoid function, maps the linear
combination of predictor variables to the probability of a binary outcome. It ensures that
predicted probabilities fall between 0 and 1, making Logistic Regression suitable for
modeling probabilities.
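A minimal sketch of the sigmoid function (standard definition, illustrative inputs):
# Example of the logistic (sigmoid) response function
import numpy as np
def sigmoid(z):
    # maps any linear combination of predictors to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))
print(sigmoid(-4.0), sigmoid(0.0), sigmoid(4.0))   # ~0.018, 0.5, ~0.982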
Q5: What are odds ratios, and how are they interpreted in Logistic Regression?
A5: Odds ratios represent the change in the odds of the outcome occurring for a one-unit
change in the predictor variable. In Logistic Regression, they quantify the effect of each
predictor on the likelihood of the outcome, providing valuable insights into the relationship
between predictors and the outcome variable.
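For instance, exponentiating a hypothetical coefficient of 0.7 shows how an odds ratio is read:
# Example of converting a logistic regression coefficient to an odds ratio
import numpy as np
beta = 0.7                # hypothetical fitted coefficient
print(np.exp(beta))       # ~2.01: one more unit of the predictor roughly doubles the odds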
Q6: What are some similarities and differences between Linear Regression and
Logistic Regression?
A6: Both Linear Regression and Logistic Regression are types of regression models used
for predictive modeling. However, Linear Regression models continuous outcomes, while
Logistic Regression models binary outcomes. Additionally, the interpretation of coefficients
differs between the two models, with Linear Regression focusing on the change in the
dependent variable and Logistic Regression focusing on odds ratios.
Q7: How do you assess the performance of Regression models?
A7: Regression model performance can be assessed using various metrics such as
R-squared (for Linear Regression), accuracy, confusion matrix, and ROC curve (for Logistic
Regression). Cross-validation techniques and model selection methods help in choosing the
best-performing model.
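A brief sketch of computing these metrics with scikit-learn, using hypothetical labels and probabilities:
# Example of common classification metrics for a logistic regression model
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
y_true = [0, 0, 1, 1, 1, 0]                # hypothetical true labels
y_pred = [0, 1, 1, 1, 0, 0]                # hypothetical predicted labels
y_prob = [0.2, 0.6, 0.9, 0.8, 0.4, 0.1]    # hypothetical predicted probabilities
print(accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
print(roc_auc_score(y_true, y_prob))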
Q9: How are the coefficients of a Logistic Regression model interpreted?
A9: The coefficients in a Logistic Regression model represent the change in the log odds of
the outcome for a one-unit change in the predictor variable. Exponentiating these
coefficients gives the odds ratios, which quantify the impact of each predictor on the
likelihood of the outcome occurring.
Q10: What role does Cross-Validation play in regression modeling?
A10: Cross-Validation evaluates a model on data that was held out during fitting, which helps in
estimating the model’s predictive accuracy and identifying potential issues such as overfitting
or underfitting. Cross-Validation ensures that the model performs well on new data, beyond
the training dataset used for model fitting.
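A minimal cross-validation sketch with scikit-learn, assuming a synthetic classification dataset:
# Example of 5-fold cross-validation with scikit-learn (synthetic data)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
X, y = make_classification(n_samples=200, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())   # average accuracy across the 5 held-out folds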
Module 3: Time Series Analysis
Q2: What is the Box-Jenkins Methodology in Time Series Analysis?
A2: The Box-Jenkins Methodology is an iterative approach to building ARIMA models that involves three
steps: identification, estimation, and diagnostic checking of the model. This methodology
helps in selecting the appropriate ARIMA model to fit the data.
Q3: What is the Autocorrelation Function (ACF), and how is it used in Time Series
Analysis?
A3: The Autocorrelation Function (ACF) measures the correlation between observations at
different time lags within a time series. It helps in identifying patterns of correlation, such as
seasonality or trend, and selecting appropriate lag values for autoregressive or moving
average models.
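A short sketch of inspecting the ACF with statsmodels (synthetic series):
# Example of plotting the ACF of a time series with statsmodels
import numpy as np
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt
series = np.random.randn(200).cumsum()   # synthetic non-stationary series
plot_acf(series, lags=20)                # correlation of the series with its own lags
plt.show()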
Q4: What are Autoregressive (AR) Models, and how do they work in Time Series
Analysis?
A4: Autoregressive (AR) Models are time series models that use past observations of the
variable to predict future values. They assume that the current value of the variable depends
linearly on its previous values, with the addition of random error.
Q5: Describe Moving Average (MA) Models and their role in Time Series Analysis.
A5: Moving Average (MA) Models are time series models that use past forecast errors to
predict future values. They capture the short-term fluctuations in the data by modeling the
relationship between the current value and past forecast errors.
Q6: What is the difference between ARMA and ARIMA Models in Time Series
Analysis?
A6: ARMA (Autoregressive Moving Average) models combine both autoregressive and
moving average components to capture the temporal dependencies in the data. ARIMA
(Autoregressive Integrated Moving Average) models include an additional differencing step
to make the time series stationary before modeling.
Q7: How do you build and evaluate an ARIMA Model in Time Series Analysis?
A7: To build an ARIMA Model, you first identify the appropriate order of differencing (d),
autoregressive (p), and moving average (q) components using methods like ACF and Partial
Autocorrelation Function (PACF) plots. Then, you estimate the parameters and fit the model
to the data. Evaluation involves assessing the model’s goodness of fit using diagnostic tests
such as residual analysis and information criteria like AIC and BIC.
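A minimal sketch with statsmodels, assuming an ARIMA(1, 1, 1) order chosen from the ACF/PACF plots (synthetic data):
# Example of fitting and checking an ARIMA(1, 1, 1) model with statsmodels
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
series = np.random.randn(200).cumsum()    # synthetic series with a trend
model = ARIMA(series, order=(1, 1, 1))    # (p, d, q) chosen via ACF/PACF
result = model.fit()
print(result.summary())                   # coefficients, AIC/BIC
print(result.forecast(steps=5))           # 5-step-ahead forecast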
Q8: What are some reasons to choose ARIMA models for Time Series Analysis?
A8: ARIMA models are suitable for analyzing time series data with trend and seasonality
patterns. They provide interpretable parameters and can capture complex temporal
dependencies in the data. ARIMA models are widely used for forecasting applications in
various fields.
Q9: What precautions should be taken when using ARIMA models in Time Series
Analysis?
A9: When using ARIMA models, it’s essential to ensure that the time series is stationary or
can be made stationary through differencing. Care should be taken to avoid overfitting by
selecting appropriate model orders and validating the model’s performance on out-of-sample
data. Additionally, outliers and missing values should be handled appropriately before model
fitting.
Q10: How does Time Series Analysis differ from other types of data analysis?
A10: Time Series Analysis focuses specifically on data collected over time, aiming to
understand and forecast temporal patterns and trends. Unlike cross-sectional or panel data
analysis, which considers observations at a single point in time, Time Series Analysis
accounts for the sequential nature of data and the dependencies between observations.
Module 4: Text Analytics
Q1: How has text mining evolved as a field?
A1: Text mining has evolved considerably; advances in
computational linguistics, machine learning, and big data technologies have led to the
development of more sophisticated text mining techniques capable of extracting insights
from unstructured text data.
Q2: What are the seven practices of text analytics, and how do they contribute to the
field?
A2: The seven practices of text analytics encompass various techniques and methodologies
used for extracting meaning from unstructured text data. These practices include text
summarization, sentiment analysis, topic modeling, named entity recognition, document
categorization, entity linking, and concept extraction. Each practice addresses different
aspects of text analysis to derive valuable insights from textual information.
Q3: What are some application and use cases for text mining?
A3: Text mining finds applications across diverse domains such as customer feedback
analysis, market research, social media monitoring, healthcare informatics, and legal
document analysis. Use cases include sentiment analysis of product reviews, summarization
of news articles, topic modeling of research papers, and categorization of customer support
tickets.
Q4: What are the typical steps in a text mining process?
A4: A typical text mining process involves collecting the raw text,
preprocessing and cleaning the text, representing the text in a suitable format (e.g.,
bag-of-words or word embeddings), applying text mining techniques such as TF-IDF or topic
modeling, and interpreting the results to gain insights.
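A small sketch of the preprocessing and representation steps using scikit-learn's bag-of-words tools (made-up documents):
# Example of basic text cleaning and a bag-of-words representation
from sklearn.feature_extraction.text import CountVectorizer
docs = ["The product is great!", "Terrible service, would not recommend."]
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
bow = vectorizer.fit_transform(docs)      # documents x terms count matrix
print(vectorizer.get_feature_names_out())
print(bow.toarray())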
Q6: What is TF-IDF, and how is it calculated?
A6: TF-IDF (Term Frequency-Inverse Document Frequency) measures how important a term is to a document within a collection. It is
calculated by multiplying the term frequency (how often a term appears in a document) by
the inverse document frequency (how rare the term is across all documents). TF-IDF is
commonly used for text representation and feature weighting in text mining tasks.
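A short TF-IDF sketch with scikit-learn (note that scikit-learn applies a smoothed variant of the textbook IDF formula; the documents are made up):
# Example of computing TF-IDF weights with scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
docs = ["data analytics and data mining",
        "text mining of unstructured text",
        "data visualization"]
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(docs)       # documents x terms TF-IDF matrix
print(tfidf.get_feature_names_out())
print(weights.toarray().round(2))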
Q7: How do text analytics techniques like sentiment analysis and topic modeling help
in gaining insights from textual data?
A7: Sentiment analysis helps in understanding the emotional tone of text data, enabling
businesses to gauge customer opinions, identify trends, and address issues proactively.
Topic modeling, on the other hand, organizes textual data into coherent topics or themes,
allowing analysts to uncover hidden patterns, explore relationships, and extract actionable
insights from large text collections.
Module 5: Data Analytics and Visualization with R
Q1: How can you import and export data in R?
A1: In R, you can import data from external sources using functions like read.csv() for
CSV files, read.table() for tabular data, and readRDS() for R data files. Similarly, data
can be exported using functions like write.csv() and write.table().
# Example of importing a CSV file
data <- read.csv("data.csv")
# Example of exporting data to a CSV file
write.csv(data, "exported_data.csv")
Q2: What are the common data types and attributes in R?
A2: Common data types in R include numeric, character, logical, integer, and factor.
Attributes such as names, dimensions, and class define additional properties of objects in R.
Q3: How can you visualize the distribution of a numeric variable in R?
A3: You can plot a histogram with the hist() function.
# Example of visualizing a histogram of a numeric variable
hist(numeric_vector)
Other useful exploratory visualizations in R include heatmaps and correlation matrices.
Q7: What is the difference between data exploration and presentation in R?
A7: Data exploration in R involves understanding the structure and patterns in the data using
various visualization and statistical techniques. Presentation, on the other hand, focuses on
creating visually appealing and informative plots or reports to communicate the findings
effectively.
# Example of data exploration with a scatter plot
plot(data$X, data$Y)
Module 6: Data Analytics and Visualization with Python
Q1: What are the essential data libraries for data analytics in Python?
A1: Essential data libraries for data analytics in Python include Pandas for data manipulation
and analysis, NumPy for numerical computing, and SciPy for scientific computing and
statistical analysis.
# Example of using SciPy for a quick statistical summary
import scipy.stats
data = [1, 2, 3, 4, 5]
print(scipy.stats.describe(data))
Q2: How do you create a histogram using Matplotlib in Python?
A2: You can create a histogram with Matplotlib’s plt.hist() function.
import matplotlib.pyplot as plt
plt.hist(data)
plt.show()
Q3: How can you create a box plot and a violin plot using Matplotlib in Python?
A3: You can create a box plot using plt.boxplot() and a violin plot using
plt.violinplot() functions in Matplotlib.
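A minimal sketch of both plots side by side (illustrative values):
# Example of a box plot and a violin plot of the same data
import matplotlib.pyplot as plt
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]   # illustrative values with one outlier
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.boxplot(data)
ax1.set_title("Box plot")
ax2.violinplot(data)
ax2.set_title("Violin plot")
plt.show()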
Q4: What is the Seaborn library used for in Python?
A4: Seaborn is a Python visualization library based on Matplotlib that provides a high-level
interface for creating attractive and informative statistical graphics. It is particularly useful for
visualizing complex datasets and for creating visually appealing plots with minimal code.
# Example of visualizing pairwise relationships in a pandas DataFrame
import seaborn as sns
sns.pairplot(data)
Q7: How can you customize plots in Seaborn to improve their appearance?
A7: You can customize plots in Seaborn by modifying various aesthetics such as colors,
markers, line styles, labels, titles, and axis ticks using functions like sns.set_style(),
sns.set_palette(), and sns.despine().
sns.set_style("whitegrid")
sns.set_palette("pastel")
sns.despine(left=True)
plt.show()
Q8: How can you create a bar chart in Python?
A8: You can use Matplotlib’s plt.bar() function for creating bar charts in Python.
Q9: How does Seaborn relate to Matplotlib and Pandas?
A9: Seaborn builds on Matplotlib’s functionality and integrates seamlessly with Pandas data structures, making it easier to
visualize complex datasets.
Important Links :
https://www.doubtly.in/semester-6-study-material-aids-ml/
https://www.doubtly.in/cloud-computing-viva-questions-with-answers-sem-6-ai-ds-ml/
https://www.doubtly.in/ml-viva-questions-with-answer-sem-6-ai-ds-ml/
https://www.doubtly.in/dav-viva-questions-with-answer-sem-6-ai-ds-ml/
Important Questions :
https://www.doubtly.in/semester-6-study-material-aids-ml/
https://www.doubtly.in/semester-6-ai-ds-mu-question-papers/
Img Credits : Worker illustrations by Storyset