
Viva Questions

Can you explain how you approached understanding the problem in your capstone project?

A: In understanding the problem for my capstone project, I began by conducting thorough research and gathering insights
into the domain. I engaged with stakeholders to comprehend their requirements, ensuring a comprehensive grasp of the
problem at hand.

How did you employ the Design Thinking framework to decompose the problem in your capstone project?

A: The Design Thinking framework helped me break down the complex problem into manageable components. By
empathizing with end-users, defining the problem, ideating potential solutions, prototyping, and testing iteratively, I
gained a holistic understanding and identified viable approaches.

What analytic approach did you employ, and how did you determine the data requirements for your capstone project?

A: I adopted a data-driven approach, leveraging analytics to derive insights. Identifying data requirements involved defining
key variables, assessing data availability, and ensuring data quality. This process was crucial for building a robust foundation
for subsequent modeling.

Can you elaborate on the modeling approach you used in your capstone project?

A: My modeling approach involved selecting appropriate algorithms based on the problem at hand. I iteratively
experimented with different models, fine-tuning parameters to enhance performance. This process allowed me to identify
the most effective model for the given context.
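This kind of iterative experimentation is often automated with a hyperparameter grid search. A minimal sketch, assuming scikit-learn, a Ridge regression, and a small synthetic dataset (none of which are stated to be the project's actual setup):

```python
# Illustrative hyperparameter search: try several regularization
# strengths and keep the one with the best cross-validated score.
# The model, grid, and data here are assumptions for illustration.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.5, size=120)

search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
best_alpha = search.best_params_["alpha"]  # the winning regularization strength
```

The same pattern extends to comparing different model families, not just parameters of one model.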

How did you validate the quality of your model, particularly using the train-test split evaluation method?

A: The train-test split involved dividing the dataset into training and testing sets. Training the model on the training set and
evaluating its performance on the unseen test set helped gauge its ability to generalize to new data, providing insights into
potential overfitting or underfitting.
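The train-test split can be sketched in a few lines of Python, assuming scikit-learn and a synthetic regression dataset (the capstone's actual data is not shown here):

```python
# Minimal train-test split evaluation on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + rng.normal(0, 1, size=200)  # linear signal plus noise

# Hold out 20% of the rows as an unseen test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
test_mse = mean_squared_error(y_test, model.predict(X_test))
# A large gap between training error and test_mse would suggest overfitting.
```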

Could you introduce the concept of cross-validation and explain its role in model validation?

A: Cross-validation involves partitioning the dataset into multiple subsets, training the model on different combinations,
and averaging the results. This technique provides a more robust evaluation, reducing the impact of data partitioning
variability and ensuring a more reliable assessment of the model's performance.
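A sketch of 5-fold cross-validation, again assuming scikit-learn and synthetic data for illustration:

```python
# k-fold cross-validation: the dataset is split into 5 folds; each fold
# serves once as the held-out set, and the scores are averaged.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X.ravel() + rng.normal(0, 1, size=100)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y,
                         cv=cv, scoring="neg_mean_squared_error")
mean_mse = -scores.mean()  # average MSE across the five folds
```

Averaging across folds makes the estimate less sensitive to any one lucky or unlucky split.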

Discuss the metrics of model quality using simple math and examples, focusing on RMSE (Root Mean Squared Error)
and MSE (Mean Squared Error).

A: RMSE and MSE quantify the difference between predicted and actual values. MSE is the average of the squared differences between predictions and actual values; RMSE is the square root of MSE, which expresses the error in the same units as the target variable. For example, when predicting housing prices, an RMSE of 10 means predictions deviate from actual prices by roughly $10 on average (assuming prices are measured in dollars).
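The arithmetic can be worked through with a few hypothetical prices (the numbers below are illustrative, not project data):

```python
# Worked MSE/RMSE example on three hypothetical price predictions.
import math

actual    = [200, 150, 300]   # hypothetical actual prices
predicted = [210, 140, 290]   # hypothetical model predictions

errors = [p - a for p, a in zip(predicted, actual)]   # [10, -10, -10]
mse  = sum(e ** 2 for e in errors) / len(errors)      # (100 + 100 + 100) / 3
rmse = math.sqrt(mse)                                 # back in price units
```

Here every prediction is off by 10 units, so MSE is 100 and RMSE is exactly 10, matching the "average deviation of 10" reading in the answer above.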

Why do you use Mean Squared Error, and when is it most appropriate in model evaluation?

A: Mean Squared Error is preferred when emphasizing larger errors over smaller ones. Squaring magnifies larger
differences, making MSE sensitive to outliers. It's suitable for regression problems, where predicting numerical values is
crucial. However, caution is necessary as MSE can be influenced by outliers, so understanding the data characteristics is
essential for its appropriate use.
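The outlier sensitivity is easy to demonstrate with made-up error values: one large error moves MSE far more than it moves the mean absolute error.

```python
# Squaring magnifies large errors: compare MSE and MAE when one
# outlier error is introduced. The error values are illustrative.
errors_small   = [1, 1, 1, 1]    # four small errors
errors_outlier = [1, 1, 1, 10]   # same, but with one large error

def mse(errs):
    return sum(e ** 2 for e in errs) / len(errs)

def mae(errs):
    return sum(abs(e) for e in errs) / len(errs)

# mse(errors_small) == 1.0, mse(errors_outlier) == 25.75
# mae(errors_small) == 1.0, mae(errors_outlier) == 3.25
```

The single outlier multiplies MSE by almost 26 but MAE by only about 3, which is why MSE-driven models work hard to avoid large deviations.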

How did you go about understanding the problem in your capstone project?

A: Understanding the problem involved thorough research, stakeholder interviews, and a detailed analysis of existing
solutions. This initial phase helped in defining the project scope and objectives.

Explain how you applied the Design Thinking framework to decompose the problem in your capstone project.

A: The Design Thinking framework was instrumental in breaking down the problem into empathize, define, ideate,
prototype, and test phases. This iterative process allowed for a holistic understanding and creative solutions.

What analytic approach did you adopt for your capstone project, and why?

A: I chose a combination of exploratory and confirmatory data analysis. Exploratory analysis helped in understanding the
data patterns, while confirmatory analysis validated hypotheses and ensured the reliability of findings.

How did you determine the data requirements for your capstone project?

A: Data requirements were identified through a careful consideration of variables crucial for model development. This
involved domain expertise, literature review, and iterative discussions with mentors.

Briefly describe the modeling approach you used in your capstone project.

A: I employed a machine learning approach, specifically regression, due to the nature of the problem. The selected
algorithm was based on its suitability for the dataset and the desired prediction outcomes.

Explain the importance of train-test split in evaluating model quality.

A: The train-test split allows for assessing how well the model generalizes to new, unseen data. It helps in identifying
overfitting or underfitting issues and ensures the model's robustness.

What is cross-validation, and why is it crucial in model evaluation?

A: Cross-validation involves partitioning the dataset into multiple subsets, training the model on different combinations,
and evaluating its performance. It provides a more robust estimate of the model's effectiveness, especially with limited
data.

Define RMSE and MSE as metrics of model quality, and provide a simple mathematical explanation.

A: RMSE (Root Mean Squared Error) and MSE (Mean Squared Error) measure the average squared difference between
predicted and actual values. RMSE is the square root of MSE, offering a more interpretable metric in the original units of
the data.

What is the rationale behind using Mean Squared Error as a metric?

A: MSE gives higher penalties for large errors, making it sensitive to outliers. This property is beneficial when the model
needs to prioritize minimizing the impact of larger deviations from the actual values.

In what scenarios would you recommend using Mean Squared Error for model evaluation?

A: Mean Squared Error is particularly useful when the focus is on accurately predicting continuous numerical values. It is
well-suited for regression problems where minimizing the impact of larger errors is crucial.

What is the model life cycle in artificial intelligence?

Answer: The model life cycle in AI refers to the stages a machine learning model goes through, including problem
definition, data collection, data preprocessing, model training, model evaluation, deployment, and maintenance.

Why is problem definition an essential step in the model life cycle?

Answer: Problem definition is crucial because it helps to clearly understand the objective, the type of data required, and
the desired outcomes. It lays the foundation for making informed decisions throughout the model life cycle.

Explain the importance of data collection in the model life cycle.


Answer: Data collection is vital as it provides the raw material for training and evaluating the model. The quality, quantity,
and relevance of data directly impact the model's performance.

What is data preprocessing, and why is it necessary in model development?

Answer: Data preprocessing involves cleaning, transforming, and organizing raw data to make it suitable for training a machine learning model. It is necessary for handling missing values and outliers and for ensuring data consistency, all of which improve the model's robustness.
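A minimal preprocessing sketch with pandas, using a toy table (the columns and values are assumptions for illustration): missing values are imputed and an implausible value is capped.

```python
# Toy preprocessing: impute missing values and cap an outlier.
import pandas as pd

df = pd.DataFrame({
    "age":    [25, None, 40, 120],   # None = missing, 120 = likely outlier
    "income": [50, 60, None, 55],
})

df["age"] = df["age"].fillna(df["age"].median())     # impute with the median
df["income"] = df["income"].fillna(df["income"].mean())
df["age"] = df["age"].clip(upper=100)                # cap implausible ages
```

Median imputation and capping are only two of many options; the right choices depend on why the values are missing and what the outliers represent.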

Describe the model training phase.

Answer: Model training involves using a labeled dataset to teach the model to make predictions or decisions. During
training, the model adjusts its parameters based on the input data to minimize the difference between predicted and
actual outcomes.
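The parameter-adjustment idea can be sketched with a one-parameter toy model trained by gradient descent (the data and learning rate are made up for illustration):

```python
# Toy training loop: a single weight w is adjusted by gradient descent
# to minimize the mean squared error between predictions w*x and targets y.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # generated from the true relationship y = 2x

w, lr = 0.0, 0.05
for _ in range(200):
    # gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad
# w converges toward 2.0, the slope that best fits the data
```

Real models have many more parameters, but the loop is the same: predict, measure the error, nudge the parameters to reduce it.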

How do you evaluate the performance of a machine learning model?

Answer: Model evaluation involves using metrics such as accuracy, precision, recall, F1 score, or others, depending on the
specific problem. It helps assess how well the model generalizes to new, unseen data.
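These metrics can be computed directly with scikit-learn; the labels below are a toy binary-classification example, not real model output:

```python
# Common classification metrics on toy true/predicted labels.
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # one false negative, one false positive

acc  = accuracy_score(y_true, y_pred)   # fraction of labels matched
prec = precision_score(y_true, y_pred)  # of predicted 1s, how many were right
rec  = recall_score(y_true, y_pred)     # of actual 1s, how many were found
f1   = f1_score(y_true, y_pred)         # harmonic mean of precision and recall
```

With 3 true positives, 1 false positive, and 1 false negative, all four metrics happen to equal 0.75 here; on imbalanced data they typically diverge, which is why the choice of metric matters.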

Why is deployment an important step in the model life cycle?

Answer: Deployment is crucial as it involves integrating the trained model into the real-world environment where it can
make predictions or decisions. A successfully deployed model adds value to the business or application.

What are the challenges in deploying a machine learning model?

Answer: Challenges in deployment may include dealing with hardware requirements, scalability issues, monitoring and
maintaining model performance, and addressing ethical concerns related to the model's impact on users or society.

Explain the concept of model maintenance.

Answer: Model maintenance involves monitoring the model's performance over time, updating it with new data, retraining
it when necessary, and addressing any issues that arise. It ensures that the model continues to provide accurate and
reliable predictions.

What is storytelling in the context of artificial intelligence?

Answer: Storytelling in artificial intelligence involves using narrative techniques to communicate insights, predictions, or
information derived from AI models in a coherent and engaging manner.

What is Data Storytelling?

Answer: Data Storytelling is the process of using data to communicate a narrative or message effectively. It involves
translating complex data insights into a compelling and easy-to-understand story for diverse audiences.

Why is Data Storytelling important in data analysis?

Answer: Data Storytelling is important because it helps make data more accessible and understandable. It engages and
communicates insights to stakeholders, facilitating better decision-making based on the data.

What are the key components of a good data story?

Answer: A good data story includes a clear narrative, relevant visuals, context, a target audience focus, and a compelling
message. It should guide the audience through the data insights in a logical and coherent manner.

How does visualization contribute to Data Storytelling?


Answer: Visualization is a powerful tool in Data Storytelling as it helps convey complex information quickly and intuitively.
Charts, graphs, and other visuals make data more accessible and aid in the effective communication of insights.

Can you explain the difference between data reporting and data storytelling?

Answer: Data reporting typically presents facts and figures, while data storytelling adds context, a narrative, and emotional
elements to the data, making it more engaging and memorable.

How do you choose the right visualizations for your data story?

Answer: Choosing the right visualizations depends on the nature of the data and the story you want to tell. Bar charts, line
graphs, pie charts, and heatmaps are among the options, and the choice should align with the data characteristics and the
story's objectives.

How can you make a data story more relatable to a non-technical audience?

Answer: To make a data story relatable, avoid jargon, use simple language, and provide real-world examples or analogies.
Visuals should be clear and accessible, and the story should address the audience's concerns or interests.

What role does context play in Data Storytelling?

Answer: Context is crucial in Data Storytelling as it helps the audience understand the significance of the data. Providing
background information, explaining the data source, and considering the broader context enhance the story's impact.

How do you handle conflicting data points in a data story?

Answer: Address conflicting data points transparently. Explain the reasons for discrepancies, highlight uncertainties, and
provide alternative scenarios if applicable. This adds credibility to the data story.

What are the ethical considerations in Data Storytelling?

Answer: Ethical considerations include ensuring data accuracy, avoiding misleading visualizations, protecting privacy, and
being transparent about any biases in the data. Ethical Data Storytelling is essential for maintaining trust with the audience.
