
BDS 1101-BSD 3101: Principles of Data Science

CAT 1

Alvin Maina Mwangi – 20/04372

a) Explain why effective communication is critical for a data scientist or analyst.
Support your answer with a valid example. (2 Marks)

Effective communication is crucial for a data scientist or analyst because it enables them
to convey their findings to stakeholders in a way that is understandable and actionable.
As data analysts identify trends and patterns and ask critical questions, they must be
able to communicate these insights effectively to drive business decisions. For example,
a data analyst may identify a particular demographic that is not engaging with a
company's product. If the analyst communicates this finding clearly to the marketing
team, the team can tailor its strategies to better engage this demographic, potentially
increasing the company's market share.

b) Discuss any three effective communication techniques for data analysts. (3 Marks)

Data Visualization: This involves presenting data in a graphical format, making
complex data simpler and more intuitive to understand. It can range from simple
bar graphs to complex heat maps or geospatial maps.

Storytelling with Data: This technique involves weaving a narrative around the data to
help stakeholders understand the significance of the findings. The story can explain what
the data means, why it's important, and how it can impact the business.

Active Listening: This means fully engaging with the feedback and queries from
stakeholders. By understanding their concerns and questions, a data analyst can better
tailor their communication to address these points.

c) Describe the four algorithms most commonly used by data scientists. (4 Marks)

Linear Regression: This is a statistical method used to model the relationship between a
dependent variable and one or more independent variables.
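As an illustration, the sketch below fits the single-variable case (one independent variable) with the ordinary least squares closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The function name and the data are hypothetical; the data were chosen to lie exactly on y = 2 + 3x.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimising squared error for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on y = 2 + 3x
xs = [0, 1, 2, 3, 4]
ys = [2, 5, 8, 11, 14]
a, b = fit_simple_linear_regression(xs, ys)
print(a, b)  # 2.0 3.0
```

With several independent variables the same least-squares idea applies, but it is usually solved with matrix methods in a numerical library.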

Logistic Regression: This is typically used for classification problems. It models the
probability that each input belongs to a particular category.
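A minimal sketch of how the probability is modelled, assuming one numeric feature and plain batch gradient descent on the average log-loss (libraries typically use more sophisticated solvers). The logistic (sigmoid) function maps the linear score w*x + b to a probability between 0 and 1. The function names, learning rate, epoch count, and the study-hours data are all hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w*x + b) by gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log-loss for one sample is (prediction - label) * input
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability that input x belongs to class 1."""
    return sigmoid(w * x + b)

# Hypothetical data: hours studied vs. exam outcome (1 = pass, 0 = fail)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 1, 0, 1, 0, 1, 1]
w, b = fit_logistic_regression(hours, passed)
print(predict_proba(w, b, 2), predict_proba(w, b, 7))
```

To turn the probability into a class label, a threshold (commonly 0.5) is applied.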

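Decision Trees: This algorithm splits the data into branches using a sequence of conditions
on feature values (for example, "age > 30"), and each leaf gives a prediction. It is a type
of supervised learning algorithm most often used for classification, although it can also
handle regression problems.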
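A tree is built by repeatedly choosing the condition that best separates the classes. The sketch below shows a single such split (a "decision stump") on one numeric feature, assuming Gini impurity as the splitting criterion, as CART does; other algorithms such as ID3 use entropy-based information gain instead. The function names and the age/purchase data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, labels):
    """Find the threshold t minimising the weighted Gini impurity of
    the two groups x <= t and x > t."""
    best_t, best_score = None, float("inf")
    # Skip the largest value: splitting there would leave the right side empty
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, labels) if x <= t]
        right = [y for x, y in zip(xs, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical data: customer age vs. whether they bought a product
ages = [22, 25, 30, 35, 40, 45, 50, 55]
bought = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, bought)
print(threshold, impurity)  # 30 0.0
```

The resulting rule reads "if age <= 30 predict no, otherwise yes"; a full tree would apply best_split again inside each branch until the leaves are pure or a depth limit is reached.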

K-Means Clustering: This is a type of unsupervised learning algorithm used to group
data into clusters. The algorithm divides a set of samples into k disjoint clusters where
each sample belongs to the cluster with the nearest mean (centroid).
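The sketch below implements the standard two-step loop (often called Lloyd's algorithm): assign every sample to its nearest centroid, then move each centroid to the mean of its assigned samples, repeating until nothing moves. Seeding with the first k points is a simplifying assumption made here for determinism; library implementations usually use random or k-means++ seeding. The function name and the points are hypothetical.

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster points (tuples of equal length) into k groups."""
    # Seed centroids with the first k points (simplifying assumption)
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [math.dist(p, c) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid unchanged
        if new_centroids == centroids:
            break  # no centroid moved, so the assignments are stable
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data with two visible groups
points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
centroids, clusters = kmeans(points, 2)
print(centroids)
print(clusters)
```

Because the result depends on the starting centroids, k-means is commonly run several times with different seeds and the run with the lowest within-cluster distance is kept.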

d) Discuss the six steps in the data preparation process. (6 Marks)

Data Collection: This is the process of gathering data from different sources relevant
to the business problem.

Data Cleaning: This involves removing errors, inconsistencies, and inaccuracies in
the data.
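To make the cleaning step concrete, here is a minimal sketch over a list of dictionary records with a hypothetical schema ("name", "city", "age"). It drops rows with missing values, standardises inconsistent text, rejects impossible ages, and removes duplicates that only become identical after standardisation. The function name, the validity range for age, and the sample rows are assumptions for illustration.

```python
def clean_records(records):
    """Return a cleaned copy of hypothetical 'name'/'city'/'age' records."""
    cleaned = []
    seen = set()
    for rec in records:
        # Drop rows with missing values
        if any(rec.get(field) in (None, "") for field in ("name", "city", "age")):
            continue
        # Fix inconsistencies: trim whitespace and standardise capitalisation
        name = rec["name"].strip().title()
        city = rec["city"].strip().title()
        # Fix inaccuracies: age must be an integer in an assumed plausible range
        try:
            age = int(rec["age"])
        except (TypeError, ValueError):
            continue
        if not 0 <= age <= 120:
            continue
        # Remove duplicates (compared after standardisation)
        key = (name, city, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city, "age": age})
    return cleaned

raw = [
    {"name": " alice ", "city": "nairobi", "age": "29"},
    {"name": "Alice", "city": "Nairobi ", "age": "29"},  # duplicate after cleaning
    {"name": "Brian", "city": "Mombasa", "age": ""},     # missing value
    {"name": "Carol", "city": "Kisumu", "age": "-4"},    # impossible age
    {"name": "David", "city": "nakuru", "age": "41"},
]
result = clean_records(raw)
print(result)
```

In practice, whether to drop or impute a missing value depends on the analysis; dropping is used here only because it is the simplest option.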

Data Integration: This is the process of combining data from different sources into a
coherent set.
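One common integration task is joining records from two sources that describe the same entities but use different field names. The sketch below assumes two hypothetical sources: CRM rows keyed by "customer_id" and survey rows keyed by "CustID". It maps the keys onto each other and performs a left join, so every CRM row is kept even when no survey response exists.

```python
def integrate(crm_rows, survey_rows):
    """Left-join hypothetical survey scores onto CRM records by customer ID."""
    # Schema matching: the survey source calls its key "CustID"
    survey_by_id = {row["CustID"]: row for row in survey_rows}
    combined = []
    for row in crm_rows:
        match = survey_by_id.get(row["customer_id"])
        merged = dict(row)  # copy so the source record is not modified
        # Keep every CRM row; use None where the survey has no matching entry
        merged["satisfaction"] = match["Score"] if match else None
        combined.append(merged)
    return combined

crm = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "David"},
]
survey = [{"CustID": 1, "Score": 4}]
combined = integrate(crm, survey)
print(combined)
# [{'customer_id': 1, 'name': 'Alice', 'satisfaction': 4}, {'customer_id': 2, 'name': 'David', 'satisfaction': None}]
```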

Data Transformation: This involves converting data from one format or structure into
another.
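Two typical transformations are sketched below: converting a date from one text format to another, and rescaling numbers onto a common range (min-max normalisation). The day/month/year input format and both function names are assumptions for illustration.

```python
from datetime import datetime

def to_iso_date(text):
    """Convert a day/month/year string such as '20/04/2023' to ISO 8601."""
    return datetime.strptime(text, "%d/%m/%Y").date().isoformat()

def min_max_scale(values):
    """Rescale numbers linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid division by zero when all values are equal
    return [(v - lo) / (hi - lo) for v in values]

print(to_iso_date("20/04/2023"))        # 2023-04-20
print(min_max_scale([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```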

Data Reduction: This is the process of reducing the volume of data by removing
redundant data or aggregating data.
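Both forms of reduction mentioned above can be sketched briefly: aggregating daily sales into monthly totals, and dropping a column that duplicates information already held in another. The function names, the ISO date strings, and the price columns are hypothetical.

```python
from collections import defaultdict

def monthly_totals(daily_sales):
    """Aggregate (ISO date string, amount) pairs into one total per month."""
    totals = defaultdict(float)
    for date_str, amount in daily_sales:
        totals[date_str[:7]] += amount  # "YYYY-MM-DD"[:7] gives "YYYY-MM"
    return dict(sorted(totals.items()))

def drop_redundant_columns(rows, redundant):
    """Remove columns whose information is already carried by another column."""
    return [{k: v for k, v in row.items() if k not in redundant} for row in rows]

daily = [("2023-01-05", 100.0), ("2023-01-20", 50.0), ("2023-02-03", 75.0)]
print(monthly_totals(daily))  # {'2023-01': 150.0, '2023-02': 75.0}

rows = [{"item": "pen", "price_ksh": 50, "price_cents": 5000}]
print(drop_redundant_columns(rows, {"price_cents"}))  # [{'item': 'pen', 'price_ksh': 50}]
```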

Data Discretization: This involves converting continuous data into discrete forms.
This step is optional and depends on the specific requirements of the analysis.
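A minimal binning sketch using the standard library's bisect module: each continuous value is mapped to a labelled interval. The cut points (18 and 65) and the labels are hypothetical; with bisect_right, the bins are v < 18, 18 <= v < 65, and v >= 65.

```python
import bisect

def discretize(values, edges, labels):
    """Map each continuous value to a labelled bin.

    edges[i] is the lowest value that falls into bin i + 1, so there is
    one more label than there are edges."""
    return [labels[bisect.bisect_right(edges, v)] for v in values]

ages = [5, 17, 18, 34, 65, 80]
result = discretize(ages, edges=[18, 65], labels=["child", "adult", "senior"])
print(result)  # ['child', 'child', 'adult', 'adult', 'senior', 'senior']
```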

K Mugoye, PhD Comp Science.
