Question 1

a) Data mining is the process of discovering patterns and other valuable information from large data sets. It is a powerful tool that can be used to improve decision-making, predict future trends, and identify new opportunities.

b) SEMMA is an acronym for Sample, Explore, Modify, Model, and Assess. The first step of the SEMMA process is to sample the data: a representative subset of the data is selected for analysis. The next step is to explore the data by examining it to understand its characteristics, such as distributions, missing values, and outliers. The Modify step prepares the data for modeling, which may involve transforming variables, creating new variables, and handling missing values. The Model step applies data mining techniques to the data to create a model. The final step is to assess the model by evaluating its performance to determine how well it predicts the desired outcome.
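The five SEMMA steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed implementation: the data set is synthetic, the function names (make_data, sample, explore, modify, model, assess) are hypothetical, median imputation stands in for the Modify step, and a least-squares line stands in for the Model step. For brevity, missing values are filled before the train/test split; a stricter workflow would compute the fill value from the training records only.

```python
import random
import statistics


def make_data(n=200, seed=0):
    """Hypothetical data set: y is roughly 2*x + 1, and about 10% of x values are missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 2 * x + 1 + rng.gauss(0, 0.5)
        rows.append((None if rng.random() < 0.1 else x, y))
    return rows


def sample(rows, k, seed=0):
    """Sample: draw a random subset of k records."""
    return random.Random(seed).sample(rows, k)


def explore(rows):
    """Explore: summarize the distribution of x and count missing values."""
    xs = [x for x, _ in rows if x is not None]
    return {
        "rows": len(rows),
        "missing_x": len(rows) - len(xs),
        "mean_x": statistics.mean(xs),
        "stdev_x": statistics.stdev(xs),
    }


def modify(rows):
    """Modify: fill missing x values with the median of the observed ones."""
    fill = statistics.median([x for x, _ in rows if x is not None])
    return [(fill if x is None else x, y) for x, y in rows]


def model(rows):
    """Model: fit y = slope * x + intercept by ordinary least squares."""
    mean_x = statistics.mean([x for x, _ in rows])
    mean_y = statistics.mean([y for _, y in rows])
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def assess(params, rows):
    """Assess: mean absolute error of the fitted line on held-out records."""
    slope, intercept = params
    return statistics.mean([abs(y - (slope * x + intercept)) for x, y in rows])


# Run the steps in SEMMA order on a 100-record sample.
subset = sample(make_data(), k=100)
print(explore(subset))
prepared = modify(subset)
train, test = prepared[:70], prepared[70:]  # hold out 30 records for assessment
fitted = model(train)
print("slope, intercept:", fitted)
print("mean absolute error:", assess(fitted, test))
```

Assessing on records the model did not see during fitting gives a more honest picture of how well it predicts the desired outcome than scoring it on the training records.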

c) The first step is to define the business problem and objectives: clearly articulate the problem you are trying to solve and the desired outcomes. The second step is to understand the data: gather information about the data sources, data quality, and data availability. The third step is data preparation: cleanse, transform, and prepare the data for analysis. The fourth step is exploratory data analysis: visualize and summarize the data to gain insights and identify patterns.
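The data preparation and exploratory analysis steps can be illustrated with a small Python sketch. Everything in it is assumed for the example: the RAW records, the field names (customer, region, spend), and the helper functions parse_spend, prepare, and summarize are hypothetical, and real projects would typically use a data analysis library rather than the standard library alone.

```python
from collections import Counter
import statistics

# Hypothetical raw records with typical quality problems:
# stray whitespace, inconsistent case, missing values, and a duplicate.
RAW = [
    {"customer": " Alice ", "region": "north", "spend": "120.50"},
    {"customer": "Bob", "region": "South", "spend": ""},
    {"customer": "Bob", "region": "South", "spend": ""},
    {"customer": "Cara", "region": "NORTH", "spend": "80"},
    {"customer": "Dan", "region": "east", "spend": "n/a"},
    {"customer": "Eve", "region": "south", "spend": "200"},
]


def parse_spend(text):
    """Convert a spend string to float; return None when it is not a number."""
    try:
        return float(text)
    except ValueError:
        return None


def prepare(rows):
    """Data preparation: trim text, normalize case, parse numbers, drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        record = (
            row["customer"].strip(),
            row["region"].strip().lower(),
            parse_spend(row["spend"].strip()),
        )
        if record in seen:
            continue
        seen.add(record)
        cleaned.append(dict(zip(("customer", "region", "spend"), record)))
    return cleaned


def summarize(rows):
    """Exploratory analysis: row count, missing values, mean spend, counts per region."""
    spends = [r["spend"] for r in rows if r["spend"] is not None]
    return {
        "rows": len(rows),
        "missing_spend": len(rows) - len(spends),
        "mean_spend": statistics.mean(spends),
        "by_region": Counter(r["region"] for r in rows),
    }


cleaned = prepare(RAW)
summary = summarize(cleaned)
print(summary)

# A text bar chart is a minimal stand-in for visualization.
for region, count in sorted(summary["by_region"].items()):
    print(f"{region:<6} {'#' * count}")
```

Keeping unparseable values as None, rather than silently dropping the rows, lets the summary report how much data is missing, which feeds back into the data-understanding step.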
