Data Mining & Data Warehouse
Hubei University of Technology (HBUT, 湖北工业大学)
Name: Rashid Md Mamunur (潘安)

Student Id: 1811562126
Class: 19lc Software Engineering (软工)
Assignment: 1
1. You are approached by the marketing director of a local company, who believes that he has devised a
foolproof way to measure customer satisfaction. He explains his scheme as follows: “It’s so simple that I
can’t believe that no one has thought of it before. I just keep track of the number of customer complaints
for each product. I read in a data mining book that counts are ratio attributes, and so, my measure of
product satisfaction must be a ratio attribute. But when I rated the products based on my new customer
satisfaction measure and showed them to my boss, he told me that I had overlooked the obvious, and that
my measure was worthless. I think that he was just mad because our best-selling product had the worst
satisfaction since it had the most complaints. Could you help me set him straight?”
(a) Who is right, the marketing director or his boss? If you answered, his boss, what would you do to fix the
measure of satisfaction?
(b) What can you say about the attribute type of the original product satisfaction attribute?
Answer: (a).
The boss is right: the marketing director overlooked the obvious. The number of complaints is
meaningless as a satisfaction measure unless it accounts for the number of units sold, so a best-selling
product will naturally attract the most complaints. To fix the measure, divide the number of complaints
filed for each product by the number of units of that product sold, and compare products on this
complaint rate.
A second consideration is the minimum number of units sold that is needed for the rate to be reliable.
For example, suppose a store sold 100 units of product x and 2 units of product y, and received 30
complaints about x and 1 complaint about y. The complaint rates are then 30% and 50%, respectively.
Looking only at the rates, the boss would rush to fix the product with the 50% complaint rate, yet only
2 units of that product were sold and the severity of the complaint is unknown. A minimum number of
units sold should therefore be required before a product's complaint rate is included in the analysis.
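The fix described in this answer can be sketched in a few lines of Python. This is only an illustration: the function name `complaint_rates` and the default threshold `min_units=30` are assumptions made for the sketch and do not come from the assignment. Only the sales and complaint figures are taken from the example.

```python
def complaint_rates(sales, complaints, min_units=30):
    """Return complaints per unit sold for each product.

    Products with fewer than `min_units` units sold map to None,
    because a rate based on so few sales is not reliable.
    The threshold value is an assumption for illustration.
    """
    rates = {}
    for product, units in sales.items():
        if units < min_units:
            rates[product] = None  # too few sales to judge
        else:
            rates[product] = complaints.get(product, 0) / units
    return rates


# Figures from the example: 100 units of x and 2 units of y sold.
sales = {"x": 100, "y": 2}
complaints = {"x": 30, "y": 1}

rates = complaint_rates(sales, complaints)
print(rates)
```

With the threshold in place, product y is reported as undecided rather than as the product with the worst complaint rate.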
(b).
It is correct that a count of complaints, taken by itself, is a ratio attribute. As a measure of
satisfaction, however, the counts are not comparable across products, because each count is taken
against a different number of units sold, which biases the comparison. This is like collecting
temperatures measured in Celsius, Kelvin, and Fahrenheit and reporting the raw numbers without first
converting them to one common scale.
2. Which of the following quantities is likely to show more temporal autocorrelation: daily rainfall or daily
temperature? Why?
Answer:
A feature shows temporal autocorrelation if values measured close together in time are more similar
than values measured farther apart in time. Daily temperature usually changes gradually from one day to
the next, whereas rainfall is intermittent: the amount can change abruptly, with a wet day often followed
by a completely dry one. Therefore, daily temperature is likely to show more temporal autocorrelation
than daily rainfall. (The same holds for spatial autocorrelation: physically close locations tend to have
similar temperatures, while rainfall can be very localized.)
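To make the notion of temporal autocorrelation in Question 2 concrete, the sketch below computes a lag-1 autocorrelation, that is, the Pearson correlation between a series and the same series shifted by one day. The two series are made-up illustrative values, not real measurements, and the function name `lag_autocorrelation` is an assumption of this sketch.

```python
from math import sqrt


def lag_autocorrelation(series, lag=1):
    """Pearson correlation between a series and itself shifted by `lag` (lag >= 1)."""
    x = series[:-lag]
    y = series[lag:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, not real measurements.
temperature = [20, 21, 22, 23, 22, 21, 20, 19, 18, 19, 20, 21]  # changes gradually
rainfall = [0, 0, 12, 0, 0, 0, 30, 0, 5, 0, 0, 0]               # isolated wet days

temp_r = lag_autocorrelation(temperature)
rain_r = lag_autocorrelation(rainfall)
print(f"temperature: {temp_r:.2f}, rainfall: {rain_r:.2f}")
```

On these toy series the smooth temperature values give a clearly positive lag-1 autocorrelation and the spiky rainfall values do not. Real rainfall records may show some persistence, so this only illustrates the definition.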
3. Based on the data in Table 1 in Chapter 4, draw separate decision trees to predict which category the
lion, owl and crocodile belong to?
Answer:
Name       Body Temperature  Skin Cover  Gives Birth  Aquatic Creature  Aerial Creature  Has Legs  Hibernates  Class Label
Lion       Warm-blooded      hair        yes          no                no               yes       no          mammal
Owl        Warm-blooded      feathers    no           no                yes              yes       no          bird
Crocodile  Cold-blooded      scales      no           no                no               yes       no          reptile
Table 1
From Table 1 and the decision tree for the lion, we can predict that a lion is a mammal.
From Table 1 and the decision tree for the owl, we can predict that an owl is a bird.
From Table 1 and the decision tree for the crocodile, we can predict that a crocodile is a reptile.
Body temperature, hibernation, and legs are the attributes in the dataset that decide whether an animal
is a mammal or a non-mammal, because both mammals and non-mammals include creatures that are aquatic
or aerial, have a variety of skin covers, and may or may not give birth.
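A decision tree of this kind can also be written as nested conditions. The sketch below is one possible hand-written tree that separates the three animals using the attribute values in Table 1. The choice and order of the splits are assumptions made for illustration, and this is not necessarily the tree drawn in the original answer.

```python
def classify(animal):
    """Hand-written decision tree; the splits are an illustrative assumption."""
    if animal["body_temperature"] == "warm-blooded":
        if animal["gives_birth"] == "yes":
            return "mammal"
        if animal["skin_cover"] == "feathers":
            return "bird"
        return "unknown"
    # Cold-blooded branch.
    if animal["skin_cover"] == "scales":
        return "reptile"
    return "unknown"


# Attribute values copied from Table 1 (only the attributes the tree uses).
animals = {
    "lion": {"body_temperature": "warm-blooded", "skin_cover": "hair", "gives_birth": "yes"},
    "owl": {"body_temperature": "warm-blooded", "skin_cover": "feathers", "gives_birth": "no"},
    "crocodile": {"body_temperature": "cold-blooded", "skin_cover": "scales", "gives_birth": "no"},
}

for name, attributes in animals.items():
    print(name, "->", classify(attributes))
```

Each animal follows a single path from the root to a leaf, and the leaf gives the predicted class label.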
4. We further explore the cosine and correlation measures.
(a) What is the range of values that are possible for the cosine measure?
(b) If two objects have a cosine measure of 1, are they identical? Explain.
Answer:
(a)
[-1, 1]. Often the data has only nonnegative entries, and in that case the range is [0, 1].
(b)
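Not necessarily. A cosine of 1 only tells us that the two vectors point in the same direction, that is,
the attribute values of one object are a positive constant multiple of those of the other. For example,
(1, 2) and (2, 4) have a cosine of 1 but are not identical.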
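Both parts of Question 4 can be checked numerically with a small sketch. The helper `cosine` and the example vectors are illustrative assumptions. It shows that a vector and its positive multiple have a cosine of 1 without being identical, and that opposite vectors reach the lower end of the range.

```python
from math import sqrt


def cosine(x, y):
    """Cosine of the angle between two nonzero vectors of equal length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


x = [1, 2, 3]
y = [2, 4, 6]     # y = 2 * x: same direction, different values
z = [-1, -2, -3]  # z = -x: opposite direction

print(cosine(x, y))  # approximately 1, although x != y
print(cosine(x, z))  # approximately -1
```

The results are compared approximately because floating-point rounding can leave them a tiny distance from exactly 1 or -1.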
