Mosquito Mark Final PDF
Data Encoding:
I asked the researchers to encode their results as summarized in the table below. Note that SAgree and SDisagree stand for Strongly Agree and Strongly Disagree.
Out[2]:
   SAgree  Agree  Neutral  Disagree  SDisagree
0       5      4        3         2          1
For the Recommendation column, 0 was assigned to Commercial by default, since respondents were not asked to provide a recommendation for the commercial product.
Out[3]:
   HighlyRecommended  Moderate  Recommended  LeastRecommended  NotRecommended  Commercial
0                  5         4            3                 2               1           0
Similarly, Gender was encoded as 1 - Male and 0 - Female. Finally, Age was binned as summarized by the table below.
Out[4]:
   15 below  16-20  21-25  26-30  31-35  36 above
0         1      2      3      4      5         6
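The encodings above can be written down as plain lookup tables. The following is a minimal sketch, not the code used in the analysis: the names `LIKERT`, `RECOMMENDATION`, `GENDER` and `age_bin` are hypothetical, and `age_bin` assumes raw ages in years are available (if respondents ticked a range directly, a simple lookup would do instead).

```python
# Lookup tables copied from the encoding tables above (names are illustrative).
LIKERT = {"SAgree": 5, "Agree": 4, "Neutral": 3, "Disagree": 2, "SDisagree": 1}
RECOMMENDATION = {
    "HighlyRecommended": 5, "Moderate": 4, "Recommended": 3,
    "LeastRecommended": 2, "NotRecommended": 1,
    "Commercial": 0,  # default: no recommendation is asked for the commercial product
}
GENDER = {"Male": 1, "Female": 0}


def age_bin(age):
    """Map an age in years to the bin codes 1..6 (15 below .. 36 above)."""
    if age <= 15:
        return 1
    if age >= 36:
        return 6
    # 16-20 -> 2, 21-25 -> 3, 26-30 -> 4, 31-35 -> 5
    return 2 + (age - 16) // 5


print(age_bin(23), LIKERT["Agree"], GENDER["Female"])
```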
Out[5]:
   RespondentNumber  Gender  Age Range     Product  Price  Packaging  Fragrance  Span  Safeness  Effectiveness  Duration  Efficiency  Recommendation
0                 1       1          3  Commercial      5          5          4     4         5              3         3           4               0
1                 1       1          3         Own      2          3          3     3         2              1         1           2               1
2                 2       0          3  Commercial      2          4          4     5         4              5         3           5               0
3                 2       0          3         Own      2          2          2     3         2              1         2           3               2
4                 3       1          6  Commercial      3          3          1     3         2              4         3           4               0
5                 3       1          6         Own      3          3          3     3         2              3         3           3               3
First, let us check the distribution of the recommendations. Out of the 100 respondents, ~50% chose 'Recommended' for the product, and ~70% of the
respondents chose a rating of 3-5. By gender, slightly more of the respondents are female. Finally, most of the respondents who gave a poor recommendation
are female.
Out[6]:
Next, let's examine the distribution of the recommendations by age group. Most of the respondents fall in age ranges 3, 4 and 6 (21-25, 26-30 and 36 above). The distribution by age range looks
fairly similar across all recommendation levels.
19/09/2019, 8:58 am
Out[7]:
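The tallies behind these distributions can be sketched with the standard library. This example uses only the six sample rows shown earlier, not the full set of 100 respondents, so its counts do not reproduce the percentages quoted above. The tuple layout and the helper `share_at_least` are illustrative assumptions.

```python
from collections import Counter

# (RespondentNumber, Gender, AgeRange, Product, Recommendation), six sample rows only
ROWS = [
    (1, 1, 3, "Commercial", 0),
    (1, 1, 3, "Own", 1),
    (2, 0, 3, "Commercial", 0),
    (2, 0, 3, "Own", 2),
    (3, 1, 6, "Commercial", 0),
    (3, 1, 6, "Own", 3),
]

# Commercial rows carry the default 0, so only "Own" rows hold a real recommendation.
own = [r for r in ROWS if r[3] == "Own"]

overall = Counter(r[4] for r in own)              # recommendation -> count
by_gender = Counter((r[1], r[4]) for r in own)    # (gender, recommendation) -> count
by_age = Counter((r[2], r[4]) for r in own)       # (age range, recommendation) -> count


def share_at_least(rows, threshold):
    """Fraction of rows whose recommendation is >= threshold."""
    return sum(r[4] >= threshold for r in rows) / len(rows)


print(overall)
print(by_gender)
print(by_age)
print(share_at_least(own, 3))
```

Filtering on `Product == "Own"` first matters: leaving the commercial rows in would add a block of zeros that are placeholders, not actual recommendations.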
Now, we will focus our attention on the questionnaire. Let us examine how the ratings of our product compare to those of the commercial product.
Out[9]:
Out[11]:
Here's what the graphs above are telling us:
Finally, to better understand why our respondents chose their respective recommendations, we will use Logistic Regression (note that this is a multi-class problem). We chose
this learning algorithm because it assigns a coefficient to each metric which, after some calculation, can be interpreted as a probability.
The coefficients $\theta_i$ are chosen such that the function below is minimized.
$$-y\cdot \log(h(\theta)) - (1-y)\cdot \log(1-h(\theta))$$
where $h(\theta)$ is the sigmoid function applied to the linear combination of the coefficients and the data. Note that we're only concerned with the coefficients, not with how well
the model generalizes, since our sample size is relatively small.
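To make the loss concrete, here is a minimal sketch of the sigmoid and of the per-example loss in its binary form, exactly as written in the formula. The function names and the toy inputs are assumptions for illustration. The source does not say how the multi-class case was handled, so this sketch does not attempt it.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta, x):
    """h(theta): sigmoid of the linear combination of coefficients and data."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def log_loss(theta, x, y):
    """-y*log(h) - (1-y)*log(1-h) for one example with label y in {0, 1}."""
    h = hypothesis(theta, x)
    return -y * math.log(h) - (1 - y) * math.log(1 - h)


# Toy values (not from the survey): zero coefficients give h = 0.5.
print(log_loss([0.0, 0.0], [1.0, 2.0], 1))
# Coefficients that push h toward the true label give a smaller loss.
print(log_loss([1.0, 1.0], [1.0, 2.0], 1))
```

With all coefficients at zero the model is maximally uncertain and the loss equals log 2. Minimizing the loss moves the coefficients so that `h` approaches the observed label.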
Out[15]:
    Gender  Age Range     Price  Packaging  Fragrance      Span  Safeness  Effectiveness  Duration  Efficiency
0  0.43817   0.397997  0.468527   0.507815    0.27758  0.620712  0.476683       0.605174  0.503099    0.660538
For interpretability's sake, we only show the first row of the assigned probabilities. Notice how our best attributes received the highest probabilities.
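As a quick check, the row of probabilities above can be ranked directly. In the sketch below the values are copied from the table. The 0.5 cut-off is an assumption made for illustration, not a threshold stated in the analysis, but it happens to select the same five attributes that the Summary lists as the best ones.

```python
# First row of the assigned probabilities, copied from the table above.
PROBS = {
    "Gender": 0.43817, "Age Range": 0.397997, "Price": 0.468527,
    "Packaging": 0.507815, "Fragrance": 0.27758, "Span": 0.620712,
    "Safeness": 0.476683, "Effectiveness": 0.605174,
    "Duration": 0.503099, "Efficiency": 0.660538,
}

# Sort attributes from highest to lowest probability.
ranked = sorted(PROBS.items(), key=lambda kv: kv[1], reverse=True)

# Assumed cut-off: keep attributes whose probability exceeds 0.5.
above_half = [name for name, p in ranked if p > 0.5]

print(above_half)
# ['Efficiency', 'Span', 'Effectiveness', 'Packaging', 'Duration']
```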
Summary
~70% of the respondents are likely to recommend the product.
Although our product received most of the lower ratings on most of the metrics, it is almost on par with the commercial product in the higher ratings.
Our best attributes are Packaging, Span, Effectiveness, Duration and Efficiency.
Recommendations
Apply more sophisticated sampling techniques to avoid bias.
Increase the sample size.
Analysis made by
Benjamin Reyes Cabalona Jr.
Associate Data Scientist at Novare Technologies
benjamin.cabalonajr@novare.com.hk
benjamin.cabalonajr@outlook.com