Hi Slack Team,

Please find my responses below for the assignment.

Data Quality Checks:

Data quality checks need to be performed before data analysis so that bad data can be
removed, or at least ignored, while performing the analysis.

1. Each user_id belongs to exactly one team_id:

The following SQL was used to find if any user_id violated that relationship and belonged to
more than one team_id.

select user_id
from (select count(distinct team_id) as distinct_team_ids, user_id
      from alerts
      group by 2) subquery1
where subquery1.distinct_team_ids > 1

--- This returned user_id=456468590, which was associated with two different teams and
therefore had to be ignored.

2. Check for duplicate rows:

The SQL below was used to find whether any duplicate rows existed.

select user_id, team_id, app_id, event, primary_browser, alert_type, eventtime
from alerts
group by 1, 2, 3, 4, 5, 6, 7
having count(*) > 1

3. Check for data collected on the same day:

The problem statement says that the table contains data for one particular day. The SQL below
returns any rows that are not from that day.

select * from alerts where DATE(eventtime)<>'2016-04-10'

4. Check for valid values:

The alert_type and event columns are supposed to contain only the specified values, and any
other value may be treated as bad data. The SQL below returns any such rows. It returned an
empty set for our table.

select * from alerts
where alert_type NOT IN ('sidebar_alert', 'banner_alert', 'push_alert')
   OR event NOT IN ('imp', 'clk')

- I tried to delete the bad user (user_id=456468590). The SQL ran but, for some reason, did
not delete the data, so I have ignored that user wherever required.
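
The four checks above, and the "ignore instead of delete" approach, can be sketched end to end on a tiny in-memory table. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than the database used for the assignment, every row is invented, and names such as multi_team_users and clean_rows are hypothetical. The column list is taken from the duplicate-row query.

```python
import sqlite3

# Toy stand-in for the alerts table; every row here is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alerts (user_id INTEGER, team_id INTEGER, app_id INTEGER, "
    "event TEXT, primary_browser TEXT, alert_type TEXT, eventtime TEXT)"
)
rows = [
    (1, 10, 15, "imp", "chrome", "sidebar_alert", "2016-04-10 09:00:00"),
    (1, 10, 15, "clk", "chrome", "sidebar_alert", "2016-04-10 09:00:05"),
    (2, 10, 38, "imp", "firefox", "push_alert", "2016-04-10 10:00:00"),
    (2, 20, 38, "imp", "firefox", "push_alert", "2016-04-10 11:00:00"),   # user 2 on two teams
    (3, 30, 15, "imp", "safari", "banner_alert", "2016-04-10 12:00:00"),
    (3, 30, 15, "imp", "safari", "banner_alert", "2016-04-10 12:00:00"),  # exact duplicate
    (4, 30, 15, "imp", "safari", "banner_alert", "2016-04-11 00:30:00"),  # wrong day
]
conn.executemany("INSERT INTO alerts VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Check 1: users attached to more than one team.
multi_team_users = [r[0] for r in conn.execute(
    "SELECT user_id FROM alerts GROUP BY user_id "
    "HAVING COUNT(DISTINCT team_id) > 1"
)]

# Check 2: rows that are identical in every column.
duplicate_rows = conn.execute(
    "SELECT user_id, team_id, app_id, event, primary_browser, alert_type, "
    "eventtime, COUNT(*) AS copies FROM alerts "
    "GROUP BY user_id, team_id, app_id, event, primary_browser, alert_type, eventtime "
    "HAVING COUNT(*) > 1"
).fetchall()

# Check 3: rows that fall outside the expected day.
wrong_day_rows = conn.execute(
    "SELECT * FROM alerts WHERE DATE(eventtime) <> '2016-04-10'"
).fetchall()

# Check 4: rows with unexpected alert_type or event values.
invalid_value_rows = conn.execute(
    "SELECT * FROM alerts "
    "WHERE alert_type NOT IN ('sidebar_alert', 'banner_alert', 'push_alert') "
    "OR event NOT IN ('imp', 'clk')"
).fetchall()

# Ignore (rather than delete) every user that fails check 1.
clean_rows = conn.execute(
    "SELECT * FROM alerts WHERE user_id NOT IN ("
    "  SELECT user_id FROM alerts GROUP BY user_id "
    "  HAVING COUNT(DISTINCT team_id) > 1)"
).fetchall()
```

Filtering through the check-1 subquery avoids hard-coding a single user_id, so the same exclusion keeps working if another violating user shows up in the data.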

Questions & Answers:

1. What is the best performing alert type?

- The best-performing alert_type is sidebar_alert, since it is the alert type that was
used the most times to send alerts.

SELECT alert_type FROM alerts GROUP BY 1
HAVING COUNT(alert_type) =
  (SELECT MAX(mycount) FROM
    (SELECT alert_type, COUNT(alert_type) mycount FROM alerts GROUP BY 1) alert_type_count)

2. What apps are the best and worst performing?

The app that has sent the most alerts is taken as the best performing, and the app that has
sent the fewest alerts as the worst performing.

The best-performing app is the one with app_id=15, and the worst-performing app is the one
with app_id=38.

BEST APP:

SELECT app_id AS BEST_APP FROM alerts GROUP BY 1
HAVING COUNT(app_id) =
  (SELECT MAX(mycount) FROM
    (SELECT app_id, COUNT(app_id) mycount FROM alerts GROUP BY 1) app_id_count)

WORST APP:

SELECT app_id AS WORST_APP FROM alerts GROUP BY 1
HAVING COUNT(app_id) =
  (SELECT MIN(mycount) FROM
    (SELECT app_id, COUNT(app_id) mycount FROM alerts GROUP BY 1) app_id_count)
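
The best/worst queries for alert types and apps all share one pattern: group by a column, count the rows, and keep the groups whose count equals the overall MAX or MIN. A minimal sketch of that pattern follows, assuming Python's built-in sqlite3 module and a handful of invented rows; the helper name extreme_by_count is hypothetical. One property worth seeing on toy data is that this pattern returns every tied value, not just one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (app_id INTEGER, alert_type TEXT)")
# Invented rows: app 15 sends three alerts, apps 38 and 40 send one each.
conn.executemany(
    "INSERT INTO alerts VALUES (?, ?)",
    [
        (15, "sidebar_alert"),
        (15, "sidebar_alert"),
        (15, "push_alert"),
        (38, "banner_alert"),
        (40, "sidebar_alert"),
    ],
)


def extreme_by_count(column, agg):
    """Return every value of `column` whose row count equals the MAX or MIN count."""
    # Whitelist the interpolated pieces, since they cannot be bound as parameters.
    if column not in {"app_id", "alert_type"} or agg not in {"MAX", "MIN"}:
        raise ValueError("unsupported column or aggregate")
    sql = (
        f"SELECT {column} FROM alerts GROUP BY {column} "
        f"HAVING COUNT({column}) = (SELECT {agg}(mycount) FROM "
        f"(SELECT COUNT({column}) AS mycount FROM alerts GROUP BY {column}) AS counts) "
        f"ORDER BY {column}"
    )
    return [row[0] for row in conn.execute(sql)]


best_apps = extreme_by_count("app_id", "MAX")
worst_apps = extreme_by_count("app_id", "MIN")   # apps 38 and 40 tie for the fewest alerts
best_alert_types = extreme_by_count("alert_type", "MAX")
```

Because ties are all returned, the result is a list rather than a single value; an ORDER BY with LIMIT 1 would silently pick just one of the tied rows.
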

3. I'm curious about the first alert a team clicked on in this day. For
each alert_type, compute how many teams clicked an alert of that
type as their first alert in a day.

First alert a team clicked on this day:

SELECT DISTINCT(team_id), alert_type, MIN(eventtime)
FROM (select * from alerts where user_id != '456468590') alerts_sub_table
GROUP BY 1

Number of teams that clicked an alert of each type as their first alert of the day:

The SQL below gives, for each alert_type, the total number of teams that clicked an alert of
that type as their first alert on that day.

SELECT COUNT(*) Number_of_teams, alert_type FROM (
  SELECT alert_type, event FROM (
    SELECT DISTINCT(team_id), alert_type, MIN(eventtime), event
    FROM (select * from alerts where user_id != '456468590') alerts_sub_table
    GROUP BY 1) distinct_team_alert_type
  WHERE event = 'clk') alert_type_count
GROUP BY 2

+-----------------+---------------+
| Number_of_teams | alert_type    |
+-----------------+---------------+
|             270 | banner_alert  |
|             188 | push_alert    |
|             558 | sidebar_alert |
+-----------------+---------------+
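
One caveat on the queries for this question: they select alert_type and event without aggregating or grouping by them. MySQL accepts this only when ONLY_FULL_GROUP_BY is disabled, and it is then free to return those columns from any row in the group, not necessarily the row holding MIN(eventtime). A more deterministic variant can be sketched as follows, under one stated assumption: "first alert clicked" is read as the earliest row with event='clk' for each team. The sketch uses Python's built-in sqlite3 module and invented rows, so it does not reproduce the counts in the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alerts (user_id INTEGER, team_id INTEGER, "
    "event TEXT, alert_type TEXT, eventtime TEXT)"
)
conn.executemany(
    "INSERT INTO alerts VALUES (?, ?, ?, ?, ?)",
    [
        # Team 10: the first event is an impression; the first click is a sidebar_alert.
        (1, 10, "imp", "push_alert", "2016-04-10 08:00:00"),
        (1, 10, "clk", "sidebar_alert", "2016-04-10 09:00:00"),
        (1, 10, "clk", "banner_alert", "2016-04-10 09:30:00"),
        # Team 20: the first click is a banner_alert.
        (2, 20, "clk", "banner_alert", "2016-04-10 07:00:00"),
        (2, 20, "clk", "sidebar_alert", "2016-04-10 07:05:00"),
        # Team 30: the first click is a sidebar_alert.
        (3, 30, "clk", "sidebar_alert", "2016-04-10 11:00:00"),
        # Team 40: never clicked, so it is not counted.
        (4, 40, "imp", "push_alert", "2016-04-10 12:00:00"),
        # The ignored user; without the filter this row would be team 20's first click.
        (456468590, 20, "clk", "push_alert", "2016-04-10 06:00:00"),
    ],
)

FIRST_CLICK_COUNTS = """
WITH clicks AS (
    -- Keep only click events and drop the user that failed the data quality check.
    SELECT team_id, alert_type, eventtime
    FROM alerts
    WHERE event = 'clk' AND user_id <> 456468590
),
first_click AS (
    -- Earliest click time per team.
    SELECT team_id, MIN(eventtime) AS first_time
    FROM clicks
    GROUP BY team_id
)
-- Join back so alert_type comes from the row that holds the earliest click.
SELECT c.alert_type, COUNT(DISTINCT c.team_id) AS number_of_teams
FROM clicks AS c
JOIN first_click AS f
  ON f.team_id = c.team_id AND f.first_time = c.eventtime
GROUP BY c.alert_type
ORDER BY c.alert_type
"""

first_click_counts = dict(conn.execute(FIRST_CLICK_COUNTS).fetchall())
```

If a team has two clicks of different types at the exact same earliest timestamp, this variant counts the team once under each type; a tie-breaking rule would be needed to avoid that.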

Note: The best-performing alert_type, best-performing app, and worst-performing app are runaway
winners, so those results are not affected by the bad data.