Cosmetic Store
Deepak
Creating S3 Bucket:
Created folders in the buckets
Uploaded files in the folders
Copying the data set into the HDFS:
2. Using database:
USE sales;
Describe db
Describe table
Load data into table from Nov_2019:
LOAD DATA LOCAL INPATH "/home/hadoop/2019-Nov.csv" INTO TABLE cosmetic_sales ;
• The above result shows that more purchases were made in November 2019 than in
October 2019. This may be due to festive-season promotions such as the Black Friday
sale.
3. Write a query to find the change in revenue generated due to purchases from
October to November.
WITH Total_Monthly_Revenue AS (
    SELECT
        SUM(CASE WHEN date_format(event_time, 'MM') = '10' THEN price ELSE 0 END) AS October_Revenue,
        SUM(CASE WHEN date_format(event_time, 'MM') = '11' THEN price ELSE 0 END) AS November_Revenue
    FROM cosmetic_sales
    WHERE event_type = 'purchase'
      AND date_format(event_time, 'MM') IN ('10', '11')
)
SELECT November_Revenue, October_Revenue,
       (November_Revenue - October_Revenue) AS Difference_Of_Revenue
FROM Total_Monthly_Revenue;
• The positive difference returned by the above query shows that the revenue from
purchases in November exceeded the revenue from purchases in October by
319478.469592195 units.
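The conditional-aggregation pattern used in this query can be tried out on a small scale. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Hive, replaces Hive's date_format(event_time, 'MM') with SQLite's strftime('%m', event_time), and runs on a handful of made-up rows that are not taken from the real data set.

```python
import sqlite3

# Hypothetical toy rows: (event_time, event_type, price).
rows = [
    ("2019-10-05 10:00:00", "purchase", 10.0),
    ("2019-10-20 12:30:00", "cart", 99.0),       # not a purchase, filtered out
    ("2019-11-02 09:15:00", "purchase", 25.0),
    ("2019-11-29 18:45:00", "purchase", 5.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cosmetic_sales (event_time TEXT, event_type TEXT, price REAL)"
)
conn.executemany("INSERT INTO cosmetic_sales VALUES (?, ?, ?)", rows)

# Same shape as the Hive query; strftime('%m', ...) is the SQLite
# substitute for date_format(..., 'MM').
query = """
WITH Total_Monthly_Revenue AS (
    SELECT
        SUM(CASE WHEN strftime('%m', event_time) = '10' THEN price ELSE 0 END) AS October_Revenue,
        SUM(CASE WHEN strftime('%m', event_time) = '11' THEN price ELSE 0 END) AS November_Revenue
    FROM cosmetic_sales
    WHERE event_type = 'purchase'
      AND strftime('%m', event_time) IN ('10', '11')
)
SELECT November_Revenue, October_Revenue,
       November_Revenue - October_Revenue AS Difference_Of_Revenue
FROM Total_Monthly_Revenue
"""
november, october, difference = conn.execute(query).fetchone()
print(november, october, difference)
```

Each SUM only adds the price when the row belongs to its month, so a single pass over the purchase rows yields both monthly totals and their difference.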
4. Find distinct categories of products. Categories with null category code can be
ignored.
Select distinct(category_code) as Distinct_Categories from cosmetic_sales;
• There are 11 distinct category_code values in total in the combined data of
October and November 2019.
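The task statement allows rows with a null category code to be ignored. One possible way to do that is sketched below, again using Python's sqlite3 module as a stand-in for Hive and a few made-up category values. Treating empty strings as missing is an assumption about how blank fields in the CSV end up in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cosmetic_sales (category_code TEXT)")

# Hypothetical values; None and '' both stand for a missing category code.
conn.executemany(
    "INSERT INTO cosmetic_sales VALUES (?)",
    [("category.a",), ("",), (None,), ("category.b",), ("category.a",)],
)

# DISTINCT removes duplicates; the WHERE clause drops missing codes first.
query = """
SELECT DISTINCT category_code AS Distinct_Categories
FROM cosmetic_sales
WHERE category_code IS NOT NULL AND category_code != ''
ORDER BY Distinct_Categories
"""
categories = [row[0] for row in conn.execute(query)]
print(categories)
```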
5. Find the total number of products available under each category:
SELECT category_id, COUNT(DISTINCT product_id) AS total_products FROM cosmetic_sales GROUP BY category_id;
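Because the same product appears in many event rows, COUNT(DISTINCT product_id) is what keeps each product from being counted more than once per category. A small sketch with invented ids, run through Python's sqlite3 module rather than Hive, illustrates the effect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cosmetic_sales (category_id INTEGER, product_id INTEGER)")

# Hypothetical rows: product 100 appears twice in category 1.
conn.executemany(
    "INSERT INTO cosmetic_sales VALUES (?, ?)",
    [(1, 100), (1, 100), (1, 101), (2, 200)],
)

query = """
SELECT category_id, COUNT(DISTINCT product_id) AS total_products
FROM cosmetic_sales
GROUP BY category_id
"""
# Map each category_id to its number of distinct products.
products_per_category = dict(conn.execute(query).fetchall())
print(products_per_category)
```

A plain COUNT(product_id) would report 3 for category 1 here, since it counts event rows rather than products.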
Strong is the brand with the highest sales for both months combined.
Running with partitioned table:
6. SELECT brand, SUM(price) AS sales FROM cosmetic_store WHERE brand != '' GROUP BY brand ORDER BY sales DESC LIMIT 1;
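The top-brand query combines three steps: drop rows with a blank brand, total the price per brand, and keep only the highest total. The sketch below reproduces those steps on made-up rows using Python's sqlite3 module as a stand-in for Hive. The brand names and prices are hypothetical, and the partitioning of cosmetic_store is not modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cosmetic_store (brand TEXT, price REAL)")

# Hypothetical rows; the blank brand has the largest total but is excluded.
conn.executemany(
    "INSERT INTO cosmetic_store VALUES (?, ?)",
    [("brand_a", 5.0), ("brand_a", 7.5), ("", 20.0), ("brand_b", 3.0)],
)

query = """
SELECT brand, SUM(price) AS sales
FROM cosmetic_store
WHERE brand != ''
GROUP BY brand
ORDER BY sales DESC
LIMIT 1
"""
top_brand, top_sales = conn.execute(query).fetchone()
print(top_brand, top_sales)
```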