
INDIVIDUAL CONTRIBUTION - CAPSTONE

ALY 6980 CAPSTONE

NORTHEASTERN UNIVERSITY

NOTE:

CHIRAG SANJAYBHAI SHAH, COLLEGE OF PROFESSIONAL STUDIES,


NORTHEASTERN UNIVERSITY, BOSTON, MA 02115.

INSTRUCTOR NAME: DR. MATTHEW GOODWIN

CONTACT: SHAH.CHIR@NORTHEASTERN.EDU

DATE: 2nd JULY, 2021


Problem Statement

Quantum Analytica is a company that aims to provide analytical solutions to small companies
that do not know how to get the most value from their data. This project is designed to help
Quantum Analytica better understand its marijuana data and any key patterns in that data. Each
Tableau dashboard contains visualizations that highlight key insights Quantum Analytica may
find useful in future analyses. All dashboards and visualizations use the dispensary data
(northeastern_dispensary_data) and the product data (northeastern_product_data) that Quantum
Analytica provided to Northeastern.

Methods

The team obtained the data from Quantum Analytica. There were initially three datasets: the
dispensary dataset, the product dataset, and the category dataset. The team performed extensive
exploratory data analysis using Python and Tableau, then applied K-Means clustering in Python
and built several Tableau dashboards aligned with the project goal. Finally, the team presented
its findings, recommendations, and directions for future research in response to the task set
by the sponsor.
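As a rough illustration of this kind of exploratory step, the sketch below filters rows on the
four rating variables and ranks zip codes by average price, using only the Python standard
library. Only the four rating field names come from the dataset; the zip_code and price field
names, the helper functions, and all sample values are assumptions invented for illustration,
not figures from the project data.

```python
from collections import defaultdict
from statistics import mean

# The four rating fields named in the dataset.
RATING_FIELDS = ("service_rating", "atmosphere_rating", "quality_rating", "rating")

# Made-up sample rows. "zip_code" and "price" are assumed column names.
# Zip codes are kept as strings so that leading zeros are not lost.
rows = [
    {"zip_code": "01002", "price": 40.0, "service_rating": 4.8,
     "atmosphere_rating": 4.7, "quality_rating": 4.9, "rating": 4.8},
    {"zip_code": "01002", "price": 60.0, "service_rating": 4.6,
     "atmosphere_rating": 4.6, "quality_rating": 4.7, "rating": 4.6},
    {"zip_code": "80478", "price": 90.0, "service_rating": 4.9,
     "atmosphere_rating": 4.8, "quality_rating": 4.9, "rating": 4.9},
    {"zip_code": "81212", "price": 30.0, "service_rating": 4.2,
     "atmosphere_rating": 4.9, "quality_rating": 4.8, "rating": 4.6},
    {"zip_code": "81212", "price": 25.0, "service_rating": 4.7,
     "atmosphere_rating": 4.6, "quality_rating": 4.6, "rating": 4.7},
]


def highly_rated(row, threshold=4.5):
    """Return True when every rating field is strictly above the threshold."""
    return all(row[field] > threshold for field in RATING_FIELDS)


def mean_price_by_zip(rows):
    """Average the (assumed) price field within each zip code."""
    prices = defaultdict(list)
    for row in rows:
        prices[row["zip_code"]].append(row["price"])
    return {zip_code: mean(values) for zip_code, values in prices.items()}


def top_zip_codes(rows, n=5):
    """Return the n zip codes with the highest average price, highest first."""
    averages = mean_price_by_zip(rows)
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]


filtered = [row for row in rows if highly_rated(row)]
top = top_zip_codes(filtered, n=2)
print(top)
```

The same grouping could be repeated with a price-per-unit field to obtain the second ranking.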

Contribution

My assigned task was to explore the dataset through its rating variables, run the K-Means
algorithm, and build a Tableau dashboard based on exploratory data analysis. The dataset
contains four rating variables: service_rating, atmosphere_rating, quality_rating, and rating.
I needed to examine how these variables help a dispensary become popular for particular product
categories and brands.

I began by identifying which zip codes have the highest average product price and the highest
average product price per unit, and plotted the graph “Plot to show zip codes having 5 largest
product price and product price per unit”. I also plotted “Plot to show dispensaries created by
year and month” to check whether the dataset shows any trend, and found that the most
dispensaries were created in 2018 and in the month of October.

I then filtered the dataset to service_rating, atmosphere_rating, quality_rating, and rating
values above 4.5 points. In this subset, Cannabis has the highest production in MA, while
Edibles has the highest production in CO. Zip code 02651 has the dispensaries with the highest
average product price, and zip code 02703 has those with the highest average product price per
unit. Zip code 01002 has the most Cannabis products, with the Rythm brand having the highest
product price per unit there, and Cannabis is sold at 28 places in that zip code. Zip codes
80478 and 81212 have the most Edibles products: Dixie Brands in 80478 has a product price per
unit of 180 and Stratos in 81212 has 257.9, each the highest in its zip code. Edibles are sold
at 25 places in these two zip codes. In addition, I ran the K-Means algorithm in Python.
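A minimal sketch of how K-Means could be applied to the four rating variables follows. It is a
plain-Python version of Lloyd's algorithm (the standard K-Means procedure), not the code used
in the project, which may have relied on a library such as scikit-learn. The kmeans helper, the
choice of k = 2, and the sample rating rows are assumptions invented for illustration.

```python
import random
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(column) / len(members) for column in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Made-up rows in the order
# (service_rating, atmosphere_rating, quality_rating, rating).
ratings = [
    (4.8, 4.7, 4.9, 4.8),
    (4.9, 4.8, 4.8, 4.9),
    (4.7, 4.9, 4.8, 4.7),
    (3.1, 3.0, 3.2, 3.1),
    (2.9, 3.2, 3.0, 3.0),
    (3.0, 3.1, 3.1, 2.9),
]

labels, centroids = kmeans(ratings, k=2)
print(labels)
```

With real data, the number of clusters would normally be chosen by comparing several values of
k, and the cluster labels could then be joined back to product categories and brands.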
