Analysis
Out[3]:
   Order Date  Brand  Sneaker Name                                   Sale Price  Retail Price  Release Date  Shoe Size  Buyer Region
0  2017-09-01  Yeezy  Adidas-Yeezy-Boost-350-Low-V2-Beluga               1097.0           220    2016-09-24       11.0  California
1  2017-09-01  Yeezy  Adidas-Yeezy-Boost-350-V2-Core-Black-Copper         685.0           220    2016-11-23       11.0  California
2  2017-09-01  Yeezy  Adidas-Yeezy-Boost-350-V2-Core-Black-Green          690.0           220    2016-11-23       11.0  California
3  2017-09-01  Yeezy  Adidas-Yeezy-Boost-350-V2-Core-Black-Red           1075.0           220    2016-11-23       11.5  Kentucky
4  2017-09-01  Yeezy  Adidas-Yeezy-Boost-350-V2-Core-Black-Red-2017       828.0           220    2017-02-11       11.0  Rhode Island
), Release Date, Shoe Size, and Buyer Region. Each row represents an individual sale
on the StockX platform, and the dataset only includes sales within the United States.
The Order Date column represents the date the order was placed, while the Brand
column specifies whether the sale was for an Off-White x Nike or Yeezy 350 shoe.
The Sneaker Name column provides information on the specific model of the shoe
sold. The Sale Price column indicates the amount of money that the buyer paid for
the shoe, while the Retail Price column specifies the manufacturer's suggested retail
price for the shoe. The Release Date column indicates when the shoe was first
released. The Shoe Size column provides information on the size of the shoe sold.
The Buyer Region column specifies the state in which the buyer resides.
Overall, this dataset provides valuable insights into the demand and pricing of two
popular shoe brands, and could be used to identify trends and patterns in consumer
behavior.
In [4]: # Convert Order Date and Release Date to datetime format
df['Order Date'] = pd.to_datetime(df['Order Date'])
df['Release Date'] = pd.to_datetime(df['Release Date'])
# Aggregate the data by Sneaker_name and Week to get the total sales of e
agg_df = df.groupby(['Sneaker Name', 'Week']).agg({'Sale Price': 'sum'}).
# Let's create a subset of the above table to plot the sales of 5 random
column_names = pivot_table.columns[1:].tolist()
selected_columns = pd.Series(column_names).sample(n=5, random_state=42)
pivot_table = pd.concat([pivot_table.iloc[:,0], pivot_table[selected_colu
pivot_table.head()
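The cells above group by a Week column and then work on a pivot table, but the lines that create them are not visible. Here is a minimal sketch of how such a table could be built, assuming Week is the Monday that starts the week of each Order Date and that the pivot has one column per sneaker. The toy data and the names `toy`, `weekly` and `wide` are made up for illustration and are not from the notebook.

```python
import pandas as pd

# Toy stand-in for df (illustrative values only)
toy = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2017-09-01", "2017-09-02", "2017-09-09", "2017-09-09"]
    ),
    "Sneaker Name": ["A", "A", "A", "B"],
    "Sale Price": [100.0, 150.0, 200.0, 300.0],
})

# Assumption: label every order with the start (Monday) of its week
toy["Week"] = toy["Order Date"].dt.to_period("W").dt.start_time

# Total sales per sneaker and week, as in the groupby above
weekly = (
    toy.groupby(["Sneaker Name", "Week"])
    .agg({"Sale Price": "sum"})
    .reset_index()
)

# Wide table: one row per week, one column per sneaker, 0 where nothing sold
wide = weekly.pivot_table(
    index="Week", columns="Sneaker Name", values="Sale Price", fill_value=0
)
print(wide)
```

A wide layout like this makes it easy to sample a few sneaker columns and plot them against the week.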
In [7]: # Let's try to create a heatmap to visualize the sales of the 10 randomly
# Pivot data to create a matrix with "Sneaker Name" as rows, "Week" as co
pivot_df = agg_df.pivot_table(index="Sneaker Name", columns="Week", value
Observing the sales of sneaker models by US States.
In [8]: # Group the data by Buyer_region and calculate the total sales for each r
region_sales = df.groupby('Buyer Region')['Sale Price'].sum().reset_index
[Figure: total sales by buyer region; color scale from 1M to 9M]
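To read the regional totals without the plot, the same groupby can be sorted and turned into shares. This is only a sketch on four rows taken from Out[3]; the Share column and the name `toy` are additions for illustration.

```python
import pandas as pd

# Four rows from Out[3] (Buyer Region and Sale Price only)
toy = pd.DataFrame({
    "Buyer Region": ["California", "Kentucky", "California", "Rhode Island"],
    "Sale Price": [1097.0, 1075.0, 685.0, 828.0],
})

# Total sales per region, largest first
region_sales = (
    toy.groupby("Buyer Region")["Sale Price"]
    .sum()
    .reset_index()
    .sort_values("Sale Price", ascending=False, ignore_index=True)
)

# Hypothetical extra column: each region's share of the total
region_sales["Share"] = region_sales["Sale Price"] / region_sales["Sale Price"].sum()
print(region_sales)
```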
# processed_df.head()
final_df = processed_df.copy()
final_df['Number of Sales'] = 1
final_df = final_df.groupby(['Month', 'Sneaker Name']).agg({'Sale Price':
final_df.head()
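The aggregation line above is cut off. Judging by Out[10], Sale Price and Days Since Release look like monthly averages and Number of Sales looks like a count. Here is a minimal sketch under that reading, assuming Month counts calendar months from the earliest order and Days Since Release is Order Date minus Release Date. The toy rows and the names `toy` and `monthly` are hypothetical.

```python
import pandas as pd

# Toy stand-in for processed_df (illustrative values only)
toy = pd.DataFrame({
    "Order Date": pd.to_datetime(["2017-09-10", "2017-09-20", "2017-10-05"]),
    "Release Date": pd.to_datetime(["2017-09-09"] * 3),
    "Sneaker Name": ["Shoe-A"] * 3,
    "Sale Price": [2000.0, 1900.0, 1800.0],
})

# Assumption: days elapsed between release and order
toy["Days Since Release"] = (toy["Order Date"] - toy["Release Date"]).dt.days

# Assumption: Month is the number of calendar months since the first order
first = toy["Order Date"].min()
toy["Month"] = (toy["Order Date"].dt.year - first.year) * 12 + (
    toy["Order Date"].dt.month - first.month
)

# One row per sale, so summing this column counts the sales
toy["Number of Sales"] = 1

monthly = (
    toy.groupby(["Month", "Sneaker Name"])
    .agg({"Sale Price": "mean", "Days Since Release": "mean", "Number of Sales": "sum"})
    .reset_index()
)
print(monthly)
```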
Out[10]:
   Month  Sneaker Name                                Sale Price  Days Since Release  Number of Sales
0      0  Adidas-Yeezy-Boost-350-Low                 1095.068182          457.818182               44
1      0  Adidas-Yeezy-Boost-350-V2                   633.195329          198.233546              471
2      0  Air-Jordan-1-Retro-High-Off-White-Chicago  1964.707317            9.219512               41
3      0  Nike-Air-Max-90                             872.323529            8.470588               34
4      0  Nike-Air-Presto                            1220.595238            8.523810               42
# test_df.head()
df_analysis.head(30)
Out[12]:
     Month  Sneaker Name                                Sale Price  Days Since Release  Number of Sales
2        0  Air-Jordan-1-Retro-High-Off-White-Chicago  1964.707317            9.219512               41
9        1  Air-Jordan-1-Retro-High-Off-White-Chicago  1873.900000           34.000000               20
17       2  Air-Jordan-1-Retro-High-Off-White-Chicago  1296.276042           71.317708              192
28       3  Air-Jordan-1-Retro-High-Off-White-Chicago  1293.887850           94.383178              107
39       4  Air-Jordan-1-Retro-High-Off-White-Chicago  1545.723077          128.400000               65
50       5  Air-Jordan-1-Retro-High-Off-White-Chicago  1676.348837          158.767442               43
62       6  Air-Jordan-1-Retro-High-Off-White-Chicago  1877.300000          186.725000               40
74       7  Air-Jordan-1-Retro-High-Off-White-Chicago  2118.551724          218.448276               29
86       8  Air-Jordan-1-Retro-High-Off-White-Chicago  2344.057143          250.600000               35
99       9  Air-Jordan-1-Retro-High-Off-White-Chicago  2172.407407          279.962963               27
112     10  Air-Jordan-1-Retro-High-Off-White-Chicago  2166.571429          308.500000               14
125     11  Air-Jordan-1-Retro-High-Off-White-Chicago  2471.884615          342.807692               26
138     12  Air-Jordan-1-Retro-High-Off-White-Chicago  2324.928571          370.500000               14
151     13  Air-Jordan-1-Retro-High-Off-White-Chicago  2379.625000          404.125000               16
164     14  Air-Jordan-1-Retro-High-Off-White-Chicago  2397.391304          432.826087               23
177     15  Air-Jordan-1-Retro-High-Off-White-Chicago  2488.047619          463.666667               21
190     16  Air-Jordan-1-Retro-High-Off-White-Chicago  2504.708333          491.166667               24
203     17  Air-Jordan-1-Retro-High-Off-White-Chicago  2684.944444          515.944444               18
Correlation of Number of Sales, Sale Price and Days Since Release
Let us pick one of the shoes, Air-Jordan-1-Retro-High-Off-White-Chicago,
and see how the number of sales, sale price and days since release are correlated.
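Before plotting, the pairwise correlations can be checked numerically. The sketch below copies the 18 monthly rows from Out[12] (rounded to two decimals) into a small frame and calls `DataFrame.corr()`, which computes Pearson correlation by default. The frame name `sales` is made up for this example.

```python
import pandas as pd

# Monthly values for Air-Jordan-1-Retro-High-Off-White-Chicago, from Out[12]
sales = pd.DataFrame({
    "Sale Price": [
        1964.71, 1873.90, 1296.28, 1293.89, 1545.72, 1676.35,
        1877.30, 2118.55, 2344.06, 2172.41, 2166.57, 2471.88,
        2324.93, 2379.63, 2397.39, 2488.05, 2504.71, 2684.94,
    ],
    "Days Since Release": [
        9.22, 34.00, 71.32, 94.38, 128.40, 158.77,
        186.73, 218.45, 250.60, 279.96, 308.50, 342.81,
        370.50, 404.13, 432.83, 463.67, 491.17, 515.94,
    ],
    "Number of Sales": [
        41, 20, 192, 107, 65, 43, 40, 29, 35, 27, 14, 26, 14, 16, 23, 21, 24, 18,
    ],
})

# Pearson correlation between every pair of columns
corr = sales.corr()
print(corr.round(2))
```

The sign of each off-diagonal entry shows whether two quantities tend to move together or in opposite directions over the 18 months.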
In [13]: fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111, projection='3d')
# Set the angle and elevation of the plot
ax.view_init(elev=30, azim=120)
ax.dist = 13
plt.show()
Linear Regression
In [14]: # create the independent variable matrix X and dependent variable vector
X = df_analysis[['Sale Price', 'Days Since Release']]
X = sm.add_constant(X) # add a constant term to the independent variable
y = df_analysis['Number of Sales']
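The fitting step is not visible in the cell above. With statsmodels it is typically `sm.OLS(y, X).fit()`. As a dependency-light illustration of the same least-squares idea, the sketch below uses NumPy's `lstsq` in place of statsmodels on made-up, noise-free data, so the recovered coefficients can be compared with the ones used to generate it. All names and numbers here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up predictors on roughly the same scale as the notebook's columns
price = rng.uniform(1300, 2700, size=50)
days = rng.uniform(0, 520, size=50)

# Made-up, noise-free relationship: sales = 120 - 0.03*price - 0.05*days
true_coef = np.array([120.0, -0.03, -0.05])
n_sales = true_coef[0] + true_coef[1] * price + true_coef[2] * days

# Design matrix with a leading column of ones (the constant term)
X_demo = np.column_stack([np.ones_like(price), price, days])

# Ordinary least squares: minimise ||X_demo @ coef - n_sales||^2
coef, residuals, rank, singular_values = np.linalg.lstsq(X_demo, n_sales, rcond=None)
print(coef)
```

Unlike the statsmodels summary, this gives only the coefficients, with no standard errors or p-values.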
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.55e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
/Users/PRASI/Documents/My_Documents/MBA-SataProject/venv/lib/python3.9/site-packages/scipy/stats/_stats_py.py:1736: UserWarning:
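Note [2] can come from correlated predictors, but also simply from columns on very different scales, such as a constant of 1 next to prices in the thousands. The sketch below shows the scale effect on made-up data, assuming z-score standardization as the rescaling. It does not reproduce the 1.55e+04 figure, and the names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up predictors on roughly the same scale as the notebook's columns
price = rng.uniform(1300, 2700, size=18)
days = rng.uniform(0, 520, size=18)
const = np.ones_like(price)


def zscore(values):
    """Centre a column at 0 and scale it to unit standard deviation."""
    return (values - values.mean()) / values.std()


# Design matrices before and after standardizing the two predictors
X_raw = np.column_stack([const, price, days])
X_std = np.column_stack([const, zscore(price), zscore(days)])

# Condition number: ratio of largest to smallest singular value
cond_raw = np.linalg.cond(X_raw)
cond_std = np.linalg.cond(X_std)
print(f"raw: {cond_raw:.1f}, standardized: {cond_std:.1f}")
```

Standardizing changes the units of the coefficients, not the fitted values, so it is mainly a way to tell scale effects apart from genuine multicollinearity.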
# create a 3D plot
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='coolwarm')
ax.set_xlabel('Sale Price')
ax.set_ylabel('Days Since Release')
ax.set_zlabel('Number of Sales')
ax.dist = 11
plt.show()
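The X, Y and Z passed to `plot_surface` above are not defined in the visible lines. One way they could be built is a grid over the two predictors, with Z taken from a fitted plane. The coefficients below are hypothetical placeholders, not the notebook's regression results.

```python
import numpy as np

# Hypothetical coefficients: intercept, Sale Price slope, Days Since Release slope
b0, b_price, b_days = 120.0, -0.03, -0.05

# Axis ranges chosen to roughly cover the values in Out[12]
price_axis = np.linspace(1300, 2700, 30)
days_axis = np.linspace(0, 520, 20)

# 2-D grids: X_grid varies along columns, Y_grid along rows
X_grid, Y_grid = np.meshgrid(price_axis, days_axis)

# Predicted number of sales at every grid point
Z_grid = b0 + b_price * X_grid + b_days * Y_grid
print(Z_grid.shape)
```

The three arrays have the same 2-D shape, which is what `ax.plot_surface(X_grid, Y_grid, Z_grid, cmap='coolwarm')` expects.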
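So, for this shoe, the monthly figures in Out[12] suggest that the number of sales
peaks in the first few months after release and then gradually declines, while the
average sale price dips over those months and then rises.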