Diamonds: Analyze Diamonds by Their Cut, Color, Clarity, Price, and Other Attributes
Group Details
This dataset contains the prices and other attributes of almost 54,000 diamonds.
ATTRIBUTE   EXPLANATION
PRICE       price in US dollars ($326–$18,823)
CARAT       weight of the diamond (0.2–5.01)
CUT         quality of the cut (Fair, Good, Very Good, Premium, Ideal)
COLOR       diamond colour, from J (worst) to D (best)
CLARITY     a measurement of how clear the diamond is
            (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
X           length in mm (0–10.74)
Y           width in mm (0–58.9)
Z           depth in mm (0–31.8)
DEPTH       total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79)
TABLE       width of top of diamond relative to widest point (43–95)
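As a quick check of the DEPTH formula, the sketch below computes the total depth percentage from x, y and z. The function name depth_pct and the sample measurements are illustrative assumptions, not values taken from the dataset. The ratio is multiplied by 100 so that the result lands in the 43–79 range quoted for DEPTH, and a zero length or width (both ranges start at 0) is assumed to be a missing measurement and returns NA.

```r
# Hypothetical helper: total depth percentage from the three dimensions (mm).
depth_pct <- function(x, y, z) {
  mean_xy <- (x + y) / 2
  # Assumption: a zero length or width marks a missing measurement.
  ifelse(mean_xy > 0, 100 * z / mean_xy, NA_real_)
}

# Made-up measurements for three stones; the last has no recorded x and y.
x <- c(4.00, 6.50, 0)
y <- c(4.00, 6.46, 0)
z <- c(2.46, 4.00, 2.50)
round(depth_pct(x, y, z), 1)
```

Rows that come back as NA are the ones worth inspecting before the dimensions are used as predictors.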
Method:
The method used is linear regression, with price as the response and the remaining attributes as predictors.
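A minimal sketch of what a linear regression fit looks like in R, using a tiny made-up data frame (the toy values are invented for illustration and do not come from the diamonds file). The prices are generated exactly as 1000 + 5000 * carat, so lm() should recover that intercept and slope.

```r
# Made-up data: price is an exact linear function of carat.
toy <- data.frame(carat = c(0.3, 0.5, 0.7, 1.0, 1.5))
toy$price <- 1000 + 5000 * toy$carat

# Fit price as a linear function of carat by ordinary least squares.
toy_model <- lm(price ~ carat, data = toy)
coef(toy_model)

# Predict the price of a hypothetical 0.9-carat stone from the fitted line.
toy_pred <- predict(toy_model, newdata = data.frame(carat = 0.9))
toy_pred
```

On real data the fit is never exact, which is why the residual summary and the standard errors in the output matter.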
Code:
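library(ggplot2)
library(plyr)
# Read the file. stringsAsFactors = TRUE keeps cut, color and clarity as factors;
# since R 4.0.0 the default is FALSE, which would make levels() return NULL.
diamonds <- read.csv(file.choose(), header = TRUE, stringsAsFactors = TRUE)
summary(diamonds)
levels(diamonds$cut)
# Count the missing values in each column
na <- colSums(is.na(diamonds))
na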
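# Finding outliers
# Assumption: the definition of continous_var is missing from the source;
# here it is taken to be the numeric columns of diamonds.
continous_var <- diamonds[, sapply(diamonds, is.numeric)]
# Checking the 98th percentile value for each continuous column
# (the loop body is missing from the source; printing the quantile is a reconstruction)
for (i in 1:ncol(continous_var)) {
  if (is.numeric(continous_var[, i])) {
    print(quantile(continous_var[, i], 0.98))
  }
}
# Assumption: data_main is the working copy used below; any outlier filtering
# applied to it is missing from the source.
data_main <- diamonds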
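# Each plot shows the mean price per value of one attribute.
# fun replaces fun.y, which is deprecated since ggplot2 3.3.0.
# Plotting cut vs price
ggplot(data_main) +
  geom_bar(aes(x = cut, y = price), stat = "summary", fun = mean, alpha = 1, fill = "blue") +
  xlab("Cut Type") +
  ylab("Average Price") +
  ggtitle("Cut vs Price")
# Plotting color vs price
ggplot(data_main) +
  geom_bar(aes(x = color, y = price), stat = "summary", fun = mean, alpha = 1, fill = "red") +
  xlab("Color") +
  ylab("Average Price") +
  ggtitle("Color vs Price")
# Plotting clarity vs price
ggplot(data_main) +
  geom_bar(aes(x = clarity, y = price), stat = "summary", fun = mean, alpha = 1, fill = "green") +
  xlab("Clarity Type") +
  ylab("Average Price") +
  ggtitle("Clarity vs Price")
# Plotting carat vs price
ggplot(data_main) +
  geom_bar(aes(x = carat, y = price), stat = "summary", fun = mean, alpha = 1, fill = "orange") +
  xlab("Carat") +
  ylab("Average Price") +
  ggtitle("Carat vs Price")
# Plotting depth vs price
ggplot(data_main) +
  geom_bar(aes(x = depth, y = price), stat = "summary", fun = mean, alpha = 1, fill = "pink") +
  xlab("Depth") +
  ylab("Average Price") +
  ggtitle("Depth vs Price")
# Plotting table vs price
ggplot(data_main) +
  geom_bar(aes(x = table, y = price), stat = "summary", fun = mean, alpha = 1, fill = "purple") +
  xlab("Table") +
  ylab("Average Price") +
  ggtitle("Table vs Price")
# Plotting X vs price
ggplot(data_main) +
  geom_bar(aes(x = x, y = price), stat = "summary", fun = mean, alpha = 1, fill = "purple") +
  xlab("X") +
  ylab("Average Price") +
  ggtitle("X vs Price")
# Plotting Y vs price
ggplot(data_main) +
  geom_bar(aes(x = y, y = price), stat = "summary", fun = mean, alpha = 1, fill = "purple") +
  xlab("Y") +
  ylab("Average Price") +
  ggtitle("Y vs Price")
# Plotting Z vs price
# (the geom_bar line is missing from the source; reconstructed by analogy with X and Y)
ggplot(data_main) +
  geom_bar(aes(x = z, y = price), stat = "summary", fun = mean, alpha = 1, fill = "purple") +
  xlab("Z") +
  ylab("Average Price") +
  ggtitle("Z vs Price")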
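# Frequency of each level of the categorical variables
table(data_main$cut)
table(data_main$color)
table(data_main$clarity)
# Recode the clarity labels as numeric codes. The codes follow the alphabetical
# order of the labels, not the quality order I1 (worst) to IF (best).
data_main$clarity <- revalue(data_main$clarity,
  c("I1" = 1, "IF" = 2, "SI1" = 3, "SI2" = 4, "VS1" = 5, "VS2" = 6, "VVS1" = 7, "VVS2" = 8))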
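# Assumption: the model definition is missing from the source. The coefficient
# table in the output suggests price regressed on all other columns.
model1 <- lm(price ~ ., data = data_main)
summary(model1)
plot(model1)
# Assumption: residuals1 is taken to be the residuals of model1
residuals1 <- residuals(model1)
plot(residuals1)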
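The percentile check in the outlier step can be sketched in a self-contained way as follows. This is one possible reading of that step, not the author's exact code: the helper name flag_above_p98 and the sample data frame are assumptions made for illustration.

```r
# Hypothetical helper: for every numeric column, report the 98th percentile
# and how many values lie above it.
flag_above_p98 <- function(df, prob = 0.98) {
  num_cols <- names(df)[sapply(df, is.numeric)]
  data.frame(
    column    = num_cols,
    threshold = sapply(num_cols, function(n) quantile(df[[n]], prob, names = FALSE)),
    n_above   = sapply(num_cols, function(n) sum(df[[n]] > quantile(df[[n]], prob))),
    row.names = NULL
  )
}

# Made-up sample: the values 1 to 100 plus one extreme price.
sample_df <- data.frame(price = c(1:100, 5000), cut = "Ideal")
p98_report <- flag_above_p98(sample_df)
p98_report
```

Whether values above the threshold are dropped, capped, or kept is a separate decision. The sketch only reports them.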
Steps and Output
3. Categorizing variables:
Price 0
Carat 0
Cut 0
Color 0
Clarity 0
X 0
Y 0
Z 0
Depth 0
Table 0
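The zero counts above appear to be the per-column missing-value tally produced by colSums(is.na(diamonds)) in the code. A small sketch on a made-up data frame (the demo values are invented) shows how that call behaves when some values are missing.

```r
# Made-up data frame with two missing entries.
demo <- data.frame(
  price = c(326, NA, 334),
  carat = c(0.23, 0.21, 0.29),
  cut   = c("Ideal", "Premium", NA)
)

# is.na() gives a TRUE/FALSE matrix; colSums() counts the TRUEs per column.
na_counts <- colSums(is.na(demo))
na_counts

# Columns that would need attention before fitting a model.
cols_with_na <- names(na_counts)[na_counts > 0]
cols_with_na
```

A result of all zeros, as in the output above, means no column needs imputation or row removal for missing values.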
Residuals:
     Min       1Q   Median       3Q      Max
-21248.6   -618.6   -170.8    414.6   8640.5

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.094e+03  4.412e+02  -4.746 2.08e-06 ***
X            4.998e-03  3.343e-04  14.951  < 2e-16 ***
clarity2     5.245e+03  4.917e+01 106.662  < 2e-16 ***
clarity3     3.551e+03  4.210e+01  84.339  < 2e-16 ***
clarity4     2.624e+03  4.225e+01  62.105  < 2e-16 ***
clarity5     4.470e+03  4.296e+01 104.051  < 2e-16 ***
clarity6     4.163e+03  4.229e+01  98.425  < 2e-16 ***
clarity7     4.930e+03  4.544e+01 108.503  < 2e-16 ***
clarity8     4.860e+03  4.421e+01 109.949  < 2e-16 ***
depth       -3.715e+01  5.431e+00  -6.840 8.01e-12 ***
table       -2.191e+01  2.931e+00  -7.478 7.69e-14 ***
x           -8.806e+02  8.136e+01 -10.824  < 2e-16 ***
y            3.795e+02  8.042e+01   4.718 2.38e-06 ***
z            1.190e+02  6.100e+01   1.951   0.0511 .
carat        9.753e+03  4.245e+01 229.756  < 2e-16 ***
cut2         6.178e+02  3.237e+01  19.088  < 2e-16 ***
cut3         8.710e+02  3.121e+01  27.907  < 2e-16 ***
cut4         8.236e+02  3.018e+01  27.294  < 2e-16 ***
cut5         7.692e+02  3.078e+01  24.991  < 2e-16 ***
color2      -2.097e+02  1.723e+01 -12.171  < 2e-16 ***
color3      -2.837e+02  1.743e+01 -16.280  < 2e-16 ***
color4      -4.820e+02  1.706e+01 -28.247  < 2e-16 ***
color5      -9.537e+02  1.814e+01 -52.565  < 2e-16 ***
color6      -1.427e+03  2.038e+01 -70.010  < 2e-16 ***
color7      -2.289e+03  2.517e+01 -90.963  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
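Because revalue replaced the clarity labels with numeric codes, the coefficient names clarity2 to clarity8 no longer show which grade they refer to. The sketch below maps them back using the same code table as the revalue call in the code section; the helper name decode_clarity is an assumption for illustration. Code 1 (I1) has no row of its own, which is consistent with it serving as the baseline level against which the other clarity coefficients are measured.

```r
# Code table copied from the revalue() call in the code section.
clarity_codes <- c("I1" = 1, "IF" = 2, "SI1" = 3, "SI2" = 4,
                   "VS1" = 5, "VS2" = 6, "VVS1" = 7, "VVS2" = 8)

# Hypothetical helper: turn a term such as "clarity2" back into its grade.
decode_clarity <- function(term) {
  code <- as.integer(sub("^clarity", "", term))
  names(clarity_codes)[match(code, clarity_codes)]
}

decode_clarity(c("clarity2", "clarity4", "clarity8"))
# → "IF" "SI2" "VVS2"
```

Read this way, the clarity2 estimate of 5.245e+03 is the fitted price difference between an IF stone and an I1 stone with all other predictors held equal.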