
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

FACULTY OF
DEPARTMENT OF

<STUDENT NAME>

ASSIGNMENT REPORT

ANALYSIS USING R

HO CHI MINH CITY, <2021>


TABLE OF CONTENTS
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

Activity 1. Chicken Feed Data

1.1. Import and Clean Data

1.2. Fitting Linear Regression Model

Activity 2. Concrete Data

2.1. Input and clean data

2.2. Fit Linear Regression Model


TABLE OF FIGURES
Picture 1.1: Output uncleaned data
Picture 1.2: Output Clean data
Picture 1.3: Output histogram of data
Picture 1.4: Output Boxplot of data
Picture 1.5: Output Pairs of data
Picture 1.6: Output data training
Picture 2.1: Output Name unclean column
Picture 2.2: Output Histogram data
Picture 2.3: Output Boxplot of data
Picture 2.4: Output Pairs of data
Picture 2.6: Output Linear Regression
Activity 1. Chicken Feed Data

1.1. Import and Clean Data

Command:

data = read.csv("Data/chicken_feed-1.csv")

print(colnames(data))

head(data)

str(data)

summary(data)

Output:

Picture 1.1: Output uncleaned data


Data that need cleaning: rows with missing values (NA) and the unnamed index column X.
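Before dropping rows, you can count the missing values in each column to see which columns contain them. The chicken feed CSV is not reproduced here, so the sketch below uses R's built-in airquality dataset as a stand-in because it is known to contain NAs. With the report's data, the same calls would be applied to data.

```r
# Stand-in example: airquality is a built-in dataset that contains NAs
na_per_column <- colSums(is.na(airquality))   # number of NAs in each column
print(na_per_column)

# complete.cases() is TRUE for rows that have no NA in any column
clean_rows <- complete.cases(airquality)
airquality_clean <- airquality[clean_rows, ]
cat("Rows before:", nrow(airquality), " after:", nrow(airquality_clean), "\n")
```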

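Command:

# Drop the unnamed column X (typically row names saved by write.csv)
data = subset(data,select=-c(X))

# Keep only rows with no missing values
data = data[complete.cases(data),]

# Store feed as a factor so that pairs() and lm() treat it as categorical
data$feed = as.factor(data$feed)

print(colnames(data))

head(data)

str(data)

summary(data)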

Output:

Picture 1.2: Output Clean data


Comment: the data are now clean. For visualization, we keep a subset that excludes the factor column feed (data_no_factor).
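The column X usually comes from row names that write.csv() saved. The sketch below assumes the chicken feed file resembles R's built-in chickwts dataset, which has a numeric weight and a feed label. The round trip reproduces the X column and then applies this section's cleaning steps, plus storing feed as a factor.

```r
# Write chickwts to a temporary CSV; write.csv() saves row names by default
csv_path <- tempfile(fileext = ".csv")
write.csv(chickwts, csv_path)

raw <- read.csv(csv_path)
print(colnames(raw))   # "X" "weight" "feed": the unnamed row-name column becomes X

# Cleaning steps from this section, plus converting feed to a factor
cleaned <- subset(raw, select = -c(X))
cleaned <- cleaned[complete.cases(cleaned), ]
cleaned$feed <- as.factor(cleaned$feed)
str(cleaned)
```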

Command:

# Keep only the numeric columns (drop the factor feed)
data_no_factor = subset(data,select=-c(feed))

######### Draw table histogram ########

op <- par(mfrow=c(3, 4)) # to put histograms side by side

# One histogram per numeric column; invisible() hides the returned list
invisible(lapply(seq(data_no_factor), function(x)
  hist(x=data_no_factor[[x]], xlab=names(data_no_factor)[x],
       main=paste("Histogram", names(data_no_factor)[x]))))

par(op) # restore

########################################

Output:

Picture 1.3: Output histogram of data
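You can wrap the histogram loop above in a reusable function. The helper plot_histograms() below is hypothetical and is not part of the original report. It selects numeric columns with Filter(is.numeric, ...) rather than naming the factor column by hand, and it restores the plotting layout with on.exit().

```r
# Hypothetical helper: draw one histogram per numeric column of df
plot_histograms <- function(df, nrow = 3, ncol = 4) {
  numeric_df <- Filter(is.numeric, df)   # drop factor/character columns
  op <- par(mfrow = c(nrow, ncol))
  on.exit(par(op))                       # restore the layout, even on error
  hists <- lapply(names(numeric_df), function(col)
    hist(numeric_df[[col]], xlab = col, main = paste("Histogram", col)))
  invisible(setNames(hists, names(numeric_df)))
}

hists <- plot_histograms(chickwts)
```

Because the helper selects columns by type, it also works for the Activity 2 data without editing any column name.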

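Command:

# Boxplots of the numeric columns only (the factor feed is excluded)
op <- par(mfrow=c(3, 4))

invisible(lapply(seq(data_no_factor), function(x)
  boxplot(x=data_no_factor[[x]], xlab=names(data_no_factor)[x],
          main=paste("Boxplot", names(data_no_factor)[x]))))

par(op)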

Output:

Picture 1.4: Output Boxplot of data
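Because feed is a grouping variable, a boxplot of weight split by feed is often more informative than one box per column. The sketch below uses the built-in chickwts dataset, which is assumed to match the structure of the chicken feed file.

```r
# weight ~ feed draws one box per feed group
box_stats <- boxplot(weight ~ feed, data = chickwts,
                     xlab = "feed", ylab = "weight",
                     main = "Boxplot of weight by feed")

# boxplot() also returns the statistics it drew: one column per group
print(box_stats$stats)
```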

Command:

pairs(data)

Output:

Picture 1.5: Output Pairs of data
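pairs() stops with an error on character columns, and read.csv() has returned text columns as character by default since R 4.0.0. Factor columns, by contrast, are plotted using their integer codes. The hypothetical helper below converts every character column to a factor before plotting. It is shown on chickwts, with feed turned back into text to mimic the CSV import.

```r
# Hypothetical helper: convert character columns to factors so pairs() accepts them
factorize_characters <- function(df) {
  df[] <- lapply(df, function(col) if (is.character(col)) factor(col) else col)
  df
}

# Mimic read.csv(): feed stored as plain text
chick_chr <- transform(chickwts, feed = as.character(feed))
chick_ready <- factorize_characters(chick_chr)
pairs(chick_ready)
```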


1.2. Fitting Linear Regression Model

Command:

# Regress weight on all other columns (here: feed)
LinearRegressions = lm(weight~.,data=data)

summary(LinearRegressions)

Output:

Picture 1.6: Output data training
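Because feed is a factor, lm(weight~., data=data) fits one coefficient for each feed level, measured against a baseline level. The sketch below checks this on the built-in chickwts dataset, which is assumed to have the same structure as the report's data. With R's default treatment contrasts, the intercept equals the mean weight of the first level (casein). Every other coefficient is that group's mean minus the casein mean.

```r
fit <- lm(weight ~ feed, data = chickwts)

# Group means, and their differences from the baseline level casein
group_means <- tapply(chickwts$weight, chickwts$feed, mean)
print(coef(fit))
print(group_means - group_means[["casein"]])

# Overall F-test of whether mean weight differs across feeds
print(anova(fit))
```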

Activity 2. Concrete Data

2.1. Input and clean data

Command:

data = read.csv("Data/caffeine.csv")

print(colnames(data))

str(data)

summary(data)

head(data)

Picture 2.1: Output Name unclean column
Data that need cleaning: the drink column.

Reason: drink only holds the name of each drink, a text label that cannot be analyzed as a predictor.
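Dropping drink by name works for this file. As a more general option, the hypothetical helper below drops any character column in which every value is distinct. This is a rough heuristic for spotting name or ID columns that a regression cannot use. The demonstration uses mtcars, with the car names moved into a text column.

```r
# Hypothetical helper: drop character columns whose values are all distinct
drop_label_columns <- function(df) {
  is_label <- vapply(df, function(col)
    is.character(col) && !anyDuplicated(col), logical(1))
  df[!is_label]
}

# Demonstration: move mtcars' row names (car models) into a text column
cars_df <- data.frame(model = rownames(mtcars), mtcars, stringsAsFactors = FALSE)
cars_clean <- drop_label_columns(cars_df)
print(setdiff(names(cars_df), names(cars_clean)))   # "model"
```

The helper keeps columns such as type, which repeat a small set of categories, because they contain duplicate values.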

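Command:

# Drop the drink-name column
data = subset(data,select=-c(drink))

# Store type as a factor so that pairs() and lm() treat it as categorical
data$type = as.factor(data$type)

# Keep only the numeric columns (drop the factor type) for plotting
data_no_factor = subset(data,select=-c(type))

######### Draw table histogram ########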

op <- par(mfrow=c(3, 4)) # to put histograms side by side

# One histogram per numeric column; invisible() hides the returned list
invisible(lapply(seq(data_no_factor), function(x)
  hist(x=data_no_factor[[x]], xlab=names(data_no_factor)[x],
       main=paste("Histogram", names(data_no_factor)[x]))))

par(op) # restore

########################################

Output:

Picture 2.2: Output Histogram data


Command:

######### Draw table boxplot ##########

op <- par(mfrow=c(3, 4)) # to put boxplots side by side

# One boxplot per numeric column; invisible() hides the returned list
invisible(lapply(seq(data_no_factor), function(x)
  boxplot(x=data_no_factor[[x]], xlab=names(data_no_factor)[x],
          main=paste("Boxplot", names(data_no_factor)[x]))))

par(op) # restore

########################################

Output:

Picture 2.3: Output Boxplot of data
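A boxplot draws individual points for values that lie more than 1.5 times the interquartile range beyond the hinges (the box edges). boxplot.stats() returns these points without plotting. The hypothetical helper below uses it to list them for every numeric column. It is shown on the built-in mtcars dataset rather than the caffeine file.

```r
# Hypothetical helper: points a boxplot would mark as outliers, per numeric column
boxplot_outliers <- function(df) {
  numeric_df <- Filter(is.numeric, df)
  lapply(numeric_df, function(col) boxplot.stats(col)$out)
}

outliers <- boxplot_outliers(mtcars)
print(outliers$hp)
```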


Command:

pairs(data)

Output:

Picture 2.4: Output Pairs of data
Comment: the pairs plot shows no obvious bias in the data.
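A correlation matrix gives a numeric summary of the linear relationships visible in a pairs plot. The minimal sketch below uses three columns of the built-in mtcars dataset. With the report's data, the equivalent call would be cor(data_no_factor), assuming that all of its remaining columns are numeric.

```r
# Pairwise Pearson correlations of numeric columns
cor_matrix <- cor(mtcars[, c("mpg", "hp", "wt")])
print(round(cor_matrix, 2))
```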

2.2. Fit Linear Regression Model

Command:

flm_model <- lm(Caffeine..mg. ~ . , data = data)

summary(flm_model)

Output:

Picture 2.6: Output Linear Regression
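The object that summary() returns for an lm fit stores the quantities printed in the regression output, so you can extract them directly. The sketch below fits a model with one factor and two numeric predictors on the built-in mtcars dataset, as a stand-in for the caffeine data. It then pulls out the coefficient table, the R-squared values and the predictors with p < 0.05.

```r
# Stand-in model: mpg on all other columns, with am treated as a factor
mt <- transform(mtcars[, c("mpg", "hp", "wt", "am")], am = factor(am))
fit <- lm(mpg ~ ., data = mt)
fit_summary <- summary(fit)

# Columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- fit_summary$coefficients
print(round(coef_table, 4))
cat("R-squared:", fit_summary$r.squared,
    " adjusted R-squared:", fit_summary$adj.r.squared, "\n")

# Predictors significant at the 5% level (intercept excluded)
significant <- rownames(coef_table)[coef_table[, "Pr(>|t|)"] < 0.05]
print(setdiff(significant, "(Intercept)"))
```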
