
I have changed the highlighted part to get the results from the test data.

# Visualising the Test set results

library(ggplot2)

ggplot() +
  geom_point(aes(x = test_set$YearsExperience, y = test_set$Salary),
             colour = 'black') +
  geom_line(aes(x = test_set$YearsExperience, y = predict(regressor, newdata = test_set)),
            colour = 'blue') +
  ggtitle('Salary vs Experience (Test set)') +
  xlab('Years of experience') +
  ylab('Salary') # why are the salary values on the y axis not shown as normal numbers?
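If the y-axis labels are showing up in scientific notation (for example `1e+05` instead of `100000`), that comes from how R formats numbers: by default it picks scientific notation whenever that is narrower, unless the `scipen` option penalises it. A minimal base-R sketch of the effect, assuming that is what the plot shows (the variable names are just for illustration):

```r
# With the default penalty (scipen = 0), R picks whichever notation is narrower,
# so a lone 100000 is printed as "1e+05".
default_label <- format(1e5)
print(default_label)

# Asking for fixed notation explicitly gives the "normal" number.
fixed_label <- format(1e5, scientific = FALSE)
print(fixed_label)

# Raising scipen makes fixed notation win for the rest of the session;
# options() returns the old value, so it can be restored afterwards.
old_opts <- options(scipen = 999)
session_label <- format(1e5)
print(session_label)
options(old_opts)

# Thousands separators, a style that usually suits a salary axis.
comma_labels <- format(c(50000, 100000), big.mark = ",", scientific = FALSE, trim = TRUE)
print(comma_labels)
```

To apply this to the plot, one option is to append `+ scale_y_continuous(labels = function(x) format(x, big.mark = ",", scientific = FALSE))` to the `ggplot()` call, since `scale_y_continuous()` accepts a function for `labels`. Setting `options(scipen = 999)` before plotting usually changes ggplot2's default labels as well.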


# Simple Linear Regression

# Importing the dataset
dataset = read.csv('Salary_Data.csv')

# Splitting the dataset into the Training set and Test set
# install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$Salary, SplitRatio = 2/3)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

# Feature Scaling
# training_set = scale(training_set)
# test_set = scale(test_set)

# Fitting Simple Linear Regression to the Training set
regressor = lm(formula = Salary ~ YearsExperience,
               data = training_set)

# Predicting the Test set results
y_pred = predict(regressor, newdata = test_set)


# Visualising the Training set results
library(ggplot2)

ggplot() +
  geom_point(aes(x = training_set$YearsExperience, y = training_set$Salary),
             colour = 'red') +
  geom_line(aes(x = training_set$YearsExperience, y = predict(regressor, newdata = training_set)),
            colour = 'blue') +
  ggtitle('Salary vs Experience (Training set)') +
  xlab('Years of experience') +
  ylab('Salary')

# Visualising the Test set results
library(ggplot2)

ggplot() +
  geom_point(aes(x = test_set$YearsExperience, y = test_set$Salary),
             colour = 'black') +
  geom_line(aes(x = test_set$YearsExperience, y = predict(regressor, newdata = test_set)),
            colour = 'blue') +
  ggtitle('Salary vs Experience (Test set)') +
  xlab('Years of experience') +
  ylab('Salary')
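The script stops once it has computed `y_pred`. A natural next step is to read off the fitted line and measure how far the test predictions are from the actual salaries. Because `Salary_Data.csv` is not reproduced here, this sketch uses a small made-up data frame (`toy`, with hypothetical values) and a plain base-R split instead of `caTools`, then follows the same `lm()` / `predict()` pattern as the script:

```r
# Hypothetical toy data: salary grows by roughly 9000 per year of experience.
set.seed(123)
toy <- data.frame(YearsExperience = seq(1, 10, by = 0.5))
toy$Salary <- 25000 + 9000 * toy$YearsExperience + rnorm(nrow(toy), sd = 3000)

# Base-R 2/3 split on row indices (a stand-in for caTools::sample.split).
train_idx <- sample(nrow(toy), size = round(2/3 * nrow(toy)))
toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Same model as the script: Salary = b0 + b1 * YearsExperience.
toy_fit <- lm(Salary ~ YearsExperience, data = toy_train)
print(coef(toy_fit))   # intercept b0 and slope b1

# Predictions for the held-out rows, and their root-mean-square error.
toy_pred <- predict(toy_fit, newdata = toy_test)
rmse <- sqrt(mean((toy_test$Salary - toy_pred)^2))
print(rmse)

# predict() also works on brand-new inputs, as long as the column name matches.
new_pred <- predict(toy_fit, newdata = data.frame(YearsExperience = c(3, 12)))
print(new_pred)
```

The RMSE is in the same units as `Salary`, which makes it easy to compare against typical salary values; with the real data, the same line would read `sqrt(mean((test_set$Salary - y_pred)^2))`.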
Reasons for using the set.seed function

I have often seen the set.seed function called in R at the start of a program. I know it's basically used for random number generation. Is there any specific need to set it?

The need comes from wanting reproducible results, for example when you are trying to debug your program, or simply when you want to redo exactly what it did:

We will "never" reproduce these two results, as I just asked for something "random":

R> sample(LETTERS, 5)
[1] "K" "N" "R" "Z" "G"
R> sample(LETTERS, 5)
[1] "L" "P" "J" "E" "D"
These two, however, are identical because I set the seed:
R> set.seed(42); sample(LETTERS, 5)
[1] "X" "Z" "G" "T" "O"
R> set.seed(42); sample(LETTERS, 5)
[1] "X" "Z" "G" "T" "O"
R>
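The same property is what `set.seed(123)` buys in the regression script: resetting the seed before a random split gives back exactly the same rows every time. A base-R sketch, using `sample()` on row indices as a stand-in for the `caTools` split and a hypothetical 30-row dataset. One caveat: the exact letters or indices depend on the R version, because R 3.6.0 changed the default `sample.kind` to `"Rejection"`, so newer versions may print different letters than the ones above even though two calls with the same seed still agree with each other.

```r
n_rows <- 30  # hypothetical number of rows in the dataset

# Two splits made after setting the same seed...
set.seed(123)
split_a <- sample(n_rows, size = 20)
set.seed(123)
split_b <- sample(n_rows, size = 20)

# ...select exactly the same training rows.
same_split <- identical(split_a, split_b)
print(same_split)   # TRUE

# The sequence depends on the generator settings as well as the seed.
print(RNGkind())    # generator, normal.kind and sample.kind in use

# Same check with the letters example from the answer.
set.seed(42); letters_a <- sample(LETTERS, 5)
set.seed(42); letters_b <- sample(LETTERS, 5)
print(identical(letters_a, letters_b))  # TRUE
```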
There is a vast literature on all this; Wikipedia is a good start. In essence, these RNGs are called pseudo-random number generators because they are in fact fully algorithmic: given the same seed, you get the same sequence. And that is a feature, not a bug.
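To make "fully algorithmic" concrete, here is a toy generator: the Lehmer (Park-Miller "minimal standard") recurrence, chosen only because it fits in a few lines. R's default generator is the Mersenne-Twister, not this one. Each output is computed from the previous state, so the seed fixes the whole sequence (`toy_rng` is a hypothetical helper name):

```r
# Toy Lehmer generator: x[n+1] = 16807 * x[n] mod (2^31 - 1).
# Products stay below 2^53, so double-precision arithmetic is exact here.
toy_rng <- function(seed, n) {
  m <- 2^31 - 1
  a <- 16807
  x <- seed
  out <- numeric(n)
  for (i in seq_len(n)) {
    x <- (a * x) %% m   # next state depends only on the current one
    out[i] <- x / m     # scale the state into (0, 1)
  }
  out
}

run1 <- toy_rng(42, 5)
run2 <- toy_rng(42, 5)   # same seed
run3 <- toy_rng(43, 5)   # different seed

print(identical(run1, run2))  # TRUE: same seed, same sequence
print(identical(run1, run3))  # FALSE: different seed, different sequence
```

Real generators such as the Mersenne-Twister have far longer periods and better statistical properties, but the principle is the same: no hidden randomness, just a deterministic function of the seed.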
