You are on page 1of 8

 Chapter 1 Introduction

 The prompt “>” indicates that the system is ready to


receive commands.
 we use the function “c” that combines its arguments and
produces a sequence with the arguments as the
components of the sequence.
 We call the components of the input the arguments of the
function. Arguments are separated by commas.
 If we want to create an object for further manipulation
we should save it and give it a name.
“<-” is the assignment operator. s.t.
*OBJECT NAME* <- *VARIABLE/FUNCTION/EXPRESSION*
 R distinguishes between capital and small
letter.(case-sensitive)
 If we want to produce a table that contains a count
of the frequency of the different values in our data
we apply the function “table” to the object .
 We can graphically represent the data by the
function “plot”.
 Chapter 2
Sampling and Data
Structures

 The function “read.csv” takes as an input argument the


address of a CSV file and produces as output a data frame
object with the content of the file.

 Reading Data into R

 By inserting the data into an empty spreadsheet, new CSV


files can be generated. The spreadsheet should be saved in
CSV format by using the "Save by name" dialogue and
selecting CSV from the "Save by Type" selection.
 R should be notified where the file is located to be able to
read it.
 The frequency distribution is the most common way of
summarizing data variability.
 Data frames are the standard tabular format of storing
statistical data.
 The columns of the table are called “variables” and
correspond to measurements.
 When the values of the variable are “numerical” we say that
it is a quantitative variable or a numeric variable.
 if the variable has qualitative or level values we say that it is
a “factor”.
 Chapter 3
Descriptive Statistics
 The histogram is a popular way to visualise the
distribution of continuous numerical data.
 A histogram has the advantage of being able to view large
data sets quickly.
 If the data set contains 100 or more values, a "rule of
thumb" is to use a histogram.
 Quartiles are numbers that separate the data into quarters.
 To find the quartiles, first find the median or second
quartile. The first quartile is the middle value of the lower
half of the data and the third quartile is the middle value
of the upper half of the data.
 The box plot, or box-whisker plot, gives a good graphical
overall impression of the concentration of the data.
 The function “summary”, when applied to a numerical
sequence, produce the minimal and maximal entries, as
well the first, second and third quartiles (the second is the
Median).
 It also computes the average of the numbers (the Mean) .
 The function “length” returns the length of the input
sequence.
 The function “var” computes the sample variance and the
function “sd” computes the standard deviations.
 Chapter 4 Probability

 function “summary” produces the smallest and largest


values in the sequence, the three and the mean.
 If the input of the same function is a factor then the
outcome is the frequency in the data of each of the levels
of the factor.

 Chapter 5 Random Variables

 Discrete Random Variables


o The Binomial Random Variable

 “X ∼ Binomial(n,p)”.
 The probabilities of the various possible values of a
Binomial random variable may be computed with the
aid of the R function “dbinom”.
 The input to this function is a sequence of values, the
value of n (sample size = number of trials) , and the
value of p (probability of success).
 The output is the sequence of probabilities associated
with each of the values in the first input.
 The expression “start.value : end.value” produces a
sequence of numbers that initiate with the number
“start.value” and proceeds in jumps of size one until
reaching the number “end.value”.
 “pbinom” is a function that produces the cumulative
probability of the Binomial.
 cumulative probability is obtained by summing all the
probabilities associated with values that are less than or
equal to the input value.

o The Poisson Random Variable

 “X ∼ Poisson(λ)”
 the number of trials ‘n’ , probability of success in each
trial ‘p’ and expectation is denoted by ‘λ’ .
 The function “dpois” computes the probability.
 The expectation of the distribution is entered in the
second argument.
 “ppois” computes the cumulative probability.

 Continuous Random Variable


o The Uniform Random Variable
 “X ∼ Uniform(a, b)”
 The density function is computed by the function “dunif”
 The points are generated using the "seq" function.
 This function returns a series of values that are evenly
spaced.
 The first argument in the function's input is the starting
value of the series, and the last value is the second
argument in the input.
 The argument "length=1000" specifies the sequence's
length, which in this case is 1,000 values.
 The object “den” is a sequence of length 1,000 that
contains the density of the Uniform(a, b) evaluated over
the values of “x”.
 In order to obtain nicer looking plots we may choose to
connect the points to each other with segments and use
smaller points. This may be achieved by the addition of
the argument “ type="l" ”, with the letter "l" for line .
 the cumulative probability of the Uniform distribution is
presented by the function “ punif ” .
> cdf <- punif(x,a,b)
> plot(x,cdf,type="l")

o The Exponential Random Variable

 The Exponential distribution is frequently used to model


times between events.
 The density of the Exponential distribution can be
computed with the aid of the function “dexp” .
 The cumulative probability can be computed with the
function “pexp” .
 X ∼ Exponential(λ).
 Chapter 6
 The Normal Random Variable

o The Normal Distribution

 X ∼ Normal(µ, σ2)
where µ = E(X) is the expectation of the random variable
and σ2 = Var(X) is its variance
 “dnorm” computes The density of the Normal distribution.
 “pnorm” computes The cumulative probability .

o The Standard Normal Distribution

 Z = (X − µ)=σ , Z ∼ N(0, 1)
 “qnorm” computes The percentiles of the Normal
distribution
The first argument to the function is a probability (or a
sequence of probabilities), the second and third arguments
are the expectation and the standard deviations of the
normal distribution.
If these arguments are not provided the function computes
the percentiles of the standard Normal distribution.
 the Normal approximation is valid when n, the number of
independent trials in the Binomial distribution, is large.
 the Normal approximation for the Binomial(n, p)
distribution may not be so good when the number of trials n
is small.

You might also like