
Notes of Lab Practices of Statistics

Statistics
Bachelor of Software Engineering
2013-2014

Contents
1 Lab Practice 1.- Introduction to Octave
2 Lab Practice 2.- Descriptive Statistics with Octave
4 Lab Practice 4.- Tests on the parameters of a normal distribution
5 Lab Practice 5.- Tests on the parameters of two independent normal distributions
6 Lab Practice 6.- Tests on population proportions
7 Lab Practice 7.- Goodness of fit tests, a test for randomness, and the Kolmogorov-Smirnov test for two samples
Bibliography


CHAPTER 1. LAB PRACTICE 1.- INTRODUCTION TO OCTAVE

Octave is free software that can be used with Linux, Unix and Microsoft Windows.
It can be downloaded from http://www.octave.org, where it is possible to obtain

Other web addresses of interest are the following:

http://www.gnu.org/software/octave

Chapter 1


(Official page)

Lab Practice 1.- Introduction to Octave

http://www.gnu.org/software/octave/NEWS-3.2.html
http://octave.sourceforge.net/packages.html
http://octave.sourceforge.net/FAQ.html#install
http://octave.sourceforge.net/symbolic/index.html

Content

http://www.network-theory.co.uk/octave/manual

1.1 Aim of the practice
1.2 Introduction
1.3 Basic commands in Octave
1.4 Exercises of Lab Practice 1

1.3 Basic commands in Octave

Throughout the lab classes we will use QtOctave, that is, Octave with a graphical interface which
allows users to interact with Octave in a simple way.
You can start QtOctave by double-clicking on the QtOctave icon that should be on the

1.1 Aim of the practice

In this practice we show the basic instructions of the Octave software. These orders will be
essential for the purposes of the lab classes.

1.2 Introduction
Octave was originally intended to be companion software for an undergraduate-level textbook
on chemical reactor design, being written by James B. Rawlings of the University of Wisconsin-Madison and John G. Ekerdt of the University of Texas in 1988. John W. Eaton started
developing the program in 1992. The first alpha version was presented in 1993, and
version 1.0 appeared in 1994.

This brings up the window called QTOctave[Empty], which contains the Octave Terminal,
the Editor, the Commands List, the Variables List and the Navigator windows. Those windows
allow users to enter simple commands (see next figure).
The prompt >> (in the Command line) indicates that Octave is awaiting a command.

To perform a simple computation, type a command and then press the Enter or Return
key. All commands in Octave should be typed in lower case.
To get out of Octave we can use the order Quit in the File window.
Some relevant orders of Octave are the following:
1) Definition of variables:
Variables are defined by assigning their values directly. Names of variables are chosen
by the user.

Clearly, Octave is now much more than just another courseware package with limited utility
beyond the classroom.

Once the variable is defined, we can use it in subsequent calculations. Thus >> a = 2
defines the variable a with value 2. The order >> a + 23 will return the value 25.

Today Octave is interactive software for numeric computation, which contains powerful tools for areas like linear algebra, analysis, calculus, optimization, interpolation, linear
programming, statistics, probability, geometry, differential equations, signal processing, etc.

One can obtain the value of a variable just by typing its name.

A variable name begins with a letter, followed by letters, numbers or underscores.

Blank spaces are not allowed.
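
The rules on variables described in this item can be tried with a short session. The following is only a minimal sketch: the names first_sum and A are hypothetical and chosen for illustration.

```octave
% Hypothetical variable names, chosen only for illustration.
a = 2;             % define a; the semicolon suppresses the output
a + 23;            % the result is not assigned, so it is stored in ans
first_sum = ans;   % ans can be used like any other variable
b = a + 23;        % here the result is stored directly in b
A = 10;            % A and a are different variables
printf("%d %d %d %d\n", a, first_sum, b, A);
```

Typing who after these orders should list a, A, ans, b and first_sum.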

→ moves the cursor one character to the right.

If you begin to write in a line and then press ↑, the last order with the
same beginning appears.
You can recall a line in the Command History by double-clicking on it.


2) Usual operations: the usual operations are performed with the same symbols as on a calculator, thus

a + b addition,
a - b subtraction,
a * b multiplication,
a / b division,
a ^ b power,

Capital letters are distinguished, thus A and a are not the same variable.

The result of a calculation not assigned to a variable is automatically assigned to
the variable ans (short for answer), which can be used as any other variable in
subsequent computations. Thus >> a + 23 will return the value 25 in the variable
ans, whereas >> b = a + 23 will return the value 25 in the variable b.
There are built-in variables in Octave. A built-in variable that is often useful is pi,
which is the number π. Moreover, inf is ∞. The complex unit is represented by
either of the built-in variables i or j.
To see the variables which have been defined:


- the order who shows the names of the variables,

- the order whos displays the names of the variables and their characteristics.


where a and b should be numbers.

3) Definition of a matrix: for instance, the command >> A = [1, 1, 2; 3, 2, 4; 13, 12, 1] will
define a matrix with name A and size 3 × 3, in which the rows are separated
by ; and the elements of each row by ,
Octave will respond by printing the matrix in neatly aligned columns without parentheses.
The above order will define the matrix

$$A = \begin{pmatrix} 1 & 1 & 2 \\ 3 & 2 & 4 \\ 13 & 12 & 1 \end{pmatrix}$$

It is also possible to define the above matrix in the following ways:
>> A = [1 1 2; 3 2 4; 13 12 1] that is, the elements of each row are separated by blanks
>> A = [1 1 2
>> 3 2 4

- clear var deletes the variable with name var

- clear deletes all the variables

>> 13 12 1]

Other important characteristics of Octave are the following:

power,

You can suppress an output by adding a semicolon (;) after the statement.
To write comments, use % or #. Octave stops reading the line after those symbols.
To enter different statements in one line, use commas (,).
The arrow keys allow you to recall previous instructions:
↑ reproduces the previous line

that is, the elements of each row are separated by blanks, and the different rows are
written in different lines. Note that the last procedure does not allow the whole
matrix to be recalled using the Command History.
4) Generation of a random matrix: the order >> rand(n, m) returns an n × m matrix with
random numbers from the interval (0,1).
For instance, >> C = rand(2, 3); generates a 2 × 3 matrix named C with random numbers from the
interval (0,1), but without showing it on the screen, since the order ends
with the symbol ; (semicolon).

5) Multiplying a matrix by a scalar: for instance, to obtain 2C the order is >> 2 * C

6) Adding matrices: the sum of the matrices A and B is obtained with >> A + B
7) Subtracting matrices: the order >> A - B obtains A minus B
8) Multiplying matrices: to multiply A by B we use the order >> A * B
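
The orders of items 5 to 8 can be checked on a small case. This is a minimal sketch in which B is taken, as an assumption for illustration only, to be the identity matrix, so that the product A * B must give back A.

```octave
% Illustrative matrices; B is an arbitrary choice for this sketch.
A = [1, 1, 2; 3, 2, 4; 13, 12, 1];
B = eye(3);        % 3 x 3 identity matrix
T = 2 * A;         % scalar times matrix
S = A + B;         % matrix sum
D = A - B;         % matrix difference
P = A * B;         % matrix product (A times the identity is A)
disp(P)
```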

23) Defining matrices by combination of matrices: in some cases it is necessary to join two
matrices A and B in a matrix C. When the sizes of the matrices A and B allow this,
it can be performed in two different ways: on the one hand, placing the matrix B to
the right of the matrix A (A and B should have the same number of rows), on the other
hand, placing B below the matrix A (A and B should have the same number of columns).

9) Power of matrices: the order >> A ^ p obtains A raised to the power p (A·A·…·A,
with p factors).
10) Transpose of a matrix: the transpose of the matrix A ($A^t$) can be obtained with the order
>> A.' (note the point after A), where the symbol ' is on the same key as the question
mark (note that .' is not the exclamation mark, but two symbols, . and '). It is also
possible to obtain the transpose with >> transpose(A)
11) Conjugate transpose of a matrix: the conjugate transpose of A is obtained with >> A'

12) Determinant of a matrix: the determinant of A is computed with the order >> det(A)


In the first case we use the order >> C = [A, B], which defines the matrix $C = (A \; B)$;
for the second case the order is >> C = [A; B], generating the matrix

$$C = \begin{pmatrix} A \\ B \end{pmatrix}.$$

It is necessary that the sizes of matrices A and B permit such operations in each case.
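
As a minimal sketch of the two ways of joining matrices, with two small matrices assumed only for illustration (the names C_right and C_below are hypothetical):

```octave
A = [1, 2; 3, 4];
B = [5, 6; 7, 8];
C_right = [A, B];   % B placed to the right of A: size 2 x 4
C_below = [A; B];   % B placed below A: size 4 x 2
disp(size(C_right))
disp(size(C_below))
```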

24) Obtaining an element of a matrix: >> A(i, j) displays the element of A in row i and
column j.

25) Obtaining columns of a matrix: with >> A(:, j) we obtain the submatrix of the matrix
A with all rows of A and the column j of A, that is, we obtain column j of A.

13) Rank of a matrix: the rank of matrix A is calculated by means of >> rank(A)
14) Trace of a matrix: the trace of matrix A can be calculated with >> trace(A)

26) Obtaining rows of a matrix: the order >> A(i, :) obtains the submatrix of A with the
row i and all the columns, that is, row i of A.

15) Eigenvalues of a matrix: the eigenvalues of a matrix A are computed with the order
>> eig(A) In some versions of Octave such an order also obtains the eigenvectors.

27) Obtaining submatrices: there are different ways to obtain submatrices:

16) Eigenvalues and eigenvectors of a matrix: the eigenvalues and eigenvectors of a matrix
A can be calculated with >> [P, D] = eig(A), which provides a square matrix named P
of the same size as A, whose columns contain the eigenvectors, and a diagonal matrix
named D with the eigenvalues of A on the diagonal (the variable names P and D can be
changed).
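
The order [P, D] = eig(A) can be checked through the relation A P = P D, which holds when the columns of P are eigenvectors and the diagonal of D holds the matching eigenvalues. The sketch below assumes a small symmetric matrix chosen for illustration; the names residual and lambda are hypothetical.

```octave
A = [2, 1; 1, 2];                 % symmetric example matrix (assumed)
[P, D] = eig(A);                  % columns of P: eigenvectors; diag(D): eigenvalues
residual = norm(A * P - P * D);   % should be close to zero
lambda = sort(diag(D));           % eigenvalues of this A are 1 and 3
printf("residual = %g\n", residual);
```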

>> A(u, v) defines a submatrix of A with the rows indicated in the vector u and the
columns which appear in vector v. For instance, >> A([i, r], [j, k, l]) displays the
submatrix of A given by rows i and r, and columns j, k and l,

17) Inverse of a matrix: the order >> inv(A) computes the inverse of the square matrix A.

with >> A(i : r, j) we obtain the submatrix of matrix A given by the part of the
column j between rows i and r, both included,

18) Size of a matrix: >> size(A) obtains the size of the matrix A.

>> A(:, v) provides the submatrix of matrix A given by all the rows and those
columns which appear in vector v. For instance A(:, [1, 3, 5]) gives the submatrix of
A with all its rows and columns 1, 3 and 5,

19) Length of a matrix: >> length(A) calculates the length of a matrix. The length is
the number of rows or columns, whichever is greater. Thus, if the size of A is m × n,
>> length(A) obtains the maximum of m and n. This order will be very
useful to obtain the number of elements in a vector.

20) Number of rows of a matrix: >> rows(A) returns the number of rows of matrix A.
21) Number of columns of a matrix: >> columns(A) provides the number of columns of
matrix A.
22) Solving systems of linear equations: suppose we want to solve the system of linear equations Ax = b. The order >> A\b obtains the solution of the system, that is, $A^{-1} b$
(when such a value exists, otherwise Octave will notify that the solution does not exist).
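
As a minimal sketch of the order A\b, consider the system 2x + y = 3, x + 3y = 5, chosen here only for illustration. Its solution is x = 0.8, y = 1.4, and the residual A x - b should vanish up to rounding.

```octave
A = [2, 1; 1, 3];            % coefficient matrix (assumed example)
b = [3; 5];                  % right-hand side
x = A \ b;                   % solution of A x = b
residual = norm(A * x - b);  % should be close to zero
disp(x)
```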

with >> A(i, j : k) we obtain the submatrix of matrix A given by the part of the
row i between columns j and k, both included,

by means of >> A(u, :) we calculate the submatrix of A given by all the columns
and those rows which appear in vector u. Thus A([1, 4], :) provides the submatrix of
A with all its columns and rows 1 and 4,
the order >> A(i : j, :) defines the submatrix of A given by all its columns and those
rows between the i-th and the j-th, both included,
in a similar way >> A(:, i : j) creates the submatrix of A given by all its rows and
columns between the i-th and the j-th, both included,
with >> A(i : j, k : l) we calculate the submatrix of A given by rows between the
i-th and the j-th, both included, and columns between the k-th and the l-th, both included,

the orders >> vec(A) or >> A(:) return all elements of A in a column vector, after
stacking the columns of A,
we should note that obtaining a column or a row of a matrix is also generating a
submatrix.
28) Replacing elements of a matrix: >> A(i, j) = α replaces the element in row i and column
j of A by the value α.
For instance, if A is a matrix of size 5 × 6 and V is a vector of size 5 × 1, the order
>> A(:, 2) = V replaces column 2 of A with vector V.
29) Other transformations of matrices: suppose that we have a matrix A; sometimes it is
necessary to define a new matrix B of the same size as A, such that the element $B_{i,j}$ must
be equal to 1 if the element $A_{i,j}$ satisfies a certain inequality, and $B_{i,j}$ must be equal to 0
in any other case.
For instance, if A is a matrix, the order >> M = (A <= 3) defines a matrix named M,
of the same size as A, such that the element (i, j) of M will be 1 if the element (i, j) of
A is less than or equal to 3, and 0 in any other case.
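
The indexing, replacement and comparison orders of items 24 to 29 can be sketched on a small matrix. The matrix and the names e, col, sub and M below are assumptions made only for illustration.

```octave
A = [1, 3, 0, 2; 5, 7, 2, 4; 2, 4, 6, 2];   % 3 x 4 example matrix (assumed)
e   = A(2, 3);             % element in row 2, column 3
col = A(:, 2);             % column 2 of A
sub = A([1, 3], [2, 4]);   % rows 1 and 3, columns 2 and 4
M   = (A <= 3);            % 1 where the entry is <= 3, 0 elsewhere
A(:, 1) = [9; 9; 9];       % replace column 1 with a vector
disp(M)
```

Note that M is computed before column 1 is replaced, so it reflects the original entries of A.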
30) Element-by-element calculations with matrices: if A and B are matrices of the same size,
it is possible to operate element-by-element with such matrices.

Let us clarify these operations with an example. Suppose that we have two row matrices
of size 1 × n, $A = (a_1, a_2, \dots, a_n)$ and $B = (b_1, b_2, \dots, b_n)$, and let c be a real number,

>> A * c or >> c * A displays $(c a_1, c a_2, \dots, c a_n)$,

>> A + B calculates $(a_1 + b_1, a_2 + b_2, \dots, a_n + b_n)$,

with >> A .^ B (note the point after matrix A) we obtain $(a_1^{b_1}, a_2^{b_2}, \dots, a_n^{b_n})$,

there are built-in functions in Octave which can be used with matrices in an element-by-element way. Some examples are: sine (sin), cosine (cos), logarithm (log), etc.
Thus >> log(A) obtains $(\log a_1, \log a_2, \dots, \log a_n)$.
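
The element-by-element operators of item 30 can be compared side by side. This is a minimal sketch with two short row vectors assumed for illustration; the result names are hypothetical.

```octave
A = [1, 2, 3];
B = [4, 5, 6];
c = 2;
plus_c  = A + c;    % (a1 + c, ..., an + c)
times_c = A * c;    % (c a1, ..., c an)
prod_ab = A .* B;   % (a1 b1, ..., an bn)
quot_ab = A ./ B;   % (a1 / b1, ..., an / bn)
pow_c   = A .^ c;   % (a1^c, ..., an^c)
pow_ab  = A .^ B;   % (a1^b1, ..., an^bn)
disp([plus_c; times_c; prod_ab; pow_c; pow_ab])
```

Without the point, A * B would be a matrix product, which is not defined for two 1 × 3 matrices.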

31) Built-in matrices: Octave has some built-in matrices. Thus,

>> ones(m, n) defines an m × n matrix whose elements are ones,
>> ones(n) creates an n × n matrix of ones,
>> zeros(m, n) generates an m × n matrix of zeros,
>> zeros(n) returns an n × n matrix of zeros,
>> eye(m, n) returns an m × n matrix with ones on the principal diagonal and zeros
elsewhere,
>> eye(n) defines the n × n identity matrix,
>> diag(v), where v is a vector, returns a diagonal matrix with v on the diagonal.

32) Requesting help: one of the nice features of QtOctave is its help system. Thus >>
help order shows information on the order named order. For instance, >> help inv provides
information on the order inv, which computes the inverse of a matrix.
33) General help: QtOctave provides a manual on Octave by clicking on the question
mark under the command View. The resulting window presents a Content Table in which
it is possible to obtain information.
34) Handling of files: we will use two different kinds of files, namely, files of variables (data)
and files of orders (programs). For such a purpose we will use the windows Navigator and
Editor respectively (see first figure).

if A is a matrix, >> diag(A) returns a vector with the elements of the diagonal of
A.

>> A .* B (note the point after matrix A) obtains $(a_1 b_1, a_2 b_2, \dots, a_n b_n)$,

the order >> A ./ B (note the point after matrix A) computes $\left(\frac{a_1}{b_1}, \frac{a_2}{b_2}, \dots, \frac{a_n}{b_n}\right)$,
>> A + c obtains $(a_1 + c, a_2 + c, \dots, a_n + c)$,

>> A .^ c (note the point after matrix A) provides the vector $(a_1^c, a_2^c, \dots, a_n^c)$.

In order to select the folder in which to save (or to load) a file of variables, we specify
the path to the file with Navigator (similar to a File Open window). Once we are in the
corresponding folder (you can be sure that you are in that folder with Go), we use the
orders:

>> load name to load the variables which are in the file name (we can check that
such variables are loaded with the order >> who, which shows the current variables)

1.4 Exercises of Lab Practice 1

Solve the following problems using the Octave commands previously described (the solutions
are shown after the exercises).

Exercise 1.1. Define the matrix

$$A = \begin{pmatrix} 1 & 3 & 0 & 2 & 3 & 5 \\ 5 & 7 & 2 & 4 & 1 & 3 \\ 2 & 4 & 6 & 2 & 4 & 4 \\ 2 & 3 & 3 & 1 & 7 & 3 \\ 4 & 2 & 7 & 6 & 7 & 2 \end{pmatrix}$$

without displaying it on the screen.

>> A = [1, 3, 0, 2, 3, 5; 5, 7, 2, 4, 1, 3; 2, 4, 6, 2, 4, 4; 2, 3, 3, 1, 7, 3; 4, 2, 7, 6, 7, 2];

>> save name to save the current variables in a file with name name

Programs are written in the Editor and they are saved and loaded in the usual way with
the corresponding icons in the top part of the Editor.

Exercise 1.2. Show the matrix A.

>> A

Exercise 1.3. Obtain a matrix of size 3 × 4 with random numbers from the interval (0, 1). Give
this matrix the name B.
>> B = rand(3, 4)

Exercise 1.4. By means of B obtain a matrix of the same size as B with random numbers from the
interval (0, 4).

>> C = 4 * B

Exercise 1.5. Obtain the transpose of the matrix C.

>> C.'

or

>> transpose(C)

Exercise 1.6. Calculate the matrix $CB^t$.

>> C * B.'

35) List of files: to obtain the list of files of the folder in use (make sure with Navigator and
Go that you are in such a folder) we use the order >> dir, which shows the list of files.
36) Re-establishing the terminal: to re-establish the terminal use the order >> quit or >>
exit.
37) Clearing the terminal: in the View window we can find the option Clear Terminal,
which clears the terminal. Note that no information is lost.

Exercise 1.7. Define the variable

$$b = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}$$

>> b = [1; 2; 1]
Exercise 1.8. Calculate the determinant, rank and trace of the matrix $CB^t$.
>> det(C * B.')
>> rank(C * B.')
>> trace(C * B.')

Exercise 1.9. Obtain the eigenvalues and eigenvectors of the matrix $CB^t$.
>> [P, D] = eig(C * B.')

Exercise 1.10. Obtain the inverse of the matrix $CB^t$.
>> inv(C * B.')

Exercise 1.11. Obtain a matrix of size 4 × 4 with name D, taking from A its first four rows and
columns.
>> D = A(1 : 4, 1 : 4) or >> D = A([1, 2, 3, 4], [1, 2, 3, 4])

Exercise 1.12. Obtain the size of A.
>> size(A)

Exercise 1.13. Define a matrix E by means of A which should contain the columns and rows of A
with even position.
>> E = A([2, 4], [2, 4, 6])

Exercise 1.14. Replace the element in position (2, 2) of E by the value 11.
>> E(2, 2) = 11

Exercise 1.15. By means of the matrix E, define a matrix equal to the second column of E, and a
matrix equal to the first row.
>> E(:, 2)
>> E(1, :)

Exercise 1.16. Obtain the submatrix of A with the part of the second row between columns 3 and
6, both included.
>> A(2, 3 : 6)

Exercise 1.17. Obtain the submatrix of A with the part of column 2 between rows 1 and 3, both
included.
>> A(1 : 3, 2)

Exercise 1.18. Obtain the submatrix of A with all the rows, and columns 1, 3 and 5.
>> A(:, [1, 3, 5])

Exercise 1.19. Obtain the submatrix of A with all its columns, and rows 2 and 4.
>> A([2, 4], :)

Exercise 1.20. Obtain all the elements of the matrix A in a column vector.
>> vec(A) or >> A(:)

Exercise 1.21. Given the matrix

$$F = \begin{pmatrix} 1 & 3 & 2 & 2 & 4 & 6 \\ 5 & 3 & 6 & 2 & 4 & 4 \\ 1 & 3 & 5 & 8 & 4 & 0 \end{pmatrix}$$

generate the matrix $\begin{pmatrix} A \\ F \end{pmatrix}$.
>> F = [1, 3, 2, 2, 4, 6; 5, 3, 6, 2, 4, 4; 1, 3, 5, 8, 4, 0]
>> [A; F]

Exercise 1.22. Let

$$G = \begin{pmatrix} 1 & 3 & 2 \\ 2 & 4 & 6 \\ 5 & 3 & 6 \\ 2 & 4 & 4 \\ 1 & 3 & 5 \end{pmatrix}$$

Obtain the matrix (A, G).
>> G = [1, 3, 2; 2, 4, 6; 5, 3, 6; 2, 4, 4; 1, 3, 5]
>> [A, G]

Exercise 1.23. Define a matrix of the same dimension as G which takes the value 1 in a position
if G has a value lower than 3 in that position, and 0 in any other case. Afterwards give it the name
M.
>> (G < 3)
>> M = ans

Exercise 1.24. Multiply element-by-element the matrix E by itself.
>> E .* E or >> E .^ 2

Exercise 1.25. Construct a matrix H of size 5 × 3 with random numbers in the interval (0, 1). Divide
element-by-element G by H.
>> H = rand(5, 3)
>> G ./ H


CHAPTER 2. LAB PRACTICE 2.- DESCRIPTIVE STATISTICS WITH OCTAVE

For instance, matrix >> A = [1, 20; 2, 23; 1, 19; 2, 21] would contain four observations (four experimental units) of two characteristics. Such observations are (1, 20), (2, 23), (1, 19) and (2, 21).
Thus, if we study only one characteristic, our sample must be stored in a column vector.

Chapter 2

Lab Practice 2.- Descriptive Statistics with Octave

Content
2.1
2.2
2.3 Exercises of Lab Practice 2

2.1

In this practice we show basic commands of Octave in order to apply the descriptive statistics explained
in the lectures to real-life problems.

Namely, we will show how to obtain in a quick and easy way:

i) frequency tables,
iii) position measures,
iv) dispersion measures,
v) graphical representations, etc.

2.2

In this section we show Octave commands in relation to Descriptive Statistics.

We must clarify that Octave commands assume that our data are in a matrix where each column
is a characteristic, and each row contains all the information of exactly one experimental unit.

1) Calculating the different values: the order
>> values(x)
obtains the different values which are in vector x. They are displayed from the lowest
value to the greatest.

2) Calculating the absolute frequencies: the order
>> table(x)
obtains the absolute frequencies of the different values of vector x, when these are considered
in increasing order.

3) Calculating the relative frequencies: if the absolute frequencies are obtained with the order
>> table(x) and n is the sample size,
>> table(x)/n
will compute the relative frequencies of the different values of the vector x when these are in
increasing order.
In general, if x is the vector with the sample, the order
>> table(x)/length(x)
obtains the relative frequencies of the different values of vector x when these are in increasing
order.

4) Calculating the accumulated absolute frequencies: these can be obtained with the order
>> cumsum(table(x))
This displays the accumulated absolute frequencies of the different values of vector x when these
are in increasing order.

5) Calculating the accumulated relative frequencies: by means of the previous order, accumulated
relative frequencies can be computed with
>> cumsum(table(x))/n
where n is the sample size.
In general, if vector x contains the data, the order
>> cumsum(table(x))/length(x)
computes the accumulated relative frequencies of the different values of vector x when these are
in increasing order.

6) Calculating the mean value: the order
>> mean(x)
obtains the mean of the data in vector x.
If x is a matrix, the order >> mean(x) returns a row vector with the mean of each column
(mean of each characteristic).

7) Calculating the median: the order
>> median(x)
calculates the median of the data in vector x.
If x is a matrix, the order >> median(x) returns a row vector with the median of each column
(median of each characteristic).
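
The frequency table described in items 1 to 5 can also be assembled step by step. The sketch below is not the method of these notes: in case the orders values and table are not available in your installation, it relies on the core functions unique and accumarray instead. The sample x and all variable names are hypothetical.

```octave
x = [2; 1; 3; 2; 2; 1];              % hypothetical sample (column vector)
[vals, ~, idx] = unique(x);          % different values, in increasing order
abs_freq = accumarray(idx(:), 1);    % absolute frequency of each value
rel_freq = abs_freq / length(x);     % relative frequencies
cum_abs  = cumsum(abs_freq);         % accumulated absolute frequencies
cum_rel  = cum_abs / length(x);      % accumulated relative frequencies
m  = mean(x);                        % mean of the sample
md = median(x);                      % median of the sample
disp([vals, abs_freq, rel_freq, cum_abs, cum_rel])
```

Each row of the displayed matrix corresponds to one different value of x, so the last accumulated relative frequency must be 1.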
8) Calculating the mode: the order
>> mode(x)
obtains the mode of the data in vector x.
If x is a matrix, the mode is calculated in each column (in each characteristic) with >> mode(x).
If there are two modes or more, the smallest one is returned.

9) Calculating the mean of the square of the values: the order
>> meansq(x)
obtains the mean of the squares of the values which are in vector x.
If x is a matrix, the order >> meansq(x) returns a row vector containing the mean square of
each column (of each characteristic).

10) Calculating the quasivariances and variances: the orders we will use are
>> var(x) or >> var(x, opt)

If x is the vector with the sample,
>> var(x)
computes the quasivariance of the elements of such a vector, that is,

$$\hat{S}_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X})^2 n_i .$$

If x is the matrix with the sample, the above order returns a row vector containing the quasivariance
of each column (of each characteristic).
In relation to the order
>> var(x, opt)
the optional argument opt determines the type of normalization.

If opt is 0,
>> var(x, 0)
computes the quasivariance of the elements of the vector x, that is, >> var(x, 0) and >> var(x)
lead to the same result.
If x is a matrix, the order >> var(x, 0) returns a row vector containing the quasivariance of
each column (of each characteristic).

If opt is 1,
>> var(x, 1)
computes the variance of the elements of the vector x, that is,

$$S_X^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{X})^2 n_i .$$

If x is a matrix, >> var(x, 1) returns a row vector containing the variance of each column (of
each characteristic).

11) Calculating the quasistandard deviations and standard deviations: the orders we will use are the
following
>> std(x) or >> std(x, opt)

If x is a vector,
>> std(x)
computes the quasistandard deviation of the elements of such a vector, that is,

$$\hat{S}_X = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X})^2 n_i} .$$

If x is a matrix, the order returns a row vector containing the quasistandard deviation of each
column (of each characteristic).
In relation to the order
>> std(x, opt)
the optional argument opt determines the type of normalization to use.

If opt is 0,
>> std(x, 0)
computes the quasistandard deviation of the elements of vector x, that is, >> std(x, 0) and >>
std(x) lead to the same result.
If x is a matrix, the order >> std(x, 0) returns a row vector containing the quasistandard
deviations of each column (of each characteristic).

If opt is 1,
>> std(x, 1)
computes the standard deviation of the elements of vector x, that is,

$$S_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{X})^2 n_i} .$$

If x is a matrix, the order >> std(x, 1) returns a row vector containing the standard deviation
of each column (of each characteristic).

12) Calculating the range: if x is a vector, the order
>> range(x)
returns the range of the values in x, that is, the difference between the maximum and the
minimum of the input data in vector x.
If x is a matrix, >> range(x) computes the range in each column of x.

13) Calculating the interquartile range: if x is a vector, the order
>> iqr(x)
returns the interquartile range, i.e., the difference between the third and the first quartile of the
input data ($Q_3 - Q_1$).
If x is a matrix, >> iqr(x) computes the interquartile range in each column of x.

14) Summary of a sample: the order
>> statistics(x)
returns a matrix with the minimum, first quartile, median, third quartile, maximum, mean,
quasistandard deviation, skewness and kurtosis (nine different values) of the values in x.
If x is a matrix, >> statistics(x) computes the above values for each column (for each characteristic).
The result will be a matrix with 9 rows (one row for each of the above computed values) and
as many columns as there are columns in the data matrix (if x is a vector we obtain a 9 × 1 matrix).
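
The two normalizations of var and std can be checked against their definitions. This is a minimal sketch on a hypothetical sample in which every observation is listed individually, so each frequency $n_i$ equals 1; the names dev2, quasivar and variance are assumptions for illustration.

```octave
x = [2; 4; 4; 6];                  % hypothetical sample
n = length(x);
dev2 = sum((x - mean(x)) .^ 2);    % sum of squared deviations from the mean
quasivar = dev2 / (n - 1);         % quasivariance, divisor n - 1
variance = dev2 / n;               % variance, divisor n
printf("%g %g\n", var(x), var(x, 1));
printf("%g %g\n", std(x), std(x, 1));
```

For this sample the quasivariance is 8/3 and the variance is 2, and the two deviations are their square roots.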
15) Sum of the data of a sample: by means of the order
>> sum(x)
we obtain the sum of the values in vector x.
If x is a matrix, >> sum(x) obtains the sum of the values of each column of x.

16) Empirical distribution function: the empirical distribution function of a sample can be evaluated
with the order
>> empirical_cdf(y, X)
where X is the data vector and y is the point (or set of points in a vector) at which we want to
evaluate the empirical distribution function.

17) Definition of categories: in many real-life problems it is necessary to divide quantitative data
into different intervals, defining in this way different categories.
If x is a vector containing a sample, the order
>> cut(x, n)
where n ∈ {1, 2, . . . }, returns a vector of the same size as x saying which group each point in x
belongs to. Groups are labelled from 1 to the number of groups (n).

Group 1 corresponds to the lowest values of the sample, group 2 contains the following lowest
values, and so on. The last group will contain the greatest values. For instance, if >> cut(x, n)
returns the value 2 in position 8, that means that the value of x which is in position 8 belongs
to the second group.

In Lab Practice 1 we have seen different matrix manipulation instructions. Other interesting
orders are the following:

18) Ordering elements of a matrix: it is possible to order the elements of a matrix in accordance
with different criteria. Thus

>> sort(A) returns a copy of the matrix A with the elements of each column arranged
in increasing order.
For instance, if

$$A = \begin{pmatrix} 1 & 3 & 2 \\ 3 & 2 & 1 \\ 2 & 2 & 0 \end{pmatrix}$$

then >> sort(A) returns the matrix

$$\begin{pmatrix} 1 & 2 & 0 \\ 2 & 2 & 1 \\ 3 & 3 & 2 \end{pmatrix}$$

Note that the order >> sort breaks the key structure rows=experimental units.
From a statistical point of view this instruction is, let us say, dangerous.

For our purposes the following instruction is very interesting. If A is a matrix,
>> sortrows(A, c)
returns a matrix of the same size as A, where the values of column c have been sorted
in increasing order, the rest of the columns being reordered in the same way as column c
(that is, whole rows are moved together).
For instance, if

$$A = \begin{pmatrix} 1 & 3 & 2 \\ 3 & 2 & 1 \\ 2 & 2 & 0 \end{pmatrix}$$

then >> sortrows(A, 3) returns the matrix

$$\begin{pmatrix} 2 & 2 & 0 \\ 3 & 2 & 1 \\ 1 & 3 & 2 \end{pmatrix}$$

Note that the order >> sortrows does not break the rule columns=characteristics.
If we want the elements of column c ordered in decreasing order, we should write
-c instead of c in the above order.
For instance, if

$$A = \begin{pmatrix} 1 & 3 & 2 \\ 3 & 2 & 1 \\ 2 & 2 & 0 \end{pmatrix}$$

then >> sortrows(A, -1) returns the matrix

$$\begin{pmatrix} 3 & 2 & 1 \\ 2 & 2 & 0 \\ 1 & 3 & 2 \end{pmatrix}$$

19) Graphical representations: different graphical representations for quantitative and qualitative
data can be depicted with Octave:
the empirical distribution function can be drawn in the following way. The order >>
empirical_cdf(y, X) gives the value of the empirical distribution function of the sample in
vector X at the point y. For instance, if the
data are included in the interval [-5, 9], we can draw the picture from the point -6 to the
point 10 with a step of 0.1 between points. Such a graphic is generated with the orders
>> y = -6:0.1:10
>> plot(y, empirical_cdf(y, X)),
the order
>> hist(x)
plots a histogram for the quantitative data of vector x. Octave will draw 10 different
boxes. The number of boxes can be modified with the order
>> hist(x, k)
where k is the number of boxes to include in the picture,

if vector x contains a sample, a pie chart can be drawn with the order
>> pie(table(x)),
the order
>> stem(values(x), table(x))
draws a line chart for the quantitative data in vector x, that is, for each different value in
x, a height equal to its absolute frequency is drawn,
moreover, the order
>> plot
erases all previous pictures. If we want to overlap pictures, we have to use the order
>> hold on. Thus all the pictures will be drawn in the same figure. To deactivate such an
order it is sufficient to use >> hold off. The order >> ishold returns the value 1 if
hold on is activated, and 0 otherwise,
to conclude, the order
>> clf()
clears all the figures.

2.3 Exercises of Lab Practice 2

An analysis about the drivers of a city leads to the following information:

gender        age   km. covered        make      cubic capacity
of drivers          in the last year             of the car
Male          34    15231              seat      1900
Female        26    12231              seat      1600
Male          39    14150              renault   1800
Female        61     9020              peugeot   1600
Female        52    11574              seat      1600
Female        38    12390              renault   1400
Male          57    19265              peugeot   1600
Female        43    12132              opel      1600
Male          49    20780              renault   2100
Male          36    18052              seat      1800
Female        61    10020              peugeot   1400
Female        52    12574              opel      1800
Female        38    12490              renault   1400
Male          57    15265              seat      1600
Female        43    12432              opel      1900
Male          49    20780              renault   2100
Male          19    18052              seat      1800

Exercise 2.1. Introduce the above data in a matrix for a subsequent descriptive analysis.
For the qualitative data (gender and make of car) we consider codifications, for instance male=0,
female=1, and in the case of car makes, for instance opel=1, peugeot=2, renault=3 and seat=4.
The data matrix, let us call it A, can be entered as follows
>> A = [0, 34, 15231, 4, 1900; 1, 26, 12231, 4, 1600; 0, 39, 14150, 3, 1800;
1, 61, 9020, 2, 1600; 1, 52, 11574, 4, 1600; 1, 38, 12390, 3, 1400;
0, 57, 19265, 2, 1600; 1, 43, 12132, 1, 1600; 0, 49, 20780, 3, 2100;
0, 36, 18052, 4, 1800; 1, 61, 10020, 2, 1400; 1, 52, 12574, 1, 1800;
1, 38, 12490, 3, 1400; 0, 57, 15265, 4, 1600; 1, 43, 12432, 1, 1900;
0, 49, 20780, 3, 2100; 0, 19, 18052, 4, 1800]

Such a matrix can be defined in any of the ways explained in Lab Practice 1.
The matrix can be found in the file DatPrac2Ejer1, which can be loaded from Campus Virtual.
Matrix A contains the above samples.

Exercise 2.2. Using the above matrix, construct another matrix containing in the first rows the data
of women, and in the last rows the men data.

We have considered the codification male=0, female=1. Thus we can order the matrix in accordance
with the values of the gender column, sorted in decreasing order, that is, >>
B = sortrows(A, -1)

Exercise 2.3. Construct a new matrix with the women data.

For such a purpose we cut the part of matrix B with the women data. We will use the orders
explained in Lab Practice 1. Thus >> C = B(1 : 9, :) takes the first 9 rows of B, exactly where the
women data are.

Exercise 2.4. Analyze cubic capacities of the cars driven by women. Describe the cubic capacities
of cars driven by men.

For such a purpose we can consider only the column of the cubic capacities of the matrix defined
in the above question. Thus
>> D = C(:, 5)
generates matrix D, which contains the cubic capacities of the cars driven by women.

Now we can obtain the different values of the cubic capacities of cars driven by the women with
>> values(D)
and the absolute frequencies with
>> table(D)
We should note that
>> table(D)/length(D)
provides the relative frequencies. The accumulated absolute frequencies are computed with the order
>> cumsum(table(D))
By means of the instruction
>> cumsum(table(D))/length(D)
the accumulated relative frequencies are displayed.

The mean cubic capacity is obtained with

>> mean(D)
the median with
>> median(D)
and in general with
>> statistics(D)
basic descriptive statistics are calculated.
The second part of the problem can be solved in the same way by considering the men data.
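As a cross-check of Exercise 2.4, the women's cubic capacities can also be selected with logical indexing instead of sorting, and the frequencies can be computed with the core functions unique and accumarray. This is only a sketch of an alternative route, not the method of these notes; it may be useful if the orders values and table are not available in your Octave installation. The names women, vals, counts, rel and cumabs are illustrative.

```octave
% Data matrix of Exercise 2.1: gender, age, km, make, cubic capacity
A = [0, 34, 15231, 4, 1900; 1, 26, 12231, 4, 1600; 0, 39, 14150, 3, 1800;
     1, 61,  9020, 2, 1600; 1, 52, 11574, 4, 1600; 1, 38, 12390, 3, 1400;
     0, 57, 19265, 2, 1600; 1, 43, 12132, 1, 1600; 0, 49, 20780, 3, 2100;
     0, 36, 18052, 4, 1800; 1, 61, 10020, 2, 1400; 1, 52, 12574, 1, 1800;
     1, 38, 12490, 3, 1400; 0, 57, 15265, 4, 1600; 1, 43, 12432, 1, 1900;
     0, 49, 20780, 3, 2100; 0, 19, 18052, 4, 1800];

women = (A(:, 1) == 1);          % logical mask: rows with gender code 1
D = A(women, 5);                 % cubic capacities of cars driven by women

[vals, ~, idx] = unique(D);      % distinct values and the class of each datum
counts = accumarray(idx(:), 1);  % absolute frequencies
rel    = counts / numel(D);      % relative frequencies
cumabs = cumsum(counts);         % accumulated absolute frequencies

% One row per distinct value: value, absolute, relative, accumulated absolute
disp([vals(:), counts, rel, cumabs]);
printf("mean = %.2f, median = %d\n", mean(D), median(D));
```

Replacing the mask by A(:, 1) == 0 gives the corresponding description for the men.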

Exercise 2.5. By means of the matrix defined in the first problem, construct a matrix in which the age appears in months.
The matrix with the original data was named A. We can cut the second column, multiply it by 12, and paste it again. That can be performed with
>> E = 12 * A(:, 2)
>> H = A
>> H(:, 2) = E

Exercise 2.6. Obtain the sum of the ages of the drivers. Obtain the mean age and the mean square of the ages.
We can use the order
>> sum(A)
taking the second value of the output (the ages are in the second column).
For the second and the third questions it is sufficient to use
>> mean(A)
and
>> meansq(A)
taking the second value of the output.

Exercise 2.7. Obtain the standard deviation and the variance of the women ages. Repeat the same question with the men.
The matrix with the women data was called C. We can consider the orders
>> std(C, 1)
and
>> var(C, 1)
taking the second value of the output. The case of male drivers can be solved in a similar way.

For the median of the ages we can use the order
>> median(A)
taking the second value of the generated row vector.

For the interquartile range of the ages it is sufficient to consider
>> iqr(A)
taking the second value of the generated row vector.

Exercise 2.10. Classify the drivers in accordance with the number of kilometers covered in the last year. Distinguish among a low number of km., a medium number of km. and a large number of km.
We should generate three different categories with the number of km. in the last year. First we consider the vector number of kilometers in the last year with the order
>> I = A(:, 3)
Now the three categories are generated with
>> cut(I, 3)
Values which are assigned number 1 correspond to the category low number of km., number 2 corresponds to the category medium number of km., and number 3 to the category large number of km.

Exercise 2.11. Construct a matrix with the quantitative data.
We should take the submatrix of A given by the columns 2, 3 and 5 (and all the rows). The order
>> J = A(:, [2, 3, 5])
generates the result.

Exercise 2.12. Draw the empirical distribution function of the cubic capacity of cars driven by women, and do the same with cars driven by men, both representations in the same figure. In accordance with such a graphic, is it possible to obtain a conclusion?
The empirical distribution function of a sample at a point $a \in \mathbb{R}$ gives the proportion of data which are lower than or equal to (not greater than) the point a.
The matrix with the women data is C. We take the vector of cubic capacities with
>> K = C(:, 5)
We define the matrix with the men data, for instance with
>> H = sortrows(A, 1)
>> I = H(1 : 8, :)
>> J = I(:, 5)
Vector J contains the cubic capacities of the cars driven by men.
Cubic capacities are between 1400 and 2100. We can consider jumps of 1 unit for the representation. We draw graphics in the same figure with
>> y = 1300 : 1 : 2200
>> plot(y, empirical_cdf(y, K))
>> hold on
>> plot(y, empirical_cdf(y, J), 'r')
Note that in the last order we have added 'r' to draw such a graphic in red color. Other initials can be used, such as 'y' (yellow), 'g' (green), etc.
In accordance with the figure, whatever the value of a is, the proportion of cars driven by women with cubic capacity lower than or equal to a is greater than or equal to the proportion of cars driven by men with cubic capacity lower than or equal to a.
As a consequence we can conclude that in the sample data, the cubic capacities of cars driven by men tend to be greater than those of cars driven by women.
It is interesting to point out that this affirmation is only valid for that sample data. In order to obtain a general conclusion by means of the sample information, we should apply statistical inference techniques.
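The comparison made in Exercise 2.12 can also be checked numerically rather than by eye. The sketch below evaluates the empirical distribution function directly from its definition (proportion of data lower than or equal to a) with core functions only; the helper ecdf_at is hypothetical and is not an order of these notes, and the two vectors are typed in from the data table.

```octave
% Cubic capacities taken from the data table (women: 9 drivers, men: 8 drivers)
K = [1600; 1600; 1600; 1400; 1600; 1400; 1800; 1400; 1900];  % women
J = [1900; 1800; 1600; 2100; 1800; 1600; 2100; 1800];        % men

% Empirical distribution function: proportion of data lower than or equal to a
ecdf_at = @(a, sample) mean(sample <= a);   % hypothetical helper

y  = 1300:1:2200;
Fw = arrayfun(@(a) ecdf_at(a, K), y);
Fm = arrayfun(@(a) ecdf_at(a, J), y);

% True when the women's curve is never below the men's curve on the grid
dominates = all(Fw >= Fm)
```

The flag only describes this sample; as the text says, a general statement requires inference techniques.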


Exercise 2.13. How would you obtain the accumulated relative frequencies of the different cubic capacities of the sample? And the accumulated absolute frequencies?

Accumulated relative frequencies of the different values can be obtained, for instance, by evaluating the empirical distribution function at such values.
The vector of cubic capacities can be obtained with
>> K = A(:, 5)
The different values of such a vector with
>> L = values(K)
Now we can evaluate the empirical distribution function of the sample stored in K at the values of the vector L by means of
>> M = empirical_cdf(L, K)
The values generated by the last order are the accumulated relative frequencies. To obtain the accumulated absolute frequencies, we can multiply the accumulated relative frequencies by the number of data, that is,
>> N = length(K) * M
Another possibility is the following. First we obtain the vector of cubic capacities with
>> K = A(:, 5)
and then
>> M = cumsum(table(K))
which obtains the accumulated absolute frequencies. The accumulated relative frequencies are obtained by dividing the accumulated absolute frequencies by the number of data,
>> N = M/length(K)
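That the two routes of Exercise 2.13 agree can be verified on the data themselves. The following sketch uses only core functions (unique, accumarray, cumsum) in place of values, table and empirical_cdf, so it is an illustration under that substitution rather than the orders used above; the names F1 and F2 are illustrative.

```octave
% Cubic capacities of the 17 drivers (column 5 of matrix A)
K = [1900; 1600; 1800; 1600; 1600; 1400; 1600; 1600; 2100; 1800; ...
     1400; 1800; 1400; 1600; 1900; 2100; 1800];

[vals, ~, idx] = unique(K);          % different values, in increasing order
counts = accumarray(idx(:), 1);      % absolute frequencies

% Route 1: empirical distribution function evaluated at each different value
F1 = arrayfun(@(v) mean(K <= v), vals(:));
% Route 2: accumulated absolute frequencies divided by the number of data
F2 = cumsum(counts) / numel(K);

% Columns: value, accumulated absolute frequency, accumulated relative frequency
disp([vals(:), cumsum(counts), F2]);
printf("largest difference between the two routes: %g\n", max(abs(F1 - F2)));
```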

Exercise 2.14. Draw a pie chart for the makes of the cars.
The makes are in the fourth column of matrix A. Thus with the order
>> pie(table(A(:, 4)))
we obtain the requested pie chart.
In this case the representation is given in the following figure

Exercise 2.15. Draw a histogram with five boxes for the ages of the drivers.
The ages appear in the second column of the matrix A. Since the order hist works on the raw data and counts by itself how many values fall in each box, we pass the ages directly. Thus with the order
>> hist(A(:, 2), 5)
we generate the histogram, obtaining the following representation

Exercise 2.16. If x is a column vector with quantitative data, how would you construct a frequency table (first column with the different values, second column with absolute frequencies, third column with relative frequencies, fourth column with accumulated absolute frequencies and last column with accumulated relative frequencies)?
The above matrix could be obtained with
>> [values(x), transpose(table(x)), transpose(table(x)/length(x)),
transpose(cumsum(table(x))), transpose(cumsum(table(x))/length(x))]
which gives the table of frequencies of the sample in vector x.
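For Exercise 2.15 it can help to see which counts a histogram with five boxes produces for the ages. The sketch below asks hist for the counts (no picture is drawn when outputs are requested) and recomputes them by hand from five boxes of equal width between the minimum and the maximum; the variable names are illustrative and the ages are typed in from the data table.

```octave
ages = [34 26 39 61 52 38 57 43 49 36 61 52 38 57 43 49 19];

% Counts and centres of five boxes of equal width between min and max
[counts, centers] = hist(ages, 5);
disp([centers; counts]);

% The same counts obtained by hand from the box edges
edges = linspace(min(ages), max(ages), 6);
by_hand = zeros(1, 5);
for k = 1:5
  if k < 5
    by_hand(k) = sum(ages >= edges(k) & ages < edges(k + 1));
  else
    by_hand(k) = sum(ages >= edges(k) & ages <= edges(k + 1));  % last box is closed
  endif
endfor
disp(by_hand);
```

Passing a table of frequencies to hist instead of the ages would produce a histogram of the frequencies, which is a different picture.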

CHAPTER 3. LAB PRACTICE 3.- PROBABILITY WITH OCTAVE

3.1

- simulate data from some probability distributions,
- evaluate density functions (continuous distributions) and probability mass functions (discrete distributions),

3.2

Basic commands for probability theory with Octave

The main orders for the above purposes will be described in this section.
1) Simulating data from probability distributions: let X be a random variable which follows any of the probability distributions studied in the lectures (binomial, geometric, Poisson, etc.). We want to simulate values drawn from such a random variable.

The following table shows the orders to simulate data from the distributions which appear on the left hand side. Note that all orders finish with the ending rnd:

Distribution      Order
Binomial          binornd
Chi2              chi2rnd
Discrete          discrete_rnd
Exponential       exprnd
F (Snedecor)      frnd
Geometric         geornd
Normal            normrnd
Poisson           poissrnd
t (Student)       trnd
Uniform           unifrnd
Weibull           wblrnd

Such orders are used in the following way:

>> binornd(n, p, r, c), or >> binornd(n, p, s), returns an $r \times c$ matrix, or an $s \times s$ matrix, of random numbers drawn from the binomial distribution with parameters n and p.
>> chi2rnd(n, r, c), or >> chi2rnd(n, s), returns an $r \times c$ matrix, or an $s \times s$ matrix, of random numbers drawn from the chi2 distribution with n degrees of freedom ($\chi^2_n$).
>> discrete_rnd(n, v, p) (or >> discrete_rnd(v, p, r, c), or >> discrete_rnd(v, p, s)), returns a row vector of size n with random numbers from the discrete distribution which takes the values in vector v with probabilities proportional to the (non-negative) values in vector p (if the components of p do not sum to 1, they are divided by their total sum to obtain the probabilities of the elements of vector v). The role of r, c and s in the other orders is the same as in the previous cases.
>> exprnd(lambda, r, c), or >> exprnd(lambda, s), returns an $r \times c$ matrix, or an $s \times s$ matrix, of random numbers from the exponential distribution whose mean is lambda, that is, from the exponential distribution with parameter 1/lambda (be careful with this).
>> frnd(m, n, r, c), or >> frnd(m, n, s), returns an $r \times c$ matrix, or an $s \times s$ matrix, of random numbers from an F distribution with m and n degrees of freedom.
>> geornd(p, r, c), or >> geornd(p, s), returns an $r \times c$ matrix, or an $s \times s$ matrix, of random numbers from a geometric distribution with parameter p.
>> normrnd(m, d, r, c), or >> normrnd(m, d, s), returns an $r \times c$ matrix, or an $s \times s$ matrix, of random numbers from a normal distribution with mean m and standard deviation d.
>> poissrnd(lambda, r, c), or >> poissrnd(lambda, s), returns an $r \times c$ matrix, or an $s \times s$ matrix, of random numbers from a Poisson distribution with mean lambda, that is, from a Poisson distribution with parameter lambda.
>> trnd(n, r, c), or >> trnd(n, s), returns an $r \times c$ matrix, or an $s \times s$ matrix, of random numbers from a t (Student) distribution with n degrees of freedom.
>> unifrnd(a, b, r, c), or >> unifrnd(a, b, s), returns an $r \times c$ matrix, or an $s \times s$ matrix, of random numbers from a uniform distribution on the interval (a, b).
>> wblrnd(a, b, r, c), or >> wblrnd(a, b, s), returns an $r \times c$ matrix, or an $s \times s$ matrix, of random numbers from a Weibull distribution with parameters a and b.

2) Evaluating,
i) probability mass functions (discrete distributions) and density functions (continuous case),
ii) cumulative distribution functions,
iii) and the inverse of cumulative distribution functions (in case of existence):
Octave has different orders to evaluate:

- usual probability mass functions (discrete distributions) and usual density functions (continuous case),
- common cumulative distribution functions, and
- the inverse of cumulative distribution functions (in case of existence).

We should remark that the inverse of a distribution function F exists if and only if F is continuous and strictly increasing (where it takes values different from 0 and 1).

Common continuous distributions in Statistics, such as the exponential, uniform, normal, t, $\chi^2$ and F distributions, have cumulative distribution functions with inverse. This allows us to obtain key values of such distributions without using their associated tables (included at the end of Chapter 5 of Statistical Notes).
We clarify this with the following example.

Let $Z \sim N(0, 1)$. Suppose we want to obtain the value x for the above distribution, such that the tail on its right hand side is equal to 0.05. Therefore the tail on its left hand side should be equal to 0.95. Thus we look for the value x such that $F_Z(x) = 0.95$, equivalently, $F_Z^{-1}(0.95) = x$, that is, we need to know the value of the inverse function of the cumulative distribution function at the point 0.95.

Obviously this reasoning is valid for the aforementioned distributions.

The following table summarizes the orders for different probability distributions.

In the first place we have the name of the distribution, then the order which computes values of the probability mass function (discrete case) or values of the density function (continuous case) (ending with pdf), in the third place we find the order which returns values of cumulative distribution functions (ending with cdf), and finally the order which calculates values of the inverse of the cumulative distribution function (ending with inv):

Distribution      pdf               cdf               inv
Binomial          binopdf           binocdf
Chi2              chi2pdf           chi2cdf           chi2inv
Discrete          discrete_pdf      discrete_cdf
Exponential       exppdf            expcdf            expinv
F (Snedecor)      fpdf              fcdf              finv
Geometric         geopdf            geocdf            geoinv
Normal            normpdf           normcdf           norminv
Poisson           poisspdf          poisscdf
t (Student)       tpdf              tcdf              tinv
Uniform           unifpdf           unifcdf           unifinv
Weibull           wblpdf            wblcdf            wblinv

We clarify the use of the above orders by means of an example of the binomial distribution (discrete case) and the normal distribution (continuous case), the rest of the orders being totally analogous.
It is important to remark that for all the distributions of the above table, and for all the orders, the point at which we want to evaluate the different functions appears in the first place, before the parameters of the distributions.
In the following instructions, x will be a real number, or a vector of real values.
>> binopdf(x, n, p) computes the probability mass function at x of the binomial distribution with parameters n and p. If x is a vector it computes the probability mass function in each component of x.
>> binocdf(x, n, p) computes the cumulative distribution function of the binomial distribution with parameters n and p at the point x. If x is a vector it computes in each component of x the above value.
>> normpdf(x, a, b) computes the probability density function of the normal distribution with mean a and standard deviation b (parameters a and b) at the point x. If x is a vector it computes in each component of x the above value.
>> normcdf(x, a, b) computes the cumulative distribution function of the normal distribution with mean a and standard deviation b (parameters a and b) at the point x. If x is a vector it computes in each component of x the above value.
>> norminv(x, a, b) computes the inverse of the cumulative distribution function of the normal distribution with mean a and standard deviation b (parameters a and b) at the point x. If x is a vector it computes in each component of x the above value.

The rest of the orders are completely similar. The parameters of each of the above orders can be viewed in Point 1) of this Lab practice.
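The example with $Z \sim N(0, 1)$ can be checked without tables and without any distribution-specific order, using only the core functions erf and erfinv. This is a sketch under the assumption that norminv is not at hand (in some Octave installations the distribution orders are provided by a separate statistics package); the helper names std_norm_cdf and std_norm_inv are hypothetical.

```octave
% Standard normal cdf and its inverse written with the error function
std_norm_cdf = @(x) 0.5 * (1 + erf(x / sqrt(2)));
std_norm_inv = @(p) sqrt(2) * erfinv(2 * p - 1);

x = std_norm_inv(0.95);             % value with left tail equal to 0.95
right_tail = 1 - std_norm_cdf(x);   % should recover the right tail 0.05

printf("x = %.4f, right tail = %.4f\n", x, right_tail);
```

The same pattern (evaluate the inverse at 1 minus the right tail) applies to the other continuous distributions of the table through their orders ending with inv.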

3.3

Exercises of Lab Practice 3

Exercise 3.1. Simulate by means of two different procedures 200 throws of a fair die. Calculate the proportion of values lower than or equal to 4 in such a simulation. Obtain the proportion of each possible value in the above simulation.

We can simulate 200 throws of a fair die with the order
>> A = discrete_rnd(200, [1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1])
The proportion of values lower than or equal to 4 can be displayed with
>> empirical_cdf(4, A)
The proportion of each possible value in the above simulation is computed with
>> table(A)/200
We should indicate that the vector of probabilities [1, 1, 1, 1, 1, 1] could be defined with the order
>> ones(1, 6)

Another possibility is the following. We generate 200 random numbers on the interval (0, 1) with
>> A = rand(200, 1)
those numbers being divided into 6 different categories with
>> B = cut(A, 6)
which gives the matrix with the simulation of the throws. Note that this is possible because we have a fair die.
The proportion of values lower than or equal to 4 can be obtained with
>> empirical_cdf(4, B)
The proportion of each possible value in the above simulation is computed with
>> table(B)/200

Exercise 3.2. We have a coin with probability of tail equal to 0.4. Simulate 1000 tosses of such a coin and obtain the proportion of tails. Solve the above simulation in at least two different ways.
We can simulate the tosses of the coin with the Bernoulli distribution with parameter 0.4 (or binomial with parameters 1 and 0.4). Note that it takes the value 1 (tail) with probability 0.4, and the value 0 (head) with probability 0.6. Thus the order is
>> A = binornd(1, 0.4, 1000, 1)
The proportion of ones (tails) in the matrix A can be computed with
>> mean(A)
or with
>> table(A)/length(A)
taking the second value of the output.

Another possibility is the following. We simulate 1000 random numbers in the interval (0, 1) with
>> A = rand(1000, 1)
and transform those lower than or equal to 0.4 into the value 1 (tail), and those greater than 0.4 into the value 0, by means of the order
>> B = (A <= 0.4)
The proportion of tails (ones in the matrix B) can be obtained with
>> mean(B)
or with
>> table(B)/length(B)
taking the second value of the output.

A third possibility is based on the simulation of a discrete random variable which can take values 1 and 0 with probabilities 0.4 and 0.6 respectively. This can be carried out with
>> C = discrete_rnd(1000, [0, 1], [0.6, 0.4])
The proportion of tails (ones in the matrix C) can be obtained with
>> mean(C)
or with
>> table(C)/length(C)
taking the second value of the output.
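In the spirit of the rand-based solutions of Exercises 3.1 and 3.2, the throws of a fair die can also be produced with ceil(6 * rand), and the requested proportions counted explicitly with core functions. This is a sketch of a further alternative, not one of the procedures given in these notes; the names prop_le_4 and props are illustrative.

```octave
n = 200;
A = ceil(6 * rand(n, 1));      % rand is uniform on (0, 1), so A takes values 1..6

prop_le_4 = mean(A <= 4);      % proportion of throws lower than or equal to 4

% Proportion of each face, counted one by one so that a face which never
% appears still shows up with proportion 0
props = zeros(1, 6);
for face = 1:6
  props(face) = sum(A == face) / n;
endfor

disp(prop_le_4);
disp(props);
```

The comparison A <= 4 plays the same role as evaluating the empirical distribution function at the point 4, and mean of a 0/1 vector gives a proportion, as in Exercise 3.2.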

Exercise 3.3. Suppose that we have a die where the probabilities of the numbers 1, 2, 3, 4, 5 and 6 are 0.2, 0.15, 0.15, 0.3, 0.1 and 0.1 respectively. Simulate 500 throws of such a die.
The simulation can be generated with
>> A = discrete_rnd(500, [1, 2, 3, 4, 5, 6], [0.2, 0.15, 0.15, 0.3, 0.1, 0.1])
or
>> A = discrete_rnd([1, 2, 3, 4, 5, 6], [0.2, 0.15, 0.15, 0.3, 0.1, 0.1], 500, 1)
Note that it is not necessary to write the values of the probabilities, we can write proportional values, so we could consider the vector [20, 15, 15, 30, 10, 10] in the above orders.

Exercise 3.4. Consider a discrete random variable X which takes the values -1, 0 and 1 with probabilities 0.25, 0.5 and 0.25 respectively. Generate 2000 values drawn from such a variable and obtain the sample mean, compare it with the mean of the variable.
The values can be generated with
>> A = discrete_rnd(2000, [-1, 0, 1], [0.25, 0.5, 0.25])
The sample mean with the order
>> mean(A)
In our simulation the sample mean was 0.007, a value which is very close to the mean of the variable ($EX = 0$).

Exercise 3.5. Generate 5000 values of a B(4, 0.5) distribution. Construct the empirical distribution function of such a sample. Draw in the same figure that function and the distribution function of the above binomial distribution.
The values are generated with the order
>> A = binornd(4, 0.5, 5000, 1)
Since such values are in the set {0, 1, . . . , 4}, we draw the empirical cumulative distribution (for instance) in the interval [-1, 5]. Thus we consider
>> x = -1 : 0.01 : 5
>> plot(x, empirical_cdf(x, A))
Since we want to overlap pictures, we use the order
>> hold on
The cumulative distribution function of the distribution B(4, 0.5) is drawn with the order
>> plot(x, binocdf(x, 4, 0.5))
In our simulation we obtained the graphic (see next figure)
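What an order for simulating a discrete variable has to do can be sketched with the inverse transform method: a uniform number on (0, 1) is compared with the accumulated probabilities, and the first interval that contains it selects the value. The code below applies this idea to the variable of Exercise 3.4 using core functions only; it is an illustration of the technique, not a description of how the Octave order is actually implemented, and the names u, edges and idx are illustrative.

```octave
v = [-1, 0, 1];               % values of X
p = [0.25, 0.5, 0.25];        % their probabilities
n = 2000;

u = rand(n, 1);               % uniform numbers on (0, 1)
edges = cumsum(p);            % accumulated probabilities: 0.25, 0.75, 1
% The number of accumulated probabilities strictly below u, plus one,
% is the position in v of the simulated value
idx = sum(bsxfun(@gt, u, edges), 2) + 1;
A = v(idx);
A = A(:);                     % store the sample as a column vector

theoretical_mean = sum(v .* p);   % EX = (-1)(0.25) + 0(0.5) + 1(0.25)
printf("sample mean = %.4f, EX = %.4f\n", mean(A), theoretical_mean);
```

If the probabilities were given only up to a constant, as in Exercise 3.3, they would first have to be divided by their sum.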
Exercise 3.6. Generate 1000 values of a normal distribution with parameters 0 and 2. Draw the empirical distribution function of the sample, and in the same figure, draw the distribution function of such a normal.
The following orders are a possible solution
>> A = normrnd(0, 2, 1000, 1)
>> x = -3 : 0.01 : 3
>> plot(x, empirical_cdf(x, A))
>> hold on
>> plot(x, normcdf(x, 0, 2), 'r')
In our simulation we have obtained the figure (see next figure)

Exercise 3.7. By means of the above problem, how would you simulate 1000 values of a normal distribution with parameters 1 and 0.25?
Here, as in the order normrnd, the second parameter is the standard deviation. A linear transformation aW + b of a normal variable W is again normal, with mean a times the mean of W plus b, and standard deviation |a| times the standard deviation of W. Hence, if $W \sim N(0, 2)$, then $\frac{1}{8} W + 1 \sim N(1, 0.25)$ (see properties of the normal distribution in Statistical Notes, Chapter 5).
In the above problem we have stored 1000 values drawn from a normal distribution with parameters 0 and 2 (N(0, 2)) in matrix A. Therefore it is sufficient to consider the order
>> B = A / 8 + 1
to obtain a sample drawn from the N(1, 0.25) distribution.

Exercise 3.8. Represent the density function of a N(0, 1) random variable between the values -3 and 3.
A possible solution is
>> x = -3 : 0.01 : 3
>> plot(x, normpdf(x, 0, 1))
We obtain the figure

Exercise 3.9. Draw in the interval [-3, 3], and in the same figure, the density functions of the normal distributions with common standard deviation 0.5, and means 0, 0.5 and 1 respectively. By means of such a graphic, which result about the probabilities of such distributions can be derived by intuition?
The following instructions provide the solution of the problem
>> x = -3 : 0.01 : 3
>> plot(x, normpdf(x, 0, 0.5))
>> hold on
>> plot(x, normpdf(x, 0.5, 0.5), 'r')
>> plot(x, normpdf(x, 1, 0.5), 'g')
The figure we obtain is the following

From a probabilistic point of view, and in accordance with the meaning of the area below a density function, we sense that the probabilities of those distributions are equal except for a translation.
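The linear transformation needed in Exercise 3.7 can be checked by simulation. With the convention of normrnd (second parameter equal to the standard deviation), multiplying a N(0, 2) sample by a constant a multiplies its standard deviation by |a|, so the factor 1/8 gives the standard deviation 0.25, whereas the factor 1/4 would give 0.5. The sketch below uses the core function randn in place of normrnd, and a larger sample than in the exercise so that the check is sharp; both choices are assumptions of this illustration.

```octave
n = 100000;
A = 2 * randn(n, 1);     % sample from N(0, 2): mean 0, standard deviation 2
B = A / 8 + 1;           % linear transformation: mean 0/8 + 1, std 2/8
B4 = A / 4 + 1;          % for comparison: same mean, std 2/4

printf("mean(B)  = %.3f, std(B)  = %.3f\n", mean(B), std(B));
printf("mean(B4) = %.3f, std(B4) = %.3f\n", mean(B4), std(B4));
```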

Exercise 3.10. Let X be a random variable following a binomial distribution with parameters 10 and 0.2 ($X \sim B(10, 0.2)$). Obtain $P(X = 6)$ and $F_X(5)$.
The solution is given by the orders
>> binopdf(6, 10, 0.2)
>> binocdf(5, 10, 0.2)

Exercise 3.11. Let X be a random variable following a geometric distribution with parameter 0.3 ($X \sim G(0.3)$). Obtain $P(X = 4)$ and $P(X \le 7)$.
The required values can be obtained with
>> geopdf(4, 0.3)
>> geocdf(7, 0.3)

Exercise 3.12. Let Z be a random variable following a normal distribution with parameters 0 and 1 ($Z \sim N(0, 1)$). Obtain $P(Z \le 2)$ and $f_Z(1.3)$.
We can obtain the above values by means of
>> normcdf(2, 0, 1)
>> normpdf(1.3, 0, 1)

Exercise 3.13. Let X be a random variable following an exponential distribution with parameter 4 ($X \sim \exp(4)$). Obtain $P(X \le 3)$ and $f_X(2)$.
Note that for the exponential distribution we should enter the mean of the distribution (1/parameter) instead of the parameter.
The solution is given by
>> expcdf(3, 0.25)
>> exppdf(2, 0.25)

Exercise 3.14. Let X be a random variable following a Weibull distribution with parameters 2 and 5. Without using integration, how would you obtain $P(X \in (1, 4))$, $P(X \le 8)$ and $P(X > 3)$? Moreover, determine the point $c \in \mathbb{R}$ such that the distribution function of that variable takes the value 0.95.
Since X is a continuous random variable, $P(X \in (1, 4)) = F_X(4) - F_X(1)$, $P(X \le 8) = F_X(8)$ and $P(X > 3) = 1 - P(X \le 3) = 1 - F_X(3)$.
We can obtain the above values with
>> wblcdf(4, 2, 5) - wblcdf(1, 2, 5)
which is equal to 0.96923,
>> wblcdf(8, 2, 5)
obtaining 1 and
>> 1 - wblcdf(3, 2, 5)
which is equal to 5.0359e-004.
In relation to the second point, we need a value c such that $F_X(c) = 0.95$, equivalently, $c = F_X^{-1}(0.95)$.
Thus we obtain such a value with
>> wblinv(0.95, 2, 5)
being equal to 2.4908.

Exercise 3.15. Consider a normal distribution with mean equal to 0 and variance equal to 1 (N(0, 1)). Obtain the value which determines on its right-hand side a tail of size 0.05. Obtain the value which leaves on its left-hand side a tail of size 0.025.
Let F denote the distribution function of a N(0, 1) distribution. We need to obtain the value a such that $F(a) = 0.95$, equivalently $a = F^{-1}(0.95)$.
Therefore
>> norminv(0.95, 0, 1)
gives the solution, which is 1.6449.
In relation to the second question, we need the value b such that $F(b) = 0.025$, that is, $b = F^{-1}(0.025)$. Thus we obtain the solution with
>> norminv(0.025, 0, 1)
which is equal to -1.9600.

Exercise 3.16. Solve the above problem when the distribution is a $\chi^2$ with 11 degrees of freedom ($\chi^2_{11}$), and when we consider an F distribution with 10 and 15 degrees of freedom ($F_{10,15}$).
The solution is similar to the above problem, in this case with the distributions $\chi^2_{11}$ and $F_{10,15}$.
In relation to the first distribution, the solutions are given by the orders
>> chi2inv(0.95, 11)
which returns 19.675, and
>> chi2inv(0.025, 11)
giving the value 3.8157.
In the second case the solution can be obtained with
>> finv(0.95, 10, 15)
returning 2.5437, and
>> finv(0.025, 10, 15)
which generates the value 0.28396.

Exercise 3.17. An engineer connects in parallel two resistors of 100 and 25 ohms. However the real resistances could differ from those values. Suppose that the real resistances are two independent random variables with distributions $X \sim N(100, 10)$ and $Y \sim N(25, 2.5)$ respectively.
It is known that the real resistance of the assembly, denoted by R, is given by the formula
$$R = \frac{XY}{X + Y}.$$
Estimate by simulation the probability of the event 19 < R < 21.

We estimate the required probability by means of the strong law of large numbers and simulation.
We simulate for instance 10000 values of the random variable R to estimate the probability of the event 19 < R < 21 by means of the proportion of values of the simulation which satisfy that condition, that is, the proportion of values of the simulation which belong to the interval (19, 21).
For the simulation of the values of R, we simulate 10000 values of the variable X and 10000 values of Y, to generate by means of them the simulated values of R.
Such a procedure can be performed with
>> X = normrnd(100, 10, 10000, 1);
>> Y = normrnd(25, 2.5, 10000, 1);
>> R = (X .* Y) ./ (X + Y);
generating a column vector with 10000 simulated values of the variable R.
Now we need to count how many of such values belong to the interval (19, 21).
That is calculated with
>> P = (19 < R);
>> Q = (R < 21);
>> L = P .* Q;
>> mean(L)
Matrix P of size $10000 \times 1$ has a value 1 in those positions in which the simulation of R takes a value greater than 19, and has the value 0 in any other case.
In a similar way, matrix Q of size $10000 \times 1$ has a value 1 in those positions in which the simulation of R takes a value lower than 21, and has the value 0 in any other case.
The element-by-element product of P and Q generates a matrix of size $10000 \times 1$ with 1 in those positions in which the simulation of R belongs to the interval (19, 21), and 0 in any other case.
The last order computes the proportion of values equal to 1 in the matrix L, which is the estimation of the requested probability (we could use >> table(L)/length(L) taking the second value of the output).
Our simulations estimate such a probability around the value 0.45.

Exercise 3.18. A machine has three components which work in an independent way. The life (in hours) of each component has exponential distribution with parameter 1/2. The machine works if at least two of the components work. Program a procedure which generates the value 1 if the machine works after half an hour.
In the first place we generate the values of the three exponential distributions (lives of the three components) with
>> A = exprnd(2, 1, 3)
Note that the first number of the above order (2) must be the inverse of the parameter of the exponential distribution, that is, its mean.
Now we obtain which components are working after 0.5 hours with
>> B = (A > 0.5)
The number of components working after that period can be obtained with
>> c = sum(B)
If c is greater than or equal to 2, the value 1 (the machine works) should be displayed, in any other case the value 0 (the machine does not work). This can be performed with
>> if c >= 2
>> f = 1
>> else
>> f = 0
>> endif
>> f

Exercise 3.19. By means of the above problem, estimate the probability that the machine works at least half an hour.
We apply again the strong law of large numbers and simulation to estimate such a probability.
We repeat (simulate) the experiment a large number of times and we estimate the probability by means of the proportion of times such an event occurs in the simulations.
Let us consider 10000 repetitions. The following program solves the problem.
>> A = exprnd(2, 10000, 3);
>> B = (A > 0.5);
>> for i = 1 : 10000;
>> C(i) = sum(B(i, :));
>> endfor
>> f = 0;
>> for j = 1 : 10000
>> if C(j) >= 2
>> f = f + 1;
>> endif
>> endfor
>> p = f/10000
In the first line we simulate 10000 times (10000 rows) the lives of three components (each row is a possible machine).
In the second line we detect which components have a duration greater than half an hour.
In the first loop we count the number of components working after half an hour in each machine (each row).
With the variable f we count the number of rows with at least two components working after half an hour (number of machines working after half an hour). Note that this second loop must also run over the 10000 rows; if it stopped at 1000 while dividing by 10000, the estimate would be about ten times too small (around 0.087).
Finally p gives the proportion of machines working after half an hour, that is, the estimation of the required probability (applying the strong law of large numbers).
In our simulations the proportion of times such an event occurs should be very close to 0.875. Indeed, each component works after half an hour with probability $q = e^{-0.25} \approx 0.7788$, so the exact probability is $3q^2(1 - q) + q^3 \approx 0.8749$.
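For Exercise 3.19 an exact value is available for comparison, because the number of components alive after half an hour is binomial with 3 trials. The sketch below computes it and repeats the simulation without loops, generating the exponential lives by the inverse transform of uniform numbers instead of exprnd; this substitution and the variable names are assumptions of the illustration, not the program of these notes.

```octave
% Exact value: each component survives half an hour with probability q
q = exp(-0.5 / 2);                       % P(life > 0.5) for an exponential with mean 2
p_exact = 3 * q^2 * (1 - q) + q^3;       % exactly two components working, or all three

% Simulation with core functions only: inverse transform of uniform numbers
n = 10000;
lives = -2 * log(rand(n, 3));            % exponential lives with mean 2, one machine per row
working = sum(lives > 0.5, 2);           % components alive after half an hour in each row
p_sim = mean(working >= 2);              % proportion of machines that still work

printf("exact = %.4f, simulated = %.4f\n", p_exact, p_sim);
```

An estimate near 0.875 is therefore the expected outcome; a value ten times smaller points to a counting loop that does not cover all the simulated machines.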
Exercise 3.20. Let X be a discrete random variable. It is said that a value a R is a mode of the
random variable X if P (X = a) P (X = x) for any x R, that is, if the probability mass function
reaches its maximum at the point a.

It is well-known that if X follows a Poisson distribution with parameter , its mode(s) is given
by the integer part of if such a value is not an integer number, and by 1 and when is an
integer number.
Corroborate such a result by drawing in the interval [0, 15] the probability mass function of
random variables following Poisson distributions with parameters 2, 5, 6.3 and 7.4.

A possible solution for the first parameter, the rest of the cases being analogous, is the following:
>> p = 2
>> for i = 1 : 16
>> y(i) = poisspdf(i - 1, p)
>> endfor
>> x = 0 : 15
>> plot(x, y)
Here p plays the role of the parameter of the Poisson distribution.
Note that indices in Octave start at 1, so it is not possible to use y(0) to define the coordinates of vector y; that
is the reason we define y(i) with i from 1 to 16, the value y(i) being the value of the probability mass
function at the point i - 1. Note also that the order plot joins the drawn points with segments.


The graphical representation which is obtained appears below. Note that it has two maximum
values at the points 1 and 2, which corroborates the statement of the problem in the case p = 2.


The graphical representations of the other parameters can be obtained just by modifying the value
of the parameter in p.
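
The statement about the modes can also be checked numerically for the four parameters at once. The following is a minimal sketch, not part of the original solution: it assumes the probability mass function is written by hand as the anonymous function poisson_pmf (a name introduced here), so that it runs without poisspdf, and it treats probabilities closer than 1e-12 as tied.

```octave
% Poisson pmf written with core functions only (poisson_pmf is a helper
% name introduced for this sketch; the exercise itself uses poisspdf).
poisson_pmf = @(k, lambda) exp(-lambda) .* lambda .^ k ./ factorial(k);

k = 0:15;                       % points of the interval [0, 15]
params = [2, 5, 6.3, 7.4];      % parameters requested in Exercise 3.20
modes = cell(size(params));     % mode(s) found for each parameter

for j = 1:numel(params)
  y = poisson_pmf(k, params(j));
  % keep every k whose probability ties with the maximum (up to rounding)
  modes{j} = k(abs(y - max(y)) < 1e-12);
  printf("lambda = %g: mode(s) = %s\n", params(j), mat2str(modes{j}));
endfor
```

For the integer parameters the sketch should report two modes, and for 6.3 and 7.4 a single one, in agreement with the result stated in the exercise.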


Chapter 4

Lab Practice 4.- Tests on the parameters of a normal distribution

Content
4.1 Aim of the practice
4.2 Basic orders to test on the parameters of a normal distribution
4.3 Exercises of Lab Practice 4

4.1 Aim of the practice

In this practice we learn to infer on the parameters of a normal distribution by means of some
hypothesis tests studied in the expositive classes (see Chapter 7 of Statistical Notes).

4.2 Basic orders to test on the parameters of a normal distribution

The orders to test on the parameters of a normal distribution are shown in this section.
Throughout this practice we will assume that we have a random sample $(X_1, X_2, \ldots, X_n)$ drawn
from a normal random variable $X$ with parameters $\mu$ and $\sigma$ ($X \sim N(\mu, \sigma)$). By means of that sample we want to infer
on both parameters.

The main orders in this framework are the following:

1) Test on the mean of a normal distribution with unknown variance ($N(\mu, \sigma)$ with unknown
$\sigma$): to perform a test on the mean of a normal random variable whose variance is unknown, we
will use the order
>> t_test(x, m, alt)
In the above order, vector x must contain the sample from which we want to infer.
The variable m should contain the value we want to test as the mean of the normal distribution.
The argument alt determines the kind of test we are performing (the alternative hypothesis).
Thus,

if such an argument does not appear, the test we perform is
$H_0: \mu = m$ against $H_1: \mu \neq m$,
if the argument is "!=" (quotation marks must be included), the test we perform is
$H_0: \mu = m$ against $H_1: \mu \neq m$
(the same test as in the above case),
if the argument is ">" (quotation marks must be included), the hypotheses of the test
are
$H_0: \mu = m$ against $H_1: \mu > m$,
if the argument is "<" (quotation marks must be included), the test is
$H_0: \mu = m$ against $H_1: \mu < m$.

The above orders compute the p-value of the sample x in the corresponding test.

The order
>> [pval, stat, gl] = t_test(x, m, alt)
calculates three different values: pval, which contains the p-value of the sample in the corresponding test, stat, with the value of the statistic of the test, and gl, which gives the degrees of
freedom of the statistic of the test (sample size minus 1).
We recall that the statistic of the test is

$$\frac{\bar{X}_n - m}{S_X / \sqrt{n}},$$

where

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{(sample mean)} \qquad \text{and} \qquad S_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 \quad \text{(sample quasivariance)}.$$

Such a statistic follows a distribution $t_{n-1}$ (t with $n-1$ degrees of freedom) when the null hypothesis
$H_0$ is true, where n is the sample size.

We should remark that the above procedure is also valid when the variance $\sigma^2$ is known; however,
in such a case it is more appropriate to consider the method explained in the following point.

2) Test on the mean of a normal distribution with known variance ($N(\mu, \sigma)$ with known $\sigma$): to
perform a test on the mean of a normal variable with known variance, we will use the order
>> z_test(x, m, v, alt)
In the above order, vector x must contain the sample from which we want to infer.
The variable m should contain the value we want to test as the mean of the normal distribution.
The variable v should contain the variance ($\sigma^2$), which is known (be careful to introduce the
variance and not the standard deviation).
The argument alt determines the kind of test we are performing (the alternative hypothesis).
Thus,

if such an argument does not appear, the test we perform is
$H_0: \mu = m$ against $H_1: \mu \neq m$,
if the argument is "!=" (quotation marks must be included), the test we perform is
$H_0: \mu = m$ against $H_1: \mu \neq m$
(the same test as in the above case),
if the argument is ">" (quotation marks must be included), the hypotheses of the test
are
$H_0: \mu = m$ against $H_1: \mu > m$,
if the argument is "<" (quotation marks must be included), the test we perform is
$H_0: \mu = m$ against $H_1: \mu < m$.

The above orders compute the p-value of the sample in the corresponding test.

The order
>> [pval, stat] = z_test(x, m, v, alt)
computes two different values: pval, which contains the p-value of the sample in the corresponding
test, and stat, which contains the value of the statistic of the test.
The statistic of the test is

$$\frac{\bar{X}_n - m}{\sigma / \sqrt{n}},$$

where

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{(sample mean)},$$

which follows a distribution $N(0, 1)$ when the null hypothesis is true.

3) Test on the standard deviation of a normal distribution ($N(\mu, \sigma)$): to our knowledge, the
test on the standard deviation of a normal distribution has not been implemented in Octave,
probably because of its simplicity. Therefore it is necessary to program such a test. For this
purpose we use the orders on the inverse of cumulative distribution functions which have
been analyzed in Lab Practice 2.
The tests we consider are

a) $H_0: \sigma = \sigma_0$ against $H_1: \sigma \neq \sigma_0$,
b) $H_0: \sigma = \sigma_0$ against $H_1: \sigma < \sigma_0$,
c) $H_0: \sigma = \sigma_0$ against $H_1: \sigma > \sigma_0$.

If x denotes the vector with the sample data, the value of the statistic can be obtained with
>> stat = (length(x) - 1) * var(x) / sigma0^2
where sigma0 holds the value $\sigma_0$. This order computes

$$(n - 1) \frac{S_X^2}{\sigma_0^2},$$

where

$$S_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2.$$

Such a statistic follows a distribution $\chi^2_{n-1}$ ($\chi^2$ with $n - 1$ degrees of freedom) when $H_0$ is true,
where n is the sample size.
The critical regions at a level of significance $\alpha$ are

a) $CR = \left\{ (x_1, \ldots, x_n) : (n-1) \frac{s_X^2}{\sigma_0^2} \leq \chi^2_{n-1;1-\alpha/2} \ \text{or} \ (n-1) \frac{s_X^2}{\sigma_0^2} \geq \chi^2_{n-1;\alpha/2} \right\}$,
b) $CR = \left\{ (x_1, \ldots, x_n) : (n-1) \frac{s_X^2}{\sigma_0^2} \leq \chi^2_{n-1;1-\alpha} \right\}$,
c) $CR = \left\{ (x_1, \ldots, x_n) : (n-1) \frac{s_X^2}{\sigma_0^2} \geq \chi^2_{n-1;\alpha} \right\}$,

respectively, that is, samples in which the value of the statistic, which we denote by stat, satisfies

a) stat $\leq \chi^2_{n-1;1-\alpha/2}$ or stat $\geq \chi^2_{n-1;\alpha/2}$,
b) stat $\leq \chi^2_{n-1;1-\alpha}$,
c) stat $\geq \chi^2_{n-1;\alpha}$,

respectively, where $\chi^2_{n-1;\alpha}$ stands for the value which has on its right-hand side a tail of size $\alpha$
when we consider the distribution $\chi^2_{n-1}$. Therefore, such a value can be computed with the
order
>> chi2inv(1 - alpha, n - 1)

Now, if our test is a), and

stat $\leq \chi^2_{n-1;1-\alpha/2}$ or stat $\geq \chi^2_{n-1;\alpha/2}$,

we reject the null hypothesis and conclude the alternative hypothesis, that is, the sample belongs
to the critical region. If such a condition is not satisfied, we should not reject the null hypothesis.
If we have test b), and

stat $\leq \chi^2_{n-1;1-\alpha}$,

we reject the null hypothesis and conclude the alternative hypothesis, that is, the sample belongs
to the critical region. If such a condition is not satisfied, we fail to reject the null hypothesis.
If we have considered test c), and

stat $\geq \chi^2_{n-1;\alpha}$,

we reject the null hypothesis and conclude the alternative hypothesis, that is, the sample belongs
to the critical region. If such a condition is not satisfied, we should not reject the null hypothesis.

4.3 Exercises of Lab Practice 4

Exercise 4.1. The opening time of a web page is a very important characteristic in the design of such
pages. It is assumed that the opening time follows a normal distribution. In order to analyze such
times, web pages designed by the same programmer are taken at random, annotating their opening
times (135 data) in hundredths of a second.
The sample can be found in the file DatPrac4Ejer1, which can be loaded from Campus Virtual.
Matrix A contains the opening data.
i) Could we consider that the mean opening time of the web pages of the above programmer is
0.65 hundredths of a second, or on the contrary is there enough evidence to reject such a hypothesis?
ii) Solve the same question when we know that the variance of the opening times is 0.02.
Use level of significance 0.05.

Since we do not know the true variance of the opening times, in the first question we consider
the order
>> p1 = t_test(A, 0.65)
which gives a p-value equal to 0.
Therefore there is enough evidence to reject that the mean of the opening times of the web pages
designed by such a programmer is equal to 0.65. The above p-value is stored in the variable p1.

In relation to the second question, now we know that the variance of the random variable opening
times is 0.02. Therefore we will use in this case the order
>> p2 = z_test(A, 0.65, 0.02)
The computed p-value is 0.36887. Thus we do not have enough evidence to reject the null hypothesis.
That is, we should not reject that the mean of the opening times of the web pages designed by the
programmer is equal to 0.65. Note that in this case the p-value is stored in the variable p2.

Exercise 4.2. Some compression rates have been observed using level 9 of Gzip compression to
compress the Lisp source code (80 data).
The sample can be found in the file DatPrac4Ejer2, which can be loaded from Campus Virtual.
Matrix A contains the compression rates.

i) Is it possible to consider that the mean compression rate is 40, or on the contrary, is there
enough evidence to reject such a value and conclude that it is greater?

ii) Is it possible to consider that the variance of the compression rate is 100, or on the contrary,
is there enough evidence to reject such a value and conclude that it is lower?

The first question can be solved with
>> p1 = t_test(A, 40, ">")
which gives the p-value of the sample in the variable p1.
In this case we obtain a p-value equal to 0.086073. If the level of significance is 0.05, we do not have
enough evidence to reject that the mean compression rate is 40 and conclude that it is greater.

The second question is in relation to the variance of the compression rate. The value of the
statistic can be computed with
>> stat = (length(A) - 1) * var(A) / 100
We store it in the variable stat.

To know if the sample belongs to the critical region or not, we need to obtain in this case the
value fr = $\chi^2_{n-1;0.95}$, which determines the frontier of the critical region.
Such a value can be computed with
>> fr = chi2inv(0.05, length(A) - 1)
We obtain that stat = 105.93319 and fr = 59.522. Thus the relation stat $\leq$ fr is not satisfied,
therefore there is not sufficient evidence to reject that the variance of the compression rate is 100.

Exercise 4.3. In a market analysis on computing and communications, 28 personal computer users
were taken at random, studying, among other characteristics, the space (in gigabytes) taken up by
films on the hard disks of their computers.
The sample can be found in the file DatPrac4Ejer3, which can be loaded from Campus Virtual.
Matrix A contains the space in gigabytes taken up by films.

Let us suppose that the space taken up by films follows a normal distribution.

i) Could we consider that the mean of the space taken up by films on the hard disks of personal
computers is 20 gigabytes, or on the contrary is such a mean lower?

ii) Is there enough evidence to reject that the standard deviation of the space taken up by films
is 8 gigabytes?
Use level of significance 0.05.

The first test can be solved with the order
>> p1 = t_test(A, 20, "<")
which gives in the variable p1 the p-value of the sample in the test.
Since such a value is 0.28619, we do not have enough evidence to reject that the mean of the space of
films on the hard disks of personal computers is 20 gigabytes.

The second question is on the standard deviation of the space of films. The value of the statistic
is obtained with
>> stat = (length(A) - 1) * var(A) / 8^2
Such a value is stat = 1.29477.

To know if the sample is in the critical region we need to obtain the values fr1 = $\chi^2_{n-1;0.975}$
and fr2 = $\chi^2_{n-1;0.025}$, which give the frontiers of the critical region. Such values can be obtained
with
>> fr1 = chi2inv(0.025, length(A) - 1)
>> fr2 = chi2inv(0.975, length(A) - 1)
respectively. Those values are fr1 = 14.573 and fr2 = 43.195.
Note that stat $\leq \chi^2_{n-1;0.975}$. Therefore the sample belongs to the critical region and so we should
reject that the standard deviation of the space of films is 8 gigabytes, and conclude that it is different
from that value.

Exercise 4.4. A web page hangs frequently. A programmer studies the length (in seconds) of the periods
during which the web page is hung. The data are the following:
7.97452, 3.87601, 0.61877, 8.52384, 2.50740, 6.63117, 8.83795, 10.01265, 3.17150, 6.39356

If the above length follows a normal distribution,

i) is there enough evidence to reject that the mean time the web page is hung is 4 seconds
and conclude that such a mean time is greater than 4?
ii) which is the value of the statistic used in the above question?
Use level of significance 0.05.
In the first place, the data are stored in a matrix named A by means of the order
>> A = [7.97452; 3.87601; 0.61877; 8.52384; 2.50740; 6.63117; 8.83795; 10.01265; 3.17150; 6.39356]
The p-value of the above sample for the test proposed in question i) can be calculated with
>> p1 = t_test(A, 4, ">")
Such a p-value is stored in the variable p1 and is equal to 0.047068. Since it is less
than 0.05, we should reject the null hypothesis of mean time equal to 4 and conclude that the mean time
the web page is hung is greater than 4 seconds.

To obtain the value of the statistic used in the above question, it is sufficient to consider the
order
>> [p1, stat, gl] = t_test(A, 4, ">")
It computes three different values.
The variable p1 contains the p-value of the sample, that is, 0.047068. The variable stat contains the
value of the statistic of the test, in this case 1.8711 (this solves the second question of the problem).
Finally, gl gives the degrees of freedom of the statistic of the test, in this case 9 (sample size minus 1).
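
The values returned by t_test can be cross-checked against the formula of the statistic given in Section 4.2. The following sketch is an illustration added here, not the implementation of t_test: it assumes that the tail probability of the t distribution can be obtained from the core function betainc, and the variable name tail is introduced only for this example.

```octave
% Data of Exercise 4.4 (hang times in seconds) and hypothesised mean.
A = [7.97452; 3.87601; 0.61877; 8.52384; 2.50740; ...
     6.63117; 8.83795; 10.01265; 3.17150; 6.39356];
m = 4;

n    = length(A);
gl   = n - 1;                               % degrees of freedom
stat = (mean(A) - m) / (std(A) / sqrt(n));  % std is based on the quasivariance

% Probability beyond |stat| in one tail of a t distribution with gl degrees
% of freedom, written with the regularized incomplete beta function.
tail = 0.5 * betainc(gl / (gl + stat^2), gl / 2, 0.5);
if stat >= 0
  pval = tail;          % alternative ">" with a positive statistic
else
  pval = 1 - tail;      % alternative ">" with a negative statistic
endif

printf("stat = %.4f, gl = %d, pval = %.6f\n", stat, gl, pval);
```

The printed values can be compared with stat = 1.8711, gl = 9 and p1 = 0.047068 reported above.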

Exercise 4.5. For the data of the above problem, study if it is possible to consider that the variance
of the time the web page is hung is 4, or on the contrary such a variance is greater.
The value of the statistic is obtained with
>> stat = (length(A) - 1) * var(A) / 4
We obtain that stat = 22.10843.

To know if the sample belongs to the critical region, we need to obtain in this case the value
fr1 = $\chi^2_{n-1;0.05}$, which is the frontier of the critical region.
Such a value is obtained with
>> fr1 = chi2inv(0.95, length(A) - 1)
In this case we obtain that fr1 = 16.919. Note that the value stat is greater than fr1. Therefore we
should reject that the variance is 4, and conclude that it is greater.

Exercise 4.6. Generate a random sample of size 1000 from a distribution N(10, 2). For any value
between 8 and 12 with a step of 0.01, obtain the p-value of the above sample for the test which studies
if such a sample comes from a normal distribution with mean that value and standard deviation
equal to 2. Represent in a graphic the above values against their p-values. Give an interpretation of
such a graphic.

The random sample is generated with
>> A = normrnd(10, 2, 1000, 1);
The values between 8 and 12 with a step of 0.01 are created with
>> x = 8 : 0.01 : 12;
The p-values of the tests proposed in the statement of the problem are computed with the orders
(recall that the third argument of z_test is the variance, here 2^2 = 4)
>> for k = 1 : 401
>> w(k) = z_test(A, x(k), 4);
>> endfor
such p-values being stored in the vector w.
The graphical representation is drawn with the order
>> plot(x, w)

In our simulation we obtained the following representation:

(Figure: p-values w plotted against the tested means x.)

Since the sample was generated with a mean equal to 10, it is obvious that if we move away from
10, the p-values should decrease rapidly.
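
The shape of this curve depends on the sample only through its mean. As a rough sketch under assumptions (the sample mean xbar = 10.05 is a made-up value, and z_pvalue is a helper name introduced here, not an Octave function), the two-sided p-value of the test with known standard deviation can be written with the core function erfc:

```octave
% Two-sided p-value of the test H0: mu = m with known standard deviation,
% given the sample mean xbar and the sample size n.
% For a standard normal Z, erfc(z / sqrt(2)) equals 2 * P(Z > z) when z >= 0.
z_pvalue = @(xbar, m, sigma, n) ...
    erfc(abs((xbar - m) ./ (sigma / sqrt(n))) / sqrt(2));

xbar  = 10.05;          % hypothetical sample mean (a real sample will differ)
sigma = 2;              % known standard deviation of Exercise 4.6
n     = 1000;           % sample size of Exercise 4.6
x     = 8 : 0.01 : 12;  % tested means, as in the exercise

w = z_pvalue(xbar, x, sigma, n);
[wmax, imax] = max(w);
printf("largest p-value %.3f at m = %.2f\n", wmax, x(imax));
% plot(x, w) draws the same kind of curve as in the exercise
```

The largest p-value is reached where the tested mean coincides with the sample mean, and the p-values drop quickly on both sides of it.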

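
Since the test on the variance has to be programmed by hand (point 3 of Section 4.2), the steps used in Exercises 4.2, 4.3 and 4.5 can be collected in a short script. The sketch below is only one possible arrangement: it decides with a p-value instead of comparing stat with a frontier computed by chi2inv (the two rules agree, because stat falls in the critical region exactly when the corresponding tail probability is at most the level of significance), it assumes that the chi-square distribution function can be evaluated with the core function gammainc, and the names v0, p_low and reject are introduced here for illustration. The data are those of Exercise 4.4.

```octave
% Chi-square test on the variance of a normal sample.
A     = [7.97452; 3.87601; 0.61877; 8.52384; 2.50740; ...
         6.63117; 8.83795; 10.01265; 3.17150; 6.39356];
v0    = 4;      % hypothesised variance (Exercise 4.5)
alt   = ">";    % alternative hypothesis: "!=", "<" or ">"
alpha = 0.05;   % level of significance

gl    = length(A) - 1;                 % degrees of freedom
stat  = gl * var(A) / v0;              % (n - 1) * S^2 / sigma0^2
p_low = gammainc(stat / 2, gl / 2);    % P(chi2 with gl d.f. <= stat)

switch alt
  case "<"
    pval = p_low;                      % lower tail
  case ">"
    pval = 1 - p_low;                  % upper tail
  otherwise
    pval = 2 * min(p_low, 1 - p_low);  % two-sided alternative
endswitch

reject = pval < alpha;
printf("stat = %.5f, gl = %d, pval = %.5f, reject = %d\n", stat, gl, pval, reject);
```

With these data the statistic should coincide with the value 22.10843 of Exercise 4.5, and the decision should again be to reject that the variance is 4.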

Chapter 5

Lab Practice 5.- Tests on the parameters of two independent normal distributions

Content
5.1 Aim of the practice
5.2 Basic orders to test on the parameters of two independent normal distributions
5.3 Exercises of Lab Practice 5

5.1 Aim of the practice

In this practice we learn to test with Octave on the parameters of two independent normal
distributions by means of the hypothesis tests explained in the expositive classes.

5.2 Basic orders to test on the parameters of two independent normal distributions

The orders to test on the parameters of two independent normal distributions are shown in this section.
Throughout this practice we will consider that we have two random samples $(X_1, X_2, \ldots, X_{n_1})$ and
$(Y_1, Y_2, \ldots, Y_{n_2})$ drawn from X and Y respectively, where X and Y are independent random variables
with distributions $X \sim N(\mu_1, \sigma_1)$ and $Y \sim N(\mu_2, \sigma_2)$.
The main orders in this context are the following:

1) Tests of equality of variances ($\sigma_1 = \sigma_2$) of two independent normal distributions: to perform a
test of equality of variances of two independent normal distributions, we will use the order
>> var_test(x, y, alt)
In the above order, vectors x and y should contain the samples from which we want to infer.
The argument alt determines the kind of test we are performing (the alternative hypothesis).
Thus,

if such an argument does not appear, the test we perform is
$H_0: \sigma_1 = \sigma_2$ against $H_1: \sigma_1 \neq \sigma_2$,
if the argument is "!=" (quotation marks must be included), the test we perform is
$H_0: \sigma_1 = \sigma_2$ against $H_1: \sigma_1 \neq \sigma_2$
(the same test as in the above case),
if the argument is ">" (quotation marks must be included), the test has the hypotheses
$H_0: \sigma_1 = \sigma_2$ against $H_1: \sigma_1 > \sigma_2$,
if the argument is "<" (quotation marks must be included), the test is
$H_0: \sigma_1 = \sigma_2$ against $H_1: \sigma_1 < \sigma_2$.

The above orders compute the p-value of the samples in the corresponding test.

The order
>> [pval, stat, gl1, gl2] = var_test(x, y, alt)
calculates four different values: pval, which contains the p-value of the samples in the corresponding test, stat, with the value of the statistic of the test, gl1, which gives the degrees of freedom of
the sample in vector x (size of x minus 1), and gl2, which gives the degrees of freedom of the sample in
vector y (size of y minus 1).

2) Tests of equality of means ($\mu_1 = \mu_2$) of two independent normal distributions when variances
are equal: to perform a test of equality of means of two independent normal distributions when
variances are equal we use the order
>> t_test_2(x, y, alt)
In the above order, vectors x and y must contain the samples from which we want to infer.
The argument alt determines the kind of test we are performing (the alternative hypothesis).
Thus,
if such an argument does not appear, the test we perform is
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \neq \mu_2$,
if the argument is "!=" (quotation marks must be included), the test we perform is
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \neq \mu_2$
(the same test as in the above case),
if the argument is ">" (quotation marks must be included), the test we perform is
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 > \mu_2$,
if the argument is "<" (quotation marks must be included), the test has the hypotheses
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 < \mu_2$.
The above orders compute the p-value of the samples in the corresponding test.

The order
>> [pval, stat, gl] = t_test_2(x, y, alt)
computes three different values: pval, which contains the p-value of the samples in the corresponding test, stat, which contains the value of the statistic of the test, and gl, which gives the
degrees of freedom of the statistic (sum of the sizes of x and y minus 2).

3) Tests of equality of means ($\mu_1 = \mu_2$) of two independent normal distributions when variances
are different: to perform a test of equality of means of two independent normal distributions
when variances are different, we use the order
>> welch_test(x, y, alt)
In the above order, vectors x and y must contain the samples from which we want to infer.
The argument alt determines the kind of test we are performing (the alternative hypothesis).
Thus,

if such an argument does not appear, the test we perform is
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \neq \mu_2$,
if the argument is "!=" (quotation marks must be included), the test we perform is
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \neq \mu_2$
(the same test as in the above case),
if the argument is ">" (quotation marks must be included), the test we perform is
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 > \mu_2$,
if the argument is "<" (quotation marks must be included), the test we perform is
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 < \mu_2$.

The above orders compute the p-value of the samples in the corresponding test.

The order
>> [pval, stat, gl] = welch_test(x, y, alt)
obtains three different values: pval, which contains the p-value of the samples in the corresponding
test, stat, which contains the value of the statistic of the test, and gl, which gives the degrees of
freedom of the statistic (see point 8.2.3 of Statistics Notes).

Remark: When we want to test on the means of two independent normal random variables, $N(\mu_1, \sigma_1)$
and $N(\mu_2, \sigma_2)$, by means of random samples drawn from such variables, it is usual that we do not
know if the variances ($\sigma_1^2$ and $\sigma_2^2$) are equal or not (normally $\mu_1$, $\mu_2$, $\sigma_1$ and $\sigma_2$ are unknown).
In order to know which test should be applied (Point 2 or Point 3), it is necessary to test in the
first place $H_0: \sigma_1 = \sigma_2$ against $H_1: \sigma_1 \neq \sigma_2$.
If in such a test we reject the null hypothesis, concluding that $\sigma_1 \neq \sigma_2$, we will apply Point 3 to
test on the means. On the contrary, if we fail to reject the null hypothesis, we will apply Point 2.

5.3 Exercises of Lab Practice 5

Exercise 5.1. A mouse factory produces two different kinds of mice, say A and B. It is known that
the lives of both models follow normal distributions. In order to compare both models, the lives of 140
mice of model A and 105 of model B were studied.
The samples can be found in the file DatPrac5Ejer1, which can be loaded from Campus Virtual.
Matrix A contains the lives of model A, matrix B those of model B.

i) What could we conclude on the mean life of the above mouse models? Are they equal?

ii) Is the mean life greater in model B?

We test the equality of the means of the life times of the mice.
If X is the random variable life time of a mouse of model A and Y stands for the random variable
life time of a mouse of model B, we know that $X \sim N(\mu_1, \sigma_1)$ and $Y \sim N(\mu_2, \sigma_2)$.

For such a purpose we first test the equality of variances, that is, we consider the test
$H_0: \sigma_1 = \sigma_2$ against $H_1: \sigma_1 \neq \sigma_2$.
In accordance with the result of this test we will analyze the equality $\mu_1 = \mu_2$ with the appropriate
test.

The inference on the variances can be performed with the order
>> p1 = var_test(A, B)
which obtains the p-value of the samples in such a test. In this case the p-value is p1 = 0. Therefore
we should reject that the variances of the life times of both models are equal, concluding that $\sigma_1 \neq \sigma_2$.
Since we have concluded that the above variances are different, we will use the order welch_test
to infer on the means of the life times.
Such an inference can be performed with the order
>> p2 = welch_test(A, B)
which computes the p-value of the samples in the corresponding test.
We obtain that such a p-value is p2 = 0.00019569. Since it is less than the usual level of significance,
there is enough evidence to reject the null hypothesis of equality of the mean life times and conclude
that such values are different.
In order to solve the last question, we consider the order
>> p3 = welch_test(A, B, "<")
which gives the p-value of the samples in the new test.
The p-value is p3 = 9.7847e-004. Note that it is lower than the usual level of significance, thus we
should reject the hypothesis of equal mean life time and conclude that the mean life time of B is greater
than the mean life time of A.

Exercise 5.2. A company of processors analyzes the quality of two processors, the so-called ProMT1
and ProMT2. It is known that the speeds of both follow normal distributions. Taking at random 15
ProMT1 processors and 23 ProMT2 processors, the following speeds were found:
ProMT1: 2.7, 2.65, 2.83, 2.95, 2.64, 2.45, 3.01, 2.56, 2.76, 2.99, 2.76, 2.87, 3.05, 2.65, 3.09
ProMT2: 2.85, 2.89, 2.77, 3.00, 2.87, 2.76, 2.78, 2.67, 2.97, 2.99, 2.87, 2.93, 2.78, 2.98, 3.01, 2.84,
2.88, 2.79, 2.89, 2.91, 2.87, 2.88, 2.92
The samples can be found in the file DatPrac5Ejer2, which can be loaded from Campus Virtual.
Matrix A contains the speeds of the ProMT1 processors, matrix B those of the ProMT2 processors.

By means of the above data, could we conclude that the mean speed of both models is the same?

We should test if the mean speeds of ProMT1 and ProMT2 processors are equal. For such a
purpose, in the first place we should test if the variances of the speeds of such processors are the same,
or on the contrary they are different. In accordance with the result of that test, we will apply a
specific test for the question of the problem.

We store the data in two matrices, A1 and A2, by means of
>> A1 = [2.7; 2.65; 2.83; 2.95; 2.64; 2.45; 3.01; 2.56; 2.76; 2.99; 2.76; 2.87; 3.05; 2.65; 3.09]
>> A2 = [2.85; 2.89; 2.77; 3.00; 2.87; 2.76; 2.78; 2.67; 2.97; 2.99; 2.87; 2.93; 2.78; 2.98; 3.01; 2.84; 2.88; 2.79; 2.89; 2.91; 2.87; 2.88; 2.92]
The inference on the equality of variances is performed by
>> p1 = var_test(A1, A2)
which obtains the p-value of the samples in such a test. Such a p-value is p1 = $9.9280 \times 10^{-4}$, which
is lower than the usual level of significance, therefore we reject that the variances of the speeds of both
processors are the same, concluding that they are different.
As a consequence we will use the order welch_test to infer on the equality of the mean speeds of both
processors.
We perform the order
>> p2 = welch_test(A1, A2)
which obtains the p-value of the samples in such a test.
The p-value is p2 = 0.16534. Since it is greater than the usual level of significance, there is not enough
evidence to reject the equality of mean speeds.

Exercise 5.3. The percentages of hard disk taken up by films in young and adult people follow
normal distributions. In an analysis of such percentages the following data were obtained:

Young: 20, 18, 22, 32, 17, 25, 34, 19, 17, 27, 24, 31, 17, 23, 26, 28, 30, 26, 33, 12

Adult: 6, 12, 17, 7, 21, 34, 6, 12, 18, 12, 11, 8, 28, 17, 15, 16, 14

Could we consider that young and adult people give the same use to their computers in relation
to the space taken up by films?

Since the percentages of hard disk taken up by films in young and adult people follow normal
distributions, we can consider that young and adult people give the same use to their computers (with
respect to films) if the means and variances of both distributions are the same. Note that in that case
the probability distributions of the space taken up by films are the same. Therefore we test the
equality of means and variances.

We store the data in matrices A and B with
>> A = [20; 18; 22; 32; 17; 25; 34; 19; 17; 27; 24; 31; 17; 23; 26; 28; 30; 26; 33; 12]
>> B = [6; 12; 17; 7; 21; 34; 6; 12; 18; 12; 11; 8; 28; 17; 15; 16; 14]

The possible equality of the variances can be tested with
>> p1 = var_test(A, B)
which obtains the p-value of the samples in such a test.
The p-value is p1 = 0.43184, thus we should not reject the equality of the variances.
It is interesting to note that if in the above test we had rejected the hypothesis $\sigma_1 = \sigma_2$, the
problem would be finished, since the probability distributions of the space taken up by films would not
be the same.

Now we test the equality of the means of the space taken up by films in young and adult people.
As a consequence of the previous test we will use the order t_test_2. Thus we perform
>> p2 = t_test_2(A, B)
which obtains the p-value of the samples in such a test.
The p-value is p2 = 0.0002. Since it is smaller than the usual level of significance, we reject that the
mean space taken up by films is the same for young and adult people.
As a consequence we cannot consider that young and adult people give the same use to their
computers in relation to the space taken up by films.

Exercise 5.4. A manufacturer asserts that the mean information rate sent by its ports is equal to the
mean information rate of a port developed by another manufacturer. However, the latter declares that
the mean information rate of its port is greater than the one of the former. It is known that both
information rates follow normal distributions.
Taking at random ports of both manufacturers, information rates in bytes per second were obtained. What could we conclude with the following data?
First manufacturer: 1199.3, 1200.2, 1200.9, 1198.3, 1200.5, 1200.3, 1200.5, 1199.5, 1200.8, 1200.0,
1200.4, 1199.7, 1199.7, 1199.7, 1198.7, 1199.4, 1200.5, 1199.8, 1199.1, 1199.4
Second manufacturer: 1318.4, 1299.7, 1301.3, 1294.7, 1310.8, 1306.3, 1296.2, 1309.6, 1287.5,
1310.7, 1303.2, 1308.9, 1307.7, 1297.2, 1314.6

The samples can be found in the file DatPrac5Ejer4, which can be loaded from Campus Virtual.
Matrix A contains the data of the first manufacturer, matrix B the data of the second one.

In the first place we test if the variances of the information rates of both ports are the same by
means of
>> p1 = var_test(A, B)
which obtains the p-value of the samples in such a test. Such a p-value is p1 = 6.6613e-016. Therefore
we reject the equality of variances.
As a consequence, the inference on the means will be carried out with the order welch_test. So
we consider
>> p2 = welch_test(A, B, "<")
which computes the p-value of the samples in the above test. Such a value is p2 = 1.9990e-017. Therefore we should reject the equality of mean information rates and conclude that the mean information
rate of the second manufacturer is greater than the mean information rate of the first manufacturer.

Exercise 5.5. Generate 500 data from a normal distribution N(0, 1). Define a vector with values
from -3 to 3 with a step of 0.01 (601 values). For each value of the above vector, generate 1000 values
from a normal distribution N(value, 1).

Statistics

g
n

i
r
e
e
n

CHAPTER 5. LAB PRACTICE 5.- TESTS ON THE PARAMETERS OF TWO INDEPENDENT NORMAL

53

DISTRIBUTIONS

Draw a gure which should associate with each value of the vector, the p-value of the test data
generated from N (0, 1) and data generated from N (value, 1) come from random variables with the
same mean. Give an interpretation of such a graphic.
A possible solution of the problem is
>> A = normrnd(0, 1, 500, 1);
>> x = -3 : 0.01 : 3;
>> for j = 1 : 601
>> Bj = normrnd(x(j), 1, 1000, 1);
>> p(j) = t_test_2(A, Bj);
>> endfor
>> plot(x, p)
We obtained the following figure in our simulation

[Figure: p-values p (vertical axis) plotted against the values of the vector x (horizontal axis).]

Since the first sample was generated with a mean equal to 0, it is to be expected that as the mean of the
second sample moves away from 0, the corresponding p-values decrease rapidly.
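The behaviour of the p-values can also be checked without the orders normrnd and t_test_2. The following is a minimal sketch, assuming only core Octave functions; it uses a coarser grid than the exercise and replaces the exact t distribution by its normal approximation, which is a simplification that is reasonable here only because both samples are large.

```octave
randn("state", 1);                  % fix the generator so the run is repeatable
A = randn(500, 1);                  % sample from N(0, 1)
x = -3 : 0.5 : 3;                   % coarser grid than in the exercise
p = zeros(size(x));
for j = 1 : numel(x)
  B = x(j) + randn(1000, 1);        % sample from N(x(j), 1)
  se = sqrt(var(A)/numel(A) + var(B)/numel(B));
  z = (mean(A) - mean(B)) / se;     % two-sample statistic
  p(j) = erfc(abs(z) / sqrt(2));    % two-sided p-value, normal approximation
endfor
```

With this sketch the p-values at the ends of the grid are numerically zero, while near x = 0 they can take any value between 0 and 1.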

Exercise 5.6. Generate 500 data from a normal distribution N(0, 1). Define a vector with values
from -3 to 3 with a step of 0.01 (601 values). For each value of the above vector, generate 1000 values
from a normal distribution N(0, value).

Draw a figure which associates with each value of the vector the p-value of the test of whether the data
generated from N(0, 1) and the data generated from N(0, value) come from random variables with the
same variance. Give an interpretation of such a graphic.

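The following orders are a possible solution of the problem

>> A = normrnd(0, 1, 500, 1);
>> x = -3 : 0.01 : 3;
>> for j = 1 : 601
>> Bj = normrnd(0, x(j), 1000, 1);
>> p(j) = var_test(A, Bj);
>> endfor
>> plot(x, p)

In our simulation we obtained the following picture

[Figure: p-values p (vertical axis) plotted against the values of the vector x (horizontal axis).]

Since the first sample was generated with a variance equal to 1, it is to be expected that as the variance of
the second sample moves away from 1, the corresponding p-values decrease rapidly. Note that normrnd
returns NaN for a negative standard deviation, so only the positive values of x give meaningful p-values.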
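The same experiment can be sketched with core Octave functions only. The block below is an illustration under stated assumptions, not the implementation of var_test: it restricts the grid to positive standard deviations, uses the ratio of sample variances as statistic, and obtains the distribution function of the F distribution from the incomplete beta function betainc. The names s, F, c and the grid are hypothetical choices.

```octave
randn("state", 2);                  % fix the generator so the run is repeatable
A = randn(500, 1);                  % sample from N(0, 1)
s = 0.25 : 0.25 : 3;                % positive standard deviations only
p = zeros(size(s));
d1 = numel(A) - 1;                  % degrees of freedom of the numerator
for j = 1 : numel(s)
  B = s(j) * randn(1000, 1);        % sample with standard deviation s(j)
  d2 = numel(B) - 1;                % degrees of freedom of the denominator
  F = var(A) / var(B);              % ratio of sample variances
  % distribution function of F(d1, d2) evaluated at F
  c = betainc(d1*F / (d1*F + d2), d1/2, d2/2);
  p(j) = 2 * min(c, 1 - c);         % two-sided p-value
endfor
```

The p-values are close to zero at both ends of the grid and are not small in a systematic way only near s = 1.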

Chapter 6

Lab Practice 6.- Tests on population proportions

Content
6.1 Aim of the practice
6.2
6.3 Exercises of Lab Practice 6

6.1 Aim of the practice

In this practice we will learn to test hypotheses on population proportions. We will study both the case of one
proportion and the case of two proportions.

6.2

The orders to test hypotheses on population proportions are shown in this section.

The main orders in this context are the following:

1) Test on a population proportion (p): to our knowledge, the test on a population proportion p has
not been implemented in Octave, probably because of its simplicity. So it is necessary to program
such a test. For this purpose we use the orders for the inverse of cumulative distribution
functions, which were analyzed in Lab Practice 3.
We will consider the tests
a) $H_0: p = p_0$ against $H_1: p \neq p_0$,
b) $H_0: p = p_0$ against $H_1: p > p_0$,
c) $H_0: p = p_0$ against $H_1: p < p_0$.

The statistic of the test is
$$\mathrm{Stat} = \frac{\bar{X}_n - p_0}{\sqrt{p_0(1-p_0)/n}},$$
which follows (approximately) a distribution N(0, 1) when the null hypothesis is true and the
sample size n is large enough.

The critical regions at a level of significance $\alpha$ are
a) $CR = \left\{ (x_1, \ldots, x_n) : \left| \dfrac{\bar{x}_n - p_0}{\sqrt{p_0(1-p_0)/n}} \right| \geq z_{\alpha/2} \right\}$,
b) $CR = \left\{ (x_1, \ldots, x_n) : \dfrac{\bar{x}_n - p_0}{\sqrt{p_0(1-p_0)/n}} \geq z_{\alpha} \right\}$,
c) $CR = \left\{ (x_1, \ldots, x_n) : \dfrac{\bar{x}_n - p_0}{\sqrt{p_0(1-p_0)/n}} \leq z_{1-\alpha} = -z_{\alpha} \right\}$,
respectively. That is, samples for which the value of the statistic, which we will denote
by stat, satisfies
a) $|stat| \geq z_{\alpha/2}$,
b) $stat \geq z_{\alpha}$,
c) $stat \leq z_{1-\alpha}$,
respectively.

Note that $z_{\alpha}$ stands for the value which has on its right hand side a tail of size $\alpha$ when we
consider a distribution N(0, 1). That is, if $W \sim N(0, 1)$, then $P(W > z_{\alpha}) = \alpha$. Such a value
can be computed with the order >> norminv(1 - alpha, 0, 1), as we saw in Lab Practice 3.
The sample should be stored in a vector, let us denote it by x, whose values should be 0 or 1,
where a value equal to 1 in a position means that in such an observation the property whose
proportion is analyzed was satisfied, and a value equal to 0 in that position means that in such
an observation the property under study did not hold.
If x denotes the vector with the sample data, the value of the statistic can be obtained with
>> (mean(x) - p0)/sqrt(p0*(1 - p0)/length(x))
In accordance with the hypothesis we have considered, we should obtain the critical region to
know whether the sample belongs to it, in which case we should reject the null hypothesis, or on the contrary
it does not belong to the critical region but to the acceptance region, in which case we do not have
enough evidence to reject the null hypothesis.
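The steps of the test on one proportion can be put together as follows. This is a minimal sketch, assuming a hypothetical 0/1 sample (18 ones out of 100 observations) and a hypothetical value p0 = 0.1; the critical values are obtained from the core function erfinv through the identity $z_{\alpha} = \sqrt{2}\,\mathrm{erfinv}(1 - 2\alpha)$, which gives the same values as norminv(1 - alpha, 0, 1).

```octave
x = [ones(1, 18), zeros(1, 82)];    % hypothetical 0/1 sample, n = 100
p0 = 0.1;                           % hypothetical proportion under H0
alpha = 0.05;                       % level of significance

z = @(a) sqrt(2) * erfinv(1 - 2*a); % z(a): value with right tail a under N(0,1)

stat = (mean(x) - p0) / sqrt(p0*(1 - p0)/length(x));

reject_two_sided = abs(stat) >= z(alpha/2);   % case a), H1: p != p0
reject_greater   = stat >= z(alpha);          % case b), H1: p >  p0
reject_less      = stat <= z(1 - alpha);      % case c), H1: p <  p0
```

For this hypothetical sample the statistic is about 2.67, so the null hypothesis is rejected in cases a) and b) and is not rejected in case c).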

2) Tests on equality of population proportions (p1 = p2): to perform a test on the equality of two population
proportions, the order we will use is
>> prop_test_2(x1, n1, x2, n2, alt)

In the above order, the argument x1 is the number of times that the event whose
proportion (p1) we are studying occurred in the first sample. The argument n1 is the size of the first
sample.

The meanings of x2 and n2 are similar, but for the second sample.

The argument alt determines the kind of test we are performing. Thus,
if such an argument does not appear, the test we perform is
$H_0: p_1 = p_2$ against $H_1: p_1 \neq p_2$,

if the argument is "!=" (quotation marks should be included), the test we perform is
$H_0: p_1 = p_2$ against $H_1: p_1 \neq p_2$
(the same test as in the above case),

if the argument is ">" (quotation marks should be included), the test we perform is
$H_0: p_1 = p_2$ against $H_1: p_1 > p_2$,

if the argument is "<" (quotation marks should be included), the test we perform is
$H_0: p_1 = p_2$ against $H_1: p_1 < p_2$.

The order >> prop_test_2(x1, n1, x2, n2, alt) computes the p-value of the samples in the corresponding test.

The order
>> [pval, stat] = prop_test_2(x1, n1, x2, n2, alt)
computes two different values: pval, which contains the p-value of the samples in the corresponding test, and stat, which contains the value of the statistic of the test.

6.3 Exercises of Lab Practice 6

Exercise 6.1. It is known that during the academic course the server of a department registers
20% of visits from outside the University. During the summer period, it is observed that there were
167 requests, 32 of them from outside the University.

Could we conclude that the proportion of visits from outside the University remains the same during the
summer period, or on the contrary does it decrease?

Let p be the proportion of visits from outside the University during the summer months. We want
to test
$H_0: p = 0.2$ against $H_1: p < 0.2$.
The value of the statistic can be computed with
>> stat = (32/167 - 0.2)/(sqrt(0.2*(1 - 0.2)/167))
whose value is -0.27084.
With the usual level of significance 0.05, we should compare this value with $z_{0.95}$. The quantity $z_{0.95}$
can be obtained with the order
>> norminv(0.05, 0, 1)
giving $z_{0.95} = -1.6449$.
Since stat = -0.27084 > -1.6449, the sample does not belong to the critical region, and so there is not
enough evidence to reject the null hypothesis.

Exercise 6.2. The proportion of computers of a company affected by a virus is being analyzed by one
of the employees. A workmate affirms that the proportion of affected computers is 5%, but a second
workmate says that it is greater than 5%.

Taken at random 150 computers, a number 1 was annotated if the computer had such a virus,
and 0 in any other case. The sample which was obtained is the following:

The sample can be found in the file DatPrac6Ejer2, which can be loaded from Campus Virtual.
Matrix A contains the above sample.

Could we conclude that the affirmation of the first workmate is true, or should we reject it and
accept the opinion of the second workmate?

Let p be the proportion of computers affected by the virus. We want to test
$H_0: p = 0.05$ against $H_1: p > 0.05$.
The value of the statistic can be obtained with
>> stat = (mean(A) - 0.05)/sqrt(0.05*(1 - 0.05)/length(A))
such a value being 1.6859. That value should be compared with $z_{0.05}$. This quantity is obtained with
the order
>> norminv(0.95, 0, 1)
which gives $z_{0.05} = 1.6449$.
Since stat = 1.6859 > $z_{0.05}$ = 1.6449, the sample belongs to the critical region of the test.
Therefore, at the level of significance 0.05, we should reject the null hypothesis. Thus the data support the
opinion of the second workmate, although the statistic exceeds the critical value only slightly.

Exercise 6.3. A keyboard factory asserts that the proportion of keyboards which last over 6000
hours is 90%. A number of 200 keyboards were analyzed, obtaining the following information:

The sample can be found in the file DatPrac6Ejer3, which can be loaded from Campus Virtual.
Matrix A contains the above sample. A value of 1 in the position i means that the keyboard
i still worked after 6000 hours, while a value equal to 0 means that its life did not exceed 6000
hours.

What could we conclude?

Let p be the proportion of keyboards whose life is more than 6000 hours. We want to test
$H_0: p = 0.9$ against $H_1: p \neq 0.9$.
The value of the statistic is computed with
>> stat = (mean(A) - 0.9)/sqrt(0.9*(1 - 0.9)/length(A))
being equal to -3.2998.
To know if the sample belongs to the critical region, we need to obtain $z_{0.025}$, a value which can be
computed with the order
>> norminv(0.975, 0, 1)
thus $z_{0.025} = 1.9600$.
Since the relation |stat| = 3.2998 $\geq z_{0.025}$ = 1.9600 holds, the sample belongs to the critical region and there is enough evidence to reject the null
hypothesis.
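Comparing the statistic with a critical value is equivalent to comparing the p-value with the level of significance. The following sketch, which assumes only core Octave functions, takes the three statistic values reported in Exercises 6.1, 6.2 and 6.3 together with the alternative hypothesis of each exercise and computes the corresponding p-values; the helper name upper is illustrative.

```octave
stat  = [-0.27084, 1.6859, -3.2998];     % statistics of Exercises 6.1, 6.2, 6.3
alt   = {"<", ">", "!="};                % alternative hypothesis of each exercise
alpha = 0.05;                            % level of significance

upper = @(z) 0.5 * erfc(z / sqrt(2));    % P(W > z) for W ~ N(0, 1)

pval = zeros(1, 3);
for i = 1 : 3
  switch alt{i}
    case "<"
      pval(i) = upper(-stat(i));         % left tail
    case ">"
      pval(i) = upper(stat(i));          % right tail
    otherwise
      pval(i) = 2 * upper(abs(stat(i))); % both tails
  endswitch
endfor
reject = pval < alpha
```

The p-value of Exercise 6.1 is about 0.39, so the null hypothesis is not rejected, while the p-values of Exercises 6.2 and 6.3 are below 0.05.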
Exercise 6.4. Two companies which repair computers each assure that the proportion of computers
it repairs in fewer than 5 days is greater than the proportion of the other
company. To analyze such a question, the following information was obtained:

The sample can be found in the file DatPrac6Ejer4, which can be loaded from Campus Virtual.
Matrix A contains the information of the first company: a value of 1 in the position i means
that such a computer was repaired in fewer than 5 days, while a value equal to 0 means
that the period was at least 5 days. Matrix B contains the same information for the second
company.

What conclusions can be derived?

Let p1 and p2 be the proportions of computers which are repaired in fewer than
5 days by the first and the second companies respectively.
In a first test we consider the hypotheses
$H_0: p_1 = p_2$ against $H_1: p_1 \neq p_2$.
For such a purpose we consider the order
>> prop_test_2(x1, n1, x2, n2)
where x1 is the number of computers repaired by the first company in fewer than 5 days, n1 is the
number of computers repaired by the first company, and the meanings of x2 and n2 are the same for
the second company.
Such values are calculated with the orders >> sum(A), >> length(A), >> sum(B) and >> length(B)
respectively.
The order
>> pval = prop_test_2(sum(A), length(A), sum(B), length(B))
gives the p-value of the samples in the above test.

Since the p-value is pval = 0.69395, we do not reject the null hypothesis, that is, there is not enough evidence against the proportion
of computers repaired in fewer than 5 days being the same for both companies.

Exercise 6.5. A company of processors analyzes the quality of two subsidiary companies in the north
of Spain, studying the speed of the processors manufactured in both places. In the first subsidiary
company 125 processors are taken at random, and 150 in the second one. The speeds are the following:

The samples can be found in the file DatPrac6Ejer5, which can be loaded from Campus Virtual.
Matrix A contains the information of the first subsidiary company, matrix B provides the
information of the second one.

The company is interested in studying if the proportion of processors with a speed greater than
2.8 is the same in both subsidiary companies.

Let p1 and p2 be the proportions of processors with a speed greater than 2.8 in each of the above
places respectively.
We want to test
$H_0: p_1 = p_2$ against $H_1: p_1 \neq p_2$.

We transform the matrices A and B into the matrices A1 and B1 respectively, which have the value 1 in those positions where there is a value greater than 2.8, and
a value 0 in any other case. By means of such matrices we know which processors had a speed
greater than 2.8, and which did not satisfy such a condition.
For such a purpose we consider the orders
>> A1 = (A > 2.8)
>> B1 = (B > 2.8)

With such matrices we can test the above hypothesis with
>> pval = prop_test_2(sum(A1), length(A1), sum(B1), length(B1))
which gives the p-value of the samples in the above test.

In this case pval = 3.9979e-013, thus we should reject the null hypothesis and conclude that both
proportions are different.
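A test on the equality of two proportions can also be sketched by hand. The block below assumes the usual pooled two-proportion statistic, which is stated here as an assumption about what such a test computes rather than as the documented internals of prop_test_2; the counts x1, n1, x2, n2 are hypothetical and are not the data of the exercises.

```octave
x1 = 40;  n1 = 125;                 % hypothetical successes and size, sample 1
x2 = 90;  n2 = 150;                 % hypothetical successes and size, sample 2

p1 = x1 / n1;                       % sample proportions
p2 = x2 / n2;
pc = (x1 + x2) / (n1 + n2);         % pooled proportion under H0: p1 = p2

z = (p1 - p2) / sqrt(pc*(1 - pc)*(1/n1 + 1/n2));   % pooled statistic
pval = erfc(abs(z) / sqrt(2));      % two-sided p-value under N(0, 1)
```

When the vectors of the exercises are available, x1 and n1 would be sum(A1) and length(A1), and similarly for the second sample.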

Chapter 7

Lab Practice 7.- Goodness of fit tests, a test for randomness, and the Kolmogorov-Smirnov test for two
samples.

Content
7.1 Aim of the practice
7.2 Basic orders for goodness of fit tests, a test for randomness, and the Kolmogorov-Smirnov test for two samples
7.3 Exercises of Lab Practice 7

7.1 Aim of the practice

In this practice we will see Octave instructions for some goodness of fit tests, the Kolmogorov-Smirnov
test for two samples and a test for randomness.

7.2 Basic orders for goodness of fit tests, a test for randomness, and the Kolmogorov-Smirnov test for two
samples

The orders for the above aims are shown in this section.
The main orders in this context are the following:

1) A goodness of fit test for a normal distribution, the Lilliefors test: to test if a random sample is
drawn from a normal distribution (without conditions on its parameters), we will use the order
>> kolmogorov_smirnov_test(x, "normal", mean(x), var(x, 0))
Vector x should contain the sample. The above order computes the p-value of the sample in the
test whose null hypothesis is
H0 : the sample in x is drawn from a normal distribution.

2) Goodness of fit tests for continuous distributions with given parameters: to test if a random
sample is drawn from a totally specified continuous distribution, we will use the order
>> kolmogorov_smirnov_test(x, dist, params)
In the above order, vector x should contain the sample.
The argument dist determines the distribution from which we want to test if the data are drawn.
Such an argument can be any string dist such that the order dist_cdf or distcdf computes the
distribution function of dist (see Lab Practice 3).
For instance, for the uniform distribution we will write "uniform" (quotation marks must
be included), and for the normal distribution we will write "normal" (note that we do not write
"norm"; however, some versions of Octave announce that in future versions it will be "norm").
The argument params contains the parameters of the distribution.
The above order computes the p-value of the sample in the test whose null hypothesis is
H0 : the sample in x is drawn from the distribution dist with parameters params.

For instance,
>> kolmogorov_smirnov_test(x, "uniform", 2, 4)
calculates the p-value of the sample in x in the test whose null hypothesis is that the sample is
drawn from a uniform distribution on the interval (2, 4), that is,
H0 : the data are drawn from a U(2, 4) random variable.

The order
>> kolmogorov_smirnov_test(x, "normal", 0, 2)
obtains the p-value of the sample in x in the test whose null hypothesis is that the data are drawn
from a normal distribution N(0, 2), that is,
H0 : the data are drawn from a N(0, 2) random variable.

It is very important to remark that in the case of the normal distribution, we should write in
the order kolmogorov_smirnov_test the variance instead of the standard deviation.

3) A goodness of fit test for a discrete population: to our knowledge the chi-square goodness of fit
test has not been implemented in Octave. So it is necessary to program such a test. For such a
purpose we briefly recall this test.
Suppose that we have a random sample of size n from a random experiment. Each observation
is classified in one and only one of k possible outcomes. Let $A_1, A_2, \ldots, A_k$ be such outcomes.
The null hypothesis of the test is
$$H_0: p_i = P(A_i), \quad 1 \leq i \leq k,$$
where $p_i > 0$ and $\sum_{i=1}^{k} p_i = 1$.
Let $n_i$ be the number of observations of $A_i$ (observed frequency) in the sample. If the null
hypothesis is true, it is expected to observe $n p_i$ times the outcome $A_i$ (expected frequency).
The statistic of the test is
$$\mathrm{Stat} = \sum_{i=1}^{k} \frac{(n_i - n p_i)^2}{n p_i},$$
which under the null hypothesis follows (approximately) a distribution $\chi^2_{k-1}$.
The critical region at a level of significance $\alpha$ is
$$CR = \left\{ \text{samples such that } \sum_{i=1}^{k} \frac{(n_i - n p_i)^2}{n p_i} \geq \chi^2_{k-1,\alpha} \right\}.$$
A way to program the test is the following. Let us suppose that
x is the frequency vector, that is, $x = (n_1, n_2, \ldots, n_k)$, and p is the vector of
probabilities.
The value of the statistic and the p-value of the sample can be computed by means of the
following program:
>> expected = sum(x)*p/sum(p);
>> stat = sum((x - expected).^2./expected);
>> pvalue = 1 - chi2cdf(stat, length(p) - 1)
In the variable stat we obtain the value of the statistic; the variable pvalue stores the p-value
of the sample.

4) A test for randomness: the tests we have considered until now assume that the sample(s) used
for inference is (are) random. A numeric sequence is said to be statistically random when it contains
no recognizable patterns or regularities.
We can check if a sample is random with the order
>> run_test(x)
In the above order, vector x should contain the sample.
That order computes the p-value of the sample in the test whose null hypothesis is
H0 : the sequence of numbers in vector x is random.

5) The Kolmogorov-Smirnov test for two samples: the Kolmogorov-Smirnov test for two samples is
a test of whether two independent random samples have been drawn from the same continuous
random variable or from random variables with the same continuous distribution.
Let us suppose that we have two random samples drawn from continuous independent random
variables X and Y.
We want to study if X and Y have the same distribution, which will be denoted by $X \sim Y$.
Note that we do not specify the distribution.
The order for the above purpose is
>> kolmogorov_smirnov_test_2(x, y)
In such an order, the data of the two samples are collected in vectors x and y.
The above order computes the p-value of the samples in the test.

7.3 Exercises of Lab Practice 7

Exercise 7.1. Information rates sent by two kinds of ports are analyzed. The observed rates, in bytes
per second, are the following:

Type 1 port: 1199.3, 1200.2, 1200.9, 1198.3, 1200.5, 1200.3, 1200.5, 1199.5, 1200.8, 1200.0,
1200.4, 1199.7, 1199.7, 1199.7, 1198.7, 1199.4, 1200.5, 1199.8, 1199.1, 1199.4

Type 2 port: 1318.4, 1299.7, 1301.3, 1294.7, 1310.8, 1306.3, 1296.2, 1309.6, 1287.5, 1310.7,
1303.2, 1308.9, 1307.7, 1297.2, 1314.6

The samples can be found in the file DatPrac7Ejer1, which can be loaded from Campus Virtual.
Matrix A contains the data of the first port, matrix B the data of the second port.

i) Could we consider that the data of the first port are drawn from a normal distribution with
mean 1310 and standard deviation 12?

ii) And the data of the second kind of port?

iii) Could we consider that the data of the first port are drawn from a normal distribution?

iv) Could we consider that the data of the first port and the data of the second one are drawn from the
same distribution?

We store the two samples in two matrices A and B respectively with

>> A = [1199.3; 1200.2; 1200.9; 1198.3; 1200.5; 1200.3; 1200.5; 1199.5; 1200.8; 1200.0; 1200.4; 1199.7;
1199.7; 1199.7; 1198.7; 1199.4; 1200.5; 1199.8; 1199.1; 1199.4]
>> B = [1318.4; 1299.7; 1301.3; 1294.7; 1310.8; 1306.3; 1296.2; 1309.6; 1287.5; 1310.7; 1303.2; 1308.9;
1307.7; 1297.2; 1314.6]

With respect to the first question, we consider the order

>> p1 = kolmogorov_smirnov_test(A, "normal", 1310, 144)
Note that in the order kolmogorov_smirnov_test, when we consider the case of a completely specified
normal distribution, we should introduce the variance instead of the standard deviation (144 instead
of 12).

The p-value we obtain is almost equal to 0, therefore we have enough evidence to reject the hypothesis
in question i).

To approach point ii) we perform the order

>> p2 = kolmogorov_smirnov_test(B, "normal", 1310, 144)
obtaining the p-value p2 = 0.062242. Therefore, at the level of significance 0.05, there is not enough evidence to reject that the data of the type
2 port are drawn from a normal distribution with mean 1310 and standard deviation 12.

In relation to iii), we want to test if the data of the type 1 port follow a normal distribution (without
specifying parameters). For such a purpose we consider the order
>> p3 = kolmogorov_smirnov_test(A, "normal", mean(A), var(A, 0))
which gives a p-value equal to p3 = 0.990163. Therefore we should not reject the above hypothesis, and
so the data are compatible with the information rates sent by the first port following a normal distribution.
In iv) we want to test if both data sets come from the same (continuous) distribution. Note that both
information rates are independent. We will use the order
>> p4 = kolmogorov_smirnov_test_2(A, B)
which gives a p-value p4 = 7.1776e-008. Therefore we have enough evidence to reject such a hypothesis.
Note that Octave warns that the p-value is approximate; however, it is so small that there are no
doubts about the final conclusion.

Exercise 7.2. A company wants to evaluate a new procedure to send information through the
network. For such a purpose it takes 218 files of the same size and measures the quality of the
transmission. Such a quality is classified in six exclusive categories, namely, Excellent, Good, Normal,
Regular, Deficient and Very Deficient. The result appears in the following table:

Quality        Excellent   Good   Normal   Regular   Deficient   Very Deficient
No. of files   24          36     52       30        44          32

i) Could we consider that for the new procedure to send information, the qualities Excellent, Good,
Normal, Regular, Deficient and Very Deficient have probabilities of occurring 0.1, 0.15, 0.25,
0.15, 0.2 and 0.15 respectively?

ii) And probabilities 0.25, 0.1, 0.1, 0.3, 0.05 and 0.2 respectively?

In both cases we should consider a goodness of fit test for a discrete distribution. Therefore we
will use the chi-square goodness of fit test.
The program we have designed to solve such a test is
>> expected = sum(x)*p/sum(p);
>> stat = sum((x - expected).^2./expected);
>> pvalue = 1 - chi2cdf(stat, length(p) - 1)
where x is the vector of observed frequencies and p is the vector of probabilities or values proportional
to such probabilities.

In our case
>> x = [24, 36, 52, 30, 44, 32]
which are the observed frequencies of each of the above different qualities, and
>> p = [0.1, 0.15, 0.25, 0.15, 0.2, 0.15]
which collects the probabilities of each quality.

The p-value we obtain is pvalue = 0.96940. Therefore we do not reject the tested hypothesis.

The second question is solved in the same way. Now the vector of probabilities is
>> p = [0.25, 0.1, 0.1, 0.3, 0.05, 0.2]

In this case the p-value is equal to 0. Thus we have enough evidence to reject the proposed model
of probabilities or proportions.

Exercise 7.3. Taken at random some processors of two makes A and B, the following processor
speeds were found:

Make A: 2.7, 2.65, 2.83, 2.95, 2.64, 2.45, 3.01, 2.56, 2.76, 2.99, 2.76, 2.87, 3.05, 2.65, 3.09, 3.05,
2.99, 2.67, 3.02

Make B: 2.87, 2.76, 2.78, 2.67, 2.97, 2.99, 2.87, 2.93, 2.78, 2.98, 3.01, 2.84, 2.88, 2.79, 2.89, 2.91,
2.87, 2.88, 2.92

The samples can be found in the file DatPrac7Ejer3, which can be loaded from Campus Virtual.
Matrix A contains the data of the first manufacturer, matrix B the data of the second one.

Could we conclude that the speeds of both processors are the same?

We test if both samples follow the same distribution. Note that both speeds are continuous and
independent.
We store the data in two matrices A and B
>> A = [2.7; 2.65; 2.83; 2.95; 2.64; 2.45; 3.01; 2.56; 2.76; 2.99; 2.76; 2.87; 3.05; 2.65; 3.09; 3.05;
2.99; 2.67; 3.02]
>> B = [2.87; 2.76; 2.78; 2.67; 2.97; 2.99; 2.87; 2.93; 2.78; 2.98; 3.01; 2.84; 2.88; 2.79; 2.89; 2.91;
2.87; 2.88; 2.92]

For the above test we perform the order

>> pval = kolmogorov_smirnov_test_2(A, B)
which assigns to the variable pval the p-value of the samples in the above test.

In this case we obtain pval = 0.15164. Therefore there is not enough evidence to reject the
above hypothesis, that is, we do not reject that both speeds follow the same distribution.

Exercise 7.4. Could we consider that the speeds follow a normal distribution in the case of make A?

The solution is given by the order

>> p1 = kolmogorov_smirnov_test(A, "normal", mean(A), var(A, 0))
which gives the p-value of the sample in the test whose null hypothesis is that the processor speeds follow a
normal distribution in the case of make A.
In this case the p-value is p1 = 2.6284e-005. Since this value is below any usual level of significance, there is enough evidence to reject
such a hypothesis.

Exercise 7.5. Given the samples of Exercise 7.3, could we consider that they are random?

We should apply a randomness test to the samples which are in matrices A and B.
We consider the orders
>> p1 = run_test(A)
>> p2 = run_test(B)

The p-values of the samples in A and B are p1 = 0.17826 and p2 = 0.53829 respectively. Therefore
we should not reject the randomness of either of the above samples.
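The chi-square computation of question i) of Exercise 7.2 can be checked with core Octave functions only. The following is a minimal sketch: it repeats the program of the notes with the data of the exercise, but replaces chi2cdf by the regularized incomplete gamma function gammainc, using the identity that the distribution function of a chi-square with df degrees of freedom at s equals gammainc(s/2, df/2).

```octave
x = [24, 36, 52, 30, 44, 32];            % observed frequencies (218 files)
p = [0.1, 0.15, 0.25, 0.15, 0.2, 0.15];  % probabilities under H0

expected = sum(x) * p / sum(p);          % expected frequencies n*p_i
stat = sum((x - expected).^2 ./ expected);
df = length(p) - 1;                      % degrees of freedom k - 1

% chi-square distribution function through the incomplete gamma function
pvalue = 1 - gammainc(stat/2, df/2)
```

The statistic is about 0.911 and the p-value agrees with the value 0.96940 reported in the exercise.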
Exercise 7.6. In Lab Practice 1 we saw that the order >> rand(n, m) generates a matrix of size
n x m with random numbers of the interval (0, 1). How could we check that such an order works
properly?

We try to find out if the order rand generates random numbers of the interval (0, 1). We could
affirm that the order works properly if the generated numbers are random and follow a uniform distribution
on the interval (0, 1).

We can generate numbers with the order rand and check if they are random and follow a uniform
distribution on the interval (0, 1).

We perform this mechanism a large number of times instead of only once, to avoid that an
unusual sample leads us to a wrong conclusion.
In each repetition we compute a p-value in relation to the randomness of the sample and a p-value
in relation to the question of the uniform distribution on the interval (0, 1).

Once we have both sets of p-values, we obtain a summary p-value of each of them to reach the
final conclusion. Normally the summary p-value is the mean or the median of all the p-values.
For instance, we could consider
>> A = rand(500, 1000);
>> for i = 1 : 500
>> pval(i) = run_test(transpose(A(i, :)));
>> pvalor(i) = kolmogorov_smirnov_test(A(i, :), "uniform", 0, 1);
>> endfor
>> w1 = mean(pval)
>> w2 = median(pval)
>> v1 = mean(pvalor)
>> v2 = median(pvalor)

In the first line we generate 500 samples (500 rows) of size 1000 by means of the order rand.

For each of the above 500 samples (each row) we obtain the p-value of the test whose null hypothesis is
H0 : the sample of the row i is random
(third line of the program), and also the p-value of the test whose null hypothesis is
H0 : row i is drawn from a uniform distribution on the interval (0, 1)
(fourth line of the program).

Such p-values are stored in the vectors pval and pvalor respectively.
In the last lines of the program we obtain the mean (w1, v1) and the median (w2, v2) of both
sets of p-values.
In our simulations we obtain values of w1, v1, w2 and v2 close to 0.5. Therefore we do not find evidence against
the numbers generated by the order rand on the interval (0, 1) being random and following the uniform
distribution on the interval (0, 1).
Hence we should conclude that the order rand works properly.
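The uniformity check of one row can be sketched by hand with core Octave functions. The block below is an illustration under stated assumptions and not the implementation of kolmogorov_smirnov_test: it computes the Kolmogorov-Smirnov distance between one sample of rand and the distribution function F(u) = u of the uniform distribution on (0, 1), and approximates the p-value with the asymptotic Kolmogorov series truncated at 100 terms.

```octave
rand("state", 1);                   % fix the generator so the run is repeatable
n = 1000;
u = sort(rand(n, 1));               % ordered sample
i = (1 : n)';

% largest gap between the empirical distribution function and F(u) = u
D = max(max(i/n - u), max(u - (i - 1)/n));

% asymptotic p-value: P(sqrt(n)*D > lambda) ~ 2*sum((-1)^(k-1)*exp(-2*k^2*lambda^2))
lambda = sqrt(n) * D;
k = 1 : 100;
pks = 2 * sum((-1).^(k - 1) .* exp(-2 * k.^2 * lambda^2));
```

When the null hypothesis holds, p-values of this kind are (approximately) uniformly distributed on (0, 1), which is why a mean and a median close to 0.5 are the expected outcome of the simulation.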
Statistics

g
n

Bibliography

4
n
1
E
0
2 re
3
a
s
1
w
c
0
i
t
2 of ist
t
S
a
f
t
o S

[1] Eaton, J.W., Bateman, D., Hauberg, S. GNU Octave Manual Version 3. Network Theory Limited,
2008
[2] Navidi, W. Statistics for Engineers and Scientists. McGraw-Hill Companies, Inc., 2009

[3] Montgomery, D.C., Runger, G.C. Applied Statistics and Probability for Engineers. John Wiley
and Sons Inc, 2005
[4] Prakasa Rao, B. L. S. A rst course in probability and statistics. World Scientic Publishing Co.
Pte. Ltd., Hackensack, NJ, 2009
[5] Trivedi, K. Probability and Statistics with Reliability, Queueing and Computer Science Applications. John Wiley and Sons, 2002.
[6] Rohatgi, V.K., Ehsanes Saleh, A.K. An Introduction to Probability and Statistics. Wiley 2001.

l
e

r
o

h
c

a
B

69

i
g

i
r
e
e
n