
Telecommunications Engineering

Universidad Carlos III de Madrid


Statistics

Assignment 1: Introduction to MATLAB and Descriptive Statistics

Group: 69

Students:
Jorge Amat Zapatero
Natalia Paz García
Eva María Sánchez de Rojas Luján

IMPORTANT: The teachers of this course apply a ‘zero tolerance’ policy regarding academic
dishonesty. Students who sign this document agree to deliver original work. Breach of this
commitment will result in academic penalties.

Observations:

Solve the exercises in the Assignment1.pdf file. Note: It is advisable to consult the
manual on basic MATLAB/Octave operation available on the course website.
_______________________________________________________________

1. Analysis of a data set

1. a) Calculate the frequency table of variable Area. The table must include the absolute,
relative, cumulative absolute and cumulative relative frequencies.

>> table = tabulate(Area)

table =

    1.0000  178.0000   37.2385
    2.0000  151.0000   31.5900
    3.0000  149.0000   31.1715

The command tabulate calculates the absolute frequencies (Count, 2nd column) and the
relative frequencies in % (Percent, 3rd column). We can calculate the cumulative
frequencies by means of the command cumsum, applied to the 2nd column (absolute) and to
the 3rd column (relative):

>> abs_acum = cumsum(table(:,2))

abs_acum =

   178
   329
   478

>> rel_acum = cumsum(table(:,3))

rel_acum =

   37.2385
   68.8285
  100.0000

___________________________________________________________________________
Telecommunications Engineering – Statistics – Academic year 2018/2019
4.- Analyse the variables Gender and Goals in a double entry table. Calculate the absolute
frequency table with its marginal distributions and the relative frequency table with its
marginal distributions. (1.5 points)

a)
In this double entry table, the rows represent the gender and the columns represent the
goals (for example, gender 1 together with goal 1 occurs 117 times).

b)
The absolute frequency of a value is the number of times that value is repeated, and its
relative frequency is that number divided by the total number of elements. Therefore, when
tabulating each variable, the absolute frequency of each value appears in the column
“Count” (for example, for Goals the value 1 is repeated 247 times).

On the other hand, the relative frequency is located in the “Percent” column, which gives
the proportion of each value with respect to the total, expressed as a percentage (the
relative frequency as a proportion is Percent/100).
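The double entry table and its margins can also be reproduced with base MATLAB/Octave functions, which helps to see where each number comes from. The following is a minimal sketch, not the method used above: the short Gender and Goals vectors are hypothetical toy data (not the assignment data set), and the variable names counts, rowMarg, colMarg, absTable and relTable are our own choice.

```matlab
% Hypothetical toy data (NOT the assignment data):
% Gender: 1 = boy, 2 = girl; Goals coded 1, 2, 3
Gender = [1 1 2 2 2 1 2 2]';
Goals  = [1 2 1 3 1 1 2 3]';

% Absolute joint frequencies: rows = Gender, columns = Goals.
% accumarray adds 1 to cell (Gender(i), Goals(i)) for every student i.
counts = accumarray([Gender Goals], 1);

% Marginal distributions (absolute)
rowMarg = sum(counts, 2);      % one total per gender
colMarg = sum(counts, 1);      % one total per goal
n = sum(counts(:));            % total number of students

% Absolute table with its margins appended (last row and column are totals)
absTable = [counts rowMarg; colMarg n]

% Relative table: every absolute frequency divided by n
relTable = absTable / n
```

Here the last column and the last row of absTable are the marginal distributions, and dividing by n turns every entry, margins included, into a relative frequency (the bottom-right cell becomes 1).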

2. Linear Transformations
1. Change of units. Consider the matrix internet in the file internet.mat, and consider the
variable MB (“downloaded Mb”). Define a new variable, KB, as the number of downloaded Kb;
recall that 1 Mb = 1024 Kb. The new variable is the result of a linear transformation of
the form y = a + bx. From this transformation, check with MATLAB/Octave the following
theoretical relations: (2 points)

a) $\bar{y} = a + b\bar{x}$, where the bar denotes the sample mean.

First, we calculate the total MB downloaded.

Then we transform the first column (the MB) into KB.

b) $y_{med} = a + b\,x_{med}$, where med denotes the median.

c) $s_y^2 = b^2 s_x^2$, where $s^2$ is the sample quasi-variance.

d) $s_y = |b|\,s_x$, where $s$ is the sample quasi-standard deviation.
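The four relations can be checked numerically in a few lines. This is a sketch under stated assumptions: the vector x below is hypothetical toy data standing in for internet(:,1), a = 0 and b = 1024 follow from 1 Mb = 1024 Kb, and the check_* names and the tolerance tol are our own. var and std use the n-1 denominator by default, so they are the quasi-variance and quasi-standard deviation of the statement. Each check should display 1 (true).

```matlab
% Hypothetical toy values standing in for MB = internet(:,1)
x = [120.5 98.2 240.0 75.3 180.9]';

a = 0;          % no shift: 0 Mb correspond to 0 Kb
b = 1024;       % 1 Mb = 1024 Kb
y = a + b*x;    % KB, the transformed variable

tol = 1e-8;     % relative tolerance for floating-point comparisons

% a) mean:   mean(y) = a + b*mean(x)
check_mean   = abs(mean(y)   - (a + b*mean(x)))   <= tol*abs(mean(y))
% b) median: median(y) = a + b*median(x)
check_median = abs(median(y) - (a + b*median(x))) <= tol*abs(median(y))
% c) quasi-variance: var(y) = b^2 * var(x)
check_var    = abs(var(y)    - b^2*var(x))        <= tol*var(y)
% d) quasi-standard deviation: std(y) = |b| * std(x)
check_std    = abs(std(y)    - abs(b)*std(x))     <= tol*std(y)
```

The comparisons use a tolerance instead of == because, for a general b, the two sides of each relation may differ by floating-point round-off.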

Complete frequency table

>> table = [ table abs_acum rel_acum ]

Area: the school is in an urban (1), sub-urban (2) or rural (3) area.

Value   Absolute frequency   Relative frequency   Absolute cumulative   Relative cumulative
        (Count)              (Percent)            frequency             frequency (%)
1       178                  37.2385              178                   37.2385
2       151                  31.5900              329                   68.8285
3       149                  31.1715              478                   100.0000
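tabulate is not a core MATLAB function (it comes with the Statistics and Machine Learning Toolbox) and may not be available in every Octave installation, so it may help to see the same five-column table built only with base functions. A minimal sketch, assuming a small hypothetical Area vector rather than the real data:

```matlab
% Hypothetical toy data standing in for Area (1 = urban, 2 = sub-urban, 3 = rural)
Area = [1 2 1 3 1 2 3 1]';

% values = distinct categories; ic(i) = position of Area(i) in values
[values, ~, ic] = unique(Area);
counts   = accumarray(ic(:), 1);         % absolute frequencies
relative = 100 * counts / numel(Area);   % relative frequencies in %
cum_abs  = cumsum(counts);               % cumulative absolute frequencies
cum_rel  = cumsum(relative);             % cumulative relative frequencies (%)

freq_table = [values counts relative cum_abs cum_rel]
```

The columns follow the same order as the complete frequency table above: value, count, percent, cumulative count and cumulative percent.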

b) In which of the three types of area are most students concentrated?

In urban areas (Area = 1), with 178 of the 478 students (37.24%).

2.a) What is the proportion of boys and girls? Represent this proportion graphically with a
bar chart and a pie chart.

>> table = tabulate(Gender)

table =

    1.0000  227.0000   47.4895
    2.0000  251.0000   52.5105

Columns: Gender (1 = Boys, 2 = Girls), absolute frequency (Count) and relative frequency in %
(Percent), which is the proportion asked for: 47.49% boys and 52.51% girls.

>> bar(table(:,3))
>> pie(table(:,3))

[Figure: bar chart and pie chart of the proportions of Boys and Girls]

b) What is the proportion of boys and girls whose schools are established in urban areas?

>> Gender_Area = [Gender Area]

%First, we create a matrix with 2 columns: the first one represents the Gender and the
second one the Area. In other words, we join the Gender and Area vectors in a single matrix
in order to compare their values.

>> Boys_urban = (Gender_Area(:,1) == 1 & Gender_Area(:,2)==1)

%Then, we create a logical vector called “Boys_urban” with 478 elements (one per student)
that contains a 1 wherever both conditions are met: the first column is 1 (Boy) and the
second column is also 1 (Urban area).

>> Boys_urban = sum(Boys_urban,1)

Boys_urban =

72

%Now, we sum all the 1s in the Boys_urban vector, thus counting how many boys attend
schools in urban areas.

>> Boys_urban_percent= (Boys_urban*100)/227

Boys_urban_percent =

31.7181

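%Finally, as we are asked for a percentage (proportion), we compute
(number of boys in urban schools * 100) / (total number of boys) = 72*100/227,
obtaining the percentage of boys whose schools are in urban areas (the total of 227 boys
comes from the Gender table in 2.a).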

We repeat the same process for the girls:

>> Girls_urban = (Gender_Area(:,1) == 2 & Gender_Area(:,2)==1)

>> Girls_urban = sum(Girls_urban,1)

Girls_urban =

106

>> Girls_urban_percent= (Girls_urban*100)/251

Girls_urban_percent =

42.2311

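Therefore, the proportion of boys whose schools are established in urban areas is
31.7181%, and the proportion of girls is 42.2311%. As a check, 72 + 106 = 178, which is the
number of students in urban areas obtained in exercise 1.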
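The same two percentages can be computed without typing the totals 227 and 251 by hand, because the mean of a logical vector is already the fraction of true entries. A minimal sketch, assuming hypothetical toy Gender and Area vectors (not the assignment data); the output names are our own:

```matlab
% Hypothetical toy data (NOT the assignment data)
Gender = [1 2 2 1 2 1 2 2]';   % 1 = boy, 2 = girl
Area   = [1 1 3 2 1 1 2 1]';   % 1 = urban, 2 = sub-urban, 3 = rural

% Area(Gender == 1) keeps only the boys' areas; comparing with 1 gives a
% logical vector, and its mean is the fraction of those boys in urban schools.
boys_urban_percent  = 100 * mean(Area(Gender == 1) == 1)
girls_urban_percent = 100 * mean(Area(Gender == 2) == 1)
```

The design choice is to let the denominator follow from the data: because the vector is first restricted to one gender, mean divides by the number of boys (or girls) automatically.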
3. a) Plot histograms of the variables Grade and Age by the variable Goals.

>> histg(Grade,Goals)
>> histg(Age,Goals)

To create a histogram by groups we need the command histg, passing the grouping variable
(Goals) as the second argument. In order to use this command, we had to save the file
histg.m in our working directory.

b) Calculate the mean and the standard deviation of the variables Grades and Sports by Age
groups.

Since these statistics must be computed separately for each Age group, the ordinary
commands mean(x) and std(x), which use the whole sample, are not enough. To calculate
statistics by groups we use the command grpstats, which can return several statistics in a
single call:

>> [MEANS,STDS]=grpstats(Grades,Age)

MEANS =

    1.0000
    2.3800
    2.3485
    2.8571
    2.9038
    2.7500

STDS =

         0
    0.1117
    0.0860
    0.0782
    0.1353
    0.7500

>> [MEANS,STDS]=grpstats(Sports,Age)

MEANS =

    2.0000
    2.3400
    2.0379
    2.0582
    1.8077
    2.2500

STDS =

         0
    0.0997
    0.0870
    0.0692
    0.1318
    0.6292

Note: in MATLAB, when grpstats is called with two outputs and without specifying which
statistics to compute, the second output is the standard error of the mean of each group
(the standard deviation divided by the square root of the group size), so the values shown
under STDS are standard errors rather than standard deviations. The group standard
deviations are obtained with [MEANS,STDS]=grpstats(Grades,Age,{'mean','std'}), and likewise
for Sports.
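As a cross-check of the grouped statistics, they can also be computed with the base function accumarray, which applies a function to the values of each group. The sketch below is an alternative, not the command used above; the Age and Grades vectors are hypothetical toy data. It computes both the group quasi-standard deviations and the standard errors of the mean (standard deviation divided by the square root of the group size), so either can be compared with the output of grpstats.

```matlab
% Hypothetical toy data (NOT the assignment data)
Age    = [10 10 11 11 12 12 12]';
Grades = [2.0 3.0 2.5 3.5 1.0 2.0 3.0]';

[ages, ~, g] = unique(Age);     % g(i) = group index of student i
g = g(:);                       % accumarray needs a column of subscripts

means = accumarray(g, Grades, [], @mean);   % group means
stds  = accumarray(g, Grades, [], @std);    % group quasi-standard deviations
ns    = accumarray(g, 1);                   % group sizes
sems  = stds ./ sqrt(ns);                   % standard errors of the mean

result = [ages means stds sems]
```

Each row of result corresponds to one age group: age, mean, standard deviation and standard error.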

2. Linear Transformations

2. Standardization of variables. Consider the variable MB, and denote it as x. Define a new
variable y as the result of the standardization of x. The standardization consists of
applying a linear transformation that subtracts the mean value and divides by the standard
deviation. The resulting variable has zero mean, and its standard deviation and variance are
equal to one.

In the file internet.mat, which we need to import into our workspace to do the practice, we
have a matrix with 4 columns. The first column corresponds to the variable MB. To denote it
as x, we first isolate it as a single variable (1), and then we simply create the variable x
and make it equal to MB (2).

>> MB=internet(:,1)          (1)

MB =
…

>> x=MB                      (2)

x =

Now, to perform the standardization, we apply the given rule:

>> y=(x-mean(x))/std(x)

y =

Also, we can check whether we have created y correctly by applying the given criteria,
mean(y) = 0, std(y) = 1 and var(y) = 1:

>> mean(y)

ans =

  -6.7782e-17

As it is so low (it is only floating-point round-off), this value is considered to be 0.

>> std(y)

ans =

     1

>> var(y)

ans =

    1.0000

a) Determine the values of a and b of the corresponding linear transformation y = a + bx.

Knowing that y=(x-mean(x))/std(x), then a=-mean(x)/std(x) and b=1/std(x):

>> a=-mean(x)/std(x)

a =

  -21.2252

>> b=1/std(x)

b =

    0.1215
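The values of a and b, and the properties checked next, follow from rewriting the standardization as a linear transformation and applying the mean and quasi-standard deviation relations of the change-of-units exercise:

```latex
y = \frac{x - \bar{x}}{s_x}
  = \underbrace{-\frac{\bar{x}}{s_x}}_{a} + \underbrace{\frac{1}{s_x}}_{b}\,x,
\qquad
\bar{y} = a + b\,\bar{x} = -\frac{\bar{x}}{s_x} + \frac{\bar{x}}{s_x} = 0,
\qquad
s_y^2 = b^2 s_x^2 = \frac{s_x^2}{s_x^2} = 1,
\qquad
s_y = |b|\,s_x = 1 .
```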
b) Obtain the new standardized variable y and check in MATLAB/Octave the following results:
$\bar{y} = 0$, $s_y^2 = 1$ and $s_y = 1$.

Following the equation y = a + bx, we create a new standardized variable called newy:

>> newy=a+b*x

newy =

Now we check the results:

>> mean(newy)

ans =

  -1.9446e-15

Again, this value is considered 0 (floating-point round-off).

>> std(newy)*std(newy)

ans =

     1

(this product is the quasi-variance $s_y^2$)

>> std(newy)

ans =

     1
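The whole standardization check can be run as one short script. A minimal sketch, assuming a hypothetical toy vector x in place of internet(:,1); max_diff is our own name for the largest difference between the directly standardized y and newy = a + b*x.

```matlab
% Hypothetical toy values standing in for x = internet(:,1)
x = [120.5 98.2 240.0 75.3 180.9]';

y = (x - mean(x)) / std(x);    % standardization (std uses n-1)

a = -mean(x) / std(x);         % intercept of y = a + b*x
b = 1 / std(x);                % slope of y = a + b*x
newy = a + b*x;                % same variable, built from a and b

% The mean is 0 only up to round-off, so tiny values such as 1e-16 are expected
fprintf('mean = %g, var = %g, std = %g\n', mean(newy), var(newy), std(newy));
max_diff = max(abs(newy - y))  % y and newy agree up to round-off
```

As in the output above, mean(newy) is not exactly 0 but a tiny number caused by floating-point round-off.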

3. Correlation between linearly transformed variables

When we open the file internet.mat we find a 95x4 matrix, where the first column
corresponds to MB and the second to the connection time in hours. So first, we create the
variables x for the downloaded MB and v for the connection time in hours:

>> x = internet(:,1)
>> v = internet(:,2)

The required linear transformations are y = a + bx and u = c + dv. We make the required
conversions and create 2 new variables, y and u:

y = number of downloaded KB (1 Mb = 1024 Kb), so a = 0 and b = 1024
u = connection time in seconds (1 h = 3600 s), so c = 0 and d = 3600

>> y = x*1024
>> u = v*3600

Finally, we calculate the correlation coefficients:

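$\rho_{y,u}$:

>> corrcoef(y,u)

ans =

    1.0000    0.7686
    0.7686    1.0000

$\rho_{x,v}$:

>> corrcoef(x,v)

ans =

    1.0000    0.7686
    0.7686    1.0000

Therefore, we have checked that $\rho_{y,u} = \rho_{x,v} = 0.7686$, which indicates that the
correlation coefficient between two variables does not change when a change of units is
applied. This holds because both slopes are positive (b = 1024 and d = 3600); in general
$\rho_{y,u} = \operatorname{sign}(bd)\,\rho_{x,v}$, so a negative slope in only one of the two
transformations would change the sign of the coefficient.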
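The invariance can also be checked on any data set, together with the one case where the value changes: a negative slope flips the sign of the correlation coefficient. A sketch with hypothetical toy vectors standing in for the two columns of internet; the transformation 10 - 3600*v is an arbitrary example with d < 0, not part of the assignment.

```matlab
% Hypothetical toy data standing in for internet(:,1) and internet(:,2)
x = [120.5 98.2 240.0 75.3 180.9]';   % MB
v = [1.2 0.9 2.5 1.1 1.6]';           % hours

y = 1024 * x;    % KB  (a = 0, b = 1024)
u = 3600 * v;    % s   (c = 0, d = 3600)

% corrcoef of two vectors returns a 2x2 matrix; the off-diagonal entry is rho
R_xv = corrcoef(x, v);
R_yu = corrcoef(y, u);
rho_xv = R_xv(1, 2)
rho_yu = R_yu(1, 2)

% A negative slope (d < 0) keeps |rho| but changes its sign
R_neg = corrcoef(x, 10 - 3600 * v);
rho_neg = R_neg(1, 2)
```

Both positive-slope changes of units leave the coefficient unchanged up to round-off, while the single negative slope returns the same value with the opposite sign.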
