
CIS 501 (Fall 2011) Lab 2

Wei Lee Woon, CIS Program, Masdar Institute, Abu Dhabi, UAE

1 Introduction
In today's laboratory, the aim is to achieve the following:

1. (Quickly) tie up a few loose ends from the last laboratory, particularly plotting and using functions/scripts.
2. Implement a k-NN classifier, and use it to classify data from a well-known, publicly available data set.

2 Plotting

Plotting numerical results in Octave is extremely easy. For instance, try running the following code:

    x = linspace(0, 2*pi, 100);
    y = sin(x);
    plot(y);

These commands display the sine function in a separate window. In its simplest form, the plot command takes only one argument: the values for the y-axis, which are then plotted against their indices. If you try:

    plot(x, y);

the curve is the same, but this time the values of y are plotted against the values of x. We can also make more complicated plots in Octave. For instance, let's plot a second curve:

    z = cos(x);
    hold on
    plot(x, z, '*r');
    hold off

These commands draw a second curve on top of the previous one. First, the hold on command tells Octave not to erase the previous plot, and hold off releases the figure again. Try again:

    plot(x, y);

The cosine curve disappears because, unless hold on is used, Octave is always in hold off mode.

Second, the plot command has an extra third argument, '*r'. plot accepts extra arguments after x and y that set the graphical characteristics of the plot; here, '*r' draws red star markers. I suggest looking at the Octave help (help plot, or more conveniently the online documentation) to become familiar with the different options.

The following command:

    axis tight

is often used to tidy up a plot and prepare it for inclusion in a report, for example. Once you're happy with your plot, use the following command to generate a graphic of the plot:

    print -djpg <filename>.jpg

Confirm that this generates a JPEG file with the specified name. Use the help command to determine the other file formats that the print command can generate. You should also take some time to familiarize yourself further with the Octave plotting facilities, using the trusty help command.
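As a consolidated sketch of the plotting steps in this section, the script below draws both curves in one figure and labels them. The legend, title and xlabel calls, the label strings, and the hidden figure window are additions made for illustration only; they are not part of the original instructions.

```octave
% Sketch: two curves in one figure, with a legend and a title.
figure('visible', 'off');        % assumption: hide the window so this also runs without a display
x = linspace(0, 2*pi, 100);
y = sin(x);
z = cos(x);

plot(x, y, '-b');                % solid blue line
hold on;                         % keep the sine curve when drawing the next one
plot(x, z, '*r');                % red star markers, no connecting line
hold off;

axis tight;                      % shrink the axes to the data range
legend('sin(x)', 'cos(x)');      % one entry per curve, in plotting order
title('Sine and cosine on [0, 2*pi]');
xlabel('x');
```

The format string combines a marker or line style with a colour letter, so changing '*r' to 'og' would give green circles instead.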

3 Functions and Scripts

There are many situations in which we want to package a sequence of operations so that it can be reused. So far we have worked with the interactive session of Octave. However, Octave is also a programming language that supports batch sessions, i.e. you can store a set of commands in a file and run them all together later. These files are known as M-files (in common with their largely compatible Matlab siblings). They are useful for automating computations that you would otherwise have to perform repeatedly from the command line. There are two main options in Octave:

Scripts

A script is the simplest kind of M-file: it contains a sequence of statements. Let's open a new text file. In Linux you can use Emacs, for instance. Write the previous plot commands:

    a=100;
    x = linspace(0, 2*pi, a);
    y = sin(x);
    plot(x, y);

When you finish, save the file as plotScript.m in the directory lab1 you have created. An M-file always has a .m extension. Now go back to the Octave command line and type

    plotScript

As you can see, a script runs the operations described in the file and can produce graphical output using commands like plot. Moreover, scripts can operate on existing data in the workspace, or they can create new data on which to operate. To check this, try clearing all variables from the workspace, then running plotScript. If you check the workspace again, you will notice that the variables a, x and y are now present.

Functions

Functions are special scripts which can accept input arguments and return output arguments. In this case all the internal variables are local to the function. Using the previous example we can write the following function. In a new file plotFunction.m try:

    function y1=plotFunction(a1)
    x1 = linspace(0, 2*pi, a1);
    y1 = sin(x1);
    plot(x1, y1);

After which, run the following code:

    out=plotFunction(100);

You can see that Octave plots the figure. However, if you type whos, none of the internal variables of plotFunction.m are available in the workspace.
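The difference in variable scope between scripts and functions can be checked directly. The following is a minimal sketch; sineSamples is a hypothetical name used only here, and the function is defined inline (which Octave allows in a script file that does not begin with a function definition) so that the example fits in one file.

```octave
1;  % not a function file: lets the function below be defined inside a script

% sineSamples is a hypothetical name used only for this sketch.
function y1 = sineSamples(a1)
  x1 = linspace(0, 2*pi, a1);   % x1 exists only while the function runs
  y1 = sin(x1);
end

clear x1 y1;                    % make sure no stale copies exist in the workspace
out = sineSamples(100);         % only the returned value reaches the caller

leaked = exist('x1', 'var');    % 0 means x1 is not a workspace variable
printf('x1 in workspace: %d, length of out: %d\n', leaked, numel(out));
```

Running the same three statements as a script instead would leave x1 and y1 in the workspace, which is the behaviour observed with plotScript.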

4 The k-NN classifier

4.1 Introduction

This is the main activity associated with today's laboratory. You are required to implement two versions of the k-NN classifier, which was covered in the previous lecture. To test your classifier, you will be using a slightly modified version of the classic Iris data set (http://en.wikipedia.org/wiki/Iris_flower_data_set). Two files have been provided to you: iris tra, containing the training instances, and iris tes, containing the test instances. Each set contains 75 instances and can be loaded directly into Octave using the load command. The following is a portion of the iris tra file:

    0.224  0.624  0.067  0.043  1.0  0.0  0.0
    0.749  0.502  0.627  0.541  0.0  1.0  0.0
    0.557  0.541  0.847  1.000  0.0  0.0  1.0
      .      .      .      .     .    .    .
      .      .      .      .     .    .    .
    0.224  0.208  0.337  0.416  0.0  1.0  0.0
    0.529  0.584  0.745  0.918  0.0  0.0  1.0

As can be seen, it is a plain text file containing 75 × 7 numbers. The first four columns are the feature vectors (one per row), while the last three columns are the class labels. Each label column corresponds to one of the three species of Iris flowers, and a 1.0 marks the species to which the row belongs. The file iris tes is structured similarly but, as mentioned, this subset of the data is to be used for testing purposes only.
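The column layout can be illustrated with a short sketch that splits such a matrix into features and labels and converts the indicator columns into class numbers. The three rows are taken from the excerpt of the training file; with the real file, data would come from the load command instead. The variable names are assumptions made for this example.

```octave
% Sketch: split a matrix laid out like the training file into features and labels.
data = [0.224 0.624 0.067 0.043 1.0 0.0 0.0;
        0.749 0.502 0.627 0.541 0.0 1.0 0.0;
        0.557 0.541 0.847 1.000 0.0 0.0 1.0];

features = data(:, 1:4);         % one feature vector per row
labels   = data(:, 5:7);         % one indicator column per species

% max along each row returns the position of the 1.0, i.e. the class number
[~, class_index] = max(labels, [], 2);
disp(class_index');
```

Working with class numbers rather than indicator columns is only one option; the indicator form is convenient when votes have to be summed per class.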

4.2 The task

Assume that you only know the labels of the instances in the training set. Build two k-NN classifiers: an unweighted classifier, and one which uses a distance weighting scheme. Please implement the following two weighting schemes, where $d$ is the distance between a test instance and a neighbour:

1. Inverse distance: $w = \frac{1}{d}$
2. Gaussian based: $w = \exp(-d^2)$

Your code should include at least two functions, myknn and myweightedknn. Function myknn should have the following function signature:

    function output=myknn(training_features,training_labels,test_features,k)
    training_features -> training features in ntraining x dim format
    training_labels   -> labels for training instances, ntraining x nclasses format
    test_features     -> test features in ntest x dim format
    k                 -> Number of neighbours to consider
    output            -> ntest x 1 vector of predicted classes

(ntraining and ntest are the number of training and test instances respectively, dim is the dimensionality of the feature space, and nclasses is the number of classes.)

Function myweightedknn will be very similar, but with an additional parameter weightfunction, as follows:

    function output=myweightedknn(training_features,training_labels, ...
                                  test_features,k,weightfunction)
    training_features -> training features in ntraining x dim format
    training_labels   -> labels for training instances, ntraining x nclasses format
    test_features     -> test features in ntest x dim format
    k                 -> Number of neighbours to consider
    output            -> ntest x 1 vector of predicted classes
    weightfunction    -> 1 (inverse) 2 (gaussian-based)

myweightedknn takes the same arguments as myknn plus weightfunction, but should implement the distance-weighted k-NN classifier instead.

Once you have completed your classifiers, you can use the script genlabresults.m (available on Moodle) to test them over a range of values of k. You might also want to note that genlabresults.m uses several functions, namely legend, title and axis, which have not been covered before but which will very likely come in useful in the future. You should be able to produce the plot shown in Figure 1 (please check).

[Plot with three curves, labelled in the legend as: Unweighted, Inverse weighted, Gaussian weighted]

Figure 1: k ranging from 2 to 20, line plots
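The two weighting schemes of this section can be illustrated on a single query point. The sketch below is not a solution to the task: it uses made-up toy data, handles one test instance only, and all variable names are hypothetical. It assumes Euclidean distance, assumes the Gaussian-based weight is exp(-d^2) (i.e. decreasing with distance), and adds eps to the inverse weight to avoid division by zero, which is a choice made here and not something the task prescribes.

```octave
% Sketch: classify ONE query point by distance-weighted voting among its
% k nearest neighbours. Toy data and names are made up for illustration.
train_x = [0.0 0.0; 0.1 0.0; 1.0 1.0; 0.9 1.0; 1.0 0.9];
train_y = [1 0; 1 0; 0 1; 0 1; 0 1];     % indicator labels, ntraining x nclasses
query   = [0.2 0.1];
k       = 3;

% Euclidean distance from the query to every training row (assumed metric)
diffs = train_x - repmat(query, size(train_x, 1), 1);
d     = sqrt(sum(diffs .^ 2, 2));

[d_sorted, order] = sort(d);             % ascending: nearest first
nearest = order(1:k);
d_k     = d_sorted(1:k);

w_inverse  = 1 ./ (d_k + eps);           % assumption: eps guards against d = 0
w_gaussian = exp(-d_k .^ 2);             % assumption: minus sign, weight decays with d

% Each neighbour adds its weight to the column of its own class
votes_inverse  = w_inverse'  * train_y(nearest, :);
votes_gaussian = w_gaussian' * train_y(nearest, :);

[~, class_inverse]  = max(votes_inverse);
[~, class_gaussian] = max(votes_gaussian);
printf('inverse: class %d, gaussian: class %d\n', class_inverse, class_gaussian);
```

Setting every weight to 1 in the same voting step gives the unweighted majority vote, which is why the two classifiers can share most of their code.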

4.3 Submission guidelines

Deadline for submission is 12pm, 10th of October (next Monday). Late submissions will be rejected. As usual, only electronic submissions will be accepted. The following is the submission procedure (which is almost the same as for the previous laboratory):

1. Submit only the M-files in your implementation of the myknn and myweightedknn functions. These should be sent as attachments in your submission e-mail (along with any additional support functions which are required to run your code).
2. Send your solutions via e-mail to the TA, Khulood Al Junaibi (e-mail: kaljunaibi@masdar.ac.ae), and CC a copy to me.
3. Format your subject line as follows: [CIS501] Fall 2011, Assignment #2 solution. Name: <Your name>
4. If you do not get an acknowledgement e-mail from the TA, please re-send the assignment.
