
**School of Computer Science**

MSc in Advanced Computer Science

Imaging and Visualisation Systems

Visualisation Assignment

John S. Montgomery

msc37jxm@cs.bham.ac.uk

May 2, 2004

Chapter 1 Introduction

1.1 Data Description and Format

The data set chosen for visualisation was of three different varieties of wine. The data consists of the following thirteen fields:

• Alcohol
• Malic acid
• Ash
• Alcalinity of ash
• Magnesium
• Total phenols
• Flavanoids
• Nonflavanoid phenols
• Proanthocyanins
• Color intensity
• Hue
• OD280/OD315 of diluted wines
• Proline

All of the fields represent continuous data, which has the advantage of making normalisation much easier. The data was contained as comma separated values in a plain text file. The first element of each row contains a number (1-3) that gives the sample's class. The other fields follow after this element. Parsing was therefore straightforward and mainly involved separating out the class numbers from the rest of the data, as the class information is not, directly, used by the PCA and SOM methods.
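The parsing step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the script used for the report: the function name `parse_wine` and the two sample rows are assumptions chosen only to match the stated layout (class number first, then thirteen continuous values).

```python
import csv
import io

# Illustrative rows in the layout described above (assumed values):
# class label first, followed by the thirteen measurements.
SAMPLE = (
    "1,14.23,1.71,2.43,15.6,127,2.8,3.06,.28,2.29,5.64,1.04,3.92,1065\n"
    "2,12.37,.94,1.36,10.6,88,1.98,.57,.28,.42,1.95,1.05,1.82,520\n"
)


def parse_wine(text):
    """Split comma separated rows into class labels and feature rows."""
    labels, features = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        labels.append(int(row[0]))                      # class number (1-3)
        features.append([float(x) for x in row[1:]])    # the thirteen fields
    return labels, features


labels, features = parse_wine(SAMPLE)
print(labels, len(features[0]))
```

Keeping the labels in a separate list means the feature rows can be passed to the SOM and PCA steps unchanged, while the labels are only used to colour the plots.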

Chapter 2 Visualisation

Together with the code for Self-Organising Maps provided for this assignment, the visualisations were made using a simple author crafted Python script (see Appendix A). The "MLab" package from Numeric Python[1] was used for calculating the covariance matrix of the data and then the eigen vectors and values for use in PCA. For actually plotting the various data points Matplotlib[2] was used.

2.1 Basic

To first get an overall feel for the data I simply plotted every pair of fields against each other. With thirteen fields this results in 78 graphs (n ∗ (n − 1)/2). This may seem like a lot, but as the graphs were generated by a script it was straightforward. Also, by plotting the three different classes in distinct colours it was very easy to scan thumbnail images of graphs in a directory for interesting graphs (Figure 2.1). Most of the graphs showed obvious clusters of each class. There is a certain amount of overlap, but for a very quick and easy approach this is not bad at all.

Figure 2.2 shows the alcohol content compared with the "flavanoids"[3] content. It shows that two of the classes are fairly easy to separate based purely on alcohol content and those two classes tend to have a similar flavanoids content to each other, but different from the third class. Figure 2.3 shows a similar story, but compares the flavanoids with "colour intensity". In this case the third class data points (blue triangles) are completely separate from the other two classes, with the only overlap occurring between those two (red circles and green squares).

By way of comparison Figure 2.4 shows a graph that is not as well separated. In this case two of the classes seriously overlap, to the point that they could almost be drawn from the same class. It would appear that "ash" content, as used in this graph, is not a reliable indicator!

It is obvious from these simple graphs that the data is actually quite two dimensional. No one data field on its own seemed enough to separate all three classes, but two combined seem to do a pretty good job.

Figure 2.1: Graph Thumbnails.

[1] http://www.pfdubois.com/numpy/
[2] http://matplotlib.sourceforge.net/
[3] A type of organic chemical
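The count of 78 graphs follows from choosing 2 of the 13 fields without regard to order. A short sketch, assuming only the field names listed in Chapter 1, enumerates the pairs; the helper `chart_name` is a hypothetical name that mirrors the "i-j" file naming used by the script in Appendix A.

```python
from itertools import combinations

names = [
    "Alcohol", "Malic acid", "Ash", "Alcalinity of ash", "Magnesium",
    "Total phenols", "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins",
    "Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline",
]

# Every unordered pair of field indices (i, j) with i < j.
pairs = list(combinations(range(len(names)), 2))


def chart_name(i, j):
    """File name for the chart of field i against field j (hypothetical helper)."""
    return f"{i}-{j}"


n = len(names)
print(len(pairs), n * (n - 1) // 2)
for i, j in pairs[:3]:
    print(chart_name(i, j), names[i], "vs.", names[j])
```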

Figure 2.2: Alcohol vs. Flavanoids. (Scatter plot titled "Wine Properties"; x axis: Alcohol, y axis: Flavanoids.)

Figure 2.3: Flavanoids vs. Colour Intensity. (Scatter plot titled "Wine Properties"; x axis: Flavanoids, y axis: Color intensity.)

Figure 2.4: Ash vs. Colour Intensity. (Scatter plot titled "Wine Properties"; x axis: Ash, y axis: Color intensity.)

2.2 Self Organising Maps

2.2.1 Preprocessing

The raw data (minus the category field) was normalised, so that all values varied in the range 0 to 1. To do this the minimum and maximum values in each field were first found. Then these values are used to rescale the data thus:

$$v_{norm} = \frac{v - v_{min}}{v_{max} - v_{min}}$$

Where $v$ is the original value, $v_{min}$ and $v_{max}$ are the minimum and maximum values respectively and $v_{norm}$ is the value after normalising.

2.2.2 Visualisation

The application code provided was used to generate a 10x10 map, as shown in Figure 2.5. The map shows the three classes have been separated very well and in fact there is only one overlapping point. The codebook vectors generated for the final map would probably be very useful for "guessing" which class a new sample of wine belongs to, by using a simple "nearest neighbour" approximation.

Figure 2.5: SOM. (Plot titled "Wine SOM"; both axes run from 0 to 11.)
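The rescaling formula and the "nearest neighbour" idea can be sketched as follows. This is an illustrative sketch under stated assumptions, not the provided SOM code: the function names, the toy data and the way each codebook vector is paired with a class label are all hypothetical, and mapping a constant column to 0.0 is an assumption made to avoid division by zero.

```python
import math


def column_ranges(rows):
    """Return the per-field minimum and maximum values."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def normalise(row, mins, maxs):
    """Apply v_norm = (v - v_min) / (v_max - v_min) to each field."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0  # constant field: assumed 0.0
        for v, lo, hi in zip(row, mins, maxs)
    ]


def nearest_class(sample, codebook):
    """Pick the label of the closest codebook vector (Euclidean distance).

    `codebook` is a list of (vector, label) pairs; attaching a label to
    each codebook vector is an assumption made for this sketch.
    """
    vector, label = min(codebook, key=lambda entry: math.dist(sample, entry[0]))
    return label


# Toy data with two fields, used only to exercise the functions.
rows = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
mins, maxs = column_ranges(rows)
scaled = [normalise(r, mins, maxs) for r in rows]

codebook = [([0.0, 0.0], 1), ([1.0, 1.0], 2)]
print(scaled[1], nearest_class(scaled[1], codebook))
```

A new sample has to be rescaled with the minimum and maximum values found on the original data before it is compared with the codebook vectors, otherwise the distances are not comparable.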

2.3 Principal Component Analysis

2.3.1 Preprocessing

The data was normalised, as in Section 2.2.1, but the data was also centred, so that it could be used for PCA. The centring merely involved subtracting the means of each field from every element of that field.

2.3.2 Visualisation

After normalising and centring the data the next step involved computing the covariance matrix of the data and then calculating the eigen vectors and corresponding values of that matrix. The ordered eigen values are shown in Figure 2.6. The first two eigen values are clearly, overall, the most significant. This would seem to suggest that the data is quite two dimensional. Therefore the eigen vectors used for plotting the data for the visualisation in Figure 2.7 were (rounded to 2 d.p.):

[ … ]
[ … ]

This shows that most of the fields are at least partially useful for separating out the data. The only field which seems to be under represented slightly is field 3 (Ash), which has a low value in the first eigen vector (-0.16) and an even lower value in the second (when rounded it was 0.0). The largest fields in each vector are for alcohol (-0.55) and "OD280/OD315 of diluted wines" (-0.47), meaning that these might be quite good for classification, even without the other fields.

As it stands Figure 2.7 shows data that has been very well separated. Two of the classes are very well separated (red circles and blue triangles) and the only overlap that occurs is with the third class.
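The PCA steps described in this section (centre, compute the covariance matrix, take the two eigen vectors with the largest eigen values, project) can be sketched in plain Python. This is a sketch under assumptions, not the method used for the report: the report used `cov` and `eig` from MLab, whereas this example swaps in power iteration with deflation so that it needs no external library. All function names and the toy data are hypothetical.

```python
import math


def centre(rows):
    """Subtract each field's mean from every element of that field."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of rows that are already centred."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(d)]
            for a in range(d)]


def mat_vec(m, vec):
    return [sum(x * y for x, y in zip(row, vec)) for row in m]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def top_eigenpair(m, iterations=500):
    """Power iteration: dominant eigen value and vector of a symmetric matrix.

    Assumes the start vector is not orthogonal to the dominant eigen vector.
    """
    vec = [float(i + 1) for i in range(len(m))]
    norm = math.sqrt(dot(vec, vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = mat_vec(m, vec)
        norm = math.sqrt(dot(nxt, nxt))
        if norm == 0.0:
            break  # zero matrix: nothing left to find
        vec = [x / norm for x in nxt]
    return dot(vec, mat_vec(m, vec)), vec


def deflate(m, value, vec):
    """Remove a found component so the next largest becomes dominant."""
    d = len(m)
    return [[m[a][b] - value * vec[a] * vec[b] for b in range(d)]
            for a in range(d)]


# Toy data: most of the variance lies along the first field.
rows = centre([[2.0, 0.0], [0.0, 0.1], [-2.0, 0.0], [0.0, -0.1]])
cov_matrix = covariance(rows)
val1, vec1 = top_eigenpair(cov_matrix)
val2, vec2 = top_eigenpair(deflate(cov_matrix, val1, vec1))
projected = [(dot(vec1, r), dot(vec2, r)) for r in rows]
print(val1, val2, projected[0])
```

Plotting the `projected` pairs, coloured by class, gives the kind of two dimensional view that Figure 2.7 refers to.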

Figure 2.6: Eigen Values. (Line plot titled "Eigen Values"; x axis: eigen value index from 0 to 12, y axis: eigen value.)

Figure 2.7: PCA. (Scatter plot titled "Wine PCA"; axes: projections onto the first two eigen vectors.)

Chapter 3 Conclusion

The wine data set proved to be very two dimensional. Even just performing basic plots of some of the data fields yielded moderate separation of the three classes. So in some respects applying self organising maps and principal component analysis was overkill. However both of these techniques did increase the separation previously found.

The self organising map in particular worked very well, achieving near "perfect" separation of all three classes, so would probably help greatly with accurately classifying new wine samples. However from a computational point of view it did take a considerably longer time to perform, whereas PCA took a fraction of the time (at least on this data set). Coupled with the apparent low dimensionality of the data, this may make PCA the best method overall, in terms of "bang for buck".

Appendix A Python Data Processing Script

    #!/usr/bin/python
    import string
    import os
    from matplotlib.matlab import plot, title, xlabel, ylabel, figure, savefig, close, set, gca
    from MLab import cov, eig, array, innerproduct

    names = [ 'Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash', 'Magnesium',
              'Total phenols', 'Flavanoids', 'Nonflavanoid phenols',
              'Proanthocyanins', 'Color intensity', 'Hue',
              'OD280/OD315 of diluted wines', 'Proline' ]

    makescharts = True
    runsom = True

    maxvalues = []
    minvalues = []
    sumvalues = []
    categories = []

    # parse one line, record its category and update the per-field statistics
    def extractvalues( line ):
        values = string.split( string.strip( line ), "," )
        categories.append( int( values[ 0 ] ) )
        values = values[1:]
        values = map( float, values )
        for i in range( 0, len( values ) ):
            v = values[ i ]
            if i < len( maxvalues ):
                maxvalues[ i ] = max( maxvalues[ i ], v )
                minvalues[ i ] = min( minvalues[ i ], v )
                sumvalues[ i ] = sumvalues[ i ] + v
            else:
                maxvalues.append( v )
                minvalues.append( v )
                sumvalues.append( v )
        return values

    # rescale every field to the range 0 to 1
    def normalisedata( values ):
        norm = []
        for i in range( 0, len( values ) ):
            v = values[ i ]
            maxv = maxvalues[ i ]
            minv = minvalues[ i ]
            norm.append( (v-minv)/(maxv-minv) )
        return norm

    def totext( values ):
        return string.join( map( str, values ), " " ) + "\n"

    origfile = file( "wine.data" )
    lines = origfile.readlines()
    origfile.close()

    values = map( extractvalues, lines )

    # extract un-normalised values into the three categories
    # (filter the parsed rows, so that no line is parsed and counted twice)
    values1 = [ v for v, c in zip( values, categories ) if c == 1 ]
    values2 = [ v for v, c in zip( values, categories ) if c == 2 ]
    values3 = [ v for v, c in zip( values, categories ) if c == 3 ]

    # plot charts of the raw data
    if makescharts:
        for i in range( 0, 13 ):
            x1 = map( lambda x: x[ i ], values1 )
            x2 = map( lambda x: x[ i ], values2 )
            x3 = map( lambda x: x[ i ], values3 )
            for j in range( i+1, 13 ):
                y1 = map( lambda x: x[ j ], values1 )
                y2 = map( lambda x: x[ j ], values2 )
                y3 = map( lambda x: x[ j ], values3 )
                figure()
                plot( x1, y1, 'ro' )
                plot( x2, y2, 'gs' )
                plot( x3, y3, 'b^' )
                title( 'Wine Properties' )
                xlabel( names[ i ] )
                ylabel( names[ j ] )
                savefig( str( i ) + "-" + str( j ) )
                close()

    #normalise the data and write it to file for SOM to use
    normalisedlines = map( totext, map( normalisedata, values ) )
    normalisedfile = open( "wine.data.norm", "w" )
    normalisedfile.writelines( normalisedlines )
    normalisedfile.close()

    if runsom:
        os.spawnlp( os.P_WAIT, './som', 'som', 'wine.data.norm', '178', '13', 'wine.som.config' )
        coordsfile = open( "som.winners" )
        coordlines = coordsfile.readlines()
        coordsfile.close()

        x1 = []
        y1 = []
        x2 = []
        y2 = []
        x3 = []
        y3 = []
        for i in range( 0, len( coordlines ) ):
            xy = string.split( coordlines[ i ] )
            x = int( xy[ 0 ] )
            y = int( xy[ 1 ] )
            c = categories[ i ]
            if c == 1:
                x1.append( x )
                y1.append( y )
            elif c == 2:
                x2.append( x )
                y2.append( y )
            elif c == 3:
                x3.append( x )
                y3.append( y )
            else:
                print "unknown category: " + str( c )

        # use matplotlib functions to make the graph
        figure()
        plot( x1, y1, 'ro' )
        plot( x2, y2, 'gs' )
        plot( x3, y3, 'b^' )
        set( gca(), 'xticks', range( 0, 12 ) )
        set( gca(), 'yticks', range( 0, 12 ) )
        title( "Wine SOM" )
        savefig( "som" )
        close()

    # PCA
    # normalise the raw means so that they are the means of the normalised data
    averages = array( normalisedata( map( lambda x: x/len( values ), sumvalues ) ) )
    normvalues = map( normalisedata, values )
    cenvalues = array( map( lambda x: array( x ) - averages, normvalues ) )
    covmatrix = cov( cenvalues )
    eigvalues, eigvectors = eig( covmatrix )

    # plot the eigen values in descending order
    eigvlist = list( eigvalues )
    eigvlist.sort()
    eigvlist.reverse()
    figure()
    plot( range( 0, len( eigvlist ) ), eigvlist, 'k-' )
    set( gca(), 'xticks', range( 0, len( eigvlist ) ) )
    title( 'Eigen Values' )
    savefig( 'eigenvalues' )
    close()

    # find vectors with largest eigen value
    eigval1 = 0
    eigval2 = 0
    eigvec1 = []
    eigvec2 = []
    for i in range( len( eigvalues ) ):
        eigval, eigvec = eigvalues[ i ], eigvectors[ i ]
        if eigval > eigval1:
            # new largest: the previous largest becomes the second largest
            eigval2, eigvec2 = eigval1, eigvec1
            eigval1, eigvec1 = eigval, eigvec
        elif eigval > eigval2:
            eigval2, eigvec2 = eigval, eigvec

    print 'eigvalue1=' + str( eigval1 )
    print eigvec1
    print 'eigvalue2=' + str( eigval2 )
    print eigvec2

    # project every centred sample onto the two chosen eigen vectors
    x1 = []
    y1 = []
    x2 = []
    y2 = []
    x3 = []
    y3 = []
    for i in range( 0, len( cenvalues ) ):
        v = cenvalues[ i ]
        c = categories[ i ]
        x = innerproduct( eigvec1, v )
        y = innerproduct( eigvec2, v )
        if c == 1:
            x1.append( x )
            y1.append( y )
        elif c == 2:
            x2.append( x )
            y2.append( y )
        elif c == 3:
            x3.append( x )
            y3.append( y )
        else:
            print "unknown category: " + str( c )

    figure()
    plot( x1, y1, 'ro' )
    plot( x2, y2, 'gs' )
    plot( x3, y3, 'b^' )
    title( "Wine PCA" )
    savefig( "pca" )
    close()
