Attribution Non-Commercial (BY-NC)

27 views

Attribution Non-Commercial (BY-NC)

- Machine Learning
- Classification View of Material Master
- Using the Bitcoin Transaction Graph to Predict the Price of Bitcoin
- Sound Sence
- Biostatistics Role in Microarray Analysis
- Decoding Gender Dimorphism of Brain
- A Practical Approach to Microarray Data Analysis-Daniel P. Berrar
- Approaches to Robotic Vision Control Using Image
- research 1 (13).pdf
- IJAIEM-2013-04-30-106
- Matlab Projects
- Music Classification Using SVM.ppt
- Network Analysis System for Detecting Unknown Attack Using SVM and Network Intrusion: A Review
- Cover Letter
- IR - Unsupervised-chi Square
- METHODOLOGICAL STUDY OF OPINION MINING AND SENTIMENT ANALYSIS TECHNIQUES
- entropy-15-00416-v2
- Gunn 1998
- IAETSD-Artifact Facet Ranking And
- Detection of Collisions in Rugby Union

You are on page 1of 19

Chapter 9
CLASSIFYING MICROARRAY DATA USING
SUPPORT VECTOR MACHINES
‘Sayan Mukherjee
PostDoctoral Fellow: MIT/Whitehead Instiute for Genome Research and Center for
Biological and Computational Learning at MIT,
e-mail: sayan@mitedy
1. INTRODUCTION
Over the last few years the routine use of DNA microarrays has made
possible the creation of large datasets of molecular information
characterizing complex biological systems. Molecular classification
approaches based on machine learning algorithms applied to DNA
microarray data have been shown to have statistical and clinical relevance
for a variety of tumor types: Leukemia (Golub et al., 1999), Lymphoma
(Shipp et al., 2001), Brain cancer (Pomeroy et al, 2002), Lung cancer
(Bhattacharjee et al., 2001) and the classification of multiple primary tumors
(Ramaswamy et al., 2001),
One particular machine learning algorithm, Support Vector Machines
(SVMs), has shown promise in a variety of biological classification tasks,
including gene expression microarrays (Brown et al., 2000, Mukherjee et al.,
1999). SVMs are powerful classification systems based on regularization
techniques with excellent performance in many practical classification
problems (Vapnik, 1998, Evgeniou et al, 2000).
‘This chapter serves as an introduction to the use of SVMs in analyzing
DNA microarray data. An informal theoretical motivation of SVMS_ both
from a geometric and algorithmic perspective, followed by an application to
leukemia classification, is described in Section 2. The problem of gene
selection is described in Section 3. Section 4 states some results on a variety
of cancer morphology and treatment outcome problems. Multiclass
classification is described in Section 5. Section 6 lists software sources and9. Classifying Microarray Data Using Support Vector Machines 167
rules of thumb that a user may find helpful. The chapter concludes with a
brief discussion of SVMs with respect to some other algorithms.
. SUPPORT VECTOR MACHINES
In binary microarray classification problems, we are given 1 experiments
{Gu Yoo @aID}- This is called the training set, where x; is a vector
corresponding io the expression measurements of the # experiment or
sample (this vector has n components, each component is the expression
‘measurement for a gene or EST) and yj is a bi ss label, which will be
#1. We want to estimate a multivariate function from the training set that
will accurately label a new sample, that is f(r) =Yaon This problem of
leaming a classification boundary given positive and negative examples is a
particular case of the problem of approximating a multivariate function from
sparse data. The problem of approximating a function from sparse data is ill-
posed. Regularization is a classical approach to making the problem well-
posed (Tikhonov and Arsenin, 1977)
A problem is well-posed if it has a solution, the solution is unique, and
the solution is stable with respect to perturbations of the data (small
perturbations of the data result in small perturbations of the solution). This
last property is very important in the domain of analyzing microarray data
where the number of variables, genes and ESTs measured, is much larger
than the number of samples. In statistics when the number of variables is,
much larger that the number of samples one is said to be facing the curse of
dimensionality and the function estimated may very accurately fit the
samples in the training set but be very inaccurate in assigning the label of a
new sample, this is referred to as over-fitting. The imposition of stability on
the solution mitigates the above problem by making sure that the function is
smooth so new samples similar to those in the training set will be labeled
similarly, this is often referred to as the blessing of smoothness.
2.1 Mathematical Background of SVMs
We start with a geometric intuition of SVMs and then gi
mathematical formulation
e a more general
2.1.1 A Geometrical Interpretation
A. geometric interpretation of the SVM illustrates how this idea of
smoothness or stability gives rise to a geometric quantity called the margin
which is a measure of how well separated the two classes can be. We start by
assuming that the classification function is linear168, Chapter 9
@.1)
‘ors x and w, respectively.
‘The operation ws is called a dot product. The label of a new point yew is
the sign of the above function, Yrer= sign [f(tyey) } The. classification
boundary, all values ofx for which f(x) = 0, is a hyperplane’ defined by its
normal vector w (see Figure 9.1.
a, aA A
ang
Figure 9.1, The hyperplane separating two classes, The circles and the triangles designate the
‘members ofthe two classes. The normal vector tothe hyperplane isthe vector w.
‘Assume we have points from two classes that can be separated by a
hyperplane and x is the closest data point to the hyperplane, define x9 to be
the closest point on the hyperplane to x. This is the closest point to x that
satisfies wex=0 (see Figure 9.2). We then have the following two
equations:
wx k for some k, and
wx.
Subtracting these two equations, we obtain w(x ~%9
Dividing by the norm of w (the norm of w is the length of the vector w), we
obiain?
A liperplane isthe extension of the two or three dimensional concepts of li
to higher dimensions. Note, that in an n-dimensional space, a hypespla
‘dimensional object in the same way that a plane is a ewo dimensional object in a Uhre
‘dimensional space. This is why the normal or perpendicular to the hyperplane is always a
2-The notation “ql” refers to the absolute value ofthe variable a. The notation “al refers
the length ofthe veetor a
es and planes
is ann 1

- Machine LearningUploaded byMeraj Khan
- Classification View of Material MasterUploaded byMinal Tripathi
- Using the Bitcoin Transaction Graph to Predict the Price of BitcoinUploaded byharshv
- Sound SenceUploaded byMatthew Tucker
- Biostatistics Role in Microarray AnalysisUploaded byGerónimo Maldonado-Martínez
- Decoding Gender Dimorphism of BrainUploaded byJames Gildardo Cuasmayan
- A Practical Approach to Microarray Data Analysis-Daniel P. BerrarUploaded byBashistha Kanth
- Approaches to Robotic Vision Control Using ImageUploaded byManoj Kumar
- research 1 (13).pdfUploaded byaadafull
- IJAIEM-2013-04-30-106Uploaded byAnonymous vQrJlEN
- Matlab ProjectsUploaded byveenadorus
- Music Classification Using SVM.pptUploaded byatto_11
- Network Analysis System for Detecting Unknown Attack Using SVM and Network Intrusion: A ReviewUploaded byInternational Journal for Scientific Research and Development - IJSRD
- Cover LetterUploaded byfkia
- IR - Unsupervised-chi SquareUploaded byResa Septiari
- METHODOLOGICAL STUDY OF OPINION MINING AND SENTIMENT ANALYSIS TECHNIQUESUploaded byijsc
- entropy-15-00416-v2Uploaded byHussein Razaq
- Gunn 1998Uploaded byShaibal Barua
- IAETSD-Artifact Facet Ranking AndUploaded byiaetsdiaetsd
- Detection of Collisions in Rugby UnionUploaded byMatthew Fields
- isprsarchives-XL-1-W3-441-2013Uploaded byGaurav
- Automatic Identification of Formation Iithology from Well Log Data: A Machine Learning ApproachUploaded byJoão Augusto
- Space Elasticity TheoryUploaded byhansika
- Sentiment Analysis in Twitter With Lightweight Discourse AnalysisUploaded bySamar Hamady
- 1Uploaded byArnav Guddu
- Karan Kapoor 288 Co 13Uploaded byKaran Kapoor
- 287Uploaded byrekha_94
- RoviraUploaded byefiol
- Aspect Based Sentiment Analysis for E-Commerce Using Classification TechniquesUploaded bySikhiva Publishing House
- Breast Cancer Prediction (Final)Uploaded bySwathi Alag

- Module 6_ Air Pollutants and Control Techniques - Particulate Matter - Control Techniques _ Basic Concepts in Environmental Sciences _ APTI _ USEPAUploaded byselamitsp
- Letter of Support Template_0.pdfUploaded byselamitsp
- Redecam Bag FiltersUploaded byselamitsp
- lecture6 (1)Uploaded byRaj Bakhtani
- Fluoride Distribution in the Environment Along the GradientUploaded byselamitsp
- Notes-2nd Order ODE Pt2Uploaded byHimanshu Sharma
- lecture8.pdfUploaded byselamitsp
- lecture1(1)Uploaded byEswar Gowrinath
- Environmental Engineering Unit OperationsUploaded byselamitsp
- EulerMethod Excel Implementation Elementary Differential EquationsUploaded byselamitsp
- Cyclone Separator 3Uploaded byKanail
- Lecture2 AirUploaded byRaj Bakhtani
- Kinematic Assessment of Floc Formation Using a Monte Carlo ModelUploaded byselamitsp
- Monte Carlo Simulation in Environmental Risk Assessment —Science, Policy and Legal IssuesUploaded byselamitsp
- bag filter calculation.pdfUploaded byArun Gupta
- Lecture5 (1) AirUploaded byRaj Bakhtani
- EdgeDetect Background Foreground Independent for OCR O1-1Uploaded byselamitsp
- Daily Soil Ingestion Estimates for Children at a Superfund SiteUploaded byselamitsp
- Lecture 7Uploaded byDeovrat Kumar Raj
- Assessment of heavy metal contamination of roadside soils_Bai-2009.pdfUploaded byselamitsp
- engineering mathUploaded byselamitsp
- chapter5lecturespart6_000Uploaded byselamitsp
- 5 Gas TransferUploaded byselamitsp
- Material BalanceUploaded byalireza_e_2
- Time Series Forecasting of Hourly PM10 UsingUploaded byselamitsp
- Ecuaciones Diferenciales de 2 OrdenUploaded byJuan Jose Flores Fiallos
- k Ch10 SolutionsUploaded byselamitsp
- Improvements to SMO Algorithm for SVM Regression_ieee_smo_regUploaded byselamitsp
- Guiding Principles for Monte Carlo AnalysisUploaded byjonoherbert

- 100 C interview questions and answers - C online test.pdfUploaded byRoshni Khurana
- 35006144_K01_000_13Uploaded byarfeem
- IntegriSign Standard Product + SDK Developer's GuideUploaded byCarlos León Bolaños
- VLSI Placement_ Verilog and VHDLUploaded byshivanand_shettennav
- MpBus_02_eUploaded byVoicu Stanese
- 55326126-Commodore-128-Book-7-Peeks-and-Pokes.pdfUploaded bybandihoot
- AttrMappingUploaded byHassan Ogloo
- 272Uploaded byIoana Alexandra
- Fm Read_textUploaded byedmondo77
- ct1t5naUploaded bylore_ladiosa75
- SAP Scripts Control Commands ExamplesUploaded bySAP_2006
- 24-Efficient ABAP CodingUploaded byapi-3823432
- PC_811_TransformationGuideUploaded bykeyurpatel_99
- jtj.txtUploaded byA.R SaiArjun
- Elektronika -osnoveUploaded bypredrag152
- RR-10203-C & DS APR 2003Uploaded bympssassygirl
- ComEng22 Syllabus RevisedUploaded byHope Lee
- ABAP Runtime AnalysisUploaded byrahaman.ma
- Rx600 LCD Demo App NoteUploaded byNick Gart
- Project Case Study of Spare Part Management System[1]..Uploaded byVipul Sharma
- Alv SaptechUploaded byAlfred Lambert
- Trigers SP ViewsUploaded byn.rajesh
- Abaqus User Subroutines Reference Manual (6Uploaded byMD Anan Morshed
- sap ewm.pdfUploaded byravi
- Elmer Param ManualUploaded byJuan Maorad
- CCB DeveloperUploaded byHaneef Ali Sadiq
- Atmega16Uploaded bysuaradjus
- 10550A ENU CompanionUploaded byCristiano Motta
- PL SQL by nareshUploaded bynareshpeddapati
- Aust Cse BulletinUploaded byMishal Islam

## Much more than documents.

Discover everything Scribd has to offer, including books and audiobooks from major publishers.

Cancel anytime.