4 views

Uploaded by debo

machine learning, SVM

machine learning, SVM

© All Rights Reserved

- The 5th Tribe Caret
- Optical Tea Sorter Thesis
- An Ontology-Based Sentiment Classification Methodology for Online Consumer Reviews
- A Multiple Classifier System for Aircraft Engine
- Face Recognition Using Fused Spatial Patterns
- Fuzzy twin support vector machine based on iterative method
- An Image Mining System for Gender Classification & Age Prediction Based on Facial Features
- Excel Sheet Functions testing doc
- Detecting Original Image Using Histogram, DFT and SVM
- Diagnostic of Defects with Inductive Thermography Imaging System by Using PSO Algorithm
- To Detect Fraud Ranking For Mobile Apps Using SVM Classification
- Microsoft Excel - Advanced Formulas and Functions
- 2004_dec_1365-1371
- Excel Dictionary
- Pca
- Uvm Configure
- 11 Final Report
- Forecasting Power Output of Photovoltaic.pptx
- METHODOLOGICAL STUDY OF OPINION MINING AND SENTIMENT ANALYSIS TECHNIQUES
- 17

You are on page 1of 26

Linadd Algorithm

Experiments

Sren Sonnenburg

Fraunhofer FIRST.IDA, Berlin

Gunnar Rtsch, Konrad Rieck

Advertisement

Introduction

Linadd Algorithm

Introduction

Linadd Algorithm

Experiments

Experiments

Advertisement

Introduction

Linadd Algorithm

Outline

Introduction

Linadd Algorithm

Experiments

Experiments

Advertisement

Introduction

Linadd Algorithm

Experiments

Advertisement

Motivation

Text Classification (Spam, Web-Spam, Categorization)

Task: Given N documents, with class label 1, predict text

type.

Task: Given N executables, with class label 1, predict

whether executable is a virus.

Task: Given N sequences around Promoter/Splice Site (label

+1) and fake examples (label 1), predict whether there is a

Promoter/Splice Site in the middle

Large N is needed to achieve high accuracy (i.e. N = 107 )

Introduction

Linadd Algorithm

Experiments

Advertisement

Motivation

Formally

Given:

N training examples (xi , yi ) (X , 1), i = 1 . . . N

string kernel K (x, x0 ) = (x) (x0 )

Examples:

words-in-a-bag-kernel

k-mer based kernels (Spectrum, Weighted Degree)

Task:

Train Kernelmachine on Large Scale Datasets, e.g. N = 107

Apply Kernelmachine on Large Scale Datasets, e.g. N = 109

Introduction

Linadd Algorithm

Experiments

Advertisement

Motivation

String Kernels

Spectrum Kernel (with mismatches, gaps)

K (x, x0 ) = sp (x) sp (x0 )

k

Introduction

Linadd Algorithm

Experiments

Advertisement

Motivation

Kernel Machine

Kernel Machine Classifier:

f (x) = sign

N

X

!

i yi k(xi , x) + b

i=1

j = 1, . . . , M :

N

X

i yi k(xi , xj ) + b

i=1

Computational effort:

Single O(NT ) (T time to compute the kernel)

All O(NMT )

Costly!

Used in training and testing - worth tuning.

How to further speed up if T = dim(X ) already linear?

Introduction

Linadd Algorithm

Outline

Introduction

Linadd Algorithm

Experiments

Experiments

Advertisement

Introduction

Linadd Algorithm

Experiments

Advertisement

Linadd

Key Idea: Store w and compute w (x) efficiently

N

X

i yi k(xi , xj ) =

i=1

N

X

i=1

{z

w

1

of Spectrum Kernel of order 8 DNA)

w is extremely sparse although high dimensional (e.g. 1014 for

Weighted Degree Kernel of order 20 on DNA sequences of

length 100)

Introduction

Linadd Algorithm

Experiments

Advertisement

Linadd

Technical Remark

Treating w

w must be accessible by some index u (i.e. u = 1 . . . 48 for

8-mers of Spectrum Kernel on DNA or word index for

word-in-a-bag kernel)

Needed Operations

Clear: w = 0

Add: wu wu + v

Lookup: obtain wu

(must be highly efficient)

Storage

Explicit Map (store dense w); Lookup in O(1)

Sorted Array (word-in-bag-kernel:

all words sorted with value

P

attached); Lookup in O(log( u I (wu 6= 0)))

Suffix Tries, Trees; Lookup in O(K )

Introduction

Linadd Algorithm

Experiments

Advertisement

Linadd

Comparison of worst-case run-times for operations

clear of w

add of all k-mers u from string x to w

lookup of all k-mers u from x0 in w

clear

add

lookup

Explicit map

O(||d )

O(lx )

O(lx0 )

Sorted arrays

O(1)

O(lx log lx )

O(lx + lx0 )

Tries

O(1)

O(lx d )

O(lx0 d )

Suffix trees

O(1)

O(lx )

O(lx0 )

Conclusions

Explicit map ideal for small ||

Sorted Arrays for larger alphabets

Suffix Arrays for large alphabets and order (overhead!)

Introduction

Linadd Algorithm

Experiments

Advertisement

Linadd

Linadd directly applicable when applying the classifier.

!

N

X

f (x) = sign

i yi k(xi , x) + b

i=1

Problems

w may still be huge fix by not constructing whole w but

only blocks and computing batches

What about training?

general purpose QP-solvers, Chunking, SMO

optimize kernel (i.e. find O(L) formulation, where

L = dim(X ))

Kernel Caching infeasable

(for N = 106 only 125 kernel rows fit in 1GiB memory)

Use linadd again: Faster + needs no kernel caching

Introduction

Linadd Algorithm

Experiments

Advertisement

Linadd

Derivation I

Analyzing Chunking SVMs (GPDT, SVMlight :)

Training algorithm (chunking):

while optimality conditions are violated do

select q variables for the working set.

solve reduced problem on the working set.

end while

P

At each iteration, the vector f , fj = N

i=1 i yi k(xi , xj ),

j = 1 . . . N is needed for checking termination criteria and

selecting new working set (based on and gradient w.r.t. ).

Avoiding to recompute f , most time is spend computing

linear updates on f on the working set W

X

fj fjold +

(i iold )yi k(xi , xj )

iW

Introduction

Linadd Algorithm

Experiments

Advertisement

Linadd

Derivation II

Use linadd to compute updates.

P

Update rule: fj fjold + iW (i iold )yi k(xi , xj )

P

Exploiting k(xi , xj ) = (xi ) (xj ) and w = N

i=1 i yi (xi ):

X

fj fjold +

(i iold )yi (xi ) (xj ) = fjold + wW (xj )

iW

Observations

q := |W | is very small in practice can effort more complex

w and clear,add operation

lookups dominate computing time

Introduction

Linadd Algorithm

Experiments

Advertisement

Linadd

Algorithm

Recall we need to compute updates on f (effort c1 |W |LN):

X

fj fjold +

(i iold )yi k(xi , xj ) for all j = 1 . . . N

iW

Lookup cost)

fj = 0, j = 0 for j = 1, . . . , N

for t = 1, 2, . . . do

Check optimality conditions and stop if optimal, select working

set W based on f and , store old =

solve reduced problem W and update

clear w

w w + (i iold )yi (xi ) for all i W

update fj = fj + w (xj ) for all j = 1, . . . , N

end for

Speedup of factor cc21` |W |

Introduction

Linadd Algorithm

Experiments

Advertisement

Linadd

Parallelization

fj = 0, j = 0 for j = 1, . . . , N

for t = 1, 2, . . . do

Check optimality conditions and stop if optimal, select working

set W based on f and , store old =

solve reduced problem W and update

clear w

w w + (i iold )yi (xi ) for all i W

update fj = fj + w (xj ) for all j = 1, . . . , N

end for

transfer (or w depending on the communication costs and

size)

update of f is divided into chunks

each CPU computes a chunk of f I for I {1, . . . , N}

Introduction

Linadd Algorithm

Outline

Introduction

Linadd Algorithm

Experiments

Experiments

Advertisement

Introduction

Linadd Algorithm

Experiments

Advertisement

Datasets

Web Spam

Negative data: Use Webb Spam corpus

http://spamarchive.org/gt/ (350,000 pages)

Positive data: Download 250,000 pages randomly from the

web (e.g. cnn.com, microsoft.com, slashdot.org and

heise.de)

Use spectrum kernel k = 4 using sorted arrays on 100,000

examples train and test (average string length 30Kb, 4 GB in

total, 64bit variables 30GB)

Introduction

Linadd Algorithm

Experiments

Advertisement

Web-Spam

N

Spec

LinSpec

Accuracy

auROC

100

2

3

89.59

94.37

97 1977

6039 19063 94012 193327

255 4030

9128 11948 44706

83802 107661

92.12 96.36

97.03

97.46

97.83

97.98

98.18

97.82 99.11

99.32

99.43

99.59

99.61

99.64

kernel without (Spec) and with linadd (LinSpec)

Introduction

Linadd Algorithm

Experiments

Advertisement

Datasets

Negative Data: 14,868,555 DNA sequences of fixed length 141

base pairs

Positive Data: 159,771 Acceptor Splice Site Sequences

Use WD kernel k = 20 (using Tries) and spectrum kernel

k = 8 (using explicit maps) on 10, 000, 000 train and

5,028,326 examples

Introduction

Linadd Algorithm

Experiments

Advertisement

For

of kernels:

P linear combination

old )y k(x , x ) (O(Ld |W |N))

(

j

j

i j

jW

j

use one tree of depth d per position in sequence

for Lookup use traverse one tree of depth d per position in

sequence

Example d = 3 :

Introduction

Linadd Algorithm

Experiments

Advertisement

100000

10000

SpecPrecompute

Specorig

Speclinadd 1CPU

Speclinadd 4CPU

Speclinadd 8CPU

1000

100

10

1000

10000

100000

1000000

Introduction

Linadd Algorithm

Experiments

Advertisement

WDPrecompute

WD 1CPU

WD 4CPU

WD 8CPU

WDLinadd 1CPU

WDLinadd 4CPU

WDLinadd 8CPU

10000

1000

100

10

1000

10000

100000

1000000

10000000

Introduction

Linadd Algorithm

Experiments

Advertisement

N

500

1,000

5,000

10,000

30,000

50,000

100,000

auROC

75.55

79.86

90.49

92.83

94.77

95.52

96.14

auPRC

3.94

6.22

15.07

25.25

34.76

41.06

47.61

N

200,000

500,000

1,000,000

2,000,000

5,000,000

10,000,000

10,000,000

auROC

96.57

96.93

97.19

97.36

97.54

97.67

96.03

auPRC

53.04

59.09

63.51

67.04

70.47

72.46

44.64

Introduction

Linadd Algorithm

Experiments

Advertisement

Discussion

Conclusions

General speedup trick (clear, add, lookup operations) for

string kernels

Shared memory parallelization, able to train on 10 million

human splice sites

Linadd gives speedup of factor 64 (4) for Spectrum (Weighted

Degree) kernel and 32 for MKL

4 CPUs further speedup of factor 3.2 and for 8 CPU factor 5.4

parallelized 8 CPU linadd gives speedup of factor 125 (21) for

Spectrum (Weighted Degree) kernel, up to 200 for MKL

Discussion

State-of-the-art accuracy

Could we do better by encoding invariances?

Implemented in SHOGUN http://www.shogun-toolbox.org

Introduction

Linadd Algorithm

Experiments

Advertisement

To support the open source movement JMLR is proud to announce

a new track on machine learning open source software.

Contributions to http://jmlr.org/mloss/ should be related to

Implementations of machine learning algorithms,

Toolboxes,

Languages for scientific computing

and should include

A 4 page description,

The code,

A recognised open source license.

Contribute to http://mloss.org the mloss repository!

- The 5th Tribe CaretUploaded bySurya Prangga
- Optical Tea Sorter ThesisUploaded byYas Dissanayake
- An Ontology-Based Sentiment Classification Methodology for Online Consumer ReviewsUploaded byf6684sam
- A Multiple Classifier System for Aircraft EngineUploaded byFarzam Boroomand
- Face Recognition Using Fused Spatial PatternsUploaded byeditor3854
- An Image Mining System for Gender Classification & Age Prediction Based on Facial FeaturesUploaded byInternational Organization of Scientific Research (IOSR)
- Excel Sheet Functions testing docUploaded bySathish Kumar Karne
- To Detect Fraud Ranking For Mobile Apps Using SVM ClassificationUploaded byEditor IJRITCC
- Fuzzy twin support vector machine based on iterative methodUploaded byAnonymous vQrJlEN
- Detecting Original Image Using Histogram, DFT and SVMUploaded byIDES
- Microsoft Excel - Advanced Formulas and FunctionsUploaded bycewarrior
- Excel DictionaryUploaded bysuresh_gs9999
- Diagnostic of Defects with Inductive Thermography Imaging System by Using PSO AlgorithmUploaded byInternational Journal for Scientific Research and Development - IJSRD
- 2004_dec_1365-1371Uploaded byLê Hưu Nhân
- PcaUploaded byAnil Dongardiye
- Uvm ConfigureUploaded byRajesh Nandi
- 11 Final ReportUploaded byVinod Bawane
- Forecasting Power Output of Photovoltaic.pptxUploaded byAkhilRajeevKumar
- METHODOLOGICAL STUDY OF OPINION MINING AND SENTIMENT ANALYSIS TECHNIQUESUploaded byijsc
- 17Uploaded byAmitava Kansabanik
- 04351163Uploaded byoksejen
- Introduction to Data ScienceLecture 7Machine Learning 2Uploaded byVMKanetkar
- ReporteUploaded byAlan Díaz
- Face Recognition based on a Collection of Binary ClassifiersUploaded byRafael Henrique
- 2.session-3-5 (uploaded).pptxUploaded byDev Anand
- A Combined Method of Fractal AndUploaded bysipij
- 06022044Uploaded bycasio2008
- OKUploaded byMarco Leotta
- Advanced MS Excel Lab FullUploaded byJayasree Sushali
- Determination of Triethylene Glycol Concentration in Natural (1)Uploaded byarispriyatmono

- Complex AnalysisUploaded byricky montgomery
- 6. Circular Motion and Other Application of Newton_s LawUploaded byNurulWardhani11
- tcrm0204-417Uploaded byNora Nagy
- 24-ijecsmayUploaded bymelvinkuri
- Build OptionsUploaded byFrancisco
- MAN B&W S50ME-C8_2-GIUploaded byTheoDio Deus Ex Machina
- The Church of LightUploaded bycraftyfingers
- Easyguide Amilo d1845 EngUploaded byopardic
- Regional Sales ManagerUploaded byapi-121340011
- Case Study a Drug InteractionUploaded byGustiAgungKrisna
- ComplexNumbers GuideUploaded byJason Batson
- PA Goat Production - 2007Uploaded byJay Adones
- UntitledUploaded byeurolex
- 233141009-Abinitio-Practitioner-v-1-0-day1.pdfUploaded byVeerendra Boddu
- Penman 5ed Chap015Uploaded byHirastikanah HK
- EN_C1300_ExSite_IOP_r111412Uploaded bySakerhets
- Towards Automated Process and Workflow Management a Feasibility Study OnUploaded bySebhazz Rios R
- Canadian History - Study GuideUploaded byAnnieJafri
- Dove True Beauty Case AnalysisUploaded byCaracae
- Mr Honey's Work Study Dictionary (German-English)Uploaded byscribd01
- DigicomUploaded byDenmark Calimquim
- Weco Pneumatic Actuator Safety AlertUploaded byRafael Granado
- Fault Tolerance in GridUploaded byNeeraj Upadhyay
- Come As You AreUploaded byLambertoDiPiero
- Atlas Copco Industrial Power Tools 2014 Part1 UkUploaded byFabio Junior
- Deegan5e Sm Ch09Uploaded byRachel Tanner
- Risk Management Procedure ExampleUploaded byMarcos Poffo
- Unit 4 - Decision MakingUploaded byShreyash Mantri
- flashwave-packet-optical-networking-platform.pdfUploaded byMarcel Kebre
- Sleep Inspires InsightUploaded bychongbenglim