K-means for spectral classification


Vignesh R. Ramachandran

Johns Hopkins Applied Physics Laboratory

Laurel, MD, USA

Vinny.Ramachandran@jhuapl.edu

Herbert J. Mitchell

Naval Postgraduate School

Monterey, CA, USA

herbert.mitchell@jieddo.mil

Samantha K. Jacobs

Johns Hopkins Applied Physics Laboratory

Laurel, MD, USA

Samantha.Jacobs@jhuapl.edu

Nigel H. Tzeng

Johns Hopkins Applied Physics Laboratory

Laurel, MD, USA

Nigel.Tzeng@jhuapl.edu

Alexer H. Firpi

Johns Hopkins Applied Physics Laboratory

Laurel, MD, USA

Alexer.Firpi@jhuapl.edu

Benjamin M. Rodriguez

Johns Hopkins Applied Physics Laboratory

Laurel, MD, USA

Benjamin.Rodriguez@jhuapl.edu

1. INTRODUCTION

signature data in the scientific community gathered from a

variety of sensors using a variety of collection techniques. As

the quantity of collected data grows, automated solutions for

searching and matching signatures need to be developed. When

searching and matching signatures, reducing computational

complexity and increasing matching accuracy are essential. We

present a signature classification method via k-means clustering

using a novel application of spectral angle mapping to efficiently

determine spectral similarity. We evaluate the method against

spectral data in the SigDB spectral analysis software application developed by the Johns Hopkins University Applied Physics

Laboratory (JHU/APL). The key component to this approach

is the set of characteristic functions used to map signature

similarity into a spatial representation. Existing methods used

to autonomously identify and classify IR spectral data include

spectral angle mapping and key feature detection. Spectral

mapping is computationally slow due to the need for direct

individual comparison, and key feature detection improves computation time but is limited by the specific features selected for

comparison. The accuracy and computation time of the spectral

cluster classification method are evaluated against spectral angle

mapping and visual analyses on the ASTER NASA spectral

library. The goal of this method is to improve both the accuracy

and speed of classifying newly collected unlabeled spectra. We

find that the proposed method of scoring signatures offers a

speed increase of three orders of magnitude in comparing spectra at the expense of a high false positive rate, suitable for use as

a first-pass filter. We further find that the k-means cluster-based

classification is highly sensitive to the selection of initial cluster

centroids, and offer alternative solutions to use with our scoring

method.

and forensic analysis requires collecting large quantities of

data from a variety of spectrometers using a variety of

techniques in diverse environmental conditions. Variations

in observed spectral features, regardless of the quality of

the data, make signature classification and comparison both

challenging for spectral analysts and often impossible for

automated systems. As the quantity of collected data continues to grow, automated solutions are increasingly critical.

For example, the national Integrated Signatures Program

(ISP) has collected approximately one million infrared (IR)

spectra, which largely rely on manually produced metadata

for identification and classification. The manual production

of metadata, at this scale, requires significant and rising cost

and time investment to reduce errors and inconsistencies.

The primary methods currently used to autonomously identify and classify IR spectral data are spectral angle mapping

and key feature detection [1]. Spectral angle mapping cannot

compare spectra with differing domains (e.g., spectral range,

spectral resolution or inconsistently removed bands) without

significant preprocessing. Thus, spectral angle mapping is a

computationally slow process, running in linear time against

an entire reference library to identify a single new or unknown signature. Key feature detection improves computation time by comparing only predetermined feature locations

in the reference spectra, but this method requires the user to

specifically identify the spectral features of interest.

A novel approach to signature classification via scoring and

clustering is presented. A set of characteristic mathematical

functions is used as artificial reference spectra to score

library signatures, and a k-means clustering algorithm determines classification clusters in the score space. New

signatures are scored against the same characteristic functions to determine their location in score space, and thus

determine their likely classification. Since new signatures

need only be compared against cluster centroids to determine

classification, the algorithm performs in O(k) time, where $k \approx \sqrt{n/2}$; i.e., the computation time increases linearly with

respect to the predetermined number of clusters k. We apply

this approach to a reference sampling of signatures from

the ASTER spectral library [2] and evaluate accuracy and

computation time versus direct spectral angle comparison.

TABLE OF CONTENTS

1. INTRODUCTION

2. RELATED WORK

3. METHODOLOGY

4. FINDINGS AND ANALYSIS

5. CONCLUSION AND FUTURE WORK

REFERENCES

BIOGRAPHY

978-1-4799-1622-1/14/$31.00 © 2014 IEEE.

IEEEAC Paper #2635, Version 1, Updated 11/15/2013.

faster than the direct spectral angle mapping. Used in tandem with traditional signature analysis, this method provides

a first-pass coarse screening of spectral classification to

reduce the size of the identification pool. Reducing the

workload on a more intensive secondary analysis allows

much larger reference libraries to be used in the near-real-time classification of field-collected spectra. Greater access

to spectral data, in addition to the ability to provide a preliminary classification of newly collected spectra, provides

forensic analysts and first responders enhanced chemical

detection capabilities when they need them most.

of data points in the signature. Two spectral vectors are

compared by simply computing the angle between them via

their vector dot product. This method inherently assumes

that the two spectra being compared share precisely the same

domain: not only the same spectral band, but also the same

sampling resolution and specific domain values. Thus a

signature sampled at 5, 10, 15... 100 microns cannot be

immediately compared with another signature sampled at 7,

12, 17... 102 microns, though the domains almost entirely

overlap. Any mismatch in domain must be resampled through

interpolation and extrapolation. Resolving the mismatch is

computationally inefficient since, pathologically, every possible pair of spectra may require resampling.
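The dot-product comparison described above can be sketched in a few lines. This is a minimal illustration, not the SigDB implementation: the function name `spectral_angle` and the sample values are hypothetical, and the length check merely stands in for the stricter requirement that both spectra share the same domain values.

```python
import math

def spectral_angle(a, b):
    """Angle in degrees between two spectral vectors sampled on the same domain."""
    if len(a) != len(b):
        # Stand-in for a domain mismatch; real spectra would need resampling first.
        raise ValueError("spectra must share the same sampling domain")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))

# Hypothetical reflectance values, for illustration only.
reference = [0.10, 0.40, 0.35, 0.20]
brighter = [2 * v for v in reference]   # same shape, doubled illumination
different = [0.40, 0.10, 0.20, 0.35]    # same values, different spectral shape

print(spectral_angle(reference, brighter))   # close to zero: scaling leaves the angle unchanged
print(spectral_angle(reference, different))  # clearly non-zero
```

The first comparison shows why the angle is insensitive to illumination: scaling a vector changes its magnitude but not its direction.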

The remainder of this paper is structured as follows. In Section 2, related work in the area of spectral signature matching

is presented. Our proposed method, signature classification

via scoring and clustering, is described in Section 3. Findings

and analyses are provided in Section 4. Finally, we conclude

and describe opportunities for future work in Section 5.

3. METHODOLOGY

We formulate a methodology to rapidly classify a new,

unknown signature by identifying signatures in a spectral

library with similar spectral features. Instead of individually comparing the unknown signature against each member

of the library, the proposed method precomputes a score

representation of the library against a small number (N) of

artificial reference spectra. Using a derivative of the SAM

method, the scalar spectral angle values between each library

signature and the reference spectra are treated as coordinates

of a point in N-dimensional space, and cached within the

library; when a new signature is introduced, it is compared

against the reference spectra to produce a corresponding set

of coordinates. Then, sets of spatial coordinates with smaller

Euclidean distances correspond to library spectra with

greater similarity to the new signature. This method is intended to filter the library down to a small subset of candidate

matches.

2. RELATED WORK

The need for spectral signature comparison and identification

has driven substantial work in the application of pattern

recognition, unsupervised and semi-supervised learning, and

data clustering [3] [4] [5]. Further, the need to quantify and

analyze enormous quantities of spectral data has spawned

many attempts at spectral collections or databases, with

mixed results [6]. The inability to reduce various methods

of collection and phenomenologies of spectra into a least

common denominator representation has made the problem

computationally challenging, especially without the use of

copious metadata to explain the exact conditions and context

in which the spectra were collected.

The classification and identification of unlabeled data has

been studied in great detail in the spatial domain, and a

wealth of effective solutions has been developed to address the problem [3] [7]. Spatial clustering algorithms

in general attempt to determine natural boundaries between

non-uniformly distributed spatial data points. Of these, k-means is a relatively simple approach: given some known

number of clusters k, cluster center points are randomly

distributed among the sample data, then iteratively updated

to reflect the average of their nearby constituents. A key

assumption is advance knowledge of k, as this algorithm has

no ability to merge or split existing clusters. However, it

is the simplest in a large field of clustering solutions that

includes hierarchical clustering, fuzzy k-means, DBSCAN,

expectation-maximization, and many others [3][8]. If the

spectral classification problem can be effectively adapted into

the spatial domain, any of these existing methods can be

applied.

this paper also investigates the opportunity to classify and

identify unlabeled signatures by characterizing the generated

N-dimensional map as a spatial clustering problem. k-Means,

an elementary but very popular [3] clustering algorithm,

generates cluster associations among spatial coordinates. A

new, unlabeled signature's spectral scores will place it within

a defined cluster; then, the labels of the library spectra sharing

the same cluster become preliminary guesses at the unknown

signature's classification. Since the labels of library signatures are predetermined, and signatures with the same label

are expected to have very similar spectral characteristics, the

score clustering process also serves to validate the choice of

reference spectra.

Characteristic Spectral Angle Mapping (cSAM)

Our proposed method adapts SAM spectral comparison as

a measure of indirect spectral similarity, rather than direct.

Traditionally, SAM operates on the principle that if signature

A has spectral angle $\theta_{AC}$ to signature C, then small $\theta_{AC}$

implies spectral similarity between A and C. Our modified

application instead proposes that a characteristic function,

such as y = cos x, can serve as an artificial reference

signature B. If signatures A and C have spectral angles $\theta_{AB}$

and $\theta_{BC}$ to B respectively, then $\theta_{AB} \approx \theta_{BC}$ implies a degree

of similarity between A and C. This relationship is not as

precise as SAM: the set of spectral vectors satisfying a given

spectral angle to B traces a surface around the reference

signature's vector, as shown in Figure 1: here both A and

C present the same spectral angle $\theta$ to B, but $A \neq C$. In

this three-dimensional representation, the area of ambiguity

presents as the surface of a cone; the spectral vector of a 500-

Several direct and indirect methods exist to compare signatures against each other, such as Spectral Angle Mapping

(SAM) [4], multiple endmember spectral mixture analysis

(MESMA) [9], peak detection and similarity indices such as

the Pearson correlation [1]. In general practice, the sensitivity

and utility of each method is inversely correlated with its

runtime computational complexity [1]. The SAM algorithm

is of specific interest due to its ability to precisely describe

the difference between two signatures without regard for the

relative illumination within the spectra (which is irrelevant

to the spectral features of the observed material) [4]. SAM

measures similarity by taking a signature in two dimensions

(X and Y) and creating a spectral vector consisting of its Y


[Figure 1 caption, partially recovered: ... Angle $\theta$ to Reference Vector $\vec{B}$]

Figure 2. Intersection of 3D solutions with Spectral Angles $\theta_1$, $\theta_2$ to Reference Vectors $\vec{B}_1$, $\vec{B}_2$ respectively

ambiguity surface cannot be represented graphically.

The solution set of spectral vectors represents many spectra

that are equally similar to the reference signature, at any

magnitude. The magnitude of the spectral vector represents

the illumination of the signature, which is an artifact of the

collection environment and irrelevant for the purposes of

determining the similarity of spectral characteristics [4]. This

leaves a cross-section of spectral vectors that all exhibit the

same degree of similarity to B; for vectors in three dimensions, this presents as a circle orthogonal to B. This level of

ambiguity is irreducible using only one reference signature.

Using multiple reference signatures further constrains the

solution to the intersection of the corresponding vector sets,

as shown in Figure 2. Thus if two signatures A and C exhibit

spectral angles $\theta_{AB_1} \approx \theta_{CB_1}$ to $B_1$ and $\theta_{AB_2} \approx \theta_{CB_2}$ to

$B_2$, it becomes increasingly likely that $A \approx C$. Introducing

additional reference signatures $B_i$ can constrain the solutions

still further, at the expense of additional calculations.
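A small three-dimensional example may make the ambiguity argument concrete. The vectors below are hypothetical and chosen only so the arithmetic is easy to follow: A and C lie on the same cone around the first reference but are separated once a second reference is added.

```python
import math

def angle_deg(u, v):
    """Spectral angle in degrees between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v)))))

def score(v, references):
    """Score vector: one spectral angle per reference vector."""
    return [angle_deg(v, b) for b in references]

# Hypothetical reference vectors and signatures (illustrative values only).
B1 = (0.0, 0.0, 1.0)
B2 = (1.0, 0.0, 0.0)
A = (1.0, 0.0, 1.0)
C = (0.0, 1.0, 1.0)

# With one reference, A and C are indistinguishable (both 45 degrees from B1).
one_ref_gap = abs(score(A, [B1])[0] - score(C, [B1])[0])

# With two references, their score vectors differ (45 vs 90 degrees from B2).
two_ref_gap = math.dist(score(A, [B1, B2]), score(C, [B1, B2]))

print(one_ref_gap, two_ref_gap, angle_deg(A, C))
```

Here A and C are actually 60 degrees apart, which a single reference cannot reveal; the second reference exposes the difference at the cost of one more angle computation per signature.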

small set of simplistic functions has been chosen to illustrate

the general approach. Selection and evaluation of more appropriate characteristic functions will be the subject of further

work, as are methods of dynamically generating appropriate

characteristic functions for a given spectral library.

The k-means algorithm requires, as initialization parameters,

the expected number of classification clusters (k); some

initial selection of centroid locations for the clusters; and

an error threshold, to limit the number of iterations. The

number of expected classifications was estimated using a

common rule-of-thumb value, $k \approx \sqrt{n/2}$ [10], and the initialization centroids were randomly chosen from the sample

set. However, given that the library spectra are generally

well-labeled, the clustering problem can instead be tackled

with a semi-supervised solution: that is, the known material

and chemical composition of the samples can be used to

intelligently select a diverse set of signatures to serve as

the initial centroid locations. Guided initialization has the

potential to significantly affect the determination

of cluster associations, as starting-point selection is known

to strongly affect the result of the k-means algorithm [11].

Initialization is also a focus of ongoing work.
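The two initialization choices discussed above can be sketched as follows. The rule-of-thumb function follows the formula cited in the text; the label-guided seeding is only one hypothetical way to realize the semi-supervised idea (one seed per distinct label), and the names `rule_of_thumb_k` and `guided_seeds` are assumptions rather than part of the described software.

```python
import math

def rule_of_thumb_k(n):
    """Estimate the number of clusters as sqrt(n / 2), rounded to an integer."""
    return round(math.sqrt(n / 2))

def guided_seeds(scores, labels):
    """Hypothetical guided initialization: the first score vector seen for
    each distinct material label becomes an initial centroid."""
    seeds = {}
    for vec, label in zip(scores, labels):
        seeds.setdefault(label, vec)
    return list(seeds.values())

print(rule_of_thumb_k(1800))  # 1,800 library signatures give k = 30

# Illustrative score vectors and labels only.
example_scores = [(1.0, 2.0), (1.1, 2.0), (5.0, 5.0)]
example_labels = ["soil", "soil", "mineral"]
print(guided_seeds(example_scores, example_labels))
```

Seeding from labels would tie k to the number of distinct labels rather than to the rule of thumb, which is a design trade-off the text leaves open.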

library to be reduced into a score vector:

$$\vec{\theta} = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_N \end{bmatrix} \qquad (1)$$

where N is the number of characteristic functions $\vec{B}$. The

$\theta$ values can then be considered the coordinates of a point in

N-dimensional space, where $B_1 \ldots B_N$ serve as axes (orthogonality is not required, but is desirable). In this new spatial

representation, score similarity can be characterized as the

Euclidean distance between two points; thus this enables the

use of existing spatial clustering algorithms, such as k-means,

to perform classification of spectra.

Detailed Approach

Each signature in the spectral database consists of columns

of spectral data accompanied by various optional metadata

properties, such as sensor identification and calibration, environmental conditions, sample identification and description,

axis units and labels, and any known observable associations.

The data is in the wavelength domain with value columns

representing either reflectivity or emissivity, as indicated by

axis properties (see Figure 3, an example signature from

the ASTER library [12]). Note that NaN float values are

used to represent invalid or removed data points within the

spectra, such as deliberately suppressed water bands. A

hash of the spectral data uniquely identifies the signatures;

therefore, two signatures having the same identifier are assumed to be identical, and cannot both exist in the database.

The phenomenology of the signature (LWIR, MWIR, SWIR,

VIS/NIR) is also indicated by metadata properties.

Preconditions

The choice of mathematical functions used to produce reference spectra, and thus the spectral angle scores used for

comparison, is a critical factor in the utility of this approach.

Poorly chosen functions result in spectral angles that are

highly similar for many or all library spectra. Functions that

perform well in one spectral band may perform poorly in

others, thus requiring different sets of functions for different


domain locations in which the former is defined. Because

the characteristic functions are real and continuous, real Y

values are always returned at any signature-specified domain

location. NaN values are ignored in the comparison (lines 9-10), and thus removed or suppressed bands have no impact

on the score. For each real-valued datapoint, the product

is accumulated through the algebraic definition of the dot product (line 12 of the algorithm):

$$\vec{A} \cdot \vec{B} = \sum_{i=1}^{M} a_i b_i = a_1 b_1 + \dots + a_M b_M \qquad (2)$$

[Figure 3: example signature from the ASTER library; caption only partially recovered]

the Y value of one data point in the signature. Here, $\vec{B}$ is

an artificial vector that is automatically generated based on a

function $f_j$. Correspondingly, $b_i = f_j(x_i)$, where $x_i$ is the X

value corresponding to $a_i$'s Y value. The geometric definition

of the dot product then determines the angle between the

spectral vectors (line 17):

$$\vec{A} \cdot \vec{B} = |A||B| \cos\theta_{AB} = \sum_{i=1}^{M} a_i b_i \qquad (3)$$

based on a set of desirable properties:

spectra are two-dimensional.

Each function is real and differentiable within the domain

of interest, so that a spectral angle mapping against any real

spectra will yield a real-valued solution.

Each function generates a broad range of score values

against available library spectra within the domain of interest.

Each function is linearly independent of, and preferably orthogonal to, the other selected characteristic functions

within the domain of interest, so that each presents a unique

spectral vector.

$$\theta = \theta_{AB} = \cos^{-1}\left(\frac{\sum_{i=1}^{M} a_i b_i}{|A||B|}\right) \qquad (4)$$

the loop (lines 13-14). Each resulting $\theta_{AB}$ is that signature's

score against a characteristic function $f_j$; together they result

in a score vector:

$$\vec{S}_i = \begin{bmatrix} S_{i f_1} \\ S_{i f_2} \\ \vdots \\ S_{i f_N} \end{bmatrix} \qquad (5)$$

and constrain the selection of characteristic functions to those

that generate a wide distribution of real-valued spectral angle

scores within the database of spectra. Thus, the selection

of appropriate specific characteristic functions is contingent

on the nature of the library against which it is applied. The

number of functions to be used is likewise flexible; more axes

for comparison lead to less ambiguity in the set of solutions,

at the expense of computation time and volume of score data.

The choice of characteristic functions can be validated by a

set of empirical desirability tests against the database:

library to create a baseline set of score values. The two-dimensional array of scores is indexed by each signature's

unique identifier. If additional signatures are added to the

library, sets of scores are calculated for the new additions and

stored.

When an unknown signature S is introduced, the same

method is used to calculate a set of scores. A Euclidean

distance value is then computed against each existing signature's set of scores. The scores of each database signature

serve as its coordinates in the N-dimensional function space.

The unknown signature's location, as determined by its own

scores, should then be located closest to other signatures that

share similar spectral characteristics. These nearest neighbors

are then selected for further automated or visual analysis, at

the user's discretion.
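The nearest-neighbor lookup described above amounts to ranking cached score vectors by Euclidean distance to the unknown's score vector. The sketch below assumes scores are held in a plain dictionary keyed by signature identifier; the identifiers, values, and the `top_n` parameter are hypothetical.

```python
import math

def nearest_signatures(unknown_scores, library_scores, top_n=3):
    """Return the IDs of the top_n library signatures closest to the unknown
    in score space, ordered from nearest to farthest."""
    ranked = sorted(
        library_scores.items(),
        key=lambda item: math.dist(unknown_scores, item[1]),
    )
    return [sig_id for sig_id, _ in ranked[:top_n]]

# Illustrative score vectors (degrees) against three characteristic functions.
library = {
    "sig-a": (90.0, 120.0, 30.0),
    "sig-b": (90.0, 121.0, 31.0),
    "sig-c": (89.0, 60.0, 5.0),
}
unknown = (90.0, 120.5, 30.2)

print(nearest_signatures(unknown, library, top_n=2))
```

Because each comparison touches only N cached numbers rather than every data point of two spectra, the per-comparison cost stays small even as spectra grow in length.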

1. Are any of the produced scores NaN (undefined, infinite,

or otherwise non-real)? If so, this indicates the function is

not real and/or differentiable throughout the entire domain of

interest.

2. Are the produced spectral scores broadly distributed? If

not, the function is not a good discriminator for the signatures

of interest.

any time the spectral library is updated. The set of k cluster

centroids C is initialized as a random sampling of locations

within the dataset (line 5). Each signature Si is associated

with the centroid closest to it by Euclidean distance (lines 7-9), then new centroid locations are calculated for each cluster

representing the average of the cluster constituents' locations

(lines 10-12). Then the sum change in centroid positions

between the current iteration and the previous is calculated

with scores produced by another function? If so, the functions may not be linearly independent, or they may measure

highly correlated spectral features. One of the functions may

be used, but not both.
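The three empirical desirability tests can be expressed as simple statistics over the score columns. The sketch below only computes the quantities; the text does not state numeric thresholds for "broadly distributed" or "highly correlated", so none are assumed here, and the function names are illustrative.

```python
import math

def any_nan(scores):
    """Test 1: does the function produce any non-real score?"""
    return any(math.isnan(s) for s in scores)

def score_range(scores):
    """Test 2: how widely are the scores spread (max minus min)?"""
    return max(scores) - min(scores)

def pearson(xs, ys):
    """Test 3: linear correlation between the scores of two functions."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative score columns (degrees); values are not taken from the library.
narrow = [88.2, 90.0, 92.1]
print(any_nan(narrow), score_range(narrow))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # perfectly correlated columns
```

A function whose scores span only a few degrees, or whose scores track another function's almost exactly, adds an axis that contributes little to separating signatures.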

The cSAM algorithm is used to determine score values for the

database, as shown in Algorithm 1. The procedure compares


 1: procedure GENERATESCORES(S, f)
 2:     scores := 2D array of [signature IDs][score values]
 3:     for each S_i in S do
 4:         for each f_j(x) in f do
 5:             product ← 0
 6:             sMag ← 0
 7:             fMag ← 0
 8:             for each datapoint (X, Y) in S_i do
 9:                 if X or Y is NaN then
10:                     skip datapoint
11:                 else
12:                     product = product + (Y · f_j(X))
13:                     sMag = sMag + Y^2
14:                     fMag = fMag + f_j(X)^2
15:                 end if
16:             end for
17:             scores[S_i][f_j] = cos^-1( product / (√sMag · √fMag) )
18:         end for
19:     end for
20:     return scores
21: end procedure
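For readers who prefer executable code, the scoring procedure can be approximated in Python as below. This is a sketch under stated assumptions, not the Java plug-in itself: signatures are assumed to be lists of (X, Y) pairs keyed by an identifier, scores are returned in degrees, and the three cosine functions mirror the ones named in the paper.

```python
import math

def csam_score(signature, func):
    """Spectral angle (degrees) between a signature and a characteristic function."""
    product = s_mag = f_mag = 0.0
    for x, y in signature:
        if math.isnan(x) or math.isnan(y):
            continue  # removed or suppressed bands do not affect the score
        fx = func(x)
        product += y * fx
        s_mag += y * y
        f_mag += fx * fx
    cos_theta = product / (math.sqrt(s_mag) * math.sqrt(f_mag))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def generate_scores(signatures, funcs):
    """Map each signature ID to its list of scores, one per characteristic function."""
    return {
        sig_id: [csam_score(points, f) for f in funcs]
        for sig_id, points in signatures.items()
    }

# The three cosine functions named in the paper (wavelengths assumed in microns).
characteristic = [lambda x: math.cos(100 * x), math.cos, lambda x: math.cos(x / 100)]

# Hypothetical signature that happens to follow cos(x), plus one suppressed point.
demo = [(i / 10, math.cos(i / 10)) for i in range(1, 50)] + [(5.0, float("nan"))]
print(generate_scores({"demo": demo}, characteristic))
```

A signature with no valid data points would divide by zero here; a production version would need to decide how to report that case.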

Name             Equation
10-nm Cosine     y = cos(100x)
1-µm Cosine      y = cos(x)
100-µm Cosine    y = cos(x/100)

observable metadata are all stored in various database tables

and referenced by the signature's unique hash identifier.

The data used for comparison was selected from the Advanced Spaceborne Thermal Emission Reflection Radiometer

(ASTER) Spectral Library 2.0, a collection of spectra of natural and man-made materials produced by a collaboration of

the Jet Propulsion Laboratory, the Johns Hopkins University,

and the United States Geological Survey [2]. The data spans

the 0.4 to 15.4 µm wavelength range, which includes the visual

and near-infrared (VIS/NIR), shortwave (SWIR), and thermal

infrared (TIR) electromagnetic bands. All of the selected signatures describe directional hemispherical reflectance as collected by the NASA Terra spaceborne hyperspectral imaging

platform, are represented in percent reflectivity, and consist

of approximately 400-600 data points each. The data are not

uniformly sampled; for example, some begin at 0.43 µm and

others at 0.3 µm. However, all of the data do exhibit the

same sampling resolution: 2 nm up to 0.8 µm, 20 nm between

0.8 µm and 5 µm, and 100 nm between 5 µm and 14 µm.

falls below the threshold parameter (line 6), indicating that

the cluster associations have stabilized. The algorithm returns

the final cluster centroid locations, along with the mapping of

signatures to assigned cluster (line 19).

Algorithm 2 k-Means Clustering

 1: procedure KMEANS(dbScores[S(f)], Number of classes k, Error Threshold ε)
 2:     C, C' := size k arrays of centroid locations
 3:     A := mapping of signatures to assigned cluster #
 4:     Δ := ∞                                  ▷ Change between iterations
 5:     C ← k points randomly selected from dbScores
 6:     while Δ > ε do
 7:         for each S_i in S do
 8:             A ← (S_i → nearest cluster in C)
 9:         end for
10:         for each C_j in C do
11:             C'_j = average of all points mapped to C_j in A
12:         end for
13:         Δ = 0
14:         for each C'_j in C' do
15:             Δ = Δ + |C_j − C'_j|
16:         end for
17:         C ← C'
18:     end while
19:     return C, A
20: end procedure
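A compact Python rendering of the clustering loop is given below as a sketch. It assumes score vectors are tuples of floats; the `seed` parameter and the handling of empty clusters (the centroid is left in place) are choices made for this illustration and are not specified in the paper.

```python
import math
import random

def kmeans(points, k, threshold, seed=0):
    """Cluster points into k groups; stop when the summed centroid shift
    between iterations is no larger than threshold."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids from the data
    while True:
        # Assign each point to its nearest centroid by Euclidean distance.
        assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # assumption: keep empty clusters fixed
        delta = sum(math.dist(c, n) for c, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if delta <= threshold:
            return centroids, assignment

# Two well-separated illustrative blobs.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignment = kmeans(points, k=2, threshold=1e-5)
print(centroids, assignment)
```

Classifying a new signature then reduces to k distance computations against the returned centroids, which is the O(k) step the method relies on.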

as reference spectra for this paper. The selections, all cosine

functions of varying frequency, are intended to mirror the

desirable properties identified above while remaining computationally trivial to execute. Under the assumption that

all signature wavelength values are represented in microns,

the selected functions capture spectral features at the 10-nanometer, 1-micrometer and 100-micrometer resolutions.

The computation of discrete values for these functions with

respect to the described Signature Scoring algorithm was

hard-coded into the software implementation.

1,800 samples of various minerals, soils, vegetation, and

manmade materials were selected from the ASTER library

for comparison. Spectral angle scores were generated against

the three characteristic functions above and stored in the

database. The k-Means algorithm was performed against

the score dataset using $k = \sqrt{N/2} = 30$ randomly selected

signatures as initial centroid locations. The selected error

threshold was $\epsilon = 1 \times 10^{-5}$. The centroid locations converged

on a stable solution at this error threshold after 29 iterations.

One signature, 14259.61 (a sample of lunar dust collected

from the Apollo 11 mare site), was chosen to represent the

unknown signature (Figure 4). Its score data was manually

removed from the database, then the cSAM plugin was run

to recalculate the score and determine its cluster association.

The run-times of both the database scoring / clustering process

and the signature classification process were recorded. Also,

to evaluate the accuracy and run-time of the scoring process

without clustering, the unknown signature's score values

were recomputed and spatially compared against all 1,799

database spectra via Euclidean distance computation.

Implementation

In support of ISP spectral analysis, JHU/APL has developed

a standardized database schema representation of spectral

signatures, and an associated Java-language software application SigDB to aid in exchange, preliminary analysis, comparison and classification of collected spectra. The scoring

and clustering methodology described herein was developed

as a plug-in capability for the SigDB application, which

enabled immediate access to a large quantity of spectral data

and a framework for analysis. SigDB stores signature data

in an SQL relational database as IEEE754 64-bit floating


same dataset. The same signature was selected as the unknown and compared against the 1,799 other signatures. This

SAM implementation does not perform any interpolation or

extrapolation to align signature domains; as a result, database

signatures with differing minimum/maximum domain values were automatically filtered out in advance. Further, if the code detects any

mismatch in domain values within the two compared spectra

during comparison (such as a missing/suppressed datapoint

in one but not the other), it immediately terminates that

comparison, reports a NaN spectral score, and moves on to

the next comparison; however, these still impact the run-time

of the algorithm. These are known and accepted limitations

of the traditional SAM algorithm, and are usually worked

around via data interpolation and extrapolation. The reported

matches and run-times of each process were recorded and

compared.


Results

Table 2 shows the number of computations and runtimes

for each of the algorithm processes performed. The first

row is a traditional spectral angle mapping comparison of

the unknown signature against the database spectra. Although the database contains 1,800 signatures, only 477

match the same minimum/maximum domain, so only these

were selected for comparison; of these, only 17 signatures

matched every datapoint's domain precisely. Thus, the other

460 calculations were terminated before completion. This

process took approximately forty seconds; the generated

scores, along with the name of each signature, are shown

in Table 3. The second row includes computation of

scores against each of the three characteristic functions, k-means cluster classification (which included 29 iterations of

the clustering algorithm), and storage in the database. All

database spectra were scored, but no actual comparisons were

performed in this step. This process took approximately four

minutes. The third row includes calculation of the unknown

signature's scores against the characteristic functions, then

Euclidean distance comparison of those scores against each

of the 30 cluster centroids to determine classification. This

process took ten milliseconds. The final row, which is not

performed in the normal course of k-means analysis, was a

recalculation of the unknown signature's scores (in order to

against all 1,799 other score sets in the database. This process

took fifteen milliseconds, and the cSAM Euclidean distances

to the signatures scored by traditional SAM are also shown in

Table 3.
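As a quick check on the run-times quoted for these processes, the per-comparison averages and overall ratios can be recomputed. The run-times and comparison counts below are the reported values; the dictionary layout and variable names are illustrative assumptions.

```python
# (run-time in ms, number of comparisons) as reported for each process
reported = {
    "SAM Algorithm": (39794, 477),
    "cSAM Cluster Comparison": (10, 30),
    "cSAM Euclidean Score Comparison": (15, 1800),
}

# Average milliseconds per comparison for each process.
per_comparison = {name: ms / n for name, (ms, n) in reported.items()}

# Ratio of total SAM run-time to the total cSAM Euclidean comparison run-time.
total_ratio = reported["SAM Algorithm"][0] / reported["cSAM Euclidean Score Comparison"][0]

for name, value in per_comparison.items():
    print(f"{name}: {value:.2f} ms per comparison")
print(f"total run-time ratio: {total_ratio:.0f}x")
```

The total run-time ratio is on the order of a few thousand, consistent with the roughly three orders of magnitude speed increase claimed for the scoring method; the one-time cost of scoring and clustering the library is excluded from this ratio.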

Figure 5 illustrates the overall spread of spectral angle scores

of the 1,800 ASTER signatures against two of the three

selected characteristic functions. The third function, 10-nm

cosine, results in minimal differentiation between signatures;

all fall in the range [88.2, 92.1] degrees, so this axis is omitted

in the figure. The narrow differentiation of scores by the

10-nm cosine and the relatively broad distribution of scores

against the other two functions illustrate both implications of

the second empirical test of desirability: the former indicates

that the 10-nm cosine is undesirable for the dataset at hand,

while the latter indicates that the 1-µm and 100-µm cosines

do perform well as discriminators. Table 4 describes the

location of the cluster centroids in the score space produced

by the three functions.


Process                            # Score Calculations   # Comparisons   Run-Time (ms)   Avg ms per Comparison
SAM Algorithm                      477 (17)               477 (17)        39,794          83
cSAM Score Computation             1800                   0               223,340         N/A
cSAM Cluster Comparison            1                      30              10              0.33
cSAM Euclidean Score Comparison    1                      1800            15              0.01

[Table 3 caption, partially recovered: ... Degrees) and cSAM Euclidean Distances]

Signature Name   Spectral Angle   cSAM Distance
14148.183        1.333            1.063
12024.69         1.656            0.695
12023.139        2.651            1.901
14149.18         2.670            2.032
12070.405        2.760            2.094
12030.135        3.041            2.046
61241.98         3.369            3.017
64801.34         4.159            3.642
68501.609        4.561            4.303
10084.1939       4.643            4.810
62231.15         4.776            4.329
60051.19         5.550            5.080
14141.146        7.066            5.637
67941.72         8.133            7.648
67701.36         8.356            7.972
61221.79         8.855            7.852
Sea water        24.929           8.786

The results were evaluated by comparing the results of the

traditional SAM approach to the score values produced by

cSAM, as well as the cluster classifications produced by k-means. The SAM algorithm was only able to compare the

unknown signature with those that were in precisely the same

domain, which coincided with data produced by the same

sensor, and thus largely correlated with the most probable

association: as shown in the second column of Table 3 and

Figure 6, lunar dust signatures all scored within < 12.9 ,

and the one non-lunar signature compared, sea water, scored

= 23.003 . For the same signatures, cSAM produced Euclidean distances as shown in the third column of Table 3. The

full spread of Euclidean distances to the unknown signature

is shown in Figure 7, which illustrates the spatial distribution

of spectral scores with respect to the unknown. All of the

lunar dust signatures, which we consider true matches, scored

within the closest 11% of spectra within the database.
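The first-pass filter described above can be sketched as follows: each library signature is scored once against the characteristic functions, and an unknown is then ranked by Euclidean distance in that score space. The cosine form of the characteristic functions, the three-entry toy library and all names are assumptions of this sketch, not the paper's implementation.

```python
import math

def spectral_angle_deg(a, b):
    """Angle in degrees between two equal-length sample vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))

def csam_scores(signature, characteristic_functions):
    """Score vector: one spectral angle per characteristic function."""
    return [spectral_angle_deg(signature, f) for f in characteristic_functions]

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def rank_by_score_distance(unknown_scores, library_scores):
    """Library names sorted by distance to the unknown in score space."""
    distances = {name: euclidean(unknown_scores, s)
                 for name, s in library_scores.items()}
    return sorted(distances.items(), key=lambda item: item[1])

# Hypothetical grid (micrometres) and two assumed cosine functions
wavelengths = [0.4 + 0.1 * k for k in range(151)]
functions = [[math.cos(2.0 * math.pi * w / period) for w in wavelengths]
             for period in (1.0, 100.0)]

# Toy library; scores are computed once and reused for every query
library = {
    "flat": [1.0 for _ in wavelengths],
    "rising": [0.1 + 0.05 * w for w in wavelengths],
    "falling": [1.0 - 0.05 * w for w in wavelengths],
}
library_scores = {name: csam_scores(sig, functions) for name, sig in library.items()}

unknown = [0.12 + 0.05 * w for w in wavelengths]  # deliberately close to "rising"
ranking = rank_by_score_distance(csam_scores(unknown, functions), library_scores)
```

Each query then costs one score computation for the unknown plus inexpensive distance evaluations on short score vectors, rather than a full spectral comparison against every library entry.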

The k-means clustering placed the unknown signature in Cluster 3. The other lunar dust signatures were placed in Clusters 3 (7 signatures), 11 (8 signatures) and 27 (1 signature). The sea water signature was also placed in Cluster 27.
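Cluster assignment of a new signature reduces to finding the nearest centroid in score space. The sketch below uses plain Lloyd-style k-means with fixed initial centroids; the score vectors, the choice of two clusters and the function names are hypothetical and do not reproduce the paper's 30-cluster configuration.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def k_means(points, initial_centroids, max_iterations=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iterations):
        labels = [nearest_centroid(p, centroids) for p in points]
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical score vectors: (10-nm, 1-um, 100-um) angles in degrees
points = [
    (90.1, 80.0, 38.0), (90.3, 82.0, 40.0), (89.9, 79.0, 37.0),
    (90.2, 125.0, 16.0), (90.4, 123.0, 15.0), (90.0, 124.0, 17.0),
]
centroids, labels = k_means(points, initial_centroids=[points[0], points[3]])

unknown = (90.2, 81.0, 39.0)
cluster = nearest_centroid(unknown, centroids)
```

The lookup costs one distance evaluation per centroid, independent of the number of signatures in the library.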


Cluster  10-nm   1-µm     100-µm  Signatures
0        90.443  90.380   16.014  61
1        89.401  123.94   42.293  62
2        90.636  93.475   38.517  57
3        89.874  124.72   33.820  70
4        90.451  80.886   7.3951  59
5        89.293  119.21   48.691  44
6        90.587  116.74   22.568  40
7        90.633  114.72   29.288  68
8        90.104  126.14   20.840  89
9        90.366  118.52   37.312  43
10       90.575  106.13   27.278  53
11       90.630  107.24   37.788  72
12       90.175  131.17   24.880  122
13       90.522  86.036   9.4432  49
14       90.289  125.50   27.491  61
15       90.282  119.15   9.6618  89
16       90.859  75.508   58.144  13
17       90.380  117.53   14.352  72
18       89.745  140.77   30.669  48
19       90.624  97.332   24.690  64
20       90.476  74.903   7.8526  77
21       90.355  68.062   15.130  48
22       89.594  122.00   1.4625  39
23       89.747  134.34   29.652  78
24       90.483  57.068   30.304  47
25       89.429  130.32   36.741  56
26       90.591  78.987   2.5388  120
27       90.121  124.53   13.485  38
28       90.413  76.509   24.125  18
29       90.546  105.788  19.860  43

Spectral Scores to the Unknown Signature (sorted ascending from left)

1. Scoring against characteristic functions via the cSAM algorithm generally approximates the spectral similarity between signatures, as appropriate for a first-pass filter.

[...] investment to perform characteristic computations, the time to compare a newly captured signature against a large library is reduced from a linear-scale operation to near-constant time.

[...] effectively partition the cSAM score-space into usable classifications. However, the use of semi-supervised approaches (using existing classification information stored in the spectral library), better heuristic selection of the number of likely classes k, and informed selection of the cluster-initialization centroids are all likely to dramatically improve classification accuracy.

The importance of careful selection of characteristic functions was clearly illustrated by the 10-nm cosine function's inability to discriminate among the library spectra. Intuitively, a 10-nm-scale curvature is negligible when compared against spectra on the micron scale; therefore, the range of spectral resolutions of the library spectra is a significant factor in the efficacy of the functions.

Characteristic spectral angle mapping is a potentially powerful approach to reducing the run-time cost of autonomous spectral classification and identification against large signature data sets. By converting the spectral classification problem into a spatial problem, cSAM enables the application of many existing well-developed classification approaches. Our preliminary results indicate a good correlation between the chosen characteristic functions' spatial scores and brute [...] decrease in the time to identify potential matches. This could vastly improve the performance and accuracy of existing spectral systems in use by scientific, defense and emergency response stakeholders.

Two primary areas have been identified for further investigation. Characteristic functions appropriate for use with libraries consisting of different spectral bands and spectral resolutions should be considered and evaluated, based on the desirable properties and empirical tests described above. Automated polynomial-based approaches may also allow the characteristic functions to be generated dynamically based on the actual content of the spectral library. Other methods besides k-means should also be considered and evaluated. These include fuzzy k-means, which would reduce the partitioning of dense areas of the score space; semi-supervised approaches, which can take advantage of the copious label data within the library; and dynamic determination of the number of classes/clusters, based on the known material content of the library.

ACKNOWLEDGMENT

The authors would like to thank the Integrated Signatures Program for their support, Thomas Spisz (JHU/APL) for information on the Spectral Angle Mapper algorithm, and Edward Birrane (JHU/APL) and Jason Oxenrider (JHU/APL) for editing and review.

BIOGRAPHY

Vignesh Ramachandran received a B.S. in Computer Science from the Georgia Institute of Technology in 2007 and an M.S. in Aerospace Engineering from the University of Maryland, College Park in 2013. He has worked at the Johns Hopkins University Applied Physics Laboratory since 2008 as a Ground Software Engineer, designing command, telemetry, data processing and network engineering solutions for NASA missions (such as the Van Allen Probes and MESSENGER spacecraft) as well as a variety of other civil and defense applications. Mr. Ramachandran currently serves as the Vice-Chair of the American Institute of Aeronautics and Astronautics (AIAA) Mid-Atlantic Section, and has twice served as the General Conference Chair of the AIAA Young Professionals, Students and Education Conference (YPSE).

Herbert Mitchell received a B.S. in Chemistry from Washington and Lee University and an M.S. in Analytical Chemistry from the University of Virginia. He entered the U.S. Navy after graduation and served as a scientist dealing with the effects of nuclear weapons on humans and on the chemistry of the atmosphere. In his government roles, and afterwards as a contractor supporting the Defense Department and other agencies, he authored several reports, worked on several special projects, and served on several committees investigating scientific phenomena. He has a record of leading them to successful conclusions. Often these projects were of interest to high levels of government. His interests have generally been to develop novel ways to use wide ranges of sensors to better acquire data of interest to the Defense Department. For the last decade he has been working for the Physics Department of the Naval Postgraduate School and has been working at several agencies in the Washington, DC area, most recently at the Joint IED Defeat Organization (JIEDDO).

Samantha Jacobs received a B.S. in

Physics from Georgia Southern University in 2012. In 2013 she joined

the Johns Hopkins University Applied

Physics Laboratory as an associate

Ground Software Engineer in the Space

Department. Her work in the Space

Department includes automated testing,

network engineering solutions, and data

processing.

Nigel Tzeng received a B.S. in Computer Science and an M.S. in Software Engineering from the University of Maryland, College Park. Mr. Tzeng has over 20 years of experience in spacecraft ground systems, command and control (C2) systems, data visualization and software engineering. He joined the Johns Hopkins University Applied Physics Laboratory (JHUAPL) in 2003 and is currently a senior member of the Space Department technical staff. Mr. Tzeng leads the development of signature and geospatial analysis/exploitation software systems. He served as the Group Chief Scientist for the C2 Systems Engineering Group from 2007 to 2009 and has been the Principal Investigator of several C2 research initiatives. His primary areas of research are command and control, geospatial visualization and collaboration. Prior to joining JHUAPL, Mr. Tzeng worked in telecommunications, e-commerce, advanced traffic management systems, spacecraft simulation (Landsat, SOHO), spacecraft command and control (SAMPEX, TRMM, FUSE), and science data processing/visualization (COBE). He was the lead software architect and designer of the City of Louisville Advanced Traffic Management System and developer of the DIRBE, FIRAS and DMR sky map visualization software on COBE.

Alexer Firpi received a B.S. in electrical

engineering from Polytechnic University

(San Juan, Puerto Rico), an M.S. in

electrical engineering from the University of Puerto Rico (Mayaguez, Puerto

Rico), and a Ph.D. in electrical engineering from Michigan State University

(East Lansing, MI). After concluding his

doctoral studies, Dr. Firpi did postdoctoral work at different institutions in

diverse research areas such as intelligent control, biomedical

engineering, imaging genetics, and bioinformatics. He is

currently a senior staff member at the Johns Hopkins University

Applied Physics Laboratory. Dr. Firpi's research focuses on

machine learning, brain-computer interfaces, computational

intelligence, and any other research problem that can be

automated using machine-learning approaches. He is the

author of more than 20 peer-reviewed publications and two

book chapters.

Benjamin Rodriguez received a Bachelor of Science (B.S.) and a Master of Science (M.S.) in Electrical Engineering from the University of Texas, and received a Doctor of Philosophy (Ph.D.) in Electrical and Computer Engineering from the Air Force Institute of Technology, Graduate School of Engineering and Management, Electrical and Computer Engineering Department, Wright-Patterson Air Force Base, OH. He is the Section Supervisor for Space Systems and Architectures in the Space Department at The Johns Hopkins University Applied Physics Laboratory. He is also an instructor at The Johns Hopkins University, Whiting School of Engineering, for the Department of Electrical and Computer Engineering as well as the Department of Computer Science.
