Compact Clustering of Multidimensional Vectors

An algorithm for storing multidimensional temporal and spatial vectors.

Copyright © 2008 Douglas D. Hammer, Principal Investigator

This paper explores the process of partitioning multidimensional vectors into clusters that represent both spatial and temporal relations. The process extends k-means clustering to a root k-means (rK-means), in which each cluster organizes multiple time scales in relation to their spatial orientation. Vectors from multiple time frames are measured by their dimension, or length, which establishes a radial distance metric, and by their unit vector, or direction, which establishes their principal direction or axis of rotation. In the clustering process, k clusters are chosen to represent multiple time scales, and each centroid, or root mean distance, is measured from its closest vectors relative to the cluster's maximum dimensional root mean distance. This partitions vectors by similar time frame as well as by spatial axis of direction. As an illustration, in the cluster map presented on the front page, nine yellow vectors share equal event horizons. Near this cluster, two other vectors occupy a similar time horizon but have root mean distances with a greater angle of separation, or variance of direction, which places them in the larger cluster.

Vector Normalization and Direction
In order to place vectors on a two-dimensional plane, vector sums and normals are calculated to find each vector's unit vector. Normals are generally calculated as the square root of the sum of squares. The reason for raising the vector to its dimensional power and taking this root can be observed when finding differences of vectors in the range [-1, +1]. The equation below is the vector sum:

Our next equation calculates the vector normal, that is, the norm or magnitude. Finding the vector normal is vital in the calculation of vector products and of angles between vectors.

In the third equation we calculate the unit vector, which has a magnitude in [0, 1] and represents the vector's direction. In the next section this value is used to find the principal axis of rotation, theta, and to calculate a radial transform based on the vector's dimension.
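The three quantities described in this section can be sketched in a few lines of Python. This is a minimal illustration using the standard definitions (component sum, Euclidean norm, and the vector divided by its norm); the function names are hypothetical, and returning the zero vector unchanged is an assumption, since the text does not say how a zero-magnitude vector is handled.

```python
import math

def vector_sum(v):
    """Sum of the components of v."""
    return sum(v)

def vector_norm(v):
    """Magnitude of v: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))

def unit_vector(v):
    """Scale v to magnitude 1.

    Assumption: a zero-magnitude vector is returned unchanged,
    because its direction is undefined.
    """
    n = vector_norm(v)
    if n == 0.0:
        return list(v)
    return [x / n for x in v]
```

For example, `vector_norm([3, 4])` evaluates to 5.0, and `unit_vector([3, 4])` has components close to 0.6 and 0.8.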

Event Horizons of Vector Translation
Take two vectors, v1=[0] and v2=[0,0,0], measured from their global origin: both have the same magnitude, or vector normal, and the same sum. As a consequence, their unit vectors under translation sit on the same axis, while their dimensional components have radial coefficients of 1 for v1 and 3 for v2. Dimension is the distance from the center, that is, the radius, normalized to the cluster's maximum vector dimension. This keeps vectors with similar dimension within the same cluster, irrespective of their direction. Each axis represents the unit vector, with a magnitude in [0, 1]. The unit vector is an indication of direction and is translated onto its principal axis of rotation. Here theta is 360°.
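One way to picture this translation is as a polar placement on the plane. The sketch below is an assumption-laden illustration, not the paper's transform: it takes a hypothetical direction scalar in [0, 1] (standing in for the unit-vector value), scales it by a full turn of 360° to get theta, and uses the vector's dimension divided by the cluster's maximum dimension as the radius.

```python
import math

def radial_position(direction, dimension, max_dimension):
    """Place a vector on the plane in polar form.

    direction:     hypothetical scalar in [0, 1] standing in for the
                   unit-vector value (assumption, not from the paper)
    dimension:     number of components n of the vector
    max_dimension: largest dimension among the vectors in its cluster
    """
    r = dimension / max_dimension        # radius normalized to the cluster maximum
    theta = 2.0 * math.pi * direction    # direction mapped onto a 360-degree turn
    return (r * math.cos(theta), r * math.sin(theta))
```

Under these assumptions, two vectors with the same direction value but dimensions 1 and 3 land on the same axis at different radii: `radial_position(0.0, 1, 3)` lies a third of the way out and `radial_position(0.0, 3, 3)` lies on the unit circle.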

In the transform above, r is the radius, or the scaled distance from the cluster's maximum dimension, and n is the vector's dimension. The center represents a universal time scale of zero for vectors of any dimension, radiating outwards to an infinite time scale. The equation below gives the maximum vector dimension phi of each cluster kj, taken over all vectors partitioned to the jth of the k clusters.
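The per-cluster maximum dimension phi can be sketched as a simple scan over the assigned vectors. The function name and the representation of cluster membership as a list of integer labels are assumptions made for illustration.

```python
def max_dimension_per_cluster(vectors, labels, k):
    """Return a list phi where phi[j] is the largest dimension
    (number of components) among the vectors assigned to cluster j.

    vectors: list of vectors, each a list of numbers of any length
    labels:  labels[i] is the cluster index (0..k-1) of vectors[i]
    """
    phi = [0] * k  # an empty cluster keeps phi[j] == 0
    for v, j in zip(vectors, labels):
        phi[j] = max(phi[j], len(v))
    return phi
```

With vectors of dimensions 1 and 3 in cluster 0 and dimensions 2 and 4 in cluster 1, the result is `[3, 4]`.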

Calculating Centroids or Root Mean Cluster Centers
Each cluster has a set of vectors that forms the basis for its center, sometimes called the centroid. The centroid is the standard mean, or average, of those vectors. Over successive iterations the clustering converges to a specific set of vectors, and the center of each cluster is recalculated at each iteration.
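The mean-and-reassign loop can be illustrated with ordinary k-means on points that have already been placed on the plane. This is a sketch of the standard Lloyd-style iteration, not of rK-means itself: it uses plain Euclidean distance rather than the root mean distance described in this paper, and the function names and the choice to keep the old center for an empty cluster are assumptions.

```python
import math

def centroid(points):
    """Arithmetic mean of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def kmeans_2d(points, initial_centers, max_iter=100):
    """Assign each point to its nearest center, recompute each center
    as the mean of its members, and repeat until assignments stop changing."""
    centers = list(initial_centers)  # copy so the caller's list is not modified
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no vector changed cluster
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = centroid(members)
    return labels, centers
```

For four points forming two well-separated pairs, the loop settles after one reassignment, with each center at the midpoint of its pair.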

This process continues until every vector has been partitioned into a cluster. After the clustering stops, new centers are calculated. The final clustering draws the center of each cluster connected to the location of each of its vectors. In addition, changing the value of r in the translation allows us to zoom in on a specific time scale and search for related events in nearby clusters. Clustering multidimensional vectors with rK-means is an efficient tool for analyzing relationships on multiple scales of space and time, and it mimics k-means when the maximum vector dimension is equal to two. rK-means can also scale to multidimensional vector data for time and frequency analysis applications, which allows us to understand events on multiple time scales and build better models.





The image to the left is a cluster of one hundred vectors at 18% of the maximum time scale. Figure (a), at 20%, shows the birth of two clusters. In figure (b), at 23%, the two clusters begin to diverge, and in figure (c), at 25%, the clusters reach a maximum divergence. In figure (d), at 45%, the second cluster splits into three separate clusters. On closer inspection, two of the four clusters in figure (d) have four vectors to the far left of the figure with different time scales, shown as distance from their respective centroids. The two clusters share a similar structure with respect to their directional vectors. This observation is revealed only when all of the vectors are viewed at 45% of their maximum dimension or time scale.