To use the toolbox for the design of a data mining algorithm, a training data set is required. This data set is normally given by a binary Matlab project file containing matrices and vectors with predefined structures and names. Additionally, the user can add textual identifiers and further information to the matrices and structures. Missing information is filled in with standard values and identifiers. Data can also be imported from text files (single files or complete directories, single features or time series).

The training data set is organized with n = 1, ..., N data points, each containing
- time series (described by a matrix whose dimensions are the number of time series and K, the number of sample points),
- s single features (vector with the dimension s), and
- discrete output variables (vector with one entry per output variable).

The management of multiple output variables for each data point (e.g. diagnoses with respect to diseases in medical applications, decisions for therapies, qualitative evaluations of therapy successes, gender, age groups etc.) allows a flexible selection of multiple classification problems. Additionally, input and output variables may be switched depending on the problem.

Gait-CAD implements the standardized data mining process proposed by . The main components are shown in Figure 2. Gait-CAD permits a comfortable handling of numerous algorithms for the
- selection of data points (e.g. detection of outliers, discarding of incomplete data points and features, selection of parts of data sets),
- feature extraction (e.g. spectrograms, FFT analysis, correlation analysis, linear filtering, calculation of extrema, mean values, fuzzification etc.),
- evaluation and selection of features and time series (e.g. multivariate analysis of variances, t-test, information measures, regression analysis),
- feature aggregation (e.g. discriminant analysis, principal component analysis - PCA, independent component analysis - ICA),
- supervised and unsupervised classification (e.g. decision trees, cluster algorithms, Bayes classifier, artificial neural networks - ANN, nearest neighbour algorithms, support vector machines - SVM, fuzzy systems), and
- validation strategies (e.g. cross-validation, bootstrap).

Additionally, there are various possibilities to visualize results, to log results and process steps automatically in text and LaTeX files, to rename variables etc.

For some functions, Gait-CAD uses additional commercial Matlab toolboxes (e.g. the Signal, Statistics, Neural Network, and Wavelet toolboxes from The MathWorks, Inc.) or freely available GNU-GPL toolboxes. However, most of the self-implemented functions require only a standard Matlab installation.

Figure 2: Design process of a data mining algorithm [figure labels: problem formulation (verbalized), problem formulation (formalized), collecting training data set, evaluation measures, design of a data mining method (Gait-CAD)]

The extraction of new time series and single features from existing time series is realized with plugins. Plugins are single Matlab functions, which are included in a special directory or in the working directory. They generate
- new time series from one (e.g. by low-pass or high-pass filtering, segmentation) or more (e.g. minimum, mean or maximum value) existing time series, or
- new single features from one time series in a predefined segment (e.g. mean value for the complete time series or the first 50% of sampling points). The segment can be defined by a special file or interactively by selecting a region of interest.

Gait-CAD contains a large number of predefined plugins and segments. The structure allows a user-defined expansion with special feature types for each specific application field.
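The two kinds of plugin output can be illustrated with a minimal sketch. It is written in Python rather than Matlab and does not reproduce Gait-CAD's plugin interface: the function names, the dictionary layout of a data point, and the signal names `knee` and `hip` are assumptions made purely for illustration. A centred moving average stands in for the low-pass filter mentioned above.

```python
from statistics import mean

# Hypothetical layout of ONE data point: time series name -> K sample values.
# This is an illustration only, not Gait-CAD's Matlab project structure.


def moving_average(series, width=3):
    """New time series from ONE series: centred moving average (crude low-pass)."""
    half = width // 2
    # The window is truncated at both ends of the series.
    return [mean(series[max(0, k - half): k + half + 1]) for k in range(len(series))]


def pointwise_max(series_list):
    """New time series from SEVERAL series: maximum value at each sample point."""
    return [max(values) for values in zip(*series_list)]


def segment_mean(series, start=0.0, stop=1.0):
    """New single feature: mean over a segment given as fractions of the K samples.

    An empty segment makes statistics.mean raise StatisticsError.
    """
    k = len(series)
    return mean(series[int(start * k): int(stop * k)])


point = {"knee": [0.0, 2.0, 4.0, 6.0], "hip": [1.0, 1.0, 5.0, 3.0]}

# Plugin type 1: new time series from one or more existing time series.
point["knee_lowpass"] = moving_average(point["knee"])
point["max_knee_hip"] = pointwise_max([point["knee"], point["hip"]])

# Plugin type 2: new single features from one time series in a segment.
features = {
    "knee_mean_all": segment_mean(point["knee"]),                   # complete series
    "knee_mean_first_half": segment_mean(point["knee"], 0.0, 0.5),  # first 50%
}
```

In this sketch the segment is passed as a pair of fractions of the sample points. In the toolbox itself the segment comes from a special file or from an interactively selected region of interest.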
Macros are recorded sequences of clicked menu items and control elements. Their main advantages are the automation of long sequences of operations (e.g. for use in different projects) and the opportunity to integrate user-defined functions. Manual modification is possible because macros are stored in textual Matlab syntax.
Application-specific extension packages can be easily integrated into the graphical user interface. Gait-CAD contains templates for new menu items and control elements as a starting point for manual modification. This allows the integration of user-defined functions using any parameter from the control elements or any available variable. An example is a special package for electroneurography provided by the University of Freiburg. It contains the algorithms described in .
In many clinical applications, the available data set contains time series of recorded bioelectric signals such as muscle, nerve, or brain signals.

The automatic design of data mining solutions offers an objective and reliable method for the generation of hypotheses for clinical trials, the data-based design of clinical decision support systems for diagnosis and therapy planning, and the adaptation of medical devices to individual patients.

An example of the latter task is the detection of user intentions from brain, nerve or muscle signals or the information processing of nerve signals from natural limbs for neuroprostheses (Figure 3).