Welcome to Scribd. Sign in or start your free trial to enjoy unlimited e-books, audiobooks & documents.Find out more
Download
Standard view
Full view
of .
Look up keyword or section
Like this
1Activity

Table Of Contents

1.1 Data Matrix
1.2 Attributes
1.3 Data: Algebraic and Geometric View
1.3.1 Distance and Angle
1.3.2 Mean and Total Variance
1.3.3 Orthogonal Projection
1.3.4 Linear Independence and Dimensionality
1.4 Data: Probabilistic View
1.4.1 Bivariate Random Variables
1.4.2 Multivariate Random Variable
1.4.3 Random Sample and Statistics
1.5 Data Mining
1.5.1 Exploratory Data Analysis
1.5.2 Frequent Pattern Mining
1.5.3 Clustering
1.5.4 Classification
1.6 Further Reading
1.7 Exercises
Data Analysis Foundations
Numeric Attributes
2.1 Univariate Analysis
2.1.1 Measures of Central Tendency
2.1.2 Measures of Dispersion
2.2 Bivariate Analysis
2.2.1 Measures of Location and Dispersion
2.2.2 Measures of Association
2.3 Multivariate Analysis
2.4 Data Normalization
2.5 Normal Distribution
2.5.1 Univariate Normal Distribution
2.5.2 Multivariate Normal Distribution
2.6 Further Reading
2.7 Exercises
Categorical Attributes
3.1 Univariate Analysis
3.1.1 Bernoulli Variable
3.1.2 Multivariate Bernoulli Variable
3.2.1 Attribute Dependence: Contingency Analysis
3.3 Multivariate Analysis
3.3.1 Multi-way Contingency Analysis
3.4 Distance and Angle
3.5 Discretization
3.6 Further Reading
3.7 Exercises
Graph Data
4.1 Graph Concepts
4.2 Topological Attributes
4.3 Centrality Analysis
4.3.1 Basic Centralities
4.3.2 Web Centralities
4.4 Graph Models
4.4.1 Erdös-Rényi Random Graph Model
4.4.2 Watts-Strogatz Small-world Graph Model
4.4.3 Barabási-Albert Scale-free Model
4.5 Further Reading
4.6 Exercises
Kernel Methods
5.1 Kernel Matrix
5.1.1 Reproducing Kernel Map
5.1.2 Mercer Kernel Map
5.2 Vector Kernels
5.3 Basic Kernel Operations in Feature Space
5.4 Kernels for Complex Objects
5.4.1 Spectrum Kernel for Strings
5.4.2 Diffusion Kernels on Graph Nodes
5.5 Further Reading
5.6 Exercises
High-Dimensional Data
6.1 High-Dimensional Objects
6.2 High-Dimensional Volumes
6.3 Hypersphere Inscribed within Hypercube
6.4 Volume of Thin Hypersphere Shell
6.5 Diagonals in Hyperspace
6.6 Density of the Multivariate Normal
6.7 Appendix: Derivation of Hypersphere Volume
6.8 Further Reading
6.9 Exercises
Dimensionality Reduction
7.1 Background
7.2 Principal Component Analysis
7.2.1 Best Line Approximation
7.2.2 Best Two-dimensional Approximation
7.2.3 Best r-dimensional Approximation
7.2.4 Geometry of PCA
7.3 Kernel Principal Component Analysis (Kernel PCA)
7.4 Singular Value Decomposition
7.4.1 Geometry of SVD
7.4.2 Connection between SVD and PCA
7.5 Further Reading
7.6 Exercises
Frequent Pattern Mining
Itemset Mining
8.1 Frequent Itemsets and Association Rules
8.2 Itemset Mining Algorithms
8.2.1 Level-Wise Approach: Apriori Algorithm
8.2.2 Tidset Intersection Approach: Eclat Algorithm
8.2.3 Frequent Pattern Tree Approach: FPGrowth Algorithm
8.3 Generating Association Rules
8.4 Further Reading
8.5 Exercises
Summarizing Itemsets
9.1 Maximal and Closed Frequent Itemsets
9.2 Mining Maximal Frequent Itemsets: GenMax Algo-
9.3 Mining Closed Frequent Itemsets: Charm algorithm
9.4 Non-Derivable Itemsets
9.5 Further Reading
9.6 Exercises
Sequence Mining
10.1 Frequent Sequences
10.2 Mining Frequent Sequences
10.2.1 Level-Wise Mining: GSP
10.2.2 Vertical Sequence Mining: SPADE
10.2.3 Projection-Based Sequence Mining: PrefixSpan
10.3 Substring Mining via Suffix Trees
10.3.1 Suffix Tree
10.3.2 Ukkonen’s Linear Time Algorithm
10.4 Further Reading
10.5 Exercises
Graph Pattern Mining
11.1 Isomorphism and Support
11.2 Candidate Generation
11.2.1 Canonical Code
11.3 The gSpan Algorithm
11.3.1 Extension and Support Computation
11.3.2 Canonicality Checking
11.4 Further Reading
11.5 Exercises
Pattern and Rule Assessment
12.1 Rule and Pattern Assessment Measures
12.1.1 Rule Assessment Measures
12.1.2 Pattern Assessment Measures
12.1.3 Comparing Multiple Rules and Patterns
12.2 Significance Testing and Confidence Intervals
12.2.1 Fisher Exact Test for Productive Rules
12.2.2 Permutation Test for Significance
12.2.3 Bootstrap Sampling for Confidence Interval
12.3 Further Reading
12.4 Exercises
Representative-based Clustering
13.1 K-means Algorithm
13.2 Kernel K-means
13.3 Expectation Maximization (EM) Clustering
13.3.1 EM in One Dimension
13.3.2 EM in d-Dimensions
13.3.3 Maximum Likelihood Estimation
13.3.4 Expectation-Maximization Approach
13.4 Further Reading
13.5 Exercises
Hierarchical Clustering
14.1 Preliminaries
14.2 Agglomerative Hierarchical Clustering
14.2.1 Distance between Clusters
14.2.2 Updating Distance Matrix
14.2.3 Computational Complexity
14.3 Further Reading
14.4 Exercises and Projects
Density-based Clustering
15.1 The DBSCAN Algorithm
15.2 Kernel Density Estimation
15.2.1 Univariate Density Estimation
15.2.2 Multivariate Density Estimation
15.2.3 Nearest Neighbor Density Estimation
15.3 Density-based Clustering: DENCLUE
15.4 Further Reading
15.5 Exercises
Spectral and Graph Clustering
16.1 Graphs and Matrices
16.2 Clustering as Graph Cuts
16.2.1 Clustering Objective Functions: Ratio and Normalized Cut
16.2.2 Spectral Clustering Algorithm
Algorithm 16.1: Spectral Clustering Algorithm
16.2.3 Maximization Objectives: Average Cut and Modularity
16.3 Markov Clustering
16.4 Further Reading
17.1.3 Pair-wise Measures
17.1.4 Correlation Measures
17.2 Internal Measures
17.5 Exercises
Classification
Probabilistic Classification
18.1 Bayes Classifier
18.1.1 Estimating the Prior Probability
18.1.2 Estimating the Likelihood
18.2 Naive Bayes Classifier
18.3 Further Reading
18.4 Exercises
Decision Tree Classifier
19.1 Decision Trees
19.2 Decision Tree Algorithm
Algorithm 19.1: Decision Tree Algorithm
19.2.1 Split-point Evaluation Measures
19.2.2 Evaluating Split-points
19.2.3 Computational Complexity
19.3 Further Reading
19.4 Exercises
Linear Discriminant Analysis
20.1 Optimal Linear Discriminant
20.2 Kernel Discriminant Analysis
20.3 Further Reading
20.4 Exercises
Support Vector Machines
21.1 Linear Discriminants and Margins
21.2 SVM: Linear and Separable Case
21.3 Soft Margin SVM: Linear and Non-Separable Case
21.3.1 Hinge Loss
21.3.2 Quadratic Loss
21.4 Kernel SVM: Nonlinear Case
21.5 SVM Training Algorithms
21.5.1 Dual Solution: Stochastic Gradient Ascent
21.5.2 Primal Solution: Newton Optimization
Classification Assessment
22.1 Classification Performance Measures
22.1.1 Contingency Table Based Measures
22.1.2 Binary Classification: Positive and Negative Class
22.1.3 ROC Analysis
22.2 Classifier Evaluation
22.2.1 K-fold Cross-Validation
22.2.2 Bootstrap Resampling
22.2.3 Confidence Intervals
22.2.4 Comparing Classifiers: Paired t-Test
22.3 Bias-Variance Decomposition
22.3.1 Ensemble Classifiers
22.4 Further Reading
22.5 Exercises
Index
0 of .
Results for:
No results containing your search query
P. 1
Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms

Ratings: (0)|Views: 17|Likes:
Published by Fabrizio Malfanti
Data Mining and Analysis:
Fundamental Concepts and Algorithms
Data Mining and Analysis:
Fundamental Concepts and Algorithms

More info:

Published by: Fabrizio Malfanti on Oct 01, 2013
Copyright:Attribution Non-commercial

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See more
See less

05/23/2014

pdf

text

original

You're Reading a Free Preview
Pages 14 to 85 are not shown in this preview.
You're Reading a Free Preview
Pages 99 to 477 are not shown in this preview.
You're Reading a Free Preview
Pages 491 to 496 are not shown in this preview.
You're Reading a Free Preview
Pages 523 to 659 are not shown in this preview.

You're Reading a Free Preview

Download
scribd
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->