
Key

AI & Machine learning: brainstorming project planning


Step 1: Objectives Step 2: Audit
= human perception
Exploration/ Cleaning Collection + Cleaning
= human preferences Transformation
Audit: Explore ☺ Scrape (Python, VBA)
= human knowledge Objective: Are your Need
Anova ☺ Surveys Need to Divide by largest value (0 < x < 1) ☺
= human motivation No do you Yes data clean No more
have one? or Cross tabs ☺ data or Yes Download
transform Yes Convert to Binary ☺
= human communication your data?
complete? Plots ☺ variables?
(Check licences) Create Unique Keys ☺
= GPU may be required
Sort Log Transformation
☺ = favorites or must try No
Groupby Count Queries
No Square Root Transformation
Yes Remove Illegal Characters No Missing Values Arcsine transformation
Set Objectives Visualizations No Sample Balancing
Imputation & Instrumental Variables ☺
Ratios
Logical exclusions/inclusions ☺ Mean substitution
Financial & Social Sample Box-Cox (Square Root + log)
Under sampling (random) Delete incomplete records
"The Equation" ☺ Reduction balanced
Too many Collect more data Normalize (Min-Max)
Over sampling (random) ?
KPIs Principal component analysis (PCA) ☺ variables All clean, Change y or x Yes
SMOTE (synthetic minority) Missing Standardize (Z-score)
Methods Yes with Yes balanced, Copy adjacent observations
Factor Analysis ☺ Values?
high complete? Weight balancing Maximum likelihood estimates RobustScaler
2 x 2 matrix* (ease vs impact) ☺ Hierarchical Factor Analysis (HFA) ☺ correlations? Change prediction thresholds Expectation-Maximization imputation Create Percentiles
Killer Ideas Workshop ☺ Multidimensional scaling (MDS) ☺ Multiple imputation
Co-creation workshop ☺ Yes No Create Scales
Singular value decomposition (SVD)
Problem finding
No
Correspondence Analysis Step 4: Options
Framing Geostatistics
Ideation Cognitive/ Symbolic Computing Data Identify Kriging & Variogramming ☺
only?
Yes Geographic? Yes
Creative thinking Supervised Machine Learning Data Type Simulation
Need to
group the Cholesky decomposition
Cluster Analysis Yes data into
Markov Chain Geostats
K-means ☺
classes, Skip Step 3 No Support Vector Machine
segments,
Hierarchical k-means ☺ etc? Cellular automata
Hierarchical Cluster Analysis ☺ etc.
Task
CLARA ☺ No Spatial autocorrelations
K-medoids ☺ Rescale Moran's I
DBSCAN ☺ Clean Time Series Cross-Sectional & Geary's C
Do you have Yes Images? Text (x) Time Series Cross-Sectional
Fuzzy ☺ Yes See Transformation
a model? and
HCPC ☺ Data (y, x)
Sort (APRIORI)
Tools
Text? Yes Univariate/ Multivariate* CART (regressor/classifier)
What type
of y
Random Forest (regressor) variable?
No Yes Statistical
Do you Step 3: Modelling Yes Econometrics
ETS (automatic) ☺ ─ Time series Analysis
want to
predict, ARIMA (automatic) ☺ ─ Structural Models
Modelling ( y = f (x) ) Do
or classify, images VAR ☺ (Vector Autoregressive) Neural Networks
explain Modelling Workshop Yes No have Can the text Econometrics ☺ (custom models)
something? List all y's captions? be
transformed Eyeballing, etc. ☺
List all x's for each y Damped (exponential smoothing) Binary (y = 0,1) Categorical/Nominal Continuous
into data Ranked (y = 1,2,3, etc.)
(binary, (y = classes) (bounded, unbounded)
Tools Theta model Statistical
cardinal)? Statistical Statistical-Bounded (Sigmoid)
Funnel Analysis ☺ Yes SES (simple exponential smoothing) Logistic regression (LR) ☺
Statistical
No Ordinal regression ☺
Logistic regression
Customer Decision Journey Holt-Winters Probit model Multinomial Logistic (Logit)
Non-linear regression
Regression ☺ Ordinal logistic regression ☺
Markov Chain ☺ OLS Stepwise logistic regression Bass model
Machine
Bayesian Belief Networks Translation Tools (Captions) No Stepwise Regression Linear Discriminant Analysis Multinomial Probit Conjoint Analysis ☺
Tools (Descriptive, other) Diffusion Curves
? See Text Tools Combinations or averages Log-binomial regression Categorical regression
Boxes and Arrows (etc.) ☺ Machine Learning Statistical-Unbounded
1 Variable
Inspirations Machine Learning Poisson regression Ordered probit Survival/hazard (y = Time)
Discrete Scales No
Tools BNNJ (Bayesian Neural Network) Cox regression Decision Trees ☺ Cumulative-logit model Conjoint Analysis
Physics, mathematics, chemistry,
Percentiles, sorts, filter
Tools (images) CART ☺ RNNJ (Recurrent Neural Networks) Machine Learning OLS regression, GLM
Counts/histograms biology, botany, zoology, etc. Yes Random Forest ☺ Continuation-ratio model
Random Forest ☺ MLP (Multi-layer perceptron) Decision Trees (CART) ☺ Ridge regression
Proportions Psychology, sociology, economics, Neural Nets (deep learning) Partial Proportional Odds
Neural Networks SVR (Support Vector Regression) Random Forests ☺ K-Nearest Neighbour model Robust regression
Mean, Median, Mode linguistics , geography, etc. Convolutional Neural Networks
etc. GP ( Gaussian Processes) Bayesian Networks Lasso regression
Machine Translation (MT) (CNN) Adjacent-category logit model
Standard Deviations Finance, marketing, operations, Naïve Bayes Elastic Net regression
GRNN (Generalised Regression Support Vector Machines (SVM)
Rule-based (e.g. Apertium) etc. Polytomous logistic model
Variance organizational behaviour, etc. Neural Networks) Naïve Bayes Support Vector Machines Stepwise regression
Sources Statistical (e.g. Moses) RBF (Radial Basis Functions) (SVM) Stereotype logistic model Tobit regression
Skew, Kurtosis K-Nearest Neighbour
Neural (e.g. OpenNMT, PyTorch, KNN (K-Nearest Neighbour Extreme Learning Machines Machine Learning Econometrics (custom)
Ranges, quartiles arXiv.org, PubMed, Agricola, Metrics
TensorFlow) Regression)
ROC (receiver operating Neural Networks See categorical for few ranks Machine Learning
Coefficient of variation eric.ed.gov, researchgate.net,
LSTM (long short-term memory)
Want to characteristic) Decision Trees (CART) ☺
Box plots worldcat.org, stack exchange/overflow CART (regression trees)
generate
GitHub, Google/YouTube, experts Yes new
No AUC (area under the curve) Random Trees ☺
Tests Kolmogorov-Smirnov, Random Forest (regression) Confusion matrix / Contingency
images? Neural Networks
Tools etc. Table etc.
Anderson-Darling,
Shapiro-Wilk, Lilliefors Generative Adversarial
Text Tools Networks (GAN)
t-test
Control Charts Semantic Graph Analysis ☺ etc.

2+ Variables Text Scoring ☺


Deliverables (GUI, etc.) Step 5: Allocation
Scales Natural Language Programming
Output Visualization Interfaces Format Physicality Diagnostics Optimization
Correlations Natural Language Processing (NLP)
No required? Yes Allocation & Control
See data reduction Optimal budgets Excel Expert System Pdf Screen Q-Q Plots
Natural Language Understanding (NLU) Delphi Asset Allocation
Optimal stock
See clustering Access Chat bot Word Robot Dendograms Markowitz-MPT (Modern Portfolio
Natural Language Creativity (NLC) advisor Reinforcement Learning (RL) Response Analysis ☺
Plots Theory) ☺
Autodriving Decision support Q-learning (off policy, model free)
Natural Language Speculation (NLS) D3.js RSS Automobile ROC/AUC Optimization algorithms Kelly Criterion
Spearman R. system system
Latent Dirichlet allocation (LDA, SLDA)
SARSA (on policy, not greedy) No Objective Dynamic Programming Sharpe Ratio
Kendall Tau Fraud detector AmCharts GUI XML Companion Perceptual Maps DQN (deep Q network, neural net) Achieved?
Sentiment Analysis DDPG (deep deterministic policy gradient) Stochastic Knapsack Fama and French (3 factor model)
Pearson Correlation
Resume sorter Highcharts Wikis Portals Design methods Value Curves NAF (normalized advantage function) Linear Programming Black-Litterman
etc. Custom Coding
Spam filters GeoGebra Talking heads APP Human Factors Spider Diagrams A3C (model free; Asynchronous Advantage Non-linear Programming Other (multi-factor, event-based)
Actor-Critic) Integer Linear Programming
etc. Custom etc. JPG Ergonomics 2 x 2 matrix
Yes
HRL (hierarchical reinforcement learning)
Geometric Programming
Audio/Video Anthropometrics Plots/scatter/bar HIRO (Google Brain, HRL, DDPG)
Feudal (HRL, Feudal RL)
etc.
© Philip M. Parker, PhD., INSEAD (Chair Professor of Management Science). 2019-2020.
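
The Transformation options named in the chart ("Divide by largest value", "Normalize (Min-Max)", "Standardize (Z-score)") can be illustrated with a minimal Python sketch. This is not taken from the chart itself. The function names and sample values are hypothetical, and the population standard deviation is an assumed choice for the Z-score. Note that dividing by the largest value maps positive data into the range 0 < x <= 1, so the maximum itself becomes exactly 1.

```python
from statistics import mean, pstdev


def divide_by_max(values):
    """Scale positive values by the largest one; results fall in (0, 1]."""
    largest = max(values)
    if largest == 0:
        raise ValueError("largest value is zero; cannot divide")
    return [v / largest for v in values]


def min_max(values):
    """Min-max normalization: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; range is zero")
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Z-score standardization using the population standard deviation (assumed)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("standard deviation is zero")
    return [(v - mu) / sigma for v in values]


# Hypothetical sample data, for illustration only.
sample = [2.0, 4.0, 6.0, 8.0]
scaled_by_max = divide_by_max(sample)
scaled_min_max = min_max(sample)
standardized = z_score(sample)
```

Which rescaling fits best depends on the later modelling step: min-max keeps a fixed 0 to 1 range, while the Z-score centres the data at zero with unit spread.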
