
Definitions:

AI = ability of a machine to perform cognitive functions we associate with human minds

Machine learning = tool that can be used to enhance humans’ abilities to solve problems and
make informed inferences on a wide range of problems

Statistics = science concerned with developing and studying methods for collecting, analyzing,
interpreting and presenting empirical data

Overfitting = creating a regression model that fits a specific set of data too closely. It works
very well on the data at hand but performs poorly when making predictions on new data.

Probability = mathematical language used to discuss uncertain events

AI = ability of a machine to perform cognitive functions we associate with human minds, such as
perceiving, reasoning, learning, and problem solving. Examples of technologies that enable AI
to solve business problems are robotics and autonomous vehicles, computer vision, language,
virtual agents, and machine learning.

Machine Learning algorithms detect patterns and learn how to make predictions and
recommendations by processing data and experiences, rather than by receiving explicit
programming instructions. The algorithms also adapt in response to new data and experiences to
improve their efficacy over time.

Artificial Intelligence is used when we have massive amounts of data

Before you had


● Structured Data
● Generated by companies
● Regular updating
● n entries

Now you have

● Unstructured Data
● Generated by users
● Real-time updates
● n x n x … entries

The four V’s of Big Data are


- Volume (Data at rest)
- Velocity (Data in Motion)
- Variety (Data in many forms)
- Veracity (Data in Doubt)
Because digital data is so diverse, most traditional technologies struggle with its capture,
storage, and analysis.

Data governance is a collection of processes, roles, policies, standards, and metrics that
ensure the effective and efficient use of information in enabling an organization to achieve its
goals.

Data must meet standards for quality and integrity

Why do you have to have Data Governance?


● Drive precise search and retrieval
● Ensure business continuity
● Derive maximum value
● Increase consistency and confidence in decision making
● Minimize the time and effort needed to resolve data issues through a shared understanding
between business users and IT
● Protect data as a key corporate asset
● Reduce the risk of duplication
● Satisfy regulatory requirements
● Reduce risks
Data Governance has different levels of maturity:
1) Initial
● No clear data ownership
● No tools or documentation in use
● Disagreements between business users and IT due to the lack of a common language or
definitions
● Reactive approach to data issues
2) Repeatable
● No formal data ownership, stewardship, or governance (only self-appointed roles within
business units)
● Data owners commissioned for specific projects
● Standards defined for projects
● Limited collaboration on shared data, policies, etc. outside the business unit
3) Defined
● Data ownership, stewardship and governance defined, but not yet fully adopted across the
business
● Limited collaboration throughout the organization
● Shared repository of data models and documents
● Data quality measures and metrics are developed
● Data issues addressed by IT unit
4) Managed
● Implementation of data ownership, stewardship and governance
● Collaboration in place
● Support of executive level
● Governance process regularly reviewed
● Models, data documents and data quality reviewed and approved
5) Optimized
● Data ownership, stewardship & governance enforced and improved continuously
● Issues about data ownership, control, access and possession (OCAP) are
resolved
● Data governance process employed

Artificial Intelligence is used in:


- Recognizing patterns: speech recognition, facial identification, etc.
- Recommender Systems:
- Noisy data, commercial pay-off (e.g., Amazon, Netflix)
- Information retrieval: Find documents or images with similar content
- Computer vision: detection, segmentation, depth estimation, optical flow, etc.
- Robotics: perception, planning, autonomous driving, etc.
- Learning to play games
- Internet of Things

Statistics is the science concerned with developing and studying methods for collecting,
analyzing, interpreting and presenting empirical data
Two fundamental ideas in the field of statistics are uncertainty and variation
Probability is a mathematical language used to discuss uncertain events
Statisticians attempt to understand and control (where possible) the sources of variation in any
situation
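
To illustrate uncertainty and variation, here is a minimal sketch of sampling variability
(the population parameters and sample size are arbitrary choices for the example, not from the
notes):

```python
# Minimal sketch: uncertainty and variation, seen through repeated sampling (uses only NumPy)
import numpy as np

rng = np.random.default_rng(0)

# Draw several samples from the same population: each sample mean differs (variation),
# so any single estimate of the population mean carries uncertainty.
sample_means = [rng.normal(loc=100, scale=15, size=50).mean() for _ in range(5)]
print([round(m, 2) for m in sample_means])
```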

Machine Learning is used when:


● Human expertise does not exist
● Humans are unable to explain their expertise
● Solutions are complex and change over time

Machine learning is a tool that can be used to enhance humans’ abilities to solve
problems and make informed inferences on a wide range of problems
Machine Learning is also a category of algorithms that allows software applications to become
more accurate in predicting outcomes without being explicitly programmed

The processes involved in machine learning require searching through data to look for patterns
and adjusting program actions accordingly

Machines that learn are useful to humans because, with all of their processing power, they are
able to more quickly highlight or find patterns in big data that would have otherwise been
missed by human beings

Machine Learning can be:


1) Descriptive - what happened?
2) Predictive - what will happen?
3) Prescriptive - What to do to achieve goals?

When using machine learning we may be overfitting, that is, believing in spurious correlations
(which do not entail causation). The remedy for overfitting is cross-validation or backtesting to
assess stability:
● Divide the data into training and testing subsamples
● Fit the model using the training data, then assess it using the testing data (a minimal sketch follows)
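
As an illustration, here is a minimal sketch of the train/test split and cross-validation idea
using scikit-learn (the toy dataset, the linear model, and the split ratio are assumptions made
for the example, not part of the notes):

```python
# Minimal sketch: hold-out validation and cross-validation to detect overfitting
# (assumes scikit-learn is available)
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Toy data stands in for the real problem
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# Divide the data into training and testing subsamples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit on the training data, assess on the testing data
model = LinearRegression().fit(X_train, y_train)
print("train R^2:", model.score(X_train, y_train))
print("test  R^2:", model.score(X_test, y_test))  # a large gap suggests overfitting

# Cross-validation: repeat the split several times to assess stability
print("5-fold CV R^2:", cross_val_score(LinearRegression(), X, y, cv=5))
```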

Machine Learning can either be

SUPERVISED
- Data scientists determine which variables, or features, the model should analyze and
use to develop predictions
- Once training is complete, the algorithm will apply what was learned to new data

Or UNSUPERVISED

- Do not need to be trained with desired-outcome data
- Use an iterative approach called deep learning to review data and arrive at conclusions
- Used for more complex processing tasks (image recognition, speech-to-text and natural
language generation)
- Work by combing through millions of examples of training data and automatically
identifying often subtle correlations between many variables
- Once trained, the algorithm can use its bank of associations to interpret new data
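
To make the contrast concrete, here is a minimal sketch (the dataset and the particular
algorithms are arbitrary choices for illustration): the supervised classifier is trained with
labelled outcomes, while the unsupervised algorithm is given only the features.

```python
# Minimal sketch: supervised vs. unsupervised learning (assumes scikit-learn is available)
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: the model is trained with the desired outcomes (labels y)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print("predicted class for a flower:", clf.predict(X[:1]))

# Unsupervised: only the features X are provided; the algorithm finds structure on its own
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignment for the same flower:", km.predict(X[:1]))
```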

The objectives of Machine Learning optimization are:


- Minimize loss of accuracy + cost of search/complexity
- Minimize bias + uncertainty/variability (i.e., mistakes + lack of replicability)
- Minimize distance to the best response + errors made while searching for it

Predictors should be
- Expressive
- Easy to train
- Not prone to overfitting

DEEP LEARNING

Deep learning is a type of machine learning that can process a wider range of data resources,
requires less data preprocessing by humans, and can often produce more accurate results than
traditional machine-learning approaches.
In deep learning, interconnected layers of software-based calculators known as “neurons” form
a neural network. The network can ingest vast amounts of input data and process them through
multiple layers that learn increasingly complex features of the data at each layer. The network
can then make a determination about the data, learn if its determination is correct, and use what
it has learned to make determinations about new data. For example, once it learns what an
object looks like, it can recognize the object in a new image.

Deep learning became prominent around 2012

Deep learning was enabled by


- Computing power (GPUs, Graphics Processing Units, originally developed for games)
- Availability of databases of images for training, e.g. ImageNet at Stanford
- The specific architecture of deep learning networks

Deep learning networks are mainly of two types:


1) Convolutional neural network
a) A multilayered neural network with a special architecture designed to extract
increasingly complex features of the data at each layer to determine the output
2) Recurrent neural network
a) A multilayered neural network that can store information in context nodes,
allowing it to learn data sequences and output a number or another sequence
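
For intuition, here is a minimal sketch of both architectures (the choice of PyTorch, the layer
sizes, and the input shapes are assumptions made for illustration, not a prescribed design):

```python
# Minimal sketch of the two main deep-learning architectures (assumes PyTorch is installed)
import torch
import torch.nn as nn

# Convolutional neural network: each layer extracts increasingly complex image features
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # low-level features (edges, textures)
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # higher-level features (shapes, parts)
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),                           # output: e.g. scores for 10 object classes
)
print(cnn(torch.randn(1, 1, 28, 28)).shape)       # one 28x28 grayscale image -> 10 scores

# Recurrent neural network: the hidden state acts as the "context nodes" that store
# information about the sequence seen so far
rnn = nn.RNN(input_size=5, hidden_size=16, batch_first=True)
output, hidden = rnn(torch.randn(1, 7, 5))        # a sequence of 7 steps, 5 features each
print(output.shape, hidden.shape)                 # per-step outputs and final context state
```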

Main Machine Learning Techniques

- Regression based: Lasso, Ridge, Elastic Net


- involves identifying a correlation -- generally between two variables -- and
using that correlation to make predictions about future data points (see the sketch
after this list)
- Clustering: nearest neighbors, k-means
- groups data points into a specified number of clusters based on shared
characteristics
- Decision Trees and Forests
- use observations about certain actions and identify an optimal path for arriving at
a desired outcome (both in regression & classification)
- Support Vector Machine
- A classifier based on a separating hyperplane whose position is determined by
the data points closest to it (the support vectors), while tolerating a few
misclassified items. The “machine” version uses a kernel (a nonlinear function)
to extend the method to nonlinear separation.
- Reinforcement Learning
- involves models iterating over many attempts to complete a process. Steps that
produce favorable outcomes are rewarded and steps that produce undesired
outcomes are penalized until the algorithm learns the optimal process
- Neural Networks and Deep Learning
- Utilize large amounts of training data to identify correlations between many
variables and learn how to process incoming data in the future. Deep learning
decomposes this processing into a hierarchy of simpler, unitary decisions across layers.
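
Here is a minimal sketch of the regression-based techniques named above (Lasso, Ridge,
Elastic Net) using scikit-learn; the toy data and the regularization strength alpha are
assumptions made for the example:

```python
# Minimal sketch: regularized regression (Ridge, Lasso, Elastic Net) with scikit-learn
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=30, n_informative=5,
                       noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# alpha controls the penalty on model complexity (larger alpha = simpler model, less overfitting)
for model in (Ridge(alpha=1.0), Lasso(alpha=1.0), ElasticNet(alpha=1.0, l1_ratio=0.5)):
    model.fit(X_train, y_train)
    print(type(model).__name__, "test R^2:", round(model.score(X_test, y_test), 3))
```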

To pass the Turing Test a computer should have


- Natural language processing
- Knowledge representation
- Automated reasoning
- Machine Learning

If we also consider the physical world, it should also have


- Computer vision
- Robotics (to manipulate objects)
