BIG DATA ANALYTICS (2017 REGULATION)
Analytics:
Analytics is the process of breaking a problem into simpler parts and using inferences based on data to
make decisions. Analytics is not a tool or a technology; rather, it is a way of thinking and acting.
Analytics is used across many industries, giving rise to fields such as Finance Analytics, Healthcare
Analytics, Retail Analytics, Telecom Analytics, and Web Analytics.
Analytics Lifecycle:
Let us consider the following stages of an analytics project lifecycle.
1. Problem Identification : A problem is a situation that is judged as something that needs to
be corrected.
2. Hypothesis Formulation : Break down problems and formulate hypotheses.
3. Data Collection
4. Data Exploration : Before a formal data analysis can be conducted, the analyst must know
how many cases are in the dataset, which variables are included, and so on.
5. Data Preparation / Manipulation : Data rarely arrives in a form that is easy to analyze. We need to
clean the data and check it for consistency; extensive manipulation is often needed before analysis can begin.
6. Model Planning / Building : This stage covers the entire process of building and implementing
the solution.
7. Validate Model
8. Evaluate/Monitor results
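The Data Exploration and Data Preparation stages above can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed method: the records, the field names (customer, age, spend), and the helper functions explore, to_number, and prepare are all made up for this example.

```python
from statistics import mean

# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"customer": "A", "age": "34", "spend": "120.5"},
    {"customer": "B", "age": "", "spend": "80.0"},
    {"customer": "C", "age": "29", "spend": "n/a"},
    {"customer": "D", "age": "41", "spend": "200.0"},
]


def explore(records):
    """Data exploration: count the cases and list the variables."""
    variables = sorted({key for record in records for key in record})
    return {"cases": len(records), "variables": variables}


def to_number(text):
    """Return text as a float, or None when it is missing or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def prepare(records, numeric_fields):
    """Data preparation: convert numeric fields and drop inconsistent rows."""
    cleaned = []
    for record in records:
        row = dict(record)
        for field in numeric_fields:
            row[field] = to_number(record[field])
        # Keep the row only if every numeric field could be parsed.
        if all(row[field] is not None for field in numeric_fields):
            cleaned.append(row)
    return cleaned


summary = explore(raw_records)
clean_records = prepare(raw_records, ["age", "spend"])
average_spend = mean(row["spend"] for row in clean_records)

print(summary)
print(len(clean_records), "usable rows, average spend:", average_spend)
```

Dropping inconsistent rows is just one possible cleaning policy; a real project might instead fill in missing values or go back to the data collection stage.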
Machine Learning:
Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use
to perform a specific task without explicit instructions. It is seen as a subset of artificial intelligence.
(The primary aim is to allow computers to learn automatically, without human intervention or assistance.)
Types of learning algorithms:
1. Supervised learning : Supervised learning algorithms are the ones that involve direct supervision of the
operation. In this case, the developer labels a sample data corpus and sets strict boundaries within which the
algorithm operates.
The most common uses of supervised learning are price prediction and trend forecasting in
sales, retail commerce, and stock trading. In each case, an algorithm uses incoming data to assess
possibilities and calculate likely outcomes.
Supervised machine learning includes two major processes:
Classification: The process in which incoming data is labeled based on past data samples. The algorithm is
trained on manually labeled examples to recognize certain types of objects and categorize them accordingly.
Regression: The process of identifying patterns and calculating predictions of continuous outcomes. The
system has to understand the numbers, their values, their grouping, and so on.
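To make the idea of regression concrete, here is a minimal sketch of fitting a straight line to labeled examples by ordinary least squares and then predicting a continuous outcome for a new input. The data points and the function names fit_line and predict are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict a continuous outcome for a new input x."""
    return slope * x + intercept


# Hypothetical labeled training data: each input x has a known outcome y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
forecast = predict(slope, intercept, 5.0)
print(slope, intercept, forecast)
```

The training data here lies exactly on a line, so the fit is perfect; real data is noisy, and the fitted line is then only the best straight-line summary of the trend.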
List of Common Algorithms:
Nearest Neighbor
Naive Bayes
Decision Trees
Linear Regression
Neural Networks
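As one illustration of classification, the following sketch implements a one-nearest-neighbor classifier, the simplest form of the Nearest Neighbor algorithm listed above: a new point receives the label of the closest labeled training point. The sample points, the labels, and the function name nearest_neighbor_predict are hypothetical.

```python
import math


def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to query."""
    distances = [math.dist(point, query) for point in train_points]
    closest = min(range(len(distances)), key=distances.__getitem__)
    return train_labels[closest]


# Hypothetical labeled samples: two groups of 2-D points.
train_points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
train_labels = ["small", "small", "large", "large"]

label_a = nearest_neighbor_predict(train_points, train_labels, (2.0, 1.0))
label_b = nearest_neighbor_predict(train_points, train_labels, (7.0, 9.0))
print(label_a, label_b)
```

Here the manual labeling is done once, on the training samples; the algorithm then categorizes unseen points on its own.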
2. Unsupervised Learning : Unsupervised learning is the kind that does not involve direct control by the developer.
Whereas in supervised machine learning you know the results and need to sort out the data, in
unsupervised machine learning the desired results are unknown and yet to be defined.
Unsupervised machine learning algorithms are used for exploring the structure of the
information, extracting valuable insights, detecting patterns, and implementing these findings into an
operation to increase efficiency (for example, in digital marketing).
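One way to picture pattern detection without labels is clustering. The sketch below uses k-means on one-dimensional data; k-means is not named in these notes and is chosen here only as a familiar example. The data values, the starting centroids, and the function name k_means are assumptions.

```python
def k_means(points, centroids, iterations=10):
    """Group 1-D points around centroids; no labels are used."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(point - centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled measurements that happen to form two groups.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = k_means(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

The algorithm is never told which group a point belongs to; the two groups emerge from the structure of the data alone.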
3. Semi-supervised Learning : Learning that typically uses a small amount of labeled data together with a
large amount of unlabeled data.
4. Reinforcement Learning : A learning method in which the algorithm interacts with its environment by
producing actions and discovering errors or rewards.
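The action-and-reward loop of reinforcement learning can be sketched with a small multi-armed bandit. The epsilon-greedy strategy used here is one common choice and is an assumption, as are the reward values, the parameters, and the function name run_bandit. Rewards are kept fixed per action so the example stays easy to follow.

```python
import random


def run_bandit(arm_rewards, steps=500, epsilon=0.2, seed=0):
    """Learn which action pays best by trial, error, and reward."""
    rng = random.Random(seed)
    estimates = [0.0] * len(arm_rewards)  # learned value of each action
    pulls = [0] * len(arm_rewards)        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            arm = rng.randrange(len(arm_rewards))
        else:
            # Exploit: pick the action with the best estimate so far.
            arm = max(range(len(arm_rewards)), key=estimates.__getitem__)
        reward = arm_rewards[arm]  # the environment answers with a reward
        pulls[arm] += 1
        # Incremental average of the rewards observed for this action.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return estimates, pulls


# Hypothetical environment: three actions with fixed rewards.
estimates, pulls = run_bandit([0.2, 0.8, 0.5])
best_arm = max(range(len(estimates)), key=estimates.__getitem__)
print("estimates:", estimates)
print("best action:", best_arm)
```

No labeled answers are supplied; the agent discovers the best action only from the rewards its own actions produce.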