Elements of Analytics
Definition
Data analytics is the use of processes and technology to extract valuable insight out of datasets.
This insight is then applied in a number of ways depending on the business, its industry, and other
unique requirements.
This is important because it helps businesses become data-driven, meaning decisions are supported
through the use of data.
Data analytics is also helping businesses to predict problems before they occur and map out possible
solutions.
Use of Analytics
1. For Summarizing
2. For Exploring associations
3. For Exploring relationships
4. For Classifying / grouping
For Summarizing
Sources for forecasting in data science
Time Series Data: A type of data that is collected and recorded over time, such as sales data,
stock prices, or weather patterns. This data can be used to create statistical models that can be
used to make predictions about future values.
External Data: Data that is sourced from outside an organization, such as economic indicators,
demographic data, or industry-specific data. This data can provide additional context and
information that can be used to inform forecasting models.
Predictive Models: Mathematical models that use historical data to make predictions about
future events. These models can be based on a variety of techniques, such as regression
analysis, decision trees, or machine learning algorithms.
Machine Learning: A type of artificial intelligence that involves the use of algorithms to learn
patterns in data and make predictions. Machine learning models can be used for forecasting by
training the algorithms on historical data and using them to make predictions about future values.
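As a minimal sketch of forecasting from time series data, the Python snippet below fits a straight-line trend to a short series by ordinary least squares and extrapolates one period ahead. The monthly sales figures and the function name `linear_trend_forecast` are hypothetical, chosen only for illustration; real forecasting models usually also account for seasonality and external data.

```python
def linear_trend_forecast(values, steps_ahead=1):
    """Fit y = intercept + slope * t by least squares and extrapolate.

    `values` are observations at equally spaced times t = 0, 1, 2, ...
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    times = range(n)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    # Slope = covariance(t, y) / variance(t)
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    denominator = sum((t - t_mean) ** 2 for t in times)
    slope = numerator / denominator
    intercept = y_mean - slope * t_mean
    # The last observed time index is n - 1
    return intercept + slope * (n - 1 + steps_ahead)


# Hypothetical monthly sales figures (illustrative only)
monthly_sales = [100, 110, 120, 130, 140]
forecast = linear_trend_forecast(monthly_sales)
print(forecast)  # 150.0
```

A straight line is the simplest possible predictive model; it is used here only because it is easy to check by hand.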
Use of Data Analytics in Media and Entertainment
Unleashing the Data Landscape – Types and Reading it Right
• Data that resides in a fixed field within a record or file is called structured data.
• Structured data has the advantage of being easily entered, stored, queried and analyzed.
• The phrase “unstructured data” usually refers to information that doesn’t reside in a traditional row-column database.
• Unstructured data files often include text and multimedia content.
Data and its types – how to classify
• Qualitative Data - represent some characteristics or attributes
• Quantitative Data - represented numerically and calculations can be performed on them
• Primary Data - collected for the first time by an investigator for a specific purpose
• Secondary Data - sourced from someplace that has originally collected it
• Discrete Data - can take only certain specific values rather than any value in a range
• Continuous Data - can take any value within a range, between its lowest and highest values
Different Data Types
• Tracking patterns
• Classification
• Association
• Outlier detection
• Clustering
• Regression
• Prediction
Some examples of data analysis tools & techniques
• Frequency Counts (Number & Percentages)
• Cross-tabulations
• Mean, Median, Mode, SD
• Classification
• Association
• Correlation
• Cluster Analysis
• Factor Analysis
• Regression
• Anomaly Detection
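Frequency counts, percentages and cross-tabulations can be sketched in a few lines of Python. The survey responses below are hypothetical and exist only to show the calculation.

```python
from collections import Counter

# Hypothetical survey responses: (gender, preferred platform)
responses = [
    ("F", "TV"),
    ("M", "Streaming"),
    ("F", "Streaming"),
    ("M", "Streaming"),
    ("F", "TV"),
]

# Frequency count of one variable
platform_counts = Counter(platform for _, platform in responses)

# The same counts expressed as percentages of all responses
total = len(responses)
platform_pct = {p: 100 * c / total for p, c in platform_counts.items()}

# Cross-tabulation: count of each (gender, platform) combination
crosstab = Counter(responses)

print(platform_counts["Streaming"])  # 3
print(platform_pct["TV"])            # 40.0
print(crosstab[("F", "TV")])         # 2
```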
Data Variables
• Scale (Measurements / Numerical / count)
– Continuous: Measurements, takes any value
– Discrete: Counts / integers
• Categorical (appear as categories)
– Ordinal: obvious order
– Nominal: no meaningful order
Variables in Analytics
Variables are any characteristics that can take on different values, such
as height, age, species, or exam score.
We often want to study the effect of one variable on another. For
example, you might want to test whether students who spend more time
studying get better exam scores.
The variables in a study of a cause-and-effect relationship are called the independent and
dependent variables.
The independent variable is the cause. Its value is independent of other variables in your
study.
The dependent variable is the effect. Its value depends on changes in the independent
variable.
Another definition:
An independent variable is a variable that represents a quantity that is being manipulated
in an experiment.
Examples:
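A minimal sketch of the study-time example: hours studied is treated as the independent variable and exam score as the dependent variable. The five data points are invented for illustration, and the helper names `pearson_r` and `fit_line` are hypothetical. A strong correlation in observed data does not by itself prove cause and effect.

```python
from math import sqrt

# Hypothetical data: hours studied (independent), exam score (dependent)
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    slope = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / sum(
        (a - x_mean) ** 2 for a in x
    )
    return slope, y_mean - slope * x_mean


r = pearson_r(hours, scores)
slope, intercept = fit_line(hours, scores)
print(f"r = {r:.2f}, score = {intercept:.1f} + {slope:.1f} * hours")
```

In this made-up data set each extra hour of study is associated with about 5.6 more points.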
Measurement of Frequency using Graphs
When Represented using graphs
Examples:
Central Tendency
A central tendency, or average, is a single value that represents a whole set of figures; the
individual items concentrate around it.
Because an average lies somewhere within the range of the data, it is called a measure of
central tendency.
Measure of Central Tendency
A measure of central tendency is a summary statistic that represents the center point or
typical value of a dataset.
These measures indicate where most values in a distribution fall and are also referred
to as the central location of a distribution.
Mean
Median
Mode
The mean is the average of a data set: the sum of the observations divided by the number
of observations.
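The three measures can be computed with Python's standard `statistics` module. The data values below are hypothetical.

```python
import statistics

# Hypothetical observations
data = [2, 4, 4, 5, 7, 8]

# Mean: sum of observations divided by number of observations
mean_manual = sum(data) / len(data)
mean_value = statistics.mean(data)

# Median: middle value of the sorted data (average of the two middle
# values when the count is even)
median_value = statistics.median(data)

# Mode: the most frequently occurring value
mode_value = statistics.mode(data)

print(mean_manual)   # 5.0
print(median_value)  # 4.5
print(mode_value)    # 4
```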
Using MS Excel
• Data Collection
• Data Cleaning
• Data Exploration (using Pivot Table)
• Data Visualization
MS Excel - techniques
• CONDITIONAL FORMATTING
• CONCATENATE
• TRIM
• TRANSPOSE
• SORT
• FILTER
• TEXT TO COLUMNS
• REMOVE DUPLICATES
• LEN
• LEFT-RIGHT-MID
• UPPER, LOWER & PROPER
• PIVOT TABLE
• SUBTOTAL
• VLOOKUP
• DATA ANALYSIS TOOLPAK
• MACROS
STEPS
• Identify errors and inconsistencies
• Remove duplicate values
• Remove spaces
• Fill in missing values
FORMULAS
• REMOVE DUPLICATES
• TRIM
• TEXT TO COLUMNS
• CONCATENATE
• LEFT, RIGHT & MID
• UPPER, LOWER & PROPER
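The cleaning steps above can be sketched outside Excel as well. The Python snippet below is an illustrative analogue, not an Excel feature: the rows, the column names and the choice to fill a missing sales value with 0 are all assumptions made for the example.

```python
# Hypothetical raw rows with stray spaces, inconsistent case,
# a duplicate record and a missing value
raw_rows = [
    {"name": "  alice  smith ", "city": "Pune", "sales": "120"},
    {"name": "Alice Smith", "city": "pune", "sales": "120"},
    {"name": "BOB JONES", "city": " Mumbai", "sales": ""},
]


def clean_row(row, default_sales="0"):
    """Normalize one row: remove extra spaces, fix case, fill missing sales."""
    # Like TRIM + PROPER: collapse repeated spaces, strip ends, title-case
    name = " ".join(row["name"].split()).title()
    city = row["city"].strip().title()
    # Fill in a missing value (assumption: treat a blank as 0)
    sales = row["sales"].strip() or default_sales
    return {"name": name, "city": city, "sales": int(sales)}


def clean(rows):
    """Clean every row, then drop exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for row in rows:
        fixed = clean_row(row)
        key = (fixed["name"], fixed["city"], fixed["sales"])
        if key not in seen:
            seen.add(key)
            cleaned.append(fixed)
    return cleaned


cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Duplicates are removed after normalization because the two "Alice Smith" rows only become identical once spacing and case are fixed.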
Function & Formulas for Data Analysis in Excel
❑ Conditional Formatting: Conditional formatting in Excel enables you to highlight cells with a certain color,
depending on the cell's value.
❑ Sort: You can sort your Excel data on one column or multiple columns. You can sort in ascending or
descending order.
❑ Filter: Filter your Excel data if you only want to display records that meet certain criteria.
❑ Subtotal: The SUBTOTAL function is a built-in Excel function that is used to calculate subtotals within a
range of data. The function can perform a variety of calculations, including sum, count, average, minimum,
and maximum.
❑ Pivot Tables: Pivot tables are one of Excel's most powerful features. A pivot table allows you to extract
the significance from a large, detailed data set.
❑ Analysis ToolPak: The Analysis ToolPak is an Excel add-in program that provides data analysis tools for
financial, statistical and engineering data analysis.
Conditional Formatting
Options
• Highlight Rules
• Top / Bottom Rules
• Data Bars
• Color Scales
• Icon Sets
• Find Duplicates
• Manage Rules
Sort
• Basic Sort
• A to Z
• Z to A
• Sort by Color
• Reverse List
• Custom Sort
• Levels
• Top to Bottom
• Left to right
Filter
• Filters
• Basic Filters
• Advanced Filter
• Options
• Number Filter
• Text Filter
• Date Filter
• Filter by Color
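For comparison, a multi-level sort and a number filter can be sketched in Python. The records below are hypothetical; the sort mirrors a custom sort with two levels, and the filter mirrors a "greater than or equal to" number filter.

```python
# Hypothetical sales records
records = [
    {"region": "North", "product": "A", "units": 30},
    {"region": "South", "product": "B", "units": 45},
    {"region": "North", "product": "B", "units": 12},
    {"region": "East", "product": "A", "units": 45},
]

# Two-level sort: units largest to smallest, then region A to Z
sorted_records = sorted(records, key=lambda r: (-r["units"], r["region"]))

# Number filter: keep only records with at least 30 units
filtered = [r for r in records if r["units"] >= 30]

print([r["region"] for r in sorted_records])  # ['East', 'South', 'North', 'North']
print(len(filtered))                          # 3
```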
Pivot Tables
PivotTables allow you to easily organize, filter, summarize, and analyze raw data.
• Create Pivot Table
– Select Data Columns
– Go to Insert Tab
– Create Pivot Table
– Select a table or range
– Choose destination
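What a pivot table computes can be sketched in Python as a grouped sum: one field goes to rows, one to columns, and a numeric field is summarized. The sales rows below are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw data: (region, product, units)
sales = [
    ("North", "A", 30),
    ("South", "B", 45),
    ("North", "B", 12),
    ("South", "A", 8),
    ("North", "A", 20),
]

# Rows = region, columns = product, values = sum of units
pivot = defaultdict(lambda: defaultdict(int))
for region, product, units in sales:
    pivot[region][product] += units

# Row totals, similar to a Grand Total column
row_totals = {region: sum(cols.values()) for region, cols in pivot.items()}

for region in sorted(pivot):
    print(region, dict(pivot[region]), "total:", row_totals[region])
```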
Pivot Tables Options
Pivot Tables & Slicers
Data Analysis tool pack in Excel
The Data Analysis ToolPak is an add-in feature for Microsoft Excel that provides advanced data analysis
functions, tools, and techniques. It is designed to help users perform complex statistical and engineering
analyses in Excel quickly and easily.
Data Analysis tool pack techniques
Descriptive Statistics: This tool calculates basic statistical measures such as mean, median, mode, standard deviation,
and variance for a data set.
Histogram: This tool creates a frequency distribution and a histogram chart for a set of data.
Regression: This tool performs linear regression analysis to fit a line to a set of data and determine the relationship
between two variables.
ANOVA: This tool performs analysis of variance to determine whether there are statistically significant differences
between multiple groups.
Correlation: This tool calculates the correlation coefficient between two variables to determine the strength of their
relationship.
Sampling: This tool creates a sample from a population to estimate the characteristics of the population.
Moving Average: This tool calculates the moving average of a data set, which is the average of a subset of data points
over a period of time.
t-Test: This tool performs a t-test to compare the means of two samples and determine whether they are significantly
different.
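Two of these techniques are simple enough to sketch by hand in Python. The snippet below computes a moving average and the t statistic for two independent samples using the unequal-variance (Welch) form. The function names and all data values are hypothetical, and only the t statistic is computed, not the p-value that Excel also reports.

```python
import statistics
from math import sqrt


def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def welch_t_statistic(sample_a, sample_b):
    """t statistic for two independent samples, unequal variances assumed."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical data
weekly_sales = [10, 20, 30, 40, 50]
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

smoothed = moving_average(weekly_sales, window=3)
t_stat = welch_t_statistic(group_a, group_b)

print(smoothed)  # [20.0, 30.0, 40.0]
print(t_stat)    # 4.0
```

A larger absolute t statistic means the difference between the two sample means is large relative to the variation within the samples.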