
Data Analytics – Intro to Key Concepts

Dr. K. Ganesan
Chairperson – VITBS & Senior Professor in School
of Information Technology and Engineering
Vellore Institute of Technology, Vellore – 632 014
kganesan@vit.ac.in
Phone : 6382203768
What is Data?
• Data is a collection of facts, such as numbers, words,
measurements, observations or just descriptions of
things, kept for reference or analysis.
• E.g: Tanjore, Raja Raja Chola, Big Temple, 1010 AD
• Information : Linking data to objects / person as a fact.
• E.g: Raja Raja Chola built Big Temple at Tanjore in 1010
AD.
• Knowledge : Extraction / summarization / Interpretation
• E.g: When we tell a story, we give the moral at the end,
which may not be present verbatim in the
story.
Types of Data
• Data can be in various forms:
• Structured: 21MBA0102, stored in database and can be
accessed via SQL (RDBMS)
• Unstructured: emails, WhatsApp messages, social media, … accessed
via NoSQL (MongoDB)
• Text : English, Multilingual
• Hypertext : Used in Web (connecting web pages)
• Image and Graphics – Pictorial representation
• Audio : Speech and Voice
• Video : Stored and Live
• Animation
• Sensor data : Analog form (can be converted to digital)
Amount of data generated every day
• 500 million tweets are sent
• 294 billion emails are sent
• 4 petabytes of data are
created on Facebook
• 4 terabytes of data are
created from each connected
car
• 65 billion messages are sent
on WhatsApp
• 5 billion searches are made
• By 2025, it’s estimated that
463 exabytes of data will be
created each day globally –
that’s the equivalent of
212,765,957 DVDs per day!
What is Data Analytics?
• Data analytics refers to the process and practice of analyzing data to
answer questions, extract insights, and identify trends.
• This is done using an array of tools, techniques, and frameworks
depending on the type of analysis being conducted. (Ref : Harvard
Business School Online)
• The four major types of analytics include:
• Descriptive analytics, which looks at data to examine, understand,
and describe something that’s already happened.
• Diagnostic analytics, which goes deeper than descriptive analytics by
seeking to understand the why behind what happened.
• Predictive analytics, which relies on historical data, past trends, and
assumptions to answer questions about what will happen in the future.
• Prescriptive analytics, which aims to identify specific actions that one
should take to reach future targets or goals.
Business Analytics
• Applying data analytics tools and methodologies in a business setting
is typically referred to as business analytics.
• Budgeting and forecasting: By assessing a company’s historical
revenue, sales, and costs data and its goals for future growth, an
analyst can identify the relevant budget and investments.
• Risk management: By understanding which business risks are likely to
occur, and their associated costs, an analyst can make cost-effective
recommendations to help mitigate them.
• Marketing and sales: By understanding key metrics, such as lead-to-
customer conversion rate, a marketing analyst can identify the
number of leads one must generate to fill the sales pipeline.
• Product development (or research and development): By
understanding how customers have reacted to product features in
the past, an analyst can help an organization in product
development, design, and user experience in the future.
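The marketing and sales point above can be sketched as a small calculation. This is only an illustration: the function name leads_needed and the figures used are assumptions, not part of the original material.

```python
import math


def leads_needed(target_customers: int, conversion_rate: float) -> int:
    """Leads required to win target_customers at a lead-to-customer conversion rate.

    conversion_rate is a proportion, e.g. 0.25 means 1 lead in 4 becomes a customer.
    """
    if not 0 < conversion_rate <= 1:
        raise ValueError("conversion_rate must be a proportion in (0, 1]")
    # Round up, because a fraction of a lead cannot be generated.
    return math.ceil(target_customers / conversion_rate)


# Hypothetical target: 30 new customers at a 25% conversion rate.
print(leads_needed(30, 0.25))
```

A lower conversion rate pushes the required number of leads up quickly, which is why the metric matters for filling the sales pipeline.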
What is Data Science?
• Data analytics is mainly focused on understanding datasets and
gleaning insights whereas data science is centered on building,
cleaning, and organizing datasets.
• Data scientists create and leverage algorithms, statistical models,
and their own custom analyses to collect and shape raw data into
something that can be easily understood.
• Data scientists perform some key functions:
• Data wrangling: Cleaning and organizing data so that it is ready to use.
• Statistical modeling: Running data through different models, such as
regression, classification, and clustering models, to identify
relationships between variables and gain insight from the numbers.
• Programming: Writing computer programs and algorithms in a variety
of languages, such as R, Python, and SQL, to analyze large datasets
more efficiently than through manual analysis.
Skills needed for a Data Analyst
• Data Analysis can be used in any branch of Business
– Human Resource
– Finance
– Operations
– Sales / Marketing
• Domain Knowledge
– Go to the field and talk to customers, find their pains
• Analytical Thinking
– Problem solving using Tools
– Critical Thinking
• Applying Right Tool for the given problem (many times Excel will do)
• Communication
– PowerPoint presentation skills are very important
Ways of Looking at Data
• Time Series and cross-sectional data
• Time series data record developments over time; e.g.,
monthly ice-cream output.
• Cross-sectional data capture a situation at a moment in
time; e.g., the value of sales at various branches on one day.
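The difference between the two views can be sketched with two small data structures. All figures and branch names below are hypothetical and are used for illustration only.

```python
# Time series: one variable (ice-cream output, in tonnes) observed month after month.
monthly_output = {"Jan": 120, "Feb": 135, "Mar": 160}

# Cross-sectional: one variable (sales, in Rs.) observed at several branches on one day.
branch_sales_one_day = {"Vellore": 42000, "Chennai": 91000, "Tanjore": 37000}

# A time series answers questions about change over time.
months = list(monthly_output)
change = {
    later: monthly_output[later] - monthly_output[earlier]
    for earlier, later in zip(months, months[1:])
}
print(change)

# A cross-section answers questions about differences between units at one moment.
best_branch = max(branch_sales_one_day, key=branch_sales_one_day.get)
print(best_branch)
```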
• Scales of Measurement
• Nominal or categorical data identify classifications only;
e.g, sex (male/female), departments
(HR/Finance/Marketing), sales regions (North, South,
East, West) – Here no quantities are implied – No maths
can be done on them.
• Ordinal or Ranked data
• Categories can be sorted into certain order (ascending /
descending), but differences between ranks are not
necessarily equal. e.g. customer feedback (bad /
average / good / excellent) – No Maths can be done
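A minimal sketch of what an ordinal scale allows: the categories can be ranked and sorted, but the ranks are only positions, not measured quantities. The survey responses are hypothetical.

```python
# The order of the categories is known; the size of the gaps between them is not.
FEEDBACK_ORDER = ["bad", "average", "good", "excellent"]
rank = {label: position for position, label in enumerate(FEEDBACK_ORDER)}

responses = ["good", "bad", "excellent", "average", "good"]  # hypothetical answers

# Sorting by rank is meaningful on an ordinal scale.
sorted_responses = sorted(responses, key=rank.get)
print(sorted_responses)

# Comparing two categories is also meaningful.
print(rank["excellent"] > rank["good"])

# Averaging the rank numbers would assume equal gaps between categories,
# which an ordinal scale does not guarantee, so it is not done here.
```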
• Interval scale data
• Measurable differences are identified, but the zero
point is arbitrary. e.g, Is 20 deg Celsius twice as hot as
10 deg Celsius? Convert to Fahrenheit to see that it is
not. The equivalents are 68 deg and 50 deg Fahrenheit.
Temperature is measured on an interval scale with
arbitrary zero points (0 deg Celsius and 32 deg
Fahrenheit) – We cannot divide and do comparisons.
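The temperature example can be checked with a short sketch. The function name celsius_to_fahrenheit is chosen here for illustration; the conversion formula itself is the standard one.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


c_low, c_high = 10, 20
f_low = celsius_to_fahrenheit(c_low)
f_high = celsius_to_fahrenheit(c_high)

# The ratio depends on the unit chosen, so it carries no real meaning.
print(c_high / c_low)  # 2.0 in Celsius
print(f_high / f_low)  # 1.36 in Fahrenheit (68 / 50)

# Differences stay consistent: a 10 degree Celsius gap is always an 18 degree Fahrenheit gap.
print(c_high - c_low, f_high - f_low)
```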
• Ratio scale data
• There is a true zero and measurements can be compared as
ratios. e.g, if Ram, Gopi, and Bala achieve a sales target of
Rs. 250K, 500K and 1000K in a given month, then we can
claim that the achievement of Bala is twice that of Gopi and
4 times that of Ram – We can divide and do the Maths.
• Continuity
• Some quantities can take only one type of value. E.g., we cannot
assign 0.4 of a salesman to promote a product in a region.
Here only whole numbers are permitted.
• Discrete values are counted in whole numbers (integers);
e.g, the number of satisfied customers
• Continuous variables do not increase in steps.
Measurements such as heights and weights are continuous –
Real (decimal) numbers are permitted.
• Fractions, percentages and proportions
• Monetary systems are based on 100 subdivisions. e.g, 100 paise
is equal to 1 Rupee; 100 cents to the dollar.
• Amounts less than one big unit are fractions.
• 50 paise = 0.5 rupee or 50% of one Rupee.
• Fractions, proportions and percentages are all the same thing with
different names.
• ¾ means divide 3 by 4 = 0.75 (proportion). If we multiply it by
100 we get percentage.
• Percentage increases and decreases
• A percentage increase followed by the same percentage decrease
does not take us back to where we started. E.g., do not accept a 50%
increase in salary for six months, to be followed by a 50% cut.
• E.g., Rs. 1000 increased by 50% is Rs. 1500.
• A 50% decrease on Rs. 1500 then gives Rs. 750, not Rs. 1000.
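The salary example can be reproduced in a few lines. The helper name change_by_percent is an assumption made for this sketch.

```python
def change_by_percent(amount: float, percent: float) -> float:
    """Return amount after a percentage change (use a negative percent for a decrease)."""
    return amount * (1 + percent / 100)


salary = 1000
raised = change_by_percent(salary, 50)   # 50% increase
cut = change_by_percent(raised, -50)     # 50% decrease applied to the raised amount

print(raised)  # 1500.0
print(cut)     # 750.0
```

The decrease is taken from the larger, raised amount, which is why the final figure falls below the starting salary.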
• A frequent business problem is finding what a number
was before it was increased by a given percentage.
• Simply divide by (1+i), where i is the percentage
increase expressed as a proportion.
• E.g, If an invoice is for Rs. 575 including 15% GST, the
tax exclusive amount is 575/(1.15) = Rs. 500.
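A small sketch of the invoice example. The function name amount_before_increase is hypothetical. It takes the increase as a percentage and divides by (1 + i) in a rearranged form.

```python
def amount_before_increase(total: float, percent: float) -> float:
    """Undo a percentage increase: total / (1 + i), with i = percent / 100."""
    # total * 100 / (100 + percent) is the same division, rearranged so that
    # whole-number inputs stay exact.
    return total * 100 / (100 + percent)


invoice_total = 575  # Rs., including GST
gst_percent = 15

print(amount_before_increase(invoice_total, gst_percent))  # 500.0

# A common slip is to subtract 15% of the total instead, which gives a different answer.
print(invoice_total * (100 - gst_percent) / 100)  # 488.75
```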
• Fractions
• If anything is increased by an amount x/y, the
increment is x/(x+y) of the new total.
• E.g, if Rs. 100 is increased by ½, the increment of Rs.
50 is expressed as 1/(1+2) = 1/3 of the new total
• Rs. 100 increased by ¾ is Rs. 175; the Rs. 75 increment
is 3/(3+4) = 3/7 of the new total, namely, Rs. 175
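The x/(x+y) rule can be verified with exact fractions. This is a sketch using Python's standard fractions module; the function name is an assumption.

```python
from fractions import Fraction


def increment_share_of_new_total(increase: Fraction) -> Fraction:
    """Share of the new total made up by the increment, for a base increased by `increase`."""
    # Taking the base as 1, the new total is 1 + increase,
    # so the share is increase / (1 + increase).
    return increase / (1 + increase)


for increase in (Fraction(1, 2), Fraction(3, 4)):
    share = increment_share_of_new_total(increase)
    x, y = increase.numerator, increase.denominator
    # The rule from the text: an increase of x/y is x/(x+y) of the new total.
    assert share == Fraction(x, x + y)
    print(increase, "->", share)
```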
• Scientific notations (expressing small and big numbers)
• Individuals use small amounts of cash; corporates talk of huge amounts of
cash.
• E.g, Thousand, Million, Billion, Trillion, etc
• They are used to save time writing out large and small numbers.
• E.g, 1.25 x 10^6 is equal to 12,50,000;
• 1.25 x 10^-6 is equal to 0.00000125
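Most programming languages accept scientific notation directly. The sketch below uses Python's "e" notation, where 1.25e6 means 1.25 x 10^6. Note that Python groups digits in thousands (1,250,000) rather than in the Indian lakh style (12,50,000).

```python
big = 1.25e6     # 1.25 x 10^6
small = 1.25e-6  # 1.25 x 10^-6

print(f"{big:,.0f}")     # plain form with thousands separators: 1,250,000
print(f"{small:.8f}")    # plain form with 8 decimal places: 0.00000125
print(f"{1250000:.2e}")  # back to scientific notation: 1.25e+06
```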
• Rounding
• Values ending in 4 or less are rounded down (1.24 becomes 1.2),
amounts ending in 5 or more are rounded up (1.25 becomes 1.3).
• "Two times two equals four" can go wrong with rounding:
• 1.5 and 2.4 both round to 2
• 1.5 x 1.5 is 2.25, which rounds to 2
• 2.4 x 2.4 is 5.76, which rounds to 6
• 1.45 rounds to 1.5, which rounds a second time to 2 despite the original
being nearer to 1.
• Note : Rounding is to be done after multiplying or dividing
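The rounding pitfalls above can be reproduced with Python's decimal module. This is a sketch: the helper name round_half_up is an assumption. The built-in round() is deliberately avoided because it rounds halves to the nearest even digit and works on binary floats, which does not match the "5 or more rounds up" rule described here.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places: str = "1") -> Decimal:
    """Round with the '5 or more rounds up' rule; places is '1' for units, '0.1' for tenths."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_UP)


a, b = Decimal("1.5"), Decimal("2.4")

# Both values round to 2 ...
print(round_half_up(a), round_half_up(b))

# ... but their squares round to 2 and 6, not to 4.
print(round_half_up(a * a), round_half_up(b * b))

# Rounding twice can also mislead: 1.45 -> 1.5 -> 2, although 1.45 is nearer to 1.
rounded_once = round_half_up("1.45")
rounded_twice = round_half_up(round_half_up("1.45", "0.1"))
print(rounded_once, rounded_twice)
```

Keeping full precision during the calculation and rounding only the final result avoids both problems.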
• Relationship between Proportions and Growth
• E.g., the finance director has received annual 10% pay
raises for the last 10 years. By how much has his
salary increased? Not 100%, but about 160%.
• Think of proportionate increase. He earned 1.1 times
the amount in the year before.
• In year one, he received the base amount (1.0) times
1.1 = 1.1.
• In year two, total growth was 1.1 x 1.1 = 1.21.
• In year three, 1.21 x 1.1 = 1.331 and so on up to 2.358
x 1.1 = 2.594 in the tenth year.
• Take away 1 and multiply by 100, we get 159.4%
increase (rounded as 160%)
• Powers – When the growth rate is always the same,
multiply the proportion by itself a number of times.
• Here, 1.1 was multiplied by itself 10 times (in maths,
1.1^10).
• A 2.0% monthly price rise of a commodity is equivalent to
an annual rate of inflation of 26.8%, not 24%.
• If India’s GDP is 1.7% higher during Jan-March quarter than
during Oct-Dec quarter then it is equivalent to an annual
rate of increase of 7% (1.017 x 1.017 x 1.017 x 1.017)
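The three compounding examples above (salary, monthly prices, quarterly GDP) all follow the same pattern: raise (1 + rate) to the number of periods, take away 1, and multiply by 100. A minimal sketch, with the function name compound_growth_percent chosen for illustration:

```python
def compound_growth_percent(rate_percent: float, periods: int) -> float:
    """Total percentage growth after `periods` periods at a constant rate per period."""
    growth_factor = (1 + rate_percent / 100) ** periods
    return (growth_factor - 1) * 100


# 10% pay raise every year for 10 years
print(round(compound_growth_percent(10, 10), 1))  # 159.4

# 2% price rise every month for 12 months
print(round(compound_growth_percent(2, 12), 1))   # 26.8

# 1.7% quarterly GDP growth sustained for 4 quarters
print(round(compound_growth_percent(1.7, 4), 1))  # 7.0
```

Simply multiplying the rate by the number of periods (10 x 10%, 12 x 2%, 4 x 1.7%) understates the result, because each period's growth applies to an already increased base.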
• Brackets – When the order of operation is important, we
use brackets.
• 4 x (2 + 3 ) = 20 is different from (4 x 2) + 3 = 11.
• We may use more than one pair of brackets: [(4 x 2) + 3] x 6 = 66.
• Perform innermost ones first and then go to outer ones.
• In statistics, Roman letters are used for sample data (p =
proportion from a sample).
• Greek letters indicate population data (e.g., π).
• Logarithms
• A logarithm is another name for a power or exponent.
• Ten raised to the power of 3 is 10^3 = 10 x 10 x 10 = 1000.
• 3 is the logarithm of 1000 to the base 10.
• Logs are used for flattening out growth rates.
• In Maths, multiplication and division of large numbers
can be done using Logarithms.
• Used in outlier analysis and visualization of data whose
range (difference between max and min) is high.
• Used for converting nonlinear data into a linear form.
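The uses of logarithms listed above can be sketched with Python's standard math module. The numbers are chosen for illustration, and the 10% growth series is an assumption made to show the "flattening" effect.

```python
import math

# 3 is the logarithm of 1000 to the base 10.
print(math.log10(1000))

# Multiplication via logs: add the logarithms, then raise 10 to the sum.
a, b = 250_000, 4_000
product_via_logs = 10 ** (math.log10(a) + math.log10(b))
print(round(product_via_logs), a * b)

# Flattening growth: a series growing 10% per period is curved (nonlinear),
# but its logarithm rises by the same amount each period (linear).
series = [1.1 ** period for period in range(6)]
log_steps = [
    math.log10(later) - math.log10(earlier)
    for earlier, later in zip(series, series[1:])
]
print([round(step, 4) for step in log_steps])
```

Every step in the last list is log10(1.1), which is why plotting such data on a log scale turns the growth curve into a straight line.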
