
BIG DATA

&
SOURCES OF DATA

Dr. Deepak R. Gupta


What is “BIG DATA”?
"Big data is high volume, high velocity, and/or high variety information assets that require new forms
of processing to enable enhanced decision making, insight discovery and process optimization." - Gartner, 2012

The characteristics of Big Data (volume, velocity and variety) were first described by Doug Laney around 2001; Laney was then at META Group, which Gartner later acquired.
Activity 1: Data Scenarios

1. An ecommerce site gets thousands of transactions and millions of clicks (“events”) in a day.

2. The HR team is sitting on the last ten years of employee attrition data, trying to figure out how it can be used.

3. A retail store gets an average of 5000 footfalls a day, and about 50% of them result in transactions.

4. A hospital and diagnostic healthcare provider serves 10,000 patients in a month across hundreds of disease categories.
Big Data @ Amazon
Data Explosion !!
1. Data is created constantly, and at an ever-increasing rate.

2. Mobile phones, social media, imaging technologies to determine a medical
diagnosis: all these and more create new data, which must be stored somewhere
for some purpose.

3. Devices and sensors automatically generate diagnostic information that needs to be
stored and processed in real time.

4. Merely keeping up with this huge influx of data is difficult, but substantially more
challenging is analysing vast amounts of it, especially when it does not conform to
traditional notions of data structure, to identify meaningful patterns and extract
useful information.

5. These challenges of the data deluge present the opportunity to transform business,
government, science, and everyday life.
Changing Face of Data
• Big Data is the exponential growth and availability of data, both structured and unstructured, driven by the Internet and
fast-growing technologies
Drivers of Big Data

1. Medical information, such as genomic sequencing and diagnostic imaging

2. Photos and video footage uploaded to the World Wide Web

3. Video surveillance, such as the thousands of video cameras spread across a city

4. Mobile devices, which provide geospatial location data of the users, as well as metadata about text
messages, phone calls, and application usage on smart phones

5. Smart devices, which provide sensor-based collection of information from smart electric grids, smart
buildings, and many other public and industry infrastructures

6. Non-traditional IT devices, including the use of radio-frequency identification (RFID) readers, GPS
navigation systems, and seismic processing
Examples of Data generated

• Social media and genetic sequencing are among the fastest-growing sources of
Big Data and examples of untraditional sources of data being used for analysis
• In 2012 Facebook users posted 700 status updates per second worldwide, which
can be leveraged to deduce latent interests or political views of users and show
relevant ads.
• Facebook can also construct social graphs to analyze which users are connected
to each other as an interconnected network.
• Big data can be applied to real-time fraud detection, complex competitive
analysis, call centre optimization, consumer sentiment analysis, intelligent traffic
management, and smart power grid management, to name only a few applications.
• Genetic sequencing and human genome mapping provide a detailed
understanding of genetic makeup and lineage.
• The health care industry can predict which illnesses a person is likely to get in
their lifetime and take steps to avoid these maladies or reduce their impact through
the use of personalized medicine and treatment.
• Pharmaceutical companies can apply this data to different medications and
pharmaceutical drugs, heightening awareness of the risks of specific drug treatments.
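
The social graph mentioned in the examples above can be pictured as a network of users linked by friendships. The following is a minimal sketch, not Facebook's actual method: the user names and the `friendships` list are made up for illustration, and a breadth-first search is used to find everyone connected to a given user.

```python
from collections import deque

# Hypothetical friendship pairs; each pair links two users in both directions.
friendships = [("asha", "ravi"), ("ravi", "meena"), ("john", "lee")]


def build_graph(edges):
    """Build an undirected adjacency map: user -> set of direct friends."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connected_users(graph, start):
    """Return every user reachable from `start` through friendship links."""
    seen = {start}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    seen.discard(start)  # the starting user is not their own connection
    return seen


graph = build_graph(friendships)
print(sorted(connected_users(graph, "asha")))
```

Here "asha" reaches "meena" only through "ravi", which is the kind of indirect connection a social graph makes visible.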
Different Sources of Data
The “BIG” Dilemma!

1. 80% of the world’s information is unstructured

2. Unstructured information is growing at 16 times the rate of structured information

3. Raw computational power is growing at such a pace that today’s off-the-shelf
commodity box can do what a supercomputer could do half a decade ago

“We are drowning in information, but starved for knowledge.”


Tom Peters, Thriving on Chaos
The “BIG” Evolution !
Characteristics of Big Data – The Seven Vs

• The seven Vs used in Big Data are:

1. Volume
2. Velocity
3. Variety
4. Variability
5. Veracity
6. Visualization
7. Value
Characteristics of Big Data – The Seven Vs

Volume

• Volume is how much data we have – what used to be measured in Gigabytes is now measured in
Zettabytes (ZB) or even Yottabytes (YB). The IoT (Internet of Things) is creating exponential
growth in data. The volume of data is projected to change significantly in the coming years.
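
To get a feel for the jump from Gigabytes to Zettabytes and Yottabytes, the units can be compared directly. This is a small sketch that assumes decimal (SI) units, where each step is a factor of 1000; the `convert` helper is a name chosen here for illustration.

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
    "YB": 10**24,
}


def convert(value, from_unit, to_unit):
    """Convert a quantity between two of the storage units above."""
    return value * UNITS[from_unit] / UNITS[to_unit]


gb_per_zb = UNITS["ZB"] // UNITS["GB"]
print(f"1 ZB = {gb_per_zb:,} GB")
print(f"1 YB = {convert(1, 'YB', 'ZB'):,.0f} ZB")
```

One Zettabyte is a trillion Gigabytes, which is why storage that once looked large is now a rounding error at Big Data scale.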

Velocity

• Velocity is the speed at which data is processed and becomes accessible. I remember the days of
nightly batches; now, if it’s not real-time, it’s usually not fast enough.

Variety

• Variety describes one of the biggest challenges of big data. It can be unstructured and it can include
so many different types of data from XML to video to SMS. Organizing the data in a meaningful
way is no simple task, especially when the data itself changes rapidly.



Characteristics of Big Data – The Seven Vs
Variability

Variability is different from variety. A coffee shop may offer 6 different blends of coffee, but if you get the same blend
every day and it tastes different every day, that is variability. The same is true of data: if its meaning is constantly
changing, it can have a huge impact on your data homogenization.

Veracity

Veracity is all about making sure the data is accurate, which requires processes to keep the bad data from accumulating
in your systems. The simplest example is contacts that enter your marketing automation system with false names and
inaccurate contact information. How many times have you seen Mickey Mouse in your database? It’s the classic
“garbage in, garbage out” challenge.
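
A veracity check of the kind described above can be as simple as screening incoming contacts before they reach the marketing system. The sketch below is one possible approach under stated assumptions: the blocklist of placeholder names, the loose e-mail pattern and the sample contacts are all hypothetical.

```python
import re

# Hypothetical blocklist of obviously fake names; a real system would maintain its own.
PLACEHOLDER_NAMES = {"mickey mouse", "donald duck", "test test", "john doe"}

# Deliberately loose e-mail pattern: something@something.tld
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_suspect(contact):
    """Flag a contact whose name is a known placeholder or whose e-mail looks malformed."""
    name = " ".join(contact.get("name", "").lower().split())  # normalise case and spacing
    email = contact.get("email", "").strip()
    return name in PLACEHOLDER_NAMES or EMAIL_PATTERN.match(email) is None


contacts = [
    {"name": "Asha Rao", "email": "asha.rao@example.com"},
    {"name": "Mickey  Mouse", "email": "mickey@example.com"},
    {"name": "Ravi Kumar", "email": "not-an-email"},
]

clean = [c for c in contacts if not is_suspect(c)]
print([c["name"] for c in clean])
```

Rejecting bad records at the point of entry is cheaper than cleaning them out after they have accumulated.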

Visualization

Visualization is critical in today’s world. Using charts and graphs to visualize large amounts of complex data is much
more effective in conveying meaning than spreadsheets and reports chock-full of numbers and formulas.

Value

Value is the end game. After addressing volume, velocity, variety, variability, veracity, and visualization – which takes a
lot of time, effort and resources – you want to be sure your organization is getting value from the data.



Data – The New Oil?

INTELLIGENCE - The 3 traits:
• Dynamic
• Interpretable
• Ready for action

ANALYTICS - The 3 Types:
• Descriptive
• Predictive
• Prescriptive

DATA - The 3 Dimensions of Data:
• Real time - Historical
• Structured - Unstructured
• Internal - External
• Combine all of this into a “single version” of truth
Structured v/s Unstructured Data

• Anything that has a well-defined arrangement, an easy-to-understand structure and a
comprehensible hierarchy is considered structurally sound (structured). Anything that lacks
these attributes is considered unorganised and structurally weak (unstructured).

• Structured data has many advantages: it can be added seamlessly to a relational database and
is easily searchable by the simplest search operations or algorithms. Unstructured data, by
contrast, is a nightmare for designers, who must connect its random strands with the existing
meaningful ones and present the result as a structure.

• Structured data is closer to machine language than unstructured data.


Structured v/s Unstructured Data

Semi-structured Data
Product_Id   Product_Name   Product_Price
1            Pen            INR 5
2            Paper          INR 10
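
A table with fixed columns like the one above fits directly into a relational database and can then be searched with a simple query. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table and column names are chosen here for illustration, and the price is stored as a plain number of rupees.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "product_id INTEGER PRIMARY KEY, "
    "product_name TEXT NOT NULL, "
    "product_price_inr INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Pen", 5), (2, "Paper", 10)],
)

# Because every row has the same columns, searching is a one-line query.
rows = conn.execute(
    "SELECT product_name FROM product WHERE product_price_inr > ?", (5,)
).fetchall()
print(rows)
conn.close()
```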

Chapter 1 : Business Transformation with Big Data


Structured v/s Unstructured Data

• Data with a well-defined arrangement, an easy-to-understand structure and a comprehensible hierarchy is considered
structurally sound.
• Structured data can be added seamlessly to a relational database and is easily searchable by the simplest search
operations or algorithms; unstructured data, by contrast, is a nightmare for designers, who must connect its random
strands with the existing meaningful ones and present the result as a structure.
• Structured data is closer to machine language than unstructured data.
Unstructured Data

Unstructured data generally has no organizing structure, and Big Data technologies use
different ways to add structure to it. A typical example of unstructured data is a
heterogeneous data source containing a combination of simple text files, images, videos, etc.
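
One simple way to add structure to free text is pattern matching. The sketch below is only an illustration of the idea, not a description of any particular Big Data tool: the shop notes, the product names and the regular expression are all assumptions made for this example.

```python
import re

# Hypothetical free-text notes, as a shop assistant might type them.
notes = [
    "Customer bought a Pen for INR 5 and seemed happy.",
    "Sold one ream of Paper, INR 10, paid in cash.",
    "Visitor asked about opening hours.",
]

# Look for a known product name followed, somewhere later, by a price in INR.
PATTERN = re.compile(r"\b(?P<product>Pen|Paper)\b.*?INR\s*(?P<price>\d+)")


def extract(note):
    """Turn one free-text note into a structured record, or None if nothing matches."""
    match = PATTERN.search(note)
    if match is None:
        return None
    return {"product": match.group("product"), "price_inr": int(match.group("price"))}


records = [r for r in map(extract, notes) if r is not None]
print(records)
```

The third note yields no record, which is typical: only part of any unstructured source can be mapped onto a fixed structure.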
The Power of Big Data

• Big Data can bring “big value” to almost every aspect of our lives.

• Technologically, Big Data is bringing about changes in our lives because it allows diverse and
heterogeneous data to be fully integrated and analyzed to help us make decisions.

• Today, with Big Data technology, thousands of data points from seemingly unrelated areas can help support
important decisions.

• This is the power of Big Data.
Areas of Application

• Health and Well being

• Policy making and public opinions

• Smart cities and more efficient society

Links : https://www.youtube.com/watch?v=-Gj93L2Qa6c
Big Data Eco-System
Who Uses Big Data ?

1. Banking
2. Government
3. Education
4. Healthcare
5. Manufacturing
6. Retail
SOURCES OF DATA



SOURCES OF DATA
• SECONDARY DATA
• PRIMARY DATA



SECONDARY DATA
• Secondary research means research conducted with the help of information (data)
already established by different agencies, together with the
information available from the internal sources of the company.
• Information published in trade journals and the commercial press, and
data generated internally by the company, are used for
secondary research.
• Such research is usually conducted within the marketing research
department of the company by its research staff.
• Companies generally do not depend fully on secondary research.
• They prefer to supplement secondary research with primary
investigation.



SECONDARY DATA

MERITS
• Easy and quick
• Economical
• Reliable data available
• Absence of interviewee’s bias
• Convenience
• Suitable to small firms

DEMERITS
• May not be exactly as per needs
• Needs modification
• Testing required
• Too much dependence undesirable
• Secondary method
• Lacks practical-orientation



SECONDARY DATA
TYPES OF SECONDARY DATA
• INTERNAL SOURCE
• EXTERNAL SOURCE



INTERNAL SOURCES
• Periodical statements
• Reports and statistical data
• Past research reports
• Files, documents and correspondence of the company are also useful for
reference purposes.
• Sales orders and sales reports of different areas are useful for marketing
research.
• Customers' complaints
• Salesmen’s reports are useful for securing information about the market
situation.



EXTERNAL SOURCES
• Trade Journals
   • "Business Today", "Business India" and even business newspapers, e.g. Economic Times, Mint, Business
     Standard
• Directories
• Chambers of Commerce and trade associations
• Subscription Services / Syndicated Services
   • Auto-journals, PC World
• Publications of Trade Associations and Chambers of Commerce
• Publications of Management and Economic Consultants
• Publications of Banks and Financial Institutions
   • Publications from RBI, Committee appointed by Ministry
• Company Reports
   • Annual Reports
• Specialised Libraries
• Government Publications and Publications of International Organisations
   • IMF, WTO, WHO
   • Census Data, National Sample Survey, Population Statistics (Demographics)



INTERNAL VS EXTERNAL SOURCES
Meaning
• INTERNAL SOURCES: Internal sources of data collection means data collected from documents available with the company.
• EXTERNAL SOURCES: External sources of data collection means the use of data published by external agencies.

Use of information
• INTERNAL SOURCES: Information available from internal sources can be used directly for research purposes. Modifications are not required.
• EXTERNAL SOURCES: Information available from external sources cannot be used directly as it is. Modifications as per the nature of the research work are required.

Examples
• INTERNAL SOURCES: Purchase records, sales records, periodical sales reports and annual reports are examples of internal sources of data collection.
• EXTERNAL SOURCES: Trade journals, annual reports of companies, surveys conducted by the press, census reports etc. are examples of external sources of data collection.

Coverage
• INTERNAL SOURCES: Limited coverage as they relate to the company only.
• EXTERNAL SOURCES: Wide coverage as they are varied in character.

Reliability
• INTERNAL SOURCES: Internal sources are more reliable as they supply accurate data. Verification of data is not required before actual use.
• EXTERNAL SOURCES: External sources may not supply accurate data. Naturally, a verification of data is necessary.

Availability
• INTERNAL SOURCES: Internal sources are easily available, and that too without any extra cost.
• EXTERNAL SOURCES: External sources are not easily available. Money is required to be spent on them.



PRIMARY SOURCE
• Primary investigation means collecting first hand information by actually
visiting markets or meeting consumers and dealers who are directly
connected with the marketing activities.
• Data collected for the first time through Primary survey are called
primary data.
• Here, the data are collected through a suitable questionnaire and by
interviewing a limited number of people (a sample) selected from a large
group.
• Customers, traders and suppliers are the major sources supplying
primary data.
• The primary data collected are superior to secondary data. Researchers
turn to the primary data in order to overcome the limitations of
secondary data which include incompatibility, obsolescence and bias.
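
Selecting a limited number of people from a large group, as described above, can be sketched with simple random sampling. This is only one of several sampling methods and is shown here as an assumption, not as the method the slides prescribe; the customer list and the `draw_sample` helper are hypothetical.

```python
import random


def draw_sample(population, sample_size, seed=None):
    """Pick `sample_size` distinct members at random; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical customer list standing in for the "large group".
customers = [f"customer_{i:04d}" for i in range(1, 1001)]

sample = draw_sample(customers, 50, seed=42)
print(len(sample), "customers selected for interview")
```

Every customer has the same chance of being chosen, which helps the sample represent the larger group.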



TYPES OF PRIMARY DATA
• INTERVIEW
• OBSERVATION
• EXPERIMENTATION



INTERVIEW TYPES
• MAIL SURVEYS
• TELEPHONE SURVEYS
• PERSONAL INTERVIEW
• PANEL RESEARCH



MAIL SURVEY - MERITS
• Economical
• Wide coverage
• Speed in data collection
• Avoids interviewee’s bias
• Convenience to respondents
• More information available
• Investigators not required
• Simple and Direct Method
• Centralized Control
• Convenient to Medium / Small Companies
• Views of Family members available



MAIL SURVEY - DEMERITS
• Problem of “No Replies”
• Updated mailing list required
• Poor response, if questionnaire is defective
• Lacks accuracy of Information
• Limited use
• Effects of ambiguous question
• Changes in question not possible
• Not suitable when quick information is required
• Non-verbal responses are not noted



TELEPHONE SURVEY - MERITS
• Economical
• Quick response
• High speed
• Information available from VIPs
• Simplicity
• Frank Response
• Orderliness
• Sample selection easy
• Secrecy of respondents



TELEPHONE SURVEY - DEMERITS
• Brief questionnaire required
• Limited / Brief information available
• Difficult to contact large no. of respondents
• Non-verbal responses are not available
• Non-availability of proper sample
• Limited coverage of sample
• Interviewer’s Bias
• Difficulty in checking validity of Information



PERSONAL INTERVIEWS - MERITS
• Flexibility
• Better co-operation from respondents
• Benefit of longer duration Interview
• Availability of reliable information
• Availability of detailed information
• Improves quality of research work
• Better quality response
• Personal questions can be asked
• Products can be shown before recording response
• Suitable when information from limited respondents is to be collected
• Non-verbal responses are observed
PERSONAL INTERVIEW - DEMERITS
• Costly
• Time consuming
• Information supplied may not be accurate
• Long term planning required
• Effective supervision on interviewers required
• Problem of personal bias
• Possibility of rush interviews
• Respondents from cross section of the society may not be available
• Information supplied may not be recorded properly



PERSONAL INTERVIEW - TYPES
• INDIVIDUAL INTERVIEW
   • Structured Interview
   • Semi-structured Interview
   • Unstructured Interview
   • Depth Interview
• GROUP INTERVIEW
   • Focus Group Interview



DEPTH INTERVIEW
MERITS
• Useful for finding:
   • Consumer Motivation
   • Attitudes
   • Feelings and Desires
• Opportunity to express oneself freely
• Deeper insight into the problem
• Detailed Investigation

DEMERITS
• Costly
• Time Consuming
• Findings cannot be quantified
• Shortage of Trained Interviewers
• Data tabulation is a problem



FOCUS GROUP



FOCUS GROUP INTERVIEW
MERITS
• Saves Time
• Saves Money
• Different views on each topic are available
• Every member gets an opportunity
• Generation of new Ideas
• Extensively used method

DEMERITS
• Less vocal participants don’t contribute
• May not represent the actual population
• Totally depends upon group moderator
• Unrelated topics are discussed more often
• One-sided discussion often takes place



PANEL RESEARCH
• In Panel Research the same sample is used again and again.
• A Panel may be:
   • Individual
   • Consumer
   • Housewives
   • Firms
• CONSUMER / OMNIBUS PANEL (FIXED SAMPLE)
• “Panels consist of persons, households or business firms who report
their purchasing activities at periodic intervals over time and who are
typically selected based on a combination of their willingness and
representativeness” - Ronald M. Weiers
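
Because a panel reports at periodic intervals, its data naturally forms a series of reporting periods for the same members. The sketch below is a hypothetical illustration: the household codes, periods and amounts are invented, and `retention` is a helper name chosen here to show what share of the original panel is still reporting in each period.

```python
from collections import defaultdict

# Hypothetical diary entries: (panel member, reporting period, amount spent in INR)
reports = [
    ("H01", "2024-01", 1200), ("H02", "2024-01", 800), ("H03", "2024-01", 950),
    ("H01", "2024-02", 1100), ("H02", "2024-02", 900),
    ("H01", "2024-03", 1300),
]


def members_by_period(entries):
    """Group the reporting members by period, with periods in chronological order."""
    members = defaultdict(set)
    for member, period, _amount in entries:
        members[period].add(member)
    return dict(sorted(members.items()))


def retention(entries):
    """Share of the first period's members still reporting in each period."""
    by_period = members_by_period(entries)
    first = next(iter(by_period.values()))
    return {period: len(m & first) / len(first) for period, m in by_period.items()}


rates = retention(reports)
for period, rate in rates.items():
    print(period, f"{rate:.0%}")
```

Tracking the same members over time is what separates a panel from a one-off survey, and it also makes any loss of members visible.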



CONSUMER PANEL - DIARY



AUDIENCE PANEL



PANELS - MERITS
• Supply useful information
• Longer interviews are possible
• Reliable data available
• Economical method
• Positive response from panel members
• Facilitates introduction of remedial measures
• Continuous supply of information
• Real motives are visible
• Facilitates product testing



PANELS - DEMERITS
• Biased outlook of panel members
• Limited co-operation from panel members
• Absence of representative character
• Panel members behave like experts
• Costly
• Panel members drop gradually



OBSERVATION METHOD
“accurate watching and noting of phenomena as they occur in
nature with regard to cause and effect or mutual relation”

“an act of recognizing and noting acts / occurrences.”

• Observation involves recording of events / actions as they
take place in the environment.
• A consumer may be observed while purchasing soap or
toothpaste at a retail shop.



OVERT AND COVERT OBSERVATION
• Observations can be overt (everyone knows they are being
observed) or covert (no one knows they are being observed
and the observer is concealed).
• The benefit of covert observation is that people are more
likely to behave naturally if they do not know they are being
observed.
• However, you will typically need to conduct overt
observations because of ethical problems related to
concealing your observation.  



OBSERVATION METHOD - TYPES
• Simple Direct observation
• Indirect observation
   • Beer Bottles
• Structured observation
   • Hotel (Single Visitor Vs Family)
• Unstructured observation
   • Purchasing of Soap in Retail Stores
• Mechanical observation
• Manual observation
• Disguised observation
   • Mystery Shoppers
• Undisguised observation
OBSERVATION METHODS - MERITS / DEMERITS

MERITS
• Accuracy
• Factual Information available
• Records events as they occur
• Economical
• Objective data available
• Effective method

DEMERITS
• Certain elements are missed
• Human errors possible
• Purpose is defeated, if secrecy is not maintained
• Costly method
• Observer’s bias
• Limited application
• Needs support of personal Interview



Thank You !!

