
Data and Information

• Data are plain facts.
• The word "data" is the plural of "datum."
• Data are facts and statistics, stored or flowing freely over a network; generally they are raw and unprocessed.
• When data are processed, organized, structured, or presented in a given context so as to make them useful, they are called information.
• It is not enough to have data (such as statistics on the economy).
• Data by themselves are fairly useless, but when they are interpreted and processed to determine their true meaning, they become useful and can be called information.

For example: when you visit a website, it might store your IP address; that is data. In return it might place a cookie in your browser to mark that you visited the website; that is also data. Your name is data; your age is data.

• What is Data?

• The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

Big data: What is Big Data?

Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. For example, the volume of data that Facebook or YouTube needs to collect and manage on a daily basis can fall under the category of Big Data. However, Big Data is not only about scale and volume; it also involves one or more of the following aspects − Velocity, Variety, Volume, and Complexity.


Big Data is also data, but with a huge size. Big Data is a term used to describe a collection of data that is huge in volume and yet growing exponentially with time. In short, such data is so large and complex that none of the traditional data management tools can store or process it efficiently. “Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions, are known as Big Data.”

BIG DATA:

1. Data is defined as the quantities, characters, or symbols on which operations are performed by a computer.

2. Data may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or
mechanical recording media.

3. Big Data is also data but with a huge size.

4. Big Data is a term used to describe a collection of data that is huge in size and yet growing exponentially with
time.
5. In short, such data is so large and complex that none of the traditional data management tools can store or process it efficiently.

Characteristics of Big Data: the 3 ‘V’s of Big Data – Variety, Velocity, Volume.

1) Variety: Data Types

Variety of Big Data refers to structured, unstructured, and semi-structured data that is gathered from multiple sources. While in the past data could only be collected from spreadsheets and databases, today data comes in an array of forms such as emails, PDFs, photos, videos, audio files, social media posts, and much more.

2) Velocity: Data Speed

Velocity essentially refers to the speed at which data is being created in real time. In a broader perspective, it comprises the rate of change, the linking of incoming data sets arriving at varying speeds, and bursts of activity.

3) Volume: Data Quantity

We already know that Big Data indicates huge ‘volumes’ of data that are being generated on a daily basis from various sources such as social media platforms, business processes, machines, networks, human interactions, etc. Such large amounts of data are stored in data warehouses.

Big Data sources:

Users

Sensors

Applications

Systems

TYPES:

Figure 1.1: Types of Big Data

I) Structured:
1. Any data that can be stored, accessed and processed in the form of fixed format is termed as a
Structured Data.
2. It accounts for about 20% of the total existing data and is used the most in programming and
computer-related activities.
3. There are two sources of structured data - machines and humans.
4. All the data received from sensors, weblogs, and financial systems are classified under machine-
generated data.
5. These include medical devices, GPS data, data of usage statistics captured by servers and applications.
6. Human-generated structured data mainly includes all the data a person inputs into a computer, such as their name and other personal details.
7. When a person clicks a link on the internet, or even makes a move in a game, data is created.
8. Example: An 'Employee' table in a database is an example of Structured Data.

Employee_ID   Employee_Name                Gender
420           Angel Priya                  Male
100           Babu Bhaiya                  Male
202           Babita Ji                    Female
400           Jethalal Tapu Ke Papa Gada   Male
007           Dhinchak Pooja               Female
9. Tools commonly used to store and manage structured data (a minimal query sketch follows this list):
a. Data Marts
b. RDBMS
c. Greenplum
d. TeraData
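As a minimal sketch, the Employee table above can be stored and queried as structured data using SQL; Python's sqlite3 module stands in here for a full RDBMS, and the table name and columns are taken from the example above.

import sqlite3

# In-memory SQLite database standing in for a full RDBMS (a simplification).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Employee_ID TEXT, Employee_Name TEXT, Gender TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        ("420", "Angel Priya", "Male"),
        ("100", "Babu Bhaiya", "Male"),
        ("202", "Babita Ji", "Female"),
        ("400", "Jethalal Tapu Ke Papa Gada", "Male"),
        ("007", "Dhinchak Pooja", "Female"),
    ],
)

# A fixed schema with typed columns is what makes this data "structured".
for row in conn.execute("SELECT Employee_Name FROM Employee WHERE Gender = 'Female'"):
    print(row[0])   # Babita Ji, Dhinchak Pooja
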
II) Unstructured:
1. Any data with unknown form or structure is classified as unstructured data.
2. The rest of the data created, about 80% of the total, accounts for unstructured big data.
3. Unstructured data is also classified based on its source, into machine-generated or human-
generated.
4. Machine-generated data accounts for all the satellite images, the scientific data from
various experiments and radar data captured by various facets of technology.
5. Human-generated unstructured data is found in abundance across the internet since it
includes social media data, mobile data, and website content.
6. This means that the pictures we upload to our Facebook or Instagram handles, the videos we watch on YouTube, and even the text messages we send all contribute to the gigantic heap that is unstructured data.
7. Examples of unstructured data include text, video, audio, mobile activity, social media
activity, satellite imagery, surveillance imagery etc.
8. The Unstructured data is further divided into:
a. Captured data:
• It is data based on the user’s behavior.
• The best example is GPS data from a smartphone, which tracks the user at every moment and provides real-time output.
b. User-generated data:
• It is the kind of unstructured data that users themselves put on the internet at every moment.
• For example, tweets and retweets, likes, shares, and comments on YouTube, Facebook, etc.
9. Tools commonly used to store and process unstructured data (a minimal word-count sketch follows this list):
a. Hadoop
b. HBase
c. Hive
d. Pig
e. MapR
f. Cloudera
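As a minimal sketch of working with unstructured data, the snippet below counts word frequencies in a free-form social media post; the sample text is invented for illustration, and plain Python stands in for cluster tools such as Hadoop or Hive.

from collections import Counter
import re

# An invented social-media style post: free text with no fixed schema.
post = "Loving the new phone!! Battery life is great, camera is great, price not so great"

# Tokenize into lowercase words and count occurrences.
words = re.findall(r"[a-z']+", post.lower())
counts = Counter(words)

print(counts.most_common(3))   # e.g. [('great', 3), ('is', 2), ('loving', 1)]
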

III) Semi-Structured:

1. Semi-structured data is information that does not reside in an RDBMS.

2. Information that is not in the traditional database format of structured data, but contains some organizational properties which make it easier to process, is included in semi-structured data.
3. It may be organized in a tree pattern, which is easier to analyze in some cases.
4. Examples of semi-structured data include XML documents and NoSQL databases. Personal data stored in an XML file:
<rec><name>Angel Priya</name><sex>Male</sex></rec>
<rec><name>Babu Bhaiya</name><sex>Male</sex></rec>
<rec><name>Babita Ji</name><sex>Female</sex></rec>
<rec><name>Jethalal Tapu Ke Papa Gada</name><sex>Male</sex></rec>
<rec><name>Dhinchak Pooja</name><sex>Female</sex></rec>
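As a minimal sketch, the XML records above can be parsed with Python's standard xml.etree.ElementTree module; the records are wrapped in a single root element here because an XML parser expects one.

import xml.etree.ElementTree as ET

# The <rec> elements from the example above, wrapped in one root element.
xml_data = """<people>
<rec><name>Angel Priya</name><sex>Male</sex></rec>
<rec><name>Babu Bhaiya</name><sex>Male</sex></rec>
<rec><name>Babita Ji</name><sex>Female</sex></rec>
<rec><name>Jethalal Tapu Ke Papa Gada</name><sex>Male</sex></rec>
<rec><name>Dhinchak Pooja</name><sex>Female</sex></rec>
</people>"""

root = ET.fromstring(xml_data)
# Tags give the data some organization even without a fixed relational schema.
for rec in root.findall("rec"):
    print(rec.find("name").text, "-", rec.find("sex").text)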

CHARACTERISTICS OF BIG DATA:

I) Variety:
1. Variety of Big Data refers to structured, unstructured, and semi-structured data that is gathered from multiple sources.
2. The type and nature of the data show great variety.
3. In earlier days, spreadsheets and databases were the only sources of data considered by most applications.
4. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. are also being considered in analysis applications.

II) Velocity:

1. The term velocity refers to the speed of generation of data.


2. Big Data Velocity deals with the speed at which data flows in from sources like business
processes, application logs, networks, and social media sites, sensors, Mobile devices, etc.
3. The flow of data is massive and continuous.
4. The speed of data accumulation also plays a role in determining whether the data is
categorized into big data or normal data.
5. As can be seen from figure 1.2 below, at first mainframes were used and relatively few people used computers.
6. Then came the client/server model, and more and more computers came into use.
7. After this, web applications came into the picture and spread over the Internet.
8. Then, everyone began using these applications.
9. These applications were then used from more and more devices, such as mobiles, as they were very easy to access. Hence, a lot of data!
Figure 1.2: Big Data Velocity
III) Volume:

1. The name Big Data itself is related to a size which is enormous.


2. Size of data plays a very crucial role in determining value out of data.
3. Also, whether particular data can actually be considered Big Data or not depends upon the volume of the data.
4. Hence, 'Volume' is one characteristic which needs to be considered while dealing with Big
Data.
5. This refers to the data that is tremendously large.
6. As shown in figure 1.3 below, the volume of data is rising exponentially.
7. In 2016, the data created was only 8 ZB and it is expected that, by 2020, the data would
rise up to 40 ZB, which is extremely large.

Figure 1.3: Big Data Volume
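As a quick worked check on the figures quoted above (8 ZB in 2016 growing to an expected 40 ZB by 2020), the implied compound annual growth rate can be computed as follows; the calculation assumes smooth exponential growth between the two years.

# Implied compound annual growth rate from 8 ZB (2016) to 40 ZB (2020).
start, end, years = 8.0, 40.0, 2020 - 2016
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%} per year")   # roughly 49.5% per year, i.e. the volume grows about 1.5x annually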

OTHER CHARACTERISTICS OF BIG DATA:

I) Programmable:
1. With big data it is possible to explore all types of data by applying programming logic.
2. Programming can be used to perform any kind of exploration because of the scale of the data.

II) Data Driven:

1. A data-driven approach is possible for scientists.
2. This is because the data collected is huge in amount.

III) Multi Attributes:


1. It is possible to deal with many gigabytes of data that consist of thousands of attributes.
2. As all data operations are now happening on a larger scale.

IV) Veracity:

1. The data captured is not in a fixed format.

2. Data captured can vary greatly.
3. Veracity means the trustworthiness and quality of data.
4. It is necessary that the veracity of the data is maintained.
5. For example, think about Facebook posts, with hashtags, abbreviations, images, videos,
etc., which make them unreliable and hamper the quality of their content.

6. Collecting loads and loads of data is of no use if the quality and trustworthiness of the
data is not up to the mark.

Evolution of Big Data:

The term ‘Big Data’ has been in use since the early 1990s. John R. Mashey is given the credit for making the term ‘Big Data’ popular [7]. Big Data is not something that is completely new or only used in the last two decades. People have been trying to use data analysis and analytics techniques to support their decision-making process for a very long time. The tremendous increase of both structured and unstructured data sets made the task of traditional data analysis very difficult, and this transformed into ‘Big Data’ in the last decade. The evolution of Big Data can be classified into three phases, where every phase has its own characteristics and capabilities and has contributed to the contemporary meaning of Big Data.
APPLICATIONS OF BIG DATA:

I) Healthcare & Public Health Industry:


1. Big Data has already started to create a huge difference in the healthcare sector.
2. With the help of predictive analytics, medical professionals and HCPs are now able
to provide personalized healthcare services to individual patients.
3. For example, entire DNA strings can be decoded in minutes.
4. Apart from that, fitness wearables, telemedicine, and remote monitoring – all powered by Big Data and AI – are helping change lives for the better.

II) Academia
1. Big Data is also helping enhance education today.
2. Education is no longer limited to the physical bounds of the classroom – there are numerous online educational courses to learn from.
3. Academic institutions are investing in digital courses powered by Big Data technologies to aid the all-round development of budding learners.

III) Banking
1. The banking sector relies on Big Data for fraud detection.
2. Big Data tools can efficiently detect fraudulent acts in real-time such as misuse of
credit/debit cards, archival of inspection tracks, faulty alteration in customer stats, etc.

IV) Manufacturing

1. According to TCS Global Trend Study, the most significant benefit of Big Data in
manufacturing is improving the supply strategies and product quality.
2. In the manufacturing sector, Big Data helps create a transparent infrastructure, thereby predicting uncertainties and inefficiencies that can affect the business adversely.

V) IT
1. One of the largest users of Big Data, IT companies around the world are using Big Data
to optimize their functioning, enhance employee productivity, and minimize risks in
business operations.

2. By combining Big Data technologies with ML and AI, the IT sector is continually
powering innovation to find solutions even for the most complex of problems.
Challenges of Big Data

The following are the five most important challenges of Big Data:

a) Meeting the need for speed

• In today’s hypercompetitive business environment, companies not only have to find and analyze the relevant data they need, they must find it quickly.
• Visualization helps organizations perform analyses and make decisions much more rapidly, but the challenge is going through the sheer volumes of data and accessing the level of detail needed, all at a high speed.
• The challenge only grows as the degree of granularity increases. One possible solution is hardware: some vendors are using increased memory and powerful parallel processing to crunch large volumes of data extremely quickly.

b) Understanding the data

• It takes a lot of understanding to get data into the right shape so that you can use visualization as part of data analysis.

c) Addressing data quality

• Even if you can find and analyze data quickly and put it in the proper context for the audience that will be consuming the information, the value of data for decision-making purposes will be jeopardized if the data is not accurate or timely.
• This is a challenge with any data analysis.

d) Displaying meaningful results

• Plotting points on a graph for analysis becomes difficult when dealing with extremely large amounts of information or a variety of categories of information.
• For example, imagine you have 10 billion rows of retail SKU data that you’re trying to compare. A user trying to view 10 billion plots on the screen will have a hard time seeing so many data points.
• By grouping the data together, or “binning”, you can more effectively visualize the data; a minimal binning sketch follows this list.
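As a minimal binning sketch, assuming a large set of numeric values (simulated below with random sales figures), the data is grouped into 10-unit-wide ranges so it can be summarized instead of plotted point by point.

import random

# Simulated sales values standing in for a very large data set.
random.seed(0)
values = [random.gauss(100, 25) for _ in range(100_000)]

# Bin the values into 10-unit-wide ranges and count how many fall in each bin.
bins = {}
for v in values:
    low = int(v // 10) * 10          # lower edge of the bin this value falls into
    bins[low] = bins.get(low, 0) + 1

for low in sorted(bins):
    print(f"{low:>4} to {low + 10:<4}: {bins[low]} values")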

e) Dealing with outliers

• The graphical representations of data made possible by visualization can communicate trends and outliers much faster than tables containing numbers and text.
• Users can easily spot issues that need attention simply by glancing at a chart.
• Outliers typically represent about 1 to 5 percent of the data, but when you’re working with massive amounts of data, viewing even 1 to 5 percent of it is rather difficult.
• We can also bin the results to both view the distribution of the data and see the outliers.
• While outliers may not be representative of the data, they may also reveal previously unseen and potentially valuable insights.
• Visual analytics enables organizations to take raw data and present it in a meaningful way that generates the most value. However, when used with big data, visualization is bound to lead to some challenges. A minimal outlier-flagging sketch follows this list.
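As a minimal outlier-flagging sketch, one common (though not the only) approach is the interquartile-range (IQR) rule: values far outside the middle 50% of the data are flagged. The 1.5 multiplier below is a conventional assumption.

import statistics

# A small sample of values with one obvious outlier mixed in.
values = [52, 49, 51, 50, 48, 53, 47, 250, 51, 49, 50, 46]

# IQR rule: anything far outside the middle 50% of the data is flagged.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [v for v in values if v < low or v > high]
print(outliers)   # [250]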
Big Data Analytics Challenges

Challenges include: analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, and updating.

1. Need For Synchronization Across Disparate Data Sources: As data sets become bigger and more diverse, there is a big challenge in incorporating them into an analytical platform. If this is overlooked, it will create gaps and lead to wrong messages and insights.

2. Acute Shortage Of Professionals Who Understand Big Data Analysis: Analysis is what makes the voluminous amount of data being produced every minute useful. With the exponential rise of data, a huge demand for big data scientists and Big Data analysts has been created in the market. It is important for business organizations to hire data scientists with varied skills, as the job of a data scientist is multidisciplinary. However, there is a sharp shortage of data scientists in comparison to the massive amount of data being produced.

3. Getting Meaningful Insights Through The Use Of Big Data Analytics: It is imperative for business organizations to gain important insights from Big Data analytics, and it is also important that only the relevant department has access to this information. A big challenge in Big Data analytics is mending this wide gap in an effective manner.

4. Getting Voluminous Data Into The Big Data Platform: It is hardly surprising that data is growing with every passing day. This simply indicates that business organizations need to handle a large amount of data on a daily basis. The amount and variety of data available these days can overwhelm any data engineer, and that is why it is considered vital to make data accessibility easy and convenient for brand owners and managers.

5. Uncertainty Of Data Management Landscape With the rise of Big Data, new technologies and
companies are being developed every day. However, a big challenge faced by the companies in
the Big Data analytics is to find out which technology will be best suited to them without the
introduction of new problems and potential risks.

6. Data Storage And Quality: Business organizations are growing at a rapid pace. As companies and large business organizations grow, the amount of data produced increases. Storing this massive amount of data is becoming a real challenge for everyone. Popular data storage options like data lakes and warehouses are commonly used to gather and store large quantities of unstructured and structured data in their native format. The real problem arises when a data lake or warehouse tries to combine unstructured and inconsistent data from diverse sources: it encounters errors. Missing data, inconsistent data, logic conflicts, and duplicate data all result in data quality challenges. A minimal cleaning sketch follows.
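As a minimal cleaning sketch for the quality problems just listed (missing values, duplicates, inconsistent formatting), the snippet below tidies a tiny invented record set with plain Python; real pipelines use dedicated tools, but the steps are the same in spirit.

# Invented customer records with a duplicate, a missing value, and inconsistent formatting.
records = [
    {"id": 1, "name": "Babita Ji", "city": "Mumbai"},
    {"id": 1, "name": "Babita Ji", "city": "Mumbai"},      # duplicate row
    {"id": 2, "name": "angel priya", "city": None},        # missing city, odd casing
    {"id": 3, "name": "Babu Bhaiya", "city": " delhi "},   # stray spaces, odd casing
]

cleaned, seen = [], set()
for rec in records:
    if rec["id"] in seen:                                      # drop duplicate ids
        continue
    seen.add(rec["id"])
    rec["name"] = rec["name"].title()                          # normalize name casing
    rec["city"] = (rec["city"] or "Unknown").strip().title()   # fill missing, trim, normalize
    cleaned.append(rec)

print(cleaned)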

7. Security And Privacy Of Data: Once business enterprises discover how to use Big Data, it brings them a wide range of possibilities and opportunities. However, it also involves potential risks when it comes to the privacy and security of the data. The Big Data tools used for analysis and storage utilize data from disparate sources. This eventually leads to a high risk of exposure of the data, making it vulnerable. Thus, the rise of voluminous amounts of data increases privacy and security concerns.

Why is Big Data Important?

The importance of big data does not revolve around how much data a company has but how the company utilizes the collected data. Every company uses data in its own way; the more efficiently a company uses its data, the more potential it has to grow. The company can take data from any source and analyze it to find answers which will enable:

1. Cost Savings: Some Big Data tools like Hadoop and Cloud-Based Analytics can bring cost advantages to businesses when large amounts of data are to be stored, and these tools also help in identifying more efficient ways of doing business.

2. Time Reductions: The high speed of tools like Hadoop and in-memory analytics can easily identify new sources of data, which helps businesses analyze data immediately and make quick decisions based on what they learn.

3. Understand the market conditions: By analyzing big data you can get a better understanding of current market conditions. For example, by analyzing customers’ purchasing behavior, a company can find out which products sell the most and produce products according to this trend. By this, it can get ahead of its competitors.

4. Control online reputation: Big data tools can do sentiment analysis. Therefore, you can get feedback about who is saying what about your company. If you want to monitor and improve the online presence of your business, big data tools can help with all of this. A minimal sentiment-scoring sketch follows.
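As a minimal sentiment-scoring sketch, assuming a tiny hand-made word list (real sentiment analysis uses trained models and much larger lexicons), posts mentioning a company can be scored as positive, negative, or neutral by counting sentiment-bearing words.

# Tiny hand-made sentiment lexicon; real systems learn this from data instead.
POSITIVE = {"great", "love", "excellent", "good", "happy"}
NEGATIVE = {"bad", "terrible", "poor", "slow", "broken"}

def sentiment(post):
    words = post.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Invented posts about a hypothetical company.
print(sentiment("The new service is great and the support team is excellent"))  # positive
print(sentiment("Delivery was slow and the packaging arrived broken"))          # negative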

5. Using Big Data Analytics to Boost Customer Acquisition and Retention: The customer is the most important asset any business depends on. There is no single business that can claim success without first having to establish a solid customer base. However, even with a customer base, a business cannot afford to disregard the high competition it faces. If a business is slow to learn what customers are looking for, then it is very easy to begin offering poor quality products. In the end, loss of clientele will result, and this creates an adverse overall effect on business success. The use of big data allows businesses to observe various customer-related patterns and trends. Observing customer behavior is important to trigger loyalty.


6. Using Big Data Analytics to Solve Advertisers’ Problems and Offer Marketing Insights: Big data analytics can help change all business operations. This includes the ability to match customer expectations, change the company’s product line, and of course ensure that the marketing campaigns are powerful.

7. Big Data Analytics as a Driver of Innovations and Product Development: Another huge advantage of big data is the ability to help companies innovate and redevelop their products.

Business Intelligence vs Big Data

Although Big Data and Business Intelligence are two technologies used to analyze data to help companies in the decision-making process, there are differences between them. They differ in the way they work as much as in the type of data they analyze.

Traditional BI methodology is based on the principle of grouping all business data into a central server. Typically, this data is analyzed in offline mode, after storing the information in an environment called a Data Warehouse. The data is structured in a conventional relational database with an additional set of indexes and forms of access to the tables (multidimensional cubes).

A Big Data solution differs from BI in many aspects.

These are the main differences between Big Data and Business Intelligence:

1. In a Big Data environment, information is stored on a distributed file system, rather than on a central server. It is a much safer and more flexible space.

2. Big Data solutions carry the processing functions to the data, rather than the data to the functions. As the analysis is centered on the information, it's easier to handle larger amounts of information in a more agile way.

3. Big Data can analyze data in different formats, both structured and unstructured. The volume of unstructured data (data not stored in a traditional database) is growing at levels much higher than structured data. Nevertheless, its analysis carries different challenges. Big Data solutions solve them by allowing a global analysis of various sources of information.

4. Data processed by Big Data solutions can be historical or come from real-time sources. Thus, companies can make decisions that affect their business in an agile and efficient way.

5. Big Data technology uses massively parallel processing (MPP) concepts, which improves the speed of analysis. With MPP many instructions are executed simultaneously, and since the various jobs are divided into several parallel execution parts, the overall results are reunited and presented at the end. This allows large volumes of information to be analyzed quickly. A minimal map-and-combine sketch follows.
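As a minimal map-and-combine sketch of the MPP idea, assuming Python's standard multiprocessing module (a single-machine stand-in for a real MPP cluster), the job is divided into chunks, the chunks are processed in parallel, and the partial results are reunited at the end.

from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker processes its own slice of the data independently.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]          # divide the job into 4 parallel parts

    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum, chunks)     # execute the parts simultaneously

    print(sum(partials))                             # reunite the partial results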
Why the hype around Big Data analytics?
Data Analytics and its type
Analytics is the discovery and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming, and operations research to quantify performance. Analytics often favors data visualization to communicate insight.
Firms commonly apply analytics to business data to describe, predict, and improve business performance. Areas within analytics include predictive analytics, enterprise decision management, etc. Since analytics can require extensive computation (because of big data), the algorithms and software used for analytics harness the most current methods in computer science.
In a nutshell, analytics is the scientific process of transforming data into insight for making better decisions. The goal of Data Analytics is to get actionable insights resulting in smarter decisions and better business outcomes.
It is critical to design and build a data warehouse or Business Intelligence (BI) architecture that provides a flexible, multi-faceted analytical ecosystem, optimized for efficient ingestion and analysis of large and diverse data sets.

There are four types of data analytics: 


1. Predictive (forecasting)
2. Descriptive (business intelligence and data mining)
3. Prescriptive (optimization and simulation)
4. Diagnostic analytics
Predictive Analytics: Predictive analytics turns data into valuable, actionable information. Predictive analytics uses data to determine the probable outcome of an event or the likelihood of a situation occurring.
Predictive analytics draws on a variety of statistical techniques from modeling, machine learning, data mining, and game theory that analyze current and historical facts to make predictions about future events. Techniques used for predictive analytics are listed below (a minimal regression sketch follows the lists):
 Linear Regression
 Time series analysis and forecasting
 Data Mining
There are three basic cornerstones of predictive analytics:
 Predictive modeling
 Decision Analysis and optimization
 Transaction profiling
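As a minimal regression sketch for the first technique listed above, the snippet below fits a straight line to invented monthly sales figures and extrapolates one period ahead; a real forecast would use far more data and proper validation.

# Invented monthly sales figures (units sold in months 1 to 6).
months = [1, 2, 3, 4, 5, 6]
sales = [120, 135, 149, 162, 178, 190]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(sales) / n

# Ordinary least-squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
sxx = sum((x - mean_x) ** 2 for x in months)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Predict month 7 by extending the fitted line.
print(round(intercept + slope * 7))   # roughly 205 units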
Descriptive Analytics: Descriptive analytics looks at data and analyzes past events for insight into how to approach future events. It looks at past performance and understands it by mining historical data to find the causes of success or failure in the past. Almost all management reporting, such as sales, marketing, operations, and finance, uses this type of analysis.
The descriptive model quantifies relationships in data in a way that is often used to classify customers or prospects into groups. Unlike a predictive model that focuses on predicting the behavior of a single customer, descriptive analytics identifies many different relationships between customers and products.
Common examples of descriptive analytics are company reports that provide historic reviews, like the following (a minimal statistics sketch follows this list):
 Data Queries
 Reports
 Descriptive Statistics
 Data dashboard
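As a minimal descriptive-statistics sketch for the items listed above, the snippet below summarizes an invented set of monthly sales figures with Python's standard statistics module.

import statistics

# Invented monthly sales figures for a historic review.
sales = [120, 135, 149, 162, 178, 190, 95, 210, 188, 175, 160, 142]

print("count :", len(sales))
print("mean  :", round(statistics.mean(sales), 1))
print("median:", statistics.median(sales))
print("stdev :", round(statistics.stdev(sales), 1))
print("range :", min(sales), "to", max(sales))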
Prescriptive Analytics: Prescriptive analytics automatically synthesizes big data, mathematical science, business rules, and machine learning to make a prediction and then suggests a decision option to take advantage of the prediction.
Prescriptive analytics goes beyond predicting future outcomes by also suggesting actions that benefit from the predictions and showing the decision maker the implications of each decision option. Prescriptive analytics not only anticipates what will happen and when it will happen, but also why it will happen. Further, prescriptive analytics can suggest decision options on how to take advantage of a future opportunity or mitigate a future risk, and illustrate the implication of each decision option. A minimal decision-option sketch is given below.
For example, prescriptive analytics can benefit healthcare strategic planning by using analytics to leverage operational and usage data combined with data on external factors such as economic data, population demography, etc.
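As a minimal decision-option sketch, assuming invented demand predictions, a selling price, and a unit cost, the snippet below evaluates a few stocking options and recommends the one with the highest expected profit; real prescriptive systems combine far richer rules, constraints, and models.

# Invented predicted demand scenarios with their probabilities.
demand_scenarios = [(80, 0.3), (100, 0.5), (120, 0.2)]
price, unit_cost = 15, 9

def expected_profit(stock):
    # Profit for each scenario, weighted by its predicted probability.
    return sum(p * (price * min(stock, demand) - unit_cost * stock)
               for demand, p in demand_scenarios)

options = [80, 100, 120]
best = max(options, key=expected_profit)
print(best, round(expected_profit(best), 1))   # the suggested stocking level and its expected profit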
Diagnostic Analytics: In this analysis, we generally use historical data to answer a question or solve a problem. We try to find dependencies and patterns in the historical data of the particular problem.
For example, companies go for this analysis because it gives great insight into a problem, and they also keep detailed information at their disposal; otherwise, data collection might have to be repeated individually for every problem, which would be very time-consuming. Common techniques used for Diagnostic Analytics are listed below (a minimal correlation sketch follows this list):
• Data discovery
• Data mining
• Correlations
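As a minimal correlation sketch for the last technique listed above, assuming two invented historical series (monthly advertising spend and monthly sales), the snippet below computes the Pearson correlation coefficient to check whether the two move together.

from statistics import correlation   # available in Python 3.10+

# Invented historical data: monthly ad spend vs. monthly sales.
ad_spend = [10, 12, 15, 11, 18, 20, 22, 19]
sales = [110, 118, 135, 112, 160, 172, 181, 165]

r = correlation(ad_spend, sales)
print(round(r, 3))   # a value close to 1.0 means the two series rise and fall together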
