
www.anuupdates.org
Unit – 1: Introduction to Big data

Data, classification of digital data (structured, unstructured, semi-structured), characteristics of data,
evolution of big data, definition and challenges of big data, what is big data and why to use big data,
business intelligence vs big data.

……………………………………………………………………………………………………………………………..

1. Data:
In the pursuit of knowledge, data is a collection of discrete values that convey information, describing quantity,
quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further
interpreted. A datum is an individual state in a set of data.

Digital Data Classification:

Data classification is the process of sorting data into relevant categories so that it can be used or applied
more efficiently. Classification makes data easier for the user to retrieve. It is also important for data
security and compliance, and for meeting different business or personal objectives. It is a major requirement
in practice, because data must be easily retrievable within a specific period of time.

2. Types of Digital Data Classification:


Data can be broadly classified into 3 types.

1. Structured Data:

Structured data is created using a fixed schema and is maintained in tabular format. The elements in structured
data are addressable, which makes analysis effective. It covers all data that can be stored in an SQL database
in tabular form. Today, much data is created and processed in this form because it is the simplest to manage.

Examples –

Relational data, Geo-location, credit card numbers, addresses, etc.

Consider relational data as an example: a university has to maintain a record of each student, such as the
student's name, ID, address, and email. The records can be stored using the following relational schema and
table.

S_ID    S_Name    S_Address    S_Email
1001    A         Delhi        A@gmail.com
1002    B         Mumbai       B@gmail.com
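
The student table above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed implementation: the table name `student`, the lowercase column names, and the column types are assumptions chosen for the example.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Fixed schema: every row has the same four columns (types are assumed).
conn.execute(
    """
    CREATE TABLE student (
        s_id      INTEGER PRIMARY KEY,
        s_name    TEXT NOT NULL,
        s_address TEXT,
        s_email   TEXT
    )
    """
)

# The two rows shown in the table above.
rows = [
    (1001, "A", "Delhi", "A@gmail.com"),
    (1002, "B", "Mumbai", "B@gmail.com"),
]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)

# Each element is addressable by column name, so lookups are simple.
mumbai_students = conn.execute(
    "SELECT s_id, s_name FROM student WHERE s_address = ?", ("Mumbai",)
).fetchall()
print(mumbai_students)
```

Because the schema is fixed, a query can name exactly the columns it needs, which is what makes structured data easy to analyze.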

2. Unstructured Data:

Unstructured data is data that does not follow a pre-defined standard or any organized format. This kind of
data does not fit a relational database, because a relational database expects data in a pre-defined,
organized form. Unstructured data is nevertheless very important in the big data domain, and there are many
platforms for storing and managing it, such as NoSQL databases.

Examples –

Word, PDF, text, media logs, etc.

1 © www.anuupdates.org Prepared by D.Venkata Reddy M.Tech(Ph.D), UGC NET, AP SET Qualified



3. Semi-Structured Data:

Semi-structured data is information that does not reside in a relational database but has some
organizational properties that make it easier to analyze. With some processing it can be stored in a relational
database, although this is very hard for some kinds of semi-structured data. Keeping data in semi-structured
form can also save space.

Example –

XML data.
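
As a rough illustration of why XML counts as semi-structured, the sketch below parses two student records with Python's standard xml.etree.ElementTree module. The tag names and the `course` field are assumptions made up for this example. Tags give the data some organization, but the two records do not share the same set of fields, which a fixed relational schema would not allow without extra work.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: the second record omits address/email and adds a course.
xml_text = """
<students>
  <student id="1001">
    <name>A</name>
    <address>Delhi</address>
    <email>A@gmail.com</email>
  </student>
  <student id="1002">
    <name>B</name>
    <course>illustrative-course</course>
  </student>
</students>
"""

root = ET.fromstring(xml_text)

records = []
for student in root.findall("student"):
    # Start from the id attribute, then copy whatever child tags exist.
    record = {"id": student.get("id")}
    for child in student:
        record[child.tag] = child.text
    records.append(record)

for record in records:
    print(sorted(record))
```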

Features of Data Classification:

The main goal of organizing data is to arrange it in a form that makes it readily available to users. Its
basic features are as follows.

• Homogeneity – The data items in a particular group should be similar to each other.

• Clarity – There must be no confusion in the positioning of any data item in a particular group.

• Stability – The set of data items must be stable, i.e. repeating an investigation should not change the
classification.

• Elastic – One should be able to change the basis of classification as the purpose of classification changes.

3. Five Characteristics of Good Quality Data

Not all data is of good quality, and poor-quality data is of limited use. To fully realize the benefits of
data, it has to be of high quality. This means looking for certain characteristics in the data:

1. Data should be precise, meaning it should contain accurate information. Precision saves the user both
time and money.

2. Data should be relevant and match the requirements of the user. The legitimacy of the data should
therefore be checked before it is used.

3. Data should be consistent and reliable. False data is worse than incomplete data or no data at all.

4. Data should be complete. In today's world of dynamic data, information is rarely complete at all
times; at the time of its use, however, the data has to be comprehensive and complete in its current
form.

5. High-quality data is unique to the requirement of the user. It is also easily accessible and can be
processed further with ease.

4. What is big data?


Big data refers to data that is so large and complex that traditional methods of collection and analysis
cannot handle it. The amount and variety of big data have increased exponentially over the past decade.

Data that is very large in size is called Big Data. Normally we work with data in megabytes (Word documents,
Excel sheets) or at most gigabytes (movies, code), but data in petabytes, i.e. 10^15 bytes, is called Big
Data. It is stated that almost 90% of today's data has been generated in the past 3 years.
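
The size comparison above can be made concrete with a little arithmetic. The sketch below assumes decimal (SI) units, consistent with the 10^15 bytes quoted for a petabyte; the names `UNITS` and `to_bytes` are illustrative only.

```python
# Decimal (SI) byte units, matching the 10^15 bytes quoted for a petabyte.
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}


def to_bytes(value, unit):
    """Convert a size in the given unit to bytes."""
    return value * UNITS[unit]


one_pb = to_bytes(1, "PB")

# How many gigabytes and megabytes fit into one petabyte?
gb_per_pb = one_pb // to_bytes(1, "GB")
mb_per_pb = one_pb // to_bytes(1, "MB")

print(gb_per_pb, mb_per_pb)
```

One petabyte is a million gigabytes, which shows how far big data sits from the file sizes handled on an ordinary desktop.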

Sources of Big Data

These data come from many sources, such as:

• Social networking sites: Facebook, Google, and LinkedIn all generate huge amounts of data on a day-to-day
basis, as they have billions of users worldwide.

• E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge amounts of logs from which users'
buying trends can be traced.

• Weather stations: Weather stations and satellites produce very large amounts of data, which are stored and
processed to forecast the weather.

• Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish their plans
accordingly; for this they store the data of their millions of users.

• Share market: Stock exchanges across the world generate huge amounts of data through their daily
transactions.

3V's of Big Data

1. Velocity: Data is being generated at a very fast rate. It is estimated that the volume of data will double
every 2 years.

2. Variety: Nowadays data is not stored only in rows and columns. Data is structured as well as unstructured.
Log files and CCTV footage are unstructured data. Data that can be saved in tables, such as a bank's
transaction data, is structured data.

3. Volume: The amount of data we deal with is very large, on the order of petabytes.
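
To see what the doubling estimate under Velocity implies, the growth can be written as V(t) = V0 * 2^(t / 2), where V0 is the starting volume and t is the number of years. The sketch below is a worked example of that formula only; the function name and the starting value of 1 petabyte are assumptions for illustration, not measurements.

```python
def projected_volume(initial_volume, years, doubling_period=2):
    """Volume after `years` if it doubles every `doubling_period` years."""
    return initial_volume * 2 ** (years / doubling_period)


# Starting from a hypothetical 1 PB, project ten years ahead.
after_ten_years = projected_volume(1, 10)
print(after_ten_years)
```

Ten years contain five doubling periods, so the volume grows by a factor of 2^5 = 32.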

5. Challenges with Big Data


The challenges in Big Data are the real implementation hurdles. They require immediate attention, because if
they are not handled the technology may fail, which can lead to unpleasant results. Big data challenges
include storing and analyzing extremely large and fast-growing data.

Some of the Big Data challenges are:

1. Sharing and Accessing Data:

o Perhaps the most frequent challenge in big data efforts is the inaccessibility of data sets from
external sources.

o Sharing data can cause substantial challenges.

o It includes the need for inter- and intra-institutional legal documents.

o Accessing data from public repositories leads to multiple difficulties.

o Data must be available in an accurate, complete, and timely manner, because only then can the data
in a company's information system be used to make accurate decisions in time.

2. Privacy and Security:

o This is another very important challenge with Big Data. It has sensitive, conceptual, technical, and
legal significance.

o Most organizations are unable to maintain regular checks because of the large amounts of data
generated. However, security checks and observation should be performed in real time, because
that is where they are most beneficial.

o Some information about a person, when combined with large external data sets, may reveal facts
that the person considers private and would not want the data owner to know.

o Some organizations collect information about people in order to add value to their business. They
do this by deriving insights into people's lives that those people are unaware of.

3. Analytical Challenges:

o Big data poses some huge analytical challenges, which raise questions such as: how do we deal with
a problem if the data volume gets too large?

o Or how to find out the important data points?

o Or how to use data to the best advantage?

o The large amount of data on which this type of analysis is to be done can be structured
(organized data), semi-structured (semi-organized data), or unstructured (unorganized data).
There are two approaches through which decision making can be done:

▪ Either incorporate massive data volumes in the analysis.

▪ Or determine upfront which Big data is relevant.

4. Technical challenges:

o Quality of data:

▪ Collecting and storing a large amount of data comes at a cost. Big companies, business
leaders, and IT leaders always want large data storage.

▪ For better results and conclusions, Big data focuses on storing quality data rather than
irrelevant data.

▪ This raises further questions: how can it be ensured that the data is relevant, how much
data is enough for decision making, and is the stored data accurate?

o Fault tolerance:

▪ Fault tolerance is another technical challenge. Fault-tolerant computing is extremely
hard and involves intricate algorithms.

▪ Newer technologies such as cloud computing and big data are designed so that whenever
a failure occurs, the damage stays within an acceptable threshold, i.e. the whole task
should not have to begin from scratch.

o Scalability:

▪ Big data projects can grow and evolve rapidly. The scalability issue of Big Data has led
towards cloud computing.

▪ It leads to various challenges, such as how to run and execute various jobs so that the goal
of each workload can be achieved cost-effectively.

▪ It also requires dealing with system failures in an efficient manner. This again raises the
big question of what kinds of storage devices are to be used.
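
The fault-tolerance point above, that a failed task should not have to begin from scratch, can be illustrated with a simple checkpoint-and-resume sketch. This is a toy example under stated assumptions, not how any particular big data framework implements recovery: the function name, the JSON checkpoint file, and the simulated failure are all hypothetical.

```python
import json
import os
import tempfile


def process_with_checkpoint(items, checkpoint_path, fail_at=None):
    """Sum items, saving progress after each one so a restart can resume."""
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        # A previous run left a checkpoint: continue from where it stopped.
        with open(checkpoint_path) as f:
            state = json.load(f)

    for i in range(state["next_index"], len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure at item {i}")
        state["total"] += items[i]
        state["next_index"] = i + 1
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)
    return state["total"]


items = [10, 20, 30, 40]
checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    process_with_checkpoint(items, checkpoint, fail_at=2)  # fails midway
except RuntimeError:
    pass

# The second run resumes at index 2 instead of starting from scratch.
total = process_with_checkpoint(items, checkpoint)
print(total)
```

Only the work after the last saved checkpoint is repeated, which keeps the damage from a failure within a bounded amount of recomputation.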

6. Why use big data?
Importance of Big Data:

Big Data analytics is indeed a revolution in the field of Information Technology. The use of data analytics
by companies is increasing every year. Big data has the characteristics of high variety, volume, and
velocity. Big Data involves the use of analytics techniques like machine learning, data mining, natural
language processing, and statistics. With the help of big data, multiple operations can be performed on a
single platform. You can store terabytes of data, pre-process it, analyze it, and visualize it with the help
of a couple of big data tools.

Data is extracted, prepared and blended to provide analysis for the businesses. Large enterprises and
multinational organizations use these techniques widely these days in different ways.

Big data analytics helps organizations work with their data efficiently and use that data to identify new
opportunities. Different techniques and algorithms can be applied to make predictions from data. Multiple
business strategies can be applied for the future success of the company, which leads to smarter business
moves, more efficient operations, and higher profits.

The following are the three main reasons why Big data is so important and efficient.

Cost reduction. Big data technologies such as Hadoop and cloud-based analytics bring significant cost
advantages when it comes to storing large amounts of data.

Faster, better decision making. With the speed of Hadoop and in-memory analytics, combined with the
ability to analyze new sources of data, businesses are able to analyze information immediately and make
decisions based on what they’ve learned.

New products and services. With the ability to gauge customer needs and satisfaction through analytics
comes the power to give customers what they want.

Real-time Benefits of Big Data Analytics:

Big Data analytics is flexible enough to be used in other fields as well. The extensive use of big data has
brought enormous growth in multiple industries. Some of them are:

• Banking

• Technology

• Consumer

• Manufacturing

Especially in the banking sector, big data tools have been integrated into existing systems. Multiple
operations can be performed on transactional data; moreover, tools like Apache Hive let users query their
data and get results in a very short period of time. A user can tune the query engine to get better query
performance.
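
To give a feel for the kind of query run on transactional data, the sketch below computes a per-account total with a SQL `GROUP BY`. It uses Python's built-in sqlite3 module purely as a stand-in so that the example runs anywhere; it is not Apache Hive, and the table name, columns, and sample rows are hypothetical.

```python
import sqlite3

# sqlite3 is a stand-in here; the data and schema are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (account_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("ACC1", 100.0), ("ACC1", 250.0), ("ACC2", 75.0)],
)

# Aggregate the transaction amounts per account.
totals = conn.execute(
    "SELECT account_id, SUM(amount) AS total "
    "FROM transactions GROUP BY account_id ORDER BY account_id"
).fetchall()
print(totals)
```

Hive exposes a similar SQL-like query language over much larger, distributed data sets, so the shape of the query carries over even though the engine differs.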

The use of big data has also increased in the education sector. There are new options for research and
analysis using data analytics. The insights provided by big data analytics tools help in understanding the
needs of customers better.

Job Opportunities and Big Data Analytics:

With huge interest and investment in Big Data technologies, professionals with big data analytics skills are
in huge demand. Fields like Data Analytics and Data Engineering are among the most valued nowadays. IT
executives, business analysts, and software developers are learning big data tools and techniques to grow
with the job market. Since some big data tools are based on Python and Java, they are easier to pick up for
programmers who already work in these languages. Users who know how to pre-process data and have skills like
data cleaning can also easily learn big data analysis tools and analytics. With the help of visualization
tools like Power BI, QlikView, Tableau, etc., a user can easily analyze the data and present a new marketing
strategy.

In different domains of industry, the nature of the job differs and so does the requirement of the industry.
Since analytics is emerging in every field, the workforce needs are equally enormous. The job titles may
include Big Data Analyst, Big Data Engineer, Business Intelligence Consultants, Solution

7. Business Intelligence vs Big Data
Comparison of Objectives: Business Intelligence and Big Data

Purpose

Business Intelligence: The purpose of Business Intelligence is to help the business to make better decisions.
Business Intelligence helps in delivering accurate reports by extracting information directly from the data
source.

Big Data: The main purpose of Big Data is to capture, process, and analyze the data, both structured and
unstructured, to improve customer outcomes.

EcoSystem / Components

Business Intelligence: Operation systems, ERP databases, Data Warehouse, Dashboard etc.

Big Data: Hadoop, Spark, R Server, Hive, HDFS etc.

Tools

Business Intelligence: Below is the list of tools used for business intelligence. These tools enable a business
to collate, analyze and visualize data, which can be used in making better business decisions and to come up
with good strategic plans.

• Tableau
• Qlik Sense
• Online analytical processing (OLAP)
• Sisense
• Data Warehousing
• Digital Dashboards and Data mining
• Microsoft Power BI
• Google Analytics etc

Big Data: Below is the list of tools used in Big Data. These tools or frameworks store a large amount of data
and process them to get insights from data to make good decisions for the business.

• Hadoop
• Spark
• Hive
• Polybase
• Presto
• Cassandra
• Plotly
• Cloudera
• Storm etc

Characteristics / Properties

Business Intelligence: Below are the six features of Business Intelligence: Location intelligence, Executive
Dashboards, "what if" analysis, Interactive reports, Metadata layer, and Ranking reports.

Big Data: Big data can be described by some characteristics such as Volume, Variety, Variability, Velocity,
and Veracity.

Benefits

Business Intelligence: Below is the list of benefits of Business Intelligence.

• Helps in making better business decisions
• Faster and more accurate reporting and analysis
• Improved data quality
• Reduced costs
• Increase revenues
• Improved operational efficiency etc.

Big Data: Below is the list of benefits of Big Data.

• Better Decision making
• Fraud detection
• Storage, mining, and analysis of data
• Market prediction and forecasting
• Improves the service
• Helps in implementing the new strategies
• Keep up with customer trends
• Cost savings
• Better sales insights, which helps in increasing revenues etc

Applied Fields

Business Intelligence: Social media, Healthcare, Gaming Industry, Food Industry etc

Big Data: The banking sector, Entertainment, and Social media, Healthcare, Retail and wholesale etc

8. The Evolution of Big Data


The 1970s and before was the era of mainframes. The data was essentially primitive and structured. Relational
databases evolved in the 1980s and 1990s, the era of data-intensive applications. The World Wide Web
(WWW) and the Internet of Things (IoT) have led to an onslaught of structured, unstructured, and multimedia
data. Refer to the table.
