Course Outcomes: Use various techniques of Big data storage and analytics in IoT
https://bau.edu/blog/characteristics-of-big-data/
https://energie.labs.fhv.at/~repe/bigdata/introduction-to-big-data-projects/introduction-to-big-data/
https://www.javatpoint.com/what-is-big-data
Department of Electronics and Telecommunication Engineering, VIIT, Pune-48
Introduction to Big Data
What is “big data”?
• “Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of
processing to enable enhanced decision making, insight discovery and process optimization” (Gartner 2012)
• Complicated (intelligent) analysis of data may make a small data set “appear” to be “big”
Bottom line: Any data that exceeds our current processing capability can be regarded as “big”
• Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using
on-hand database management tools or traditional data processing applications.
• The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
• The trend to larger data sets is due to the additional information derivable from analysis of a single large set of
related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be
found to “spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime,
and determine real-time roadway traffic conditions.”
Structured data
Structured data has predefined organizational properties and is stored in a structured or tabular schema, which makes it easier to
analyze and sort. Because the schema is predefined, each field is discrete and can be accessed separately or jointly with
data from other fields. This makes structured data extremely valuable, since data from various locations in the
database can be collected quickly.
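As a minimal sketch of what "discrete fields that can be accessed separately or jointly" can look like in practice, the example below stores a few rows in an in-memory SQLite table. The table name `readings`, its columns, and the sample values are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small fixed-schema table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (sensor_id TEXT, city TEXT, temperature_c REAL)"
    )
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [("s1", "Pune", 31.5), ("s2", "Pune", 29.0), ("s3", "Mumbai", 33.0)],
    )
    return conn


conn = build_demo_db()

# Access one field on its own.
cities = [
    row[0]
    for row in conn.execute("SELECT DISTINCT city FROM readings ORDER BY city")
]

# Access fields jointly: average temperature per city.
avg_by_city = dict(
    conn.execute("SELECT city, AVG(temperature_c) FROM readings GROUP BY city")
)
```

Because every row follows the same schema, both queries can be answered without inspecting individual records first.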
Unstructured data
Unstructured data is information with no predefined data model, so it is not easily interpreted or analyzed by standard
databases or data models. Unstructured data accounts for the majority of big data. It can still contain dates, numbers, and
facts, but these are embedded in free-form content rather than stored in labeled fields. Big data examples of this type include video
and audio files, mobile activity, and satellite imagery, which are often kept in NoSQL databases rather than relational ones. Photos
we upload on Facebook or Instagram and videos that we watch on YouTube or any other platform contribute to the growing
pile of unstructured data.
Semi-structured data
Semi-structured data is a hybrid of structured and unstructured data. It inherits a few characteristics of structured data,
such as tags or keys that label individual values, but it has no fixed schema and does not conform to relational databases or other
formal data models. JSON and XML are typical examples of semi-structured data.
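To illustrate the idea, here is a small sketch using JSON. The messages and field names (`device`, `kwh`, `motion`, and so on) are hypothetical. Each record is self-describing through its keys, yet no two records are required to share the same set of fields.

```python
import json

# Hypothetical IoT messages: each is valid JSON, but the fields vary per record.
raw_messages = [
    '{"device": "meter-1", "kwh": 4.2}',
    '{"device": "cam-7", "motion": true, "tags": ["door", "night"]}',
    '{"device": "meter-2", "kwh": 3.1, "location": {"city": "Pune"}}',
]

records = [json.loads(message) for message in raw_messages]

# No fixed schema: collect every key that appears in any record.
all_keys = sorted({key for record in records for key in record})

# Reading a field means handling its absence explicitly.
total_kwh = sum(record.get("kwh", 0.0) for record in records)
```

The keys give the data some structure, but code that reads it has to cope with missing or nested fields, which a fixed relational schema would rule out.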
Exponential increase in collected/generated data
• 100s of millions of GPS enabled devices sold annually
• 76 million smart meters in 2009… 200M by 2014
• 2+ billion people on the Web by end 2011
• 25+ TBs of log data every day
• Value:
• benefit generated by using the information contained in the data to improve the outcomes of actions
• e.g. profit, medical or social benefits, customer, employee, or personal satisfaction
• Value is an essential characteristic of big data. What matters is not simply the data that we process or store, but the
valuable and reliable data that we store, process, and analyze.
Task Tracker
• It works as a slave node for the Job Tracker.
• It receives a task and code from the Job Tracker and applies that code to the file.
• This process can also be called a Mapper.
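The relationship described above can be sketched as a toy, single-process simulation. This is not Hadoop's API: the `JobTracker` and `TaskTracker` classes, the `word_count_mapper` function, and the sample text are all hypothetical stand-ins that only show the pattern of shipping code to the node that holds the data.

```python
from collections import Counter


def word_count_mapper(chunk):
    """The 'code' shipped by the Job Tracker: count words in one chunk of the file."""
    return Counter(chunk.split())


class TaskTracker:
    """Hypothetical worker node that holds one chunk of the file."""

    def __init__(self, name, chunk):
        self.name = name
        self.chunk = chunk

    def run(self, map_fn):
        # Apply the received code to the locally held data.
        return map_fn(self.chunk)


class JobTracker:
    """Hypothetical master node that hands the same map code to every worker."""

    def __init__(self, trackers):
        self.trackers = trackers

    def submit(self, map_fn):
        partials = [tracker.run(map_fn) for tracker in self.trackers]
        # Merge the partial results (a simple stand-in for a reduce step).
        total = Counter()
        for partial in partials:
            total.update(partial)
        return total


trackers = [
    TaskTracker("tt-1", "big data big value"),
    TaskTracker("tt-2", "big volume"),
]
result = JobTracker(trackers).submit(word_count_mapper)
```

In this sketch each worker only ever sees its own chunk, which mirrors the idea that the code moves to the data rather than the other way round.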