Copyright Guideline
© 2018 Infosys Limited, Bangalore, India. All Rights Reserved.
Infosys believes the information in this document is accurate as of its publication date; such information is subject to change
without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such
other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor
any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic,
mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/ or any named
intellectual property rights holders under this document.
Learning Objectives
On completion of this module, the learner should be able to:
By Magnai17 (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
Question
The process begins with a question you want to answer or a problem you want to solve. This
might be something like "What are the characteristics of students who pass their projects?" or
"How can I better stock my store with the products people most want to buy?"
Wrangle
The next step of the process is data wrangling, which really has two parts: data acquisition
and data cleaning.
First, you need to acquire the data that you need to answer your question or solve your problem.
Then it's time to begin investigating the data and cleaning up any problems that you find.
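The two parts of wrangling can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the CSV text, the column names, and the `acquire` and `clean` helpers are all made up for the example, and real data would normally come from a file, an API, or a database.

```python
import csv
import io

# Hypothetical raw data with typical problems: stray whitespace,
# a missing score, a duplicate row, and a non-numeric score.
RAW_CSV = """student_id,project_score,passed
1, 82 ,yes
2,,no
3,91,YES
3,91,YES
4,abc,no
"""


def acquire(text):
    """Data acquisition: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Data cleaning: trim whitespace, drop duplicates and unusable scores."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        student_id = row["student_id"].strip()
        score_text = row["project_score"].strip()
        if not score_text.isdigit():
            continue  # missing or non-numeric score
        if student_id in seen_ids:
            continue  # duplicate record
        seen_ids.add(student_id)
        cleaned.append({
            "student_id": int(student_id),
            "project_score": int(score_text),
            "passed": row["passed"].strip().lower() == "yes",
        })
    return cleaned


rows = clean(acquire(RAW_CSV))
print(rows)
```

Dropping bad rows is just one possible cleaning policy; depending on the question, it may be better to repair or flag them instead.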
Explore
The third phase is data exploration. During this phase, you spend some time getting familiar
with your data, building your intuition about it and finding patterns. Once you're familiar with
your data, you'll usually want to draw some conclusions about it or maybe make some
predictions. For example, Netflix's movie recommendation system needs to predict which movies
its users will like. This phase usually involves statistics or machine learning that are beyond
the scope of this course.
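A first pass at exploration is often just a set of summary statistics per group. The sketch below assumes a small, made-up list of cleaned records (project score and pass/fail flag) and uses Python's built-in `statistics` module; the `summarize` helper is hypothetical.

```python
from statistics import mean, median

# Hypothetical cleaned records: (project_score, passed)
records = [(82, True), (45, False), (91, True), (58, False), (77, True), (62, True)]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


passed_scores = [score for score, passed in records if passed]
failed_scores = [score for score, passed in records if not passed]

summary = {
    "passed": summarize(passed_scores),
    "failed": summarize(failed_scores),
}

for group, stats in summary.items():
    print(group, stats)
```

Comparing the two groups side by side is a quick way to build intuition before drawing any conclusions.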
Communicate
The data wrangling and data exploration phases in particular are closely intertwined, because you can't
really clean the data before you take a look to see what problems there are to solve.
And even when you think you're done wrangling and you're ready to just explore, you'll keep finding more
problems and have to go back. Throughout the process, you may need to return to your question and refine it
as you become more familiar with the data set.
And sometimes data acquisition actually comes before you pose a question. If a new, exciting data set is
released, you might acquire the data first, take a look and see what's there, and then think of some questions
you could answer with the data.
However, this should give you an idea of the high level steps that are involved when you're doing data
analysis.
• Prescriptive – This type of analysis reveals what actions should be taken. This is the most valuable kind of
analysis and usually results in rules and recommendations for next steps.
• Predictive – An analysis of likely scenarios of what might happen. The deliverables are usually a predictive
forecast.
• Diagnostic – A look at past performance to determine what happened and why. The result of the analysis is
often an analytic dashboard.
• Descriptive – What is happening now based on incoming data. To mine the analytics, you typically use a real-
time dashboard and/or email reports.
• Predictive analytics use big data to identify past patterns to predict the future. For example, some companies
are using predictive analytics for sales lead scoring. Some companies have gone one step further and use
predictive analytics for the entire sales process, analyzing lead source, number of communications, types of
communications, social media, documents, CRM data, etc. Properly tuned predictive analytics can be used to
support sales, marketing, or for other types of complex forecasts.
• Descriptive analytics or data mining are at the bottom of the big data value chain, but they can be valuable
for uncovering patterns that offer insight. A simple example of descriptive analytics would be assessing credit
risk: using past financial performance to predict a customer’s likely financial performance. Descriptive
analytics can be useful in the sales cycle, for example, to categorize customers by their likely product
preferences and sales cycle.
• As you can see, harnessing big data analytics can deliver big value to business, adding context to data that
tells a more complete story. By reducing complex data sets to actionable intelligence you can make more
accurate business decisions. If you understand how to demystify big data for your customers, then your value
has just gone up tenfold.
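To make the difference between the analysis types concrete, here is a small sketch that applies three of them to the same made-up sales series: a descriptive average, a predictive forecast from a straight-line trend fitted by ordinary least squares, and a prescriptive reorder rule. The data, the stock level, and the `fit_line` helper are assumptions for illustration; real predictive models are usually far more involved.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly unit sales for one product.
months = [1, 2, 3, 4, 5]
units_sold = [100, 110, 120, 130, 140]

# Descriptive: summarize what the data shows.
average_units = sum(units_sold) / len(units_sold)

# Predictive: extend the fitted trend to the next month.
slope, intercept = fit_line(months, units_sold)
forecast = slope * 6 + intercept

# Prescriptive: turn the forecast into a recommended action.
units_in_stock = 120  # hypothetical current inventory
if forecast > units_in_stock:
    recommendation = f"reorder {round(forecast - units_in_stock)} units"
else:
    recommendation = "no reorder needed"

print(average_units, forecast, recommendation)
```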
Chart Types
• There are many different charts that can be used to represent data, such as bar charts, line charts, pie charts, etc., or even some complex forms to enable interactivity.
• A complete list of interesting and
[Figure: example charts titled "Chart Title", plotting Series 1-3 across Category 1-4, plus a two-series chart over the dates 1/5/2002 to 1/9/2002]
• Highcharts
• Chart.js
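As a rough illustration of how a bar chart is described to a charting library, the sketch below builds a configuration object in the shape Chart.js expects (`type`, `data.labels`, `data.datasets`) and serializes it to JSON from Python. The `bar_chart_config` helper and all values are hypothetical, and the title option assumes Chart.js version 3 or later; the JSON would still need to be passed to the library in a web page.

```python
import json


def bar_chart_config(title, categories, series):
    """Build a Chart.js-style bar chart configuration as a plain dict."""
    return {
        "type": "bar",
        "data": {
            "labels": categories,
            "datasets": [
                {"label": name, "data": values}
                for name, values in series.items()
            ],
        },
        # Assumes Chart.js 3 or later, where the title is a plugin option.
        "options": {"plugins": {"title": {"display": True, "text": title}}},
    }


# Hypothetical sample data: three series over four categories.
config = bar_chart_config(
    "Chart Title",
    ["Category 1", "Category 2", "Category 3", "Category 4"],
    {
        "Series 1": [4, 2, 3, 5],
        "Series 2": [2, 4, 2, 3],
        "Series 3": [2, 2, 3, 5],
    },
)

config_json = json.dumps(config, indent=2)
print(config_json)
```

Changing `"type"` to `"line"` or `"pie"` switches the chart type while the data description stays largely the same.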
OT and IT of IoT
• Node
• IoT Gateways
• IoT Platform
An IoT platform is essentially what makes IoT happen for your device. It is the application that connects the device with the cloud and
with the corresponding output device.
Why Platforms
• Common standard application platform to hide the heterogeneity
• To provide a common working environment
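One way to picture "hiding the heterogeneity" is a set of adapters that convert each device's native payload into one common reading format, so that everything downstream works with a single shape. The sketch below is purely illustrative: the vendor names, payload fields, and the `Reading` type are invented for the example and do not correspond to any particular IoT platform.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """Common format that the rest of the platform works with."""
    device_id: str
    metric: str
    value: float
    unit: str


def from_vendor_a(payload):
    # Hypothetical vendor A reports Celsius, e.g. {"id": "a-1", "temp_c": 21.5}
    return Reading(payload["id"], "temperature", float(payload["temp_c"]), "celsius")


def from_vendor_b(payload):
    # Hypothetical vendor B reports Fahrenheit, e.g. {"device": "b-7", "temperature_f": 70.7}
    celsius = (payload["temperature_f"] - 32) * 5 / 9
    return Reading(payload["device"], "temperature", round(celsius, 2), "celsius")


# One adapter per device family; applications never see vendor formats.
ADAPTERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}


def normalize(source, payload):
    """Convert a vendor-specific payload into the common Reading format."""
    return ADAPTERS[source](payload)


reading_a = normalize("vendor_a", {"id": "a-1", "temp_c": 21.5})
reading_b = normalize("vendor_b", {"device": "b-7", "temperature_f": 212})
print(reading_a)
print(reading_b)
```

Supporting a new device family then means adding one adapter, without touching the applications that consume readings.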
• Easily integrate Azure IoT Suite with your systems and applications,
including Salesforce, SAP, Oracle Database, and Microsoft Dynamics
• Azure IoT Suite packages together Azure IoT services with
preconfigured solutions.
• Supports HTTP, Advanced Message Queuing Protocol (AMQP), and
MQ Telemetry Transport (MQTT).
• Gateway SDK
References
1. Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, Wiley
publications, ISBN: 978-1-118-87613-8
2. Getting Started with Python Data Analysis, PACKT Publishing, by Phuong Vo.T.H (Author), Martin Czygan
(Author). ISBN-10: 1785285114, ISBN-13: 978-1785285110
3. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, O’Reilly Media, ISBN-10:
1449319793, ISBN-13: 978-1449319793
4. http://www.creativebloq.com/design-tools/data-visualization-712402