Skills Academy: Big Data Engineer

Preface

November 2018
NOTICES
This information was developed for products and services offered in the USA.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for
information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to
state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any
non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
United States of America
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in
certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these
changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the
program(s) described in this publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of
those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information
concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available
sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the
examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and
addresses used by an actual business enterprise is entirely coincidental.
TRADEMARKS
IBM, the IBM logo, ibm.com, Big SQL, Db2, and Hortonworks are trademarks or registered trademarks of International Business Machines Corp.,
registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
Adobe, and the Adobe logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other
countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
© Copyright International Business Machines Corporation 2018.
This document may not be reproduced in whole or in part without the prior written permission of IBM.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

© Copyright IBM Corp. 2018 P-2


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

Contents
Preface................................................................................................................. P-1
Contents ............................................................................................................. P-3
Course overview................................................................................................. P-5
Document conventions ....................................................................................... P-6
Labs ................................................................................................................... P-7
Additional training resources .............................................................................. P-8
IBM product help ................................................................................................ P-9
Introduction to IBM Watson Studio ..................................................... 1-1
Unit objectives .................................................................................................... 1-3
What is Watson Studio? ..................................................................................... 1-4
Right tool for the right job ................................................................................... 1-5
IBM Watson Studio ............................................................................................. 1-7
Industry use cases ............................................................................................. 1-8
Watson Studio offerings ..................................................................................... 1-9
Projects ............................................................................................................ 1-10
Create a project ................................................................................................ 1-11
Communities .................................................................................................... 1-12
Simplified communication ................................................................................. 1-13
Collaborations .................................................................................................. 1-14
Managing collaborators .................................................................................... 1-15
Notebooks and data assets .............................................................................. 1-16
Watson Studio and other IBM Cloud services ................................................... 1-17
Watson Studio and Spark ................................................................................. 1-18
Watson Studio and Object Storage .................................................................. 1-19
Watson Studio high availability (public cloud) ................................................... 1-20
Getting started with Watson Studio .................................................................. 1-21
Checkpoint ....................................................................................................... 1-22
Checkpoint solution .......................................................................................... 1-23
Lab 1: Getting started with Watson Studio ........................................................ 1-24
Analyzing data with Watson Studio ..................................................... 2-1
Unit objectives .................................................................................................... 2-3
Jupyter notebook overview ................................................................................. 2-4
Creating notebooks ............................................................................................ 2-5
Coding and running notebooks ........................................................................... 2-6
Load and access data from local file................................................................... 2-7
Load and access data from Watson Studio data sets ......................................... 2-8
Load and access data from connections ............................................................ 2-9
Prepare and analyze the data........................................................................... 2-10

Visualize the data ............................................................................................. 2-11


Collaborate with other project members ........................................................... 2-12
Sharing and publishing notebooks .................................................................... 2-13
Checkpoint ....................................................................................................... 2-14
Checkpoint solution .......................................................................................... 2-15
Lab 1: Analyzing data with Watson Studio ........................................................ 2-16
Unit summary ................................................................................................... 2-20

Course overview
Preface overview
This course introduces students to IBM Watson Studio. It covers how to create and set
up a project, and how to create, code, collaborate on, and share notebooks while
working with a variety of data sources to analyze data.
Intended audience
Data scientists, data engineers, and application developers
Topics covered
Topics covered in this course include:
• Introduction to IBM Watson Studio
• Analyzing data with Watson Studio
Course prerequisites
• Completed the Introduction to Data Science course
OR
• Have a basic understanding of notebook technologies for data science

Document conventions
Conventions used in this guide follow Microsoft Windows application standards, where
applicable. As well, the following conventions are observed:
• Bold: Bold style is used in demonstration and exercise step-by-step solutions to
indicate a user interface element that is actively selected or text that must be
typed by the participant.
• Italic: Used to reference book titles.
• CAPITALIZATION: All file names, table names, column names, and folder names
appear in this guide exactly as they appear in the application.
To keep capitalization consistent with this guide, type text exactly as shown.

Labs
Lab format
Labs are designed to let you work at your own pace. Practice with the product to
become proficient in how tasks and steps are performed. Use the Purpose and Results
sections to guide you on what you will complete in the lab.

Additional training resources


• Visit IBM Analytics Product Training and Certification on the IBM website for
details on:
• Instructor-led training in a classroom or online
• Self-paced training that fits your needs and schedule
• Comprehensive curricula and training paths that help you identify the courses
that are right for you
• IBM Analytics Certification program
• Other resources that will enhance your success with IBM Analytics Software
• For the URL relevant to your training requirements outlined above, bookmark:
• Information Management portfolio:
http://www-01.ibm.com/software/data/education/
• Predictive and BI/Performance Management/Risk portfolio:
http://www-01.ibm.com/software/analytics/training-and-certification/

IBM product help


Help type: Task-oriented
When to use: You are working in the product and you need specific task-oriented help.
Location: IBM Product - Help link

Help type: Books for Printing (.pdf)
When to use: You want to use search engines to find information. You can then print
out selected pages, a section, or the whole book. Use Step-by-Step online books (.pdf)
if you want to know how to complete a task but prefer to read about it in a book. The
Step-by-Step online books contain the same information as the online help, but the
method of presentation is different.
Location: Start/Programs/IBM Product/Documentation

Help type: IBM on the Web
When to use: You want to access any of the following:
• IBM - Training and Certification: http://www-01.ibm.com/software/analytics/training-and-certification/
• Online support: http://www-947.ibm.com/support/entry/portal/Overview/Software
• IBM Web site: http://www.ibm.com

Introduction to IBM Watson Studio

Data Science Foundations

Unit 1 Introduction to IBM Watson Studio

Unit objectives
• What is Watson Studio?
• Setting up a project
• Working with collaborators
• Managing data assets

Introduction to IBM Watson Studio © Copyright IBM Corporation 2018

What is Watson Studio?


• Watson Studio is a collaborative platform for data scientists, built on
open source components and IBM added value, available in the cloud
or on premises.
▪ Collaborative platform
− Community environment for sharing resources, tutorials, data sets
− Simplified communication between different users / job roles
▪ Open source components
− Python, Scala, R, SQL, Spark, Notebooks (Jupyter, Zeppelin)
▪ IBM value-add
− Watson Machine Learning, Watson Studio Canvas, Prescriptive Analytics, and
more
▪ Watson Studio Cloud, Watson Studio Local, Watson Studio Desktop

Watson Studio is a collaborative platform for data scientists, built on open source
components and IBM added value, available in the cloud or on premises. The
collaborative platform allows users, whether they are data scientists, data
engineers, or application developers, to share resources and work together seamlessly
within the platform. Watson Studio is built upon open source components such as
Python, Scala, R, SQL, Spark, and notebooks.
If the open source tools are not enough for your needs, IBM provides value-added
components such as Watson Machine Learning, Watson Studio Canvas (based on
SPSS), Prescriptive Analytics, and more.
Watson Studio is available in three offerings: Watson Studio Cloud, which is what you
will use in this course; Watson Studio Local, which is the on-premises version; and
Watson Studio Desktop, a lightweight version that you can install on your laptop.

Right tool for the right job

[Figure: The data science life cycle drawn as a loop through three phases, with the personas and their primary tools listed on the right.]
Phases and steps, clockwise from the top:
• INPUT: Understand Problem and Domain; Ingest Data
• ANALYSIS: Explore and Understand Data; Transform: Clean; Transform: Shape; Create and Build Model; Evaluate
• OUTPUT: Communicate Results; Deliver and Deploy Model
Personas and primary tools:
• Data Engineer: Architects how data is organized & ensures operability (Bluemix Data Connect)
• Data Scientist: Gets deep into the data to draw hidden insights for the business (Watson Studio)
• Business Analyst: Works with data to apply insights to the business strategy (Watson Analytics)
• App Developer: Plugs into data and models & writes code to build apps (Bluemix)
Open source tools have benefits, in that you have choices, and lots of them. The
downside to lots of choices is knowing the right one to pick. Essentially, you want to
pick the right tool for the right job. Watson Studio is no exception. Watson Studio is
designed for a specific persona, but other personas can use it as it relates to their job.
Watson Studio is part of a larger picture, and that picture is the Watson Data Platform
(WDP). WDP will not be covered in this course.
Take a look at the diagram on the slide. Starting from the top and going clockwise, you
have the input, analysis, and output phases. Within each phase are the objectives of
that phase. Each objective can overlap between various user personas.
Look at the list of personas on the right: the Data Engineer, the Data Scientist, the
Business Analyst, and the App Developer. Each persona has primary tools that help
them do their job. For example, the Data Scientist's main tool is Watson Studio.
However, in some organizations the Data Scientist may also have to perform the role
of a Data Engineer, so another tool to consider is Bluemix Data Connect. Perhaps
there is a team of different personas. Whatever the case may be, you must decide
which tool is right for the job, regardless of the persona. Keep in mind that the
definitions of personas can vary between companies and can also evolve over time.
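
As a rough illustration of the input, analysis, and output phases described above, the following Python sketch walks a tiny, made-up data set through ingest, clean, shape, model, evaluate, and communicate steps. The function names and the sample data are hypothetical and are not part of Watson Studio; the sketch uses only the Python standard library.

```python
from statistics import mean

# Hypothetical raw records: (store, weekly_sales); None marks a missing value.
RAW = [("north", 120.0), ("south", None), ("north", 80.0), ("south", 100.0)]


def ingest():
    """Input: return the raw records (a stand-in for reading a data source)."""
    return list(RAW)


def clean(records):
    """Transform (clean): drop records with missing sales."""
    return [(store, sales) for store, sales in records if sales is not None]


def shape(records):
    """Transform (shape): group sales values by store."""
    grouped = {}
    for store, sales in records:
        grouped.setdefault(store, []).append(sales)
    return grouped


def build_model(grouped):
    """Create and build a trivial 'model': the mean sales per store."""
    return {store: mean(values) for store, values in grouped.items()}


def evaluate(model, grouped):
    """Evaluate: mean absolute error of the per-store mean on the same data."""
    errors = [abs(v - model[s]) for s, values in grouped.items() for v in values]
    return mean(errors)


def communicate(model, error):
    """Output: format the results for a report."""
    lines = [f"{store}: {value:.1f}" for store, value in sorted(model.items())]
    lines.append(f"mean absolute error: {error:.1f}")
    return "\n".join(lines)


grouped = shape(clean(ingest()))
model = build_model(grouped)
report = communicate(model, evaluate(model, grouped))
print(report)
```

Each step is a separate function, so a different persona could in principle own a different step, which mirrors the point that objectives can overlap between personas.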

In addition to the tailored interface of Watson Studio, collaboration ties the team
and the organization together by allowing them to share projects, code, and ideas.
Imagine a Data Engineer builds out a new data source and shares that asset with the
Data Scientist and the Business Analyst. The Business Analyst immediately builds the
reports and dashboards they need. The Data Scientist experiments with the data and
ultimately builds a model that passes all the tests and is worthy of promotion to new
applications. They can immediately share that model with the Application Developer,
who deploys a new application using the model. Along this journey, the team members
keep each other updated on their status, ask questions, and share ideas or
requirements.
This is where data and analytics development becomes a team sport. It no longer
needs to be done in silos. Additionally, because these assets can now be published,
other departments can reuse them, making the entire organization more agile.

IBM Watson Studio

• Learn: Built-in learning to get started or go the distance with advanced tutorials
• Create: The best of open source and IBM value-add to create state-of-the-art data products
• Collaborate: Community and social features that provide meaningful collaboration

http://datascience.ibm.com

Watson Studio is built as a collaborative platform. Watson Studio provides an easy way
for you to learn how to start using the platform. You can create state-of-the-art products
based on the data you derived using open source and IBM value-add tools. As you
innovate, you can collaborate with your team and the community to share and gain
insights. To get started, visit the Watson Studio website.

Industry use cases

• Retail
▪ Shopping Experience
▪ Loss & Fraud Prevention
▪ Task & Workforce Optimization
▪ Pricing & Assortment Optimization
• Transportation
▪ End-to-End Customer Experience
▪ Operations Planning & Optimization
▪ Predictive Maintenance
▪ Route and Asset Optimization
• Banking
▪ Optimize Offers & Cross-Sell
▪ Risk Management
▪ Fraud and Crime Management
▪ Financial Performance
• Media and Entertainment
▪ Audience & Fan Insight
▪ Social Media Insight
▪ Content Demand Forecasting
▪ Marketing & Advertising Optimization
• Manufacturing
▪ Inventory Optimization
▪ Predictive Maintenance
▪ Health, Safety & Environment
▪ Production Planning and Scheduling
• Telco
▪ Subscriber Analytics
▪ IOT Analytics
▪ Proactive Marketing
▪ Network Design

This slide lists industry use cases that leverage data science. Data science spans
multiple industries, but if you look closer, some of these activities are not entirely
new. In fact, organizations have been doing them for many years. The advantage of
Watson Studio is that you can easily collaborate with other data scientists using
well-known tools that are used in the industry.

Watson Studio offerings


• Watson Studio Cloud (Public cloud)
▪ IBM Cloud subscription (https://datascience.ibm.com)
▪ https://github.com/IBMDataScience
• Watson Studio Local (Private cloud / on premise)
▪ Run on your own clusters
▪ Same tools and features
• Watson Studio Desktop (local install)
▪ Download and try locally

As mentioned on an earlier slide, there are three offerings available with Watson Studio.
Watson Studio Cloud is available through IBM Cloud (formerly known as IBM Bluemix)
as the public cloud option. You sign up and register with your IBM Cloud account if you
have one; otherwise, you create a new account. There is also a GitHub link that has a
lot of useful resources.
Watson Studio Local is the private cloud option. Use it when you need to run
Watson Studio on premises or on your own private cloud servers. You need to install
it yourself. You have the same tools and features as in Watson Studio Cloud.
Watson Studio Desktop is a version that you install locally on your laptop. It gives
you a feel for what Watson Studio is like before you commit to either the public or
the private cloud option.

Projects
• The architecture of Watson Studio is centered around the project
• How you organize your resources for solving a business problem
• Integrate community, data assets, collaborators, analytic assets

The architecture of Watson Studio is centered on projects, where everything is
seamlessly integrated. You create and organize projects to suit your business needs.
A project consists of data assets, collaborators, analytic assets, and community
resources, combined with a number of open source and value-add tools.
Data assets are the files in your object store or connections such as databases, data
services, streaming data, and other external files.
Collaborators can be assigned to your projects as admins, editors, or viewers.
Analytic assets are the notebooks and the models that you develop.
Watson Studio has a suite of tools available to help you with your job, both from the
open source space and from IBM value-add offerings such as Decision Optimization,
Watson Machine Learning, and Streaming Analytics.
The runtime environment for Watson Studio is Apache Spark.
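
To make the parts of a project more concrete, here is a minimal Python sketch of a project as a container for data assets, collaborators, and analytic assets. The `Project` class, its field names, and the sample values are assumptions made for illustration only; this is not the Watson Studio API.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical model of a project; not the Watson Studio API."""

    name: str
    data_assets: list = field(default_factory=list)      # files or connections
    collaborators: dict = field(default_factory=dict)    # email -> role
    analytic_assets: list = field(default_factory=list)  # notebooks and models


# Build a small example project with made-up contents.
project = Project(name="churn-analysis")
project.data_assets.append("customers.csv")
project.analytic_assets.append("explore.ipynb")
project.collaborators["analyst@example.com"] = "viewer"
print(project)
```

Keeping each kind of asset in its own field reflects the way the notes describe a project as the single place where resources for one business problem are organized.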

Create a project
• Spark service + associated services
• Storage options

When you create a project, you need to specify a Spark service. You can either create
a new service or associate an existing one. You also need to specify an object store,
which you can easily set up and associate from your Watson Studio account.

Communities
• Articles
▪ Curated articles of interest to data scientists
• Data sets
▪ Multiple open source data sets including local, state, and government data
• Notebooks and tutorials
▪ Example notebooks
▪ Notebook tutorials on how to do specific use cases and access different data sources

Watson Studio communities are a place where you can find articles of interest to data
scientists. These are external articles that you can peruse. Within the Watson Studio
communities, you can also find open data sets that are ready to use, including
local, state, and government data. You simply download the data set and load it into
your project. If you are working within a notebook, you can easily add the data set to
the project. Communities are a great place to get started if you are exploring the data
science space. There are sample notebooks and tutorials to get you started or to help
you learn about new libraries and use cases.

Simplified communication

[Figure: What the Data Scientist communicates with each of the other personas.]
• Business Analyst: Show progress of a solution.
• Data Engineer: Show data needs. Receive access to data.
• App Developer: Show the overall solution to scale.

Watson Studio provides a simplified communication workflow for all the personas. Here
is an example. Say you have a team of four players, each with their own role. You, as
the data scientist, need to work with the data engineer to get the data into Watson
Studio so that you can use it to develop the models. Once the model has been
developed, the app developer needs to put all of this into an application that can
scale. The business analyst needs to see the progress of the solution through the
notebooks that have been created. Watson Studio provides a seamless way for everyone
to work together.

Collaborations
• Add and remove collaborators in your projects
▪ Only the collaborators in your project can access your data or notebooks
▪ Each Watson Studio account acts as a separate tenant of the Spark and
Object Storage services
▪ Tenants cannot access other tenants' data
• Share notebook
▪ A permalink is generated
▪ Can un-share the notebook

Your projects and all the assets within them can be accessed by the collaborators that
you add. Only the collaborators in your project can access your data or notebooks. Each
Watson Studio account acts as a separate tenant of the Spark and Object Storage
services. Tenants cannot access other tenants' data. You can also share a notebook with
others. A permanent link (permalink) is generated for that notebook. You can un-share
the notebook later.
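
The share and un-share behavior can be pictured with a small Python sketch. It assumes a simple in-memory registry that maps a random token to a notebook name; the `NotebookSharing` class, its methods, and the `example.com` URL are hypothetical and do not describe how Watson Studio implements permalinks.

```python
import uuid


class NotebookSharing:
    """Hypothetical share registry; not how Watson Studio implements it."""

    def __init__(self):
        self._links = {}  # token -> notebook name

    def share(self, notebook):
        """Create a link for the notebook and return it."""
        token = uuid.uuid4().hex
        self._links[token] = notebook
        return f"https://example.com/shared/{token}"

    def unshare(self, url):
        """Revoke a previously created link; unknown links are ignored."""
        token = url.rsplit("/", 1)[-1]
        self._links.pop(token, None)

    def resolve(self, url):
        """Return the notebook name for a link, or None if it is not shared."""
        token = url.rsplit("/", 1)[-1]
        return self._links.get(token)


sharing = NotebookSharing()
link = sharing.share("explore.ipynb")
still_shared = sharing.resolve(link)   # the notebook name while the link is active
sharing.unshare(link)
after_unshare = sharing.resolve(link)  # None once the notebook is un-shared
print(still_shared, after_unshare)
```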

Managing collaborators
• Add/remove/edit
• Permissions
▪ Admin
▪ Editor
▪ Viewer

Add collaborators to your project by their email address. If they have an existing
account on IBM Cloud, they are added immediately. Otherwise, they receive an
invitation to create a Watson Studio account. Choose the permissions for each
collaborator. The Admin can control project assets, collaborators, and settings. The
Editor can control project assets. The Viewer can view the project. Collaborators can
be removed from a project or have their permissions updated.
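
The three permission levels can be summarized as a small lookup table. The Python sketch below is only an illustration of the descriptions above: the `Role` enum, the action names, and the assumption that Admins and Editors can also view the project are not taken from the Watson Studio product.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    EDITOR = "editor"
    VIEWER = "viewer"


# Actions each role may perform. The action names are illustrative assumptions,
# and "view" is assumed to be included for every role.
PERMISSIONS = {
    Role.ADMIN: {"view", "edit_assets", "manage_collaborators", "change_settings"},
    Role.EDITOR: {"view", "edit_assets"},
    Role.VIEWER: {"view"},
}


def can(role, action):
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS[role]


print(can(Role.EDITOR, "edit_assets"))           # Editors control project assets
print(can(Role.VIEWER, "manage_collaborators"))  # Viewers can only view
```

Updating a collaborator's permissions then amounts to assigning a different role rather than editing individual actions.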

Notebooks and data assets


• Jupyter and Zeppelin (on Watson Studio Local) notebooks
• RStudio for statistical computing
• Data assets
▪ Unstructured data cloud storage

Notebooks are the workhorses of a data scientist. Notebooks allow a user to add
formatted text around executable code. This gives the ability to describe what is being
done and to show the result, including graphics, at the same time. There are a few very
popular notebooks. One of them is the Jupyter notebook, which became very popular with
Python programmers. Since the Python language is also popular with data scientists,
using Jupyter with Python is a winning combination. Other notebooks could be added
over time. For example, Watson Studio Local has already added another popular notebook
called Zeppelin.
Another popular environment for data science, statistical analysis, and graphic
representation is the R language. Watson Studio includes RStudio for people who
prefer this R development environment.
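
To give a feel for how formatted text and executable code sit side by side, the following sketch shows what a text cell and a Python code cell in a notebook might contain. The text cell is written as comments so that the whole example runs as plain Python, and the sample data is made up.

```python
# Text (Markdown) cell, shown here as comments:
#   ## Average trip length
#   The cell below computes summary statistics for a small sample.

# Code cell:
from statistics import mean, median

trip_minutes = [12, 7, 31, 18, 9]  # made-up sample data
summary = {
    "count": len(trip_minutes),
    "mean": mean(trip_minutes),
    "median": median(trip_minutes),
}
summary  # in a notebook, the value of the last expression is shown below the cell
```

In a Jupyter notebook the text cell would be rendered as a heading and a sentence, and the dictionary would appear directly under the code cell.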

Watson Studio and other IBM Cloud services


• Watson Studio is a part of the Watson Data Platform (WDP)
▪ All WDP services are seamlessly integrated and loosely coupled via
IBM Cloud
• Examples:
▪ Watson Studio is aware of the Watson Machine Learning deployment
service
▪ Watson Studio is aware of Db2 Warehouse on Cloud (formerly dashDB),
IBM Analytics Engine, and other data sources
▪ Watson Studio is aware of Spark services created
▪ Watson Studio is aware of data connections defined in the Data Connect
service
• The services do not depend on each other and can be used stand-alone
(loosely coupled)

As mentioned briefly in one of the earlier slides, Watson Studio is part of the WDP. This
gives the ability to integrate seamlessly, but coupled loosely, with other tools via IBM
Cloud. Here are some examples (there are many more!) of how Watson Studio can
work with other tools. Watson Studio is aware of the Watson Machine Learning
deployment service. It is aware of Db2 Warehouse on Cloud, the IBM Analytics Engine,
and other data sources. Watson Studio is aware of the Spark services as well as data
connections defined in the Data Connect service. The value of this design is that each
service is independent of the others: you use only what you need.


Watson Studio and Spark


• The runtime of Watson Studio is provided through Spark
• Specify a Spark configuration when a project is created
• The Watson Studio UI provides an easy way to switch the Spark
service for each notebook


Watson Studio and Spark


Watson Studio's runtime engine is Spark. You specify a Spark service when you create
a project. The Spark configuration must be set up first through IBM Cloud. The UI
provides an easy way to switch between Spark services for each notebook.


Watson Studio and Object Storage


• Object Storage provides a space to store non-database data sources
• Swift API
▪ Available through Watson Studio
• S3 API
▪ Use your own
• The Watson Studio UI provides an easy way to switch between
Object Storage instances


Watson Studio and Object Storage


Object Storage is where the unstructured data resides for your project. By design, the
Object Storage keeps the data separate from the computation. The Object Storage
supports two APIs, the Swift API, which is available through Watson Studio, and the S3
API, for which you need to provide the external credentials. As with the Spark services,
you can easily switch between Object Storage instances.
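Because Object Storage keeps data separate from computation, an object is addressed by a bucket name and a key rather than by a path on the compute node. The following is a minimal sketch, assuming an S3-compatible endpoint that accepts path-style addressing; the endpoint host, bucket, and key are hypothetical, and a real request would also need to be signed with the external credentials mentioned above.

```python
from urllib.parse import quote


def object_url(endpoint, bucket, key):
    """Build a path-style object URL: https://<endpoint>/<bucket>/<key>."""
    # quote() percent-encodes unsafe characters such as spaces; "/" is left
    # as-is so that key prefixes ("folders") keep their hierarchy.
    return f"https://{endpoint}/{quote(bucket)}/{quote(key)}"


# Hypothetical values, for illustration only.
url = object_url("s3.example.com", "my-project-bucket", "raw/zip codes.csv")
print(url)
```

The compute service only needs this address plus credentials, which is why a project can switch storage instances without moving the notebooks themselves.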


Watson Studio high availability (public cloud)


• Designed for 24x7 availability
▪ Continuous availability and continuous delivery

• Backup and Recovery


▪ Watson Studio is disaster resistant
▪ Notebooks in Watson Studio are stored in a 3-way Cloudant cluster in
multiple geographic zones
▪ Watson Studio provides integration with GitHub and an interface for
downloading notebooks if the customer wants to use their own back up


Watson Studio high availability (public cloud)


Watson Studio is designed for 24x7 availability. The product itself is designed for
continuous delivery and availability. Features and updates are rolled out without
downtime. Notebooks in Watson Studio are stored in a 3-way Cloudant cluster in
multiple geographic zones. Watson Studio also provides integration with GitHub, so you
can use that to manually download the notebooks if you want to use your own backups.


Getting started with Watson Studio


• Go to http://datascience.ibm.com
• Sign up for an account
▪ If you have an IBM Cloud account, continue with your credentials
▪ Otherwise, create your IBM Cloud account
• Confirm your account with the activation code sent via email
• Fill out your profile
• Once logged in, give it about a minute to create your account
• You are now in the Watson Studio landing page


Getting started with Watson Studio


Once you are signed up, your environment is automatically set up with one Apache
Spark instance and 5 GB of object storage. From here you can explore any of the
tutorials, videos, sample notebooks, or articles in the community.


Checkpoint
1. Watson Studio is designed only for Data Scientists, other personas
would not know how to use it. True or False?
2. List the Watson Studio offerings and their capabilities.
3. The Watson Studio architecture is centered around which
component?
4. Collaboration with Watson Studio is an optional add-on component
that must be purchased. True or False?
5. Community provides access to articles, tutorials, and even data sets
that you can use. True or False?


Checkpoint solution
1. Watson Studio is designed only for Data Scientists, other personas would not
know how to use it. True or False?
▪ False, while it is designed for Data Scientists, other personas such as Data Engineers
and Application Developers can use it for their job as well.
2. List the Watson Studio offerings and their capabilities.
▪ Watson Studio Cloud (Public Cloud via IBM Cloud)
▪ Watson Studio Local (Private Cloud / On-premise install)
▪ Watson Studio Desktop (Local install)
3. The Watson Studio architecture is centered around which component?
▪ The project
4. Collaboration with Watson Studio is an optional add-on component that must be
purchased. True or False?
▪ False. Collaboration is part of the design of Watson Studio.
5. Community provides access to articles, tutorials, and even data sets that you
can use. True or False?
▪ True.


Lab 1
Getting started with Watson Studio

• Create a project
• Assign collaborators
• Load a data set into the object store


Lab 1:
Getting started with Watson Studio

Purpose:
You will be able to create and manage a project, add collaborators, and load a
data set to the object store.

Task 1. Signing up for a Watson Studio account.


1. Go to https://www.ibm.com/cloud/watson-studio.
2. If you do not have an IBM Cloud account, click the “Cloud sign-up/log-in” button on
the top right.
3. If you have an IBM Cloud account, you can continue with those credentials by
clicking the “Sign in” button on the top right.
4. Otherwise, accept the terms and click create your IBM Cloud account.
5. You should receive an email ("ibmacct") with your IBMid confirmation code.
6. Then, on the following page, fill in the corresponding fields and click
CREATE ACCOUNT.
7. You are now ready to sign in to Watson Studio with your new account.
Task 2. Creating a new project.
1. Once you are logged in, click the Projects menu dropdown on the menu bar and
select View All Projects.
2. Click the New Project button.
3. Fill in the fields for the project name and description.
You can give it any name that you like, for example: Watson Studio Overview
Define the storage type.
4. Keep the default IBM Cloud Storage selected for now.
5. In the second step, click Add to bring up a new page where you will create an
IBM Cloud Object Storage.
6. Select the Lite plan and accept the defaults.
7. Back on the New Project page, on step three, click the Refresh link to have the
new storage option show up.
Define the compute engine next.
8. Under the Spark service, select an existing Spark service, or click the link to
create a new one.
9. Select the Lite plan and accept the defaults.
10. Click the Refresh link back on the New Project page to see the Spark service.


11. Click Create to create the project.


Task 3. Managing a project.
You will now see the Project you just created. There are specific project actions
that you can perform on the projects. There are a number of tabs. Click through
each of the tabs to understand what they are and what you can do with them.
1. Click the Overview tab and you can see the number of assets, bookmarks and
collaborators on this project.
You also see the amount of storage that the project is using so you can manage
your storage.
2. Click the Assets tab to see the types of assets within the project. As a new
project, you will not have any assets, but take a look at the type of assets
available:
• Data Assets
• Notebooks
• Streams flows
• Models
• SPSS Modeler flows
• Data flows
3. Click the Bookmarks tab to see any bookmarks you have saved.
4. Click the Deployments tab to see your deployments.
5. Click the Collaborators tab to see the collaborators.
6. Click the Settings tab to see the project settings.
Under the Settings tab, you can update your project name and description. You
can also add additional services on this page.
7. To familiarize yourself with the projects list, click Projects > View All Projects.
8. On the project you just created, click the link under the ACTIONS column to see
the Edit and the Delete options.
The Edit link will bring you to the page you were at before this. If you want to
delete your project, you would click the Delete link.


Task 4. Adding collaborators.


1. Click the Collaborators tab.
2. On the right, click Add Collaborators.
3. Specify the email of the person you would like to add to your project.
If that person has an account already, they will be added immediately. If not,
they will receive an invitation to register.
4. Specify the access level for the collaborator.
5. Click Invite.
Task 5. Loading data.
1. Within the project, click the Assets tab.
There are currently six types of assets that you can manage on this page, as
you have seen previously. The types of assets may change over time.
2. A side panel should have appeared. There should be Load, Files, and Catalog.
You can drag and drop or browse for a local file to add to your system. Go to
the Community page to download a sample file.
3. Click the Community link at the top of the page.
4. Click the Data Sets link to filter the results by the available data sets.
5. Scan through for the United States Demographic Measures: Zip Code…
card to preview the data.
Or you can use this search keyword: ZCTA to locate this card.
6. Briefly look through the Data Preview and the Column Details to understand
the data.
When you are ready, there is a Download button at the top right. Hover over
each button to see their names.
7. Click the Download button to download the dataset.
You may be prompted to save the file or it may automatically download and
open the file. Note that there are other options for the card. You can bookmark
the data set, like it, grab a link, or share it.


8. Go to the default Downloads folder for your browser to confirm the download of
the file.
9. Navigate back to the Projects page and select your project.
It should bring you back to your Assets page with the right side panel open. This
time, you have a file to load.
10. Load that file into your project by dragging and dropping it or by using browse to
locate it.
When the file has been added, it will appear under the Data assets heading on
the page. If you no longer need to use the file, you can Remove it under
ACTIONS. As you add more assets, the list will grow.
Task 6. Managing the Object Storage.
When you created the project, you either created a new Object Store or
associated an existing one. In either case, you should have containers set up
with the Object Store.
1. Click the WDP Admin Console link at the top corner of the page.
You will see a list of Data Services. The most recent one created should be at
the top of the list and that is the one you want.
2. Click the Cloud Object Storage link under Data Services.
Here you see the instance of the object storage currently available.
3. Click the name of the object storage to see more details.
4. Click Manage in IBM Cloud.
Here is where you can delete the storage if you are no longer using it. You can
also choose to rename your service to something more meaningful if you use
more than one.
5. Click the row of the service to see the buckets in the storage.
6. Click the row of the bucket to see the actual dataset that you loaded.
You can delete the dataset from the bucket if you no longer use it. You also
have the option to download it for storage elsewhere.
Results:
You are now able to create and manage a project, add collaborators, and load
a data set to the object store.


Unit summary
• What is Watson Studio?
• Setting up a project
• Working with collaborators
• Managing data assets

Analyzing data with Watson Studio


Data Science Foundations

Unit 2 Analyzing data with Watson Studio


Unit objectives
• Overview of Jupyter notebooks
• Creating notebooks
• Coding and running notebooks
• Sharing and publishing notebooks

Analyzing data with Watson Studio © Copyright IBM Corporation 2018


Jupyter notebook overview


• Menu bar and toolbar

• Notebook action bar

• Cells in a Jupyter notebook


▪ Code cells
▪ Markdown cells


Jupyter notebook overview


The menu bar controls most of the actions and settings for the particular notebook.
Simple and common functionalities are supported through the menu bar.
The notebook action bar is where you can view the notebook information. You can do the
following from the action bar:
• Change notebook name
• View and add data sources
• Create a permanent URL for those with the link to view the notebook
• Schedule the notebook to run at a specific time
• Add project tokens so that code can access the project resources
• Save versions of your notebook
• Post comments to collaborators
• Find resources in the community.
There are two types of cells in a notebook. In particular, code cells are where you edit
and execute the code. The output of the cell will appear right beneath the cell. Tags can
be used to describe the cell for code readability and maintainability. Cells can be re-run
as often as you like. The markdown cells can be used to document and comment the
process. You can use this to structure your notebook by using the markdown language.
Images and file attachments can be added to the notebook via the markdown cells.
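To make the two cell types concrete, here is a minimal sketch of how a notebook file stores them, assuming the version 4 notebook JSON layout (a list of cells, each with a cell_type and a source). The helper names and the cell contents are made up for illustration.

```python
import json


def markdown_cell(text):
    """A markdown cell holds formatted documentation and has no outputs."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """A code cell holds executable source plus the outputs of its last run."""
    return {
        "cell_type": "code",
        "execution_count": None,  # None until the cell has been executed
        "metadata": {},
        "outputs": [],            # filled in by the kernel when the cell runs
        "source": code,
    }


notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        markdown_cell("# Sample analysis\nThis cell documents the step below."),
        code_cell("total = sum([1, 2, 3])\nprint(total)"),
    ],
}

# Round-trip through JSON, which is how an .ipynb file is stored on disk.
notebook_json = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(notebook_json)["cells"]]
print(cell_types)
```

Clearing the output of a cell corresponds to emptying its outputs list; the source is untouched, which is why cells can be re-run as often as you like.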


Creating notebooks
• Must have a project
▪ Must create a Spark instance
• Three ways to create a Jupyter notebook
▪ Blank
▪ From File
▪ From URL
• Specify a name for the notebook
• Specify the language
▪ Python 2, R, Scala, Python 3.5 (experimental)
• Specify the Spark version
▪ 2.1
▪ 2.0
▪ 1.6


Creating notebooks
Before you can create a notebook, there are two things that you must do first. You must
have a project. The project is going to require that you associate a Spark instance with
it. If you have an existing Spark instance, you may use that. Otherwise, you will need to
create the Spark instance as well. When the project has been set up, you have three
ways to create a notebook. You will need to specify a name for your notebook. If you
need a blank notebook, you will need to specify the language you wish to use with the
notebook as well as the Spark version you want to use with it. You can create a
notebook by importing a Python, Scala, or R notebook file (.ipynb) from your local
device. You can also create a notebook from URL by simply providing the URL.
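When a notebook is created from a file, its language is already recorded in the .ipynb file, so you do not choose it again. The sketch below, assuming the version 4 notebook JSON layout, reads that language from the notebook metadata. The function name and the sample notebook are hypothetical.

```python
import json


def notebook_language(ipynb_text):
    """Return the language recorded in a notebook's metadata, or None if absent."""
    metadata = json.loads(ipynb_text).get("metadata", {})
    kernelspec = metadata.get("kernelspec", {})
    language_info = metadata.get("language_info", {})
    # Prefer the kernel's declared language; fall back to language_info.
    return kernelspec.get("language") or language_info.get("name")


# A made-up, empty Python notebook standing in for an imported file.
sample = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3",
            "language": "python",
        }
    },
    "cells": [],
})

language = notebook_language(sample)
print(language)
```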


Coding and running notebooks


• Import libraries
▪ Preinstalled from Python or R
▪ Install custom or third-party libraries for any language
− For Scala, no libraries are preinstalled to the Spark service
• Load and access data
▪ Add a file from the local system to your object store
▪ Use a free data set from the Watson Studio home page
▪ Load data from a data source connection
• Prepare and analyze the data
• Visualize the results
• Collaborate with other project members


Coding and running notebooks


Import the libraries that you need for the notebook. You can import preinstalled libraries
if you are using Python or R. If you are using Scala, you would have to install the
libraries manually and they will be available for the duration of that notebook. Then,
load and access the data. There are three options for loading data into the notebook.
You can add local files, either from within a notebook or by loading the file into
your project. You can use a free dataset from the Watson Studio homepage. Finally,
you can load data from a data source connection, such as an external data source or
another IBM Cloud (formerly Bluemix) service. With the libraries and data loaded, you
begin the real work of data analysis: preparing the data, analyzing it, making
predictions, and visualizing the results. You can also collaborate with other project
members by adding them to (or removing them from) your project.
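The import, load, prepare, and analyze sequence can be sketched in a single code cell. This is a minimal illustration that uses only the Python standard library; the column names and values are made up, and a real notebook would more likely use a DataFrame library and data from the project.

```python
import csv
import io
import statistics

# Made-up sample data standing in for a file loaded into the project.
RAW = """zip_code,population,median_age
10001,21102,36.4
10002,81410,41.2
10003,56024,
10004,3089,34.9
"""


def load_rows(text):
    """Load: parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def prepare(rows):
    """Prepare: drop rows with a missing median_age and convert numeric fields."""
    cleaned = []
    for row in rows:
        if not row["median_age"]:
            continue  # skip incomplete records
        cleaned.append({
            "zip_code": row["zip_code"],
            "population": int(row["population"]),
            "median_age": float(row["median_age"]),
        })
    return cleaned


# Analyze: simple summary statistics over the cleaned rows.
rows = prepare(load_rows(RAW))
mean_age = statistics.mean([row["median_age"] for row in rows])
total_population = sum(row["population"] for row in rows)
print(len(rows), total_population, round(mean_age, 2))
```

Keeping each stage in its own function, or its own cell, makes it easy to re-run one stage after changing it without repeating the others.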


Load and access data from local file



The next several slides show the different options for loading data into your notebook.
The first is loading data from a local file. You can drag and drop the file directly from your
local device to add a new file. The file will be uploaded to the object store for
persistence. You can insert the dataset as a DataFrame directly into the notebook.


Load and access data from Watson Studio data sets



Watson Studio provides a number of open data sets as part of its community. Search
for the dataset you need for the data analysis and add it to your project so that it
becomes accessible to the notebook.


Load and access data from connections



External data sources can be loaded into the project as well by using the Load Data
option under Data Services. You can load CSV files and data from an on-premises database
or a cloud database (such as other IBM Cloud, formerly Bluemix, services).


Prepare and analyze the data


• Code the notebook


Prepare and analyze the data


Once your data has been loaded, you can prepare and analyze the data within the
notebook by using markdown cells, libraries, and code.


Visualize the data



You can visualize the data. In this example, the display method plots the data
points on a map. You can use other visualization libraries, such as PixieDust, to
visualize your data.


Collaborate with other project members



Collaboration is another value when working with Watson Studio. You can add
collaborators and set the appropriate access levels for the project.


Sharing and publishing notebooks



Notebooks can be shared. The permalink will always point to the most recent version of
the notebook.


Checkpoint
1. Watson Studio contains Zeppelin as a notebook interface.
True or False?
2. List the three ways to create a notebook.
3. List the three ways to load and access data from a notebook.
4. You can import visualization libraries into Watson Studio.
True or False?
5. Collaborators can be given certain access levels.
True or False?


Checkpoint solution
1. Watson Studio contains Zeppelin as a notebook interface.
True or False?
▪ False
2. List the three ways to create a notebook.
▪ From a file, a URL, or a blank notebook.
3. List the three ways to load and access data from a notebook.
▪ From a local file, from a Watson Studio data set, from a remote connection
4. You can import visualization libraries into Watson Studio.
True or False?
▪ True
5. Collaborators can be given certain access levels. True or False?
▪ True


Lab 1
Analyzing data with Watson Studio

• Run through a sample notebook in Watson Studio


• Use PixieDust for data visualization


Lab 1:
Analyzing data with Watson Studio

Purpose:
Use Watson Studio and Jupyter notebooks with PixieDust for visualization.
Task 1. Creating the notebook.
Watson Studio provides a sample notebook that uses PixieDust, a visualization
library developed by IBM as an add-on to Python notebooks. Search for the
notebook by first going to Community page.
1. Click the Community link on the top of the Watson Studio page.
2. Click the Notebook link to filter the results to only show notebooks (in case link
is not there, simply provide it yourself by writing it as search criteria in filter text
field).
3. Type pixiedust in the search field.
Several results will appear.
4. Click on the one with the title Welcome to PixieDust.

Once the PixieDust page opens up, at the top right, you have two options to get
the notebook into your project. The quickest way is to click the Copy button in
the middle. The other option is to download the notebook and then import it from
your project.

5. Click the Copy button.


6. Select the Project. (Example: Watson Studio Overview)


7. Associate the Spark service with this. (There should be a default service already
created.)
8. Click Create Notebook.
With the notebook created, you should see at the top right, a message saying
that the notebook is Not Trusted.
When you open an untrusted notebook, it may execute hidden code. This
option provides an additional layer of security for untrusted notebooks.
9. Click the Not Trusted link
10. Select Trust.
The notebook will reload into a trusted mode.
Task 2. Using notebooks.
The first cell consists of markdown that introduces the
notebook and describes how to get the dataset to use with it.
Before you continue, clear the previous output results. Part of this demo is to
walk you through each of the cells.
1. On the Menu bar, click Cell > All Output > Clear.
Follow the directions within the notebook to complete the demo. It is easy to run
pre-written notebook code without reading it, which provides little value. Take
the time to read through each cell and understand what needs to be done
before executing them. The rest of this task will provide additional guidance as
you work through the notebook.
The first cell in the notebook installs the PixieDust library.
2. Execute the cell and pay particular attention to the output.
If instructed to restart the notebook's kernel, you must do so.
3. The next cell imports the pixiedust library.
You may be instructed to restart the kernel after this cell and after other cells
within this notebook. When you restart the kernel, you rerun the import
pixiedust command.
4. Now you are ready to begin using PixieDust. Run the next two cells to create a
simple DataFrame and display it.
Notice how simple it is to create a visualization chart using PixieDust. Take a
few moments to explore this output. Play around with the chart types and take a
quick peek at the Options available.
5. Continue with the next cell to create a more interesting DataFrame to pass to
display().


Task 3. Working with external data.


So far you worked with hard-coded data. Now you will work with CSV data loaded
from a URL.
1. Run the first cell in the section to load the cars.csv file and display the dataset.
The rendered chart displays the distribution of MPG per horsepower. If you look
at the renderer within the chart, you see that current chart is rendered using
matplotlib.
2. Change to the bokeh renderer to see its output.
If you do not have Seaborn, you will install it in the next cell. Otherwise, check
out the Seaborn renderer as well.
Note, if you install Seaborn, be sure to restart the kernel and re-run the import
pixiedust command.
3. In the next section, use the sampleData method from PixieDust to load another
dataset.
4. Inspect the automatically inferred schema of the data by using display() and
analyzing the table output.
5. Use bar charts to analyze the data in the output of the next cell. Follow the
instructions in the preceding markdown cell to explore the output using bar charts.
6. Continue the demo by going through the last display cell and exploring the chart
Options available.
7. Run the final cell in this section of the notebook and explore the output and the
remaining cells.
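Loading CSV data from a URL, as this task does through PixieDust, follows a pattern that can be sketched with the Python standard library alone. This is a minimal illustration, not the code in the sample notebook: the function name is hypothetical, the rows are made up, and a local file:// URL stands in for the remote cars.csv address so that the sketch runs without network access.

```python
import csv
import io
import tempfile
from pathlib import Path
from urllib.request import urlopen


def read_csv_from_url(url):
    """Fetch CSV text from a URL and return its rows as dictionaries."""
    with urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Stand-in for a remote file: a tiny made-up CSV addressed by a file:// URL.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "cars_sample.csv"
    path.write_text(
        "name,mpg,horsepower\ncar_a,18.0,130\ncar_b,32.5,70\n",
        encoding="utf-8",
    )
    cars = read_csv_from_url(path.as_uri())

# CSV values arrive as strings, so convert before comparing numerically.
best = max(cars, key=lambda row: float(row["mpg"]))
print(best["name"], best["mpg"])
```

Passing an http or https address to the same function would fetch the file over the network instead.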
Results:
You now have a better understanding of how to use Watson Studio and
Jupyter notebooks with PixieDust for visualization.


Unit summary
• Overview of Jupyter notebooks
• Creating notebooks
• Coding and running notebooks
• Sharing and publishing notebooks

IBM Training

© Copyright IBM Corporation 2018. All Rights Reserved.
