Preface
November 2018
NOTICES
This information was developed for products and services offered in the USA.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for
information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to
state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any
non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
United States of America
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in
certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these
changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the
program(s) described in this publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of
those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information
concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available
sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the
examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and
addresses used by an actual business enterprise is entirely coincidental.
TRADEMARKS
IBM, the IBM logo, ibm.com, Big SQL, Db2, and Hortonworks are trademarks or registered trademarks of International Business Machines Corp.,
registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
Adobe, and the Adobe logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other
countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
© Copyright International Business Machines Corporation 2018.
This document may not be reproduced in whole or in part without the prior written permission of IBM.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Preface
Contents
Course overview
Document conventions
Labs
Additional training resources
IBM product help
Introduction to IBM Watson Studio
Unit objectives
What is Watson Studio?
Right tool for the right job
IBM Watson Studio
Industry use cases
Watson Studio offerings
Projects
Create a project
Communities
Simplified communication
Collaborations
Managing collaborators
Notebooks and data assets
Watson Studio and other IBM Cloud services
Watson Studio and Spark
Watson Studio and Object Storage
Watson Studio high availability (public cloud)
Getting started with Watson Studio
Checkpoint
Checkpoint solution
Lab 1: Getting started with Watson Studio
Analyzing data with Watson Studio
Unit objectives
Jupyter notebook overview
Creating notebooks
Coding and running notebooks
Load and access data from local file
Load and access data from Watson Studio data sets
Load and access data from connections
Prepare and analyze the data
Course overview
Preface overview
This course introduces students to IBM Watson Studio. It covers how to create and
set up a project, and how to create, code, collaborate on, and share notebooks while
working with a variety of data sources to analyze data.
Intended audience
Data Scientists, Data Engineers, and Application Developers
Topics covered
Topics covered in this course include:
• Introduction to IBM Watson Studio
• Analyzing data with Watson Studio
Course prerequisites
• Completed the Introduction to Data Science course
OR
• Have a basic understanding of notebook technologies for data science
Document conventions
Conventions used in this guide follow Microsoft Windows application standards, where
applicable. As well, the following conventions are observed:
• Bold: Bold style is used in demonstration and exercise step-by-step solutions to
indicate a user interface element that is actively selected or text that must be
typed by the participant.
• Italic: Used to reference book titles.
• CAPITALIZATION: All file names, table names, column names, and folder names
appear in this guide exactly as they appear in the application.
To keep capitalization consistent with this guide, type text exactly as shown.
Labs
Lab format
Labs are designed to allow you to work at your own pace. Practice with the
product to become proficient with how tasks and steps are performed. Use the
Purpose and Results sections to guide you on what you will complete in the lab.
Task-oriented: You are working in the product and you need specific task-oriented
help. Resource: IBM Product - Help link.
Introduction to
IBM Watson Studio
Unit objectives
• What is Watson Studio?
• Setting up a project
• Working with collaborators
• Managing data assets
[Figure: the data science workflow and the personas it serves.
Input: Understand Problem and Domain; Ingest Data.
Analysis: Transform: Clean; Transform: Shape; Evaluate.
Output: Communicate Results; Deliver and Deploy Model.
Personas: Data Engineer (architects how data is organized and ensures operability);
Data Scientist (gets deep into the data to draw hidden insights for the business);
App Developer (plugs into data and models and writes code to build apps).
Tools shown: Bluemix Data Connect, Watson Studio, Bluemix.]
In addition to the tailored interface of Watson Studio, collaboration really ties the team
and the organization together by allowing them to share projects, code and ideas.
Imagine that a Data Engineer builds out a new data source and shares that asset with
the Data Scientist and the Business Analyst. The Business Analyst immediately builds
the reports and dashboards they need. The Data Scientist experiments with the data and
ultimately builds a model that passes all the tests and is worthy of promotion to new
applications. They can immediately share that model with the Application Developer,
who deploys a new application using the model. Along this journey, the team members
keep each other updated on their status, ask questions, and share ideas or
requirements.
This is where data and analytics development becomes a team sport. It no longer needs
to be done in silos. Additionally, because these assets can now be published, other
departments can reuse them, making the entire organization more agile.
http://datascience.ibm.com
• Retail
▪ Shopping Experience
▪ Loss & Fraud Prevention
▪ Task & Workforce Optimization
▪ Pricing & Assortment Optimization
• Transportation
▪ End-to-End Customer Experience
▪ Operations Planning & Optimization
▪ Predictive Maintenance
▪ Route and Asset Optimization
• Banking
▪ Optimize Offers & Cross-Sell
▪ Risk Management
▪ Fraud and Crime Management
▪ Financial Performance
• Media and Entertainment
▪ Audience & Fan Insight
▪ Social Media Insight
▪ Content Demand Forecasting
▪ Marketing & Advertising Optimization
• Manufacturing
▪ Inventory Optimization
▪ Predictive Maintenance
▪ Health, Safety & Environment
▪ Production Planning and Scheduling
• Telco
▪ Subscriber Analytics
▪ IOT Analytics
▪ Proactive Marketing
▪ Network Design
Projects
• The architecture of Watson Studio is centered around the project
• How you organize your resources for solving a business problem
• Integrate community, data assets, collaborators, analytic assets
Projects
The architecture of Watson Studio is centered on projects, where everything is
seamlessly integrated. You create and organize projects to suit your business needs.
Projects consist of data assets, collaborators, analytic assets, and community
resources, combined with a number of open source and value-add tools.
Data assets are the files in your object store or connections such as a database, data
services, streaming data, and other external files.
Collaborators can be assigned to your projects as admins, editors, or viewers.
Analytic assets are the notebooks and the models that you develop.
Watson Studio has a suite of tools available to help you with your job, both from the
open source space and from a suite of value-add tools such as Decision Optimization,
Watson Machine Learning, and Streaming Analytics.
The runtime environment with Watson Studio is Apache Spark.
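As a rough illustration of how a project groups these four kinds of resources, the following Python sketch models a project as a plain data structure. The class name, field names, and sample values are hypothetical and are not part of any Watson Studio API; they only mirror the terms used above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Project:
    """Hypothetical model of a project; not a Watson Studio API."""
    name: str
    data_assets: List[str] = field(default_factory=list)          # files or connections
    analytic_assets: List[str] = field(default_factory=list)      # notebooks and models
    collaborators: Dict[str, str] = field(default_factory=dict)   # email -> role
    community_resources: List[str] = field(default_factory=list)  # articles, data sets, tutorials

    def summary(self) -> Dict[str, int]:
        """Count the resources of each kind held by the project."""
        return {
            "data_assets": len(self.data_assets),
            "analytic_assets": len(self.analytic_assets),
            "collaborators": len(self.collaborators),
            "community_resources": len(self.community_resources),
        }


# Illustrative values only.
project = Project(name="sample-project")
project.data_assets.append("customers.csv")
project.analytic_assets.append("exploration.ipynb")
project.collaborators["analyst@example.com"] = "viewer"
print(project.summary())
```

Keeping the collaborators as an email-to-role mapping reflects the admin, editor, and viewer roles described above.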
Create a project
• Spark service + associated services
• Storage options
Create a project
When you create a project, you need to specify a Spark service. You can either create
a new service or associate an existing one. You also need to specify an object store,
which you can easily set up and associate from your Watson Studio account.
Communities
• Articles
▪ Curated articles of interest to
data scientists
• Data sets
▪ Multiple open source data sets
including local, state, and
government data
• Notebooks and tutorials
▪ Example notebooks
▪ Notebook tutorials on how to do
specific use cases and access
different data sources
Communities
Watson Studio communities are a place where you can find articles of interest to data
scientists. These are external articles that you can peruse. Within the Watson Studio
communities, you can also find open data sets that are ready to use, including
local, state, and government data. You simply download the dataset and load it into
your project. If you are working within a notebook, you can easily add the dataset to the
project. Communities are a great place to get started if you are exploring the data
science space. There are sample notebooks and tutorials to get you started or to help
you learn about new libraries and use cases.
Simplified communication
[Figure: communication between the Business Analyst and the Data Scientist. Label: Show progress of a solution.]
Simplified communication
Watson Studio provides a simplified communication workflow for all the personas. Here
is an example. Say you have a team of four players, each with their own role. You, as
the data scientist, need to work with the data engineer to get the data into Watson
Studio so that you can use it to develop the models. Once the model has been
developed, the app developer needs to put all of this into an application that can
scale. The business analyst needs to see the progress of the solutions through the
notebooks that have been created. Watson Studio provides a seamless way for everyone
to work together.
Collaborations
• Add and remove collaborators in your projects
▪ Only the collaborators in your project can access your data or notebooks
▪ Each Watson Studio account acts as a separate tenant of the Spark and
Object Storage services
▪ Tenants cannot access other tenant's data
• Share notebook
▪ A permalink is generated
▪ Can un-share the notebook
Collaborations
Your project and all the assets within it can be accessed by the collaborators that you
add. Only the collaborators in your project can access your data or notebooks. Each
Watson Studio account acts as a separate tenant of the Spark and Object Storage
services. Tenants cannot access other tenants' data. You can also share a notebook
with others. A permanent link is generated for that notebook. You can also un-share
that notebook.
Managing collaborators
• Add/remove/edit
• Permissions
▪ Admin
▪ Editor
▪ Viewer
Managing collaborators
Add collaborators to your project by their email address. If they have an existing
account on IBM Cloud, they will be added immediately. Otherwise, they will receive an
invitation to create a Watson Studio account. Choose the permissions for the
collaborator. The Admin can control project assets, collaborators, and settings. The
Editor can control project assets. The Viewer can view the project. Collaborators can
be removed from a project or have their permissions updated.
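The three permission levels can be pictured as a simple lookup from role to allowed actions. The sketch below is only an illustration of that idea: the action names, the `can` helper, and the sample email addresses are hypothetical, and it assumes that Admins and Editors can also view the project, which the text above implies but does not state.

```python
from enum import Enum
from typing import Dict


class Role(Enum):
    ADMIN = "admin"
    EDITOR = "editor"
    VIEWER = "viewer"


# Hypothetical action names; the mapping follows the role descriptions above.
PERMISSIONS = {
    Role.ADMIN: {"view", "edit_assets", "manage_collaborators", "change_settings"},
    Role.EDITOR: {"view", "edit_assets"},
    Role.VIEWER: {"view"},
}


def can(collaborators: Dict[str, Role], email: str, action: str) -> bool:
    """Return True only if the email belongs to a collaborator whose role allows the action."""
    role = collaborators.get(email)
    if role is None:
        # Only collaborators in the project can access its data or notebooks.
        return False
    return action in PERMISSIONS[role]


collaborators = {
    "owner@example.com": Role.ADMIN,
    "engineer@example.com": Role.EDITOR,
    "analyst@example.com": Role.VIEWER,
}

print(can(collaborators, "engineer@example.com", "edit_assets"))
print(can(collaborators, "analyst@example.com", "edit_assets"))
print(can(collaborators, "stranger@example.com", "view"))
```

Updating a collaborator's permissions corresponds to assigning a different `Role` to the same email, and removing a collaborator corresponds to deleting the entry.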
Checkpoint
1. Watson Studio is designed only for Data Scientists, other personas
would not know how to use it. True or False?
2. List the Watson Studio offerings and their capabilities.
3. The Watson Studio architecture is centered around which
component?
4. Collaboration with Watson Studio is an optional add-on component
that must be purchased. True or False?
5. Community provides access to articles, tutorials, and even data sets
that you can use. True or False?
Checkpoint solution
1. Watson Studio is designed only for Data Scientists, other personas would not
know how to use it. True or False?
▪ False, while it is designed for Data Scientists, other personas such as Data Engineers
and Application Developers can use it for their job as well.
2. List the Watson Studio offerings and their capabilities.
▪ Watson Studio Cloud (Public Cloud via IBM Cloud)
▪ Watson Studio Local (Private Cloud / On-premise install)
▪ Watson Studio Desktop (Local install)
3. The Watson Studio architecture is centered around which component?
▪ The project
4. Collaboration with Watson Studio is an optional add-on component that must be
purchased. True or False?
▪ False. Collaboration is part of the design of Watson Studio.
5. Community provides access to articles, tutorials, and even data sets that you
can use. True or False?
▪ True.
Lab 1
Getting started with Watson Studio
• Create a project
• Assign collaborators
• Load a data set into the object store
Lab 1:
Getting started with Watson Studio
Purpose:
You will be able to create and manage a project, add collaborators, and load a
data set to the object store.
8. Go to the default Downloads folder for your browser and confirm the download of
the file.
9. Navigate back to the Projects page and select your project.
It should bring you back to your Assets page with the right side panel open. This
time, you have a file to load.
10. Load that file into your project by dropping the file onto the panel or by using
browse to get to it.
When the file has been added, it will appear under the Data assets heading on
the page. If you no longer need to use the file, you can Remove it under
ACTIONS. As you add more assets, the list will grow.
Task 6. Managing the Object Storage.
When you created the project, you either created a new Object Store or
associated an existing one. In either case, you should have containers set up
with the Object Store.
1. Click the WDP Admin Console link at the top corner of the page.
You will see a list of Data Services. The most recent one created should be at
the top of the list and that is the one you want.
2. Click the Cloud Object Storage link under Data Services.
Here you see the instance of the object storage currently available.
3. Click the name of the object storage to see more details.
4. Click Manage in IBM Cloud.
Here is where you can delete the storage if you are no longer using it. You can
also choose to rename your service to something more meaningful if you use
more than one.
5. Click the row of the service to see the buckets in the storage.
6. Click the row of the bucket to see the actual dataset that you loaded.
You can delete the dataset from the bucket if you no longer use it. You also
have the option to download it for storage elsewhere.
Results:
You are now able to create and manage a project, add collaborators, and load
a data set to the object store.
Unit summary
• What is Watson Studio?
• Setting up a project
• Working with collaborators
• Managing data assets
Unit objectives
• Overview of Jupyter notebooks
• Creating notebooks
• Coding and running notebooks
• Sharing and publishing notebooks
Creating notebooks
• Must have a project
▪ Must create a Spark instance
• Three ways to create a Jupyter notebook
▪ Blank
▪ From File
▪ From URL
• Specify a name for the notebook
• Specify the language
▪ Python 2, R, Scala, Python 3.5 (experimental)
• Specify the Spark version
▪ 2.1
▪ 2.0
▪ 1.6
Creating notebooks
Before you can create a notebook, you must have a project, and the project requires
that you associate a Spark instance with it. If you have an existing Spark instance,
you can use it. Otherwise, you need to create the Spark instance as well. When the
project has been set up, you have three ways to create a notebook. In each case, you
need to specify a name for your notebook. If you need a blank notebook, you also
specify the language you wish to use with the notebook and the Spark version you want
to use with it. You can create a notebook from a file by importing a Python, Scala, or
R notebook file (.ipynb) from your local device. You can also create a notebook from a
URL by simply providing the URL.
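A notebook file (.ipynb) is a JSON document, which is why it can be imported from a local file or fetched from a URL. The sketch below builds a minimal blank notebook in the Jupyter notebook format, version 4, using only the Python standard library. The `blank_notebook` helper and the kernel name `python3` are assumptions made for illustration; the kernel names that Watson Studio itself assigns may differ.

```python
import json


def blank_notebook(kernel_name: str = "python3", language: str = "python") -> dict:
    """Return a minimal notebook document in the Jupyter notebook format (version 4)."""
    return {
        "cells": [],  # a blank notebook has no cells yet
        "metadata": {
            "kernelspec": {
                "name": kernel_name,          # assumed kernel name
                "display_name": kernel_name,
                "language": language,
            }
        },
        "nbformat": 4,
        "nbformat_minor": 2,
    }


# Serialize to the text that would be saved in an .ipynb file.
notebook_text = json.dumps(blank_notebook(), indent=1)
print(notebook_text)
```

Saving `notebook_text` to a file with the .ipynb extension would give a file suitable for the From File option described above.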
Checkpoint
1. Watson Studio contains Zeppelin as a notebook interface.
True or False?
2. List the three ways to create a notebook.
3. List the three ways to load and access data from a notebook.
4. You can import visualization libraries into Watson Studio.
True or False?
5. Collaborators can be given certain access levels.
True or False?
Checkpoint solution
1. Watson Studio contains Zeppelin as a notebook interface.
True or False?
▪ False
2. List the three ways to create a notebook.
▪ From a file, from a URL, or as a blank notebook.
3. List the three ways to load and access data from a notebook.
▪ From a local file, from a Watson Studio data set, from a remote connection
4. You can import visualization libraries into Watson Studio.
True or False?
▪ True
5. Collaborators can be given certain access levels. True or False?
▪ True
Lab 1
Analyzing data with Watson Studio
Lab 1:
Analyzing data with Watson Studio
Purpose:
Use Watson Studio and Jupyter notebooks with PixieDust for visualization.
Task 1. Creating the notebook.
Watson Studio provides a sample notebook that uses PixieDust, a visualization
library developed by IBM as an add-on to Python notebooks. Search for the
notebook by first going to Community page.
1. Click the Community link on the top of the Watson Studio page.
2. Click the Notebook link to filter the results to show only notebooks. (If the
link is not there, type it as a search criterion in the filter text field
instead.)
3. Type pixiedust in the search field.
Several results will appear.
4. Click on the one with the title Welcome to PixieDust.
Once the PixieDust page opens up, at the top right, you have two options to get
the notebook into your project. The quickest way is to click the Copy button in
the middle. The other option is to download the notebook and then import it from
your project.
7. Associate the Spark service with this. (There should be a default service already
created.)
8. Click Create Notebook.
With the notebook created, you should see at the top right, a message saying
that the notebook is Not Trusted.
When you open an untrusted notebook, it may execute hidden code. This
option provides an additional layer of security for untrusted notebooks.
9. Click the Not Trusted link.
10. Select Trust.
The notebook will reload into a trusted mode.
Task 2. Using notebooks.
The first cell is a markdown cell that introduces the notebook and describes
how to get the dataset to use with it.
Before you continue, clear the previous output results. Part of this demo is to
walk you through each of the cells.
1. On the Menu bar, click Cell > All Output > Clear.
Follow the directions within the notebook to complete the demo. It is easy to
run through a notebook of pre-written code without getting much value from it.
Take the time to read through each cell and understand what it does before
executing it. The rest of this task will provide additional guidance as
you work through the notebook.
The first cell in the notebook installs the PixieDust library.
2. Execute the cell and pay particular attention to the output.
If instructed to restart the notebook's kernel, you must do so.
3. Run the next cell, which imports the pixiedust library.
You may be instructed to restart the kernel after this cell and after other cells
within this notebook. Whenever you restart the kernel, you must rerun the import
pixiedust command.
4. Now you are ready to begin using PixieDust. Run the next two cells to create a
simple DataFrame and display it.
Notice how simple it is to create a visualization chart using PixieDust. Take a
few moments to explore this output. Play around with the chart types and take a
quick peek at the Options available.
5. Continue with the next cell to create a more interesting DataFrame to pass to
display().
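The steps above can be sketched outside the sample notebook as follows. This is a minimal sketch, not the code from the Welcome to PixieDust notebook: the sample rows and column names are made up for illustration, the `pixiedust_available` helper is hypothetical, and the sketch assumes that PixieDust's `display()` accepts a pandas DataFrame. The code only calls PixieDust when both pandas and pixiedust are importable, so it also runs where the library has not been installed yet.

```python
import importlib.util

# Illustrative data only; not taken from the sample notebook.
columns = ["quarter", "orders"]
rows = [("Q1", 120), ("Q2", 135), ("Q3", 128), ("Q4", 150)]


def pixiedust_available() -> bool:
    """Report whether both pandas and pixiedust can be imported in this environment."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("pandas", "pixiedust")
    )


if pixiedust_available():
    import pandas as pd
    from pixiedust.display import display

    # Inside a notebook, this renders the interactive table and chart picker.
    display(pd.DataFrame(rows, columns=columns))
else:
    # In a notebook cell, the leading "!" runs the command in a shell.
    print("PixieDust is not installed; in a notebook cell run: !pip install --upgrade pixiedust")
```

After an installation, restart the kernel if instructed and rerun the import before calling `display()` again.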
Unit summary
• Overview of Jupyter notebooks
• Creating notebooks
• Coding and running notebooks
• Sharing and publishing notebooks