KNIME INTRODUCTION
WHAT IS KNIME?
KNIME stands for Konstanz Information Miner
KNIME offers a complete platform for end-to-end data science, from creating analytic models, to
deploying them and sharing insights within the organization, through to data apps and services.
It is an open-source tool that uses a GUI to assemble ‘nodes’ for data preprocessing (ETL), modeling,
data science, machine learning and visualization
Modules for:
Data Mining
Data Analysis
Data Science
Machine Learning
More modules and extensions can be added!
Written in Java and based on Eclipse
WHERE TO GET IT AND OTHER ONLINE RESOURCES
Download KNIME
[Link]
Skip the registration form, go straight to step (2) and download the version with
free extensions
Getting Started Guide | KNIME
[Link]
WHAT CAN I DO WITH KNIME?
Data Access
File
Database I/O
Transformation
Filtering, Grouping, Joining
Analysis and data mining
Weka
R / Python
Data Science
Machine Learning
Visualization
Different types of charts
Deployment
Text mining
HOW DOES KNIME COMPARE WITH OTHERS?
Gartner’s Magic Quadrant for Advanced Analytics Platforms
Leaders quadrant with SAS, IBM and Dell
Strong Performer / Contender in Forrester’s Wave
KNIME LINGO
Store your work in a workspace
Workflows can contain nodes, metanodes, connections, workflow variables, workflow credentials
and annotations
A workspace can contain workflow groups built using the workflow editor
Each node has a type, which identifies the algorithm associated with it
Nodes have parameters, inports and outports, and can have any of these states:
Misconfigured
Configured
Queued for Execution
Running
Executed
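The five states listed above can be modeled as a small state machine. The sketch below is illustrative only: the state names come from the list, but the allowed transitions (for example, that resetting an executed node returns it to Configured) are assumptions, not a description of KNIME's internals.

```python
from enum import Enum


class NodeState(Enum):
    MISCONFIGURED = "Misconfigured"
    CONFIGURED = "Configured"
    QUEUED = "Queued for Execution"
    RUNNING = "Running"
    EXECUTED = "Executed"


# Assumed transitions, for illustration only; KNIME's actual rules may differ.
ALLOWED = {
    NodeState.MISCONFIGURED: {NodeState.CONFIGURED},
    NodeState.CONFIGURED: {NodeState.QUEUED, NodeState.MISCONFIGURED},
    NodeState.QUEUED: {NodeState.RUNNING, NodeState.CONFIGURED},
    NodeState.RUNNING: {NodeState.EXECUTED, NodeState.CONFIGURED},
    NodeState.EXECUTED: {NodeState.CONFIGURED},  # assumed: a reset clears the result
}


def can_move(current: NodeState, target: NodeState) -> bool:
    """Return True if the assumed transition table permits current -> target."""
    return target in ALLOWED[current]
```

Under these assumptions a misconfigured node cannot jump straight to Executed; it has to be configured first.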
KNIME ANALYTICS PLATFORM
Home
Local Space
KNIME Community Hub
Workflow Tabs
Workbench Editor
Node Repository
Workflow Space Explorer
Workflow Monitor
K-AI AI Assistant
GETTING SET UP WITH KNIME ANALYTICS PLATFORM
HOW TO BUILD A KNIME WORKFLOW
Search in Node Repository
Dragging nodes into Workflow Editor
Connecting Nodes
Configuring Nodes
Executing (per node or one-shot)
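The build steps above (add nodes, connect, configure, execute) can be mimicked in a few lines of Python. This is a toy model, not the KNIME API: the `Node` class, its methods and the two example nodes are hypothetical names chosen for illustration.

```python
class Node:
    """Toy stand-in for a workflow node: a function plus its settings."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.settings = {}
        self.upstream = None
        self.result = None
        self.executed = False

    def configure(self, **settings):
        # Changing settings invalidates any earlier result of this node.
        self.settings.update(settings)
        self.executed = False

    def connect_from(self, other):
        # Wire the output of `other` into this node's input.
        self.upstream = other

    def execute(self):
        # Executing a node first executes any upstream node that has not run yet.
        if self.upstream is not None and not self.upstream.executed:
            self.upstream.execute()
        data = self.upstream.result if self.upstream is not None else None
        self.result = self.func(data, **self.settings)
        self.executed = True
        return self.result


# Hypothetical two-node workflow: a reader feeding a row filter.
reader = Node("reader", lambda _data, rows: list(rows))
reader.configure(rows=[3, 8, 1, 9])

row_filter = Node("filter", lambda data, minimum: [x for x in data if x >= minimum])
row_filter.configure(minimum=5)
row_filter.connect_from(reader)

print(row_filter.execute())
```

Calling `execute()` on the last node runs everything upstream of it in one go, while calling it on `reader` alone corresponds to executing a single node.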
INPUT AND OUTPUT PORTS
The inputs are the data that the node
receives through its input ports and processes;
the outputs are the resulting data.
Each node has specific settings, which you
can adjust in a configuration dialog.
Different data types are represented by different
node ports.
Only ports of the same type indicated by
the same color can be connected.
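The same-type rule for connections can be expressed as a simple check. In this sketch the `Port` class, the `connect` function and the type labels "table" and "model" are hypothetical; they only illustrate the rule and are not KNIME identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    node: str       # name of the node that owns the port
    port_type: str  # hypothetical type label, e.g. "table" or "model"


def connect(out_port: Port, in_port: Port) -> tuple:
    """Return the connection if both ports share a type, otherwise raise TypeError."""
    if out_port.port_type != in_port.port_type:
        raise TypeError(
            f"cannot connect {out_port.port_type!r} port of {out_port.node} "
            f"to {in_port.port_type!r} port of {in_port.node}"
        )
    return (out_port, in_port)
```

For example, connecting a "table" output to a "table" input succeeds, while connecting it to a "model" input raises `TypeError`.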
NODE STATUS
A node can be in four different states.
Node status is shown by a traffic light below each node.
WORKBENCH EDITOR WITH EXCEL READER NODE
WORKFLOW WITH EXCEL SHEETS
PERFORMING K-MEANS CLUSTERING
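As a rough idea of what a k-means step computes, here is a generic sketch of Lloyd's algorithm in plain Python. It is not KNIME's implementation: the function name, the seeding rule (the first k points become the initial centroids) and the sample points are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on a list of (x, y) tuples.

    Assumption: the first k points seed the centroids, so the result
    depends on the order of the input.
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Made-up sample data: two well-separated groups.
sample = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8)]
centroids, labels = kmeans(sample, k=2)
print(labels)
```

Because of the simple seeding, a poor ordering of the input can leave the algorithm in a bad local optimum; real tools usually offer better initialization options.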
DATA PREPROCESSING
EXAMPLE OF R SNIPPET
SIMPLE MODEL TRAINING FOR CLASSIFICATION