
Cheat Sheet: Building a KNIME Workflow for Beginners

Getting started with KNIME Analytics Platform

• Use the Getting Started Guide to take your first steps with visual workflows at: www.knime.com/getting-started-guide
• Learn more about included nodes and explore working examples in the KNIME Analytics Platform Version 5 Starter Perspective Collection on KNIME Community Hub.

Node Action Bar: Interact directly with the node to, e.g., configure, execute, cancel or reset a node.
Configure: Open the configuration dialog.
Execute: Executes the node.
Cancel: Cancels the execution of the node.
Reset: Resets the node.

Node Labels: Double click “Add comment” below the node to add a comment/label.

Dynamic ports: Additional input ports can be added by clicking the plus on the left side of the node.

Not configured: Node is not yet configured and cannot be executed with its current settings.
Configured: Node has been correctly configured and may be executed at any time.
Executed: Node has been successfully executed and results can be viewed and used in downstream nodes.
Error: The node has encountered an error during execution.

EXPLORE

(All visualizations are interactive)

Scatter Plot: Represents input data rows as points in a two-dimensional plot. Input dimensions (columns) on the x-y axis plot and graphical properties can be changed in the configuration window or interactively in the node view.

Sunburst Chart: Displays categorical columns through a hierarchy of rings. Each ring is sliced according to the nominal values in the corresponding column and to the selected hierarchy. This is a powerful chart for multivariate analysis.

Stacked Area Chart: Plots multiple numerical data columns on top of each other using the previous line as the base reference. The areas in between lines are colored for easier comparison. This chart is commonly used to visualize trending topics.

Line Plot: Plots numerical values in data columns (y-axis) against values in a reference column (x-axis). Data points are connected via colored lines. If the reference column on the x-axis contains sorted time values, the line plot graphically represents the evolution of a time series.

Color Manager: Assigns a color property to each input row based on the row’s value in a selected column. This color property affects the graphical representation in the upcoming views.

Interactive Pie Chart: Visualizes one aggregated metric for different data partitions with colored slices on a circle where the areas are proportional to the metric values. The partitions are defined by a categorical column.

Data Explorer: Provides an interactive view to summarize the statistics of the input data via statistical measures and histograms - for both numerical and nominal columns.

Box Plot: Visualizes numeric columns using the quartile statistics. Watch out for the points at the end of the whiskers - they might mark outliers!

Bar Chart: Visualizes one or more aggregated metrics for different data partitions with rectangular bars where the heights are proportional to the metric values. The partitions are defined by a categorical column.

ANALYZE

Decision Tree Learner: The Learner node trains a C4.5 or a CART decision tree. The configuration window includes options for pruning, early stopping, information measures, splitting values, and more. Both the Learner and the Predictor node provide an interactive view where the decision tree is displayed together with the input data propagation.

K-Means: Implements the k-Means clustering algorithm. Number of clusters must be set prior to node execution. This node builds the clusters. The Cluster Assigner node finds the closest cluster and assigns it to the input data row. Being an unsupervised algorithm, this node pair doesn’t follow the classic Learner - Predictor scheme.

Logistic Regression Learner: The Learner node trains a logistic regression model to predict categorical target values. The configuration window includes options for solver, input feature choice, regularization functions to avoid overfitting, & more.

Scorer: Calculates a number of performance measures such as accuracy, F1-score, or Cohen’s Kappa, to quantify the quality of a classifier.
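For readers who want to see what the Scorer measures mean numerically, here is a minimal Python sketch of accuracy, F1-score, and Cohen's Kappa for a two-class problem. It is not how the KNIME node is implemented; the function name `classification_scores` and the sample labels are made up for illustration.

```python
from collections import Counter


def classification_scores(actual, predicted, positive):
    """Return (accuracy, F1 for the `positive` class, Cohen's kappa).

    Hypothetical helper for illustration; not part of KNIME.
    """
    n = len(actual)
    pairs = list(zip(actual, predicted))

    # Accuracy: share of rows where the prediction matches the true class.
    accuracy = sum(a == p for a, p in pairs) / n

    # F1: harmonic mean of precision and recall for the positive class.
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0

    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 1.0

    return accuracy, f1, kappa


# Made-up example labels.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "yes", "no", "no", "no", "yes", "no", "yes"]

accuracy, f1, kappa = classification_scores(actual, predicted, positive="yes")
print(accuracy, f1, kappa)
```

With these sample labels, six of eight predictions are correct, so accuracy is 0.75; kappa is lower (0.5) because half of that agreement would be expected by chance alone.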
Numeric Scorer: Calculates a number of numerical error measures, such as root mean squared error, mean absolute error, or R², to quantify the quality of a numerical predictor model.

ROC Curve: Displays the Receiver Operating Characteristic (ROC) curve of a classifier working on a binary class problem. One of the two classes is arbitrarily chosen as the positive class and the ROC curve is built on the probabilities/scores produced for that class on the input data set.

Learner Nodes: Supervised algorithms in KNIME Analytics Platform have a Learner node to train a model on a previously labelled training set.

Predictor Nodes: Used for applying models. The two inputs are the trained model and the data to process. The output contains the original data and the model predictions.

Model Reader: Reads machine learning models generated with any of the Learner nodes. Models are usually saved after training and reused in deployment.

Integrations to many open source data analytics tools are also available. Some use the KNIME node GUI (H2O, Weka, Keras, Spark MLlib). Others offer nodes with a development environment for scripting and debugging (R, Python, Java).

[Figure: example workflow; node labels include Amazon Authenticator, Amazon S3 Connector, CSV Reader, and Table Reader; stages are labelled Read, Transform, Analyze, Deploy]

READ

CSV Reader: Reads CSV files. It has an auto-detect function to automatically guess the file structure. As for other reader nodes, you can add a "File system connection" input port to connect to different data sources.

Table Reader: Reads data from a .table file. .table files are organized using a KNIME proprietary format, including the full file structure, and are optimized for space and speed - providing maximum performance with minimum configuration!

Excel Reader: Reads content from sheets in Excel files (XLS, XLSX). Sheet and cells to be read can be defined in the configuration window.

Google Sheets Reader: Reads data from a Google Sheet file. Authentication occurs on the Google site. Google credentials are not saved within the KNIME workflow.

Table Creator: Allows users to manually create a data table in its configuration window as a data sheet. Data cells can be copied and pasted in the sheet. Perfect for generating small data sets.

In reader and writer nodes, the file path is expressed relative to a key location of the current KNIME installation, like workflow, workflow data area, and mountpoint.

DEPLOY

Data to Report: Marks the data table to be exported to BIRT - a partially open source reporting tool integrated within KNIME. When switching from KNIME to BIRT, the marked data sets are imported into BIRT. The Image To Report node marks the input images to be exported to BIRT.

Excel Writer: Writes the input data table to a sheet in an Excel file (XLS or XLSX).

Table Writer: Writes the input data table to a file using the .table KNIME proprietary format. This format includes the full file structure and is optimized for space and speed. Including the table structure in the file is a great advantage - especially when exchanging data files among users.

CSV Writer: Writes the input data table into a CSV file or to a remote location denoted by a URL.

Google Sheets Writer: Writes the input data table into a Google Sheet file. Authentication occurs on the Google site. Google credentials are not saved within the KNIME workflow.

Send to Tableau Server: Exports the input data table into a Tableau file or server for reporting.

• E-Books: KNIME Advanced Luck covers advanced features & more. Practicing Data Science is a collection of data science case studies from past projects. Both available at knime.com/knimepress
• KNIME Blog: Engaging topics, challenges, industry news, & knowledge nuggets at knime.com/blog
• E-Learning Courses: Take our free online self-paced courses to learn about the different steps in a data science project (with exercises & solutions to test your knowledge) at knime.com/knime-self-paced-courses
• KNIME Community Hub: Browse and share workflows, nodes, and components. Add ratings or comments to other workflows at hub.knime.com
• KNIME Forum: Join our global community & engage in conversations at forum.knime.com
• KNIME Business Hub: For team-based collaboration, automation, management, & deployment check out KNIME Business Hub at knime.com/knime-business-hub

TRANSFORM

GroupBy: Groups the rows of a table by the unique values in selected columns and calculates aggregation and statistical measures for the defined groups. Despite its simple name, it offers powerful functionality and has many unexpected uses. For example - row deduplication.

Partitioning: Splits data into two subsets according to a sampling strategy. This node is generally used to produce a training and a test set to train and evaluate a machine learning model.

String to Date&Time: Converts values in a String column into Date&Time values. The Date&Time format contained in the String values can be manually defined or auto-guessed.

Column Renamer: Assigns new names and types to selected columns, as configured in the dialog.

Concatenate: Merges two or more data tables vertically by piling up cells in columns with the same name. Cells in non-overlapping columns are filled with missing values.

Pivot: Extends the aggregation functionality of the GroupBy node by creating an output data table with columns and rows for the unique values in selected input columns. Note: the unique values of the grouping column become rows and the unique values of the pivoting column become columns.

Row Filter: Filters rows in or out from the input data table according to a filtering rule. The filtering rule can match a value in a selected column or numbers in a numerical range.

Cell Splitter: Splits values in a selected column into two or more substrings, as defined by a delimiter match. The delimiter is a fixed character, such as a comma, space, or any other character or character sequence.

Joiner: Joins rows from two data tables based on common values in one or more key columns. The output - inner join, left outer join, right outer join, full outer join, or the respective antijoins - can be split into multiple output tables.

Missing Value: Defines a strategy to deal with missing values in the input data table - either globally on all columns, or individually for each single column.

Rule Engine: Applies a set of rules to each row of the input data table. All Rule Engine operators are also available in the Column Expressions node.

Math Formula: Implements a number of math operations across multiple input columns, from simple sum and average, to logarithms and exponentials. All Math Formula operators are also available in the Column Expressions node.

Column Filter: Filters columns in or out from the input data table according to a filtering rule. Columns to be retained can be manually picked or selected according to their type, or by a regular expression matching their name.

Sorter: Sorts the table in ascending or descending order based on the values of a chosen column. In addition, it is possible to sort based on multiple columns.

String Manipulation: Performs operations on String values in columns, such as combining two or more Strings together, extracting one or more substrings, trimming blank spaces, and so on. All operators are also available in the Column Expressions node.
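To make the GroupBy idea from the TRANSFORM nodes more concrete, here is a rough Python sketch of grouping with aggregation, and of row deduplication done by grouping on every column. It only mirrors the concept; it is not KNIME code. The helper names `group_by` and `deduplicate` and the sample rows are invented for this example.

```python
from collections import defaultdict

# Made-up input table: one dict per row.
rows = [
    {"customer": "A", "country": "DE", "amount": 10.0},
    {"customer": "B", "country": "US", "amount": 5.0},
    {"customer": "A", "country": "DE", "amount": 10.0},  # exact duplicate row
    {"customer": "C", "country": "DE", "amount": 2.5},
]


def group_by(rows, group_cols, value_col):
    """Sum and count `value_col` for each unique combination of `group_cols`."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        groups[key].append(row[value_col])
    return {
        key: {"sum": sum(values), "count": len(values)}
        for key, values in groups.items()
    }


def deduplicate(rows):
    """Group on all columns and keep the first row of each group."""
    columns = list(rows[0])
    first_seen = {}
    for row in rows:
        first_seen.setdefault(tuple(row[c] for c in columns), row)
    return list(first_seen.values())


totals = group_by(rows, ["country"], "amount")
unique_rows = deduplicate(rows)

print(totals)
print(len(unique_rows))
```

Grouping on all columns leaves exactly one row per distinct combination of values, which is why a grouping operation can double as a deduplication step.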
KNIME Press
Extend your KNIME knowledge with our collection of books from KNIME Press. For beginner and advanced users, through to those interested in specialty topics such as topic detection, data blending, and classic
solutions to common use cases using KNIME Analytics Platform - there’s something for everyone. Available for download at www.knime.com/knimepress.

Need help?
Contact us!

© 2023 KNIME AG. All rights reserved. The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.
