
Cheat Sheet: Data Wrangling with KNIME Analytics Platform

ACCESS DATA

CSV Reader
Reads a CSV file from either your local file system or another connected file system. Click the plus in the lower left corner to add a dynamic connection input port to connect to an external file system, like Amazon S3, Azure Blob Storage, etc.

Excel Reader
Reads sheet(s) from one or more Excel files, one sheet from each Excel file. A loop can be used to read multiple sheets from one Excel file.

Table Reader
Reads data from a .table file. The .table files are organized using a KNIME proprietary format, including the full file structure, and are optimized for space and speed - providing maximum performance with minimum configuration.

SAP Reader (Theobald Software)
Loads data from various SAP systems (e.g. SAP S/4HANA, SAP BW, SAP R/3).

Amazon S3 Connector
Connects to Amazon S3 and points to a working directory (with a UNIX-like syntax, e.g., /mybucket/myfolder/myfile). Allows downstream reader nodes to access data from Amazon S3 as a file system.

Common settings of Reader and Writer nodes
File path: All Reader and Writer nodes require a file path. The file path can be expressed as an absolute path in the local file system, a relative path to a key location in the current KNIME installation, or a path defined in an external file system if such a connection is used.
Multiple files: Reader nodes can read and concatenate multiple files, according to a selected file extension or file name pattern.
Transformation tab: Reader nodes include a Transformation tab for renaming, filtering, re-ordering, and type changing of the columns.

COMBINE DATA

Concatenate
Concatenates the rows of all input tables by writing them below each other. This is especially useful for tables with shared column headers.

Joiner
Joins the columns of the two input tables based on one or multiple joining columns. Allows you to select between different joiner modes and to use multiple joining columns.
Joiner modes (diagram of left and right tables): inner join, left outer join, right outer join, full outer join.

Value Lookup
Replaces the values in one column of the table at the top input port with values from a look up table provided at the bottom input port.

FILTER DATA

Row Filter
Filters rows in or out of the input table according to a filtering rule. The filtering rule can match a value in a selected column or numbers in a numerical range.

Rule-based Row Filter
Filters rows in or out according to a set of rules, defined in its configuration window. Rules are evaluated from top to bottom. Using TRUE as the antecedent applies the rule to all unmatched rows.

Reference Row Filter
Filters rows in or out from the top input table according to matching values in the selected column of the lower input table.

Column Filter
Filters columns in or out from the input table according to a filtering rule. Columns to be retained can be manually picked or selected according to their type, or based on a regular expression matching their name.

WRITE DATA

Excel Writer
Writes the input table(s) to sheet(s) in an Excel file (XLS or XLSX). Click the plus in the lower left corner to add a dynamic sheet input port to write multiple data tables into multiple sheets.

CSV Writer
Writes the input data table to a CSV file. Click the plus in the lower left corner to add a dynamic connection input port to write to an external file system, like Amazon S3, Azure Blob Storage, etc.

Send to Tableau Server
Uploads the input table to a Tableau server for reporting.

Send to Power BI
Uploads the input table to Microsoft Power BI for reporting.

DATE&TIME

String to Date&Time
Parses the strings in the selected columns according to a date/time format and converts them into Date&Time cells. Four Date&Time forms are supported: only date, only time, date&time, and date&time plus time zone.

Date&Time-based Row Filter
Extracts rows where the time value in the selected column lies within a given time window. The time window is specified either by a start and/or an end date or by a start date and a duration.

Date&Time Difference
Calculates the difference between two date&time objects e.g., from two selected columns, from a selected column and a fixed value, from a selected column and the current execution time, or from one cell and the cell in the previous row for a selected column.
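The Date&Time nodes described above (String to Date&Time, Date&Time-based Row Filter, Date&Time Difference) can be mimicked outside KNIME with Python's standard datetime module. The following is only an illustrative sketch, not how KNIME implements these nodes: the sample values, the format string, the inclusive window bounds, and the helper name rows_in_window are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical string column, as it might arrive from a CSV file.
raw = ["2023-01-05 08:30", "2023-01-20 17:45", "2023-02-11 09:00"]

# String to Date&Time: parse each string according to a date/time format.
FORMAT = "%Y-%m-%d %H:%M"  # assumed format for this sample data
parsed = [datetime.strptime(s, FORMAT) for s in raw]


def rows_in_window(values, start, duration):
    """Keep values inside [start, start + duration].

    Inclusive bounds are an assumption of this sketch.
    """
    end = start + duration
    return [v for v in values if start <= v <= end]


# Date&Time-based Row Filter: window given by a start date and a duration.
january = rows_in_window(parsed, datetime(2023, 1, 1), timedelta(days=31))

# Date&Time Difference: each cell minus the cell in the previous row.
diffs = [later - earlier for earlier, later in zip(parsed, parsed[1:])]

for d in diffs:
    print(d)
```

Here the window is expressed as a start date plus a duration, which is one of the two ways the Date&Time-based Row Filter accepts a time window; passing an explicit end date instead would only change how `end` is computed.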
Extract Date&Time Fields
Extracts selected time and date fields from a selected column of type date&time and appends their values in new columns.

DATABASES

DB Connector
Connects to any JDBC-compliant database. The JDBC driver must be added in the KNIME Preferences and then selected in the node configuration window.

H2 Connector
Connects to an H2 database. Similar dedicated connector nodes connect to other databases, such as MySQL or PostgreSQL.

DB Table Selector
Creates a SQL query to access the database table selected in the configuration window. The table can be selected either via browsing the database metadata or via a custom SQL query.

DB Reader
Executes the input SQL query on the database and exports the results into a KNIME data table.

DB Joiner
Expands the input SQL query to include the join of two tables. It has a similar configuration window as the Joiner node. No SQL coding required. There are more DB nodes, all expanding the input SQL query with additional SQL instructions. Besides the SQL Query node, no DB nodes require SQL coding.

DB Row Filter
Expands the input SQL query to include the row filter criteria defined in the configuration window. Grouping of multiple conditions with AND or OR conjunctions is also supported. No SQL coding required.

DB Query
Modifies the input SQL query using custom SQL. The input SQL query is represented by the place holder #table#.

DB Writer
Inserts the data rows from the top input port into a table in the database specified by the input connection port. If the database table does not exist it will be created.

DB Connection Table Writer
Writes the resulting rows from the input SQL query into a new table inside the database.

CLEAN DATA

Missing Value
Defines and applies a strategy to replace missing values in the input table - either globally on all columns, or individually for each single column.
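The idea behind the Missing Value node, a global strategy for all columns with optional individual strategies per column, can be sketched in plain Python. This is a hedged illustration only: the table layout, the two strategies, and the names fix_value, column_mean, and replace_missing are hypothetical and are not part of KNIME.

```python
from statistics import mean

# Hypothetical table as a dict of columns; None marks a missing value.
table = {
    "age": [31, None, 45],
    "city": ["Berlin", None, "Zurich"],
    "score": [0.5, 0.7, None],
}


def fix_value(value):
    """Strategy factory: replace every missing cell with a fixed value."""
    return lambda column: [value if v is None else v for v in column]


def column_mean(column):
    """Strategy: replace missing cells with the mean of the present cells."""
    present = [v for v in column if v is not None]
    fill = mean(present)
    return [fill if v is None else v for v in column]


def replace_missing(table, default, per_column=None):
    """Apply `default` to every column unless `per_column` overrides it."""
    per_column = per_column or {}
    return {
        name: per_column.get(name, default)(column)
        for name, column in table.items()
    }


cleaned = replace_missing(
    table,
    default=column_mean,                        # global strategy
    per_column={"city": fix_value("unknown")},  # individual override
)
print(cleaned)
```

The numeric columns fall back to the global mean strategy, while the string column gets its own fixed-value replacement, since a mean is not defined for strings. The input table is left unchanged.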
Duplicate Row Filter
Detects duplicate rows and applies the selected operation, e.g. removes duplicate rows. Duplicates are rows that have the same value in all selected columns.

Numeric Outliers
Detects and treats numerical outliers for each of the selected columns individually using the interquartile range (IQR).

DYNAMIC PORTS

Dynamic ports: Additional input ports can be added by clicking the plus on the left side of the node.

RESHAPE AND AGGREGATE DATA

GroupBy
Groups the rows of a table by the unique values in selected columns and calculates aggregation and statistical measures for the defined groups. Despite its simple name, it offers powerful functionality and has many unsuspected usages.

Pivot
Extends the aggregation functionality of the GroupBy node by creating an output table with columns and rows for the unique values in the selected input columns. The unique values of the grouping columns become rows and the unique values of the pivoting columns become columns.

Category to Number
Maps the categorical values in the selected columns to integer values and exports the mapping rules to the model output port. The Category to Number (Apply) and Number to Category (Apply) nodes apply the mapping rule in both directions.

One to Many
Creates one new column for each value in the selected input column. These values become the column headers. Cells in the newly created columns are set to 0 if the value is not present, otherwise 1. This type of encoding is called one-hot vector.

Table Transposer
Converts the rows to columns and the columns to rows.

Table Manipulator
Performs several transformations at once, such as renaming, filtering, re-ordering and type changing, on the input columns. By adding dynamic ports it can replace a Concatenate node.

Cell Splitter
Splits values in the selected column into two or more substrings, as defined by a delimiter match. A delimiter is a defined character, such as a comma, space, or any other character or character sequence.

Ungroup
Ungroups a collection-type cell by creating one row for each value in the collection cell. Other columns from the input table are left unaltered.

Unpivot
Stacks the cells of the selected value columns into one column. The cells of the selected retained columns are appended to the corresponding output rows.

Sorter
Sorts the table in ascending or descending order based on the values of one or more columns.

Value Counter
Counts the number of occurrences of all values in a selected column from the input table.

DATA TYPES & CONVERSIONS

String (S): Sequence of characters, e.g. "This is a string"
Integer (I): Whole number, e.g. -100 or 345
Double (D): Real valued number, e.g. -0.432 or 45.39
Date&Time: A data format for date, time, date&time, or date&time plus time zone.
Boolean (B): Two possible values only, TRUE and FALSE
Collection Cell: Collection of multiple values of either the same or different types e.g., can be a list of values or a set of values. In a set each value occurs only once.
Document/Image: KNIME Analytics Platform supports many more data types like text documents, images, fingerprints, etc.

Number to String
Converts the data type of the selected columns from a number format, e.g. integer or double, to string.

String to Number
Converts the data type of the selected columns from string to either double or integer.

CREATE COLUMNS

Math Formula
Implements a number of math operations across multiple input columns. The math operations can be applied to multiple columns with the Math Formula (Multi Column) node.

String Manipulation
Performs operations on string values in columns, such as combining two or more strings together, extracting one or more substrings, trimming blank spaces, and so on.

Rule Engine
Applies a set of rules to each row of the input table. Rules are applied from top to bottom. The first rule that matches is used.

String Replacer
Replaces values in a selected string column if they match a defined pattern.

Counter Generation
Creates a new column with a counter. The start value and step size are defined in the configuration window.

Column Expressions
Combines the functionality of the Math Formula, Rule Engine, and String Manipulation nodes. More than one expression can be defined to modify or add multiple columns at the same time.

FORMAT EXCEL SHEETS

The Continental Nodes for KNIME extension allows you to automatically format an existing Excel sheet. The key is an additional data table of the same size as the original Excel sheet, where each cell contains one or more comma separated tag values e.g., header, border, etc. Based on these tags, the XLS Formatter nodes add new formatting instructions to the existing instructions, as available at the lower (optional) input port.

XLS Control Table Generator
Transforms the input table to an XLS Control Table, meaning it exchanges the column names to A, B, C, ... and the row IDs to 1, 2, 3, ... It is the kickoff node to collect formatting instructions for an Excel sheet and feeds all XLS formatter nodes.

XLS Cell Merger
Adds formatting instructions to merge all cells with a specified tag in the XLS control table at the top input port.

XLS Background Colorizer
Adds background color and/or pattern fill formatting instructions to all cells with a specified tag in the XLS Control Table at the top input port.

XLS Conditional Formatter
Adds formatting instructions to color cell backgrounds according to their numeric value for all cells specified by a tag in the XLS control table at the top input port.

XLS Border Formatter
Adds border formatting instructions for a given range specified by a tag in the XLS control table at the top input port.

XLS Formatter (apply)
Applies all formatting instructions to an existing Excel sheet.

• E-Books: KNIME Advanced Luck covers advanced features & more. Practicing Data Science is a collection of data science case studies from past projects. Both available at knime.com/knimepress
• KNIME Blog: Engaging topics, challenges, industry news, & knowledge nuggets at knime.com/blog
• E-Learning Courses: Take our free online self-paced courses to learn about the different steps in a data science project (with exercises & solutions to test your knowledge) at knime.com/knime-self-paced-courses
• KNIME Community Hub: Browse and share workflows, nodes, and components. Add ratings or comments to other workflows at hub.knime.com
• KNIME Forum: Join our global community & engage in conversations at forum.knime.com
• KNIME Business Hub: For team-based collaboration, automation, management, & deployment check out KNIME Business Hub at knime.com/knime-business-hub

KNIME Press
Extend your KNIME knowledge with our collection of books from KNIME Press. For beginner and advanced users, through to those interested in specialty topics such as topic detection, data blending, and classic
solutions to common use cases using KNIME Analytics Platform - there’s something for everyone. Available for download at www.knime.com/knimepress.

© 2023 KNIME AG. All rights reserved. The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.
