Favorites
Browse
Description: Add one or more points in your data stream to review and verify your data.
Example: Allows users to get a look at their data anywhere in the process.

Filter
Description: Query records based on an expression to split data into two streams: True (records that satisfy the expression) and False (those that do not).
Example: If you are looking at a dataset of customers, you may want to keep only those customer records that have more than 10 transactions or a certain level of sales.

Formula
Description: Create or update fields using one or more expressions to perform a broad variety of calculations and/or operations.
Example: If there is a missing or NULL value, you can use the Formula tool to replace that NULL value with a zero.

Join
Description: Combine two inputs based on a commonality between the two tables. Its function is like a SQL join, but it gives the option of creating three outputs resulting from the join.
Example: Can be used to join customer profile data with transactional data, matching the two data sources on a unique customer ID.

Output
Description: Output the contents of a data stream to a file or database.
Example: Loading enriched data back into a database.

Sample
Description: Limit the data stream to a number, percentage, or random set of records.
Example: Choosing the first 10 records for each region of previously sorted data so that you end up with the top 10 stores in each region.

Select
Description: Select, deselect, reorder, and rename fields, change field type or size, and assign a description.
Example: If your workflow only requires 5 fields out of the 50 that are read in from the file/database, you can deselect all but the 5 required fields to speed up processing downstream.

Sort
Description: Sort records based on the values in one or more fields.
Example: Allows you to sort data records into ascending/descending order, such as ranking your customers based on the amount they spend.

Summarize
Description: Summarize data by grouping, summing, counting, spatial processing, string concatenation, and much more. The output contains only the results of the calculation(s).
Example: You could determine how many customers you have in the state of NY and how much they have spent in total or on average per transaction.

Comment
Description: Add annotation or images to the module canvas to capture notes or explain processes for later reference.
Example: This allows users to document what they did during a certain portion of the analysis, so other users understand what they were building in the workflow.
These tools are new in: 9.0 9.1 9.5 10.0
Text Input
Description: Manually add data which will be stored in the module.
Example: A lookup table where you are looking for certain words or codes to be replaced with new classifications. You could create a Find field and a Replace field to populate with the values needed.

Union
Description: Combine two or more data streams with similar structures based on field names or positions. In the output, each column will contain the data from each input.
Example: Transaction data stored in different files for different time periods, such as a sales data file for March and a separate one for April, can be combined into one data stream for further processing.
Input/Output
Browse
Description: Add one or more points in your data stream to review and verify your data.
Example: Allows users to get a look at their data anywhere in the process.

Date Time Now
Description: Input the current date and time at module runtime, in a format of the user's choosing.
Example: A useful tool to easily add a date-time header to a report.

Directory
Description: Input a list of file names and attributes from a specified directory.
Example: Lists all files in a directory; can be used in conjunction with the Dynamic Input tool to bring in the most recent data file that is available.

Input
Description: Bring data into your module by selecting a file or connecting to a database (optionally, using a query).
Example: Connect to disparate datasets.

Map Input
Description: Manually draw or select map objects (points, lines, and polygons) to be stored in the module.
Example: Pick a spatial object, either by drawing one or selecting one, to use in your module (app).

Output
Description: Output the contents of a data stream to a file or database.
Example: Loading enriched data back into a database. Note that not every format Alteryx can read can also be written; some formats are read-only.
Text Input
Description: Manually add data which will be stored in the module.
Example: Manually store data or values inside the Alteryx module, such as a lookup table holding the values of segmentation groups along with a description name for each.
XDF Input
Description: This tool enables access to an XDF format file (the format used by Revolution R Enterprise's RevoScaleR system to scale predictive analytics to millions of records) for either: (1) using the XDF file as input to a predictive analytics tool, or (2) reading the file into an Alteryx data stream for further data hygiene or blending activities.
Example: Can be used when building and running predictive analytics procedures (specifically Linear Regression, Logistic Regression, Decision Trees, Random Forests, Scoring, Lift Chart) on amounts of data that open source R has difficulty computing.

XDF Output
Description: This tool reads an Alteryx data stream into an XDF format file, the file format used by Revolution R Enterprise's RevoScaleR system to scale predictive analytics to millions of records. By default, the new XDF file is stored as a temporary file, with the option of writing it to disk as a permanent file, which can be accessed in Alteryx using the XDF Input tool.
Example: Can be used when building and running predictive analytics procedures (specifically Linear Regression, Logistic Regression, Decision Trees, Random Forests, Scoring, Lift Chart) on amounts of data that open source R has difficulty computing.
Preparation (17 Tools)
Auto Field
Description: Automatically set the field type for each string field to the smallest possible size and type that will accommodate the data in each column.
Example: Identify the best-fit field type for text-based inputs. Makes the data streaming into Alteryx as small as possible to limit processing time and ensure proper formats for downstream processes.

Filter
Description: Query records based on an expression to split data into two streams: True (records that satisfy the expression) and False (those that do not).
Example: Allows users to exclude records (all fields still come through the stream). For instance, if you are looking at a dataset of customers, you may want to eliminate customers with certain characteristics, such as race or sex.

Date Filter
Description: The Date Filter macro allows a user to easily filter data based on date criteria using a calendar-based interface.
Example: Return transaction records by specifying a start and end date.

Formula
Description: Create or update fields using one or more expressions to perform a broad variety of calculations and/or operations.
Example: If there is a missing or NULL value, you can use the Formula tool to replace that NULL value with a zero.

Generate Rows
Description: Create new rows of data. Useful for creating a sequence of numbers, transactions, or dates.
Example: Creating data, especially time-series data, such as generating 365 unique records, one for each day of the year.
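Outside Alteryx, a sequence like the 365-day example above can be sketched in a few lines of Python; the start date here is an illustrative assumption:

```python
from datetime import date, timedelta

def generate_rows(start, count):
    """Generate `count` consecutive daily records starting at `start`."""
    return [start + timedelta(days=i) for i in range(count)]

# One record per day of a (non-leap) year.
days = generate_rows(date(2023, 1, 1), 365)
```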
Impute Values
Description: Update specific values in a numeric data field with another selected value. Useful for replacing NULL values.
Example: If you have a data set that is missing information, such as salary, and displays (NULL), rather than just making it zero you can fill in the NULL with the mean or median, improving the accuracy of the results.

Multi-Field Binning
Description: Group multiple numeric fields into tiles or bins, especially for use in predictive analysis.
Example: If you have transactional data, you can group customers into different buyer personas, e.g., males between 30-35 that spend over $1k per month.

Multi-Field Formula
Description: Create or update multiple fields using a single expression to perform a broad variety of calculations and/or operations.
Example: If there are missing or NULL values in multiple fields, you can use a single expression to replace those NULL values with zeros.

Multi-Row Formula
Description: Create or update a single field using an expression that can reference fields in subsequent and/or prior rows to perform a broad variety of calculations and/or operations. Useful for parsing complex data and creating running totals.
Example: Creating unique identifiers at a group level, or cross-row comparisons: with sales volume for yr 1, yr 2, and yr 3 in different rows, you can calculate the difference between the sales in each of these rows.
Random % Sample
Description: Generate a random number or percentage of records passing through the data stream.
Example: If you want to base your analysis on 35% of the data, for instance, it will randomly return that share of the records.

Record ID
Description: Assign a unique identifier to each record.
Example: Can be used to assign a customer ID to a legacy transaction, allowing for more accurate direct marketing/promotional offerings in the future.

Sample
Description: Limit the data stream to a number, percentage, or random set of records.
Example: Allows you to select a subset of data/records for your analysis. Can be used to focus on a select group of related records or transactions, such as selecting all items in an online shopping cart.

Select
Description: Select, deselect, reorder, and rename fields, change field type or size, and assign a description.
Example: Allows you to determine which fields should or should not be carried down through the analysis; for instance, when looking at customer transactional data, you can deselect the fields that are not needed downstream.
Select Records
Description: Select specific records and/or ranges of records, including discontinuous ranges. Useful for troubleshooting and sampling.
Example: If a user wants the records before position 100, or those in the range 100-150, it will return the records in those ranges.

Sort
Description: Sort records based on the values in one or more fields.
Example: Allows you to sort data records into ascending/descending order, such as locating your top 1,000 customers based on the amount they spend.

Tile
Description: Group data into sets (tiles) based on value ranges in a field.
Example: Creating logical groups of your data using user-defined or statistical breaks. Very good for bucketing high-value customers vs. low-value customers.

Unique
Description: Separate data into two streams, duplicate and unique records, based on the fields of the user's choosing.
Example: You only want to mail each individual once, based on a unique identifier (customer ID).
Join (9 Tools)
Append Fields
Description: Append the fields from a source input to every record of a target input. Each record of the target input will be duplicated for every record in the source input.
Example: Adding a small table to a million records (a small-to-big merge), such as adding time stamps and the names of the users who last accessed the records onto your database records.

Find Replace
Description: Search for data in one field from one data stream and replace it with a specified field from a different stream. Similar to an Excel VLOOKUP.
Example: Think of this like Excel's find and replace: looking for something and then replacing it.

Join
Description: Combine two data streams based on common fields (or record position). In the joined output, each row will contain the data from both inputs.
Example: Can be used to join customer profile data with transactional data, matching the two data sources on a unique customer ID.

Join Multiple
Description: Combine two or more inputs based on common fields (or record position). In the joined output, each row will contain the data from all inputs.
Example: Can be used to join customer profile data with transactional data, matching two or more data sources on a unique customer ID.

Make Group
Description: The Make Group tool takes data relationships and assembles the data into groups based on those relationships.
Example: Used primarily with fuzzy matching: ID 1 can match 10 different values from source 2, and that becomes a group.

Fuzzy Match
Description: Identify non-identical duplicates in a data stream.
Example: Helps determine similarities in your data. For instance, if you have two data sets with different IDs, you can use names and addresses as a way to standardize and match records, displaying all of the IDs that match.
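The idea of scoring non-identical duplicates can be illustrated with a string-similarity sketch in Python. This uses `difflib` purely as an illustration; it is not the matching algorithm Alteryx uses, and the 0.8 threshold is an arbitrary assumption:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Two address records that are non-identical duplicates of each other.
score = similarity("123 Main Street", "123 Main St.")
match = score > 0.8  # threshold is a tunable assumption
```

In practice a threshold like this is tuned against known duplicates so that near-matches are caught without merging distinct records.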
Dun & Bradstreet Business File Matching
Description: Match your customer or prospect file to the Dun & Bradstreet business file. (Requires Alteryx with Data Package and installation of the Dun & Bradstreet business location file.)
Example: Matching a business listing file to Dun & Bradstreet.

Union
Description: Combine two or more data streams with similar structures based on field names or positions. In the output, each column will contain the data from each input.
Example: Can be used to combine datasets with similar structures but different data. You might have transaction data stored in different files for different time periods, such as a sales data file for March and a separate one for April. Assuming they have the same structure (the same fields), Union will join them together into one large file, which you can then analyze.
Parse (4 Tools)
Date Time
Description: Transform date/time data to and from a variety of formats, including both expression-friendly and human-readable formats.
Example: Easy conversion between strings and actual date-time formats, such as converting military time into standard time, or turning Jan 1, 2012 into 1/1/12.

RegEx
Description: Parse, match, or replace data using regular expression syntax.
Example: Parsing unstructured text files, such as weblogs or data feeds from Twitter, arranging the data into rows and columns for analytical purposes.
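The weblog case above can be sketched with a regular expression in Python; the log line format and field names here are made-up assumptions, not a real log standard:

```python
import re

# Hypothetical weblog line: IP address, timestamp, request path.
line = '10.0.0.1 [2015-03-01 12:00:05] "GET /products/42"'
pattern = r'(\S+) \[(.+?)\] "GET (\S+)"'

# Parse the unstructured line into separate "columns".
ip, ts, path = re.match(pattern, line).groups()
```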
Text to Columns
Description: Split the text from one field into separate rows or columns.
Example: Allows you to bring in customer data, for instance from an Excel file that contains first name and last name in one column, and split it into two columns so that first name is in one column and last name is in the other. This makes it easy to sort and analyze.
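The first-name/last-name split above can be sketched in Python (the sample names are illustrative):

```python
# Minimal sketch of splitting one text field into two columns.
full_names = ["Ada Lovelace", "Grace Hopper"]

rows = [name.split(" ", 1) for name in full_names]  # split on first space
first_names = [first for first, last in rows]
last_names = [last for first, last in rows]
```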
XML Parse
Description: Read in XML snippets and parse them into individual fields.
Example: Cleaning an XML file; parsing XML text.
Transform (7 Tools)
Arrange
Description: Manually transpose and rearrange fields for presentation purposes.
Example: Used for staging data for reports.

Count Records
Description: Count the records passing through the data stream. A count of zero is returned if no records pass through.
Example: Returns a count of how many records are going through the tool.

Cross Tab
Description: Pivot the orientation of the data stream so that vertical fields are on the horizontal axis, summarized where specified.
Example: Think of it as a way to change an Excel spreadsheet that has a column of customer IDs next to a column of revenue: this will turn these two columns into two rows.

Running Total
Description: Calculate a cumulative sum per record in a data stream.
Example: Can take yearly sales rows and produce a cumulative total of sales through each row (e.g., yr 1 sales 10K, yr 2 15K, yr 3 25K).
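A running total is just a cumulative sum down the stream; a minimal sketch with the yearly-sales figures from the example:

```python
from itertools import accumulate

# Yearly sales per row; the running total is the cumulative sum so far.
sales = [10_000, 15_000, 25_000]
running_total = list(accumulate(sales))
# After yr 1: 10000; after yr 2: 25000; after yr 3: 50000.
```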
Summarize
Description: Summarize data by grouping, summing, counting, spatial processing, string concatenation, and much more. The output contains only the results of the calculation(s).
Example: If you wanted to look at a certain group of customers of a certain age or income level, or get an idea of how many customers you have in the state of NY.

Transpose
Description: Pivot the orientation of the data stream so that horizontal fields are on the vertical axis.
Example: Think of it as a way to change an Excel spreadsheet that has a row of customer IDs with a row of revenue below it: this will turn these two rows into two columns.
Weighted Average
Description: Calculate the weighted average of a set of values where some records are configured to contribute more than others.
Example: If you are calculating average spend, this will determine and "weight" whether certain customers' spending levels are contributing more to the average.
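The weighted-average calculation itself can be sketched as follows; the spend values and weights are illustrative:

```python
# Minimal sketch: spend values weighted by, e.g., transaction counts.
values = [100.0, 200.0, 300.0]
weights = [1.0, 1.0, 2.0]  # the last customer counts double

weighted_avg = sum(v * w for v, w in zip(values, weights)) / sum(weights)
# (100 + 200 + 600) / 4 = 225.0, versus an unweighted mean of 200.0
```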
Report/Presentation
Charting
Description: Create a chart (Area, Column, Bar, Line, Pie, etc.) for output via the Render tool.
Example: Create bar, line, or pie charts.
Email
Description: Send emails for each record, with attachments or e-mail generated reports if desired.
Example: Allows you to create dynamically updated email content.

Image
Description: Add an image for output via the Render tool.
Example: Add graphics/images to be included in a report.

Layout
Description: Arrange two or more reporting snippets horizontally or vertically for output via the Render tool.
Example: Arrange the pieces of your report.

Report Map
Description: Create a map for output via the Render tool.
Example: Create a map for a report.

Map Legend Builder
Description: Recombine the component parts of a map legend (created using the Map Legend Splitter) into a single legend table, after customization by other tools.
Example: Takes a customized legend and reassembles it.

Map Legend Splitter
Description: Split the legend from the Report Map tool into its component parts for customization by other tools. (Generally recombined by the Map Legend Builder.)
Example: Helps customize legends by adding symbols such as $ or %, or removing redundant text.

Overlay
Description: Arrange reporting snippets on top of one another for output via the Render tool.
Example: Allows you to specify how to put a map together, such as putting a legend inside a map or overlaying a table and chart onto a map.

Render
Description: Output report snippets into presentation-quality reports in a variety of formats, including PDF, HTML, XLSX, and DOCX.
Example: Saves reports out of Alteryx.

Report Footer
Description: Add a footer to a report for output via the Render tool.
Example: Apply a footer to the report.

Report Header
Description: Add a header to a report for output via the Render tool.
Example: Apply a header to the report.

Table
Description: Create a data table for output via the Render tool.
Example: Creates a table for selected data fields.

Report Text
Description: Add and customize text for output via the Render tool.
Example: Allows you to customize a title or other text-related aspects of your report.
Documentation
Comment
Description: Add annotation or images to the module canvas to capture notes or explain processes for later reference.
Example: This allows users to document what they did during a certain portion of the analysis, so other users have an understanding of the workflow.

Explorer Box
Description: Add a web page or Windows Explorer window to your canvas.
Example: Display a web page for reference in the module, or show a shared directory of macros.
Tool Container
Description: Organize tools into a single box which can be collapsed or disabled.
Example: Helps you organize your module.
Spatial (15 Tools)
Buffer
Description: Expand or contract the extents of a spatial object (typically a polygon).
Example: Identify all of the businesses on a road by placing a buffer on that road to determine who and where they are.

Create Points
Description: Create spatial points in the data stream using numeric coordinate fields.
Example: Creating a spatial reference from a longitude and latitude.

Distance
Description: Calculate the distance or drive time between a point and another point, line, or polygon.
Example: Calculating the drive distance or drive time to a customer location.

Find Nearest
Description: Identify the closest points or polygons in one file to the points in a second file.
Example: As a customer, find me the nearest location to visit, optimizing my driving route.

Non Overlap Drivetime
Description: Create drive-time trade areas that do not overlap for a point file.
Example: Create drive-time trade areas that do not overlap, for a point file.

Poly-Build
Description: Create a polygon or polyline from sets of points.
Example: Build a trade area: build an object showing where all of your customers are coming from by fitting a polygon to a series of points.

Poly-Split
Description: Split a polygon or polyline into its component polygons, lines, or points.
Example: Break a polygon into a sequential set of points.

Smooth
Description: Round off sharp angles of a polygon or polyline by adding nodes along its lines.
Example: Crisper objects rendered on a map (e.g., a more detailed coastal view).

Spatial Info
Description: Extract information about a spatial object, such as area, centroid, bounding rectangle, etc.
Example: Getting the lat/lon of a point, or the area in square miles of a coverage area for telco/wireless.

Spatial Match
Description: Combine two data streams based on the relationship between two sets of spatial objects to determine if the objects intersect, contain, or touch one another.
Example: Finding all customers that fall within a defined trade area, based on their geographic proximity.

Spatial Process
Description: Create a new spatial object from the combination or intersection of two spatial objects.
Example: Removing overlap from intersecting trade areas.
Trade Area
Description: Define radii (including non-overlapping) or drive-time polygons around specified points.
Example: Defining boundaries for where your customers or prospects are coming from.
Frequency Table
Description: Produce a frequency analysis for selected fields. Output includes a summary of the selected field(s), with frequency counts and percentages for each value in a field.
Example: What is the distribution of a company's customers by income level? From the output, you might learn that 35% of your customers are high income, 30% are middle-high, 25% are middle-low, and 10% are low.
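A frequency count of this kind reduces to counting values and dividing by the total; a minimal sketch with made-up income labels matching the example:

```python
from collections import Counter

# Illustrative field values; in practice this is a column of your data.
incomes = (["high"] * 35 + ["middle-high"] * 30
           + ["middle-low"] * 25 + ["low"] * 10)

counts = Counter(incomes)
percentages = {level: 100 * n / len(incomes) for level, n in counts.items()}
```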
Histogram
Description: Provides a histogram plot for a numeric field. Optionally, it provides a smoothed empirical density plot. Frequencies are displayed when a density plot is not selected, and probabilities when it is. The number of breaks can be set by the user, or determined automatically using the method of Sturges.
Example: Provides a visual summary of the distribution of values based on the frequency of intervals. For example, using US Census data on the time occupied by travel to work, Table 2 below shows that the absolute number of people who responded with travel times "at least 15 but less than 20 minutes" is higher than the numbers for the categories above and below it. This is likely due to people rounding their reported journey time.
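The Sturges method mentioned above derives the number of breaks from the sample size alone; a minimal sketch:

```python
import math

def sturges_bins(n):
    """Number of histogram bins by Sturges' rule: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1

bins = sturges_bins(1000)  # 1,000 observations -> 11 bins
```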
Heat Plot
Description: Plots the empirical bivariate density of two numeric fields, using colors to indicate variations in the density of the data for different levels of the two fields.
Example: Could be used in reports to visualize contingency tables for media usage. For example, consider a survey of how important Internet reviews were in making a purchase decision (on a 1-to-10 scale), and when the customer searched for this information (in time categories from six or more weeks before the purchase to within one hour of the point of purchase). The heat plot allows the user to see that those who looked at Internet reviews five to six weeks before the point of purchase were most influenced by those reviews.
Oversample Field
Description: Sample incoming data so that there is equal representation of data values, to enable effective use in a predictive model.
Example: In many applications, the behavior of interest (e.g., responding favorably to a promotion offer) is a fairly rare event (untargeted or poorly targeted direct marketing campaigns often have favorable response rates below 2%). Building predictive models directly with this sort of rare-event data is a problem, since models that predict that no one will respond favorably to a promotion offer will be correct in the vast majority of cases. To prevent this, it is common practice to oversample the favorable responses so that there is a higher penalty for placing all customers into a non-responder category. This is done by taking all the favorable responders and a sample of the non-responders, to get the total percentage of favorable responders up to a user-specified percentage (often 50% of the sample used to create the model).
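The procedure described above, keeping every responder and sampling just enough non-responders to hit the target share, can be sketched as follows; the record layout is an illustrative assumption:

```python
import random

def oversample(records, is_responder, target_share=0.5, seed=0):
    """Keep all responders; sample non-responders so that responders
    make up `target_share` of the output."""
    responders = [r for r in records if is_responder(r)]
    others = [r for r in records if not is_responder(r)]
    # Non-responders needed for responders to reach the target share.
    n_others = round(len(responders) * (1 - target_share) / target_share)
    random.seed(seed)
    return responders + random.sample(others, min(n_others, len(others)))

# 20 responders in 1,000 records (a 2% response rate).
data = [("r", i) for i in range(20)] + [("n", i) for i in range(980)]
balanced = oversample(data, lambda rec: rec[0] == "r")
# 20 responders + 20 sampled non-responders: responders are now 50%.
```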
Pearson Correlation
Description: Replaces the Pearson Correlation Coefficient tool from previous versions. The Pearson coefficient is obtained by dividing the covariance of the two variables by the product of their standard deviations.
Example: Age and income are related: as age increases, so does income.
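The formula above (covariance divided by the product of the standard deviations) can be sketched directly:

```python
import math

def pearson(xs, ys):
    """Pearson r = cov(x, y) / (std(x) * std(y))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

r = pearson([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly linear -> 1.0
```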
Plot of Means
Description: Take a numeric or binary categorical field (converted into a set of zero and one values) as a response field, along with a categorical field, and plot the mean of the response field for each of the categories (levels) of the categorical field.
Example: Best used for gaining a basic understanding of the nature of the relationship between a categorical variable and a numeric variable. For instance, it allows us to visually examine whether customers in different regions of the country spend more or less on women's apparel in a year.
Scatterplot
Description: Produce enhanced scatterplots, with options to include boxplots in the margins, a linear regression line, a smooth curve via non-parametric regression, a smoothed conditional spread, and outlier identification. The smooth curve can expose the relationship between two variables better than a traditional scatter plot, particularly in cases with many observations or a high level of dispersion in the data.
Example: Shows the relationship between two numeric variables, or a numeric variable and a binary categorical variable (e.g., Yes/No). In addition to the points themselves, the tool also produces lines that show the trends in the relationships. For instance, it may show us that household spending on restaurant meals increases with household income, but the rate of increase slows (i.e., shows "diminishing returns") as the level of household income increases.
Spearman Correlation Coefficient
Description: Assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any other assumptions about the particular form of the relationship.
Example: Is there a correlation between people's income level and their level of education?
AB Controls
Description: The Control Select tool matches one to ten control units (e.g., stores, customers, etc.) to each member of a set of previously selected test units, based on criteria such as seasonal patterns and growth trends for a key performance indicator, along with other user-provided criteria. The goal is to find the best set of control units (units that did not receive the test treatment, but are very similar on important criteria to a unit that did) for the purposes of making the best comparison possible.
Example: Takes measures such as seasonality and growth and, for each treatment store, compares them against the control candidates (the other stores in the chain) that are nearest on those measures within a given drive time; for example, comparing a low-growth store to other low-growth stores within a given distance.
AB Treatments
Description: Determine which group is the best fit for A/B testing.
Example: For example, choosing which DMAs you would want to compare, using up to 5 criteria at the treatment observation level, as well as the spread between criteria across DMAs and within DMAs. An absolute comparison would be against the average for the entire chain/customer set.
AB Trends
Description: Create measures of trend and seasonal patterns that can be used to help match treatment units to control units (e.g., stores or customers) for A/B testing. The trend measure is based on period-to-period percentage changes in the rolling average (taken over a one-year period) of a performance measure of interest. The same measure is used to assess seasonal effects; in particular, the percentage of the total level of the measure in each reporting period is used to assess seasonal patterns.
Example: Gives users the ability to cluster treatment units based on underlying trends over the course of a year (general month-to-month growth rate: high, medium, or low growth) or on specific seasonality patterns (e.g., climate zones), based on your measures (such as traffic or sales volumes), their frequency (daily, weekly, monthly), and the period of time covered. Can use day-specific data (Mondays versus Tuesdays, etc.).
Decision Tree
Description: Predict a target variable using one or more predictor variables that are expected to have an influence on the target variable, by constructing a set of if-then split rules that optimize a criterion. If the target variable identifies membership in one of a set of categories, a classification tree is constructed (based on the Gini coefficient) to maximize the "purity" at each split. If the target variable is a continuous variable, a regression tree is constructed using the split criterion of minimizing the sum of the squared errors at each split.
Example: A decision tree creates a set of if-then rules for classifying records (e.g., customers, prospects, etc.) into groups based on the target field (the field we want to predict). For instance, when evaluating a credit union's applicants for personal loans, the credit union can apply the method to data on past loans it has issued and find that customers who had (a) an average monthly checking balance of over $1,500, (b) no outstanding personal loans, and (c) were between the ages of 50 and 59 had a default rate of less than 0.3%, and therefore should have their loan applications approved.
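The Gini measure used to score classification splits can be sketched as follows; lower impurity means a "purer" node (the class labels are illustrative):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class shares."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

pure = gini_impurity(["default"] * 10)            # one class -> 0.0
mixed = gini_impurity(["default", "repaid"] * 5)  # 50/50 split -> 0.5
```

A split is chosen so that the weighted impurity of the child nodes drops as much as possible relative to the parent.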
Forest Model
Description: Predict a target variable using one or more predictor variables that are expected to have an influence on the target variable, by constructing and combining a set of decision tree models (an "ensemble" of decision tree models).
Example: The forest model is a group of decision tree models (typically called an "ensemble"), where the trees within the group differ with respect to the set of records used to create each tree and the set of variables considered in creating its if-then rules. Predictions from this model are based on the predictions of all the models, with the final prediction being the outcome that the greatest number of models in the group predicted (known as a "voting rule").
Gamma Regression
Description: Based on the R and Revolution generalized linear model with an underlying Gamma distribution, Gamma Regression handles strictly positive target variables that have a long right tail (so most values are relatively small, and there is a long right-hand tail to the distribution).
Example: It is a more appropriate model than linear regression when the target is strictly positive and has a long right-hand tail (a small number of customers with high values of the target, and many with relatively low values). It is widely used in the insurance industry to model claim amounts (where most claims are fairly small, but some can be very large).
Lift Chart
Description: Compare the improvement (or lift) that various models provide relative to each other, as well as to a "random guess," to help determine which model is "best." Produces a cumulative captured response chart (also called a gains chart) or an incremental response rate chart.
Example: A cumulative captured response chart can tell us that the best 10% of customers (based on a predictive model) in our customer list for a direct marketing campaign accounted for 40% of the total favorable response we would get if all the members of our customer list were contacted as part of the campaign. In addition, an incremental response rate chart can tell us that the second-best 10% of our customer list based on a predictive model (customers ranked just under the best 10% down to the last customer in the top 20%) have a favorable response rate of 8%, compared to the overall response rate of 2%.
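The cumulative captured response calculation behind a gains chart can be sketched as follows; the scores and outcomes are illustrative, not from any real model:

```python
def captured_response(scores, responded, top_share=0.1):
    """Share of all responders captured in the top `top_share` of
    customers, ranked by model score (highest first)."""
    ranked = sorted(zip(scores, responded), reverse=True)
    k = int(len(ranked) * top_share)
    captured = sum(r for _, r in ranked[:k])
    return captured / sum(responded)

# 10 customers; the 2 responders happen to get the highest scores.
scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.15, 0.25, 0.05, 0.35, 0.12]
responded = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
share = captured_response(scores, responded, top_share=0.2)
# Top 20% of the list captures 100% of the responders here.
```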
Market Basket Rules
Step 1 of a Market Basket Analysis: take transaction-oriented data and create either a set of association rules or frequent item sets. A summary report of both the transaction data and the rules/item sets is produced, along with a model object that can be further investigated in an MB Inspect tool.
Example: A market basket rule can state that if someone purchases beer, they are likely to purchase pizza as well, or that if they purchase fish, they are likely to purchase white wine at the same time.
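The support/confidence arithmetic behind association rules can be sketched in a few lines of Python; the transactions and thresholds below are illustrative, not the tool's defaults:

```python
# Minimal association-rule mining sketch: count itemset support across
# transactions, then derive confidence for each A -> B rule.
from itertools import combinations
from collections import Counter

transactions = [
    {"beer", "pizza"}, {"beer", "pizza", "chips"}, {"beer", "diapers"},
    {"fish", "white wine"}, {"fish", "white wine", "lemon"}, {"beer", "pizza"},
]

def rules(transactions, min_support=0.3, min_confidence=0.6):
    n = len(transactions)
    singles = Counter(item for t in transactions for item in t)
    pairs = Counter(frozenset(p) for t in transactions
                    for p in combinations(sorted(t), 2))
    out = []
    for pair, count in pairs.items():
        if count / n < min_support:
            continue                        # drop infrequent item sets
        for a in pair:
            b = next(iter(pair - {a}))
            confidence = count / singles[a]  # P(B | A)
            if confidence >= min_confidence:
                out.append((a, b, round(count / n, 2), round(confidence, 2)))
    return out

print(rules(transactions))
```

Here the rule "beer implies pizza" has support 0.5 (half of all baskets contain both) and confidence 0.75 (three quarters of beer baskets also contain pizza).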
Market Basket Inspect
Step 2 of a Market Basket Analysis: take the output of the MB Rules tool and provide a listing and analysis of those rules that can be filtered on several criteria in order to reduce the number of returned rules or item sets to a manageable number.
Example: A tool for inspecting and analyzing association rules and frequent item sets. Do the beer-and-pizza MB Rules really fit? Filter association rules and review them graphically (two different visuals).
These tools are new in: 9.0 9.1 9.5 10.0
MB Affinity
Used to construct a matrix of affinity measures between different items with respect to their likelihood of being part of the same action or transaction.
Example: What is the likelihood that a person who purchases steak will also purchase potatoes, bread, and wine?
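One simple affinity measure is the share of transactions containing one item that also contain another; a hypothetical sketch (made-up baskets, and only one of several possible affinity measures):

```python
# Build an ordered-pair affinity matrix: for each pair (A, B), the share
# of transactions containing A that also contain B.
from collections import Counter
from itertools import permutations

transactions = [
    {"steak", "potatoes", "wine"}, {"steak", "potatoes", "bread"},
    {"steak", "wine"}, {"bread", "wine"},
]

def affinity(transactions):
    singles = Counter(i for t in transactions for i in t)
    both = Counter((a, b) for t in transactions
                   for a, b in permutations(sorted(t), 2))
    return {(a, b): both[(a, b)] / singles[a] for (a, b) in both}

m = affinity(transactions)
print(m[("steak", "potatoes")])  # share of steak baskets that include potatoes
```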
Nested Test
Examine whether two models, one of which contains a subset of the variables contained in the other, are statistically equivalent in terms of their predictive capability.
Example: This tool allows the user to determine whether one or more fields help to predict the target field for a regression-based model in the presence of other predictor fields. By using this tool, a user can find a statistically meaningful set of predictors for the target field in a very selective way, as opposed to the automated method that the Regression – Stepwise tool represents.
Network Analysis
Creates an interactive visualization of a network along with summary statistics and the distribution of node centrality measures.
Example: Use this tool to explore relationships between various entities and their closeness. The tool provides a visual representation of the network as well as key summary statistics that characterize the network.
Linear Regression
Relate a variable of interest (target variable) to one or more variables (predictor variables) that are expected to have an influence on the target variable. (Also known as a linear model or a least-squares regression.)
Example: How does the number of times someone shops at a store relate to their income level?
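The least-squares fit the tool performs can be illustrated with the closed-form solution for a single predictor; the data below is made up and the code is a sketch of the math, not the tool's implementation:

```python
# Closed-form simple least-squares regression: slope is the ratio of the
# covariance of (x, y) to the variance of x; intercept follows from the means.
def least_squares(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
            sum((xi - mx) ** 2 for xi in x)
    return slope, my - slope * mx   # (slope, intercept)

income = [30, 40, 50, 60, 70]   # hypothetical predictor, in $1000s
visits = [4, 6, 8, 10, 12]      # hypothetical target: store visits per year
slope, intercept = least_squares(income, visits)
print(slope, intercept)
```

Because this toy data is perfectly linear, the fit is exact (slope 0.2, intercept -2.0); real data would leave residuals around the fitted line.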
Linear Regression In-DB
Uses the database’s native language (e.g., R) to create an expression to relate a variable of interest (target variable) to one or more variables (predictor variables) that are expected to have an influence on the target variable. (Also known as a linear model or a least-squares regression.) Accessible via the regular predictive tool palette; the tool will automatically convert to the In-DB version if an In-DB connection exists.
Example: How does the number of phone calls or texts a person makes relate to their age?
Spline Model
This tool implements Friedman’s multivariate adaptive regression spline (MARS) model. This is in the more modern class of models (like the Forest and Boosted Models) that handles both variable selection and non-linear relationships directly within the algorithm. In some ways it is similar to a decision tree, but instead of making discrete jumps at “splits”, the splits (called “knots” in this method) are placed at a “hinge” where the slope of the effect of a predictor on a target changes, resulting in the effect of numeric predictors being modeled as piecewise linear components.
Example: Similar to the Forest Model or Boosted Model, it provides a visual output that enables an understanding of both the relative importance of different predictor fields on the target and the nature of the relationship between the target field and each of the important predictor fields, such as the most important variables related to churn, or which variables to focus on in a targeted campaign.
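The hinge functions described above can be illustrated directly; the knot location and coefficients below are invented for illustration, not fitted by any tool:

```python
# MARS-style models build predictions from mirrored hinge terms
# max(0, x - knot) and max(0, knot - x), so the slope of a predictor's
# effect changes at each knot.
def spline_prediction(x):
    # Hypothetical fitted effect with one knot at 3:
    # slope -0.5 below the knot, slope +2 above it (made-up coefficients).
    return 1.0 + 2.0 * max(0.0, x - 3.0) + 0.5 * max(0.0, 3.0 - x)

for x in (1.0, 3.0, 5.0):
    print(x, spline_prediction(x))
```

The prediction is continuous at the knot but its slope changes there, which is exactly the piecewise-linear behaviour the description refers to.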
Stepwise
Determine the "best" predictor variables to include in a model out of a larger set of potential predictor variables for linear, logistic, and other traditional regression models. The Alteryx R-based stepwise regression tool makes use of both backward variable selection and mixed backward and forward variable selection.
Example: This tool takes a linear or logistic regression model (likely one that includes a large number of predictor fields) and finds the subset of the predictors that provides the best model fit, adjusted for the number of model parameters. In general, a model with fewer predictors is better at predicting new data than one with more predictors, since it is less prone to overfitting the estimation sample.
Score
Calculate a predicted value for the target variable in the model. This is done by appending a 'Score' field to each record in the output of the data stream, based on two inputs: an R model object (produced by the Logistic Regression, Decision Tree, Forest Model, or Linear Regression tools) and a data stream.
Example: This tool takes a model and provides predicted values for the target variable in new data. This is what actually enables a predictive model to be incorporated into a business process.
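Conceptually, scoring applies a fitted model's coefficients to each new record and appends the result as a new field; the sketch below assumes a logistic model with invented coefficients, not output produced by any Alteryx tool:

```python
# Apply a (hypothetical) fitted logistic model to new records and append
# a 'Score' field holding the predicted response probability.
import math

def score(record, intercept, coefs):
    z = intercept + sum(coefs[k] * record[k] for k in coefs)
    return 1.0 / (1.0 + math.exp(-z))   # logistic link

new_customers = [{"transactions": 12, "tenure": 3},
                 {"transactions": 1, "tenure": 1}]
coefs = {"transactions": 0.3, "tenure": 0.1}   # made-up coefficients
scored = [dict(r, Score=round(score(r, -2.0, coefs), 3))
          for r in new_customers]
print(scored)
```

Each output record keeps its original fields plus the appended probability, which is what lets the scored stream feed a downstream business process.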
percentage confidence levels. For each confidence level, the expected probability that the true value will fall within the provided bounds corresponds to the confidence level percentage. In addition to the model, the covariate values for the forecast horizon must also be provided.
Example: among the last items placed into a new home. This tool allows for the creation of forecasts from ARIMA models that include covariates.
TS Forecast
Provide forecasts from either an ARIMA or ETS model for a specific number of future periods.
Example: This tool can be used for inventory management, for instance using past history and inventory levels to help predict what your inventory levels should be in the next 3 months. The forecasts are carried out using models created with either the TS ARIMA or TS ETS tools.
TS Plot
Create a number of different univariate time series plots to aid in understanding the time series data and determining how to develop a forecasting model.
Example: This tool allows a number of different, commonly used time series plots to be created. One particularly interesting plot is the Time Series Decomposition plot, which breaks a time series into longer-term trend, seasonal, and error components. This can be particularly useful for spotting changes in underlying trend (say, the use of a particular cell tower) in what is otherwise "noisy" data due to time-of-day and day-of-week effects.
Predictive Grouping (5 Tools)
Icon Tool Description Example
Append Cluster
Appends the cluster assignments from a K-Centroids Cluster Analysis tool to a data stream containing the set of fields (with the same names, but not necessarily the same values) used to create the original cluster solution.
Example: After you create clusters using a cluster analysis tool, you can then append the cluster assignments both to the database used to create the clusters and to new data not used in creating the set of clusters.
K-Centroids Analysis
Partition records into "K groups" around centroids by assigning cluster memberships, using K-Means, K-Medians, or Neural Gas clustering.
Example: Common applications of this method are to create customer segments based on past purchasing behavior, simplifying the task of developing specialized marketing programs (a firm can hope to develop six specialized marketing programs, but not a specialized program for each of 30,000 customers), and to create store groups for retailers based on store and trade-area characteristics (ranging from demographics to weather patterns) for implementing merchandising, promotion, inventory, and pricing programs.
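The assign-then-recompute loop at the heart of K-Means can be sketched on a single numeric field; the spend values and starting centroids are toys, and the real tool also supports K-Medians and Neural Gas:

```python
# One-dimensional K-Means sketch: assign each point to its nearest
# centroid, then recompute each centroid as the mean of its members.
def kmeans_1d(points, centroids, iterations=10):
    for _ in range(iterations):
        clusters = {c: [] for c in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        centroids = [sum(m) / len(m) if m else centroids[c]
                     for c, m in clusters.items()]
    return centroids, clusters

spend = [10, 12, 11, 95, 100, 98]         # two obvious spend segments
centroids, clusters = kmeans_1d(spend, [0.0, 50.0])
print(sorted(centroids))
```

On this toy data the algorithm converges immediately to one low-spend and one high-spend segment; real usage would run on many fields and choose K with a diagnostic step.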
K-Centroids Diagnostics
Assess the appropriate number of clusters to specify, given the data and the selected Predictive Grouping algorithm (K-Means, K-Medians, or Neural Gas).
Example: This tool is designed to help the user determine the "right" number of clusters to use in creating segments or groups. To do this, information on the stability of the clusters (i.e., whether small changes in the underlying data result in the creation of similar or very different clusters) and the ratio of cluster separation (how far apart the clusters are from one another) to cluster compactness (how close the members of a cluster are to one another) is provided to the user for different clustering solutions (i.e., across different possible numbers of clusters and methods). By comparing these measures across different clustering solutions, the user can make an informed judgment about the number of segments or groups to create. Ideally, the resulting groups should have a high level of stability and a high ratio of separation to compactness.
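The separation-to-compactness ratio can be illustrated numerically; this is a one-dimensional toy with invented values, whereas the actual tool uses bootstrap replicates and richer measures:

```python
# Compute a crude separation/compactness ratio for a clustering solution:
# good solutions have centroids far apart relative to how tightly the
# members sit around their own centroid.
def separation_compactness(clusters):
    centroids = [sum(c) / len(c) for c in clusters]
    # compactness: mean distance of members to their own centroid
    compactness = sum(abs(p - m) for c, m in zip(clusters, centroids)
                      for p in c) / sum(len(c) for c in clusters)
    # separation: mean pairwise distance between centroids
    pairs = [(a, b) for i, a in enumerate(centroids)
             for b in centroids[i + 1:]]
    separation = sum(abs(a - b) for a, b in pairs) / len(pairs)
    return separation / compactness

ratio = separation_compactness([[10, 12, 11], [95, 100, 98]])
print(ratio)   # a large ratio suggests well-separated, compact clusters
```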
K-Nearest Neighbor
Find the selected number of nearest neighbors in the "data" stream that correspond to each record in the "query" stream based on their Euclidean distance.
Example: Suppose customers that have spent various amounts of money on your product are categorized into "big spender", "medium spender", "small spender", and "will never buy anything" categories. K-Nearest Neighbor can be used to help determine which category to put a new user in before they've bought anything, based on what you know about them when they arrive, namely their attributes and how people with similar attributes are categorized.
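A minimal sketch of the nearest-neighbour lookup with majority voting; the attribute vectors and categories are invented:

```python
# For a new record (the "query"), find the k records closest in Euclidean
# distance in the "data" stream and take the majority category.
import math
from collections import Counter

def knn_classify(query, data, k=3):
    # data: list of (attribute-vector, category) pairs
    ranked = sorted(data, key=lambda rec: math.dist(query, rec[0]))
    votes = Counter(cat for _, cat in ranked[:k])
    return votes.most_common(1)[0][0]

customers = [((100, 5), "big spender"), ((90, 4), "big spender"),
             ((20, 1), "small spender"), ((15, 2), "small spender"),
             ((95, 6), "big spender")]
print(knn_classify((92, 5), customers))   # -> big spender
```

In practice the attributes would be standardized first so no single field dominates the distance.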
Principal Components
Reduce the dimensions (number of numeric fields) in a database by transforming the original set of fields into a smaller set that accounts for most of the variance (i.e., information) in the data. The new fields are called factors, or principal components.
Example: Groups fields together by examining how a set of variables correlate or relate with one another, such as household income level and educational attainment for people living in different geographic areas. For a particular area we may have the percentage of people who fall into each of eight different educational groups and the percentage of households that fall into nine different income groups. It turns out that household income and educational attainment are highly related (correlated) with one another. Taking advantage of this, a principal components analysis may allow a user to reduce the 17 different education and income groups into one or two composite fields created by the analysis. These composite fields would capture nearly all the information contained in the original 17 fields, greatly simplifying downstream analyses (such as a cluster analysis).
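For just two fields, the variance captured by the first principal component has a closed form (the largest eigenvalue of the 2x2 covariance matrix); the income/education numbers below are made up, and real use would standardize the fields first:

```python
# Fraction of total variance captured by the first principal component
# of two variables, via the closed-form eigenvalues of their 2x2
# covariance matrix [[sxx, sxy], [sxy, syy]].
import math

def first_component(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # largest eigenvalue of the covariance matrix
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return lam / (sxx + syy)   # share of total variance explained

income = [30, 40, 50, 60, 70]
education = [12, 13, 14, 16, 17]   # tracks income closely
print(first_component(income, education))
```

Because the two toy fields are almost perfectly correlated, one component captures nearly all the variance, which is exactly why the 17 correlated fields in the example collapse to one or two composites.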
Connectors
Icon Tool Description Example
Amazon S3 Download
Read CSV, DBF and YXDB files from Amazon S3.
Example: Get data that is stored in Amazon S3; useful for the Analytics Gallery because it is hosted in Amazon.
Amazon S3 Upload
Write CSV, DBF and YXDB files to Amazon S3.
Example: Put data in Amazon S3.
Retrieve data from a specified URL, including Pull data off the web. Download competitors store
an FTP site, for use in a data stream. listings from their website.
Download
Search Foursquare Venues by a location with A company can analyze location data of where users
an option of filtering by a search term. are checking in.
Foursquare
Google Analytics
Bring in data from Google Analytics.
Example: A company can combine Google Analytics data with other sources of data.
Marketo Input
The Marketo Input tool reads Marketo records for a specified date range. Two types of Marketo records can be retrieved:
1. LeadRecord: lead records, with one record for each lead.
2. ChangeRecord: records that track the activities for each lead. There are potentially many ChangeRecord records for each LeadRecord.
The Input tool retrieves records in batches of 1000 records, whereas the Marketo Append tool makes an API request for each record.
Example: Allows users to access data directly from Marketo.
Marketo Append
The Marketo Append tool retrieves Marketo records and appends them to the records of an incoming data stream. Two types of Marketo records can be retrieved:
1. LeadRecord: lead records, with one record for each lead.
2. ActivityRecord: records that track the activities for each lead. There can be many ActivityRecord records for each LeadRecord.
Example: Appends records accessed from Marketo.
Marketo Output
The Marketo Output tool calls the Marketo API function syncLead(). Data is written back to Marketo using an 'Upsert' operation. This means that if a record doesn't currently exist, it will be created; if a record currently exists, it will be updated.
Example: Allows users to write data back into Marketo.
MongoDB Output
Write data to a MongoDB database. MongoDB is a scalable, high-performance, open source NoSQL database.
Example: Allows users to write data back into MongoDB.
MongoDB Input
Read and query data from a MongoDB database. MongoDB is a scalable, high-performance, open source NoSQL database.
Example: Allows users to read data that is stored in MongoDB; can be used to enable Big Data analytics such as e-commerce transaction analysis.
Salesforce Input
Read and query data from Salesforce.com.
Example: Pull data from SFDC.
Twitter Search
Search tweets of the last 7 days by given search terms, with location and user relationship as optional properties.
Example: A company can analyze what people are saying about the company over the last week.
Address
Icon Tool Description Example
CASS
Standardize address data to conform to the U.S. Postal Service CASS (Coding Accuracy Support System) or Canadian SOA (Statement of Accuracy).
Example: This can be used to help in marketing optimization by improving the accuracy of address and customer information. It can improve geocoding and fuzzy matching by standardizing address data. CASS certification is required by USPS for bulk mail discounts.
Canada Geocoder
Determine the coordinates (latitude and longitude) of an address and attach a corresponding spatial object to your data stream. Uses multiple tools to produce the most accurate answer.
Example: Could be used for processing customer files to identify clustering of customers. Geocoding, assigning an address to a physical location, is the first crucial step of any spatial analysis.
US Geocoder
Determine the coordinates (latitude and longitude) of an address and attach a corresponding spatial object to your data stream. Uses multiple tools to produce the most accurate answer.
Example: Could be used for processing customer files to identify clustering of customers. Geocoding, assigning an address to a physical location, is the first crucial step of any spatial analysis.
Parse Address
Parse a single address field into different fields for each component part, such as number, street, city, ZIP. Consider using the CASS tool for better accuracy.
Example: Take an address field and break it into different pieces; an address in a single field/column becomes city, ZIP, street.
Street Geocoder
Determine the coordinates (latitude and longitude) of an address and attach a corresponding spatial object to your data stream. Consider using the U.S. Geocoder or Canadian Geocoder macros for better accuracy.
Example: Find the latitude and longitude of a customer. Could be used for processing customer files to identify clustering of customers. Geocoding, assigning an address to a physical location, is the first crucial step of any spatial analysis.
US ZIP9 Coder
Determine the coordinates (latitude and longitude) of a 5, 7, or 9 digit ZIP code.
Example: Find the latitude and longitude of a ZIP+4.
Demographic Analysis
Icon Tool Description Example
Append demographic variables to your data Append demographics to a trade area- example-
stream from the installed dataset(s). what is the population within 10 mins of my store.
Allocate
Append
Create pre-formatted reports associated with Pre-built demographic reports- example- a Store
Allocate data from the installed dataset(s). trade area returns a formatted demographic report.
Allocate
Report
Input behavior cluster names, IDs and other Generates a list of all of the clusters in the
meta info from an installed dataset. segmentation system.
Behavior
Metainfo
Cluster Code
Append a behavior cluster code to each record in the incoming stream.
Example: It appends a lifestyle segment to records based on a geo-demographic code (i.e., a block group code).
Compare Behavior
Compare two behavior profile sets to output a variety of measures such as market potential index, penetration, etc.
Example: Calculate correlations and market potential between two profiles, such as the market potential for new customers in a new market that fit the same profile as current customers.
Create behavior profiles from cluster Example- it can show the distribution of customers
information in an incoming data stream. across the clusters in the segmentation.
Create
Profile
Report Comparison
Generate a comparison report from two behavior profile sets for output via the Render tool.
Example: Shows market potential in terms of the likelihood of certain purchasing behaviors at the segment level.
Generate a detailed report from a behavior Generates a report of distributions and penetration
profile set for output via the Render tool. of customers by segment.
Report
Detail
Generate a rank report from a set of Use to rank geographies based on market potential
behavior profiles for output via the Render of prospective customers.
Report
tool.
Rank
Profile Input
Input a behavior profile set from an installed dataset or external file.
Example: Used in conjunction with, or after, writing a behavior profile set; provides the ability to open previously saved profile sets and use them for analysis.
Output a profile set (*.scd file) from Taking a customer file that has a segment appended
behavior profile sets in an incoming data to it and saving it into an Alteryx format so that it
Profile stream. Generally only used when using the can be used repeatedly.
Output standalone Solocast desktop tool.
Cross Count Append
Find the counts of sets of values (from the incoming data stream) that occur in a Calgary database file.
Example: Very fast; find a count based on a filter that is passed into the tool from a previous process, for example the number of people by segmentation group in a trade area (the advantage is that you are getting a count, rather than extracting the data and computing the count yourself).
Calgary Input
Input data from the Calgary database file with a query.
Example: Very fast input of data from an indexed file. This can be filtered or based on a user-defined condition.
Calgary Join
Query a Calgary database dynamically based on values from an incoming data stream.
Example: Very fast extraction of data from a large indexed file, based on a condition passed to it from inside Alteryx; for example, extract all businesses with more than 1000 employees (where 1000 comes from an Alteryx calculation above it).
Create a highly indexed and compressed Create a highly indexed and compressed file for use
Calgary Calgary database which allows for extremely with any of the Calgary tools (to enable the speed.)
Loader fast queries.
Developer
Icon Tool Description Example
API Output
Return the results of a data stream directly to an API callback function. For use with custom application development.
Example: Have an Alteryx process called from inside another program, for example if someone wants to have their own application interface but use Alteryx for the processing (i.e., a mortgage calculator).
Base64 Encoder
The Base64 Encoder macro outputs a base64-encoded string.
Example: Useful where there is a need to encode binary data that needs to be stored and transferred over media designed to deal with textual data, ensuring the data remains intact without modification during transport. (http://en.wikipedia.org/wiki/Base64)
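Python's standard library performs the same encoding, which makes the round trip easy to see:

```python
# Base64 round trip: arbitrary binary bytes in, text-safe ASCII out,
# and decoding recovers the original bytes exactly.
import base64

payload = b"\x89PNG binary bytes"             # arbitrary binary data
encoded = base64.b64encode(payload).decode("ascii")
decoded = base64.b64decode(encoded)           # round-trips exactly
print(encoded)
```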
Dynamic Input
Read from input files or databases at runtime, using an incoming data stream to dynamically choose the data. Allows for dynamically generated queries.
Example: You have a process in Alteryx that determines which ZIP codes have the highest value, and then extracts data from another source based on those ZIP codes; this decreases the overhead of reading in the entire database.
Dynamic Rename
Dynamically rename fields using data from an incoming stream. Useful when applying custom parsing to text files.
Example: If you bring in a CSV file where all of the column names have underscores, this could replace all of the underscores with spaces. It will not change the data, but will change the column names.
Dynamic Replace
Replace data values in a series of fields (using a dynamically specified condition) with expressions or values from an incoming stream.
Example: Say you have a hundred different income fields, and instead of the actual value in each field you want to represent the number with a code of A, B, C, D, etc. that represents a range. The Dynamic Replace tool can easily perform this task.
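The range-to-code replacement in that example can be sketched as follows; the breakpoints, codes, and field names are illustrative, not tool defaults:

```python
# Map each numeric income value to a letter code by range, then apply
# the same mapping across many fields at once.
def to_code(value, breaks=(25000, 50000, 75000), codes="ABCD"):
    for i, b in enumerate(breaks):
        if value < b:
            return codes[i]
    return codes[len(breaks)]   # top bucket

incomes = {"hh_income_1": 18000, "hh_income_2": 62000, "hh_income_3": 91000}
coded = {field: to_code(v) for field, v in incomes.items()}
print(coded)   # {'hh_income_1': 'A', 'hh_income_2': 'C', 'hh_income_3': 'D'}
```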
Select or de-select fields by field type or an Select the first 10 columns of a file regardless of
Dynamic expression. what they contain.
Select
Field Info
Output the schema (field types and names, etc.) of a data stream.
Example: Gives users the types of data that each column contains (a description of the data).
JSON Parse
The JSON Parse tool separates JavaScript Object Notation text into a table schema for the purpose of downstream processing. It can be built back up into usable JSON format by feeding the output into the JSON Build tool.
Example: This tool is used in our Social Media tools, where we pull data from web APIs that return JSON via the Download tool. The JSON is parsed into a nicely formatted table before it is presented to the user.
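The parse step can be approximated with a small recursive flattener over parsed JSON; this is a sketch of the idea, not the tool's exact output schema:

```python
# Flatten nested JSON into (name, value) rows with dotted paths, the
# kind of table shape that downstream tools can process.
import json

def flatten(obj, prefix=""):
    rows = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            rows += flatten(v, f"{prefix}{k}.")
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            rows += flatten(v, f"{prefix}{i}.")
    else:
        rows.append((prefix.rstrip("."), obj))
    return rows

text = '{"user": {"name": "Ada", "tags": ["a", "b"]}}'
print(flatten(json.loads(text)))
```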
Message
Write log messages to the Output Window. Generally used in authoring macros.
Example: Gives you extra messaging information in the output, such as the number of columns in a certain table, the number of rows, or errors.
R
Execute an R language script and link incoming and outgoing data between Alteryx and R, an open-source tool used for statistical and predictive analysis.
Example: This free open-source programming language has thousands of algorithms available, or you can write your own algorithm in R and bring it into Alteryx.
Run Command
Run external programs as part of an Alteryx process.
Example: Call other programs from inside Alteryx; for example, curl.exe was widely used to scrape data before Alteryx had the Download tool.
Test assumptions in a data stream. Check conditions inside modules and give errors if
required- example- if you want to make sure all of
Test the records are joined before you start the process,
the error/test would stop the module.
Laboratory
Icon Tool Description Example
Blob Convert
The Blob Convert tool takes different data types and either converts them to a Binary Large Object (Blob) or takes a Blob and converts it to a different data type.
Example: Convert a PNG, GIF or JPG Blob to a Report Snippet, allowing the Reporting tools to recognize the incoming images as report snippets for building reports.
The Blob input tool will read a Binary Large Read a list of documents from disk either as blobs or
Object such as an image or media file, by encoded strings.
Blob Input browsing directly to a file or passing a list of
files to read.
The Blob Output tool writes out each record Write a list of documents to disk either as blobs or
into its own file. encoded strings.
Blob Output
JSON Build
The JSON Build tool takes the table schema of the JSON Parse tool and builds it back into properly formatted JavaScript Object Notation.
Example: After parsing JSON data, send it through the Formula tool to modify it as needed, and then use the JSON Build tool to output properly formatted JSON.
Make Columns
The Make Columns tool takes rows of data and arranges them by wrapping records into multiple columns. The user can specify how many columns to create and whether records should be laid out horizontally or vertically.
Example: This tool is useful for reporting or display purposes where you want to lay out records to fit nicely within a table, such as arranging a table of 10 records into a table of 5 records spread across 2 columns.
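The vertical wrap in that example (10 records into 5 rows by 2 columns) can be sketched as:

```python
# Wrap a single column of records into n_columns, filling each column
# top to bottom (vertical layout).
def wrap_vertical(records, n_columns):
    rows_per_col = -(-len(records) // n_columns)   # ceiling division
    return [records[r::rows_per_col] for r in range(rows_per_col)]

records = list(range(1, 11))   # 10 records
for row in wrap_vertical(records, 2):
    print(row)                 # 5 rows of 2 values each
```

A horizontal layout would instead chunk the list into consecutive runs of n_columns values per row.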
The Throttle tool slows down the speed of This is useful for slowing down requests sent per
the downstream tool by limiting the number minute when there are limits to how many records
Throttle of records that are passed through the can be sent in.
Throttle tool.
Interface
Icon Tool Description Example
The Action tool updates the configuration of
a module with values provided by interface
Action questions, when run as an app or macro.
Connect In-DB
Establish a database connection for an In-DB workflow.
Example: If you want to connect directly to Oracle or SQL Server.
Filter In-DB
Filter In-DB records with a Basic filter or with a Custom expression using the database's native language (e.g., SQL).
Example: If you are looking at a dataset of customers, you may want to only keep those customer records that have greater than 10 transactions or a certain level of sales, and want this process to run in a database.
Take In-DB Connection Name and Query Use a Dynamic Input In-DB tool when creating an
Dynamic fields from a standard data stream and input In-DB macro for predictive analysis.
Input In- them into an In-DB data stream.
DB
Output information about the In-DB workflow Use a Dynamic Output In-DB tool to output
Dynamic to a standard workflow for Predictive In-DB. information about the In-DB workflow to a standard
Output In- workflow for Predictive In-DB.
DB
Formula In-DB
Create or update fields in an In-DB data stream with an expression using the database's native language (e.g., SQL).
Example: For instance, if there is a missing or NULL value, you can use the formula tool to replace that NULL value with a zero and run this process in a database.
Data Stream Out
Stream data from an In-DB workflow to a standard workflow, with an option to sort the records.
Example: If you already have an established Alteryx workflow and you want to bring data from a database into the workflow.
Summarize In-DB data by grouping, You could determine how many customers you have
Summarize summing, counting, counting distinct fields, in the state of NY and how much they have spent in
In-DB and more. The output contains only the total or an average per transaction while all of this
result of the calculation(s). process takes place in the database.
Combine two or more In-DB data streams Transaction data stored in different files for different
with similar structures based on field names time periods, such as a sales data file for March and
Union
or positions. In the output, each column will a separate one for April, can be combined into one
In-DB
contain the data from each input. data stream for further processing inside the
database.
Use an In-DB data stream to create or If you need to update and save the results of the in-
Write update a table directly in the database. database blending into the database.
In-DB