
Tetrad Manual
Last updated: 02/09/2023

Table of Contents

Introduction
Graph Box
Compare Box
Parametric Model Box
Instantiated Model Box
Data Box
Estimator Box
Updater Box
Knowledge Box
Simulation Box
Search Box
Regression Box
Appendix

Introduction
Tetrad is a suite of software for the discovery, estimation, and simulation of causal models. Some of the functions that you can perform with Tetrad
include, but are not limited to:

Loading an existing data set, restricting potential models using your a priori causal knowledge, and searching for a model that explains it using one
of Tetrad’s causal search algorithms
Loading an existing causal graph and existing data set, and estimating a parameterized model from them
Creating a new causal graph, parameterizing a model from it, and simulating data from that model

Tetrad allows numerous types of data, graphs, and models to be input and output, and some functions may be restricted based on what types of data
or graph the user inputs. Other functions may simply not perform as well on certain types of data.

All analysis in Tetrad is performed graphically using a box paradigm, found in a sidebar to the left of the workspace. A box either houses an object such
as a graph or a dataset, or performs an operation such as a search or an estimation. Some boxes require input from other boxes in order to work.
Complex operations are performed by stringing chains of boxes together in the workspace. For instance, to simulate data, you would input a graph box
into a parametric model box, the PM box into an instantiated model box, and finally the IM box into a simulation box.

In order to use a box, click on it in the sidebar, then click inside the workspace. This creates an empty box, which you can instantiate by double-
clicking. Most boxes have multiple options available on instantiation, which will be explained in further detail in this manual.

In order to use one box as input to another, draw an arrow between them by clicking on the arrow tool in the sidebar, and clicking and dragging from the
first box to the second in the workspace.

Tetrad may be cited using the following reference: Ramsey, J. D., Zhang, K., Glymour, M., Romero, R. S., Huang, B., Ebert-Uphoff, I., ... & Glymour, C.
(2018). TETRAD—A toolbox for causal discovery. In 8th International Workshop on Climate Informatics.

Graph Box
The graph box can be used to create a new graph, or to copy or edit a graph from another box.

Possible Parent Boxes of the Graph Box


Another graph box

A parametric model box
An instantiated model box
An estimator box
A data box
A simulation box
A search box
An updater box
A regression box

Possible Child Boxes of the Graph Box


Another graph box
A compare box
A parametric model box
A data box
A simulation box
A search box
A knowledge box

Creating a New Graph


When you first open a graph box with no parent, you will be presented with several options for which kind of graph you would like to create: a general
graph, a directed acyclic graph (DAG), a structural equation model (SEM) graph, or a time lag graph. Once you have selected the type of graph you
want to create, an empty graph box will open.

You can add variables to your graph by clicking on the variable button on the left, then clicking inside the graph area. Add edges by clicking on an edge
type, then clicking and dragging from one variable to another. Variables may be measured (represented by rectangular icons) or latent (represented by
elliptical icons). Edges may be directed, undirected, bidirected, or uncertain (represented by circles at the ends of an edge). Depending on the type of
graph you choose to create, your choice of edges may be limited.

DAGs allow only directed edges. If an edge would create a cycle, it will not be accepted. A graph box containing a DAG can be used as input for any
parametric model box, and is the only kind of graph box that can be used as input for a Bayes parametric model.

SEM graphs allow only directed and bidirected edges. A graph box containing a SEM graph can be used as input to a SEM parametric model or
generalized SEM parametric model, where a bidirected edge between two variables X and Y will be interpreted as X and Y having correlated error
terms.

Time lag graphs allow only directed edges. New variables that you add will be initialized with a single lag. (The number of lags in the graph may be
changed under “Edit—Configuration…”) Edges from later lags to earlier lags will not be accepted. Edges added within one lag will automatically be
replicated in later lags.

The general graph option allows all edge types and configurations.

Creating a Random Graph


Instead of manually creating a new graph, you can randomly create one. To do so, open up a new empty graph box and click on “Graph—Random
Graph.” This will open up a dialog box from which you can choose the type of random graph you would like to create by clicking through the tabs at the
top of the window. Tetrad will randomly generate a DAG, a multiple indicator model (MIM) graph, or a scale-free graph. Each type of graph is associated
with a number of parameters (including but not limited to the number of nodes and the maximum degree) which you can set.

Once a graph has been randomly generated, you can directly edit it within the same graph box by adding or removing any variables or edges that that
type of graph box allows. So, for instance, although you cannot randomly generate a graph with bidirected edges, you can manually add bidirected
edges to a randomly generated DAG in a SEM graph box.

Random graph generation is not available for time lag graphs.

Loading a Saved Graph


If you have previously saved a graph from Tetrad, you can load it into a new graph box by clicking “File—Load…,” and then clicking on the file type of
the saved graph. Tetrad can load graphs from XML, from text, and from JSON files.

To save a graph to file, click “File—Save…,” then click on the file type you would like to save your graph as. Tetrad can save graphs to XML, text, JSON,
R and dot files. (If you save your graph to R or dot, you will not be able to load that file back into Tetrad.)

You can also save an image of your graph by clicking “File—Save Graph Image…” Tetrad cannot load graphs from saved image files.


Copying a Graph
To copy a graph from any box which contains one, create a new graph box in the workspace, and draw an arrow from the box whose graph you want to copy to the new graph box. When opened, the new graph box will automatically contain a direct copy of the graph its parent box contains.

Manipulating a Graph
If you create a graph box as a child of another box, you can also choose to perform a graph manipulation on the parent graph. Your graph box will then
contain the manipulated version of the parent graph.

The available graph manipulations are:

Display Subgraphs
This option allows you to isolate a subgraph from the parent graph. Add variables to the subgraph by highlighting the variable name in the “Unselected”
pane and clicking on the right arrow. The highlighted variable will then show up in the “Selected” pane. (You may also define which variables go in the
“Selected” pane by clicking on the “Text Input…” button and typing the variable names directly into the window.) Choose the type of subgraph you want
to display from the drop-down panel below. Then click “Graph It!” and the resulting subgraph of the selected variables will appear in the pane on the
right. (Some types of subgraph, such as “Markov Blanket,” will include unselected variables if they are part of the subgraph as defined on the selected
variables. So, for instance, an unselected variable that is in the Markov blanket of a selected variable will appear in the Markov Blanket subgraph.
Edges between unselected variables will not be shown.) For large or very dense graphs, it may take a long time to isolate and display subgraphs.

The types of subgraphs that can be displayed are:

Subgraph (displays the selected nodes and all edges between them)
Adjacents (displays the selected nodes and all edges between them, as well as nodes adjacent to the selected nodes)
Adjacents of adjacents (displays the selected nodes and all edges between them, as well as nodes adjacent to the selected nodes and nodes
adjacent to adjacencies of the selected nodes)
Adjacents of adjacents of adjacents (displays the selected nodes and all edges between them, as well as nodes adjacent to the selected nodes,
nodes adjacent to adjacencies of the selected nodes, and nodes adjacent to adjacencies of adjacencies of the selected nodes)
Markov Blankets (displays the selected nodes and all edges between them, as well as the Markov blankets of each selected node)
Treks (displays the selected nodes, with an edge between each pair if and only if a trek exists between them in the full graph)
Trek Edges (displays the selected nodes, and any treks between them, including nodes not in the selected set if they are part of a trek)
Paths (displays the selected nodes, with an edge between each pair if and only if a path exists between them in the full graph)
Path Edges (displays the selected nodes, and any paths between them, including nodes not in the selected set if they are part of a path)
Directed Paths (displays the selected nodes, with a directed edge between each pair if and only if a directed path exists between them in the full
graph)
Directed Path Edges (displays the selected nodes, and any directed paths between them, including nodes not in the selected set if they are part of
a path)
Y Structures (displays any Y structures involving at least two of the selected nodes)
Pag_Y Structures (displays any Y PAGs involving at least two of the selected nodes)
Indegree (displays the selected nodes and their parents)
Outdegree (displays the selected nodes and their children)
Degree (displays the selected nodes and their parents and children)

Choose DAG in CPDAG


If given a CPDAG as input, this chooses a random DAG from the Markov equivalence class of the CPDAG to display. The resulting DAG functions as a
normal graph box.

Choose MAG in PAG


If given a partial ancestral graph (PAG) as input, this chooses a random maximal ancestral graph (MAG) from the equivalence class of the PAG to display.
The resulting MAG functions as a normal graph box.

Show DAGs in CPDAG


If given a CPDAG as input, this displays all DAGs in the CPDAG’s Markov equivalence class. Each DAG is displayed in its own tab. Most graph box
functionality is not available in this type of graph box, but the DAG currently on display can be copied by clicking “Copy Selected Graph.”

Generate CPDAG from DAG


If given a DAG as input, this displays the CPDAG of the Markov equivalence class to which the parent graph belongs. The resulting CPDAG functions
as a normal graph box.

Generate PAG from DAG


Converts an input DAG into its partial ancestral graph (PAG). The resulting PAG functions as a normal graph box.

Generate PAG from tsDAG
Converts an input time series DAG into its partial ancestral graph (PAG). The resulting PAG functions as a normal graph box.

Make Bidirected Edges Undirected


Replaces all bidirected edges in the input graph with undirected edges.

Make Undirected Edges Bidirected


Replaces all undirected edges in the input graph with bidirected edges.

Make All Edges Undirected


Replaces all edges in the input graph with undirected edges.

Generate Complete Graph


Creates a completely connected, undirected graph from the variables in the input graph.

Extract Structure Model


Isolates the subgraph of the input graph involving all and only latent variables.

Other Graph Box Functions


Edges and Edge Type Frequencies
At the bottom of the graph box, the Edges and Edge Type Frequencies section provides an accounting of every edge in the graph, and how certain
Tetrad is of its type. The first three columns contain a list, in text form, of all of the edges in the graph. The columns to the right are all blank in manually
constructed graphs, user-loaded graphs, and graphs output by searches with default settings. They are only filled in for graphs that are output by
searches performed with bootstrapping. In those cases, the fourth column will contain the percentage of bootstrap outputs in which the edge type
between these two variables matches the edge type in the final graph. All of the columns to the right contain the percentages of the bootstrap outputs
that output each possible edge type.

For more information on bootstrap searches, see the Search Box section of the manual.

Layout
You can change the layout of your graph by clicking on the “Layout” tab and choosing between several common layouts. You can also rearrange the
layout of one graph box to match the layout of another graph box (so long as the two graphs have identical variables) by clicking “Layout—Copy
Layout” and “Layout—Paste Layout.” You do not need to highlight the graph in order to copy the layout.

Graph Properties
Clicking on “Graph—Graph Properties” will give you a text box containing the following properties of your graph:

Number of nodes
Number of latent nodes
Number of adjacencies
Number of directed edges (not in 2-cycles)
Number of bidirected edges
Number of undirected edges
Max degree
Max indegree
Max outdegree
Average degree
Density
Number of latents
Cyclic/Acyclic

Paths
Clicking on “Graph—Paths” opens a dialog box that allows you to see all the paths between any two variables. You can specify whether you want to see
only adjacencies, only directed paths, only semidirected paths, or all treks between the two variables of interest, and the maximum length of the paths
you are interested in using the drop-down menus at the top of the pane. To apply those settings, click “update.”

Correlation
You can automatically correlate or uncorrelate exogenous variables under the Graph tab.

Highlighting
You can highlight bidirected edges, undirected edges, and latent nodes under the Graph tab.


Compare Box
The compare box compares two or more graphs.

Possible Parent Boxes of the Compare box:


A graph box
An instantiated model box
An estimator box
A simulation box
A search box
A regression box

Possible Child Boxes of the Compare box:


None

Edgewise Comparisons
An edgewise comparison compares two graphs, and gives a textual list of the edges which must be added to or taken away from one to make it
identical to the other.

Take, for example, the following two graphs. The first is the reference graph, the second is the graph to be compared to it. When the Edgewise
Comparison box is opened, a comparison like this appears:

You may choose (by a menu in the upper left part of the box) whether the graph being compared is the original DAG, the CPDAG of the original DAG, or the PAG of the original DAG.

When the listed changes have been made to the second graph, it will be identical to the first graph.

Stats List Graph Comparisons


A stats list graph comparison tallies up and presents statistics for the differences and similarities between a reference graph and the graph compared against it. Consider the example used in the above section; once again, we’ll let graph one be the reference graph. Just as above, when the graphs are input to the stats list graph comparison box, we must specify which of the graphs is the reference graph, and whether it contains latent variables. When the comparison is complete, the following window results:


You may choose (by a menu in the upper left part of the box) whether the graph being compared is the original DAG, the CPDAG of the original DAG, or the PAG of the original DAG.

The first column gives an abbreviation for the statistic; the second column gives a definition of the statistic; the third column gives the statistic’s value.

Misclassifications
A misclassification procedure organizes a graph comparison by edge type. The edge types (undirected, directed, uncertain, partially uncertain,
bidirected, and null) are listed as the rows and columns of a matrix, with the true graph edges as the row headers and the target graph edges as the
column headers. If, for example, there are three pairs of variables that are connected by undirected edges in the reference graph, but are connected by
directed edges in the estimated graph, then there will be a 3 in the (undirected, directed) cell of the matrix. An analogous method is used to represent
endpoint errors. For example:


Graph Intersections
A graph intersection compares two or more graphs in the same comparison. It does so by ranking adjacencies (edges without regard to direction) and
orientations based on how many of the graphs they appear in. In an n-graph comparison, it first lists any adjacencies found in all n graphs. Then it lists
all adjacencies found in n – 1 graphs, then adjacencies found in n – 2 graphs, and so on.

After it has listed all adjacencies, it lists any orientations that are not contradicted among the graphs, again in descending order of how many graphs the
orientation appears in. An uncontradicted orientation is one on which all graphs either agree or have no opinion. So if the edge X → Y appears in all n
graphs, it will be listed first. If the edge X → Z appears in n – 1 graphs, it will be listed next, but only if the nth graph doesn’t contradict it—that is, only if
the edge Z → X does not appear in the final graph. If the undirected edge Z – X appears in the final graph, the orientation X → Z is still considered to be
uncontradicted.

Finally, any contradicted orientations (orientations that the graphs disagree on) are listed.

Independence Facts Comparison


Rather than comparing edges or orientations, this option directly compares the independence facts implied by two graphs. When you initially open the box, you
will see the following window:


The drop-down menu allows you to choose which variables you want to check the dependence of. If you select more than two variables, any
subsequent variables will be considered members of the conditioning set. So, if you select variables X1, X2, and X3, in that order, the box will determine
whether X1 is independent of X2, conditional on X3, in each of the graphs being compared. When you click “List,” in the bottom right of the window, the
results will be displayed in the center of the window:

Edge Weight Similarity Comparisons


Edge weight (linear coefficient) similarity comparisons compare two linear SEM instantiated models. The output is a score equal to the sum of the
squares of the differences between each corresponding edge weight in each model. Therefore, the lower the score, the more similar the two graphs
are. The score has peculiarities: it does not take account of the variances of the variables, and may therefore best be used with standardized models;
the complete absence of an edge is scored as 0—so a negative coefficient compares less well with a positive coefficient than does no edge at all.

Consider, for example, an edge weight similarity comparison between the following two SEM IMs:

When they are input into an edge weight similarity comparison, the following window results:

This is, unsurprisingly, a high score; the input models have few adjacencies in common, let alone similar parameters.
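
The score itself is simple to compute. The following sketch (an illustrative Python example, not Tetrad code; the edge coefficients are made up) sums the squared differences between corresponding edge weights, treating an edge that is absent from one model as having weight 0:

# Illustrative sketch of the edge weight similarity score (not Tetrad code).
# Each model is represented as a map from an ordered variable pair to its
# linear coefficient; an edge absent from a model is treated as weight 0.
model_a = {("X1", "X2"): 0.8, ("X2", "X3"): -0.5}
model_b = {("X1", "X2"): 0.6, ("X3", "X4"): 1.2}

def edge_weight_similarity(m1, m2):
    score = 0.0
    for edge in set(m1) | set(m2):
        diff = m1.get(edge, 0.0) - m2.get(edge, 0.0)
        score += diff ** 2
    return score

print(edge_weight_similarity(model_a, model_b))  # lower scores mean more similar models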

Model Fit
A model fit comparison takes a simulation box and a search box (ideally, a search that has been run on the simulated data in the simulation box), and
provides goodness-of-fit statistics, including a Student’s t statistic and p value for each edge, for the output graph and the data, as well as estimating the
values of any parameters. It looks and functions identically to the estimator box, but unlike the estimator box, it takes the search box directly as a
parent, without needing to isolate and parameterize the graph output by the search.

Parametric Model Box


The parametric model box takes a nonparameterized input graph and creates a causal model.

Possible Parent Boxes of the Parametric Model Box:


A graph box
Another parametric model box
An instantiated model box
An estimator box
A data box
A simulation box
A search box
A regression box

Possible Child Boxes of the Parametric Model Box:


A graph box
Another parametric model box
An instantiated model box
An estimator box
A data box
A simulation box
A search box
A knowledge box

Bayes Parametric Models


A Bayes parametric model takes as input a DAG. Bayes PMs represent causal structures in which all of the variables are categorical.

Bayes PMs consist of three components: the graphical representation of the causal structure of the model; for each named variable, the number of
categories which that variable can assume; and the names of the categories associated with each variable.

You may either manually assign categories to the variables or have Tetrad assign them at random. If you choose to manually create a Bayes PM, each
variable will initially be assigned two categories, named numerically. If you choose to have Tetrad assign the categories, you can specify a minimum and
maximum number of categories possible for any given variable. You can then manually edit the number of categories and category names.

Take, for example, the following DAG:

One possible random Bayes PM that Tetrad might generate from the above DAG, using the default settings, looks like this:


To view the number and names of the categories associated with each variable, you can click on that variable in the graph, or choose it from the drop-
down menu on the right. In this graph, X1 and X2 each have three categories, and the rest of the variables have four categories. The categories are
named numerically by default.

The number of categories associated with a particular variable can be changed by clicking up or down in the drop-down menu on the right. Names of
categories can be changed by overwriting the text already present.

Additionally, several commonly-used preset variable names are provided under the “Presets” tab on the right. If you choose one of these configurations,
the number of categories associated with the current variable will automatically be changed to agree with the configuration you have chosen. If you
want all of the categories associated with a variable to have the same name with a number appended (e.g., x1, x2, x3), choose the “x1, x2, x3…” option
under Presets.

You can also copy category names between variables in the same Bayes PM by clicking on “Transfer—Copy categories” and “Transfer—Paste
categories.”

SEM Parametric Models


The parametric model of a structural equation model (SEM) will take any type of graph as input, as long as the graph contains only directed and
bidirected edges. SEM PMs represent causal structures in which all variables are continuous.

A SEM PM has two components: the graphical causal structure of the model, and a list of parameters used in a set of linear equations that define the
causal relationships in the model. Each variable in a SEM PM is a linear function of a subset of the other variables and of an error term drawn from a
Normal distribution.

Here is an example of a SEM graph and the SEM PM that Tetrad creates from it:


You can see the error terms in the model by clicking “Parameters—Show Error Terms.” In a SEM model, a bidirected edge indicates that error terms are
correlated, so when error terms are visible, the edge between X1 and X2 will instead run between their error terms.

To change a parameter’s name or starting value for estimation, double click on the parameter in the window.

Generalized SEM Parametric Models


A generalized SEM parametric model takes as input any type of graph, as long as the graph contains only directed edges. (The generalized SEM PM
cannot currently interpret bidirected edges.) Like a SEM PM, it represents causal structures in which all variables are continuous. Also like a SEM PM, a
generalized SEM PM contains two components: the graphical causal structure of the model, and a set of equations representing the causal structure of
the model. Each variable in a generalized SEM PM is a function of a subset of the other variables and an error term. By default, the functions are linear
and the error terms are drawn from a Normal distribution (as in a SEM PM), but the purpose of a generalized SEM PM is to allow editing of these
features.

Here is an example of a general graph and the default generalized SEM PM Tetrad creates using it:


You can view the error terms by clicking “Tools: Show Error Terms.”

The Variables tab contains a list of the variables and the expressions that define them, and a list of the error terms and the distributions from which their
values will be drawn. Values will be drawn independently for each case if the model is instantiated (see IM box) and used to simulate data (see data
box).

The Parameters tab contains a list of the parameters and the distributions from which they are drawn. When the model is instantiated in the IM box, a
fixed value of each parameter will be selected according to the specified distribution.

To edit an expression or parameter, double click on it (in any tab). This will open up a window allowing you to change the function that defines the
variable or distribution of the parameter.

For instance, if you double click on the expression next to X1 (b1*X5+E_X1), the following window opens:

The drop-down menu at the top of the window lists valid operators and functions. You could, for example, change the expression from linear to
quadratic by replacing b1*X5+E_X1 with b1*X5^2+E_X1. You can also form more complicated expressions, using, for instance, exponential or sine
functions. If the expression you type is well-formed, it will appear in black text; if it is invalid, it will appear in red text. Tetrad will not accept any invalid
changes.

Parameters are edited in the same way as expressions.

If you want several expressions or parameters to follow the same non-linear model, you may wish to use the Apply Templates tool. This allows you to
edit the expressions or parameters associated with several variables at the same time. To use the Apply Templates tool, click “Tools: Apply
Templates….” This will open the following window:

You can choose to edit variables, error terms, or parameters by clicking through the “apply to” radio buttons. If you type a letter or expression into the
“starts with” box, the template you create will apply only to variables, error terms, or parameters which begin with that letter or expression. For example,
in the given generalized PM, there are two types of parameters: the standard deviations s1-s6 and the edge weights b1-b7. If you click on the
“Parameters” radio button and type “b” into the “Starts with” box, only parameters b1-b7 will be affected by the changes you make.

The “Type Template” box itself works in the same way that the “Type Expression” box works in the “Edit Expression” window, with a few additions. If you
scroll through the drop-down menu at the top of the window, you will see the options NEW, TSUM, and TPROD. Adding NEW to a template creates a
new parameter for every variable the template is applied to. TSUM means “sum of the values of this variable’s parents,” and TPROD means “product of
the values of this variable’s parents.” The contents of the parentheses following TSUM and TPROD indicate any operations which should be performed
upon each variable in the sum or product, with the dollar sign ($) functioning as a wild card. For example, in the image above, TSUM(NEW(b)*$) means
that, for each parent variable of the variable in question, a new “b” will be created and multiplied by the parent variable’s value, and then all of the
products will be added together.

Instantiated Model Box


The instantiated model (IM) box takes a parametric model and assigns values to the parameters.

Possible Parent Boxes of the Instantiated Model Box:


A parametric model box
Another instantiated model box
An estimator box
A simulation box
An updater box

Possible Child Boxes of the Instantiated Model Box:


A graph box
A compare box
A parametric model box
Another instantiated model box

An estimator box
A simulation box
A search box
An updater box
A classify box
A knowledge box

Bayes Instantiated Models


A Bayes IM consists of a Bayes parametric model with defined probability values for all variables. This means that, conditional on the values of each of
its parent variables, there is a defined probability that a variable will take on each of its possible values. For each assignment of a value to each of the
parents of a variable X, the probabilities of the several values of X must sum to 1.
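
As an illustration of this constraint, the following sketch (plain Python, not Tetrad code; the parent assignments and probabilities are made up) represents a conditional probability table as a map from parent-value assignments to probability rows and checks that each row sums to one:

# Illustrative sketch (not Tetrad code): a conditional probability table for a
# three-category variable, indexed by an assignment of values to its two
# hypothetical parents. Every row must sum to 1.
cpt = {
    (0, 0): [0.0346, 0.4425, 0.5229],
    (0, 1): [0.2500, 0.2500, 0.5000],
    (1, 0): [0.1000, 0.6000, 0.3000],
    (1, 1): [0.3333, 0.3333, 0.3334],
}

for parent_values, row in cpt.items():
    assert abs(sum(row) - 1.0) < 1e-9, f"row {parent_values} does not sum to 1"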

You can manually set the probability values for each variable, or have Tetrad assign them randomly. If you choose to have Tetrad assign probability
values, you can manually edit them later.

Here is an example of a Bayes PM and its randomly created instantiated model:

In the model above, when the values of X5’s parents are both 0, the probability that X5 is 0 is 0.0346, that X5 is 1 is 0.4425, and that X5 is 2 is 0.5229. Since X5 must
be 0, 1, or 2, those three values must add up to one, as must the values in every row.

To view the probability values of a variable, either double click on the variable in the graph or choose it from the drop-down menu on the right. You can
manually set a given probability value by overwriting the text box. Be warned that changing the value in one cell will delete the values in all of the other
cells in the row. Since the values in any row must sum to one, if all of the cells in a row but one are set, Tetrad will automatically change the value in the
last cell to make the sum correct. For instance, in the above model, if you change the first row such that the probability that X5 = 0 is 0.5000 and the
probability that X5 = 1 is 0.4000, the probability that X5 = 2 will automatically be set to 0.1000.

If you right click on a cell in the table (or two-finger click on Macs), you can choose to randomize the probabilities in the row containing that cell,
randomize the values in all incomplete rows in the table, randomize the entire table, or randomize the table of every variable in the model. You can also
choose to clear the row or table.

Dirichlet Instantiated Models


A Dirichlet instantiated model is a specialized form of a Bayes instantiated model. Like a Bayes IM, a Dirichlet IM consists of a Bayes parametric model
with defined probability values. Unlike a Bayes IM, these probability values are not manually set or assigned randomly. Instead, the pseudocount is
manually set or assigned uniformly, and the probability values are derived from it. The pseudocount of a given value of a variable is the number of data
points for which the variable takes on that value, conditional on the values of the variable’s parents, where these numbers are permitted to take on non-
negative real values. Since we are creating models without data, we can set the pseudocount to be any number we want. If you choose to create a
Dirichlet IM, a window will open allowing you to either manually set the pseudocounts, or have Tetrad set all the pseudocounts in the model to one
number, which you specify.

Here is an example of a Bayes PM and the Dirichlet IM which Tetrad creates from it when all pseudocounts are set to one:

In the above model, when X2=0 and X6=0, there is one (pseudo) data point at which X4=0, one at which X4=1, and one at which X4=2. There are three
total (pseudo) data points in which X2=0 and X6=0. You can view the pseudocounts of any variable by clicking on it in the graph or choosing it from the
drop-down menu at the top of the window. To edit the value of a pseudocount, double click on it and overwrite it. The total count of a row cannot be
directly edited.

From the pseudocounts, Tetrad determines the conditional probability of a category by dividing the pseudocount of that category by the total count for its
row. For instance, the total count of X4 when X2=0 and X6=0 is 3, so the conditional probability that X4=0 given that X2=0 and X6=0 is 1/3: in a third of
the (pseudo) data points in which X2 and X6 are both 0, X4 is also 0. This also guarantees that the conditional probabilities for any configuration of parent
variables add up to one, which is necessary.
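
The following sketch (illustrative Python, not Tetrad code) shows the arithmetic for a single row of pseudocounts:

# Illustrative sketch (not Tetrad code): one row of Dirichlet pseudocounts for
# X4 = 0, 1, 2 given X2 = 0 and X6 = 0, converted to conditional probabilities.
pseudocounts = [1.0, 1.0, 1.0]
row_total = sum(pseudocounts)                          # 3.0
probabilities = [c / row_total for c in pseudocounts]
print(probabilities)                                   # [0.333..., 0.333..., 0.333...]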

To view the table of conditional probabilities for a variable, click the Probabilities tab. In the above model, the Probabilities tab looks like this:

SEM Instantiated Models


A SEM instantiated model is a SEM parametric model in which the parameters and error terms have defined values. It assumes that relationships
between variables are linear, and that error terms have Gaussian distributions. If you choose to create a SEM IM, the following window will open:

Using this box, you can specify the ranges of values from which you want coefficients, covariances, and variances to be drawn for the parameters in the
model. In the above box, for example, all linear coefficients will be between -1.0 and 1.0. If you uncheck “symmetric about zero,” they will only be
between 0.0 and 1.0.

Here is an example of a SEM PM and a SEM IM generated from it using the default settings:


You can now manually edit the values of parameters in one of two ways. Double clicking on the parameter in the graph will open up a small text box for
you to overwrite. Or you can click on the Tabular Editor tab, which will show all of the parameters in a table which you can edit. The Tabular Editor tab of
our SEM IM looks like this:

In the Tabular Editor tab of a SEM estimator box (which functions similarly to the SEM IM box), the SE, T, and P columns provide statistics showing how
robust the estimation of each parameter is. Our SEM IM, however, is in an instantiated model box, so these columns are empty.

The Implied Matrices tab shows matrices of relationships between variables in the model. In the Implied Matrices tab, you can view the covariance or
correlation matrix for all variables (including latents) or just measured variables. In our SEM IM, the Implied Matrices tab looks like this:


You can choose the matrix you wish to view from the drop-down menu at the top of the window. Only half of each matrix is shown, because covariance
and correlation matrices are symmetric. The cells in the Implied Matrices tab cannot be edited.

In an estimator box, the Model Statistics tab provides goodness of fit statistics for the SEM IM which has been estimated. Our SEM IM, however, is in
an instantiated model box, so no estimation has occurred, and the Model Statistics tab is empty.

Standardized SEM Instantiated Models


A standardized SEM instantiated model consists of a SEM parametric model with defined values for its parameters. In a standardized SEM IM, each
variable (but not each error term) has a Normal distribution with zero mean and unit variance. The input PM to a standardized SEM IM must be acyclic.

Here is an example of an acyclic SEM PM and the standardized SEM IM which Tetrad creates from it


To edit a parameter, double click on it. A slider will open at the bottom of the window (shown above for the edge parameter between X1 and X2). Click
and drag the slider to change the value of the parameter, or enter the specific value you wish into the box. The value must stay within a certain range in
order for the variables in the model to remain standard Normal (N(0, 1)), so if you attempt to overwrite the text box on the bottom right with a value
outside the listed range, Tetrad will not allow it. That is, given that the variables are all distributed as N(0, 1), there is a limited range in which each
parameter may be adjusted; these ranges vary parameter by parameter, given the values of the other parameters. In a standardized SEM IM, error
terms are not considered parameters and cannot be edited, but you can view them by clicking Parameters: Show Error Terms.

It is possible to make a SEM IM with a time lag graph, even with latent variables. This does not work for other types of models, such as Bayes IMs, or
for mixed data (for which no IM is currently available, though mixed data can be simulated in the Simulation box with an appropriate choice of simulation
model). Standardization for time lag models is not currently available.

The Implied Matrices tab works in the same way that it does in a normal SEM IM.

Generalized SEM Instantiated Models


A generalized SEM instantiated model consists of a generalized SEM parametric model with defined values for its parameters. Since the distributions of
the parameters were specified in the SEM PM, Tetrad does not give you the option of specifying these before it creates the instantiated model.

Here is an example of a generalized SEM PM and its generalized SEM IM:


Note that the expressions for X6 and X2 are not shown, having been replaced with the words “long formula.” Formulae over a certain length—the
default setting is 25 characters—are hidden to improve visibility. Long formulae can be viewed in the Variables tab, which lists all variables and their
formulae. You can change the cutoff point for long formulae by clicking Tools: Formula Cutoff.

If you double click on a formula in either the graph or the Variables tab, you can change the value of the parameters in that formula.

Data Box
The data box stores or manipulates data sets.

Possible Parent Boxes of the Data Box


A graph box
An estimator box
Another data box
A simulation box
A regression box

Possible Child Boxes of the Data Box


A graph box
A parametric model box
Another data box
An estimator box
A simulation box
A search box
A classify box
A regression box
A knowledge box

Using the Data Box:


The data box stores the actual data sets from which causal structures are determined. Data can be loaded into the data box from a preexisting source,
manually filled in Tetrad, or simulated from an instantiated model.


Loading Data
Data sets loaded into Tetrad may be categorical, continuous, mixed, or covariance data.

General Tabular Data


To load data, create a data box with no parent. When you double click it, an empty data window will appear:

Click "File -> Load Data" and select the text file or files that contain your data. The following window will appear:

The text of the source file appears in the Data Preview window. Above, there are options to describe your file, so that Tetrad can load it correctly. If you
are loading categorical, continuous, or mixed data values, select the “Tabular Data” button. If you are loading a covariance matrix, select “Covariance
Data.” Note that if you are loading a covariance matrix, your text file should contain only the lower half of the matrix, as Tetrad will not accept an entire
matrix.

Below the file type, you can specify a number of other details about your file, including information about the type of data
(categorical/continuous/mixed), metadata JSON file, delimiter between data values, variable names, and more. If your data is mixed (some variables
categorical, and some continuous), you must specify the maximum number of categories discrete variables in your data can take on. All columns with
more than that number of values will be treated as continuous; the others will be treated as categorical. If you do not list the variable names in the file,
you should uncheck “First row variable names.” If you provide case IDs, check the box for the appropriate column in the “Case ID column to ignore”
area. If the case ID column is labeled, provide the name of the label; otherwise, the case ID column should be the first column, and you should check
“First column.”

Below this, you can specify your comment markers, quote characters, and the character which marks missing data values. Tetrad will use that
information to distinguish continuous from discrete variables. You may also choose more files to load (or remove files that you do not wish to load) in the
“Files” panel on the lower left.

Metadata JSON File


Metadata is optional in general data handling, but it can be very helpful if you want to override the data type of a given variable column. The metadata
must be a JSON file like the following example.

{
  "domains": [
    {
      "name": "raf",
      "discrete": false
    },
    {
      "name": "mek",
      "discrete": true
    }
  ]
}

You can specify the name and data type for each variable. Variables that are not in the metadata file will be treated as domain variables, and their data
type will be the default type determined when the columns are read in, as described previously.

When you are satisfied with your description of your data, click “Validate” at the bottom of the window. Tetrad will check that your file is correctly
formatted. If it is, you will receive a screen telling you that validation has passed with no error. At this point, you can revisit the settings page, or click
“Load” to load the data.

You can now save this data set to a text file by clicking File: Save Data.

In addition to loading data from a file, you can manually enter data values and variable names by overwriting cells in the data table.

Covariance Data
Covariance matrices loaded into Tetrad should be ASCII text files. The first row contains the sample size; the second row contains the names of the
variables. The first two rows are followed by a lower triangular matrix. For example:

1000
X1 X2 X3 X4 X5 X6
1.0000
0.0312 1.0000
-0.5746 0.4168 1.0000
-0.5996 0.4261 0.9544 1.0000
0.8691 0.0414 -0.4372 -0.4487 1.0000
0.6188 0.0427 -0.1023 -0.0913 0.7172 1.0000
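
If you need to work with such a file outside Tetrad, the following sketch (an illustrative Python helper, not part of Tetrad; the file name is hypothetical) shows one way to rebuild the full symmetric matrix from a file in this format:

# Illustrative sketch (not part of Tetrad): read a covariance file in the
# format above (sample size, variable names, lower triangular matrix) and
# rebuild the full symmetric covariance matrix.
import numpy as np

def read_covariance(path):
    with open(path) as f:
        lines = [line.split() for line in f if line.strip()]
    sample_size = int(lines[0][0])
    names = lines[1]
    p = len(names)
    cov = np.zeros((p, p))
    for i, row in enumerate(lines[2:2 + p]):
        for j, value in enumerate(row):
            cov[i, j] = cov[j, i] = float(value)
    return sample_size, names, cov

# sample_size, names, cov = read_covariance("covariance.txt")  # hypothetical file name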

Categorical, continuous, or mixed data should also be ASCII text files, with columns representing variables and rows representing cases. Beyond that,
there is a great deal of flexibility in the layout: delimiters may be commas, colons, tabs, spaces, semicolons, pipe symbols, or whitespace; comments
and missing data may be marked by any symbol you like; there may be a row of variable names or not; and case IDs may be present or not. There
should be no sample size row. For example:

X1 X2 X3 X4 X5
-3.0133 1.0361 0.2329 2.7829 -0.2878
0.5542 0.3661 0.2480 1.6881 0.0775
3.5579 -0.7431 -0.5960 -2.5502 1.5641
-0.0858 1.0400 -0.8255 0.3021 0.2654
-0.9666 -0.5873 -0.6350 -0.1248 1.1684
-1.7821 1.8063 -0.9814 1.8505 -0.7537
-0.8162 -0.6715 0.3339 2.6631 0.9014
-0.3150 -0.5103 -2.2830 -1.2462 -1.2765
-4.1204 2.9980 -0.3609 4.8079 0.6005
1.4658 -1.4069 1.7234 -1.7129 -3.8298

Handling Tabular Data with Interventional Variables


This is an advanced topic for datasets that contain interventional (i.e., experimental) variables. We model a single intervention using two variables: a
status variable and a value variable. Below is a sample dataset, in which `raf`, `mek`, `pip2`, `erk`, and `akt` are the 5 domain variables, and `cd3_s` and
`cd3_v` are an interventional pair (status and value variable, respectively). `icam` is another interventional variable, but it's a combined variable that
doesn't have a status variable.

raf     mek     pip2    erk     akt     cd3_s   cd3_v   icam
3.5946  3.1442  3.3429  2.81    3.2958  0       1.2223  *
3.8265  3.2771  3.2884  3.3534  3.7495  0       2.3344  *
4.2399  3.9908  3.0057  3.2149  3.7495  1       0       3.4423
4.4188  4.5304  3.157   2.7619  3.0819  1       3.4533  1.0067
3.7773  3.3945  2.9821  3.4372  4.0271  0       4.0976  *

And the sample metadata JSON file looks like this:

{
  "interventions": [
    {
      "status": {
        "name": "cd3_s",
        "discrete": true
      },
      "value": {
        "name": "cd3_v",
        "discrete": false
      }
    },
    {
      "status": null,
      "value": {
        "name": "icam",
        "discrete": false
      }
    }
  ],
  "domains": [
    {
      "name": "raf",
      "discrete": false
    },
    {
      "name": "mek",
      "discrete": false
    }
  ]
}

Each intervention consists of a status variable and a value variable. In some cases you may have a combined interventional variable that doesn't have a
status variable; in that case, just use `null` for the status. The data type of each variable can be either discrete or continuous; a boolean flag indicates
the data type. In the above example, only two domain variables are specified in the metadata JSON; any variables not specified in the metadata will be
treated as domain variables.

Manipulating Data
The data box can also be used to manipulate data sets that have already been loaded or simulated. If you create a data box as the child of another box
containing a data set, you will be presented with a list of operations that can be performed on the data. The available data manipulations are:

Discretize Dataset
This operation allows you to make some or all variables in a data set discrete. If you choose it, a window will open.

When the window first opens, no variables are selected, and the right side of the window appears blank; in this case, we have already selected X1
ourselves. In order to discretize a variable, Tetrad assigns all data points within a certain range to a category. You can tell Tetrad to break the range of
the dataset into approximately even sections (Evenly Distributed Intervals) or to break the data points themselves into approximately even chunks
(Evenly Distributed Values). Use the scrolling menu to increase or decrease the number of categories to create. You can also rename categories by
overwriting the text boxes on the left, or change the ranges of the categories by overwriting the text boxes on the right. To discretize another variable,
simply select it from the left. If you want your new data set to include the variables you did not discretize, check the box at the bottom of the window.

You may discretize multiple variables at once by selecting multiple variables. In this case, the ranges are not shown, as they will be different from
variable to variable.
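
The two binning schemes correspond to the usual distinction between equal-width and equal-frequency discretization. The following sketch (an illustrative Python example using pandas, not Tetrad's implementation) shows the difference on a made-up column:

# Illustrative sketch (not Tetrad's implementation) of the two binning schemes.
import numpy as np
import pandas as pd

x = pd.Series(np.random.default_rng(0).normal(size=1000), name="X1")

# "Evenly Distributed Intervals": cut the range of X1 into bins of equal width.
equal_width = pd.cut(x, bins=3, labels=["low", "medium", "high"])

# "Evenly Distributed Values": cut so each bin holds roughly the same number of cases.
equal_frequency = pd.qcut(x, q=3, labels=["low", "medium", "high"])

print(equal_width.value_counts())
print(equal_frequency.value_counts())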

Convert Numerical Discrete to Continuous


If you choose this option, any discrete variables with numerical category values will be treated as continuous variables with real values. For example,
“1” will be converted to “1.0.”

Calculator
The Calculator option allows you to add and edit relationships between variables in your data set, and to add new variables to the data set.


In many ways, this tool works like the Edit Expression window in a generalized SEM parametric model. To edit the formula that defines a variable (which
will change that variable’s values in the table) type that variable name into the text box to the left of the equals sign. To create a new variable, type a
name for that variable into the text box to the left of the equals sign. Then, in the box on the right, write the formula by which you wish to define a new
variable in place of, or in addition to, the old variable. You can select functions from the scrolling menu below. (For an explanation of the meaning of
some of the functions, see the section on generalized SEM models in the Parametric Model Box chapter.) To edit or create several formulae at once, click
the “Add Expression” button, and another blank formula will appear. To delete a formula, check the box next to it and click the “Remove Selected
Expressions” button.

When you click “Save” a table will appear listing the data. Values of variables whose formulae you changed will be changed, and any new variables you
created will appear with defined values.

Merge Deterministic Interventional Variables


This option looks for pairs of interventional variables (currently only discrete variables) that are deterministic and merges them into one combined
variable. Domain variables that are fully determined are marked with an attribute. Later, in the knowledge box (Edges and Tiers), all of the
interventional variables (both status and value variables) and the fully determined domain variables will automatically be placed in the top tier, and all
other domain variables will be placed in the second tier.

Merge Datasets
This operation takes two or more data boxes as parents and creates a data box containing all data sets in the parent boxes. Individual data sets will be
contained in their own tabs in the resulting box.

Convert to Correlation Matrix


This operation takes a tabular data set and outputs the lower half of the correlation matrix of that data set.

Convert to Covariance Matrix


This operation takes a tabular data set and outputs the lower half of the covariance matrix of that data set.

Inverse Matrix
This operation takes a covariance or correlation matrix and outputs its inverse. (Note: The output will not be acceptable in Tetrad as a covariance or
correlation matrix, as it is not lower triangular.)

Simulate Tabular from Covariance


This operation takes a covariance matrix and outputs a tabular data set whose covariances comply with the matrix.
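
Conceptually, this amounts to drawing rows from a multivariate normal distribution with the given covariance matrix. The following sketch (an illustrative Python example, not Tetrad's implementation; the matrix is made up) shows the idea:

# Illustrative sketch (not Tetrad's implementation): simulate tabular data
# whose sample covariance approximates a given covariance matrix.
import numpy as np

cov = np.array([[1.0, 0.3],
                [0.3, 1.0]])                         # made-up covariance matrix
rng = np.random.default_rng(0)
data = rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=1000)

print(np.cov(data, rowvar=False))                    # close to the input matrix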

Difference of Covariance Matrices


This operation takes two covariance matrices and outputs their difference. The resulting matrix will be a well-formatted Tetrad covariance matrix data
set.

Sum of Covariance Matrices

This operation takes two covariance matrices and outputs their sum. The resulting matrix will be a well-formatted Tetrad covariance matrix data set.

Average of Covariance Matrices


This operation takes two or more covariance matrices and outputs their average. The resulting matrix will be a well-formatted Tetrad covariance matrix
data set.

Convert to Time Lag Data


This operation takes a tabular data set and outputs a time lag data set, in which each variable is recorded several times over the course of an
experiment. You can specify the number of lags in the data. Each lag contains the same data, shifted by one “time unit.” For instance, if the original data set
had 1000 cases, and you specify that the time lag data set should contain two lags, then the third stage variable values will be those of cases 1 to 998,
the second stage variable values will be those of cases 2 to 999, and the first stage variable values will be those of cases 3 to 1000.
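
The following sketch (an illustrative Python example using pandas, not Tetrad code) builds a two-lag version of a small data set by shifting the rows, mirroring the case alignment described above:

# Illustrative sketch (not Tetrad code): build a two-lag data set by shifting rows.
import pandas as pd

data = pd.DataFrame({"X1": range(1000), "X2": range(1000, 2000)})
num_lags = 2

lagged = pd.concat(
    {f"lag{k}": data.shift(k) for k in range(num_lags + 1)}, axis=1
).dropna()
# Each row now holds the current values (lag0) alongside the values one and two
# time units earlier (lag1, lag2); the result has num_lags fewer rows than the original.
print(lagged.head())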

Convert to Time Lag Data with Index


This operation takes a tabular data set and outputs a time lag data set in the same manner as “Convert to Time Lag Data,” then adds an index variable.

Convert to AR Residuals
This operation is performed on a time lag data set. Tetrad performs a linear regression on each variable in each lag with respect to each of the variables
in the previous lag, and derives the error terms. The output data set contains only the error terms.

Whiten
Takes a continuous tabular data set and converts it to a data set whose covariance matrix is the identity matrix.
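
One common way to whiten data is to multiply the centered data by the inverse square root of its covariance matrix. The following sketch (an illustrative Python example; Tetrad's exact transform may differ) shows this approach:

# Illustrative sketch (Tetrad's exact transform may differ): whiten a data set
# so that its sample covariance matrix is (approximately) the identity matrix.
import numpy as np

rng = np.random.default_rng(0)
data = rng.multivariate_normal([0.0, 0.0], [[2.0, 0.8], [0.8, 1.0]], size=5000)

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
whitener = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T   # inverse square root of cov
whitened = centered @ whitener

print(np.cov(whitened, rowvar=False))                       # approximately the identity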

Nonparanormal Transform
Takes a continuous tabular data set and increases its Gaussianity, using a nonparanormal transformation to smooth the variables. (Note: This operation
increases only marginal Gaussianity, not joint Gaussianity, and in linear systems may eliminate information about higher moments that can aid in non-Gaussian
orientation procedures.)

Convert to Residuals
The input for this operation is a directed acyclic graph (DAG) and a data set. Tetrad performs a linear regression on each variable in the data set with
respect to all of the variables that the graph shows to be its parents, and derives the error terms. The output data set contains only the error terms.
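
The following sketch (an illustrative Python example with a made-up two-variable DAG, not Tetrad code) shows the kind of computation involved: regress a variable on its parents and keep the residuals:

# Illustrative sketch (not Tetrad code): regress a variable on its parents in a
# hypothetical DAG (here X1 -> X2) and keep the residual.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)                         # parent of X2
x2 = 0.7 * x1 + rng.normal(size=500)              # child

design = np.column_stack([np.ones_like(x1), x1])  # intercept plus parent values
coeffs, *_ = np.linalg.lstsq(design, x2, rcond=None)
residual_x2 = x2 - design @ coeffs                # this column goes into the output data set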

Standardize Data
This operation manipulates the data in your data set such that each variable has 0 mean and unit variance.

Remove Cases with Missing Values


If you choose this operation, Tetrad will remove any row in which one or more of the values is missing.

Replace Missing Values with Column Mode


If you choose this operation, Tetrad will replace any missing value markers with the most commonly used value in the column.

Replace Missing Values with Column Mean


If you choose this operation, Tetrad will replace any missing value markers with the average of all of the values in the column.

Replace Missing Values with Regression Predictions


If you choose this operation, Tetrad will perform a linear regression on the data in order to estimate the most likely value of any missing value.

Replace Missing Values by Extra Category


This operation takes as input a discrete data set. For every variable which has missing values, Tetrad will create an extra category for that variable
(named by default “Missing”) and replace any missing data markers with that category.

Replace Missing with Random


For discrete data, replaces missing values at random from the list of categories the variable takes in other cases. For continuous data, finds the
minimum and maximum values of the column (ignoring the missing values) and picks a random number from U(min, max).

Inject Missing Data Randomly


If you choose this operation, Tetrad will replace randomly selected data values with a missing data marker. You can set the probability with which any
particular value will be replaced (that is, approximately the percentage of values for each variable which will be replaced with missing data markers).

Bootstrap Sample
This operation draws a random subset of the input data set (you specify the size of the subset) with replacement (that is, cases which appear once in
the original data set can appear multiple times in the subset). The resulting data set can be used along with similar subsets to achieve more accurate
estimates of parameters.
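
The following sketch (an illustrative Python example using pandas, not Tetrad code) draws a bootstrap sample by sampling rows with replacement:

# Illustrative sketch (not Tetrad code): a bootstrap sample of the same size as
# the original data, drawn with replacement.
import pandas as pd

data = pd.DataFrame({"X1": [1.2, -0.3, 0.8, 2.1], "X2": [0.5, 1.7, -1.1, 0.0]})
bootstrap = data.sample(n=len(data), replace=True, random_state=0)
print(bootstrap)   # some original rows may appear more than once, others not at all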

Split by Cases

This operation allows you to split a data set into several smaller data sets. When you choose it, a window opens.

If you would like the subsets to retain the ordering they had in the original set, click “Original Order.” Otherwise, the ordering of the subsets will be
assigned at random. You can also increase and decrease the number of subsets created, and specify the range of each subset.

Permute Rows
This operation randomly reassigns the ordering of a data set’s cases.

First Differences
This operation takes a tabular data set and outputs the first differences of the data (i.e., if X is a variable in the original data set and X’ is its equivalent in
the first differences data set, X’1 = X2 – X1). The resulting data set will have one fewer row than the original.
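
The following sketch (an illustrative Python example, not Tetrad code) computes first differences for a single column; the result has one fewer value than the original:

# Illustrative sketch (not Tetrad code): first differences of one column.
import numpy as np

x = np.array([1.0, 1.5, 1.2, 2.0])
print(np.diff(x))          # [ 0.5 -0.3  0.8]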

Concatenate Datasets
This operation takes two or more datasets and concatenates them. The parent datasets must have the same number of variables.

Copy Continuous Variables


This operation takes as input a data set and creates a new data set containing only the continuous variables present in the original.

Copy Discrete Variables


This operation takes as input a data set and creates a new data set containing only the discrete variables present in the original.

Remove Selected Variables


This operation creates a data set from which the selected columns have been removed; columns are selected in the same way as described for Copy
Selected Variables, below.

Copy Selected Variables
As explained above, you can select an entire column in a data set by clicking on the C1, C2, C3, etc… cell above the column. To select multiple
columns, press and hold the “control” key while clicking on the cells. Once you have done so, you can use the Copy Selected Variables tool to create a
data set in which only those columns appear.

Remove Constant Columns


This operation takes a data set as input, and creates a data set which contains all columns in the original data set except for those with constant values
(such as, for example, a column containing nothing but 2’s).

Randomly Reorder Columns


This operation randomly reassigns the ordering of a data set’s variables.

Manually Editing Data


Under the Edit tab, there are several options to manipulate data. If you select a number of cells and click “Clear Cells,” Tetrad will replace the data
values in the selected cells with a missing data marker. If you select an entire row or column and click “Delete selected rows or columns,” Tetrad will
delete all data values in the row or column, and the name of the row or column. (To select an entire column, click on the category number above it,
labeled C1, C2, C3, and so on. To select an entire row, click on the row number to the left of it, labeled 1, 2, 3, and so on.) You can also copy, cut, and
paste data values to and from selected cells. You can choose to show or hide category names, and if you click on “Set Constants Col to Missing,” then
in any column in which the variable takes on only one value (for example, a column in which every cell contains the number 2) Tetrad will set every cell
to the missing data marker.

Under the Tools tab, the Calculator tool allows you to add and edit relationships between variables in the graph. For more information on how the Calculator tool works, see the “Manipulating Data” section above.

Data Information

Under the Tools tab, there are options to view information about your data in several different formats.

The Histograms tool shows histograms of the variables in the data set.

These show the distribution of data for each variable, with the width of each bar representing a range of values, and height of each bar representing
how many data points fall into that range. Using histograms, you can determine whether each variable has a distribution that is approximately Normal.
To select a variable to view, choose it from the drop-down menu on the right. You can increase or decrease the number of bars in the histogram (and
therefore decrease or increase the range of each bar, and increase or decrease the accuracy of the histogram) using the menu on the right. You can
also view only ranges with a certain amount of the data using the “cull bins” menu.

The Scatter Plots tool allows you to view scatter plots of two variables plotted against each other.

To view a variable as the x- or y-axis of the graph, select it from one of the drop-down menus to the right. To view the regression line of the graph, check the box on the right.

You can see the correlation of two variables conditional on a third variable by using the Add New Conditional Variable button at the bottom of the
window. This will open up a slider and a box in which you can set the granularity of the slider. By moving the slider to the left or right, you can change
the range of values of the conditional variable for which the scatter plot shows the correlation of the variables on the x- and y- axes. You can increase
and decrease the width of the ranges by changing the granularity of the slider. A slider with granularity 1 will break the values of the conditional variable
into sections one unit long, etc. The granularity cannot be set lower than one.

In a well-formed model, the scatter plot of a variable plotted against itself should appear as a straight line along the line y = x.

The Q-Q Plot tool is a test for normality of distribution.


If a variable has a distribution which is approximately Normal, its Q-Q plot should appear as a straight line with a positive slope. You can select the
variable whose Q-Q plot you wish to view from the drop-down menu on the right.

The Normality Tests tool gives a text box with the results of the Kolmogorov-Smirnov and Anderson-Darling tests for normality for each variable. The Descriptive Statistics tool gives a text box with statistical information such as the mean, median, and variance of each variable.

Estimator Box
The estimator box takes as input a data box (or simulation box) and a parametric model box and estimates, tests, and outputs an instantiated model for the data. With the exception of the EM Bayes estimator, Tetrad estimators do not accept missing values. If your data set contains missing values, the missing values can be interpolated or removed using the data box. (Note that missing values are allowed in various Tetrad search procedures; see the section on the search box.)

Possible Parent Boxes of the Estimator Box:


A parametric model box
A data box
A simulation box

Possible Child Boxes of the Estimator Box:


A graph box
A simulation box
An updater box

ML Bayes Estimations
Bayes nets are acyclic graphical models parameterized by the probability distribution of each variable conditional on its parents' values, as in the instantiated model box. When the model contains no latent variables, the joint distribution of the variables equals the product of the distributions of the variables conditional on their respective parents. The maximum likelihood (ML) estimate of each such conditional probability is the corresponding frequency in the sample, so the ML estimate of the joint probability distribution under the model is the product of those frequencies.
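
To illustrate the idea that the estimates are just sample frequencies, here is a rough sketch (not Tetrad's code) of estimating one conditional probability table from a discrete pandas DataFrame; the column names passed in are whatever names appear in your data:

    import pandas as pd

    def ml_cpt(data: pd.DataFrame, child: str, parents: list[str]) -> pd.Series:
        # With no parents, the ML estimate is the sample frequency of each value.
        if not parents:
            return data[child].value_counts(normalize=True)
        # Count each (parent configuration, child value) combination ...
        counts = data.groupby(parents + [child]).size()
        # ... and divide by the number of cases with that parent configuration.
        return counts / counts.groupby(level=parents).transform("sum")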

The ML Bayes estimator, because it estimates Bayes IMs, works only on models with discrete variables. The model estimated must not include latent
variables, and the input data set must not include missing data values. A sample estimate looks like this:


The Model tab works exactly as it does in a Bayes instantiated model. The Model Statistics tab provides the p-value for a chi square test of the model,
degrees of freedom, the chi square value, and the Bayes Information Criterion (BIC) score of the model.

Dirichlet Estimations
A Dirichlet estimator estimates a Bayes instantiated model using a Dirichlet distribution for each category. In a Dirichlet estimate, the probability of each value of a variable (conditional on the values of the variable’s parents) is estimated by adding a prior pseudocount (1 by default) to the number of cases in the data in which the variable takes that value for that configuration of parent values, and then dividing by the sum of the pseudocounts and the number of cases in the data with that configuration of values of the parent variables. The default prior pseudocount can be changed inside the box. (For a full explanation of pseudocounts and Dirichlet estimates, see the section on Dirichlet instantiated models.)
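
The arithmetic for a single cell of the table can be sketched as follows (an illustration only; the list of cases and the category labels are hypothetical):

    from collections import Counter

    def dirichlet_estimate(cases, value, pseudocount=1.0, num_categories=3):
        # cases: observed values of the variable for one configuration of its parents.
        counts = Counter(cases)
        # Add the prior pseudocount to the observed count for this value, then divide
        # by the total pseudocount plus the number of cases with this parent configuration.
        return (pseudocount + counts[value]) / (pseudocount * num_categories + len(cases))

    # Example: three categories, cases [0, 0, 1, 2, 0], estimate for value 0:
    # (1 + 3) / (3 + 5) = 0.5
    print(dirichlet_estimate([0, 0, 1, 2, 0], 0))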

The Dirichlet estimator in TETRAD does not work if the input data set contains missing data values.

EM Bayes Estimations
The EM Bayes estimator takes the same input and gives the same output as the ML Bayes estimator, but is designed to handle data sets with missing
data values, and input models with latent variables.

SEM Estimates
A SEM estimator estimates the values of parameters for a SEM parametric model. SEM estimates do not work if the input data set contains missing
data values. A sample output looks like this:


Tetrad provides five parameter optimizers: RICF (Drton, M., & Richardson, T. S. (2004, July). Iterative conditional fitting for Gaussian ancestral graph models. In Proceedings of the 20th conference on Uncertainty in artificial intelligence (pp. 130-137). AUAI Press), expectation-maximization (EM), regression, Powell (Powell, Journal of Econometrics 25 (1984) 303-325), and random search. Accurate regression estimates assume that the input parametric model is a DAG, and that its associated statistics are based on a linear, Gaussian model. The EM optimizer has the same input constraints as regression, but can handle latent variables.

Tetrad also provides two scores that can be used in estimation: feasible generalized least squares (FGLS) and Full Information Maximum Likelihood
(FML).

If the graph for the SEM is a DAG, and we may assume that the SEM is linear with Gaussian error terms, we use multiple linear regression to estimate coefficients and residual variances. Otherwise, we use a standard maximum likelihood fitting function (see Bollen, Structural Equations with Latent Variables, Wiley, 1989, pg. 107) to minimize the distance between (a) the covariance over the variables as implied by the coefficient and error covariance parameter values of the model and (b) the sample covariance matrix. Following Bollen, we denote this function Fml; it maps points in parameter space to real numbers, and, when minimized, yields the maximum likelihood estimation point in parameter space.

In either case, an Fml value may be obtained for the maximum likelihood point in parameter space, either by regression or by direct minimization of the Fml function itself. The value of Fml at this minimum (maximum likelihood) point, multiplied by N - 1 (where N is the sample size), yields a chi square statistic (ch^2) for the model, which, when referred to the chi square table with the appropriate degrees of freedom, yields a model p value. The degrees of freedom (dof) in this case is equal to m(m-1)/2 - f, where m is the number of measured variables and f is the number of free parameters, equal to the number of coefficient parameters plus the number of covariance parameters. (Note that the degrees of freedom may be negative, in which case estimation should not be done.) The BIC score is calculated as ch^2 - dof * log(N).
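
Putting these formulas together, a minimal sketch of the bookkeeping (using the expressions exactly as stated above; the function and argument names are illustrative, not Tetrad's) is:

    import math
    from scipy.stats import chi2

    def sem_fit_statistics(fml: float, n: int, m: int, f: int):
        # fml: value of the fitting function at the maximum likelihood point
        # n: sample size, m: number of measured variables, f: number of free parameters
        chi_square = fml * (n - 1)
        dof = m * (m - 1) // 2 - f                 # degrees of freedom, as stated above
        p_value = chi2.sf(chi_square, dof) if dof > 0 else float("nan")
        bic = chi_square - dof * math.log(n)
        return chi_square, dof, p_value, bic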

You can change which score and optimizer Tetrad uses by choosing them from the drop-down menus at the bottom of the window and clicking “Estimate Again.”

The Tabular Editor and Implied Matrices tabs function exactly as they do in the instantiated model box, but in the estimator box, the last three columns
of the table in the Tabular Editor tab are filled in. The SE, T, and P columns provide the standard errors, t statistics, and p values of the estimation.

The Model Statistics tab provides the degrees of freedom, chi square, p value, comparative fit index (CFI), root mean square error of approximation
(RMSEA) and BIC score of a test of the model. It should be noted that while these test statistics are standard, they are not in general correct. See
Mathias Drton, 2009, Likelihood ratio tests and singularities. Annals of Statistics 37(2):979-1012. arXiv:math.ST/0703360.

When the EM algorithm is used with latent variable models, we recommend multiple random restarts. The number of restarts can be set in the lower
right hand corner of the Estimator Box.

Generalized Estimator
A generalized graphical model may have non-linear relations and non-Gaussian distributions. These models are automatically estimated by the Powell
method, which seeks a maximum likelihood solution.

Updater Box
The updater box takes an instantiated model as input, and, given information about the values of parameters in that model, updates the information
about the values and relationships of other parameters.

The Updater allows the user to specify values of variables as “Evidence.” The default is that the conditional probabilities (Bayes net models; categorical
variables) or conditional means (SEM models; continuous variables) are computed. For any variable for which evidence is specified, the user can click
on “Manipulated,” in which case the Updater will calculate the conditional probabilities or conditional means for other variables when the evidence
variables are forced to have their specified values. In manipulated calculations, all connections into a manipulated variable are discarded, the manipulated variables are treated as independent of their causes in the graph, and probabilities for variables that are causes of the manipulated variables are unchanged.

There are four available updater algorithms in Tetrad: the approximate updater, the row summing exact updater, the junction tree exact updater, and the SEM updater. All except the SEM updater function only when given Bayes instantiated models as input; the SEM updater functions when given a SEM instantiated model as input. None of the updaters work on cyclic models.

Possible Parent Boxes of the Updater Box:


An instantiated model box
An estimator box

Possible Child Boxes of the Updater Box:


An instantiated model box (Note that the instantiated model will have the updated parameters)

Approximate Updater
The approximate updater is a fast but inexact algorithm. It randomly draws a sample data set from the instantiated model and calculates the conditional frequency of the variable to be estimated.
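
A rough sketch of this idea (an illustration only; sample_from_model is a hypothetical helper that draws one case from the instantiated model as a dictionary mapping variables to values):

    import numpy as np

    def approximate_update(sample_from_model, evidence, target, n_draws=10000,
                           rng=np.random.default_rng()):
        # Keep only the sampled cases that agree with the evidence.
        kept = []
        for _ in range(n_draws):
            case = sample_from_model(rng)
            if all(case[v] == val for v, val in evidence.items()):
                kept.append(case[target])
        if not kept:
            return {}
        # Conditional frequency of each value of the target among the kept cases.
        values, counts = np.unique(kept, return_counts=True)
        return dict(zip(values.tolist(), (counts / counts.sum()).tolist()))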

Take, for example, the following instantiated model:


When it is input into the approximate updater, the following window results:

If we click “Do Update Now” now, without giving the updater any evidence, the right side of the screen changes to show us the marginal probabilities of
the variables.

The blue lines, and the values listed across from them, indicate the probability that the variable takes on the given value in the input instantiated model.
The red lines indicate the probability that the variable takes on the given value, given the evidence we’ve added to the updater.

Since we have added no evidence to the updater, the red and blue lines are very similar in length. To view the marginal probabilities for a variable,
either click on the variable in the graph to the left, or choose it from the scrolling menu at the top of the window. At the moment, they should all be very close to the marginal probabilities taken from the instantiated model.

Now, we’ll return to the original window. We can do so by clicking “Edit Evidence” under the Evidence tab. Suppose we know that X1 takes on the value
1 in our model, or suppose we merely want to see how X1 taking that value affects the values of the other variables. We can click on the box that says
“1” next to X1. When we click “Do Update Now,” we again get a list of the marginal probabilities for X1.

Now that we have added evidence, the “red line” marginal probabilities have changed; for X1, the probability that X1=1 is 1, because we’ve told Tetrad
that that is the case. Likewise, the probabilities that X1=0 and X1=2 are both 0.

Now, let’s look at the updated marginal probabilities for X2, a parent of X1.


The first image is the marginal probabilities before we added the evidence that X1=1. The second image is the updated marginal probabilities. They
have changed; in particular, it has become much more likely that X2=0.

Under the Mode tab, we can change the type of information that the updater box gives us. The mode we have been using so far is “Marginals Only
(Multiple Variables).” We can switch the mode to “In-Depth Information (Single Variable).” Under this mode, when we perform the update, we receive
more information (such as log odds and joints, when supported; joint probabilities are not supported by the approximate updater), but only about the
variable which was selected in the graph when we performed the update. To view information about a different variable, we must re-edit the evidence
with that variable selected.

If the variable can take one of several values, or if we know the values of more than one variable, we can select multiple values by pressing and holding
the Shift key and then making our selections. For instance, in the model above, suppose that we know that X1 can be 1 or 2, but not 0. We can hold the
Shift key and select the boxes for 1 and 2, and when we click “Do Update Now,” the marginal probabilities for X1 look like this:

Since X1 must be 1 or 2, the updated probability that it is 0 is now 0. The marginal probabilities of X2 also change:


The updated marginal probabilities are much closer to their original values than they were when we knew that X1 was 1.

Finally, if we are arbitrarily setting the value of a variable—that is, the values of its parents have no effect on its value—we can check the “Manipulated” box next to it while we are editing evidence, and the update will reflect this information.

Note that multiple values cannot be selected for evidence for SEM models.

Row Summing Exact Updater


The row summing exact updater is a slower but more accurate updater than the approximate updater. The complexity of the algorithm depends on the
number of variables and the number of categories each variable has. It creates a full exact conditional probability table and updates from that. Its
window functions exactly as the approximate updater does, with two exceptions: in “Multiple Variables” mode, you can see conditional as well as
marginal probabilities, and in “Single Variable” mode, you can see joint values.

Junction Tree Exact Updater


The Junction Tree exact updater is another exact updating algorithm. Its window functions exactly as the approximate updater’s does, with one exception: in “Multiple Variables” mode, you can see conditional as well as marginal probabilities.

SEM Updater
The SEM updater does not deal with marginal probabilities; instead, it estimates means.

When a SEM instantiated model is input to the SEM updater, the following window results:


Suppose we know that the mean of X1 is .5. When we enter that value into the text box on the left and click “Do Update Now,” the model on the right
updates to reflect that mean, changing the means of both X1 and several other variables. In the new model, the means of X2, X4, and X5 will all have
changed. If we click the “Manipulated” check box as well, it means that we have arbitrarily set the mean of X1 to .5, and that the value of its parent
variable, X4, has no effect on it. The graph, as well as the updated means, changes to reflect this.

The rest of the window has the same functionality as a SEM instantiated model window, except as noted above.

Knowledge Box
The knowledge box takes as input a graph or a data set and imposes additional constraints onto it, to aid with search.

Possible Parent Boxes of the Knowledge Box:


A graph box
A parametric model box
An instantiated model box
A data box
A simulation box
A search box
Another knowledge box

Possible Child Boxes of the Knowledge Box:


A search box
Another knowledge box

Tiers and Edges


The tiers and edges option allows you to sort variables into groupings that can or cannot affect each other. It also allows you to manually add forbidden
and required edges one at a time.

Tiers
The tiers tab for a graph with ten variables looks like this:


Tiers separate your variables into a time line. Variables in higher-numbered tiers occur later than variables in lower-numbered tiers, which gives Tetrad
information about causation. For example, a variable in Tier 3 could not possibly be a cause of a variable in Tier 1.

To place a variable in a tier, click on the variable in the “Not in tier” box, and then click on the box of the tier. If you check the “Forbid Within Tier” box for
a tier, variables in that tier will not be allowed to be causes of each other. To increase or decrease the number of tiers, use the scrolling box in the upper
right corner of the window.

You can quickly search, select and place variables in a tier using the Find button associated with each tier. Enter a search string into the Find dialogue
box using asterisks as wildcard indicators. E.g., "X1*" would find and select variables X1 and X10.

You can also limit the search such that edges from one tier are only added to the next immediate tier; e.g., if "Can cause only next tier" is checked for Tier 1, then edges from variables in Tier 1 to variables in Tier 3 are forbidden.
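
In terms of forbidden edges, tier knowledge amounts to rules like the following sketch (an illustration of the semantics only, not of Tetrad's knowledge data structures):

    def tier_forbidden_pairs(tiers, forbid_within=set(), next_tier_only=set()):
        # tiers: list of lists of variable names, tiers[0] being the earliest tier.
        forbidden = set()
        for i, tier_i in enumerate(tiers):
            for j, tier_j in enumerate(tiers):
                later_to_earlier = j < i                            # later tiers cannot cause earlier ones
                skips_a_tier = i in next_tier_only and j > i + 1    # "can cause only next tier"
                within_forbidden = i == j and i in forbid_within    # "forbid within tier"
                if later_to_earlier or skips_a_tier or within_forbidden:
                    forbidden.update((a, b) for a in tier_i for b in tier_j if a != b)
        return forbidden  # pairs (a, b) meaning "a is forbidden to cause b"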

Handling of Interventional Variables in Tiers


If you have annotated your variables with interventional status and interventional value tags using a metadata JSON file (see Data Box section) the
Tiers and Edges panel will automatically place these variables in Tier 1. If you have information about the effects of the intervention variables you can
use the groups tab to indicate this.

Groups
The groups tab for a graph with four variables looks like this:


In the groups tab, you can specify certain groups of variables which are forbidden or required to cause other groups of variables. To add a variable to
the “cause” section of a group, click on the variable in the box at the top, and then click on the box to the left of the group’s arrow. To add a variable to
the “effect” section of a group, click on the variable in the box at the top, and then click on the box to the right of the group’s arrow. You can add a group
by clicking on one of the buttons at the top of the window, and remove one by clicking the “remove” button above the group’s boxes.

Edges
The edges tab for a graph with four variables looks like this:


In the edges tab, you can require or forbid individual causal edges between variables. To add an edge, click the type of edge you’d like to create, and
then click and drag from the “cause” variable to the “effect” variable.

You can also use this tab to see the effects of the knowledge you created in the other tabs by checking and unchecking the boxes at the bottom of the
window. You can adjust the layout to mimic the layout of the source (by clicking “source layout”) or to see the variables in their timeline tiers (by clicking
“knowledge layout”).

Forbidden Graph
If you use a graph as input to a knowledge box with the “Forbidden Graph” operation, the box will immediately add all edges in the parent graph as
forbidden edges. It will otherwise work like a Tiers and Edges box.

Required Graph
If you use a graph as input to a knowledge box with the “Required Graph” operation, the box will immediately add all edges in the parent graph as
required edges. It will otherwise work like a Tiers and Edges box.

Measurement Model
This option allows you to build clusters for a measurement model. When first opened, the window looks like this:


You can change the number of clusters using the text box in the upper right hand corner. To place a variable in a cluster, click and drag the box with its
name into the cluster pane. To move multiple variables at once, shift- or command-click on the variables, and (without releasing the shift/command
button or the mouse after the final click) drag. In the search boxes, these variables will be assumed to be children of a common latent cause.

Simulation Box
The simulation box takes a graph, parametric model, or instantiated model and uses it to simulate a data set.

Possible Parent Boxes of the Simulation Box


A graph box
A parametric model box
An instantiated model box
An estimator box
A data box
Another simulation box
A search box
An updater box
A regression box

Possible Child Boxes of the Simulation Box


A graph box
A compare box
A parametric model box
An instantiated model box
An estimator box
A data box

Another simulation box
A search box
A classify box
A regression box
A knowledge box

Using the Simulator Box


When you first open the simulation box, you will see some variation on this window:

The “True Graph” tab contains the graph from which data is simulated.

The Simulation Box with no Input


Because it has no input box to create constraints, a parentless simulation box offers the greatest freedom for setting the graph type, model type, and
parameters of your simulated data. In particular, it is the only way that the simulation box will allow you to create a random graph or graphs within the
box. (If you are simulating multiple data sets, and want to use a different random graph for each one, you can select “Yes” under “Yes if a different graph
should be used for each run.”) You can choose the type of graph you want Tetrad to create from the “Type of Graph” drop-down list.

Random Forward DAG

This option creates a DAG by randomly adding forward edges (edges that do not point to a variable’s ancestors) one at a time. You can specify graph
parameters such as number of variables, maximum and minimum degrees, and connectedness.
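
The idea of a "forward" edge can be sketched as follows (an illustration only; Tetrad's generator also enforces the degree and connectedness parameters mentioned above):

    import numpy as np

    def random_forward_dag(num_vars, num_edges, rng=np.random.default_rng()):
        # Fix a causal order; every edge points from an earlier variable in the
        # order to a later one, so no edge can point to one of its ancestors.
        # Assumes num_edges is at most num_vars * (num_vars - 1) / 2.
        order = rng.permutation(num_vars)
        edges = set()
        while len(edges) < num_edges:
            i, j = sorted(rng.choice(num_vars, size=2, replace=False))
            edges.add((order[i], order[j]))   # tail earlier in the order, head later
        return edges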

Erdos Renyi DAG

This option creates a DAG by randomly adding edges with a given edge probability. The graph is then oriented as a DAG by choosing a causal order.

Scale Free DAG

This option creates a DAG whose variables’ degrees obey a power law. You can specify graph parameters such as number of variables, alpha, beta, and delta values.

Cyclic, constructed from small loops

This option creates a cyclic graph. You can specify graph parameters such as number of variables, maximum and average degrees, and the probability
of the graph containing at least one cycle.

It is very important when dealing with cyclic models to realize that the potential always exists to instantiate these models with coefficients that are too large. To keep simulations from "exploding" ("diverging", i.e., having simulation values that tend to infinity over time), it is necessary to make sure that coefficient values are relatively small, usually less than 1. One can tell whether a model will produce simulations that diverge in value by testing the eigenvalues of the covariance matrix of the data. If any of these eigenvalues are greater than 1, the potential exists for the simulation to "explode" toward infinity over time.

Random One Factor MIM

This option creates a one-factor multiple indicator model. You can specify graph parameters such as number of latent nodes, number of measurements
per latent, and number of impure edges.

Random Two Factor MIM

This option creates a two-factor multiple indicator model. You can specify graph parameters such as number of latent nodes, number of measurements
per latent, and number of impure edges.

In addition to the graph type, you can also specify the type of model you would like Tetrad to simulate.

Bayes net

Simulates a Bayes instantiated model. You can specify model parameters including maximum and minimum number of categories for each variable.

Structural Equation Model

Simulates a SEM instantiated model. You can specify model parameters including coefficient, variance, and covariance ranges.

Linear Fisher Model

Simulates data using a linear Markov 1 DBN without concurrent edges. The Fisher model suggests that shocks should be applied at intervals and the
time series be allowed to move to convergence between shocks. This simulation has many parameters that can be adjusted, as indicated in the
interface. The ones that require some explanation are as follows.

Low end of coefficient range, high end of coefficient range, low end of variance range, high end of variance range. Each variable is a linear function of the parents of the variable (in the previous time lag) plus Gaussian noise. The coefficients are drawn randomly from U(a, b), where a is the low end of the coefficient range and b is the high end of the coefficient range. Here, a < b. The variance of the Gaussian noise is drawn uniformly from U(c, d), where c is the low end of the variance range and d is the high end of the variance range. Here, c < d.
Yes, if negative values should be considered. If no, only positive values will be recorded. This should not be used for large numbers of variables,
since it is more difficult to find cases with all positive values when the number of variables is large.
Percentage of discrete variables. The model generates continuous data, but some or all of the variables may be discretized at random. The user needs to indicate the percentage of variables (randomly chosen) that one wishes to have discretized. The default is zero—i.e., all continuous variables.
Number of categories of discrete variables. For the variables that are discretized, the number of categories to use to discretize each of these
variables.
Sample size. The number of records to be simulated.
Interval between shocks. The number of time steps between shocks in the model.
Interval between data recordings. The data are recorded every so many steps. If one wishes to allow the series to completely converge between shocks (i.e., produce equilibrium data), set this interval to some large number like 20 and set the interval between shocks likewise to 20. Other values can be used, however.
Epsilon for convergence. Even if you set the interval between data recordings to a large number, you can specify an epsilon such that if all values of
variables differ from their values one time step back by less than epsilon, the series will be taken to have converged, and the remaining steps
between data recordings will be skipped, the data point being recorded at convergence.
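
Putting the parameters above together, here is a rough sketch of the shock-and-record scheme for the case where the interval between shocks equals the interval between recordings (an illustration only; the coefficient matrix B and the shock standard deviation are assumed to be given rather than drawn from the ranges described above):

    import numpy as np

    def fisher_simulate(B, shock_sd, sample_size, steps_between_records=20,
                        epsilon=1e-3, rng=np.random.default_rng()):
        # B[i, j] is the effect of variable j at the previous time step on variable i.
        p = B.shape[0]
        x = np.zeros(p)
        records = []
        while len(records) < sample_size:
            x = x + rng.normal(0.0, shock_sd, size=p)        # apply a shock
            for _ in range(steps_between_records):            # let the series settle
                x_new = B @ x
                converged = np.max(np.abs(x_new - x)) < epsilon
                x = x_new
                if converged:                                 # skip the remaining steps
                    break
            records.append(x.copy())                          # record one data point
        return np.array(records)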

Lee & Hastie

This is a model for simulating mixed data (data with both continuous and discrete variables). The model is given in Lee J, Hastie T. 2013, Structure Learning of Mixed Graphical Models, Journal of Machine Learning Research 31: 388-396. Here, mixtures of continuous and discrete variables are treated as log-linear.

Percentage of discrete variables. The model generates continuous data, but some or all of the variables may be discretized at random. The user needs to indicate the percentage of variables (randomly chosen) that one wishes to have discretized. The default is zero—i.e., all continuous variables.
Number of categories of discrete variables. For the variables that are discretized, the number of categories to use to discretize each of these
variables.
Sample size. The number of records to be simulated.

Time Series

This is a special simulation for representing time series. Concurrent edges are allowed. This can take a Time Series Graph as input, in which variables
in the current lag are written as functions of the parents in the current and previous lags.

Sample size. The number of records to be simulated.

Boolean Glass

The instantiated model used to simulate the data will be re-parameterized for each run of the simulation.

The Simulation Box with a Graph Input


If you input a graph, you will be able to simulate any kind of model, with any parameters. But the model will be constrained by the graph you have input
(or the subgraph you choose in the “True Graph” tab.) Because of this, if you create a simulation box with a graph as a parent, you will not see the
“Type of Graph” option.

The Simulation Box with a Parametric Model Input


At the time of writing, a simulation box with a parametric model input acts as though the PM’s underlying graph had been input into the box.

The Simulation Box with an Instantiated Model Input

If you input an instantiated model, your only options will be the sample size of your simulation and the number of data sets you want to simulate; Tetrad
will simulate every one of them based on the parameters of the IM. The model will not be re-parameterized for each run of the simulation.

Search Box
The search box takes as input a data set (in either a data or simulation box) and optionally a knowledge box, and searches for causal explanations
represented by directed graphs. The result of a search is not necessarily—and not usually—a unique graph, but an object such as a CPDAG that
represents a set of graphs, usually a Markov Equivalence class. More alternatives can be found by varying the parameters of search algorithms.

Possible Parent Boxes of the Search Box


A graph box
A parametric model box
An instantiated model box
An estimator box
A data box
A simulation box
Another search box
A regression box
A knowledge box

Possible Child Boxes of the Search Box


A graph box
A compare box
A parametric model box
A simulation box
Another search box
A knowledge box

Using the Search Box


Using the search box requires you to select an algorithm (optionally select a test/score), confirm/change search parameters and finally run the search.

The search box first asks what algorithm, statistical tests and/or scoring functions you would like to use in the search. The upper left panel allows you to
filter for different types of search algorithms with the results of filtering appearing in the middle panel. Selecting a particular algorithm will update the
algorithm description on the right panel.

Choosing the correct algorithm for your needs is an important consideration. Tetrad provides over 30 search algorithms (and more are added all of the
time) each of which makes different assumptions about the input data, uses different parameters, and produces different kinds of output. For instance,
some algorithms produce Markov blankets or CPDAGs, and some produce full graphs; some algorithms work best with Gaussian or non-Gaussian
data; some algorithms require an alpha value, some require a penalty discount, and some require both or neither. You can narrow down the list using
the “Algorithm filter” panel, which allows you to limit the provided algorithms according to whichever factor is important to you.

Depending on the datatype used as input for the search (i.e., continuous, discrete, or mixed data) and algorithm selected, the lower left panel will
display available statistical tests (i.e., tests of independence) and Bayesian scoring functions.

After selecting the algorithm and desired test/score, click on "Set parameters" which will allow you to confirm/change the parameters of the search.

After optionally changing any search parameters, click on "Run Search and Generate Graph" which will execute the search.

Notably there are some experimental algorithms available in this box. To see these, select File->Settings->Enable Experimental.

Search Algorithms
PC
Description

The PC algorithm (Spirtes and Glymour, Social Science Computer Review, 1991) is a CPDAG search which assumes that the underlying causal structure of the input data is acyclic, and that no two variables are caused by the same latent (unmeasured) variable. In addition, it is assumed that the input data set is either entirely continuous or entirely discrete; if the data set is continuous, it is assumed that the causal relation between any two variables is linear, and that the distribution of each variable is Normal. Finally, the sample should ideally be i.i.d. Simulations show that PC and several of the other algorithms described here often succeed when these assumptions, needed to prove their correctness, do not strictly hold. The PC algorithm will sometimes output double headed edges. In the large sample limit, double headed edges in the output indicate that the adjacent variables have an unrecorded common cause, but PC tends to produce false positive double headed edges on small samples.

The PC algorithm is correct whenever decision procedures for independence and conditional independence are available. The procedure conducts a
sequence of independence and conditional independence tests, and efficiently builds a CPDAG from the results of those tests. As implemented in
TETRAD, PC is intended for multinomial and approximately Normal distributions with i.i.d. data. The tests have an alpha value for rejecting the null
hypothesis, which is always a hypothesis of independence or conditional independence. For continuous variables, PC uses tests of zero correlation or
zero partial correlation for independence or conditional independence respectively. For discrete or categorical variables, PC uses either a chi square or
a g square test of independence or conditional independence (see Causation, Prediction, and Search for details on tests). In either case, the tests
require an alpha value for rejecting the null hypothesis, which can be adjusted by the user. The procedures make no adjustment for multiple testing.
(For PC, CPC, JPC, JCPC, FCI, all testing searches.)
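
For continuous variables, the test of zero partial correlation mentioned above is a Fisher Z test, which can be sketched as follows (an illustration, not Tetrad's implementation):

    import math
    import numpy as np

    def fisher_z_independent(data, x, y, cond, alpha=0.05):
        # data: (n, p) array; x, y: column indices; cond: conditioning column indices.
        cols = [x, y] + list(cond)
        prec = np.linalg.inv(np.corrcoef(data[:, cols], rowvar=False))
        # Partial correlation of x and y given cond, from the precision matrix.
        r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
        n = data.shape[0]
        z = math.sqrt(n - len(cond) - 3) * 0.5 * math.log((1 + r) / (1 - r))
        p_value = 1 - math.erf(abs(z) / math.sqrt(2))   # two-sided p value from the standard Normal
        return p_value > alpha      # fail to reject the null -> judged independent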

The PC algorithm as given in Causation, Prediction and Search (Spirtes, Glymour, and Scheines, 2000) comes with three heuristics designed to reduce dependence on the order of the variables. The heuristic PC-1 simply sorts the variables in alphabetical order. The heuristics PC-2 and PC-3 sort edges by their p-values in the search. PC-3 further sorts parents of nodes in reverse order by the p-values of the conditional independence facts used to remove edges in the search. Please see Causation, Prediction, and Search for more details on these heuristics.

Note: If one wants to analyze time series data using this algorithm, one may set the time lag parameter to a value greater than 0, which will automatically apply the time lag transform. The same goes for any algorithm that has this parameter available in the interface.

Input Assumptions

The algorithm effectively takes conditional independence facts as input. Thus it will work for any type of data for which conditional independence facts are known. In the interface, it will work for linear, Gaussian data (the Fisher Z test), discrete multinomial data (the Chi Square test), and mixed multinomial/Gaussian data (the Conditional Gaussian test).

Output Format

The output is a CPDAG. This is an equivalence class of directed acyclic graphs (DAGs). Each DAG in the equivalence class has all of the adjacencies (and no more) of the CPDAG. Each oriented edge in the CPDAG is so oriented in each of the DAGs in the equivalence class. Unoriented edges in the equivalence class cannot be oriented by conditional independence facts. For example, if the model is X->Y->Z, the output will be X—Y—Z. There are no colliders in this model, so the algorithm will not detect one. Since there are no colliders, the Meek rules cannot orient additional edges. If the model were X<-Y<-Z, the output would also be X—Y—Z; this model is in the same equivalence class as X->Y->Z. The model X->Y<-Z would be its own equivalence class, since the collider in this model can be oriented. See Spirtes et al. (2000) for more details.

Parameters

alpha, depth

The CPC Algorithm


Description

The CPC (Conservative PC) algorithm (Ramsey et al., ??) modifies the collider orientation step of PC to make it more conservative; that is, it increases the precision of collider orientations at the expense of recall. It does this as follows. Say you want to orient X—Y—Z as a collider or a noncollider; the PC algorithm looks at variables adjacent to X or variables adjacent to Z to find a subset S such that X is independent of Z conditional on S. The CPC algorithm considers all possible such sets and records the sets on which X is conditionally independent of Z. If all of these sets contain Y, it orients X—Y—Z as a noncollider. If none of them contains Y, it orients X—Y—Z as a collider. If some contain Y but others don’t, it marks the triple as ambiguous, with an underline. Thus, the output is ambiguous between CPDAGs; in order to get a specific CPDAG out of the output, one needs first to decide whether the underlined triples are colliders or noncolliders and then to apply the orientation rules in Meek (1997).

The PC algorithm is correct whenever decision procedures for independence and conditional independence are available. The procedure conducts a
sequence of independence and conditional independence tests, and efficiently builds a CPDAG from the results of those tests. As implemented in
TETRAD, PC is intended for multinomial and approximately Normal distributions with i.i.d. data. The tests have an alpha value for rejecting the null
hypothesis, which is always a hypothesis of independence or conditional independence. For continuous variables, PC uses tests of zero correlation or
zero partial correlation for independence or conditional independence respectively. For discrete or categorical variables, PC uses either a chi square or
a g square test of independence or conditional independence (see Causation, Prediction, and Search for details on tests). In either case, the tests
require an alpha value for rejecting the null hypothesis, which can be adjusted by the user. The procedures make no adjustment for multiple testing.
(For PC, CPC, JPC, JCPC, FCI, all testing searches.)

Note: If one wants to analyze time series data using this algorithm, one may set the time lag parameter to a value greater than 0, which will automatically apply the time lag transform. The same goes for any algorithm that has this parameter available in the interface.
Input Assumptions

Same as for PC.

Output Format

An e-CPDAG (extended CPDAG), consisting of directed and undirected edges where some of the triples may have been marked with underlines to indicate ambiguity, as above. It may be that bidirected edges appear, as in X->Y<->Z<-W, if two adjacent colliders are oriented; this is not ruled out.

Parameters

alpha, depth

The PcMax Algorithm


Description

Similar in spirit to CPC, but orients all unshielded triples using maximum p-value conditioning sets. The idea is as follows. The adjacency search is the same as for PC, but colliders are oriented differently. Let X—Y—Z be an unshielded triple (X not adjacent to Z) and find all subsets S from among the adjacents of X or the adjacents of Z such that X is independent of Z conditional on S. However, instead of using the CPC rule to orient the triple, list the p-values for each of these conditional independence judgments and pick the set S’ that yields the highest such p-value. Then orient X->Y<-Z if S’ does not contain Y, and X—Y—Z otherwise. This orients all unshielded triples. It is possible (though rare) that adjacent triples are both oriented as colliders, producing a bidirected edge as in X->Y<->Z<-W. If this happens, pick one or the other of these triples to orient as a collider, arbitrarily. This guarantees that the resulting graph will be a CPDAG.

Note: If one wants to analyze time series data using this algorithm, one may set the time lag parameter to a value greater than 0, which will automatically apply the time lag transform. The same goes for any algorithm that has this parameter available in the interface.

Input Assumptions

Same as for PC.

Output Format

Same as PC, a CPDAG.

Parameters

alpha, depth, useMaxPOrientationHeuristic, maxPOrientationMaxPathLength

The FGES Algorithm


Description

FGES is an optimized and parallelized version of an algorithm developed by Meek [Meek, 1997] called the Greedy Equivalence Search (GES). The
algorithm was further developed and studied by Chickering [Chickering, 2002]. GES is a Bayesian algorithm that heuristically searches the space of
CBNs and returns the model with highest Bayesian score it finds. In particular, GES starts its search with the empty graph. It then performs a forward
stepping search in which edges are added between nodes in order to increase the Bayesian score. This process continues until no single edge addition
increases the score. Finally, it performs a backward stepping search that removes edges until no single edge removal can increase the score. More
information is available here and here. The reference is Ramsey et al., 2017.

The algorithm requires a decomposable score—that is, a score that for the entire DAG model is a sum of logged scores of each variable given its parents in the model. The algorithm can take all continuous data (using the SEM BIC score), all discrete data (using the BDeu score) or a mixture of continuous and discrete data (using the Conditional Gaussian score); these are all decomposable scores.
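
"Decomposable" means the total score of a DAG is a sum of local scores, one per variable given its parents. For linear, Gaussian data the local piece of a BIC-style score can be sketched like this (an illustration only, not Tetrad's exact SEM BIC score):

    import math
    import numpy as np

    def local_bic(data, child, parents, penalty_discount=1.0):
        # Residual variance of `child` regressed on its `parents` (column indices).
        y = data[:, child]
        n = data.shape[0]
        if parents:
            X = np.column_stack([np.ones(n), data[:, parents]])
            resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        else:
            resid = y - y.mean()
        sigma2 = resid @ resid / n
        k = len(parents) + 1                       # number of local parameters
        return -n * math.log(sigma2) - penalty_discount * k * math.log(n)

    def total_score(data, dag_parents):
        # dag_parents: dict mapping each variable index to the list of its parents' indices.
        return sum(local_bic(data, v, ps) for v, ps in dag_parents.items())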

Note: If one wants to analyze time series data using this algorithm, one may set the time lag parameter to a value greater than 0, which will automatically apply the time lag transform. The same goes for any algorithm that has this parameter available in the interface.

Note: It is possible to run FGES followed by some non-Gaussian orientation algorithm like FASK-pairwise or R3 or RSkew. To do this, see the FASK algorithm. There one can select an algorithm to estimate adjacencies and a pairwise algorithm to estimate orientations. This is for the linear, non-Gaussian case, where such pairwise algorithms are effective.

Input Assumptions

Data that’s all continuous, all discrete, or a mixture of continuous and discrete variables. Continuous variables will be assumed to be linearly associated; discrete variables will be assumed to be associated by multinomial conditional probability tables. Continuous variables for the mixed case will be assumed to be jointly Gaussian.

Output Format

A CPDAG, same as PC.

Parameters

samplePrior, structurePrior, penaltyDiscount, symmetricFirstStep, faithfulnessAssumed, maxDegree, parallelized, verbose, meekVerbose

The IMaGES Algorithm


Description

Adjusts the selected score for FGES so as to allow for multiple datasets as input. The linear, Gaussian BIC scores for each data set are averaged at each step of the algorithm, producing a model for all data sets that assumes they have the same graphical structure across datasets.

Input Assumptions

A set of datasets consistent with the chosen score with the same variables and sample sizes.

Output Format

A CPDAG, interpreted as a common model for all datasets.

Parameters

All of the parameters from FGES are available for IMaGES. Additionally:

numRuns, randomSelectionSize

The FCI Algorithm


Description

The FCI algorithm is a constraint-based algorithm that takes as input sample data and optional background knowledge and in the large sample limit
outputs an equivalence class of CBNs (including those with hidden confounders) that entail the set of conditional independence relations judged to
hold in the population. It is limited to several thousand variables, and on realistic sample sizes it is inaccurate in both adjacencies and orientations. FCI
has two phases: an adjacency phase and an orientation phase. The adjacency phase of the algorithm starts with a complete undirected graph and then
performs a sequence of conditional independence tests that lead to the removal of an edge between any two adjacent variables that are judged to be
independent, conditional on some subset of the observed variables; any conditioning set that leads to the removal of an adjacency is stored. After the
adjacency phase, the resulting undirected graph has the correct set of adjacencies, but all of the edges are unoriented. FCI then enters an orientation
phase that uses the stored conditioning sets that led to the removal of adjacencies to orient as many of the edges as possible. See [Spirtes, 1993].

Note: If one wants to analyze time series data using this algorithm, one may set the time lag parameter to a value greater than 0, which will automatically apply the time lag transform. The same goes for any algorithm that has this parameter available in the interface.

Input Assumptions

The data are continuous, discrete, or mixed.

Output Format

A partial ancestral graph (see Spirtes et al., 2000).

Parameters

All of the parameters from FCI are below.

depth, maxPathLength, completeRuleSetUsed

The FCI-Max Algorithm


Description

The FCI-Max algorithm simply changes the first collider orientation rule in FCI to use the PC-Max orientation.

Note: If one wants to analyze time series data using this algorithm, one may set the time lag parameter to a value greater than 0, which will automatically apply the time lag transform. The same goes for any algorithm that has this parameter available in the interface.

Input Assumptions

The data are continuous, discrete, or mixed.

Output Format

A partial ancestral graph (see Spirtes et al., 2000).

Parameters
All of the parameters from FCI are below.

depth, maxPathLength, completeRuleSetUsed

The RFCI Algorithm


Description

A modification of the FCI algorithm in which some expensive steps are finessed and the output is somewhat differently interpreted. In most cases this
runs faster than FCI (which can be slow in some steps) and is almost as informative. See Colombo et al., 2012.

Note: If one wants to analyze time series data using this algorithm, one may set the time lag parameter to a value greater than 0, which will automatically apply the time lag transform. The same goes for any algorithm that has this parameter available in the interface.

Input Assumptions

Data for which a conditional independence test is available.

Output Format

A partial ancestral graph (PAG). See Spirtes et al., 2000.

Parameters

All of the parameters from FCI are available for RFCI. Additionally:

depth, maxPathLength, completeRuleSetUsed

The PAG-Sampling RFCI Algorithm


Description

A modification of the RFCI algorithm which does probabilistic bootstrap sampling with respect to the RFCI PAG output. For discrete data only. Parameters are: (a) number of search probabilistic models, (b) bootstrap ensemble method to use (see bootstrapping), (c) maximum size of conditioning set (depth), (d) maximum length of any discriminating path (a property for RFCI). This must use the probabilistic test, which must be selected. Parameters for the probabilistic test are (d) independence cutoff threshold, default 0.5, (e) prior equivalent sample size, and (f) whether the cutoff in (d) is used in the independence test calculation; if not, then a coin flip is used (probability 0.5).

Input Assumptions

A discrete dataset.

Output Format

A partial ancestral graph (PAG). See Spirtes et al., 2000.

The GFCI Algorithm


Description

GFCI is a combination of the FGES [FGES, 2016] algorithm and the FCI algorithm [Spirtes, 1993] that improves upon the accuracy and efficiency of
FCI. In order to understand the basic methodology of GFCI, it is necessary to understand some basic facts about the FGES and FCI algorithms. The
FGES algorithm is used to improve the accuracy of both the adjacency phase and the orientation phase of FCI by providing a more accurate initial
graph that contains a subset of both the non-adjacencies and orientations of the final output of FCI. The initial set of nonadjacencies given by FGES is
augmented by FCI performing a set of conditional independence tests that lead to the removal of some further adjacencies whenever a conditioning set
is found that makes two adjacent variables independent. After the adjacency phase of FCI, some of the orientations of FGES are then used to provide
an initial orientation of the undirected graph that is then augmented by the orientation phase of FCI to provide additional orientations. A verbose
description of GFCI can be found here (discrete variables) and here (continuous variables).

Note: If one wants to analyze time series data using this algorithm, one may set the time lag parameter to a value greater than 0, which will automatically apply the time lag transform. The same goes for any algorithm that has this parameter available in the interface.

Input Assumptions

Same as for FCI.

Output Format

Same as for FCI.

Parameters

Uses all of the parameters of FCI (see Spirtes et al., 1993) and FGES (see FGES, 2016).

The GRaSP-FCI Algorithm


Description

GRaSP-FCI is a wrapper around the GRaSP algorithm that replaces the FGES step in GFCI with the more accurate GRaSP algorithm, which reasons by considering permutations of variables. The second collider orientation step in GFCI is also done using permutation reasoning, leaving the discriminating path rule as the only rule that requires a "raw" conditional independence judgment. Ultimately this independence judgment can be decided using the same scoring apparatus as the other permutation steps, so GRaSP-FCI can be treated as a scoring algorithm.

Note: If one wants to analyze time series data using this algorithm, one may set the time lag parameter to a value greater than 0, which will automatically apply the time lag transform. The same goes for any algorithm that has this parameter available in the interface.
Input Assumptions

Same as for FCI.

Output Format

Same as for FCI.

Parameters

Uses all of the parameters of FCI (see Spirtes et al., 1993) and GRaSP.

The SP-FCI Algorithm


Description

SP-FCI is a wrapper around the SP algorithm that replaces the FGES step in GFCI with the more accurate SP algorithm, which reasons by considering permutations of variables. This uses the same method for wrapping SP with an FCI method similar to GFCI as GRaSP-FCI. The difference is that SP considers every permutation of the variables and so necessarily returns a frugal DAG. This can only be used for very small models, of up to about 8 or 9 variables, because of the super-exponential step of considering every permutation of the variables. The second collider orientation step in GFCI is done using permutation reasoning, leaving the discriminating path rule as the only rule that requires a "raw" conditional independence judgment. Ultimately this independence judgment can be decided using the same scoring apparatus as the other permutation steps, so ultimately SP-FCI can be treated as a scoring algorithm.

Input Assumptions

Same as for FCI.

Output Format

Same as for FCI.

Parameters

Uses all of the parameters of FCI (see Spirtes et al., 1993) and SP.

The SvarFCI Algorithm


Description

The SvarFCI algorithm is a version of FCI for time series data. See the FCI documentation for a description of the FCI algorithm, which allows for
unmeasured (hidden, latent) variables in the data-generating process and produces a PAG (partial ancestral graph). svarFCI takes as input a “time lag
data set,” i.e., a data set which includes time series observations of variables X1, X2, X3, ..., and their lags X1:1, X2:1, X3:1, ..., X1:2, X2:2, X3:2, ... and
so on. X1:n is the nth-lag of the variable X1. To create a time lag data set from a standard tabular data set (i.e., a matrix of observations of X1, X2, X3,
...), use the “create time lag data” function in the data manipulation toolbox. The user will be prompted to specify the number of lags (n), and a new data
set will be created with the above naming convention. The new sample size will be the old sample size minus n.
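
For reference, the naming convention above can be sketched as follows (an illustration only, not the Data box converter; it assumes the tabular data is a pandas DataFrame with one row per time step):

    import pandas as pd

    def time_lag_data(data: pd.DataFrame, num_lags: int) -> pd.DataFrame:
        # Column X becomes X (current), X:1 (one step back), ..., X:num_lags.
        pieces = {}
        for lag in range(num_lags + 1):
            shifted = data.shift(lag).iloc[num_lags:]
            for col in data.columns:
                name = col if lag == 0 else f"{col}:{lag}"
                pieces[name] = shifted[col].to_numpy()
        # The new sample size is the old sample size minus num_lags.
        return pd.DataFrame(pieces)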

Since this algorithm specifically requires time series data, one must set the time lag parameter to a value greater than 0, which will automatically apply the time lag transform.

Input Assumptions

The (continuous) data has been generated by a time series, and the "Convert to Time Lag" converter in the Data box has been used to format the data as a time lag dataset. (Manual formatting of the data will not work for this.)

Output Format

A PAG over the input variables with stated number of lags.

Parameters

alpha

The SvarGFCI Algorithm


Description

SvarGFCI uses a BIC score to search for a skeleton. Thus, the only user-specified parameter is an optional “penalty score” to bias the search in favor of
more sparse models. See the description of the GES algorithm for discussion of the penalty score. For the traditional definition of the BIC score, set the
penalty to 1.0. The orientation rules are the same as for FCI. As is the case with SvarFCI, SvarGFCI will automatically respect the time order of the
variables and impose a repeating structure. Firstly, it puts lagged variables in appropriate tiers so, e.g., X3:2 can cause X3:1 and X3 but X3:1 cannot
cause X3:2 and X3 cannot cause either X3:1 or X3:2. Also, it will assume that the causal structure is the same across time, so that if the edge between
X1 and X2 is removed because this increases the BIC score, then also the edge between X1:1 and X2:1 is removed, and so on for additional lags if
they exist. When some edge is removed as the result of a score increase, all similar (or “homologous”) edges are also removed.

Since this algorithm specifically requires time series data, one must set the time lag parameter to a value greater than 0, which will automatically apply the time lag transform.

Input Assumptions

The (continuous) data has been generated by a time series, and the "Convert to Time Lag" converter in the Data box has been used to format the data
as a time lag dataset. (Manual formatting of the data will not work for this.)

Output Format

A PAG over the input variables with stated number of lags.

Parameters

Uses all of the parameters of FCI (see Spirtes et al., 1993) and FGES (see FGES, 2016).

Input Assumptions

The (continuous) data has been generated by a time series.

Output Format

A PAG over the input variables with stated number of lags.

Parameters

Uses the parameters of IMaGES.

The CCD (Cyclic Causal Discovery) Algorithm


Description

CCD assumes that the data are causally sufficient (no latent variables) though possibly with directed cycles. No background knowledge is permitted. It
generates a "cyclic PAG". See Richardson, T. S. (2013). A discovery algorithm for directed cyclic graphs. arXiv preprint arXiv:1302.3599, for details.
Note that the output graph contains circle endpoints as with a latent variable PAG, but these are interpreted differently. CCD reasons about cyclic
(feedback) models using conditional independence facts alone, as with PC or FCI, so is general in this sense.

Input Assumptions

Data from a possibly cyclic (feedback) model without latent variables for which an independence test is available.

Output Format

Same as for FCI in form; the result is a cyclic PAG, with endpoints interpreted for the cyclic case (see the reference above).

Parameters

Cutoff for alpha, maximum size of conditioning set, Yes if orient away from arrow rule should be applied, Yes if verbose output should be printed.

The FGES-MB Algorithm


Description

This is a restriction of the FGES algorithm to the union of edges over the combined Markov blankets of a set of targets, including the targets. In the
interface, just one target may be specified. See Ramsey et al., 2017 for details. In the general case, finding the graph over the Markov blanket variables
of a target (including the target) is far faster than finding the CPDAG for all of the variables.

Input Assumptions

The same as FGES

Output Format

A graph over a selected group of nodes that includes the target and each node in the Markov blanket of the target. This will be the same as if FGES
were run and the result restricted to just these variables, so some edges may be oriented in the returned graph that may not have been oriented in a
CPDAG over the selected nodes.

Parameters

Uses the parameters of FGES (see FGES, 2016).

targetName

The PC-MB Algorithm


Description

PC-MB. Similar to FGES-MB (see FGES, 2016) but using PC as the basic search instead of FGES. The rules of the PC search are restricted to just the
variables in the Markov blanket of a target T, including T; the result is a graph that is a CPDAG over these variables.

Input Assumptions

Same as for PC

Output Format

A CPDAG over a selected group of nodes that includes the target and each node in the Markov blanket of the target.

Parameters

Uses the parameters of PC.

targetName

The FAS Algorithm


Description

This is just the adjacency search of the PC algorithm, included here for times when just the adjacency search is needed, as when one is subsequently
just going to orient variables pairwise.

Input Assumptions

Same as for PC

Output Format

An undirected graph over the variables of the input dataset. In particular, parents of a variable are not married by FAS, so the resulting graph is not a
Markov random field. For example, if X->Y<-Z, the output will be X—Y—Z without an X—Z edge. The parents of Y will be joined by an undirected edge
only if they are joined by a trek in the true graph.

Parameters

alpha, depth

The MGM Algorithm


Description

Need reference. Finds a Markov random field (with parents married) for a dataset in which continuous and discrete variables are mixed together. For
example, if X->Y<-Z, the output will be X—Y—Z with X—Z. The parents of Y will be joined by an undirected edge, morally, even though this edge does
not occur in the true model.

Input Assumptions

Data are mixed.

Output Format

A Markov random field for the data.

Parameters

mgmParam1, mgmParam2, mgmParam3

The SP Algorithm
Description

SP (Sparsest Permutation; Raskutti, G., & Uhler, C. (2018). Learning directed acyclic graph models based on sparsest permutations. Stat, 7(1), e183)
searches for a model satisfying the SMR (frugality) assumption, for small models of up to about 9 variables.

The algorithm works by searching over all possible permutations of the variables and building DAGs for them in the same way as the GRaSP algorithm.
Two ways of building DAGs are considered: an independence-based method, due to Raskutti and Uhler (the "Pearl" method), and a score-based method,
using the Grow-Shrink (GS) method (Margaritis and Thrun, 1999). If the Pearl method is selected, an independence test will be used; if the GS method
is selected, a score will be used; both a test and a score need to be supplied so that this choice can be made.

Input Assumptions

Causal sufficiency

Output Format

The CPDAG equivalence class of DAGs representing estimated possible causal structures over the set of variables.

Parameters

graspDepth, graspNonSingularDepth, graspSingularDepth, graspOrderedAlg, graspUseVermaPearl, verbose, numStarts

The GRaSP Algorithm


Description

GRaSP (Greedy Relaxations of the Sparsest Permutation) is an algorithm that generalizes and extends the GSP (Greedy Sparsest Permutation)
algorithm. It specifically generalizes the algorithms TSP (of which GSP is a bounded and iterated version) and ESP (Solus et al., 2017), allowing them
to be run equivalently by selecting particular parameterizations of GRaSP; ESP enlarges the search space of GSP. The implementation given for
ESP renders that algorithm tractable. By choosing other parameterizations of GRaSP, it is possible to enlarge the search space even further, over and
above that of ESP, rendering the algorithm significantly more accurate. In all cases, these algorithms finish quickly enough to accurately analyze
sparse problems of up to 300 variables and denser problems of up to about 100 variables.

The parameters are as follows:

graspDepth - This controls the overall recursion depth for a depth first search.

graspNonsingularDepth - This controls the depth at which nonsingular tucks are explored.

graspSingularDepth - This controls the depth at which singular tucks are considered.

numRestarts - By default 1; if > 1, additional random restarts are done, and the best of these results is returned.

TSP corresponds to singular depth 0 and nonsingular depth 0. ESP corresponds to singular depth > 0 and nonsingular depth = 0. GRaSP corresponds
to singular depth > 0 and nonsingular depth > 0. In each case, an ordering option is available to find best permutations from lower levels before
proceeding to higher levels.

The algorithm works by building DAGs given permutations, in ways similar to those described in Raskutti and Uhler (2018) and Solus et al. (2017). Two
ways of building DAGs are considered: an independence-based method, due to Raskutti and Uhler (the "Pearl" method), and a score-based method,
using the Grow-Shrink (GS) method (Margaritis and Thrun, 1999). If the Pearl method is selected, an independence test will be used; if the GS method
is selected, a score will be used; both a test and a score need to be supplied so that this choice can be made.

We recommend that the user turn on logging to watch the progress of the algorithm; this helps with larger searches especially.

Knowledge of forbidden edges may be used with GRaSP; currently knowledge of required edges is not implemented.

Dimitris Margaritis and Sebastian Thrun. Bayesian network induction via local neighborhoods. Advances in neural information processing systems, 12,
1999.

Raskutti, G., & Uhler, C. (2018). Learning directed acyclic graph models based on sparsest permutations. Stat, 7(1), e183.

Solus, L., Wang, Y., Matejovicova, L., & Uhler, C. (2017). Consistency guarantees for permutation-based causal inference algorithms. arXiv preprint
arXiv:1702.03530.
Input Assumptions

Causal sufficiency

Output Format

The CPDAG equivalence class of DAGs representing estimated possible causal structures over the set of variables.

Parameters

graspDepth, graspNonSingularDepth, graspSingularDepth, graspOrderedAlg, graspUseVermaPearl, verbose, numStarts

The BPC/Mimbuild Algorithm


Description

Searches for causal structure over latent variables, where the true models are Multiple Indicator Models (MIM’s) as described in the Graphs section.
The idea is this. There is a set of latent (unmeasured) variables over which a directed acyclic model has been defined. Then for each of these latents L
there are 3 (preferably 4) or more measures of that variable—that is, measured variables that are all children of L. Under these conditions, one may
define tetrad constraints (see Spirtes et al., 2000). There is a theorem to the effect that if certain patterns of these tetrad constraints hold, there must
be a latent common cause of all of them (the Tetrad Representation Theorem; see Silva et al., 2003, where the BPC (“Build Pure Clusters”) algorithm is
defined and discussed). Moreover, once one has such a “measurement model,” one can estimate a covariance matrix over the latent variables that are
parents of the measures and use some algorithm such as PC or GES to estimate a CPDAG over the latents. The algorithm that runs PC or GES on this
covariance matrix is called MimBuild (“MIM” refers to the graph, a Multiple Indicator Model; “Build” means build). In this way, one may recover causal
structure over the latents. The more measures one has for each latent, the better the result is, generally. The larger the sample size, the better. One
important issue is that the algorithm is sensitive to so-called “impurities”—that is, causal edges among the measured variables, or between measured
variables and unintended latents. The algorithm will in effect remove one measure in each impure pair from consideration.

Input Assumptions

Continuous data, a collection of measurements in the above sense, excluding the latent variables (which are to be learned).

Output Format

For BPC, a measurement model, in the above sense. This is represented as a clustering of variables; it may be inferred that there is a single latent for
each output cluster. For MimBuild, a CPDAG over the latent variables, one for each cluster.

Parameters

alpha, useWishart

The FOFC/MIMBUILD Algorithm


Description

Searches for causal structure over latent variables, where the true models are Multiple Indicator Models (MIM’s) as described in the Graphs section.
The idea is this. There is a set of latent (unmeasured) variables over which a directed acyclic model has been defined. Then for each of these latents L
there are 3 (preferably 4) or more measures of that variable—that is, measured variables that are all children of L. Under these conditions, one may
define tetrad constraints (see Spirtes et al., 2000). There is a theorem to the effect that if certain patterns of these tetrad constraints hold, there must
be a latent common cause of all of them (the Tetrad Representation Theorem). The FOFC (Find One Factor Clusters) algorithm takes advantage of this
fact. The basic idea is to build up clusters one at a time by adding variables that keep them pure in the sense that all relevant tetrad constraints still
hold. There are different ways of going about this. One could try to build one cluster up as far as possible, then remove all of those variables from the
set, and try to make another cluster using the remaining variables (SAG, Seed and Grow). Or one can try in parallel to grow all possible clusters and
then choose among the grown clusters using some criterion such as cluster size (GAP, Grow and Pick). In general, GAP is more accurate. The result is
a clustering of variables. Once one has such a “measurement model,” one can estimate (using the ESTIMATOR box) a covariance matrix over the latent
variables that are parents of the measures and use some algorithm such as PC or GES to estimate a CPDAG over the latent variables. The algorithm
that runs PC or GES on this covariance matrix is called MimBuild (“MIM” refers to the graph, a Multiple Indicator Model; “Build” means build). MimBuild
is an optional choice inside FOFC. In this way, one may recover causal structure over the latents. The more measures one has for each latent, the
better the result is, generally. At least 3 measured indicator variables are needed for each latent variable. The larger the sample size, the better. One
important issue is that the algorithm is sensitive to so-called “impurities”—that is, causal edges among the measured variables, or between measured
variables and multiple latent variables. The algorithm will in effect remove one measure in each impure pair from consideration. Note that for FOFC, a
test is done for each final cluster as to whether the variables in the cluster are all mutually dependent. In the interface, in order to see the results of
this test, one needs to open the logging window. See the Logging menu.

The FTFC Algorithm


Description

FTFC (Find Two Factor Clusters) is similar to FOFC, but instead of each cluster having one latent that is the parent of all of the measures in the cluster,
each cluster has two such latents. So each measure has two latent parents; these are two “factors.” Similarly to FOFC, constraints are checked for, but in
this case the constraints must be sextad constraints, and more of them must be satisfied for each pure cluster (see Kummerfeld et al., 2014). Thus, the
number of measures in each cluster, once impure edges have been taken into account, must be at least six, preferably more.

Input Assumptions

Continuous data over the measures with at least six variables in each cluster, once variables involved in impure edges have been removed.

Output Format

A clustering of measures. It may be assumed that each cluster has at least two factors and that the clusters are pure.

Parameters

alpha, useWishart, useGap

The LiNGAM Algorithm


Description

LiNGAM (Shimizu et al., 2006) was one of the first algorithms to assume linearity among the variables and non-Gaussianity of the error terms, and it is
still one of the best for smaller models; the basic algorithm is what is implemented here. The idea is to use the Independent Components Analysis (ICA)
algorithm to check all permutations of the variables to find one that is a causal order—that is, one in which earlier variables can cause later variables
but not vice versa. The method is clever. First, since we assume the model is a directed acyclic graph (DAG), there must be some permutation of the
variables for which the main diagonal of the inverse of the weight matrix contains no zeros. This gives us a permuted estimate of the weight matrix.
Then we look for a permutation of this weight matrix that is lower triangular. There must be one, since the model is assumed to be a DAG. But a lower
triangular weight matrix just gives a causal order, so we’re done.
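
For concreteness, here is a minimal illustrative sketch (in Python) of this permutation search. It is not Tetrad's implementation: it assumes scikit-learn's FastICA for the ICA step and uses a brute-force search over permutations, which is only feasible for a handful of variables:

```python
import numpy as np
from itertools import permutations
from scipy.optimize import linear_sum_assignment
from sklearn.decomposition import FastICA

def lingam_causal_order(X):
    """Illustrative ICA-LiNGAM order search (brute force; small numbers of variables only).

    X is an n x p data matrix. Returns a causal order (a list of column indices)
    such that earlier variables may cause later ones but not vice versa.
    """
    n, p = X.shape
    ica = FastICA(n_components=p, max_iter=2000)
    ica.fit(X)
    W = ica.components_  # estimated unmixing matrix

    # 1. Permute the rows of W so the diagonal has no (near-)zeros; equivalently,
    #    solve an assignment problem minimizing the sum of 1/|w_ii|.
    cost = 1.0 / (np.abs(W) + 1e-12)
    rows, cols = linear_sum_assignment(cost)
    W_perm = W[np.argsort(cols)]  # the row assigned to column i is moved to row i

    # 2. Normalize each row by its diagonal entry and form B = I - W'.
    W_norm = W_perm / np.diag(W_perm)[:, None]
    B = np.eye(p) - W_norm

    # 3. Find the variable permutation making B as close to lower triangular as
    #    possible (the sum of squared upper-triangular entries is minimized).
    best_order, best_cost = None, np.inf
    for order in permutations(range(p)):
        Bp = B[np.ix_(order, order)]
        c = np.sum(np.triu(Bp, k=1) ** 2)
        if c < best_cost:
            best_cost, best_order = c, list(order)
    return best_order
```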

In the referenced paper, we implement Algorithm A, which is described above. Once one has a causal order the only thing one needs to do is to
eliminate the extra edges. For this, we use the causal order to define knowledge of tiers and run FGES.

Our implementation of LiNGAM has one parameter, penalty discount, used for the FGES adjacency search. The method as implemented does not scale
much beyond 10 variables, because it is checking every permutation of all of the variables (twice). The implementation of ICA we use is FastIca
(Hyvärinen et al., 2004).

Shimizu, S., Hoyer, P. O., Hyvärinen, A., & Kerminen, A. (2006). A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning
Research, 7(Oct), 2003-2030.

Hyvärinen, A., Karhunen, J., & Oja, E. (2004). Independent component analysis (Vol. 46). John Wiley & Sons.

The FASK Algorithm


Description

FASK learns a linear model in which all of the variables are skewed.

The idea is as follows. First, FAS-stable is run on the data, producing an undirected graph. We use the BIC score as a conditional independence test
with a specified penalty discount c. This yields an undirected graph G0. The reason FAS-stable works for sparse cyclic models where the linear
coefficients are all less than 1 is that correlations induced by long cyclic paths are statistically judged as zero, since they are products of multiple
coefficients less than 1. Then, each of the X − Y adjacencies in G0 is oriented as a 2-cycle (both X → Y and X ← Y), or as X → Y, or as X ← Y. Taking
up each adjacency in turn, one tests to see whether the adjacency is a 2-cycle by testing whether the difference between corr(X, Y) and corr(X, Y | X > 0),
and the difference between corr(X, Y) and corr(X, Y | Y > 0), are both significantly nonzero. If so, the edges X → Y and X ← Y are both added to the
output graph G1. If not, the left-right orientation rule is applied: orient X → Y in G1 if
E(XY | X > 0) / sqrt(E(X^2 | X > 0) E(Y^2 | X > 0)) − E(XY | Y > 0) / sqrt(E(X^2 | Y > 0) E(Y^2 | Y > 0)) > 0;
otherwise orient X ← Y. G1 will be a fully oriented graph. For some models, where the true coefficients of a 2-cycle between X and Y are more or less
equal in magnitude but opposite in sign, FAS-stable may fail to detect an edge between X and Y when in fact a 2-cycle exists. In this case, we check
explicitly whether corr(X, Y | X > 0) and corr(X, Y | Y > 0) differ by more than a set amount of 0.3. If so, the adjacency is added to the graph and oriented
using the aforementioned rules.
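
As an illustration only (a sketch of the rules just described, not Tetrad's code), the left-right rule and the skewness-based adjacency check can be written for a single pair of variables as follows; x and y are assumed to be centered and adjusted to have positive skew:

```python
import numpy as np

def left_right(x, y):
    """FASK-style left-right statistic; a positive value suggests X -> Y."""
    def cond_mean(values, condition):
        return np.mean(values[condition])
    lx = cond_mean(x * y, x > 0) / np.sqrt(cond_mean(x * x, x > 0) * cond_mean(y * y, x > 0))
    ly = cond_mean(x * y, y > 0) / np.sqrt(cond_mean(x * x, y > 0) * cond_mean(y * y, y > 0))
    return lx - ly

def skew_edge_check(x, y, threshold=0.3):
    """Add an X--Y adjacency if the two conditional correlations differ by more than the threshold."""
    c_xpos = np.corrcoef(x[x > 0], y[x > 0])[0, 1]
    c_ypos = np.corrcoef(x[y > 0], y[y > 0])[0, 1]
    return abs(c_xpos - c_ypos) > threshold

# For an adjacency X--Y that is not judged to be a 2-cycle:
# orient X -> Y if left_right(x, y) > 0, and X <- Y otherwise.
```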

We include the pairwise orientation rules RSkew, Skew, and Tanh from Hyvärinen, A., & Smith, S. M. (2013). Pairwise likelihood ratios for estimation of non-
Gaussian structural equation models. Journal of Machine Learning Research, 14(Jan), 111-152, so in some configurations FASK can be made to
implement an algorithm that has been called "Pairwise LiNGAM" in the literature--this is intentional; we do it for ease of comparison. You will get this
configuration if, for instance, you choose one of these pairwise orientation rules together with the FAS adjacency search, with the orientation alpha and
two-cycle threshold set to zero and the skewness threshold set to 1.

See Sanchez-Romero R, Ramsey JD, Zhang K, Glymour MR, Huang B, Glymour C. Causal discovery of feedback networks with functional magnetic
resonance imaging. Network Neuroscience 2018.
Input Assumptions

Continuous, linear data in which all the variables are skewed.

Output Format

A fully directed, potentially cyclic, causal graph.

The FASK-Vote Algorithm


Description

FASK-Vote is a metascript that learns a model from a list of datasets in a manner similar to IMaGES (see). For adjacencies, it uses FAS-Stable with the
voting-based score from IMaGES used as a test (using all of the datasets, standardized), producing a single undirected graph G. It then orients each
edge X--Y in G for each dataset using the FASK (see) left-right rule and orients X->Y if that rule orients X--Y as such in at least half of the datasets. The
final graph is returned.

For FASK, See Sanchez-Romero R, Ramsey JD, Zhang K, Glymour MR, Huang B, Glymour C. Causal discovery of feedback networks with functional
magnetic resonance imaging. Network Neuroscience 2018.

Input Assumptions

Same as FASK.

Output Format

Same as FASK.

Orientation Algorithms (R3, RSkew, Skew)


Description

This is an algorithm that orients an edge X--Y for continuous variables based on non-Gaussian information. This rule in particular uses an entropy
calculation to make the orientation. Note that if the variables X and Y are both Gaussian, and the model is linear, it is not possible to orient the edge X--
Y pairwise; any attempt to do so would result in random orientation. But if X and Y are non-Gaussian, the orientation is fairly easy. This rule is similar to
Hyvarinen and Smith's (2013) EB rule, but using Anderson-Darling for the measure of non-Gaussianity, to somewhat better effect. See Ramsey et al.
(2012).

This is an algorithm that orients an edge X--Y for continuous variables based on non-Gaussian information. This rule in particular uses skewness to
make the orientation. Note that if the variables X and Y are both Gaussian, and the model is linear, it is not possible to orient the edge X--Y pairwise;
any attempt to do so would result in random orientation. But if X and Y are non-Gaussian, in particular in this case, if X and Y are skewed, the
orientation is relatively straightforward. See Hyvarinen and Smith (2013) for details.

The Skew rule is differently motivated from the RSkew rule (see), though they both appeal to the skewness of the variables.

This is an algorithm that orients an edge X--Y for continuous variables based on non-Gaussian information. This rule in particular uses skewness to
make the orientation. Note that if the variables X and Y are both Gaussian, and the model is linear, it is not possible to orient the edge X--Y pairwise;
any attempt to do so would result in random orientation. But if X and Y are non-Gaussian, in particular in this case, if X and Y are skewed, the
orientation is relatively straightforward. See Hyvarinen and Smith (2013) for details.

The RSkew rule is differently motivated from the Skew rule (see), though they both appeal to the skewness of the variables.

This is an algorithm that orients an edge X--Y for continuous variables based on non-Gaussian information. This rule in particular uses the FASK
pairwise rule to make the orientation. Note that if the variables X and Y are both Gaussian, and the model is linear, it is not possible to orient the edge
X--Y pairwise; any attempt to do so would result in random orientation. But if X and Y are non-Gaussian, in particular in this case, if X and Y are
skewed, the orientation is relatively straightforward. See Hyvarinen and Smith (2013) for details.

The FASK-PW rule appeals to skewness in a different way than Skew and RSkew.
Input Assumptions

Continuous data in which the variables are non-Gaussian. Non-Gaussianity can be assessed using the Anderson-Darling score, which is available in
the Data box.

Output Format

Orients all of the edges in the input graph using the selected score.

Parameters

alpha, depth

The CStaR Algorithm


Description

The CStaR algorithm (Causal Stability Ranking; Stekhoven, D. J., Moraes, I., Sveinbjörnsson, G., Hennig, L., Maathuis, M. H., & Bühlmann, P. (2012).
Causal stability ranking. Bioinformatics, 28(21), 2819-2823) calculates lower bounds on estimated parameters for the causally sufficient case. It first
runs a CPDAG algorithm and then, for each X->Y, locally about Y considers all possible orientations of the edges in the CPDAG, does an estimation for
each of these, and takes the lower bound. In the interface, all nodes that are found to have a significant impact on a given target node are marked as
pointing into that target node. However, the more useful thing is to look at the CStaR table produced by the procedure. To see this table, either run
CStaR on a dataset, specifying a target node, and the table will be printed out; or run Tetrad using java -jar and look at the console output; or, if these
methods are not available, turn on logging in the interface before running the method, and the table will be printed out. The table is to be interpreted as
in the Stekhoven et al. paper cited above, and is in the same format.

Input Assumptions

Same as for PC.

Statistical Tests
All of the below tests do testwise deletion as a default way of dealing with missing values. For testwise deletion, if a test, say, I(X, Y | Z), is done,
columns for X, Y, and Z are scanned for missing values. If any row occurs in which X, Y, or Z is missing, that row is deleted from the data for those three
variables. So if a different test, I(R, W | Q, T) is done, different rows may be stricken from the data. That is, the deletion is done testwise. For a useful
discussion of the testwise deletion condition, see for instance Tu, R., Zhang, C., Ackermann, P., Mohan, K., Kjellström, H., & Zhang, K. (2019, April).
Causal discovery in the presence of missing data. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 1762-1770). PMLR.
For all of these tests, if no data are missing, the behavior will be as if testwise deletion were not being done.
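
A minimal sketch of the row-selection step of testwise deletion (illustrative only; it assumes the data are in a pandas DataFrame), for a single test I(X, Y | Z):

```python
import pandas as pd

def testwise_rows(df: pd.DataFrame, x: str, y: str, z: list) -> pd.DataFrame:
    """Keep only the rows with no missing values in the columns used by this test.

    Different tests involve different columns, so different rows may be dropped
    for different tests; rows are never removed from the dataset globally.
    """
    cols = [x, y] + list(z)
    return df[cols].dropna()

# Example: the rows used for I(X1, X2 | X3) and for I(X4, X5 | X6) may differ.
# rows_a = testwise_rows(data, "X1", "X2", ["X3"])
# rows_b = testwise_rows(data, "X4", "X5", ["X6"])
```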

BDeu Test
This is a test based on the BDeu score given in Heckerman, D., Geiger, D., & Chickering, D. M. (1995). Learning Bayesian networks: The combination
of knowledge and statistical data. Machine learning, 20(3), 197-243. The score gives, for any two variables conditioned on any list of others, a value
which is more positive for distributions which are more strongly dependent. The test for X _||_ Y | Z compares two different models: X conditional on Z
together with Y, and X conditional on Z alone; the scores for the two models are subtracted, in that order. If the difference is negative, independence is inferred.

Parameters

equivalentSampleSize, structurePrior

Fisher Z Test
Fisher Z judges independence if the conditional correlation cannot statistically be distinguished from zero. It is intended primarily for the linear, Gaussian case.
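
For orientation, here is a minimal sketch of a Fisher Z style test (illustrative, not Tetrad's implementation): the partial correlation is computed from regression residuals, transformed with the Fisher z transform, and compared to a normal quantile:

```python
import numpy as np
from scipy import stats

def fisher_z_independent(x, y, Z, alpha=0.01):
    """Return True if X _||_ Y | Z is judged to hold at level alpha.

    x, y: 1-D arrays of length n; Z: an n x k matrix of conditioning variables (k may be 0).
    """
    n = len(x)

    def residual(v):
        A = np.column_stack([np.ones(n), Z]) if Z.shape[1] > 0 else np.ones((n, 1))
        beta, *_ = np.linalg.lstsq(A, v, rcond=None)
        return v - A @ beta

    r = np.corrcoef(residual(x), residual(y))[0, 1]
    # Fisher transform; approximately standard normal under the null of zero partial correlation.
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - Z.shape[1] - 3)
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    return p_value > alpha
```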

Parameters

alpha

SEM BIC Test


This uses the SEM BIC Score to create a test for the linear, Gaussian case, where we include an additional penalty term, which is commonly used; we
call this the penalty discount. So our formula is BIC = 2L - c k ln N, where L is the likelihood, c the penalty discount (usually greater than or equal to
1), k the number of parameters, and N the sample size. Since the assumption is that the data are distributed as Gaussian, this reduces to
BIC = -n ln sigma - c k ln N, where sigma is the standard deviation of the linear residual obtained by regressing a child variable onto all of its parents in the model.
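
As an illustration of the score-used-as-test idea (a sketch under the formula above, with additive constants dropped; conventions for the log-sigma term vary slightly), one can compare the penalized likelihood of X regressed on Z with and without Y and infer independence when adding Y does not pay for its extra parameter:

```python
import numpy as np

def sem_bic(child, parents, c=1.0):
    """Roughly 2L - c*k*ln(N), computed from the residual variance of a linear regression."""
    n = len(child)
    A = np.column_stack([np.ones(n)] + list(parents))
    beta, *_ = np.linalg.lstsq(A, child, rcond=None)
    sigma2 = np.mean((child - A @ beta) ** 2)
    k = A.shape[1]  # number of estimated coefficients
    return -n * np.log(sigma2) - c * k * np.log(n)

def bic_independent(x, y, Z, c=1.0):
    """Judge X _||_ Y | Z: independence if adding Y does not improve the score."""
    return sem_bic(x, list(Z) + [y], c) - sem_bic(x, list(Z), c) < 0
```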

Parameters

penaltyDiscount

Kim et al. Scores


This is a set of generalized information criterion (GIC) scores, used as tests, based on the paper, Kim, Y., Kwon, S., & Choi, H. (2012). Consistent
model selection criteria on high dimensions. The Journal of Machine Learning Research, 13(1), 1037-1057. One needs to select which lambda to use
in place of the usual lambda for the linear, Gaussian BIC score. A penalty discount parameter may also be specified, though for these scores it
defaults to 1 (since the lambda choice is essentially picking a penalty discount for you).

MAG SEM BIC Test


This gives a BIC score (used as a test here) for a Mixed Ancestral Graph (MAG).

Probabilistic Test
The Probabilistic Test applies a Bayesian method to derive the posterior probability of an independence constraint R = (X⊥Y|Z) given a dataset D. This
is intended for use with datasets with discrete variables. It can be used with constraint-based algorithms (e.g., PC and FCI). Since this test provides a
probability for each independence constraint, it can be used stochastically by sampling based on the probabilities of the queried independence
constraints to obtain several output graphs. It can also be used deterministically by using a fixed decision threshold on the probabilities of the queried
independence constraints to generate a single output graph.

Parameters

noRandomlyDeterminedIndependence, cutoffIndTest, priorEquivalentSampleSize

Conditional Correlation Independence (CCI) Test


CCI ("Conditional Correlation Independence") is a fairly general independence test—not completely general, but general for additive noise models—that
is, model in which each variable is equal to a (possibly nonlinear) function of its parents, plus some additive noise, where the noise may be arbitrarily
distributed. That is, X = f(parent(X)) + E, where f is any function and E is noise however distributed; the only requirement is that thre be the “+” in the
formula separating the function from the noise. The noise can’t for instance, be multiplicative, e.g., X = f(parent(X)) x E. The goal of the method is to
estimate whether X is independent of Y given variables Z, for some X, Y, and Z. It works by calculating the residual of X given Z and the residual of Y
given Z and looking to see whether those two residuals are independent. This test may be used with any constraint-based algorithm (PC, FCI, etc.).
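
A highly simplified sketch of the residual idea behind CCI (illustrative only; the actual test also checks independence of nonlinear functions of the residuals), assuming a polynomial basis for the nonlinear regressions:

```python
import numpy as np

def poly_residual(v, Z, degree=3):
    """Residual of v after regressing it on a polynomial basis in the columns of Z."""
    n = len(v)
    cols = [np.ones(n)]
    for j in range(Z.shape[1]):
        for d in range(1, degree + 1):
            cols.append(Z[:, j] ** d)
    A = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ beta

def cci_like_statistic(x, y, Z, degree=3):
    """Correlation of the residuals of X and Y given Z; a value near zero suggests independence."""
    return np.corrcoef(poly_residual(x, Z, degree), poly_residual(y, Z, degree))[0, 1]
```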

Parameters

alpha, numBasisFunctions, kernelType, kernelMultiplier, basisType, kernelRegressionSampleSize

Chi Square Test


This is the usual Chi-Square test for discrete variables; consult an introductory statistics book for details for the unconditional case, where you're just
trying, e.g., to determine if X and Y are independent. For the conditional case, the test proceeds as in Fienberg, S. E. (2007). The analysis of cross-
classified categorical data, Springer Science & Business Media, by identifying and removing from consideration zero rows or columns in the conditional
tables and judging dependence based on the remaining rows and columns.

Parameters

alpha

D-Separation Test
This is the usual test of d-separation, a property of graphs, not distributions. It's not really a test, but it can be used in place of a test if the true graph is
known. This is a way to find out, for constraint-based algorithms, or even for some score-based algorithms like FGES, what answer the algorithm would
give if all of the statistical decisions made are correct. Just draw an edge from the true graph to the algorithm--the d-separation option will appear, and
you can then just run the search as usual.

Discrete BIC Test


This is a BIC score for the discrete case, used as a test. The likelihood is judged by the multinomial tables directly, and this is penalized as is usual for a
BIC score. The only surprising thing perhaps is that we use the formula BIC = 2L - k ln N, where L is the likelihood, k the number of parameters, and N
the sample size, instead of the usual L - (k / 2) ln N. So higher BIC scores will correspond to greater dependence. In the case of independence, the BIC
score will be negative, since the likelihood gain will be near zero and the penalty will dominate. The test yields a p-value; we simply use alpha - p as the
score, where alpha is the cutoff for rejecting the null hypothesis of independence. This is a number that is positive for dependent cases and negative for
independent cases.

Parameters

penaltyDiscount, structurePrior

G Square Test
This is completely parallel to the Chi-Square statistic, using a slightly different method for estimating the statistic. The alternative statistic is still
distributed as chi-square in the limit. In practice, this statistic is more or less indistinguishable in most cases from Chi-Square. For an explanation, see
Spirtes, P., Glymour, C. N., Scheines, R., Heckerman, D., Meek, C., Cooper, G., & Richardson, T. (2000). Causation, prediction, and search. MIT press.

Parameters

alpha

Kernel Conditional Independence (KCI) Test


KCI ("Kernel Conditional Independence") is a general independence test for model in which X = f(parents(X), eY); here, eY does not need to be
additive; it can stand in any functional relationships to the other variables. The variables may even be discrete. The goal of the method is to estimate

cmu-phil.github.io/tetrad/manual/ 57/90
6/3/23, 12:00 PM Tetrad Single HTML Manual
whether X is independent of Y given Z, completely generally. It uses the kernel trick to estimate this. As a result of using the kernel trick, the method is
complex in the direction of sample size, meaning that it may be very slow for large samples. Since it’s slow, individual independence results are always
printed to the console so the user knows how far a procedure has gotten. This test may be used with any constraint-based algorithm (PC, FCI, etc.)
Parameters

alpha, kciUseAppromation, kernelMultiplier, kciNumBootstraps, thresholdForNumEigenvalues, kciEpsilon

Conditional Gaussian Likelihood Ratio Test


Conditional Gaussian Test is a likelihood ratio test based on the conditional Gaussian likelihood function. This is intended for use with datasets where
there is a mixture of continuous and discrete variables. It is assumed that the continuous variables are Gaussian conditional on each combination of
values for the discrete variables, though it will work fairly well even if that assumption does not hold strictly. This test may be used with any constraint-
based algorithm (PC, FCI, etc.). See Andrews, B., Ramsey, J., & Cooper, G. F. (2018). Scoring Bayesian networks of mixed variables. International
journal of data science and analytics, 6(1), 3-18.

Degenerate Gaussian Likelihood Ratio Test may be used for the case where there is a mixture of discrete and Gaussian variables. Calculates a
likelihood ratio based on a likelihood that is calculated using a conditional Gaussian assumption. See Andrews, B., Ramsey, J., & Cooper, G. F. (2019).
Learning high-dimensional directed acyclic graphs with mixed data-types. Proceedings of machine learning research, 104, 4.

Parameters

structurePrior

Parameters

alpha, discretize

Resampling
Most TETRAD searches can be performed with resampling. This option is available on the Set Parameters screen. When it is selected, the search will
be performed multiple times on randomly selected subsets of the data, and the final output graph will be the result of a voting procedure among all of
the graphs. These subsets may be selected with replacement (bootstrapping) or without. There are also options for the user to set the size of the
subset, and the number of resampling runs. The default number of resampling runs is zero, in which case no resampling will be performed.

For each potential edge in the final output graph, the individual sampled graphs may contain a directed edge in one direction, the other direction, a
bidirected edge, an uncertain edge, or no edge at all. The voting procedure reconciles all of these possible answers into a single final graph, and the
"ensemble method," which can be set by the user in the parameter settings screen, determines how it will do that.

The three available ensemble methods are Preserved, Highest, and Majority. Preserved tends to return the densest graphs, then Highest, and finally
Majority returns the sparsest. The Preserved ensemble method ensures that an edge that has been found by some portion of the individual sample
graphs is preserved in the final graph, even if the majority of sample graphs returned [no edge] as their answer for that edge. So the voting procedure
for Preserved is to return the edge orientation that the highest percentage of sample graphs returned, other than [no edge]. The Highest ensemble
method, on the other hand, simply returns the edge orientation which the highest proportion of sample graphs returned, even if that means returning [no
edge]. And the Majority method requires that at least 50 percent of the sample graphs agree on an edge orientation in order to return any edge at all. If
the highest proportion of sample graphs agree on, for instance, a bidirected edge, but only 40 percent of them do so, then the Majority ensemble
method will return [no edge] for that edge.
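
The three ensemble rules can be sketched as follows (an illustration of the voting logic described above, not Tetrad's code). Here counts holds, for one variable pair, the number of sample graphs returning each answer, including "no edge":

```python
from collections import Counter

def ensemble_edge(counts: Counter, method: str, total: int) -> str:
    """Pick the output answer for one variable pair, given per-answer vote counts."""
    if method == "Preserved":
        # The most frequent answer other than "no edge", if any graph found an edge.
        edges_only = {k: v for k, v in counts.items() if k != "no edge"}
        return max(edges_only, key=edges_only.get) if edges_only else "no edge"
    if method == "Highest":
        # Simply the most frequent answer, which may be "no edge".
        return max(counts, key=counts.get)
    if method == "Majority":
        # Require at least half the graphs to agree on some specific edge type.
        best = max(counts, key=counts.get)
        return best if best != "no edge" and counts[best] >= 0.5 * total else "no edge"
    raise ValueError(f"Unknown ensemble method: {method}")

# Example: Counter({"no edge": 40, "X->Y": 35, "X<->Y": 25}) with total=100 returns
# "X->Y" under Preserved, and "no edge" under both Highest and Majority.
```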

Scoring Functions
Like the tests above, all of the scores below do testwise deletion as a default way of dealing with missing values. For testwise deletion, if a score, say,
score(X | Y, Z), is computed, columns for X, Y, and Z are scanned for missing values. If any row occurs in which X, Y, or Z is missing, that row is deleted from
the data for those three variables. So if a different score, score(R | W, Q, T), is computed, different rows may be stricken from the data. That is, the deletion is
done testwise. For a useful discussion of the testwise deletion condition, see for instance Tu, R., Zhang, C., Ackermann, P., Mohan, K., Kjellström, H., &
Zhang, K. (2019, April). Causal discovery in the presence of missing data. In The 22nd International Conference on Artificial Intelligence and Statistics
(pp. 1762-1770). PMLR. For all of these scores, if no data are missing, the behavior will be as if testwise deletion were not being done.

BDeu Score
This is the BDeu score given in Heckerman, D., Geiger, D., & Chickering, D. M. (1995). Learning Bayesian networks: The combination of knowledge
and statistical data. Machine learning, 20(3), 197-243. This gives a score for any two variables conditioned on any list of others which is more positive
for distributions which are more strongly dependent.

Parameters

equivalentSampleSize, samplePrior

Conditional Gaussian BIC Score


Conditional Gaussian BIC Score may be used for the case where there is a mixture of discrete and Gaussian variables. Calculates a BIC score based
on likelihood that is calculated using a conditional Gaussian assumption. See Andrews, B., Ramsey, J., & Cooper, G. F. (2018). Scoring Bayesian
networks of mixed variables. International journal of data science and analytics, 6(1), 3-18.

Parameters
structurePrior, discretize

Degenerate Gaussian BIC Score may be used for the case where there is a mixture of discrete and Gaussian variables. Calculates a BIC score based
on likelihood that is calculated using a conditional Gaussian assumption. See Andrews, B., Ramsey, J., & Cooper, G. F. (2019). Learning high-
dimensional directed acyclic graphs with mixed data-types. Proceedings of machine learning research, 104, 4.

Parameters

structurePrior

D-separation Score
This uses d-separation to make something that acts as a score if you know the true graph. A score in Tetrad, for FGES, say, is a function that, for X and
Y conditional on Z, returns a negative number if X _||_ Y | Z and a positive number otherwise. So to get this behavior in no uncertain terms, we simply
return -1 for independent cases and +1 for dependent cases. Works like a charm. This can be used for FGES to check what the ideal behavior of the
algorithm should be. Simply draw an edge from the true graph to the search box, select FGES, and search as usual.
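
A minimal sketch of such an oracle score (illustrative; it assumes the true graph is available as a networkx DiGraph and uses networkx's d-separation check):

```python
import networkx as nx

def dsep_score(true_graph: nx.DiGraph, x, y, z) -> int:
    """Oracle 'score': -1 if X _||_ Y | Z holds in the true graph, +1 otherwise."""
    independent = nx.d_separated(true_graph, {x}, {y}, set(z))
    return -1 if independent else 1

# Example (hypothetical graph): in X -> Y -> Z, X and Z are d-separated given Y.
# g = nx.DiGraph([("X", "Y"), ("Y", "Z")])
# dsep_score(g, "X", "Z", ["Y"])  # -> -1
```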

Discrete BIC Score


This is a BIC score for the discrete case. The likelihood is judged by the multinomial tables directly, and this is penalized as is usual for a BIC score. The
only surprising thing perhaps is that we use the formula BIC = 2L - k ln N, where L is the likelihood, k the number of parameters, and N the sample size,
instead of the usual L - (k / 2) ln N. So higher BIC scores will correspond to greater dependence. In the case of independence, the BIC score will be
negative, since the likelihood gain will be near zero and the penalty will dominate.

SEM BIC Score


This is specifically a BIC score for the linear, Gaussian case, where we include an additional penalty term, which is commonly used; we call this the
penalty discount. So our formula is BIC = 2L - c k ln N, where L is the likelihood, c the penalty discount (usually greater than or equal to 1), k the number
of parameters, and N the sample size. Since the assumption is that the data are distributed as Gaussian, this reduces to BIC = -n ln sigma - c k ln N,
where sigma is the standard deviation of the linear residual obtained by regressing a child variable onto all of its parents in the model.

Parameters

penaltyDiscount

EBIC Score
This is the Extended BIC (EBIC) score of Chen and Chen (Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with
large model spaces. Biometrika, 95(3), 759-771.). This score is adapted to score-based search in high dimensions. There is one parameter, gamma,
which takes a value between 0 and 1; if it's 0, the score is standard BIC. A value of 0.5 or 1 is recommended depending on how many variables there
are per sample.
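
For orientation only, here is a sketch of the extended penalty in the Chen and Chen formulation (stated under the usual convention that the penalty is subtracted from twice the log-likelihood; Tetrad's exact sign and scaling conventions may differ). For a node with k parents chosen from p candidates and sample size n, gamma = 0 recovers the ordinary BIC penalty:

```python
import numpy as np
from scipy.special import comb

def ebic_penalty(k: int, n: int, p: int, gamma: float) -> float:
    """Chen & Chen (2008) EBIC penalty: k*ln(n) + 2*gamma*ln(C(p, k)).

    gamma = 0 gives the ordinary BIC penalty; gamma in (0, 1] penalizes large
    model spaces more heavily, which helps when p is large relative to n.
    """
    return k * np.log(n) + 2.0 * gamma * np.log(comb(p, k, exact=True))

# Example: 5 parents chosen from 1000 candidates, n = 200, gamma = 0.5.
# penalty = ebic_penalty(5, 200, 1000, 0.5)
```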

Kim et al. Scores


This is a set of generalized information criterion (GIC) scores based on the paper, Kim, Y., Kwon, S., & Choi, H. (2012). Consistent model selection
criteria on high dimensions. The Journal of Machine Learning Research, 13(1), 1037-1057. One needs to select which lambda to use in place of the
usual lambda for the linear, Gaussian BIC score. A penalty discount parameter may also be specified, though for these scores it defaults to 1 (since the
lambda choice is essentially picking a penalty discount for you).

Poisson Prior Score


This is a likelihood score attenuated by the log of the Poisson distribution. It has one parameter, lambda, from the Poisson distribution, which acts as a
structure prior.

Zhang-Shen Bound Score


Uses Theorem 1 from Zhang, Y., & Shen, X. (2010). Model selection procedure for high-dimensional data. Statistical Analysis and Data Mining: The
ASA Data Science Journal, 3(5), 350-358, to make a score that controls false positives. There is one parameter, the "risk bound", a number between 0
and 1 (a bound on the false positive risk probability).

MAG SEM BIC Score


This gives a BIC score for a Mixed Ancestral Graph (MAG).

Search Parameters
Note: You must specify the "Value Type" of each parameter, and the value type must be one of the following: Integer, Long, Double, String, Boolean.

addOriginalDataset
Short Description: Yes if the original dataset should be added as another bootstrap sample
Long Description: Select “Yes” here to include an extra run using the original dataset for improved accuracy.
Default Value: true
Lower Bound:
Upper Bound:

Value Type: Boolean

alpha
Short Description: Cutoff for p values (alpha) (min = 0.0)
Long Description: The cutoff, beyond which test results are judged as dependent, for a statistical test of independence. Default 0.05. Higher alpha
values yield denser graphs.
Default Value: 0.01
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

applyR1
Short Description: Yes if the orient away from arrow rule should be applied
Long Description: Set this parameter to “No” to avoid orienting a whole chain of directed edges pointing in the same direction when only the first few
such orientations are justified by the data.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

avgDegree
Short Description: Average degree of graph (min = 1)
Long Description: The average degree of a graph is equal to 2E / V, where E is the number of edges in the graph and V the number of variables
(vertices) in the graph, since each edge has two endpoints.
Default Value: 2
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Double

probabilityOfEdge
Short Description: Probability of an adjacency being included in the graph
Long Description: Every possible adjacency in the graph is included in the graph with this probability.
Default Value: 0.05
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

basisType
Short Description: Basis type (1 = Polynomial, 2 = Cosine)
Long Description: For CCI, this determines which basis type will be used (1 = Polynomial, 2 = Cosine)
Default Value: 2
Lower Bound: 1
Upper Bound: 2
Value Type: Integer

cciScoreAlpha
Short Description: Cutoff for p values (alpha) (min = 0.0)
Long Description: Alpha level (0 to 1)
Default Value: 0.01
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

cgExact
Short Description: Yes if the exact algorithm should be used for continuous parents and discrete children
Long Description: For the conditional Gaussian likelihood, if the exact algorithm is desired for discrete children and continuous parents, set this
parameter to “Yes”.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

coefHigh
Short Description: High end of coefficient range (min = 0.0)
Long Description: Value m2 for coefficients drawn from U(-m2, -m1) U U(m1, m2).
Default Value: 1.0
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

coefLow
Short Description: Low end of coefficient range (min = 0.0)
Long Description: The parameter m1 for coefficients drawn from U(-m2, -m1) U U(m1, m2).
Default Value: 0.0
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

coefSymmetric
Short Description: Yes if negative coefficient values should be considered
Long Description: Yes if coefficients should be drawn from +/-(a, b); No if from +(a, b).
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

colliderDiscoveryRule
Short Description: Collider discovery: 1 = Lookup from adjacency sepsets, 2 = Conservative (CPC), 3 = Max-P
Long Description: One may look them up from sepsets, as in the original PC, or estimate them conservatively, as from the Conservative PC
algorithm, or by choosing the sepsets with the maximum p-value, as in PC-Max.
Default Value: 1
Lower Bound: 1
Upper Bound: 3
Value Type: Integer

completeRuleSetUsed
Short Description: Yes if the complete FCI rule set should be used
Long Description: No if the (simpler) final orientation rule set due to P. Spirtes, guaranteeing arrow completeness, should be used; Yes if the
(fuller) set due to J. Zhang, guaranteeing additional tail completeness, should be used.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

doDiscriminatingPathRule
Short Description: Yes if the discriminating path rule should be done, No if not
Long Description: Yes if the discriminating path FCI rule (part of the final orientation, requiring an additional test) should be done, No if not
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

doDiscriminatingPathColliderRule
Short Description: Yes if the discriminating path collider rule should be done, No if not
Long Description: Yes if the discriminating path collider FCI rule (part of the final orientation, requiring an additional test) should be done, No if not
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

doDiscriminatingPathTailRule
Short Description: Yes if the discriminating path tail rule should be done, No if not
Long Description: Yes if the discriminating path tail FCI rule (part of the final orientation, requiring an additional test) should be done, No if not
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

concurrentFAS
Short Description: Yes if a concurrent FAS should be done
Long Description: Yes if the version of the PC adjacency search that uses concurrent processing should be used, no if not.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

conflictRule
Short Description: Collider conflicts: 1 = Overwrite, 2 = Orient bidirected, 3 = Prioritize existing colliders

Long Description: 1 if the “overwrite” rule as introduced in the PCALG R package, 2 if all collider conflicts using bidirected edges, or 3 if existing
colliders should be prioritized, ignoring subsequent conflicting information.
Default Value: 3
Lower Bound: 1
Upper Bound: 3
Value Type: Integer

connected
Short Description: Yes if graph should be connected
Long Description: Yes if a random graph should be generated in which paths exists from every node to every other, no if not.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

cstarQ
Short Description: Examine this q
Long Description:
Default Value: 1
Lower Bound: 1
Upper Bound: 500000
Value Type: Integer

targets
Short Description: Target names (comma separated)
Long Description: Target names (comma separated).
Default Value:
Lower Bound:
Upper Bound:
Value Type: String

selectionMinEffect
Short Description: Minimum effect size for listing effects in the CStaR table
Long Description: Minimum effect size for listing effects in the CStaR table
Default Value: 0.0
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

numSubsamples
Short Description: Number of subsamples.
Long Description:
Default Value: 50
Lower Bound: 1
Upper Bound: 500000
Value Type: Integer

covHigh
Short Description: High end of covariance range (min = 0.0)
Long Description: The parameter c2 for range +/-U(c1, c2) for covariance values, c1 >= 0.0
Default Value: 0.0
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

covLow
Short Description: Low end of covariance range (min = 0.0)
Long Description: The parameter c1 for range +/-U(c1, c2) for covariance values, c2 >= c1
Default Value: 0.0
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

covSymmetric
Short Description: Yes if negative covariance values should be considered
Long Description: Yes if covariance values should be drawn from +/-U(a, b) for some a, b; No if from +U(a, b).
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

cutoffConstrainSearch
Short Description: Constraint-independence cutoff threshold
Long Description: null
Default Value: 0.5
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

cutoffDataSearch
Short Description: Independence cutoff threshold
Long Description: null
Default Value: 0.5
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

cutoffIndTest
Short Description: Independence cutoff threshold
Long Description: null
Default Value: 0.5
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

dataType
Short Description: "continuous" or "discrete"
Long Description: For a mixed data type simulation, if this is set to “continuous” or “discrete”, all variables are taken to be of that sort. This is used
as a double-check to make sure the percent discrete is set appropriately.
Default Value: categorical
Lower Bound:
Upper Bound:
Value Type: String

depth
Short Description: Maximum size of conditioning set (unlimited = -1)
Long Description: The depth of search for algorithms like the PC adjacency search, which is the maximum size of any conditioning set
considered. In order to express that no limit should be imposed, use the value -1.
Default Value: -1
Lower Bound: -1
Upper Bound: 2147483647
Value Type: Integer

determinismThreshold
Short Description: Threshold for judging a regression of a variable onto its parents to be deterministic (min = 0.0)
Long Description: When regressing a child variable onto a set of parent variables, one way to test for determinism is to test how close to singular
the data is; this gives a threshold for this. The default value is 0.1.
Default Value: 0.1
Lower Bound: 0.0
Upper Bound: Infinity
Value Type: Double

differentGraphs
Short Description: Yes if a different graph should be used for each run
Long Description: If ‘Yes’ a new random graph is chosen for each run; if ‘No’, the same graph is always used.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

mb
Short Description: Find Markov blanket(s)
Long Description: Looks for the graph over the Markov blanket(s) and target(s) if true
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

discretize
Short Description: Yes if continuous variables should be discretized when child is discrete

Long Description: Yes if, for the conditional Gaussian likelihood, when scoring X->D where X is continuous and D discrete, one should simply
discretize X for just those cases. If No, the integration will be exact.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

calculateEuclidean
Short Description: Yes if the Euclidean norm squared should be calculated (slow), No if not
Long Description: The generalized information criterion is defined with an information term that takes a Euclidean norm squared; this can be
calculated directly.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

takeLogs
Short Description: Yes if logs should be taken, No if not
Long Description: The formula for the score allows a log to be taken optionally in the information term.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

doColliderOrientation
Short Description: Yes if unshielded collider orientation should be done
Long Description: Please see the description of this algorithm in Thomas Richardson and Peter Spirtes in Chapter 7 of Computation, Causation,
& Discovery by Glymour and Cooper eds.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

doFgesFirst
Short Description: Yes if FGES should be done as an initial step
Long Description: For BOSS, for some cases, doing FGES as an initial step can reduce the maximum permutation size needed to solve a
problem.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

errorsNormal
Short Description: Yes if errors should be Normal; No if they should be abs(Normal) (i.e., non-Gaussian)
Long Description: A “quick and dirty” way to generate linear, non-Gaussian data is to set this parameter to “No”; then the errors will be sampled
from a Beta distribution.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

skewEdgeThreshold
Short Description: Threshold for including additional edges detectable by skewness
Long Description: For FASK, this includes an adjacency X—Y in the model if |corr(X, Y | X > 0) – corr(X, Y | Y > 0)| exceeds some threshold. The
default for this threshold is 0.3.
Default Value: 0.3
Lower Bound: 0.0
Upper Bound: Infinity
Value Type: Double

twoCycleScreeningThreshold
Short Description: Upper bound for |left-right| to count as 2-cycle. (Set to zero to turn off pre-screening.)
Long Description: 2-cycles are screened by looking to see if the left-right rule returns a difference smaller than this threshold. To turn off the
screening, set this to zero.
Default Value: 0.0
Lower Bound: 0.0
Upper Bound: Infinity
Value Type: Double

orientationAlpha

Short Description: Alpha threshold used for orientation (where necessary). ('0' turns this off.)
Long Description: Used for orienting 2-cycles and testing for zero edges.
Default Value: 0.0
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

faskDelta
Short Description: For FASK v1 and v2, the bias for orienting with negative coefficients ('0' means no bias.)
Long Description: The bias procedure for v1 is given in the published description.
Default Value: 0.0
Lower Bound: -Infinity
Upper Bound: Infinity
Value Type: Double

faskLeftRightRule
Short Description: The left right rule: 1 = FASK v1, 2 = FASK v2, 3 = RSkew, 4 = Skew, 5 = Tanh
Long Description: The FASK left right rule v2 is default, but two other (related) left-right rules are given for relation to the literature, and the v1
FASK rule is included for backward compatibility.
Default Value: 2
Lower Bound: 1
Upper Bound: 5
Value Type: Integer

faskAssumeLinearity
Short Description: Linearity assumed
Long Description: True if a linear, non-Gaussian, additive model is assumed; false if a nonlinear, non-Gaussian, additive model is assumed.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

faskNonempirical
Short Description: Variables should be assumed to have positive skewness
Long Description: If false (default), each variable is multiplied by the sign of its skewness in the left-right rule.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

acceptanceProportion
Short Description: Acceptance Proportion
Long Description: An edge occurring in this proportion of individual FASK graphs will appear in the final graph.
Default Value: 0.5
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

faskAdjacencyMethod
Short Description: Non-skewness Adjacencies: 1 = FAS Stable, 2 = FGES, 3 = External Graph, 4 = None
Long Description: This is the method FASK will use to find non-skewness adjacencies. For External graph, an external graph must be supplied.
Default Value: 1
Lower Bound: 1
Upper Bound: 4
Value Type: Integer

faithfulnessAssumed
Short Description: Yes if (one edge) faithfulness should be assumed
Long Description: Assumes that if X _||_ Y, by an independence test, then X _||_ Y | Z for nonempty Z.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

fasHeuristic
Short Description: Test ordering: 1 = PC-1, 2 = PC-2, 3 = PC-3
Long Description: PC-1 = sort nodes alphabetically; PC-2 = sort edges by p-value; PC-3 = additionally sort edges in reverse order using p-
values of associated independence facts. See manual.
Default Value: 1

Lower Bound: 1
Upper Bound: 3
Value Type: Integer

fasRule
Short Description: Adjacency search: 1 = PC, 2 = PC-Stable, 3 = Concurrent PC-Stable
Long Description: For variants of PC, one may select either to use the usual PC adjacency search, or the procedure from the PC-Stable
algorithm (Colombo and Maathuis), or the latter using a concurrent algorithm.
Default Value: 1
Lower Bound: 1
Upper Bound: 3
Value Type: Integer

fastIcaA
Short Description: Fast ICA 'a' parameter.
Long Description: This is the 'a' parameter of Fast ICA (see Hyvarinen, A. (2001)); it ranges between 1 and 2; we use a default of 1.1.
Default Value: 1.1
Lower Bound: 1.0
Upper Bound: 2.0
Value Type: Double

fastIcaMaxIter
Short Description: The maximum number of optimization iterations.
Long Description: This is the maximum number of iterations of the optimization procedure of ICA (see Hyvarinen, A. (2001)). It is an integer greater
than 0; we use a default of 2000.
Default Value: 2000
Lower Bound: 1
Upper Bound: 500000
Value Type: Double

fastIcaTolerance
Short Description: Fast ICA tolerance parameter.
Long Description: This is the tolerance parameter of Fast ICA (see Hyvarinen, A. (2001)); we use a default of 1e-6.
Default Value: 1e-6
Lower Bound: 0.0
Upper Bound: 1000.0
Value Type: Double

thresholdBHat
Short Description: Threshold on the B Hat matrix.
Long Description: The estimated B matrix is thresholded by setting small entries less than this threshold to zero.
Default Value: 0.1
Lower Bound: 0.0
Upper Bound: Infinity
Value Type: Double

guaranteeAcyclic
Short Description: True if the output should be guaranteed to be acyclic
Long Description: The estimated B matrix is further thresholded by setting small coefficients to zero until an acyclic model is produced.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

thresholdSpine
Short Description: Threshold on the diagonal of the W matrix.
Long Description: The diagonal of the estimated W matrix is thresholded by setting small entries less than this threshold to zero. Should be >= W
threshold.
Default Value: 0.8
Lower Bound: 0.0
Upper Bound: Infinity
Value Type: Double

fisherEpsilon
Short Description: Epsilon where |xi.t - xi.t-1| < epsilon, criterion for convergence
Long Description: This is a parameter for the linear Fisher option. The idea of the Fisher model (for the linear case) is to shock the system every so
often and let it settle by applying the rules of transformation (that is, the linear model) repeatedly until convergence.
Default Value: 0.001
Lower Bound: 4.9E-324
Upper Bound: 1.7976931348623157E308

Value Type: Double
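
As an illustration of the convergence criterion just described, here is a rough sketch (Java, not Tetrad's simulation code) of the settling loop: after a shock, the linear transformation is applied repeatedly until no coordinate changes by more than fisherEpsilon. The exact way the shock enters the update is an assumption made for illustration.

```java
// Rough sketch of the linear Fisher settling loop (illustrative only).
// B is the coefficient matrix; the loop stops when |x_t - x_{t-1}| < epsilon
// in every coordinate, or when maxSteps is reached.
final class FisherSettle {
    static double[] settle(double[][] B, double[] shock, double epsilon, int maxSteps) {
        double[] x = shock.clone();
        for (int step = 0; step < maxSteps; step++) {
            double[] next = new double[x.length];
            for (int i = 0; i < x.length; i++) {
                double sum = shock[i];                  // assumed: shock persists as an exogenous term
                for (int j = 0; j < x.length; j++) {
                    sum += B[i][j] * x[j];              // apply the linear structural equations
                }
                next[i] = sum;
            }
            double maxDelta = 0.0;
            for (int i = 0; i < x.length; i++) {
                maxDelta = Math.max(maxDelta, Math.abs(next[i] - x[i]));
            }
            x = next;
            if (maxDelta < epsilon) break;              // convergence criterion set by this parameter
        }
        return x;
    }
}
```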

generalSemErrorTemplate
Short Description: General function for error terms
Long Description: This template specifies how distributions for error terms are to be generated. For help in constructing such templates, see the
Generalized SEM PM model.
Default Value: Beta(2, 5)
Lower Bound:
Upper Bound:
Value Type: String

generalSemFunctionTemplateLatent
Short Description: General function template for latent variables
Long Description: This template specifies how equations for latent variables are to be generated. For help in constructing such templates, see the
Generalized SEM PM model.
Default Value: TSUM(NEW(B)*$)
Lower Bound:
Upper Bound:
Value Type: String

generalSemFunctionTemplateMeasured
Short Description: General function template for measured variables
Long Description: This template specifies how equations for measured variables are to be generated. For help in constructing such templates,
see the Generalized SEM PM model.
Default Value: TSUM(NEW(B)*$)
Lower Bound:
Upper Bound:
Value Type: String

generalSemParameterTemplate
Short Description: General function for parameters
Long Description: This template specifies how distributions for parameter terms are to be generated. For help in constructing such templates, see
the Generalized SEM PM model.
Default Value: Split(-1.0, -0.5, 0.5, 1.0)
Lower Bound:
Upper Bound:
Value Type: String

imagesMetaAlg
Short Description: IMaGES "meta" algorithm. 1 = FGES, 2 = BOSS-Tuck
Long Description: Sets the meta algorithm to be optimized using the IMaGES (average BIC) score.
Default Value: 1
Lower Bound: 1
Upper Bound: 5
Value Type: Integer

ia
Short Description: IA parameter (GLASSO)
Long Description: Sets the maximum number of iterations of the optimization loop.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

includeNegativeCoefs
Short Description: Yes if negative coefficients should be included in the model
Long Description: One may include positive coefficients, negative coefficients, or both, in the model. To include negative coefficients, set this
parameter to “Yes”.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

includeNegativeSkewsForBeta
Short Description: Yes if negative skew values should be included in the model, if Beta errors are chosen
Long Description: Yes if negative skew values should be included in the model, if Beta errors are chosen.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

includePositiveCoefs
Short Description: Yes if positive coefficients should be included in the model
Long Description: Yes if positive coefficients should be included in the model, no if not.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

includePositiveSkewsForBeta
Short Description: Yes if positive skew values should be included in the model, if Beta errors are chosen
Long Description: Yes if positive skew values should be included in the model, if Beta errors are chosen.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

intervalBetweenRecordings
Short Description: Interval between data recordings for the linear Fisher model (min = 1)
Long Description:
Default Value: 10
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

intervalBetweenShocks
Short Description: Interval between shocks (R. A. Fisher simulation model) (min = 1)
Long Description: This is a parameter for the linear Fisher option. This sets the number of steps between shocks.
Default Value: 10
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

ipen
Short Description: IPEN parameter (GLASSO)
Long Description: This sets the maximum number of iterations of the optimization loop.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

is
Short Description: IS parameter (GLASSO)
Long Description: Sets the maximum number of iterations of the optimization loop.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

itr
Short Description: ITR parameter (GLASSO)
Long Description: Sets the maximum number of iterations of the optimization loop.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

kciAlpha
Short Description: Cutoff for p values (alpha) (min = 0.0)
Long Description: Alpha level (0 to 1)
Default Value: 0.05
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

kciCutoff
Short Description: Cutoff
Long Description: Cutoff for p-values.
Default Value: 6
Lower Bound: 1

Upper Bound: 2147483647
Value Type: Integer

kciEpsilon
Short Description: Epsilon for Proposition 5, a small positive number
Long Description: See Zhang, K., Peters, J., Janzing, D., & Schölkopf, B. (2012), Kernel-based conditional independence test and application in
causal discovery. This parameter is the epsilon for Proposition 5, a small positive number.
Default Value: 0.001
Lower Bound: 0.0
Upper Bound: Infinity
Value Type: Double

kciNumBootstraps
Short Description: Number of bootstraps for Theorem 4 and Proposition 5 for KCI
Long Description: This parameter is the number of bootstraps for Theorem 4 and Proposition 5 from Zhang, K., Peters, J., Janzing, D., &
Schölkopf, B. (2012); a positive integer.
Default Value: 5000
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

kciUseAppromation
Short Description: Use the Gamma approximation algorithm
Long Description: Referring to Zhang, K., Peters, J., Janzing, D., & Schölkopf, B. (2012), if this parameter is set to ‘Yes’, the Gamma
approximation algorithm is used; if no, the exact procedure is used.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

kernelMultiplier
Short Description: Bowman and Azzalini (1997) default kernel bandwidths should be multiplied by...
Long Description: For the conditional correlation algorithm. Bowman, A. W., & Azzalini, A. (1997) give a formula for default optimal kernel widths.
We allow these defaults to be multiplied by this factor, to capture more or less than this optimal signal.
Default Value: 1.0
Lower Bound: 4.9E-324
Upper Bound: Infinity
Value Type: Double

kernelRegressionSampleSize
Short Description: Minimum sample size to use per conditioning for kernel regression
Long Description: The smallest set of nearest data points on which to allow a judgment to be based for a nonlinear regression.
Default Value: 100
Lower Bound: -2147483648
Upper Bound: 2147483647
Value Type: Integer

kernelType
Short Description: Kernel type (1 = Gaussian, 2 = Epanechnikov)
Long Description: For CCI, this determines which kernel type will be used (1 = Gaussian, 2 = Epanechnikov).
Default Value: 2
Lower Bound: 1
Upper Bound: 2
Value Type: Integer

kernelWidth
Short Description: Kernel width
Long Description: A larger kernel width means that more information will be taken into account but possibly less focused information.
Default Value: 1.0
Lower Bound: 4.9E-324
Upper Bound: Infinity
Value Type: Double

latentMeasuredImpureParents
Short Description: Number of Latent --> Measured impure edges
Long Description: It is possible for structural nodes to have as children measured variables that are children of other structural nodes. These
edges in the graph will be considered impure.
Default Value: 0
Lower Bound: -2147483648
Upper Bound: 2147483647

Value Type: Integer

lowerBound
Short Description: Lower bound cutoff threshold
Long Description: null
Default Value: 0.3
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

maxCategories
Short Description: Maximum number of categories (min = 2)
Long Description: The maximum number of categories to be used for randomly generated discrete variables. The default is 2. This needs to be
greater than or equal to the minimum number of categories.
Default Value: 3
Lower Bound: 2
Upper Bound: 2147483647
Value Type: Integer

maxCorrelation
Short Description: Maximum absolute correlation considered
Long Description: For the Nandy rule, the absolute max correlation r. For the standard BIC or high-dimensional rule, the maximum absolute
residual correlation.
Default Value: 1.0
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

maxDegree
Short Description: The maximum degree of the graph (min = -1)
Long Description: An upper bound on the maximum degree of any node in the graph. If no limit is to be placed on the maximum degree, use the
value -1.
Default Value: 1000
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

maxDistinctValuesDiscrete
Short Description: The maximum number of distinct values in a column for discrete variables (min = 0)
Long Description: Discrete variables will be simulated using any number of categories from 2 up to this maximum. If set to 0 or 1, discrete
variables will not be generated.
Default Value: 0
Lower Bound: 0
Upper Bound: 2147483647
Value Type: Integer

maxIndegree
Short Description: Maximum indegree of graph (min = 1)
Long Description: An upper bound on the maximum indegree of any node in the graph. If no limit is to be placed on the maximum indegree, use the
value -1.
Default Value: 1000
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

zsMaxIndegree
Short Description: Maximum indegree of true graph (min = 0)
Long Description: This is the maximum number of parents one expects any node to have in the true model.
Default Value: 4
Lower Bound: 0
Upper Bound: 2147483647
Value Type: Integer

maxIterations
Short Description: The maximum number of iterations the algorithm should go through orienting edges
Long Description: In orienting, this algorithm may go through a number of iterations, conditioning on more and more variables until orientations are
set. This sets that number.
Default Value: 15
Lower Bound: 0
Upper Bound: 2147483647

Value Type: Integer

maxOutdegree
Short Description: Maximum outdegree of graph (min = 1)
Long Description: An upper bound on the maximum outdegree of any node in the graph. If no limit is to be placed on the maximum outdegree, use
the value -1.
Default Value: 1000
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

maxPOrientationMaxPathLength
Short Description: Maximum path length for the unshielded collider heuristic for max P (min = 0)
Long Description: For the Max P “heuristic” to work, it must be the case that X and Z are only weakly associated—that is, that paths between
them are not too short. This bounds the length of paths for this purpose.
Default Value: 3
Lower Bound: 0
Upper Bound: 2147483647
Value Type: Integer

maxPathLength
Short Description: The maximum length for any discriminating path. -1 if unlimited (min = -1)
Long Description: See Spirtes, Glymour, and Scheines (2000) for the definition of a discriminating path. Finding discriminating paths can be
expensive. This sets the maximum length of such paths that the algorithm tries to find.
Default Value: -1
Lower Bound: -1
Upper Bound: 2147483647
Value Type: Integer

maxit
Short Description: MAXIT parameter (GLASSO) (min = 1)
Long Description: Sets the maximum number of iterations of the optimization loop.
Default Value: 10000
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

meanHigh
Short Description: High end of mean range (min = 0.0)
Long Description: The default is for there to be no shift in mean, but shifts from a minimum value to a maximum value may be specified. The
minimum must be less than or equal to this maximum.
Default Value: 1.0
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

meanLow
Short Description: Low end of mean range (min = 0.0)
Long Description: The default is for there to be no shift in mean, but shifts from a minimum value to a maximum value may be specified. The
maximum must be greater than or equal to this minimum.
Default Value: 0.5
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

measuredMeasuredImpureAssociations
Short Description: Number of Measured <-> Measured impure edges
Long Description: It is possible for measures from two different structural nodes to be confounded. These confounding (bidirected) edges will be
considered to be impure.
Default Value: 0
Lower Bound: -2147483648
Upper Bound: 2147483647
Value Type: Integer

measuredMeasuredImpureParents
Short Description: Number of Measured --> Measured impure edges
Long Description: It is possible for measures from two different structural nodes to have directed edges between them. These edges will be
considered to be impure.
Default Value: 0
Lower Bound: -2147483648

Upper Bound: 2147483647
Value Type: Integer

measurementModelDegree
Short Description: Number of measurements per Latent
Long Description: Each structural node in the MIM will be created to have this many measured children.
Default Value: 5
Lower Bound: -2147483648
Upper Bound: 2147483647
Value Type: Integer

measurementVariance
Short Description: Additive measurement noise variance (min = 0.0)
Long Description: If the value is greater than zero, independent Gaussian noise will be added with mean zero and the given variance to each
variable in the simulated output.
Default Value: 0.0
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

mgmParam1
Short Description: MGM tuning parameter #1 (min = 0.0)
Long Description: The MGM algorithm has three internal tuning parameters, of which this is one.
Default Value: 0.1
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

mgmParam2
Short Description: MGM tuning parameter #2 (min = 0.0)
Long Description: The MGM algorithm has three internal tuning parameters, of which this is one.
Default Value: 0.1
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

mgmParam3
Short Description: MGM tuning parameter #3 (min = 0.0)
Long Description: The MGM algorithm has three internal tuning parameters, of which this is one.
Default Value: 0.1
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

minCategories
Short Description: Minimum number of categories (min = 2)
Long Description: The minimum number of categories to be used for randomly generated discrete variables. The default is 2.
Default Value: 3
Lower Bound: 2
Upper Bound: 2147483647
Value Type: Integer

noRandomlyDeterminedIndependence
Short Description: Yes, if use the cutoff threshold for the independence test.
Long Description: null
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

numBasisFunctions
Short Description: Number of functions to use in (truncated) basis
Long Description: This parameter specifies how many of the most significant basis functions to use as a basis.
Default Value: 30
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

numBscBootstrapSamples
Short Description: The number of bootstrap samples drawn from the posterior distribution (min = 1)
Long Description: The number of bootstrap samples drawn from the posterior distribution (min = 1)
Default Value: 50
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

numCategories
Short Description: Number of categories for discrete variables (min = 2)
Long Description: The number of categories to be used for randomly generated discrete variables. The default is 4; the minimum is 2.
Default Value: 4
Lower Bound: 2
Upper Bound: 2147483647
Value Type: Integer

numCategoriesToDiscretize
Short Description: The number of categories used to discretize continuous variables, if necessary (min = 2)
Long Description: In case the exact algorithm for discrete children with continuous parents is not used, this parameter gives the number of
categories to use for the discretized backup copy of the continuous variables.
Default Value: 3
Lower Bound: 2
Upper Bound: 2147483647
Value Type: Integer

numLags
Short Description: The number of lags in the time lag model
Long Description: A time lag model may take variables from previous time steps into account. This determines how many steps back these
relevant variables might go.
Default Value: 1
Lower Bound: -2147483648
Upper Bound: 2147483647
Value Type: Integer

numLatents
Short Description: Number of additional latent variables (min = 0)
Long Description: The number of additional latent variables to include in the dataset.
Default Value: 0
Lower Bound: 0
Upper Bound: 2147483647
Value Type: Integer

numMeasures
Short Description: Number of measured variables (min = 1)
Long Description: The number of measured (recorded in data) variables to include in the dataset.
Default Value: 10
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

numRandomizedSearchModels
Short Description: The number of randomized search models (min = 1)
Long Description: The number of randomized search models (min = 1)
Default Value: 10
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

numRuns
Short Description: Number of runs (min = 1)
Long Description: An analysis (randomly pick a graph, randomly simulate a dataset, run an algorithm on it, and look at the result) may be run over and
over again this many times.
Default Value: 1
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

numStructuralEdges
Short Description: Number of structural edges
Long Description: This is a parameter for generating random multiple indicator models (MIMs). A structural edge is an edge connecting two
structural nodes.
Default Value: 3

Lower Bound: -2147483648
Upper Bound: 2147483647
Value Type: Integer

numStructuralNodes
Short Description: Number of structural nodes
Long Description: This is a parameter for generating random multiple indicator models (MIMs). A structural node is one of the latent variables in
the model; each structural node has a number of child measured variables.
Default Value: 3
Lower Bound: -2147483648
Upper Bound: 2147483647
Value Type: Integer

numberResampling
Short Description: The number of bootstraps/resampling iterations (min = 0)
Long Description: For bootstrapping, the number of bootstrap iterations that should be done by the algorithm, with results summarized.
Default Value: 0
Lower Bound: 0
Upper Bound: 2147483647
Value Type: Integer

numStarts
Short Description: The number of restarts, random after the first (default 1)
Long Description: The number of times the algorithm should be started from different initializations. By default, the algorithm will be run through at
least once using the initialized parameters (zero random restarts).
Default Value: 1
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

otherPermMethod
Short Description: 1 = RCG, 2 = GSP, 3 = ESP, 4 = SP
Long Description: RCG (Random Carnival Game); GSP ("Greedy SP"), which uses tucking; ESP ("Edge SP"), from Solus et al.; SP ("Sparsest
Permutation"), from Raskutti and Uhler.
Default Value: 1
Lower Bound: 1
Upper Bound: 5
Value Type: Integer

bossAlg
Short Description: Picks the BOSS algorithm type (BOSS1, BOSS2, or BOSS3)
Long Description: 1 = BOSS1, 2 = BOSS2, 3 = BOSS3
Default Value: 1
Lower Bound: 1
Upper Bound: 3
Value Type: Integer

graspCheckCovering
Short Description: Yes if covering of edges should be checked (GASP), no if not (GRASP)
Long Description: An edge X->Y is covered if Parents(X) = Parents(Y) \ {X}. Not checking covering expands the search space.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

graspForwardTuckOnly
Short Description: Yes if only forward tucks should be checked, no if also reverse tucks should be checked.
Long Description: A forward tuck for X->Y moves Y to the position before X in the permutation. A reverse tuck moves Y to the position after X
in the permutation. Including reverse tucks expands the search space.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

graspBreakAFterImprovement
Short Description: Yes if depth first search returns after first improvement, No for depth first traversal.
Long Description: Exploring the full list in every DFS call is equivalent to what we've been calling the Random Carnival Game procedure (RCG).
Default Value: true
Lower Bound:
Upper Bound:

Value Type: Boolean

graspOrderedAlg
Short Description: Yes if earlier GRaSP stages should be performed before later stages
Long Description: GRaSP has three stages; these can be performed separately or in order; by default Yes.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

graspUseScore
Short Description: Yes if the score should be used for MB calculations, no if the test should be used instead.
Long Description: In either case, compositional graphoid axioms are assumed by the Grow-Shrink algorithm.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

graspUseRaskuttiUhler
Short Description: Yes to use Raskutti and Uhler's DAG-building method (test), No to use Grow-Shrink (score).
Long Description: Raskutti and Uhler's method adds an edge X->Y if Y ~_||_ X | Prefix(Y, pi) \ {X}. Grow-Shrink adds an edge X->Y if X is in the
Markov blanket of Y where the variable set is restricted to Prefix(Y, pi).
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

graspUseDataOrder
Short Description: Yes if the data variable order should be used for the first initial permutation.
Long Description: In either case, if multiple starting points are used, taking the best scoring model from among these, subsequent starting points
will all be random shuffles.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

graspAllowRandomnessIndideAlgorithm
Short Description: Allow randomness inside algorithms
Long Description: This allows variable orders to be shuffled in certain spots to speed up large linear, Gaussian searches.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

graspUseVpScoring
Short Description: Not sure
Long Description: Not sure
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

graspAlg
Short Description: 1 = GRaSP1, 2 = GRaSP2, 3 = esp, 4 = GRaSP4, 5 = GRaSP4
Long Description: Which version of GRaSP (temp parameter)
Default Value: 1
Lower Bound: 1
Upper Bound: 5
Value Type: Integer

graspDepth
Short Description: Recursion depth
Long Description: This is the depth of recursion for the depth first search.
Default Value: 4
Lower Bound: 0
Upper Bound: 2147483647
Value Type: Integer

graspSingularDepth
Short Description: Recursion depth for singular tucks

Long Description: This is the depth of recursion for the singular tucks.
Default Value: 1
Lower Bound: 0
Upper Bound: 2147483647
Value Type: Integer

graspNonSingularDepth
Short Description: Recursion depth for nonsingular tucks
Long Description: This is the depth of recursion at which multiple tucks may be considered per score improvement.
Default Value: 1
Lower Bound: 0
Upper Bound: 2147483647
Value Type: Integer

graspToleranceDepth
Short Description: Recursion depth for tolerance tucks
Long Description: This is the maximum number of non-greedy tucks in depth first order, that is, tucks where the score is allowed to decrease
rather than increase.
Default Value: 0
Lower Bound: 0
Upper Bound: 2147483647
Value Type: Integer

timeout
Short Description: Timeout (best graph returned, -1 = no timeout)
Long Description: The algorithm will time out at approximately this number of seconds from when it started and return the final graph found at that
point.
Default Value: -1
Lower Bound: -1
Upper Bound: 2147483647
Value Type: Integer

Short Description: Yes if the algorithm should try moving variables pairwise
Long Description: In some cases, two moves are required simultaneously to get an orientation right in the final step. This is not generally needed
when optimizing using BIC or for large models.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

recursive
Short Description: Yes if the algorithm should proceed recursively, no if not
Long Description: Where recursive or nonrecursive variants of an algorithm are available, this selects which one to use.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

orientTowardDConnections
Short Description: Yes if Richardson's step C (orient toward d-connection) should be used
Long Description: Please see the description of this algorithm by Thomas Richardson and Peter Spirtes in Chapter 7 of Computation, Causation,
& Discovery, Glymour and Cooper (eds.).
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

orientVisibleFeedbackLoops
Short Description: Yes if visible feedback loops should be oriented
Long Description: Please see the description of this algorithm by Thomas Richardson and Peter Spirtes in Chapter 7 of Computation, Causation,
& Discovery, Glymour and Cooper (eds.).
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

outputRBD
Short Description: Constraint Scoring: Yes: Dependent Scoring, No: Independent Scoring.
Long Description: Constraint Scoring: Yes: Dependent Scoring, No: Independent Scoring.
Default Value: true
Lower Bound:

Upper Bound:
Value Type: Boolean

precomputeCovariances
Short Description: True if the covariance matrix should be precomputed for tabular continuous data
Long Description: For more than 5000 variables or so, set this to false to calculate covariances on the fly from data.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

penaltyDiscount
Short Description: Penalty discount (min = 0.0)
Long Description: The parameter c in a modified BIC score of the form 2L – c k ln N, where L is the likelihood, k the number of degrees of
freedom, and N the sample size. Higher values of c yield sparser graphs.
Default Value: 2.0
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double
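
As a small worked illustration of the scoring formula above (a sketch, not Tetrad's internal scoring class):

```java
// Penalized score from the description above: 2L - c * k * ln(N),
// where c is the penalty discount, k the degrees of freedom, N the sample size.
final class PenalizedBic {
    static double score(double logLikelihood, int dof, int sampleSize, double penaltyDiscount) {
        return 2.0 * logLikelihood - penaltyDiscount * dof * Math.log(sampleSize);
    }
}
```

Doubling the penalty discount doubles the per-parameter penalty, which is why larger values of c produce sparser graphs.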

penaltyDiscountZs
Short Description: Penalty discount (min = 0.0)
Long Description: The parameter c in a modified BIC score of the form 2L – c k lambda, where L is the likelihood, k the number of degrees
of freedom, and lambda the choice of GIC lambda. Higher values of c yield sparser graphs.
Default Value: 1.0
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

zSRiskBound
Short Description: Risk bound
Long Description: This is the probability of getting the true model if a correct model is discovered. Could underfit.
Default Value: 0.1
Lower Bound: 0
Upper Bound: 1
Value Type: Double

ebicGamma
Short Description: EBIC Gamma (0-1)
Long Description: The gamma parameter for Extended BIC (Chen and Chen). In [0, 1].
Default Value: 0.8
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

trueErrorVariance
Short Description: True error variance
Long Description: The true error variance of the model, assuming this is the same for all variables.
Default Value: 1.0
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

correlationThreshold
Short Description: Correlation Threshold
Long Description: The algorithm will complain if correlations are found that are greater than this in absolute value.
Default Value: 1
Lower Bound: 0
Upper Bound: 1
Value Type: Double

manualLambda
Short Description: Lambda (manually set)
Long Description: The manually set lambda for GIC; the default is 10, though this should be set by the user to a good value.
Default Value: 10.0
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

errorThreshold

Short Description: Error Threshold
Long Description: Adjusts the threshold for judging conditional dependence.
Default Value: 0.5
Lower Bound: 0.0
Upper Bound: 1
Value Type: Double

parallelized
Short Description: Yes if the search should be parallelized
Long Description: This search is capable of being parallelized; select Yes if the search should be parallelized, No if it should be run in a single
thread.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

percentDiscrete
Short Description: Percentage of discrete variables (0 - 100) for mixed data
Long Description: For a mixed data type simulation, specifies the percentage of variables that should be simulated (randomly) as discrete. The
rest will be taken to be continuous. The default is 0—i.e. no discrete variables.
Default Value: 50.0
Lower Bound: 0.0
Upper Bound: 100.0
Value Type: Double

percentResampleSize
Short Description: The percentage of resample size (min = 10%)
Long Description: This parameter specifies the percentage of records in the bootstrap (as a percentage of the total original sample size of the
data being bootstrapped).
Default Value: 100
Lower Bound: 10
Upper Bound: 100
Value Type: Integer

possibleDsepDone
Short Description: Yes if the possible dsep search should be done
Long Description: This algorithm has a possible d-sep path search, which can be time-consuming. See Spirtes, Glymour, and Scheines (2000) for
details.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

probCycle
Short Description: The probability of adding a cycle to the graph
Long Description: Sets the probability that any particular set of 3, 4, or 5 of nodes will be used to form a cycle in the graph.
Default Value: 1.0
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

probTwoCycle
Short Description: The probability of creating a 2-cycle in the graph (0 - 1)
Long Description: Any edge X*-*Y may be replaced with a 2-cycle (feedback loop) between X and Y with this probability.
Default Value: 0.0
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

randomSelectionSize
Short Description: The number of datasets that should be taken in each random sample
Long Description: The number of datasets that should be taken in each random sample of datasets.
Default Value: 1
Lower Bound: -2147483648
Upper Bound: 2147483647
Value Type: Integer

randomizeColumns
Short Description: Yes if the order of the columns in each dataset should be randomized

Long Description: In the real world where unfaithfulness is an issue the order of variables in the data may for some algorithms affect the output.
For testing purposes, if Yes, the data columns are randomly re-ordered.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

guaranteeIid
Short Description: Yes if the Fisher simulation should guarantee that the sample is i.i.d.; No if standard Fisher model
Long Description: The standard model applies a shock every so often to the simulation, so is effectively a time series. Yes here guarantees that a
new data point starts from a new shock without influence from the previous time step.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

rcitNumFeatures
Short Description: The number of random features to use
Long Description:
Default Value: 10
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

resamplingEnsemble
Short Description: Ensemble method: Preserved (1), Highest (2), Majority (3)
Long Description: Preserved = keep highest frequency edges; Highest = keep highest frequency edges but ignore the no edge case if maximal;
Majority = keep edges only if their frequency is greater than 0.5.
Default Value: 1
Lower Bound: 1
Upper Bound: 3
Value Type: Integer
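
The three rules can be pictured with a small sketch (illustrative Java, not Tetrad's bootstrap code) that chooses an edge type for one node pair from its bootstrap frequencies; the map keys and the "no edge" label are assumptions made for the example.

```java
import java.util.Map;

// Sketch of the three ensemble rules, applied to the bootstrap frequencies of the
// possible edge types between one pair of nodes (including "no edge").
final class EnsembleRule {

    // freq maps an edge type (e.g. "X->Y", "X<-Y", "no edge") to its frequency in [0, 1].
    static String choose(Map<String, Double> freq, int rule) {
        String best = null, bestEdge = null;
        double bestF = -1.0, bestEdgeF = -1.0;
        for (Map.Entry<String, Double> e : freq.entrySet()) {
            if (e.getValue() > bestF) { bestF = e.getValue(); best = e.getKey(); }
            if (!"no edge".equals(e.getKey()) && e.getValue() > bestEdgeF) {
                bestEdgeF = e.getValue(); bestEdge = e.getKey();
            }
        }
        switch (rule) {
            case 1: return best;                                          // Preserved: highest-frequency option overall
            case 2: return bestEdge;                                      // Highest: ignore "no edge" even if it is maximal
            case 3: return bestEdgeF > 0.5 ? bestEdge : "no edge";        // Majority: frequency must exceed 0.5
            default: throw new IllegalArgumentException("rule must be 1, 2, or 3");
        }
    }
}
```

For instance, with frequencies {X->Y: 0.4, no edge: 0.6}, Preserved returns "no edge", Highest returns X->Y, and Majority returns "no edge" because 0.4 does not exceed 0.5.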

resamplingWithReplacement
Short Description: Yes, if sampling with replacement (bootstrapping)
Long Description: Yes if resampling should be done with replacement, No if without replacement. With replacement, it is possible for
more than one copy of some of the records in the original dataset to be included in the bootstrap.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

priorEquivalentSampleSize
Short Description: Prior equivalent sample size (min = 1.0)
Long Description: This sets the prior equivalent sample size. This number is added to the sample size for each conditional probability table in the
model and is divided equally among the cells in the table.
Default Value: 10.0
Lower Bound: 1.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

sampleSize
Short Description: Sample size (min = 1)
Long Description: Determines how many records should be generated for the data. The minimum number of records is 1; the default is set to
1000.
Default Value: 1000
Lower Bound: 1
Upper Bound: 2147483647
Value Type: Integer

saveLatentVars
Short Description: Save latent variables.
Long Description: Yes if one wishes to have values for latent variables saved out with the rest of the data; No if only data for the measured
variables should be saved.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

scaleFreeAlpha

Short Description: For scale-free graphs, the parameter alpha (min = 0.0)
Long Description: We use the algorithm for generating scale free graphs described in B. Bollobas, C. Borgs, J. Chayes, and O. Riordan (2003).
Please see this article for a description of the parameters.
Default Value: 0.05
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

scaleFreeBeta
Short Description: For scale-free graphs, the parameter beta (min = 0.0)
Long Description: We use the algorithm for generating scale free graphs described in B. Bollobas, C. Borgs, J. Chayes, and O. Riordan (2003).
Please see this article for a description of the parameters.
Default Value: 0.9
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

scaleFreeDeltaIn
Short Description: For scale-free graphs, the parameter delta_in (min = 0.0)
Long Description: We use the algorithm for generating scale free graphs described in B. Bollobas, C. Borgs, J. Chayes, and O. Riordan (2003).
Please see this article for a description of the parameters.
Default Value: 3
Lower Bound: -2147483648
Upper Bound: 2147483647
Value Type: Integer

scaleFreeDeltaOut
Short Description: For scale-free graphs, the parameter delta_out (min = 0.0)
Long Description: We use the algorithm for generating scale free graphs described in B. Bollobas, C. Borgs, J. Chayes, and O. Riordan (2003).
Please see this article for a description of the parameters.
Default Value: 3
Lower Bound: -2147483648
Upper Bound: 2147483647
Value Type: Integer

seed
Short Description: Seed for pseudorandom number generator (-1 = off)
Long Description: The seed is the initial value of the internal state of the pseudorandom number generator. A value of -1 skips setting a new
seed.
Default Value: -1
Lower Bound: -1
Upper Bound: 9223372036854775807
Value Type: Long

selfLoopCoef
Short Description: The coefficient for the self-loop (default 0.0)
Long Description: For simulating time series data, each variable depends on itself one time-step back with a linear edge that has this coefficient.
Default Value: 0.0
Lower Bound: 0.0
Upper Bound: Infinity
Value Type: Double

semBicRule
Short Description: Lambda: 1 = Chickering, 2 = Nandy
Long Description: The Chickering rule uses the difference of BIC scores to add or remove edges. The Nandy et al. rule uses a single
calculation of a partial correlation in place of the likelihood difference.
Default Value: 1
Lower Bound: 1
Upper Bound: 2
Value Type: Integer

semGicRule
Short Description: Lambda: 1 = ln n, 2 = pn^1/3, 3 = 2 ln pn, 4 = 2(ln pn + ln ln pn), 5 = ln ln n ln pn, 6 = ln n ln pn, 7 = Manual
Long Description: The rule used for calculating the lambda term of the score. We follow Kim, Y., Kwon, S., & Choi, H. (2012) and articles
referenced therein. For high-dimensional data.
Default Value: 4
Lower Bound: 1
Upper Bound: 7
Value Type: Integer

semBicStructurePrior
Short Description: Structure Prior for SEM BIC (default 0)
Long Description: Structure prior; default is 0 (turned off); may be any positive number otherwise
Default Value: 0
Lower Bound: 0
Upper Bound: Infinity
Value Type: Double

poissonLambda
Short Description: Lambda parameter for the Poisson distribution (> 0)
Long Description: Lambda parameter for the Poisson distribution
Default Value: 1
Lower Bound: 1e-10
Upper Bound: Infinity
Value Type: Double

skipNumRecords
Short Description: Number of records that should be skipped between recordings (min = 0)
Long Description: Data recordings are made every this many steps.
Default Value: 0
Lower Bound: 0
Upper Bound: 2147483647
Value Type: Integer

stableFAS
Short Description: Yes if the 'stable' FAS should be done
Long Description: If Yes, the "stable" version of the PC adjacency search is used, which for k > 0 fixes the graph for depth k + 1 to that of the
previous depth k.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

standardize
Short Description: Yes if the data should be standardized
Long Description: Yes if each variable in the data should be standardized to have mean zero and variance 1.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

structurePrior
Short Description: Structure prior coefficient (min = 0.0)
Long Description: The default number of parents for any conditional probability table. Higher weight is accorded to tables with about that number
of parents. The prior structure weights are distributed according to a binomial distribution.
Default Value: 1.0
Lower Bound: 0
Upper Bound: 1.7976931348623157E308
Value Type: Double

symmetricFirstStep
Short Description: Yes if the first step for FGES should do scoring for both X->Y and Y->X
Long Description: If Yes, scores for both X->Y and X<-Y will be calculated and the higher score used.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

targetName
Short Description: Target variable name
Long Description: The name of the target variable; for Markov blanket searches, this is the name of the variable for which one wants the Markov
blanket or Markov blanket graph.
Default Value:
Lower Bound:
Upper Bound:
Value Type: String

thr
Short Description: THR parameter (GLASSO) (min = 0.0)
Long Description: Sets the maximum number of iterations of the optimization loop.
Default Value: 1.0E-4

Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

thresholdForNumEigenvalues
Short Description: Threshold to determine how many eigenvalues to use--the lower the more (0 to 1)
Long Description: Referring to Zhang, K., Peters, J., Janzing, D., & Schölkopf, B. (2012), this parameter is the threshold to determine how many
eigenvalues to use--the lower the more (0 to 1).
Default Value: 0.001
Lower Bound: 0.0
Upper Bound: Infinity
Value Type: Double

thresholdNoRandomConstrainSearch
Short Description: Yes, if use the cutoff threshold for the meta-constraints independence test (stage 2).
Long Description: Yes, if use the cutoff threshold for the meta-constraints independence test (stage 2).
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

thresholdNoRandomDataSearch
Short Description: Yes, if use the cutoff threshold for the constraints independence test (stage 1).
Long Description: null
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

twoCycleAlpha
Short Description: Alpha orienting 2-cycles (min = 0.0)
Long Description: The alpha level of a T-test used to determine where 2-cycles exist in the graph. A value of zero turns off 2-cycle detection.
Default Value: 0.0
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

timeLimit
Short Description: Time limit
Long Description: T-Separation requires a time limit. Default 1000.
Default Value: 1000.0
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

adjustOrientations
Short Description: Yes, if the orientation adjustment step should be included
Long Description: Yes, if the orientation adjustment step should be included
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

upperBound
Short Description: Upper bound cutoff threshold
Long Description: null
Default Value: 0.7
Lower Bound: 0.0
Upper Bound: 1.0
Value Type: Double

useCorrDiffAdjacencies
Short Description: Yes if adjacencies from conditional correlation differences should be used
Long Description: FASK can use adjacencies X—Y where |corr(X,Y|X>0) – corr(X,Y|Y>0)| > threshold. This expression will be nonzero only if
there is a path between X and Y; heuristically, if the difference is greater than, say, 0.3, we infer an adjacency.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

useFasAdjacencies
Short Description: Yes if adjacencies from the FAS search (correlation) should be used
Long Description: Determines whether adjacencies found by conditional correlation should be included in the final model.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

semImSimulationType
Short Description: Yes if recursive simulation, No if reduced form simulation
Long Description: Determines the type of simulation done. If recursive, the graph must be a DAG in causal order. "Reduced form" means X = (I -
B)^-1 e, which requires a possibly large matrix inversion.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

simulationErrorType
Short Description: 1 = Usual LG SEM, 2 = Indep U(lb, ub), 3 = Indep Exp(lambda), 4 = Indep Gumbel(mu, beta)
Long Description: Exogenous error type
Default Value: 1
Lower Bound: 1
Upper Bound: 4
Value Type: Integer

simulationParam1
Short Description: Indep error parameter #1
Long Description: Exogenous error parameter #1
Default Value: 0.0
Lower Bound: -1000
Upper Bound: 1000
Value Type: Double

simulationParam2
Short Description: Indep error parameter #2, if used
Long Description: Exogenous error parameter #2
Default Value: 1.0
Lower Bound: -1000
Upper Bound: 1000
Value Type: Double

ess
Short Description: Yes if the equivalent sample size should be used in place of N
Long Description: We calculate the equivalent sample size by assuming that all records are equally correlated.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

timeLag
Short Description: A time lag for time series data, automatically applied (zero if none)
Long Description: Automatically applies the time lag transform to the data, creating additional lagged variables. If zero, no time lag is applied;
otherwise this should be a positive integer.
Default Value: 0
Lower Bound: 0
Upper Bound: 2147483647
Value Type: Integer
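
A rough sketch of what the time lag transform does (illustrative Java, not Tetrad's implementation): for lag L, each output row holds the current values of the variables plus their values at the previous L time steps, so the transformed data has (L + 1) times as many columns and L fewer rows.

```java
// Illustrative time-lag transform: column block l holds the variables lagged by l steps.
final class TimeLagTransform {
    static double[][] apply(double[][] data, int lag) {
        int n = data.length;             // number of time points (rows)
        int p = data[0].length;          // number of variables (columns)
        double[][] out = new double[n - lag][(lag + 1) * p];
        for (int t = lag; t < n; t++) {
            for (int l = 0; l <= lag; l++) {
                for (int j = 0; j < p; j++) {
                    out[t - lag][l * p + j] = data[t - l][j];   // value of variable j at time t - l
                }
            }
        }
        return out;
    }
}
```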

useGap
Short Description: Yes if the GAP algorithms should be used. No if the SAG algorithm should be used
Long Description: Yes if one should first find all possible initial sets, grow these out, and then pick the largest non-overlapping such sets from
these. No if one should grow pure clusters one at a time, excluding variables found in earlier clusters.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

useMaxPOrientationHeuristic
Short Description: Yes if the heuristic for orienting unshielded colliders for max P should be used

Long Description: Another way to do the orientation if X and Z are only weakly dependent, is to simply see whether the p-value for X _||_ Z | Y is
greater than the p-value for X _||_ Z. The purpose is to speed up the search.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

useSkewAdjacencies
Short Description: Yes if adjacencies based on skewness should be used
Long Description: FASK can use adjacencies X—Y where |corr(X,Y|X>0) – corr(X,Y|Y>0)| > threshold. This expression will be nonzero only if
there is a path between X and Y; heuristically, if the difference is greater than, say, 0.3, we infer an adjacency. To see adjacencies included for
this reason, set this parameter to “Yes”. Sanchez-Romero, Ramsey et al., (2018) Network Neuroscience.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

useWishart
Short Description: Yes if the Wishart test should be used. No if the Delta test should be used
Long Description: This is a parameter for the FOFC (Find One Factor Clusters) algorithm. There are two tests implemented there for testing whether
tetrads are zero, Wishart and Delta. This parameter picks which of these tests should be used: ‘Yes’ for Wishart and ‘No’ for Delta.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

checkType
Short Description: Model significance check type: 1 = Significance, 2 = Clique, 3 = None
Long Description: Model significance check type: 1 = Significance, 2 = Clique, 3 = None
Default Value: 1
Lower Bound: 1
Upper Bound: 3
Value Type: Integer

varHigh
Short Description: High end of variance range (min = 0.0)
Long Description: The parameter 'b' for drawing independent variance values, from +U(a, b).
Default Value: 3.0
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

significanceChecked
Short Description: True if the significance of the cluster should be checked.
Long Description: True if the significance of clusters should be checked, false if not.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

varLow
Short Description: Low end of variance range (min = 0.0)
Long Description: The parameter 'a' for drawing independent variance values, from +U(a, b).
Default Value: 1.0
Lower Bound: 0.0
Upper Bound: 1.7976931348623157E308
Value Type: Double

verbose
Short Description: Yes if verbose output should be printed or logged
Long Description: If this parameter is set to ‘Yes’, extra (“verbose”) output will be printed if available giving some details about the step-by-step
operation of the algorithm.
Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

meekVerbose
Short Description: Yes if verbose output for Meek rule applications should be printed or logged
Long Description: If this parameter is set to ‘Yes’, Meek rule applications will be printed out to the log.

Default Value: false
Lower Bound:
Upper Bound:
Value Type: Boolean

useScore
Short Description: Yes if the score should be used; no if the test should be used
Long Description: BOSS can run either from a score or a test; this lets you choose which.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

outputCpdag
Short Description: Yes if a CPDAG should be output, no if a DAG.
Long Description: BOSS can output a DAG or the CPDAG of the DAG.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

cacheScores
Short Description: Yes if score results should be cached, no if not
Long Description: Caching scores can use a lot of memory.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

verbose
Short Description: Yes if the (MimBuild) structure model should be included in the output graph
Long Description: FOFC proper yields a measurement model, that is, a set of pure children for each of the discovered latents. One can estimate
the structure over the latents (the structure model) using MimBuild. This structure model is included in the output if this parameter is set to Yes.
Default Value: true
Lower Bound:
Upper Bound:
Value Type: Boolean

Regression Box
The regression box performs regression on variables in a data set, in an attempt to discover causal correlations between them. Both linear and
logistic regression are available.

Possible Parent Boxes of the Regression Box


A data box
A simulation box

Possible Child Boxes of the Regression Box


A graph box
A compare box
A parametric model box
A data box
A simulation box
A search box

Multiple Linear Regression


Linear regression is performed upon continuous data sets. If you have a categorical data set upon which you would like to perform linear regression,
you can make it continuous using the data manipulation box.

Take, for example, a data set with the following underlying causal structure:


When used as input to the linear regression box, the following window results:

To select a variable as the response variable, click on it in the leftmost box, and then click on the top right-pointing arrow. If you change your mind about
which variable should be the response variable, simply click on another variable and click on the arrow again.

To select a variable as a predictor variable, click on it in the leftmost box, and then click on the second right-pointing arrow. To remove a predictor
variable, click on it in the predictor box and then click on the left-pointing arrow.

Clicking “Sort Variables” rearranges the variables in the predictor box so that they follow the same order they did in the leftmost box. The alpha value in
the lower left corner is a threshold for independence; the higher it is set, the less discerning Tetrad is when determining the independence of two
variables.

When we click “Execute,” the results of the regression appear in the box to the right. For each predictor variable, Tetrad lists the standard error, t value,
and p value, and whether its correlation with the response variable is significant.
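
For readers who want to see what lies behind those columns, here is a minimal single-predictor sketch (Java, not Tetrad's regression code) of how a coefficient, its standard error, and its t value are computed; the regression box itself does multiple regression, and the p value comes from a t distribution with n - 2 degrees of freedom, omitted here for brevity.

```java
// Single-predictor ordinary least squares, reporting the slope, its standard error,
// and the t statistic (illustrative sketch only).
final class SimpleRegression {
    static void fit(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - mx) * (x[i] - mx);
            sxy += (x[i] - mx) * (y[i] - my);
        }
        double b1 = sxy / sxx;                       // slope estimate
        double b0 = my - b1 * mx;                    // intercept estimate
        double rss = 0;
        for (int i = 0; i < n; i++) {
            double resid = y[i] - (b0 + b1 * x[i]);
            rss += resid * resid;                    // residual sum of squares
        }
        double se = Math.sqrt(rss / (n - 2) / sxx);  // standard error of the slope
        double t = b1 / se;                          // t statistic reported in the output
        System.out.printf("b = %.4f, SE = %.4f, t = %.4f%n", b1, se, t);
    }
}
```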

The Output Graph tab contains a graphical model of the information contained in the Model tab. For the case in which X4 is the response variable and
X1, X2, and X3 are the predictors, Tetrad finds that only X1 is significant, and the output graph looks like this:

Comparison to the true causal model shows that this correlation does exist, but that it runs in the opposite direction.

Logistic Regression
Logistic regression may be run on discrete, continuous, or mixed data sets; however, the response variable must be binary. In all other ways, the logistic
regression box functions like the linear regression box.

Appendices
An Introduction to PAGs
Peter Spirtes

The output of the FCI algorithm [Spirtes, 2001] is a partial ancestral graph (PAG), which is a graphical object that represents a set of causal Bayesian
networks (CBNs) that cannot be distinguished by the algorithm. Suppose we have a set of cases that were generated by random sampling from some
CBN. Under the assumptions that FCI makes, in the large sample limit of the number of cases, the PAG returned by FCI is guaranteed to include the
CBN that generated the data.

An example of a PAG is shown in Figure 2. This PAG represents the pair of CBNs in Figure 1a and 1b (where measured variables are in boxes and
unmeasured variables are in ovals), as well as an infinite number of other CBNs that may have an arbitrarily large set of unmeasured confounders.
Despite the fact that there are important differences between the CBNs in Figures 1a and 1b (e.g., there is an unmeasured confounder of X1 and X2 in
Figure 1b but not in Figure 1a), they share a number of important features in common (e.g., in both CBNs, X2 is a direct cause of X6, there is no
unmeasured confounder of X2 and X6, and X6 is not a cause of X2). It can be shown that every CBN that a PAG represents shares certain features in
common. The features that all CBNs represented by a PAG share in common can be read off of the output PAG according to the rules described next.

There are 4 kinds of edges that occur in a PAG: A -> B, A o-> B, A o–o B, and A <-> B. The edges indicate what the CBNs represented by the PAG have
in common. A description of the meaning of each edge in a PAG is given in Table A1.

Table A1: Types of edges in a PAG.

A --> B
    Relationships that are present: A is a cause of B. It may be a direct or indirect cause that may include other measured variables. Also, there may be an unmeasured confounder of A and B.
    Relationships that are absent: B is not a cause of A.

A <-> B
    Relationships that are present: There is an unmeasured variable (call it L) that is a cause of A and B. There may be measured variables along the causal pathway from L to A or from L to B.
    Relationships that are absent: A is not a cause of B. B is not a cause of A.

A o-> B
    Relationships that are present: Either A is a cause of B, or there is an unmeasured variable that is a cause of A and B, or both.
    Relationships that are absent: B is not a cause of A.

A o–o B
    Relationships that are present: Exactly one of the following holds: (a) A is a cause of B, or (b) B is a cause of A, or (c) there is an unmeasured variable that is a cause of A and B, or (d) both a and c, or (e) both b and c.

Table A1 is sufficient to understand the basic meaning of edge types in PAGs. Nonetheless, it can be helpful to know the following additional
perspective on the information encoded by PAGs. Each edge has two endpoints, one on the A side, and one on the B side. For example A --> B has a
tail at the A end, and an arrowhead at the B end. Altogether, there are three kinds of edge endpoints: a tail "–", an arrowhead ">", and a circle "o". Note that
some kinds of combinations of endpoints never occur; for example, A o– B never occurs. As a mnemonic device, the basic meaning of each kind of
edge can be derived from three simple rules that explain what the meaning of each kind of endpoint is. A tail "–" at the A end of an edge between A and
B means "A is a cause of B"; an arrowhead ">" at the A end of an edge between A and B means "A is not a cause of B"; and a circle "o" at the A end of
an edge between A and B means "can't tell whether or not A is a cause of B". For example A --> B means that A is a cause of B, and that B is not a
cause of A in all of the CBNs represented by the PAG.

The PAG in Figure 2 shows examples of each type of edge, and the CBNs in Figure 1 show some examples of what kinds of CBNs can be represented
by that PAG.


Figure 1. Two CBNs that FCI (as well as FCI+, GFCI, and RFCI) cannot distinguish.

Figure 2. The PAG that represents the CBNs in both Figures 1a and 1b.

Arc Specializations in PAGs


This section describes two types of arc specializations that provide additional information about the nature of an arc in a PAG.

One arc specialization is colored green and is called definitely visible. In a PAG P without selection bias, a green (definitely visible) arc from A to B
denotes that A and B do not have a latent confounder. If an arc is not definitely visible (shown as black), then A and B may have a latent
confounder.

Another arc specialization is shown as bold and is called definitely direct. In a PAG P without selection bias, a bold (definitely direct) arc from A to B
denotes that A is a direct cause of B, relative to the other measured variables. If an arc is not definitely direct (shown as not bold), then A might
not be a direct cause of B; in that case there may be one or more other measured variables on every causal path from A to B.

In the following examples, the DAG representing a causal process is on the left, and the corresponding PAG is on the right. All variables are observed
except for latent variable L.

Example of an edge C ➔ D that is definitely visible (green) and definitely direct (bold):

Example of an edge (C ➔ E) that is definitely visible (green) and not definitely direct (not bold):

Example of an edge (F ➔ E) that is not definitely visible (black) and not definitely direct (not bold):

It is conjectured that it is not possible for an edge to be definitely direct (bold) and not definitely visible (black).

Solving Out of Memory Errors


By default, Java allocates the smaller of 1/4 of system memory or 1 GB to the Java virtual machine (JVM). If you run out of memory (heap
space) while running your analyses, you should increase the memory allocated to the JVM with the switch -XmxNG, where N is the number of
gigabytes of RAM you allow the JVM to use. To run Tetrad with more memory, you need to start it from the command line or terminal. For example, to
allocate 8 gigabytes of RAM, add -Xmx8G immediately after the java command, e.g., java -Xmx8G -jar tetrad-gui.jar.

Glossary of Terms
Adjacent

Two vertices in a graph are adjacent if there is a directed, undirected, or double-headed edge between them.

Degree

The total number of edges directed into or out of a vertex.

Indegree

The number of edges directed into a vertex.

Markov Blanket

In a variable set V with joint probability distribution Pr, the Markov Blanket of a variable X in V is the smallest subset M of V \ {X} such that X is
independent of V \ (M ∪ {X}) conditional on M. In a DAG model, the Markov Blanket of X is the union of the set of direct causes (parents) of X, the set of
direct effects (children) of X, and the set of direct causes of direct effects of X.
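
For the DAG case, this characterization can be computed directly from parent and child sets. Below is a minimal Java sketch (hypothetical names, assuming the DAG is given as a map from each node to its set of parents) that takes the union of the parents of X, the children of X, and the other parents of those children:

    import java.util.*;

    // Hypothetical sketch: Markov blanket of x in a DAG given as a parent map (node -> its parents).
    public class MarkovBlanket {

        static Set<String> markovBlanket(String x, Map<String, Set<String>> parents) {
            Set<String> mb = new HashSet<>(parents.getOrDefault(x, Set.of()));   // direct causes of x
            for (Map.Entry<String, Set<String>> e : parents.entrySet()) {
                if (e.getValue().contains(x)) {            // e.getKey() is a child of x
                    mb.add(e.getKey());                    // direct effects of x
                    mb.addAll(e.getValue());               // other direct causes of those effects
                }
            }
            mb.remove(x);                                  // the blanket excludes x itself
            return mb;
        }

        public static void main(String[] args) {
            // Example DAG: A -> X, X -> C, B -> C
            Map<String, Set<String>> parents = new HashMap<>();
            parents.put("X", Set.of("A"));
            parents.put("C", Set.of("X", "B"));
            System.out.println(markovBlanket("X", parents));   // A, B, and C, in some order
        }
    }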

Markov Equivalent Graphs

Two directed acyclic graphs (DAGs) are Markov equivalent if they have the same adjacencies and, for every triple X – Y – Z of adjacent vertices in
which X and Z are not adjacent, X -> Y <- Z holds in both graphs or in neither graph.
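
This criterion can be checked mechanically by comparing the adjacencies (skeletons) and the unshielded colliders of the two DAGs. The following minimal Java sketch (hypothetical names, assuming each DAG is given as a map from node to its set of parents, over the same variable set) illustrates the test:

    import java.util.*;

    // Hypothetical sketch: test Markov equivalence of two DAGs given as parent maps.
    public class MarkovEquivalence {

        // Unordered adjacency pairs (the "skeleton") of a DAG.
        static Set<Set<String>> skeleton(Map<String, Set<String>> parents) {
            Set<Set<String>> adj = new HashSet<>();
            parents.forEach((child, ps) -> ps.forEach(p -> adj.add(Set.of(p, child))));
            return adj;
        }

        // Unshielded colliders X -> Y <- Z where X and Z are not adjacent.
        static Set<List<String>> colliders(Map<String, Set<String>> parents) {
            Set<Set<String>> adj = skeleton(parents);
            Set<List<String>> out = new HashSet<>();
            for (Map.Entry<String, Set<String>> e : parents.entrySet()) {
                List<String> ps = new ArrayList<>(e.getValue());
                for (int i = 0; i < ps.size(); i++)
                    for (int j = i + 1; j < ps.size(); j++)
                        if (!adj.contains(Set.of(ps.get(i), ps.get(j)))) {
                            List<String> pair = new ArrayList<>(List.of(ps.get(i), ps.get(j)));
                            Collections.sort(pair);          // order-independent key for the collider
                            out.add(List.of(pair.get(0), e.getKey(), pair.get(1)));
                        }
            }
            return out;
        }

        static boolean markovEquivalent(Map<String, Set<String>> g1, Map<String, Set<String>> g2) {
            return skeleton(g1).equals(skeleton(g2)) && colliders(g1).equals(colliders(g2));
        }
    }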

Meek Orientation Rules

Rules for finding all directions of edges implied by a CPDAG, consistent with any specified "knowledge" constraints on directions. See
https://arxiv.org/pdf/1302.4972.pdf

Maximal Ancestral Graph (MAG)

An acyclic graph with directed and bidirected (double-headed) edges. Directed edges have the same interpretation as in DAGs; bidirected edges represent
unmeasured common causes. See Richardson, T. (2003). Markov properties for acyclic directed mixed graphs. Scandinavian Journal of Statistics, 30(1), 145-157.

Multiple Indicator Model

A graphical model in which unmeasured variables each have multiple measured effects. There may be directed edges between unmeasured variables,
but no directed edges from measured variables to unmeasured variables are allowed.

Outdegree

The number of edges directed out of a vertex.

Partial Ancestral Graph (PAG)

See PAG description in this manual.

CPDAG

A graphical representation of a Markov equivalence class (or a union of such classes), having both directed and undirected edges, with an undirected edge indicating
that, for each possible direction of the edge, there is a graph in the class (or classes) having that edge direction.

Scale Free Graph

A network in which the frequency of nodes with degree k obeys a power law: the relation between the log of degree and the log of frequency is roughly linear.
See https://cs.brynmawr.edu/Courses/cs380/spring2013/section02/slides/10_ScaleFreeNetworks.pdf.
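
One standard way (among others) to grow a network whose degree distribution approximately follows a power law is preferential attachment, in which each new node links to existing nodes with probability proportional to their current degree. A minimal Java sketch of that idea (hypothetical names, not taken from Tetrad):

    import java.util.*;

    // Hypothetical sketch: grow a roughly scale-free network by preferential attachment.
    // Each new node links to m existing nodes chosen with probability proportional to degree.
    public class PreferentialAttachment {

        static Map<Integer, Set<Integer>> generate(int n, int m, Random rnd) {
            Map<Integer, Set<Integer>> adj = new HashMap<>();
            List<Integer> degreeList = new ArrayList<>();   // each node appears once per edge end

            // Start from a small clique of m + 1 nodes.
            for (int i = 0; i <= m; i++) adj.put(i, new HashSet<>());
            for (int i = 0; i <= m; i++)
                for (int j = i + 1; j <= m; j++) {
                    adj.get(i).add(j); adj.get(j).add(i);
                    degreeList.add(i); degreeList.add(j);
                }

            // Add the remaining nodes one at a time.
            for (int v = m + 1; v < n; v++) {
                Set<Integer> targets = new HashSet<>();
                while (targets.size() < m)
                    targets.add(degreeList.get(rnd.nextInt(degreeList.size())));
                adj.put(v, new HashSet<>(targets));
                for (int t : targets) {
                    adj.get(t).add(v);
                    degreeList.add(t); degreeList.add(v);
                }
            }
            return adj;
        }
    }

Plotting the log of the frequency of each degree k against log(k) for a graph generated this way should give a roughly linear relation, as described above.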

Trek

A trek between X and Y is a directed path from X to Y or from Y to X, or two directed paths from a third variable Z into X and Y that do not intersect
except at Z.
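
One convenient consequence of this definition is that a trek between X and Y exists exactly when X and Y have a common ancestor, counting each variable as an ancestor of itself (which covers the directed-path cases). A minimal Java sketch along those lines (hypothetical names, assuming the DAG is given as a map from each node to its set of parents):

    import java.util.*;

    // Hypothetical sketch: a trek between x and y exists iff x and y share a common ancestor
    // (counting a variable as its own ancestor). The DAG is given as a parent map.
    public class TrekCheck {

        static Set<String> ancestors(String v, Map<String, Set<String>> parents) {
            Set<String> anc = new HashSet<>();
            Deque<String> stack = new ArrayDeque<>(List.of(v));
            while (!stack.isEmpty()) {
                String cur = stack.pop();
                if (anc.add(cur))
                    stack.addAll(parents.getOrDefault(cur, Set.of()));
            }
            return anc;                                    // includes v itself
        }

        static boolean trekExists(String x, String y, Map<String, Set<String>> parents) {
            Set<String> common = ancestors(x, parents);
            common.retainAll(ancestors(y, parents));
            return !common.isEmpty();
        }
    }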
