
TEXT ANALYSIS

EXAMPLE
The Federalist Papers Session 1

Credit: SAS Institute


Pages 38 - 63 of that big PDF
The Federalist Papers

A collection of 85 documents, written between 1787 and 1788.
Written during the post-Revolutionary War era, they argued for the adoption
of the US Constitution (sans Bill of Rights) by New York.

Collaborative work between Alexander Hamilton, James Madison, and John Jay.
[Portraits: Madison, Hamilton, Jay]
Timeline
1788: All of the papers have been published, but none are attributed.
1804: Before dueling Aaron Burr, Hamilton gives his attorney a list
of authorship, just in case. He is mortally wounded in the duel.
1818: Madison releases his own list, with some differences.
He attributes the differences to the hastily assembled initial list.
Some essays are changed to a collaboration between
Madison and Hamilton.
Most of the differences are essays Hamilton took credit for
that Madison claims as his own.
There is also one near-certain typo involving one of Jay's essays.
Our Corpus

85 essays, subset to 77
51 Hamilton (Train)
14 Madison (Train)
12 Disputed (Predict)
190,000+ words of free text (old school Natural Language)
8,752 unique words
Two exam bonus points if you email me AFTER
class with the data discrepancy shown here and
an explanation of why it happened.
Guided Example Step 1

Create a project (or open an existing one).
Add a library pointing to the folder containing your data.
Create a new Data Source, with these changes:
Authors role = Label
Target level = Nominal
Create a new Diagram
Add your new Data Source to your new Diagram
Guided Example Step 2

Add a Filter node from the Sample ribbon.


We want to customize this, so set the default options for Class and
Interval filtering to None.
Then go into the menu for Class variables and exclude:
Records pertaining to Jay (we know he didn't write the disputed essays)
Records pertaining to Hamilton and Madison collaboration
(blended styles)
Filter by selecting Target = 2 or 3

Check In: Do you have 77 items remaining?
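The same filtering logic can be sketched outside Enterprise Miner. This is a minimal sketch that filters on author labels rather than on the numeric Target codes used in the node; the label strings and the toy records are hypothetical, sized to match the counts on the Our Corpus slide plus Jay's 5 essays and the 3 joint Hamilton/Madison essays.

```python
# Hypothetical author labels; the SAS data set encodes authors differently.
KEEP = {"Hamilton", "Madison", "Disputed"}

def filter_records(records):
    """Drop Jay's essays and the Hamilton/Madison collaborations."""
    return [r for r in records if r["author"] in KEEP]

# Toy records (hypothetical), one dict per essay.
records = (
    [{"id": i, "author": "Hamilton"} for i in range(51)]
    + [{"id": 100 + i, "author": "Madison"} for i in range(14)]
    + [{"id": 200 + i, "author": "Disputed"} for i in range(12)]
    + [{"id": 300 + i, "author": "Jay"} for i in range(5)]
    + [{"id": 400 + i, "author": "Hamilton and Madison"} for i in range(3)]
)
kept = filter_records(records)
print(len(records), len(kept))  # 85 77
```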


Guided Example Step 3

Ensure a binary target using the Metadata node (Utility)


Enter the Target menu.
Change the target variable's level to Binary.
Run.
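Conceptually, this step tells downstream nodes that the target has exactly two levels. A minimal sketch, assuming a hypothetical 1/0 coding (Hamilton = 1, Madison = 0); the actual values in the SAS data set may differ. In this sketch the disputed essays get no target value, since they are the ones to be predicted.

```python
# Hypothetical binary coding; not the coding used in the SAS data set.
TARGET = {"Hamilton": 1, "Madison": 0}

def binary_target(author):
    """Return 1 or 0 for a known author, None for a disputed essay."""
    return TARGET.get(author)

print([binary_target(a) for a in ["Hamilton", "Madison", "Disputed"]])  # [1, 0, None]
```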
Guided Example Step 4

Text Parsing node.


Stop List = DMTXT.FederalistStop.
Find Entities = Standard.

Stop list: a list of words the parser omits automatically.

Run.
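What a stop list does during parsing can be sketched in a few lines. This is a minimal sketch with a made-up stop list and a simple lowercase, letters-only tokenizer; it does not reproduce the contents of DMTXT.FederalistStop or the Text Parsing node's own tokenizer and entity detection.

```python
import re

# Hypothetical stop list; DMTXT.FederalistStop has its own contents.
STOP_WORDS = {"the", "of", "to", "and", "a", "in", "be", "that", "it", "is"}

def parse(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words]

print(parse("It is the duty of the people to judge."))  # ['duty', 'people', 'judge']
```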
Guided Example Step 5

Text Filter node.

Term Weight = Inverse Document Frequency


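Rarer terms get more weight: the fewer documents a term appears in, the more informative it is considered.
Minimum Number of Documents = 2 (terms found in fewer than 2 documents are dropped)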

Run.
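The two settings can be sketched together. This is a minimal sketch assuming the common IDF form log2(N / df), where N is the number of documents and df is the number of documents containing the term; the Text Filter node's exact weighting formula may differ, so check its documentation. The toy token lists are made up.

```python
import math
from collections import Counter

def idf_weights(docs, min_docs=2):
    """Return {term: log2(N / df)} for terms found in at least min_docs documents.

    docs is a list of token lists; terms below min_docs are discarded,
    mirroring the 'Minimum Number of Documents' setting.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log2(n / d) for t, d in df.items() if d >= min_docs}

# Made-up tokenized documents.
docs = [["upon", "power"], ["upon", "union"], ["whilst", "power"], ["power", "union"]]
weights = idf_weights(docs)
print(weights)
```

Here "power" appears in 3 of 4 documents and gets the smallest weight, "upon" and "union" (2 of 4) get 1.0, and "whilst" (1 document) is dropped.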
Guided Example Step 6

Text Cluster node.

Exact or Maximum Number = Exact


Number of Clusters = 2
We want exactly 2 clusters because we want to bucket
the essays into Madison or Hamilton only.

Run.
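Text Cluster's own algorithm is not reproduced here. As a stand-in, this sketch uses plain k-means with k = 2 on hypothetical two-dimensional document vectors (think of them as weighted term scores); both the data and the choice of k-means are assumptions for illustration. A cluster number is only a label: which cluster corresponds to Hamilton and which to Madison has to be checked against the known essays in each.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k=2, iters=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical 2-D document vectors; the values are made up.
docs = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.0], [0.0, 0.8]]
print(kmeans(docs))  # [0, 1, 0, 1]
```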
Guided Example Step 7

Text Topic node.

Number of Multi-Term Topics = 5

Run.
Guided Example Step 8

Regression node

Defaults are fine.

Logistic regression will be used, since the target is binary.

Run.
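The Regression node fits the model for you; this sketch only shows what logistic regression does underneath. It is a minimal sketch assuming a hypothetical coding of Hamilton = 1 and Madison = 0, one made-up numeric input per essay (for example, a single topic score), plain batch gradient descent (not necessarily the optimizer the node uses), and a 0.5 probability cutoff for the predicted class.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and a bias by batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Error for one essay: predicted P(Hamilton) minus the 0/1 label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, cutoff=0.5):
    """Return 1 (Hamilton) if P(Hamilton) reaches the cutoff, else 0 (Madison)."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= cutoff else 0

# Hypothetical one-feature training data.
X = [[2.0], [1.5], [1.0], [-1.0], [-1.5], [-2.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, 1, 0, 0, 0]
```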
Guided Example Step 9

Within the properties of your Regression node:


Exported Data >> TRAIN data >> Explore button.
Click on the plot wizard icon.
In the plot wizard, select Bar, Next.
Roles:
Target = Category
I_Target = Group
Click Finish. Behold the graph.
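The bar chart is essentially a cross-tabulation of the actual class (Target) against the predicted class (I_Target). A minimal sketch of those counts, using made-up labels for five essays; in a real run the values would come from the exported TRAIN data. Bars where Target and I_Target agree are correct classifications.

```python
from collections import Counter

def crosstab(actual, predicted):
    """Count (actual, predicted) pairs: the numbers behind a grouped bar chart."""
    return Counter(zip(actual, predicted))

# Hypothetical labels for five essays.
actual = ["Hamilton", "Hamilton", "Madison", "Madison", "Hamilton"]
predicted = ["Hamilton", "Madison", "Madison", "Madison", "Hamilton"]

counts = crosstab(actual, predicted)
for (target, i_target), n in sorted(counts.items()):
    print(f"Target={target:<9} I_Target={i_target:<9} count={n}")
```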
