
Social Network Analysis
American Sociological Association

San Francisco, August 2004
James Moody
Introduction
We live in a connected world:
“To speak of social life is to speak of the association between people – their associating in work and in play, in love and in war, to trade or to worship, to help or to hinder. It is in the social relations men establish that their interests find expression and their desires become realized.”
Peter M. Blau
Exchange and Power in Social Life, 1964
"If we ever get to the point of charting a whole city or a whole nation, we
would have … a picture of a vast solar system of intangible structures,
powerfully influencing conduct, as gravitation does in space. Such an
invisible structure underlies society and has its influence in determining the
conduct of society as a whole."
J.L. Moreno, New York Times, April 13, 1933
These patterns of connection form a social space that can be seen in multiple contexts:
Source: Linton Freeman “See you in the funny pages” Connections, 23, 2000, 32-42.
High Schools as Networks
And yet, standard social science analysis methods do not take this space
into account.
“For the last thirty years, empirical social research has been
dominated by the sample survey. But as usually practiced, …, the
survey is a sociological meat grinder, tearing the individual from his
social context and guaranteeing that nobody in the study interacts
with anyone else in it.”
Allen Barton, 1968 (Quoted in Freeman 2004)
Moreover, the complexity of the relational world makes it impossible to identify social connectivity using only our intuitive understanding.
Social Network Analysis (SNA) provides a set of tools to empirically extend our theoretical intuition of the patterns that construct social structure.
Introduction Why do Networks Matter? Local vision
Introduction
Why networks matter:
• Intuitive: “goods” travel through contacts between actors, which can reflect a power distribution or influence attitudes and behaviors. Our understanding of social life improves if we account for this social space.
• Less intuitive: patterns of inter-actor contact can have effects on the spread of “goods” or power dynamics that could not be seen focusing only on individual behavior.
Introduction
Social network analysis is:
• a set of relational methods for systematically understanding and identifying connections among actors. SNA:
• is motivated by a structural intuition based on ties linking social actors
• is grounded in systematic empirical data
• draws heavily on graphic imagery
• relies on the use of mathematical and/or computational models.
• Social Network Analysis embodies a range of theories relating types of observable social spaces to individual and group behavior.
1. Introduction
2. Social Network Data
a. Basic data Elements
b. Collecting network data
c. Basic data structures
3. Measuring Networks
a. Flow of goods within networks
1) Topology
2) Time
b. Structure of Social Space
1) Small Worlds, Scale-Free, Triads
2) Cohesive Groups
3) Role Positions
4. Modeling with Networks
a. Modeling Behaviors with Networks
1) Peer attribute models
2) Network Autocorrelation Models
3) Dyad / QAP Models
b. Modeling Network Structure
1) QAP for network structure
2) Exponential Random Graph Models
5. SNA Computer Programs
Social Network Data
The units of interest in a network are the combined sets of actors and their relations.
We represent actors with points and relations with lines.
Actors are referred to variously as:
Nodes, vertices or points
Relations are referred to variously as:
Edges, Arcs, Lines, Ties
Example:
[Figure: example network of five actors, a through e]
Social Network Data
In general, a relation can be:
Binary or Valued
Directed or Undirected
[Figure: the example network drawn four ways: undirected, binary; directed, binary; undirected, valued; directed, valued]
Social Network Data
Social network data are substantively divided by the number of
modes in the data.
1-mode data represent edges based on direct contact between actors in the network. All the nodes are of the same type (people, organizations, ideas, etc.). Examples:
Communication, friendship, giving orders, sending email.
1-mode data are usually singly reported (each person reports on their friends), but you can use multiple-informant data, which are more common in child development research (Cairns and Cairns).
Social Network Data
Social network data are substantively divided by the number of
modes in the data.
2-mode data represent nodes from two separate classes, where all ties are across classes. Examples:
People as members of groups
People as authors on papers
Words used often by people
Events in the life history of people
The two modes of the data represent a duality: you can project the data as people connected to people through joint membership in a group, or groups connected to each other through common membership.
There may be multiple relations of multiple types connecting your nodes.
Social Network Data
We can examine networks across multiple levels:
1) Ego-network
- Have data on a respondent (ego) and the people they are connected to
(alters). Example: 1985 GSS module
- May include estimates of connections among alters
2) Partial network
- Ego networks plus some amount of tracing to reach contacts of
contacts
- Something less than full account of connections among all pairs of

actors in the relevant population
- Example: CDC Contact tracing data for STDs

Social Network Data
We can examine networks across multiple levels:
3) Complete or “Global” data

- Data on all actors within a particular (relevant) boundary
- Never exactly complete (due to missing data), but boundaries are set
-Example: Coauthorship data among all writers in the social

sciences, friendships among all students in a classroom
For the most part, I will be discussing techniques surrounding global

networks today, though I will briefly mention some standard uses of
ego-network data.
Social Network Data
Collecting Network Data
Data capture any connection between the nodes. Sources include

surveys, published accounts, special informants, etc.
In general, you can only make conclusions about relations among the
set of nodes you have collected, so it is important to observe as
much of the network as possible.
See W&F, chap 2 on different types of data collection

Social Network Data
If you use surveys to collect data, some general rules of thumb:
a) Network data collection can be time consuming. It is better (I think) to

have breadth over depth. Having detailed information on <50% of the
sample will make it very difficult to draw conclusions about the general
network structure.
b) Question format:
• If you ask people to recall names (an open list format), fatigue will
result in under-reporting
• If you ask people to check off names from a full list, you can often get
over-reporting
c) It is common to limit people to ~5 nominations. This will bias network stats
for stars, but is sometimes the best choice to avoid fatigue.
d) Concrete relational indicators are best (who did you talk to?) over attitudes
that are harder to define (who do you like?)
Social Network Data
Existing Sources of Social Network Data
1) Check INSNA: The International Network for Social Network Analysis

2) Many secondary sources (particularly for 2-mode data)
3) National Longitudinal Study of Adolescent Health (Add Health)
Social Network Data
Basic Data Structures
Working with pictures.
There is no standard way to draw a sociogram: each of these is equivalent:
Social Network Data
In general, graphs are cumbersome to work with analytically, though there is a great deal of good work to be done on using visualization to build network intuition.
I recommend using layouts that optimize on the feature you are most interested in, and find that either a hierarchical layout or a force-directed layout is best.
Social Network Data
From pictures to matrices
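[Figure: the example network drawn as an undirected, binary graph and as a directed, binary graph]
Undirected, binary
  a b c d e
a . 1 . . .
b 1 . 1 . .
c . 1 . 1 1
d . . 1 . 1
e . . 1 1 .
Directed, binary (number of ties sent by each row actor)
a 1
b 1
c 3
d 0
e 2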
Social Network Data
From matrices to lists
Adjacency matrix
  a b c d e
a . 1 . . .
b 1 . 1 . .
c . 1 . 1 1
d . . 1 . 1
e . . 1 1 .
Adjacency List
a: b
b: a c
c: b d e
d: c e
e: c d
Arc List
a b
b a
b c
c b
c d
c e
d c
d e
e c
e d
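The conversion from a matrix to the two list formats can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the matrix is the undirected, binary example network with empty cells written as 0.

```python
# Undirected, binary example network (nodes a-e); 0 marks an empty cell.
labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 1, 0, 0, 0],  # a
    [1, 0, 1, 0, 0],  # b
    [0, 1, 0, 1, 1],  # c
    [0, 0, 1, 0, 1],  # d
    [0, 0, 1, 1, 0],  # e
]

def to_adjacency_list(labels, matrix):
    """Map each node to the list of nodes it sends a tie to."""
    return {
        labels[i]: [labels[j] for j, x in enumerate(row) if x]
        for i, row in enumerate(matrix)
    }

def to_arc_list(labels, matrix):
    """List every nonzero cell as a (sender, receiver) pair."""
    return [
        (labels[i], labels[j])
        for i, row in enumerate(matrix)
        for j, x in enumerate(row)
        if x
    ]

adjacency_list = to_adjacency_list(labels, matrix)
arc_list = to_arc_list(labels, matrix)
print(adjacency_list)
print(arc_list)
```

Because the graph is undirected, every tie appears twice in the arc list, once in each direction.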
Measuring Networks: Flow
“Goods” flow through networks:

In addition to the simple probability that one actor passes information on

to another (pij), two factors affect flow through a network:
Topology
-the shape, or form, of the network
- Example: one actor cannot pass information to another unless they
are either directly or indirectly connected
Time
- the timing of contact matters
- Example: an actor cannot pass information he has not yet received
Two features of the network’s topology are known to be important: connectivity

and centrality
Connectivity refers to how actors in one part of the network are connected to
actors in another part of the network.
• Reachability: Is it possible for actor i to reach actor j? This can only be

true if there is a chain of contact from one actor to another.
• Distance: Given they can be reached, how many steps are they from
each other?
• Number of paths: How many different paths connect each pair?

Without full network data, you can’t distinguish actors with limited
information potential from those more deeply embedded in a setting.
Reachability
Indirect connections are what make networks systems. One actor can
reach another if there is a path in the graph connecting them.
[Figure: example graphs with nodes a through f illustrating reachability]
Paths can be directed, leading to a distinction between “strong” and “weak” components.
Reachability
If you can trace a sequence of relations from one actor to another,

then the two are reachable. If there is at least one path connecting
every pair of actors in the graph, the graph is connected and is called
a component.
Intuitively, a component is the set of people who are all connected by

a chain of relations.
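A minimal sketch of finding components, assuming an undirected graph stored as a dictionary of neighbor sets (so it finds what would be the weak components of a directed graph). The graph and the function name are hypothetical, not taken from the slides.

```python
from collections import deque

def components(adj):
    """Return the connected components of an undirected graph.

    adj maps each node to the set of its neighbors.
    """
    seen = set()
    result = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from start.
        seen.add(start)
        members = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    members.add(nbr)
                    queue.append(nbr)
        result.append(members)
    return result

# Hypothetical graph: a chain a-b-c, a pair d-e, and an isolate f.
adj = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d"},
    "f": set(),
}
comps = components(adj)
print(comps)
```

Two actors are reachable from one another exactly when they land in the same set.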
Reachability
This example
contains many
components.
Distance & number of paths
Distance is measured by the (weighted) number of relations separating a pair:
Actor “a” is:
1 step from 4 actors
2 steps from 5 actors
3 steps from 4 actors
4 steps from 3 actors
5 steps from 1 actor
Paths are the different routes one can take. Node-independent paths are particularly important.
There are 2 independent paths connecting a and b.
There are many non-independent paths.
Probability of transfer
by distance and number of paths, assume a constant pij of 0.6
[Figure: line plot of probability (y-axis) against path distance (x-axis, 2 to 6 steps), with one curve each for 1, 2, 5, and 10 paths]
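One simple way to reason about this relationship, offered as a sketch rather than as the calculation behind the figure: assume every step succeeds independently with probability pij, and that the paths are node-independent and all of the same length. A good then crosses one path of length d with probability pij^d, and reaches the target along at least one of k paths with probability 1 - (1 - pij^d)^k. The function name is hypothetical.

```python
def transfer_probability(p, distance, n_paths):
    """Chance that a good reaches the target along at least one of
    n_paths independent paths, each `distance` steps long.

    Assumes each step succeeds independently with probability p.
    """
    p_single_path = p ** distance
    return 1 - (1 - p_single_path) ** n_paths

# Tabulate the same grid as the figure: distances 2-6, 1/2/5/10 paths.
for n_paths in (1, 2, 5, 10):
    row = [round(transfer_probability(0.6, d, n_paths), 3) for d in range(2, 7)]
    print(n_paths, row)
```

Under these assumptions the probability falls quickly with distance and rises with the number of independent paths.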
Reachability in Colorado Springs
(Sexual contact only) •High-risk actors over 4 years
•695 people represented
•Longest path is 17 steps
•Average distance is about 5 steps
•Average person is within 3 steps
of 75 other people
•137 people connected through 2
independent paths, core of 30
people connected through 4
independent paths
(Node size = log of degree)

Centrality
Centrality refers to (one dimension of) location, identifying where an actor

resides in a network.
• For example, we can compare actors at the edge of the network to actors
at the center.
• In general, this is a way to formalize intuitive notions about the

distinction between insiders and outsiders.
Centrality
At the individual level, one dimension of position in the network can be

captured through centrality.
Conceptually, centrality is fairly straightforward: we want to identify which nodes are in the ‘center’ of the network. In practice, identifying exactly what we mean by ‘center’ is somewhat complicated, but substantively we often have reason to believe that people at the center are very important.
Three standard centrality measures capture a wide range of

“importance” in a network:
•Degree
•Closeness
•Betweenness
Centrality
The most intuitive notion of centrality focuses on degree. Degree is

the number of ties, and the actor with the most ties is the most
important:
$$C_D(n_i) = d(n_i) = X_{i+} = \sum_j X_{ij}$$
Centrality
If we want to measure the degree to which the graph as a whole is centralized,

we look at the dispersion of centrality:
Simple: variance of the individual centrality scores.
$$S_D^2 = \left[ \sum_{i=1}^{g} \left( C_D(n_i) - \bar{C}_D \right)^2 \right] / g$$
Or, using Freeman’s general formula for centralization (which ranges from 0 to 1):
$$C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n^*) - C_D(n_i) \right]}{(g-1)(g-2)}$$
where $C_D(n^*)$ is the largest observed degree.
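The degree measures can be sketched directly from the formulas. The two five-actor graphs (a star and a circle) are hypothetical examples chosen because they give the extreme Freeman scores; the function names are illustrative only, and the Freeman formula needs at least three actors.

```python
def degree_centrality(adj):
    """C_D(n_i): number of ties of each actor (undirected, binary graph)."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def degree_variance(adj):
    """Variance of the individual degree scores."""
    degrees = list(degree_centrality(adj).values())
    g = len(degrees)
    mean = sum(degrees) / g
    return sum((d - mean) ** 2 for d in degrees) / g

def freeman_degree_centralization(adj):
    """Sum of differences from the largest degree, over (g-1)(g-2)."""
    degrees = list(degree_centrality(adj).values())
    g = len(degrees)
    top = max(degrees)
    return sum(top - d for d in degrees) / ((g - 1) * (g - 2))

# Hypothetical five-actor examples: a star and a circle.
star = {"hub": {"s1", "s2", "s3", "s4"}}
star.update({leaf: {"hub"} for leaf in ("s1", "s2", "s3", "s4")})
circle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}

print(freeman_degree_centralization(star), degree_variance(star))
print(freeman_degree_centralization(circle), degree_variance(circle))
```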
Centrality
Degree Centralization Scores
[Figure: four example networks with their degree centralization scores: Freeman 1.0, variance 3.9; Freeman 0.0, variance 0.0; Freeman .02, variance .17; Freeman .07, variance .20]
Centrality
A second measure of centrality is closeness centrality. An actor is considered

important if he/she is relatively close to all other actors.
Closeness is based on the inverse of the distance of each actor to every other actor
in the network.
Closeness Centrality:
$$C_C(n_i) = \left[ \sum_{j=1}^{g} d(n_i, n_j) \right]^{-1}$$
Normalized Closeness Centrality
$$C'_C(n_i) = \left( C_C(n_i) \right)(g-1)$$
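A minimal sketch of closeness, assuming an undirected, connected graph stored as a dictionary of neighbor sets. Distances come from a breadth-first search; the star graph and the function names are hypothetical.

```python
from collections import deque

def distances_from(adj, source):
    """Geodesic distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def closeness(adj, node, normalized=True):
    """Inverse of the summed distances from node to all other actors."""
    dist = distances_from(adj, node)
    if len(dist) < len(adj):
        raise ValueError("closeness is undefined when some actors are unreachable")
    score = 1 / sum(dist.values())
    return score * (len(adj) - 1) if normalized else score

# Hypothetical five-actor star.
star = {"hub": {"s1", "s2", "s3", "s4"}}
star.update({leaf: {"hub"} for leaf in ("s1", "s2", "s3", "s4")})

print(closeness(star, "hub"), closeness(star, "s1"))
```

The sketch raises an error for disconnected graphs because the distance to an unreachable actor is undefined.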

Centrality
Closeness Centrality in the examples
[Figure: the four example networks with closeness centralization C=1.0, C=0.0, C=0.36, and C=0.28]
Centrality
Betweenness Centrality:
Model based on communication flow: A person who lies on
communication paths can control communication flow, and is thus important.
Betweenness centrality counts the number of shortest paths between j and k that actor i resides on.
[Figure: example network with nodes a through h]
Centrality
$$C_B(n_i) = \sum_{j<k} g_{jk}(n_i) / g_{jk}$$
Where g_jk = the number of geodesics connecting j and k, and g_jk(n_i) = the number that actor i is on.
Usually normalized by:
$$C'_B(n_i) = C_B(n_i) / [(g-1)(g-2)/2]$$
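The betweenness formula can be sketched with a breadth-first search that counts geodesics. This is a straightforward pair-by-pair version for small undirected graphs, not an optimized algorithm; the example graphs and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Return (dist, sigma): geodesic length and number of geodesics
    from source to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                sigma[nbr] = 0
                queue.append(nbr)
            if dist[nbr] == dist[node] + 1:
                sigma[nbr] += sigma[node]
    return dist, sigma

def betweenness(adj, normalized=True):
    """Sum over pairs j<k of g_jk(n_i) / g_jk for every actor i."""
    info = {v: bfs_counts(adj, v) for v in adj}
    g = len(adj)
    scores = {v: 0.0 for v in adj}
    for j, k in combinations(adj, 2):
        dist_j, sigma_j = info[j]
        dist_k, sigma_k = info[k]
        if k not in dist_j:
            continue  # j and k are not connected
        for i in adj:
            if i in (j, k) or i not in dist_j or i not in dist_k:
                continue
            # i lies on a j-k geodesic when the two legs add up exactly.
            if dist_j[i] + dist_k[i] == dist_j[k]:
                scores[i] += sigma_j[i] * sigma_k[i] / sigma_j[k]
    if normalized:
        scale = (g - 1) * (g - 2) / 2
        scores = {v: s / scale for v, s in scores.items()}
    return scores

# Hypothetical examples: a five-actor star and a four-actor cycle.
star = {"hub": {"s1", "s2", "s3", "s4"}}
star.update({leaf: {"hub"} for leaf in ("s1", "s2", "s3", "s4")})
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}

print(betweenness(star))
print(betweenness(cycle, normalized=False))
```

In the cycle, each opposite pair is joined by two geodesics, so each intermediate actor is credited with half of that pair.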
Centrality
[Figure: the four example networks with betweenness centralization 1.0, .59, 0, and .31]
Centrality
Actors that appear very different when seen individually are comparable in the global network.
(Node size proportional to betweenness centrality )

Time
Two factors that affect network flows:

Topology
- the shape, or form, of the network
- simple example: one actor cannot pass information to
another unless they are either directly or indirectly
connected
Time
- the timing of contacts matters
- simple example: an actor cannot pass information he has
not yet received.
Time
Timing in networks
A focus on contact structure has often slighted the importance of network dynamics, though a number of recent pieces are addressing this.
Time affects networks in two important ways:
1) The structure itself evolves, in ways that will affect the topology and thus flow.
2) The timing of contact constrains information flow

Time Drug Relations, Colorado Springs, Year 1
Data on drug users in

Colorado Springs, over
5 years
Current year in red, past relations in gray
Time
What impact does timing have on flow through the network?
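[Figure: hypothetical contact network among actors A through F; the lines are labeled with contact periods such as 2-5, 3-5, and 8-9]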
Numbers above lines indicate contact periods

Time
The path graph for the hypothetical contact network
[Figure: path graph among actors A through F]
While clearly important, this is not often handled well by current software.
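Since the tooling is thin, here is a rough sketch of how time-ordered reachability could be computed by hand. Everything in it is an assumption made for illustration: the contact list is invented (it is not the network in the figure), contacts are undirected, and a good can be passed in any period in which a contact is active, including the period in which it was received.

```python
def earliest_arrival(contacts, source, start_time=0):
    """Earliest period at which a good starting at source can reach each actor.

    contacts is a list of (u, v, first, last): an undirected tie that is
    active in periods first..last. Actors missing from the result cannot
    be reached by any time-ordered path.
    """
    arrival = {source: start_time}
    changed = True
    while changed:
        changed = False
        for u, v, first, last in contacts:
            for sender, receiver in ((u, v), (v, u)):
                # The sender must hold the good before the contact ends.
                if sender in arrival and arrival[sender] <= last:
                    t = max(arrival[sender], first)
                    if t < arrival.get(receiver, float("inf")):
                        arrival[receiver] = t
                        changed = True
    return arrival

# Hypothetical contact periods (not the values from the slide).
contacts = [
    ("A", "B", 2, 5),
    ("B", "C", 8, 9),
    ("C", "E", 3, 4),
    ("B", "D", 3, 5),
    ("D", "F", 1, 2),
]

from_a = earliest_arrival(contacts, "A")
from_f = earliest_arrival(contacts, "F")
print(from_a)
print(from_f)
```

In this made-up example F can reach A, but A cannot reach F, even though the two are connected in the static graph: the D-F contact ends before anything from A can arrive at D.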
Measuring Networks: Structure & Social Space
The second broad division for measuring networks steps back to
generalized features of the global network.
These factors almost always are of interest because of what they imply
about how goods move through the network, but have resulted in a distinct
line of methods and substantive research.
We focus on 3 such factors today:

1) Basic structure of large-scale networks
2) Cohesive Peer Groups
3) Identifying Role positions (blockmodels)
Measuring Networks: Large-Scale Models
Small World Networks
Based on Milgram’s (1967) famous work,

the substantive point is that networks
are structured such that even when
most of our connections are local,
any pair of people can be connected
by a fairly small number of relational
steps.
Works on 2 parameters:
1) The Clustering Coefficient (C) = average proportion of closed triangles
2) The average distance (L) separating nodes in the network
When C is large and L is small, the graph is a small-world (SW) graph:
•High probability that a node’s contacts are connected to each other.
•Small average distance between nodes
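Both parameters are easy to compute on a small graph. The sketch below assumes an undirected graph stored as neighbor sets and uses one common reading of the clustering coefficient (the share of a node's contact pairs that are themselves tied, averaged over nodes with at least two contacts). The ring lattice and the single shortcut are hypothetical.

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(adj):
    """Average share of each node's contact pairs that are tied to each other."""
    shares = []
    for node, nbrs in adj.items():
        if len(nbrs) < 2:
            continue
        pairs = list(combinations(nbrs, 2))
        closed = sum(1 for u, v in pairs if v in adj[u])
        shares.append(closed / len(pairs))
    return sum(shares) / len(shares) if shares else 0.0

def average_distance(adj):
    """Mean geodesic length over all reachable ordered pairs."""
    total = count = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count if count else 0.0

def ring_lattice(n, k):
    """Each node is tied to its k nearest neighbors on either side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

lattice = ring_lattice(12, 2)
shortcut = ring_lattice(12, 2)
shortcut[0].add(6)   # one long-range tie across the ring
shortcut[6].add(0)

print(clustering_coefficient(lattice), average_distance(lattice))
print(clustering_coefficient(shortcut), average_distance(shortcut))
```

Comparing the two printed lines shows the effect of a single shortcut on L while the graph stays highly clustered.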
In a highly clustered, ordered

network, a single random
connection will create a shortcut
that lowers L dramatically
Watts demonstrates that small

world properties can occur in
graphs with a surprisingly small
number of shortcuts
Diffusion / flow implications are unclear, but seem similar to a random graph where local clusters are reduced to a single point.
Scale-Free Networks
Across a large number of substantive

settings, Barabási points out that the
distribution of network involvement
(degree) is highly and characteristically
skewed.
Scale Free Networks
Many large networks are characterized by a highly skewed distribution of the

number of partners (degree)
The degree distribution follows a power law:
$$p(k) \sim k^{-\gamma}$$
Scale Free Networks
The scale-free model focuses on the distance-reducing capacity of high-degree nodes, as ‘hubs’ create shortcuts that carry network flow.
Scale Free Networks
Colorado Springs High-Risk (Sexual contact only)
•Network is approximately scale-free, with an exponent of -1.3
•But connectivity does not depend on the hubs.
Social Cohesion
White, D. R. and F. Harary. 2001. "The Cohesiveness of Blocks in Social Networks: Node Connectivity and Conditional Density." Sociological Methodology 31:305-59.
Moody, James and Douglas R. White. 2003. "Structural Cohesion and Embeddedness: A Hierarchical Concept of Social Groups." American Sociological Review 68:103-127.
White, Douglas R., Jason Owen-Smith, James Moody, and Walter W. Powell. 2004. "Networks, Fields, and Organizations: Scale, Topology and Cohesive Embeddings." Computational and Mathematical Organization Theory 10:95-117.
Moody, James. "The Structure of a Social Science Collaboration Network: Disciplinary Cohesion from 1963 to 1999." American Sociological Review 69:213-238.
Social Cohesion
Formal definition of Structural Cohesion:

(a) A group’s structural cohesion is equal to the minimum number of actors who,
if removed from the group, would disconnect the group.
Equivalently (by Menger’s Theorem):
(b) A group’s structural cohesion is equal to the minimum number of independent

paths linking each pair of actors in the group.
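Definition (a) can be checked by brute force on a small group: try removing every set of 0, 1, 2, ... actors and stop at the first size that disconnects the graph. The sketch below does exactly that. It is exponential in the number of actors, so it is only meant to make the definition concrete; the function names and example graphs are hypothetical, and the value g - 1 for a complete graph is the usual convention.

```python
from itertools import combinations

def is_connected(adj, removed=frozenset()):
    """True if the graph stays connected after deleting `removed`."""
    nodes = [v for v in adj if v not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        node = stack.pop()
        for nbr in adj[node]:
            if nbr not in removed and nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == len(nodes)

def structural_cohesion(adj):
    """Smallest number of actors whose removal disconnects the group.

    A complete graph cannot be disconnected, so by convention it gets g - 1.
    """
    g = len(adj)
    for k in range(g - 1):
        for removed in combinations(adj, k):
            if not is_connected(adj, frozenset(removed)):
                return k
    return g - 1

# Hypothetical examples.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
complete = {v: {u for u in "abcd" if u != v} for v in "abcd"}
split = {"a": {"b"}, "b": {"a"}, "c": set()}

for name, graph in [("path", path), ("cycle", cycle),
                    ("complete", complete), ("split", split)]:
    print(name, structural_cohesion(graph))
```

By Menger's Theorem the same numbers would come out of counting node-independent paths between the least-connected pair.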
Social Cohesion
•Networks are structurally cohesive if they remain connected even when

nodes are removed
[Figure: example networks with node connectivity 0, 1, 2, and 3]
Social Cohesion
Structural cohesion gives rise automatically to a clear notion of

embeddedness, since cohesive sets nest inside of each other.
[Figure: example network with nodes numbered 1 through 23, showing cohesive sets nested inside one another]
Social Cohesion
Project 90, Sex-only network (n=695)
3-Component (n=58)
Social Cohesion
IV Drug Sharing: Connected Bicomponents
Largest BC: 247
k > 4: 318
Max k: 12
Structural Cohesion simultaneously gives us a positional and subgroup analysis.
Measuring Networks:
Cohesive Sub Groups
A primary interest in Social Network Analysis is the identification of

“significant social subgroups” – some smaller collection of nodes in
the graph that can be considered, at least in some senses, as a “unit”
based on the pattern, strength, or frequency of ties.
There are many ways to identify groups. They all insist on a group
being in a connected component, but other than that the variation is
wide.
Measuring Networks:
Cohesive Sub Groups
Graph Theoretical Models.
Start with a clique. A clique is defined as a maximal subgraph in which every member is connected to every other member of the subgraph.
Cliques are collections of nodes where density = 1.0.
Properties of cliques:
• Density: 1.0
• Everyone connected to n-1 alters
• Distance between every pair is 1
• Ratio of within group ties to between
group ties is infinite
• All triads are transitive
Measuring Networks:
Cohesive Sub Groups
Graph Theoretical Models.
In practice, complete cliques are not very useful. They tend to overlap heavily and are limited in their size.
Graph theorists have thus relaxed the complete connectivity requirement (with varying degrees of success). See Moody & White (2003) for a discussion of these attempts.
Measuring Networks:
Cohesive Sub Groups
Identifying Primary groups:
1) Measures of fit
To identify a primary group, we need some measure of how clustered the network is. Usually, this is a function of the number of ties that fall within groups relative to the number of ties that fall between groups.
2) Algorithmic approaches to maximizing (1)

Once we have such an index, we need a method for searching through
the network to maximize the fit.
3) Generalized cluster analysis

In addition to maximizing a group function such as (1) we can use the
relational distance directly, and look for clusters in the data. We next
go over two different styles of cluster analysis
Measuring Networks:
Cohesive Sub Groups
Segregation Index
(Freeman, L. C. 1978. "Segregation in Social Networks." Sociological Methods and Research 6:411-30.)
Freeman asked how we could identify segregation in a social network.

Theoretically, he argues, if a given attribute (group label) does not matter for
social relations, then relations should be distributed randomly with respect to the
attribute. Thus, the difference between the number of cross-group ties expected
by chance and the number observed measures segregation.
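$$\mathrm{Seg} = \frac{E(X) - X}{E(X)}$$
where X is the observed number of cross-group ties and E(X) is the number expected by chance.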
Measuring Networks:
Cohesive Sub Groups
Consider the (hypothetical) network below. There are two

attributes in this network: people with Blue eyes and Brown eyes
and people who are square or not (they must be hip).
Measuring Networks:
Cohesive Sub Groups
Segregation Index
Mixing Matrix:
Blue Brown
Blue 6 17
Brown 17 16
Seg = -0.25
Hip Square
Hip 20 3
Square 3 30
Seg = 0.78
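The index can be computed from a mixing matrix alone if the expected number of cross-group ties is taken from the row and column totals. That choice is an assumption of this sketch (Freeman's own expectation is defined from group sizes), but it does reproduce the two values shown for the example. The function and variable names are hypothetical.

```python
def segregation_index(mixing):
    """(E(X) - X) / E(X), where X counts the ties in off-diagonal cells.

    mixing[i][j] is the number of ties from group i to group j.
    E(X) is estimated from the marginals, as if ties were spread at
    random with respect to the group label.
    """
    k = len(mixing)
    total = sum(sum(row) for row in mixing)
    row_sums = [sum(row) for row in mixing]
    col_sums = [sum(col) for col in zip(*mixing)]
    observed = sum(mixing[i][j] for i in range(k) for j in range(k) if i != j)
    expected = sum(
        row_sums[i] * col_sums[j] for i in range(k) for j in range(k) if i != j
    ) / total
    return (expected - observed) / expected

eye_color = [[6, 17], [17, 16]]   # Blue / Brown
hipness = [[20, 3], [3, 30]]      # Hip / Square

print(round(segregation_index(eye_color), 2))
print(round(segregation_index(hipness), 2))
```

A negative value, as for eye color, means there are more cross-group ties than chance would produce.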
Measuring Networks:
Cohesive Sub Groups
The segregation index is one metric used to identify groups. Others include:
a) The ratio of in-group to out-group ties (Negopy, UCINET Factions)
b) Maximizing the probability of in-group contact (CliqueFinder)
c) The Segregation Matrix Index (SMI)
d) The dyadic factor loadings for overlapping groups (akin to a latent
class model)
e) Minimize the within-group distance
Once a metric has been chosen, some algorithm is needed to search through
the graph to identify clusters. These algorithms range from very sophisticated
“graph-intelligent” algorithms, such as NEGOPY, to simple cluster analysis
of distance matrices.
In most cases, you have to pre-set the number of groups to use (the exceptions are NEGOPY and CliqueFinder). Moody’s CROWDS algorithm also has automatic stopping criteria, but you have to give it starting values.
Measuring Networks:
Cohesive Sub Groups
In practice, the different

algorithms will give
different results.
Here, I compare the

NEGOPY results to the
RNM results. NEGOPY
returned one large group,
RNM found many smaller,
denser groups.
It’s usually a good idea to

explore multiple solutions
and algorithms.
Measuring Networks:
Cohesive Sub Groups
Gagnon Prison Network
In practice, the different

algorithms will give
different results.
Here, I compare
NEGOPY, FACTIONS
and RNM. Groups A and
B are identical, C is close.
F, E and D differ.
It’s usually a good idea to

explore multiple solutions
and algorithms.
(all solutions constrained to 6 groups)

Measuring Networks:
Role Positions
Overview
•Social life can be described (at least in part) through social roles.
•To the extent that roles can be characterized by regular interaction
patterns, we can summarize roles through common relational patterns.
•Identifying these sets is the goal of block-model analyses.
Nadel: The Coherence of Role Systems

•Background ideas for White, Boorman and Breiger. Social life as an interconnected system of roles
•Important feature: thinking of roles as connected in a role system =
social structure
White, Harrison C.; Boorman, Scott A., and Breiger, Ronald L. Social Structure from Multiple Networks I. American Journal of Sociology. 1976; 81:730-780.
•The key article describing the theoretical and technical elements of
block-modeling
Measuring Networks:
Role Positions
Elements of a Role:
•Rights and obligations with respect to other people or classes of

people
•Roles require a ‘role complement’: another person with respect to whom the role-occupant acts
Examples:
Parent - child, Teacher - student, Lover - lover, Friend - Friend,
Husband - Wife, etc.
Nadel (Following functional anthropologists and sociologists) defines

‘logical’ types of roles, and then examines how they can be linked together.
Measuring Networks:
Role Positions
White et al: From logical role systems to empirical social structures
Start with some basic ideas of what a role is: An exchange of something (support,
ideas, commands, etc) between actors. Thus, we might represent a family as:
H W
C
C C
Romantic Love
Provides food for
Bickers with
(and there are, of course, many other relations inside a family!)
Measuring Networks:
Role Positions
The key idea is that we can express a role through a relation (or set of relations) and thus a social system by the inventory of roles. If roles equate to positions in an exchange system, then we need only identify particular aspects of a position.
But what aspect?
Structural Equivalence
Two actors are structurally equivalent if they have the same

types of ties to the same people.
Measuring Networks:
Role Positions
A single relation
Measuring Networks:
Role Positions
Graph reduced to positions

Measuring Networks:
Role Positions
Blockmodeling: basic steps
In any positional analysis, there are 4 basic steps:
1) Identify a definition of equivalence

2) Measure the degree to which pairs of actors are equivalent
3) Develop a representation of the equivalencies
4) Assess the adequacy of the representation
Measuring Networks:
Role Positions
1) Identify a definition of equivalence

Structural Equivalence:
Two actors are equivalent if they have the same type of ties to the same people.
Measuring Networks:
Role Positions
Automorphic Equivalence:
Actors occupy indistinguishable structural locations in the network; that is, they are in isomorphic positions in the network.
In general, automorphically equivalent nodes are equivalent with respect to all graph theoretic properties (i.e., degree, number of people reachable, centrality, etc.)
Measuring Networks:
Role Positions
Automorphic Equivalence:
Measuring Networks:
Role Positions
Regular Equivalence:
Regular equivalence does not require actors to have identical
ties to identical actors or to be structurally indistinguishable.
Actors who are regularly equivalent have identical ties to and

from equivalent actors.
If actors i and j are regularly equivalent, then for all relations and for all actors, if i → k, then there exists some actor l such that j → l and k is regularly equivalent to l.
Measuring Networks:
Role Positions
Regular Equivalence:
There may be multiple regular equivalence partitions in a network, and thus we tend to want to find the maximal regular equivalence partition, the one with the fewest positions.
Measuring Networks:
Role Positions
Role or Local Equivalence:
While most equivalence measures focus on position within the full network, some measures focus only on the patterns within the local tie neighborhood. These have been called ‘local role’ equivalence.
Note that:
Structurally equivalent actors are automorphically equivalent,
Automorphically equivalent actors are regularly equivalent.
Structurally equivalent and automorphically equivalent actors are role equivalent
In practice, we tend to ignore some of these distinctions, as they get blurred quickly
once we have to operationalize them in real-world graphs. It turns out that few
people are ever exactly equivalent, and thus we approximate the links between the
types.
In all cases, the procedure can work over multiple relations simultaneously.
The process of identifying positions is called blockmodeling, and requires identifying

a measure of similarity among nodes.
Measuring Networks:
Role Positions
Once you identify equivalent actors, block them in the matrix and reduce it, based on the number of ties
in the cell of interest. The key values are a zero block (no ties) and a one-block (all ties present):
Blocked adjacency matrix (the row label is the position number):
1  . 1 1 1 0 0 0 0 0 0 0 0 0 0
2  1 . 0 0 1 1 0 0 0 0 0 0 0 0
3  1 0 . 1 0 0 1 1 1 1 0 0 0 0
3  1 0 1 . 0 0 1 1 1 1 0 0 0 0
4  0 1 0 0 . 1 0 0 0 0 1 1 1 1
4  0 1 0 0 1 . 0 0 0 0 1 1 1 1
5  0 0 1 1 0 0 . 0 0 0 0 0 0 0
5  0 0 1 1 0 0 0 . 0 0 0 0 0 0
5  0 0 1 1 0 0 0 0 . 0 0 0 0 0
5  0 0 1 1 0 0 0 0 0 . 0 0 0 0
6  0 0 0 0 1 1 0 0 0 0 . 0 0 0
6  0 0 0 0 1 1 0 0 0 0 0 . 0 0
6  0 0 0 0 1 1 0 0 0 0 0 0 . 0
6  0 0 0 0 1 1 0 0 0 0 0 0 0 .
Reduced matrix:
   1 2 3 4 5 6
1  0 1 1 0 0 0
2  1 0 0 1 0 0
3  1 0 1 0 1 0
4  0 1 0 1 0 1
5  0 0 1 0 0 0
6  0 0 0 1 0 0
Structural equivalence thus generates 6 positions in the network

Measuring Networks:
Role Positions
Once you partition the matrix, reduce it:
Adjacency matrix:
. 1 1 1 0 0 0 0 0 0 0 0 0 0
1 . 0 0 1 1 0 0 0 0 0 0 0 0
1 0 . 1 0 0 1 1 1 1 0 0 0 0
1 0 1 . 0 0 1 1 1 1 0 0 0 0
0 1 0 0 . 1 0 0 0 0 1 1 1 1
0 1 0 0 1 . 0 0 0 0 1 1 1 1
0 0 1 1 0 0 . 0 0 0 0 0 0 0
0 0 1 1 0 0 0 . 0 0 0 0 0 0
0 0 1 1 0 0 0 0 . 0 0 0 0 0
0 0 1 1 0 0 0 0 0 . 0 0 0 0
0 0 0 0 1 1 0 0 0 0 . 0 0 0
0 0 0 0 1 1 0 0 0 0 0 . 0 0
0 0 0 0 1 1 0 0 0 0 0 0 . 0
0 0 0 0 1 1 0 0 0 0 0 0 0 .
Image matrix:
   1 2 3
1  1 1 0
2  1 1 1
3  0 1 0
[Figure: reduced graph of positions 1, 2, and 3]
Regular equivalence
(here I placed a one in the image matrix if there were any ties in the ij block)
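The reduction step can be sketched as follows, using the rule of placing a one in the image matrix whenever the block contains any tie. The edge list is read from the 14-actor example matrix. The two partitions are my reading of the example ({1}, {2}, {3,4}, {5,6}, {7-10}, {11-14} for structural equivalence and {1,2}, {3-6}, {7-14} for regular equivalence), and the function names are hypothetical.

```python
# Ties of the 14-actor example, read from the matrix (actors numbered 1-14).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (3, 4), (5, 6)]
EDGES += [(i, j) for i in (3, 4) for j in range(7, 11)]
EDGES += [(i, j) for i in (5, 6) for j in range(11, 15)]

def adjacency(edges, n):
    """Symmetric 0/1 adjacency matrix from a list of 1-based ties."""
    x = [[0] * n for _ in range(n)]
    for i, j in edges:
        x[i - 1][j - 1] = x[j - 1][i - 1] = 1
    return x

def image_matrix(x, positions):
    """One cell per pair of positions: 1 if the block holds any tie."""
    image = []
    for rows in positions:
        image_row = []
        for cols in positions:
            any_tie = any(x[i - 1][j - 1] for i in rows for j in cols if i != j)
            image_row.append(1 if any_tie else 0)
        image.append(image_row)
    return image

x = adjacency(EDGES, 14)

# Assumed partitions (see the lead-in).
structural = [[1], [2], [3, 4], [5, 6], [7, 8, 9, 10], [11, 12, 13, 14]]
regular = [[1, 2], [3, 4, 5, 6], list(range(7, 15))]

for row in image_matrix(x, structural):
    print(row)
for row in image_matrix(x, regular):
    print(row)
```

Other rules are possible, for example requiring a minimum block density before placing a one.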
Measuring Networks:
Role Positions
Operationally, you have to measure the similarity between actors. If two actors
are structurally equivalent, then they will have identical ties to other people.
Consider the example again:
C and D (the third and fourth actors) match on all 12 other people, and are thus structurally equivalent.
Actor  C  D  Match
1      1  1  1
2      0  0  1
3      .  1  .
4      1  .  .
5      0  0  1
6      0  0  1
7      1  1  1
8      1  1  1
9      1  1  1
10     1  1  1
11     0  0  1
12     0  0  1
13     0  0  1
14     0  0  1
Sum: 12
Measuring Networks:
Role Positions
If the model is going to be based on asymmetric or multiple relations, you simply stack the
various relations, usually including both “directions” of asymmetric relations:
[Figure: family network of H, W, and three C's, with ties for Romantic Love, Provides food for, and Bickers with]
Rows and columns are ordered H, W, C, C, C:
Romance      Feeds        Bicker
0 1 0 0 0    0 0 1 1 1    0 0 0 0 0
1 0 0 0 0    0 0 1 1 1    0 0 0 0 0
0 0 0 0 0    0 0 0 0 0    0 0 0 1 1
0 0 0 0 0    0 0 0 0 0    0 0 1 0 1
0 0 0 0 0    0 0 0 0 0    0 0 1 1 0
Stacked: the 20 x 5 matrix formed by placing Romance, Feeds, the transpose of Feeds, and Bicker on top of one another.
Measuring Networks:
Role Positions
The metric used to measure structural equivalence by White, Boorman and Breiger is the correlation between each node’s set of ties. For the example, this would be:
1.00 -0.20 0.08 0.08 -0.19 -0.19 0.77 0.77 0.77 0.77 -0.26 -0.26 -0.26 -0.26
-0.20 1.00 -0.19 -0.19 0.08 0.08 -0.26 -0.26 -0.26 -0.26 0.77 0.77 0.77 0.77
0.08 -0.19 1.00 1.00 -1.00 -1.00 0.36 0.36 0.36 0.36 -0.45 -0.45 -0.45 -0.45
0.08 -0.19 1.00 1.00 -1.00 -1.00 0.36 0.36 0.36 0.36 -0.45 -0.45 -0.45 -0.45
-0.19 0.08 -1.00 -1.00 1.00 1.00 -0.45 -0.45 -0.45 -0.45 0.36 0.36 0.36 0.36
-0.19 0.08 -1.00 -1.00 1.00 1.00 -0.45 -0.45 -0.45 -0.45 0.36 0.36 0.36 0.36
0.77 -0.26 0.36 0.36 -0.45 -0.45 1.00 1.00 1.00 1.00 -0.20 -0.20 -0.20 -0.20
0.77 -0.26 0.36 0.36 -0.45 -0.45 1.00 1.00 1.00 1.00 -0.20 -0.20 -0.20 -0.20
0.77 -0.26 0.36 0.36 -0.45 -0.45 1.00 1.00 1.00 1.00 -0.20 -0.20 -0.20 -0.20
0.77 -0.26 0.36 0.36 -0.45 -0.45 1.00 1.00 1.00 1.00 -0.20 -0.20 -0.20 -0.20
-0.26 0.77 -0.45 -0.45 0.36 0.36 -0.20 -0.20 -0.20 -0.20 1.00 1.00 1.00 1.00
-0.26 0.77 -0.45 -0.45 0.36 0.36 -0.20 -0.20 -0.20 -0.20 1.00 1.00 1.00 1.00
-0.26 0.77 -0.45 -0.45 0.36 0.36 -0.20 -0.20 -0.20 -0.20 1.00 1.00 1.00 1.00
-0.26 0.77 -0.45 -0.45 0.36 0.36 -0.20 -0.20 -0.20 -0.20 1.00 1.00 1.00 1.00
Another common metric is the Euclidean distance between pairs of actors, which you
then use in a standard cluster analysis.
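The two metrics above can be sketched in a few lines. This is a minimal illustration, not the procedure used for the slides: the function names and the 4-node "feeds" relation are hypothetical, and the sketch ignores the special handling of diagonal cells that full implementations apply.

```python
import numpy as np

def tie_profiles(relations):
    """Stack each relation and its transpose side by side: one profile row per node."""
    blocks = []
    for a in relations:
        a = np.asarray(a, dtype=float)
        blocks.extend([a, a.T])  # ties sent and ties received
    return np.hstack(blocks)

def equivalence_measures(relations):
    """Return (correlation matrix, Euclidean distance matrix) between node profiles."""
    profiles = tie_profiles(relations)
    corr = np.corrcoef(profiles)  # rows are treated as variables
    diff = profiles[:, None, :] - profiles[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return corr, dist

# Hypothetical relation: nodes 0 and 1 both send ties to nodes 2 and 3.
feeds = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
corr, dist = equivalence_measures([feeds])
```

Nodes with identical tie profiles (here 0 and 1, and 2 and 3) have correlation 1 and distance 0; either matrix can then be passed to a clustering routine.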
Measuring Networks:
Role Positions
Automorphic and Regular equivalence are more difficult to find, and require
iteratively searching over possible class assignments for sets that have the same
graph theoretic patterns. One usually starts with a set of nodes defined as similar on
a number of network measures, then looks within these classes for automorphic
equivalence classes.
A theoretically appealing method for finding structures that are very similar to
regular equivalence, called role equivalence, uses the triad census. Each node is
involved in (n-1)(n-2)/2 triads, and occupies a particular position in each of
these triads.
Measuring Networks:
Role Positions
Moving from a similarity/distance matrix to a blockmodel:

number of groups and determining blocks:
“An important decision in an analysis using CONCOR is how fine the
partition should be; in other words, when should one stop splitting
positions? Theory and the interpretability of the solution are the
primary consideration in deciding how many positions to produce.”
(W&F, p.378)
“In defining positions of actors, the ‘trick’ is to choose the point along
the series that gives a useful and interpretable partition of the actors
into equivalence classes.” (W&F p.383)
Measuring Networks:
Role Positions
An example:
Padgett, J. F. and C. K. Ansell. 1993. "Robust Action and the Rise of the
Medici, 1400-1434." American Journal of Sociology 98:1259-1319.
“Political Groups” in the attribute sense do not seem to exist, so P&A turn to
the pattern of network relations among families.
This is the block reduction of the full 92-family network.
Modeling with Networks: Behaviors
There are three general approaches to modeling behaviors with network data:
1) Using network measures as variables to predict individual outcomes
2) Network autocorrelation / peer influence models
3) Dyad / QAP models of the similarity of actors and their joint network
position
The simplest way to use network data in research is to include the network
measure as a covariate in a standard model:
Y = a0 + b(netvars) + b(other vars) + e
“netvars” most commonly include:
•Functions of each person’s direct contacts’ attributes
•Such as: mean income of friends, proportion of friends who are
employed, racial heterogeneity of the friends, etc.
•Structural indicators:
•Such as: centrality, dummies for group / role membership, etc.
These models are the only option for ego-network data, where information on
network alters is collected from a single respondent’s (ego’s) report.
They can also be used with extractions from partial or complete network data, but
the error term is, by definition, autocorrelated: cases are not independent, but
connected through the social relations.
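As a rough sketch of how such "netvars" might be built before entering a standard model: the helper name `mean_of_alters` and the small network and income values below are made up for illustration.

```python
import numpy as np

def mean_of_alters(adj, x):
    """Mean of attribute x over each node's direct contacts (NaN for isolates)."""
    adj = np.asarray(adj, dtype=float)
    x = np.asarray(x, dtype=float)
    degree = adj.sum(axis=1)
    total = adj @ x  # sum of the attribute over each node's contacts
    return np.divide(total, degree, out=np.full_like(total, np.nan), where=degree > 0)

# Hypothetical 4-person friendship network and incomes (illustrative values only).
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])
income = np.array([10.0, 20.0, 40.0, 30.0])

friend_income = mean_of_alters(adj, income)  # function of contacts' attributes
out_degree = adj.sum(axis=1)                 # a simple structural indicator
```

Each resulting vector has one value per person and can be added as a column to the covariate matrix of an ordinary regression.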
Network Autocorrelation models (aka Peer Influence models):
Friedkin, N. E. 1984. "Structural Cohesion and Equivalence Explanations of
Social Homogeneity." Sociological Methods and Research 12:235-61.
Friedkin, N. E. 1998. A Structural Theory of Social Influence. Cambridge:
Cambridge University Press.
Friedkin, N. E. and E. C. Johnsen. 1990. "Social Influence and
Opinions." Journal of Mathematical Sociology 15:193-205.
Friedkin, N. E. and E. C. Johnsen. 1997. "Social Positions in Influence
Networks." Social Networks 19:209-22.
$$Y = \alpha W Y + X b + e$$
Where W is a direct function of the adjacency matrix, and α is the estimated
value of peer influence.
There are two general ways to test for peer influence in an observed network.
The first estimates the parameters (α and b) of the peer influence model directly;
the second transforms the network into a dyadic model, predicting similarity
among actors.
Peer influence model:
See Doreian, Patrick. 1982. "Maximum Likelihood Methods for Linear Models: Spatial
Effects and Spatial Disturbances Terms." Sociological Methods and Research
10:243-269.
Gould, Roger V. 1991. "Multiple Networks and Mobilization in the Paris Commune,
1871." American Sociological Review 56:716-729. (applied example)
$$Y = \alpha W Y + X b + e$$
The basic model says that people’s opinions are a function of the opinions of
others and their characteristics.
$$Y = \alpha W Y + X b + e$$
WY is a simple vector that can be added to your model. That is, multiply
Y by a W matrix and run the regression with WY as a new variable; the
regression coefficient on WY is an estimate of α.
This is what Doreian calls the QAD (“Quick and Dirty”) estimate of peer
influence, and it is equivalent (under certain assumptions) to adding the mean of
ego’s friends’ values of Y to the model.
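A minimal sketch of the QAD estimate, assuming a row-normalized W is already in hand. The function name and the 6-node path network are hypothetical, and the toy data are generated without an error term purely so the recovered coefficients are easy to check; with real data the OLS estimate of α is only approximate.

```python
import numpy as np

def qad_peer_effect(w, y, x):
    """OLS of y on [1, Wy, x]; returns (intercept, alpha, b...)."""
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    wy = w @ y  # weighted mean of each person's peers' y
    design = np.column_stack([np.ones(len(y)), wy, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical path network 0-1-2-3-4-5, row-normalized.
adj = np.zeros((6, 6))
for i in range(5):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
w = adj / adj.sum(axis=1, keepdims=True)

# Noise-free toy data satisfying y = 1 + 0.5 * Wy + 2 * x exactly.
x = np.array([0.0, 1.0, 0.0, 2.0, 1.0, 3.0])
y = np.linalg.solve(np.eye(6) - 0.5 * w, 1.0 + 2.0 * x)

coef = qad_peer_effect(w, y, x)  # approximately [1.0, 0.5, 2.0]
```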
The problem with the above regression is that cases are, by definition, not
independent. In fact, the coefficient on WY is also known as the ‘network
autocorrelation’ coefficient, since a ‘peer influence’ effect is an autocorrelation
effect: your value is a function of the values of the people you are connected to.
In general, OLS is not the best way to estimate this equation. That is, QAD = Quick
and Dirty, and your results will not be exact.
In practice, the QAD approach (perhaps combined with a GLS estimator) results in
empirical estimates that are “virtually indistinguishable” from MLE (Doreian et al.,
1984).
The proper way to estimate the peer equation is to use maximum likelihood
estimates, and Doreian gives the formulas for this in his paper.
The other way is to use non-parametric approaches, such as the Quadratic
Assignment Procedure, to estimate the effects.
An empirical Example: Peer influence in the OSU Graduate Student Network.

Each person was asked to rank their satisfaction with the program, which is the dependent variable
in this analysis.
I constructed two W matrices, one from HELP, the other from Best Friend. I treat relations as
symmetric and valued, such that:
$$W_{ijt} = \begin{cases} 1 & \text{if } A_{ijt} = 1 \text{ or } A_{jit} = 1 \\ 2 & \text{if } A_{ijt} = 1 \text{ and } A_{jit} = 1 \\ 0 & \text{otherwise} \end{cases}$$
$$\sum_j W_{ij} = 1, \qquad W_{ii} = 0$$
I also include Race (white/non-white), Gender, and Cohort Year as exogenous variables in the model.
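One way to read the definition of W is sketched below. It assumes the row-sum condition means each row is rescaled to sum to 1 after the 0/1/2 coding, and it leaves isolates as all-zero rows; both choices, like the function name `build_w`, are assumptions made for illustration.

```python
import numpy as np

def build_w(a):
    """Symmetric, valued, row-normalized weight matrix from a 0/1 nomination matrix."""
    a = (np.asarray(a) > 0).astype(float)
    w = a + a.T                # 1 if either person nominates the other, 2 if both do
    np.fill_diagonal(w, 0.0)   # W_ii = 0
    row_sums = w.sum(axis=1, keepdims=True)
    # Rescale each row to sum to 1; rows with no ties stay at zero.
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

# Hypothetical nominations: 0 names 1 and 2, 1 names 0, 2 names no one.
a = np.array([
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
])
w_example = build_w(a)
```

Here the reciprocated 0-1 tie gets twice the weight of the one-way 0-2 tie in row 0.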
An empirical Example: Peer influence in the OSU Graduate Student Network.

Distribution of Satisfaction with the department.
Parameter Estimates
Variable    Parameter Estimate   Pr > |t|   Standardized Estimate
Intercept              2.60252     0.0931                 0
FEMALE                -1.07540     0.0142                -0.25455
NONWHITE              -0.22087     0.5975                -0.05491
y00                    0.93176     0.0798                 0.21627
y99                   -0.19375     0.7052                -0.04586
y98                   -0.45912     0.4637                -0.08289
y97                    0.60670     0.3060                 0.11919
PEER_BF                0.23936     0.0002                 0.42084
PEER_H                 0.50668     0.0277                 0.23321
Model R2 = .41, compared to .15 without the peer effects

Dyad QAP models
Another way to get at peer influence is not through the level of Y, but through the
extent to which actors are similar with respect to Y.
The model is now expressed at the dyad level as:
$$Y_{ij} = b_0 + b_1 A_{ij} + \sum_k b_k X_{kij} + e_{ij}$$
Where Y is a matrix of similarities, A is an adjacency matrix, and X_k is a
matrix of similarities on attribute k.
Dyad QAP models
NODE ADJMAT SAMERCE SAMESEX

1 0 1 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 0
2 1 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 1
3 1 1 0 0 1 0 1 0 0 0 0 0 1 0 1 1 1 0 1 0 0 1 0 0 1 1 0
4 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 0 1 0 1 0 0 0 1 1 0
5 0 0 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 1
6 0 0 0 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1
7 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 0
8 0 0 0 0 1 1 0 0 1 0 0 1 1 0 1 1 0 0 1 0 1 1 0 0 1 0 0
9 0 0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0
Dyad QAP models
Y Distance (Dij = abs(Yi - Yj))
0.32 .000 .277 .228 .181 .278 .298 .095 .307 .481
0.59 .277 .000 .049 .096 .555 .575 .182 .584 .758
0.54 .228 .049 .000 .047 .506 .526 .134 .535 .710
0.50 .181 .096 .047 .000 .459 .479 .087 .488 .663
0.04 .278 .555 .506 .459 .000 .020 .372 .029 .204
0.02 .298 .575 .526 .479 .020 .000 .392 .009 .184
0.41 .095 .182 .134 .087 .372 .392 .000 .401 .576
0.01 .307 .584 .535 .488 .029 .009 .401 .000 .175
-0.17 .481 .758 .710 .663 .204 .184 .576 .175 .000
Dyad QAP models
The REG Procedure
Model: MODEL1
Dependent Variable: SIM

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4          0.90657       0.22664      9.29   <.0001
Error             31          0.75591       0.02438
Corrected Total   35          1.66248

Root MSE          0.15615    R-Square   0.5453
Dependent Mean    0.33161    Adj R-Sq   0.4866
Coeff Var        47.08929

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              0.51931          0.05116     10.15     <.0001
NOM          1             -0.17054          0.05963     -2.86     0.0075
SAMERCE      1              0.05387          0.05916      0.91     0.3696
SAMESEX      1             -0.06535          0.05365     -1.22     0.2324
NCOMFND      1             -0.16134          0.03862     -4.18     0.0002
Dyad QAP models
Like the basic peer influence model, cases in dyad models are not
independent. However, the non-independence now comes from two sources:
(1) the same person is represented in (n-1) dyads, and (2) i and j are linked
through relations.
One of the best solutions to this problem is QAP, the Quadratic Assignment
Procedure, a non-parametric procedure for significance testing.
QAP runs the model of interest on the real data, then randomly permutes the
rows and columns of the data matrix (applying the same permutation to both) and
estimates the model again. Repeating this many times generates an empirical
distribution of the coefficients at ‘chance’ levels, against which you compare
the observed coefficients. This is implemented in UCINET for regression, and in
DAMN for logistic regression (J.L. Martin).
Dyad QAP models
Procedure:
1. Calculate the observed association / model
2. for K iterations do:
a) randomly sort one of the matrices
b) recalculate the association / model
c) store the outcome
3. compare the observed outcome to the distribution of
outcomes created by the random permutations.
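The procedure can be sketched for the simplest case, a correlation between two matrices; the regression version permutes in the same way and stores coefficients instead. The function name is hypothetical, and the data are the nine Y values and the ADJMAT block from the example slides above.

```python
import numpy as np

def qap_correlation(y, a, n_perm=2000, seed=0):
    """Permutation p-value for the correlation between two square matrices."""
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    n = y.shape[0]
    off = ~np.eye(n, dtype=bool)  # ignore the diagonal
    observed = np.corrcoef(y[off], a[off])[0, 1]
    rng = np.random.default_rng(seed)
    draws = np.empty(n_perm)
    for k in range(n_perm):
        p = rng.permutation(n)
        permuted = y[np.ix_(p, p)]  # same reordering for rows and columns
        draws[k] = np.corrcoef(permuted[off], a[off])[0, 1]
    # Two-sided: share of permuted correlations at least as extreme as observed.
    p_value = np.mean(np.abs(draws) >= abs(observed))
    return observed, p_value, draws

y_vals = np.array([0.32, 0.59, 0.54, 0.50, 0.04, 0.02, 0.41, 0.01, -0.17])
distance = np.abs(y_vals[:, None] - y_vals[None, :])
adjmat = np.array([
    [0, 1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1, 0, 1, 0],
])
observed, p_value, draws = qap_correlation(distance, adjmat, n_perm=500)
```

Permuting whole rows and columns together keeps each actor's set of dyads intact, which is what lets the test respect the non-independence of dyads.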
Dyad QAP models
Comparing multiple networks: QAP

Dyad QAP models
MULTIPLE REGRESSION QAP W/ MISSING VALUES

--------------------------------------------------------------------------------
# of permutations: 2000
Diagonal valid? NO
Random seed: 533
Dependent variable: EX_SIM
Expected values: c:\moody\Classes\soc884\examples\UCINET\mrqap-predicted
Independent variables: EX_NCOM
EX_ADJ
EX_SRCE
EX_SSEX
Number of valid observations among the X variables = 72

N = 72
Number of permutations performed: 1999
MODEL FIT
R-square   Adj R-Sqr   Probability   # of Obs
--------   ---------   -----------   --------
   0.545       0.525         0.029         72

REGRESSION COEFFICIENTS
               Un-stdized       Stdized                  Proportion   Proportion
Independent   Coefficient   Coefficient   Significance     As Large     As Small
-----------   -----------   -----------   ------------   ----------   ----------
Intercept        0.519314      0.000000          0.012        0.012        0.988
EX_NCOM         -0.161337     -0.541828          0.011        0.989        0.011
EX_ADJ          -0.170539     -0.381186          0.020        0.980        0.020
EX_SRCE          0.053864      0.124551          0.236        0.236        0.764
EX_SSEX         -0.065364     -0.151144          0.180        0.820        0.180
Note that the coefficient estimates are identical to those from the OLS regression, but the p-values differ.
Dyad QAP models
A substantive question raised with any kind of network autocorrelation
model is whether observed associations between network structure and
behaviors are due to selection or influence.
Theory is your best friend here, as there is no foolproof method to
distinguish the two.
However, recent work has made great progress using individual-level
fixed effects models (sometimes random effects models), where the
network features vary over time. This removes any stable individual
characteristic that might account for selection into a particular group.
Modeling with Networks: Structure
Dyad QAP models
While the most common way to use QAP models is to predict the
similarity on some substantive variable, one can just as easily predict the
presence/absence of a relation given attribute similarity.
This makes it possible to model the network itself, and ask questions about
how particular structures form.
Exponential Random Graph Models (p*)
A long research tradition in statistics and random graph theory has led to
parametric models of networks.
These are models of the entire graph, though as we will see they are often
estimated by working on the dyads in the graph.
Substantively, the approach is to ask whether the graph in question is an element
of the class of all random graphs with the given known elements (for example,
all graphs with 5 nodes and 3 edges) or, put probabilistically, to ask for the
probability of observing the current graph given the conditions.
The earliest approaches are based on simple random graph theory, but there’s
been a flurry of activity in the last 10 years or so.
Key references:
- Holland and Leinhardt (1981) JASA
- Frank and Strauss (1986) JASA
- Wasserman and Faust (1994) – Chap 15 & 16
- Wasserman and Pattison (1996)
Thanks to Mark Handcock for sharing some figures/slides about these models.
$$p(X = x) = \frac{\exp\{\theta' z(x)\}}{\kappa(\theta)}$$
Where:
θ is a vector of parameters (like regression coefficients)
z is a vector of network statistics, conditioning the graph
κ is a normalizing constant, to ensure the probabilities sum to 1.
The simplest graph is a Bernoulli random graph, where each X_ij is
independent:
$$p(X = x) = \frac{\exp\{\sum_{i,j} \theta_{ij} x_{ij}\}}{\kappa(\theta)}$$
Where:
$\theta_{ij} = \text{logit}[P(X_{ij} = 1)]$
$\kappa(\theta) = \prod_{i,j} [1 + \exp(\theta_{ij})]$
Note this is one of the few cases where κ(θ) can be written.
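A short worked step shows why the constant is available in closed form here. Writing θ_ij for the log-odds of a tie from i to j and κ(θ) for the normalizing constant, the exponent is a sum over dyads, so the sum over all possible graphs factorizes into one two-term sum per dyad:

```latex
\kappa(\theta)
  = \sum_{x} \exp\Big\{ \sum_{i,j} \theta_{ij} x_{ij} \Big\}
  = \prod_{i,j} \sum_{x_{ij} \in \{0,1\}} \exp(\theta_{ij} x_{ij})
  = \prod_{i,j} \big[ 1 + \exp(\theta_{ij}) \big]

p(X = x)
  = \prod_{i,j} \frac{\exp(\theta_{ij} x_{ij})}{1 + \exp(\theta_{ij})}
  = \prod_{i,j} p_{ij}^{x_{ij}} (1 - p_{ij})^{1 - x_{ij}},
  \qquad p_{ij} = \frac{\exp(\theta_{ij})}{1 + \exp(\theta_{ij})}
```

Once the statistics involve more than one dyad at a time (mutuality, transitivity), the sum no longer factorizes and this shortcut is lost.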
Typically, we add a homogeneity condition, so that all isomorphic
graphs are equally likely. The homogeneous Bernoulli graph model:
$$p(X = x) = \frac{\exp\{\theta \sum_{i,j} x_{ij}\}}{\kappa(\theta)}$$
Where, for a directed graph on g nodes:
$\kappa(\theta) = [1 + \exp(\theta)]^{g(g-1)}$
If we want to condition on anything much more complicated than density, the
normalizing constant ends up being a problem. We need a way to express the
probability of the graph that doesn’t depend on that constant. It turns out we
can do this by conditioning on a ‘complement’ graph.
First some terms:
$X_{ij}^{+}$ = Sociomatrix with ij element forced to 1
$X_{ij}^{-}$ = Sociomatrix with ij element forced to 0
$X_{ij}^{c}$ = Sociomatrix with the tie from i to j left out (the complement graph)
After some algebra:
$$\omega_{ij} = \log\left[\frac{p(X_{ij} = 1 \mid X_{ij}^{c})}{p(X_{ij} = 0 \mid X_{ij}^{c})}\right] = \theta'\,[z(x_{ij}^{+}) - z(x_{ij}^{-})]$$
Note that we can now model the conditional probability of the graph,
as a function of a set of difference statistics, without reference to the
normalizing constant.
The model, then, simply reduces to a logit model on the dyads.
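The difference statistics can be illustrated for the two simplest effects, choice (number of ties) and mutuality (number of mutual dyads). This is only a sketch with a hypothetical function name and a made-up 3-node network; it is not the PREPSTAR program, and richer statistics such as transitivity need more bookkeeping.

```python
import numpy as np

def change_statistics(x):
    """Rows of [tie, d_choice, d_mutual] for every ordered pair i != j."""
    x = np.asarray(x, dtype=int)
    n = x.shape[0]
    rows = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_choice = 1        # forcing x_ij from 0 to 1 always adds one tie
            d_mutual = x[j, i]  # and adds a mutual dyad only if j already chooses i
            rows.append((x[i, j], d_choice, d_mutual))
    return np.array(rows)

# Hypothetical directed network on three nodes.
x = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
])
dyads = change_statistics(x)
```

Each row is one observation for the logit model: the first column is the outcome (tie present or not) and the remaining columns are the predictors, fit without an intercept because the choice column is constant.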

Fitting p* models
I highly recommend working through the p* primer examples, which can be
found at:
http://kentucky.psych.uiuc.edu/pstar/index.html
Including:
A Practical Guide To Fitting p* Social Network Models
Via Logistic Regression
The site includes the PREPSTAR program for creating the difference variables
of interest.
[Figure: example sociomatrix x for a six-node directed network, with nodes 1-3 and 4-6 in separate positions, shown alongside the graph.]
We can model this network based on parameters for overall degree of Choice
($\phi$), Differential Choice Within Positions ($\phi_W$), Mutuality ($\rho$), Differential
Mutuality Within Positions ($\rho_W$), and Transitivity ($\tau_T$).
The vector of model parameters to be estimated is: $\theta = \{\phi \;\; \phi_W \;\; \rho \;\; \rho_W \;\; \tau_T\}$.

Exponential Random Graph Models (p*)
proc logistic descending;
  model tie = l lw m mw tt / noint;
run;
L = Choice
LW = Within Group
M = Mutuality
MW = Mutual within Group
TT = Transitivity
Substantively, this graph is likely from the random class of graphs with similar mutuality and size.
One practical problem is that the resulting values are often quite correlated,
making estimation difficult. This is a particular problem with “star”
parameters.
        lw         m          mw         tt
lw      1.00000    0.58333    0.80178    0.15830
                   0.0007     <.0001     0.4034
m       0.58333    1.00000    0.80178   -0.02435
        0.0007                <.0001     0.8984
mw      0.80178    0.80178    1.00000   -0.11716
        <.0001     <.0001                0.5375
tt      0.15830   -0.02435   -0.11716    1.00000
        0.4034     0.8984     0.5375
Parameters that are often fit include:
1) Expansiveness and attractiveness parameters: dummies for
each sender/receiver in the network
2) Degree distribution
3) Mutuality
4) Group membership (and all other parameters by group)
5) Transitivity / Intransitivity
6) K-in-stars, k-out-stars
7) Cyclicity
Comparing to Random Graphs
A conceptual merge between random graph models and QAP models is to identify a
sample of graphs from the universe you are trying to model. So, instead of
estimating:
$$p(X = x) = \frac{\exp\{\theta' z(x)\}}{\kappa(\theta)}$$
generate the set of graphs X empirically, then compare z(x) against its
distribution over X to see how likely a measure on x would be given X. The
difficulty, however, is generating X.
The first option would be to generate all non-isomorphic graphs within a given
constraint.
This is possible for small graphs, but the number gets large fast. For a
network with 3 nodes, there are 16 possible directed graphs. For a
network with 4 nodes, there are 218; for 5 nodes, 9,608; for 6
nodes, 1,540,944; and so on…
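The counts for small networks can be checked by brute force. The sketch below enumerates every directed graph on n nodes and keeps one representative per isomorphism class; the function name is hypothetical, and the approach is only practical for n up to about 4, which is itself a demonstration of how fast the universe grows.

```python
from itertools import permutations, product

def count_digraphs(n):
    """Count directed graphs on n nodes up to isomorphism by brute force."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    perms = list(permutations(range(n)))
    seen = set()
    for bits in product((0, 1), repeat=len(pairs)):
        arcs = [p for p, b in zip(pairs, bits) if b]
        # Canonical form: the smallest relabelled arc list over all node permutations.
        canon = min(tuple(sorted((s[i], s[j]) for i, j in arcs)) for s in perms)
        seen.add(canon)
    return len(seen)

three = count_digraphs(3)  # 16
four = count_digraphs(4)   # 218
```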
So, the best approach is to sample from the universe, but, of course, if you
had the universe you wouldn’t need to sample from it. How do you
sample from a population you haven’t observed?
Use a construction algorithm that generates a random graph with known
constraints.
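One common construction of this kind is the double-edge swap, sketched here for an undirected network. It preserves only each node's degree; additional constraints, such as those on isolated dyads or four-cycles, would need extra rejection checks. The function names and the six-node ring are hypothetical.

```python
import random
from collections import Counter

def degrees(edges):
    """Degree of every node in an undirected edge list."""
    return Counter(v for e in edges for v in e)

def rewire(edges, n_swaps, seed=0):
    """Randomize an undirected edge list by double-edge swaps, keeping every degree fixed."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = 0
    attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible re-pairings at random
        # Proposed swap: a-b, c-d  ->  a-d, c-b
        if len({a, b, c, d}) < 4:
            continue  # the two edges share a node
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
shuffled = rewire(ring, n_swaps=10)
```

Running the swap many times from the observed network yields a sample of graphs with the same degree sequence, against which an observed statistic can be compared.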
Example: Bearman, Peter S., James Moody and Katherine Stovel (2004) “Chains of Affection:
The Structure of Adolescent Romantic and Sexual Networks” American Journal of Sociology
110:44-91
Romantic Relations in Jefferson High
Simulate random networks with similar degree distribution:

Simulated networks preserve observed degree, isolated dyad
distribution, and four-cycle constraint
Simulated networks preserve observed degree, isolated dyad
distribution, and four-cycle constraint: 4 examples from the
simulated set
Social Network Software
UCINET
•The Standard network analysis program, runs in Windows
•Good for computing measures of network topography for single nets
•Input-Output of data is a special 2-file format, but is now able to read
PAJEK files directly.
•Not optimal for large networks
•Available from:
Analytic Technologies
PAJEK
•Program for analyzing and plotting very large networks
•Intuitive windows interface
•Used for most of the real data plots in this presentation
•Started mainly as a graphics program, but has expanded to a wide range of
analytic capabilities
•Can link to the R statistical package
•Free
•Available from:
Cyram Netminer for Windows
•Newest Product, not yet widely used
•Price range depends on application
•Limited to smaller networks O(100)
http://www.netminer.com/NetMiner/home_01.jsp
NetDraw
•Also very new, but by one of the best known names
in network analysis software.
•Free
•Limited to smaller networks O(100)
NEGOPY
•Program designed to identify cohesive sub-groups in a network,
based on the relative density of ties.
•DOS based program, need to have data in arc-list format
•Moving the results back into an analysis program is difficult.
•Available from:
William D. Richards
http://www.sfu.ca/~richards/Pages/negopy.htm
SPAN - SAS Programs for Analyzing Networks (Moody, ongoing)
•is a collection of IML and Macro programs that allow one to:
a) create network data structures from nomination data
b) import/export data to/from the other network programs
c) calculate measures of network pattern and composition
d) analyze network models
•Allows one to work with multiple, large networks
•Easy to move from creating measures to analyzing data
•Available by sending an email to:
Moody.77@sociology.osu.edu
