CSS 692 Social Network Analysis

Maksim (Max) Tsvetovat
Center for Social Complexity, George Mason University
mtsvetov@gmu.edu
January 24, 2012
This is version 8.0

1 Introduction

There has been a dramatic rise in the use of social network analysis over the last decade. The availability of standard texts and robust software has undoubtedly contributed to this increase. Social network analysis focuses on the relationships between actors and acknowledges that an individual’s behaviour is influenced by those around them. Actors and their actions are viewed as interdependent rather than independent units. This view means that the unit of analysis is not the individual, but an entity consisting of the individuals and the linkages connecting them. The purpose of this class is to introduce you to both the social-science and the mathematical concepts underlying the field of social network analysis. We shall look at the description and visualisation of network data and consider issues of validity and representation. We will then focus on uncovering structural properties of individual actors and on the detection and description of groups. Finally, we will consider how to test network hypotheses. This is a research-oriented course; its purpose is to give you basic tools for navigating the Social Network Analysis literature and to introduce you to the methods of doing social network analysis on real data.

2 Course Mechanics

2.1 Course schedule

The course meets weekly, on Tuesdays at 4:30 pm.


2.2 Facebook

I’m using a Facebook page as our course website this semester. If you don’t have a Facebook account already, please create one (you don’t have to “friend” me, I won’t be offended, and you can use a fake name if you are opposed to Facebook on principle). Once you are logged in to Facebook, search for CSS 692 - Spring 2012; the first search result will be our class page with this syllabus. Click Like and you will be able to post and comment on the page. One side-benefit of using Facebook as our class site is that we’ll be able to capture our online interactions and analyze our class social network.

2.3 Twitter

There are lots of cool things happening in SNA, and I tweet about them all the time. If you have a Twitter account, follow me @maksim2042, and follow the hashtag #sna.

2.4 Office Hours

The “official” office hours for this course are between 3 pm and 4:15 pm on Tuesdays and Thursdays, and by appointment at mutually convenient times. My office is in room 381 of the Research 1 building. The easiest way to reach me with a question or concern is by email or Facebook. When you have questions about readings or course material, I would appreciate it if you post them to the Facebook group first; this gives everyone an opportunity to comment on them. If the matter requires a face-to-face meeting, we can schedule an appointment.

2.5 Interaction

I cannot go out of my way to seek out individual students; you need to come to me. If there’s a problem, come to my office hours, ping me on Facebook or Twitter, Skype me, whatever. I’ll give you my cell phone number if you promise not to call me at 5 am. If I haven’t responded to an email, it’s probably buried under 200 others, so ping me again. Facebook pings usually get quicker responses simply because the volume there is much lower than in my main email accounts.

3 Readings
• Maksim Tsvetovat and Alex Kouznetsov, Social Network Analysis for Startups, O’Reilly
  – Paperback: 190 pages
  – Publisher: O’Reilly Media (September 2011)
  – Language: English
  – ISBN-10: 1449306462
  – ISBN-13: 978-1449306465
  – List Price: $24.99
  I wrote this book specifically for this class. If you buy a copy from O’Reilly directly, use code SNAFS for a 40% discount.
• Stanley Wasserman and Katherine Faust, Social Network Analysis: Methods and Applications
  – Paperback: 857 pages (hardcover also available)
  – Publisher: Cambridge University Press (November 25, 1994)
  – ISBN: 0521387078
  – List Price: $32.95
  This book is the “Bible” and the Cookbook of SNA, and will answer every one of your questions as long as it begins with the words “how do I...”. If your question begins with “why”, you may have a slightly harder time; for that, I will provide plenty of supplemental readings on the website.
• A number of research papers will also be included in the readings. In a way, these are more important than the textbook, as they illustrate the history of the field as well as the state of the art. Readings for each class will be posted on the website as PDF files.

4 Schedule of Topics

Note on snow days: We will probably get one or two this semester. If a class is cancelled due to a snow day, we shift all topics forward by a week. Homework deadlines don’t shift forward, sorry ;-).

Week 1 – Jan 24: What is social network analysis – social network analysis and link analysis – survey of tools and applications – basic graph theory – nodes, edges, graphs – graph density – walks, paths and geodesics.
Week 2 – Jan 31: Python Tutorial. Centrality in social networks – degree centrality – closeness centrality – betweenness centrality – power centrality. Network Workbench, UCINET and NetDraw Lab – first practical analysis session. Homework 1 handed out.
Week 3 – Feb 7: Cohesive subgroups – cliques – clusters – clans. Sept. 11 Hijacker network.
Week 4 – Feb 14: Brokerage and structural holes – cohesion and closure – friendship vs. competition – tradeoffs in efficiency vs. inclusiveness – implications in organization theory, politics. Homework 1 due.
Week 5 – Feb 21: Block modeling – finding distinct roles in social networks – analyzing social groups as systems of roles.
Week 6 – Feb 28: Presentations of project proposals by students.
Week 7 – Mar 6: Distance and clustering in social networks – analyzing similarities and differences – multi-dimensional scaling.
SPRING BREAK – YIPPIE!!!! (I’ll be at the Sunbelt conference on Social Network Analysis in sunny California, running a Big Data Hackathon. You are welcome to join me.)
Week 8 – Mar 20: Strength of ties – dealing with non-binary networks – strength of weak ties – strength of strong ties. Homework 2 handed out.
Week 9 – Mar 27: First approach to 2-mode networks – knowledge networks – feature matrices – networks of similarities.
Week 10 – Apr 3: Formation of social networks through information exchange – effect of networks on information exchange. Project Checkpoint due.
Week 11 – Apr 10: Rich social networks – PCANS – MetaMatrix – Semantic Social Networks – Link Analysis Networks. Homework 2 due.
Week 12 – Apr 17: Dynamic social networks – dealing with the dimension of time – evolution of networks over time – forces in networks.
Week 13 – Apr 24: Visualization of networks – pretty pictures – what looks good and how to make it – network movies. Paper draft due.
May 1 – FINAL PRESENTATIONS, PART 1
May 8 – FINAL PRESENTATIONS, PART 2
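Weeks 1 and 2 cover graph density, geodesics, and degree, closeness and betweenness centrality. As an informal preview (not part of any assignment), here is a minimal sketch of how these measures can be computed with Python NetworkX, the package we use in class. The five-person network, its names and its ties are made up for illustration; power centrality is not shown.

```python
import networkx as nx

# Hypothetical toy network: five people and who talks to whom
G = nx.Graph()
G.add_edges_from([
    ("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"),
    ("Cat", "Dan"), ("Dan", "Eve"),
])

# Density: share of possible ties that are present, 2m / (n(n-1)) = 10 / 20
print(nx.density(G))  # 0.5

# A geodesic is a shortest path between two nodes
print(nx.shortest_path(G, "Ann", "Eve"))         # ['Ann', 'Cat', 'Dan', 'Eve']
print(nx.shortest_path_length(G, "Ann", "Eve"))  # 3

# Three centrality measures; each returns a dict keyed by node
degree = nx.degree_centrality(G)            # degree / (n - 1)
closeness = nx.closeness_centrality(G)      # (n - 1) / sum of geodesic distances
betweenness = nx.betweenness_centrality(G)  # normalized share of geodesics passing through a node

# Print each node's three scores, rounded to 3 decimals
for name in G:
    print(name, round(degree[name], 3), round(closeness[name], 3), round(betweenness[name], 3))
```

In this toy network Cat scores highest on all three centrality measures, because every geodesic between {Ann, Bob} and {Dan, Eve} runs through Cat.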


5 Assignments and Projects

The goal of this course is to familiarize you with the research techniques and interpretations that comprise the field of Social Network Analysis. Thus, the course work is designed to expose you not only to the primary concepts, but also to real-world techniques and their limitations and pitfalls.

5.1 Homeworks

There will be 2 homeworks where you will get a chance to test and apply the techniques you learn in class, using available data. Think of the homeworks as a walled-off playground where you can test your analysis tools and skills. This is also the right place and time to resolve any questions you may have about the material we are working with. Each homework is worth 20% of your grade.

5.2 Course Project - Small Groups

The goal of the course project is to expose you to the way Social Network Analysis is done in the real world. In the course project, you will need to complete a social network study, including data acquisition, analysis, visualization and interpretation. Given the project-oriented nature of this course, you will learn more and achieve more interesting results if you work in small groups. I recommend that you work in groups of 2-3 people. I also recommend that all members of the group participate in the data collection and analysis stages. After the Checkpoint, you should designate one person to act as an editor; this will result in higher-quality writing in the end product. The Checkpoint and Draft deadlines are meant to prevent procrastination on the course projects. Ideally, you should make steady progress toward the goals stated in your project proposal from the beginning of the course. This will result in a better final paper and a stress-free final presentation. Please start thinking about your project topic, and recruit members of your project team as soon as you can. I have a few project topics I could give you if you are stuck, but please make a reasonable effort at coming up with one of your own. The project is broken down into several stages:
• Project Proposal (due September 25, 5% of the grade) Please submit a 1- or 2-page abstract of what you are planning to do. Each project group will give a 10-minute in-class presentation of the proposal, followed by 5 minutes of Q&A. Please prepare a short PowerPoint presentation (2-5 slides) and email it to me ahead of the presentation. If you are having difficulty zeroing in on a project topic, please talk to me sooner rather than later; once the project proposal is presented, please consider it set in stone.
• Checkpoint (due October 6, 5% of the grade)


By the time of the checkpoint, you should have completed data collection and . If you are having major problems with any of the steps, this is the time to talk about it. The checkpoint will be graded on a 5-point scale (0 = “nothing done yet”, 5 = “strong progress towards stated goal”).
• Paper Draft (due November 27, 5% of the grade) An assessment of your progress. The research itself should be practically complete, so that we can have a fruitful discussion about your results and their interpretation. I will act as an editor and give you written feedback on both the quality of your research and the quality of your writing. From that point on, you will have between two and three weeks to finish writing and produce a polished piece. The draft will be graded on a 5-point scale (0 = “not started writing or not submitted the draft”, 5 = “only minor editing required for final submission”).
• Project Paper (due December 18, 30% of the grade) The final product of the course project should be a scholarly paper describing the motivation for the project, the data collection and analysis methods, and the results and their discussion. The goal is to create a paper that could be presented at a social network analysis conference, and potentially lead to a longer-term research project with multiple publications. The paper will be graded on its scientific merit as well as the quality of writing. While I am not expecting a written masterpiece, the paper should at least be readable. I can put you in touch with writing and editorial help and resources if you require this kind of help.

5.3 Final Presentations, 10% of the grade

Instead of a formal final exam, we will conclude this course with a mini-conference open to the public. The presentation format of the mini-conference will mimic that of a real research meeting, and serve as a training ground for further presentations in the field. I repeat: the final presentations will be open to the public. Please choose your project topic in such a way that a public presentation will not get you (and me!) fired or sued. Every course project will be given a 20-minute presentation slot, with a 5-10 minute question-and-answer period at the end.

6 Grade Breakdown

I’m a very easy grader. Most people get A’s in this course. To get a bad grade, you need to do one of the following: (a) not show up to class, (b) never participate when you do show up, (c) when participating, be obnoxious and ruin other people’s experience, (d) never do any homeworks or class project work, or (e) when you do get something done, turn it in half-finished with lots of excuses about work or whatever. I will ask you to re-do things when I’m unhappy with the quality of work; I’ll send you back for rewrites with specific comments. Once I’m happy with it, you’ll get your “A” because – guess what? – you deserved it (it just took you a little longer than others). Because this is a large class for such an involved process, if you don’t hear from me, you can safely assume that things are OK. If you’re still worried, ping me on Facebook and I’ll respond.
• Course Project – 55% of the total grade, broken down as follows:
  – Project Proposal – 5%
  – Checkpoint – 5%
  – Paper Draft – 5%
  – Project Paper – 30%
  – Presentation – 10%
• Homeworks – 40%, or 20% each
• Course Participation – 5%

6.1 Late Assignments

Homeworks will be accepted up to 2 days late with no penalty. After this grace period, the penalty is one letter grade for “really late”, and two letter grades for “wait, this was due, like, last month”. I’m usually generous with extensions, as long as you keep me apprised of your progress. “My dog ate my homework” or “I was away for work” doesn’t cut it. Lateness in course projects will most likely be caused by overly ambitious project proposals, so be careful not to bite off more than you can chew. If you end up with an overly ambitious project, write up a portion of it for the course, turn it in on time, and then continue to work on the project as an independent study course.

7 Software

The purpose of this course is NOT to teach you how to use software packages for network analysis, but to work through the concepts and methodologies of the field. There are a number of good packages available for social network analysis, and all of them have a place under the sun. I will make each of the packages below available as a download, and also post downloadable documentation for them. For the final project, you will have to choose your software tools, and you are responsible for learning how to use them.


7.1 Social Network Analysis and Visualization

Every piece of software below has its strengths and weaknesses, and none is perfect. You are free to experiment with all of them and decide what works best for you. In this class, I will assume that you can learn the software on your own. RTFM, please.
• Python NetworkX: http://networkx.lanl.gov/ We’ll be using this in class, and the book is largely based on it. Free and Open Source. What? You’ve never programmed before? Shucks! It ain’t that hard, actually. If you can figure out an Excel macro, you can deal with Python. A couple of suggestions for those who have never programmed:
  – www.learnpython.org has an excellent interactive tutorial. Do it first.
  – To *really* learn Python, download the free e-book “Learning Python the Hard Way”. It’s way excellent.
  – I will run a tutorial at the second class session to get you up to speed.
  – The book walks you through a lot of things; I wrote it so I don’t have to repeat them over and over ;-)
  The best place to install Python from (for this class) is the Enthought Distribution, www.enthought.com. They have easy installers for all platforms, and it’s free for academic use.
• Gephi: http://gephi.org/ A very visual open-source package. Sometimes it works very nicely; occasionally it still falls on its face. YMMV. Free and Open Source.
• NodeXL: http://nodexl.codeplex.com If you’re an Excel guru, this might work very well for you. I personally avoid Excel like the plague... YMMV. Free and Open Source.
• Network Workbench: http://nwb.slis.indiana.edu/ A brand new package with a very impressive start. Free and Open Source. UPDATE: never really went anywhere... too bad!
• R SNA Package: http://erzuli.ss.uci.edu/R.stuff/ Some swear by it, many swear AT it. Best mathematics implementation, best statistical methods, best of breed as far as rigor. If you already know R, you’ll be right at home. If not, you’ll be in a lot of pain. Free and Open Source.
• UCINET, NetDraw, Mage: http://www.analytictech.com/downloaduc6.htm UCINET exposes much of the mechanics of social network analysis to the user. The package is very comprehensive and covers most tasks you will face in analysis. The major drawback is an outdated user interface, which makes it difficult to run multiple sets of analyses. Free for 30 days (“trial copy”), $40 for a student license.


• ORA: http://www.casos.cs.cmu.edu/projects/ora A very capable package with a nice visualization engine and a clean user interface. I recommend ORA for all non-technical users, as it is much easier to learn (even if not 100% complete). Free as in beer, not as in speech.
• Pajek: http://vlado.fmf.uni-lj.si/pub/networks/pajek/ A Slovenian package. Stellar capabilities, but a strange user interface. You can use it instead of UCINET if you prefer. Free as in beer, not as in speech.
• NetMiner: www.netminer.com A commercial package with nice analysis and visualization and a clean interface, but very expensive. There’s a 30-day trial version that you could check out.
• Analyst Notebook, Palantir, etc.: these are common in government but entirely unsuitable for this course. Ask me why and you will get a 30-minute angry rant. ;-)

7.2 Other software

• Acrobat Reader
• Mathematics software: Matlab, Octave, SciLab, Mathematica. Optional.
• Stats software: Stata, SPSS, R. Optional.

7.3 Writing

Use any word processor you are comfortable with; for electronic submissions, I recommend sending me PDF files. Anybody willing to learn LaTeX and use it for paper writing will be rewarded with free pizza (beer, lunch, take your pick) ;-)

8 Collaboration and Plagiarism

This course’s plagiarism policy adheres to standard academic practice. If you continue to work in a university setting or publish in scholarly venues, you can expect to face very similar standards. Homeworks are designed to help you enhance your analysis skills and to teach you to use the software packages. Most of the software we use does not have clean interfaces, and you will have a pretty difficult time learning it. Therefore, it is OK to work in groups during the analysis stage. However, homework write-ups should be individual and reflect your personal interpretations and conclusions drawn from the data. Longer assignments (the Project Proposal, Checkpoint and the Project Paper) can and should be collaboratively authored. Make sure that the title page of the paper lists all of the coauthors. If you receive help during the project in any significant form (including, but not limited to, programming, data processing, visualization, editing and proofreading) from any person outside of your project team, please thank this person in the Acknowledgements section. A good guide to proper citation and acknowledgement of source material can be found at http://www.dartmouth.edu/~sources/contents.html. If your group is experiencing internal dysfunction (for example, if one person is doing all of the work while the others do nothing), this will inevitably affect the quality of the end product and everybody’s grades. If your group is not communicating well and not sharing the workload, please talk to me as soon as you can.

8.1 Plagiarism

Because collaboration is allowed and encouraged, we will probably never need to invoke this provision in the course. However, I am obligated to remind you that GMU operates under an Honor Code system, which means there is a Zero Tolerance policy for plagiarized assignments. In this course, an assignment will be considered plagiarized if it consists of a verbatim copy or simple paraphrase of another student’s assignment, or makes significant use of copied text, data or figures without proper acknowledgements or citations. An assignment will also be considered plagiarized if you copy research results from a published paper, unless they are presented in a context of critical evaluation and properly cited in the bibliography. This is what the University requires me to say regarding plagiarism: Any plagiarized assignment will receive an automatic grade of “F.” This may lead to failure for the course, resulting in dismissal from the university. This dismissal will be noted on the student’s transcript. For foreign students who are on a university-sponsored visa (e.g. F-1, J-1 or J-2), dismissal also results in the revocation of their visa. Sounds scary, doesn’t it? So, PLEASE CITE YOUR COLLABORATORS!
