Personal Network Analysis with R
Instructor: Raffaele Vacca (University of Florida, USA)
This workshop is aimed at students and researchers who are doing, or intend to do, research with personal network data and are considering R as their data management and analysis software.
The basic idea behind personal (or ego-centric) network research is that the characteristics of one’s personal contacts, the type of relationship one has with them, and the way these contacts interact with each other affect outcomes in one’s life such as mental health, smoking behavior, or assimilation to a foreign culture. This idea has been fruitfully applied in disciplines as diverse as anthropology, public health, urban sociology, and migration studies. The typical personal network study involves identifying a sample of respondents, the egos, and collecting from each a network of personal contacts, the alters, usually a few dozen per ego. Ego is asked about certain characteristics of each alter, such as age, sex or residential location; certain characteristics of each ego-alter relation, such as emotional closeness or frequency of contact; and characteristics of alter-alter relations, such as whether each pair of alters know each other. This information is then summarized into ego-level variables that describe the composition and the structure of each ego’s personal network. Finally, these personal network variables are linked to ego-level outcomes such as mental health, smoking behavior or cultural assimilation, for example with standard regression models.
Personal network researchers are therefore commonly confronted with three problems. The first is handling edge lists or adjacency matrices from many personal networks, possibly hundreds of them. The second is running the same set of operations on each network to extract a number of ego-level network summary variables, both compositional (e.g. average alter age, proportion of family members in the network, average frequency of contact) and structural (e.g. network density, number of components, average alter degree). The third is constructing appropriate ego-level datasets in which network summary variables are joined to other, non-network ego-level variables.
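As a rough illustration of this workflow, the following base R sketch computes two compositional and two structural summaries for each ego and joins them to an ego-level table. All data, object names and variable names here (alters, adj, egos, ego_smoker, and so on) are hypothetical and made up for the example; they are not taken from the workshop materials, and the workshop itself may well use igraph or network for the structural measures instead.

```r
# Hypothetical toy data (all names and values are invented for illustration):
# one row per alter, with the ID of the ego who nominated it.
alters <- data.frame(
  ego_id    = c(1, 1, 1, 2, 2, 2, 2),
  alter_age = c(30, 45, 60, 22, 25, 31, 40),
  family    = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

# One undirected alter-alter adjacency matrix per ego (ego itself excluded).
adj <- list(
  "1" = matrix(c(0, 1, 0,
                 1, 0, 1,
                 0, 1, 0), nrow = 3, byrow = TRUE),
  "2" = matrix(c(0, 1, 1, 0,
                 1, 0, 1, 0,
                 1, 1, 0, 0,
                 0, 0, 0, 0), nrow = 4, byrow = TRUE)
)

# Hypothetical non-network ego-level variables.
egos <- data.frame(ego_id = c(1, 2), ego_smoker = c(TRUE, FALSE))

# Density of an undirected network: observed ties / possible ties.
density_undirected <- function(m) {
  n <- nrow(m)
  sum(m[upper.tri(m)]) / (n * (n - 1) / 2)
}

# Compositional summaries: one row per ego.
composition <- aggregate(cbind(alter_age, family) ~ ego_id,
                         data = alters, FUN = mean)
names(composition) <- c("ego_id", "mean_alter_age", "prop_family")

# Structural summaries: the same functions applied to every network.
structural <- data.frame(
  ego_id      = as.numeric(names(adj)),
  density     = unname(sapply(adj, density_undirected)),
  mean_degree = unname(sapply(adj, function(m) mean(rowSums(m))))
)

# Join network summaries to the other ego-level variables.
ego_level <- merge(merge(egos, composition, by = "ego_id"),
                   structural, by = "ego_id")
print(ego_level)
```

The resulting ego_level data frame has one row per ego and could then be passed to a standard regression function such as glm(), with the network summaries as predictors.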
While such a workflow can be handled by “pointing and clicking” on the menu buttons of programs like UCINET and SPSS, there are several limitations to this approach. Pointing and clicking is repetitive, boring and, most importantly, prone to errors. It typically does not allow the same set of operations to be run on many objects in batch, without the user’s intervention. Moreover, pointing and clicking makes research hard to reproduce: since point-and-click programs do not usually keep any record, or script, of the operations executed by a user, these operations cannot be repeated later in exactly the same way and order, whether by the same researcher on further data or by other researchers on the same data. Although specific programs for the analysis of personal networks, such as EgoNet and E-Net, do offer batch analysis procedures that reduce the amount of pointing and clicking, they can only perform a limited set of analyses, and they still do not allow scripting. R, on the other hand, opens up a whole different way of doing personal network research: it eliminates pointing and clicking entirely and allows users to write reproducible scripts that batch-analyze data on dozens, hundreds or thousands of personal networks at once.
This workshop will cover the use of R in all the main stages of a personal network research project: the management of personal network data, the analysis of the composition and structure of personal networks, network visualization, and the analysis of the association between personal network characteristics and ego-level outcomes. We will focus on the main facilities available in R to manage data and run analyses in batch: loops (for, while, repeat), the “apply” family of functions (apply, lapply, sapply, tapply, mapply), and the plyr package for the “Split-Apply-Combine” analysis strategy. We will use the two main packages for network analysis in R, igraph and network.
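To give a flavor of these batch facilities, here is a minimal base R sketch that runs the same operation on several personal networks, first with a for loop and then with members of the “apply” family. The networks, the alter table and all names (nets, count_ties, closeness, and so on) are hypothetical toy examples, not workshop data. The plyr package provides functions for the same split-apply-combine pattern shown in the last step; it is left out here only so that the sketch runs without extra packages.

```r
# Hypothetical list of undirected alter-alter adjacency matrices, one per ego.
nets <- list(
  ego_a = matrix(c(0, 1, 1,
                   1, 0, 0,
                   1, 0, 0), nrow = 3, byrow = TRUE),
  ego_b = matrix(c(0, 1,
                   1, 0), nrow = 2, byrow = TRUE),
  ego_c = matrix(0, nrow = 4, ncol = 4)
)

# The operation to repeat on every network: count alter-alter ties.
count_ties <- function(m) sum(m[upper.tri(m)])

# 1. for loop: preallocate the result, then fill it one network at a time.
ties_loop <- numeric(length(nets))
names(ties_loop) <- names(nets)
for (i in seq_along(nets)) {
  ties_loop[i] <- count_ties(nets[[i]])
}

# 2. lapply returns a list; sapply simplifies the result to a named vector.
ties_list <- lapply(nets, count_ties)
ties_vec  <- sapply(nets, count_ties)

# 3. mapply iterates over several inputs in parallel
#    (here: number of ties and network size, to obtain density).
sizes   <- sapply(nets, nrow)
density <- mapply(function(ties, n) ties / (n * (n - 1) / 2),
                  ties_vec, sizes)

# 4. tapply: split a variable by ego, apply a function, combine the results.
alters <- data.frame(
  ego       = c("ego_a", "ego_a", "ego_a", "ego_b", "ego_b"),
  closeness = c(3, 5, 4, 2, 4)
)
mean_closeness <- tapply(alters$closeness, alters$ego, mean)

print(ties_vec)
print(density)
print(mean_closeness)
```

The loop and the sapply call produce the same named vector; the apply-style version is shorter and avoids manual indexing, which is part of why it tends to be preferred for batch work on many networks.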
The workshop will be based on data and R code from actual recent research projects on personal networks. This is a hands-on workshop in which participants will run R code on real-world data, so they will need a laptop with R installed. Some familiarity with R is an advantage but is not necessary; other introductory workshops on R and its uses in network analysis, like “Introduction to Social Network Analysis with R” (Michal Bojanowski), may provide useful background for this workshop.