Extracting Data from Online Social Networks

Instructors: Cristina Pérez-Solà (Autonomous University of Barcelona, Spain) & Jordi Herrera (Autonomous University of Barcelona, Spain)


Online Social Networks (OSNs) have become an important part of millions of people's lives, changing both how people communicate and how they think about the Internet. There are currently more than 70 OSN providers that claim over a million users each, four of the ten top-ranked websites on Alexa are OSN providers, and there are OSNs specialized in almost any topic imaginable. Moreover, social networking sites have broken out of the entertainment sphere and are now used in a wide variety of contexts, from companies to schools, charities, and libraries.

This popularity, together with the fact that OSNs are a rich source of information about human relationships, has made them a popular research topic. Researchers from many fields, from sociologists, epidemiologists, and economists to computer security experts, have shown interest in studying Online Social Networks.

In this workshop, we will show how to interact with OSN providers to obtain data from their sites. First, we will present the basic technologies that OSN sites use to provide an interface to their data. Then, we will teach the basic skills needed to develop a small program that uses these technologies to collect data from social networking sites such as Twitter or Facebook. We will also show how to tune the program to collect exactly the data we are interested in, and how to convert this data to standard network formats. Finally, we will use the program to acquire OSN data in real time and visualize the results.

Regarding the technologies covered, we will explain how OSN providers use APIs (Application Programming Interfaces) and the REST (Representational State Transfer) architectural style to allow programmers to obtain data from their networks. We will also give a brief introduction to OAuth, the most common authorization protocol used when obtaining data from OSNs. To obtain the data, we will use the R programming language, which is especially well suited to data handling and analysis. Furthermore, R has a set of libraries that extend its functionality and allow us to handle authentication, data acquisition, data processing, and visualization easily.

