Combinatorial Analysis of Multiple Networks. Matteo Magnani Barbora Micenková Luca Rossi - PDF

Description
Combinatorial Analysis of Multiple Nets Matteo Magnani Barbora Micenková Luca Rossi arxiv: v1 [cs.si] 20 Mar 2013 Abstract The study of complex nets has been historically based on simple graph

Please download to get full document.

View again

of 17
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Information
Category:

Shopping

Publish on:

Views: 4 | Pages: 17

Extension: PDF | Download: 0

Share
Transcript
Combinatorial Analysis of Multiple Nets Matteo Magnani Barbora Micenková Luca Rossi arxiv: v1 [cs.si] 20 Mar 2013 Abstract The study of complex nets has been historically based on simple graph data models representing relationships between individuals. However, often reality cannot be accurately captured by a flat graph model. This has led to the development of multi-layer nets. These models have the potential of becoming the reference tools in net data analysis, but require the parallel development of specific analysis methods explicitly exploiting the information hidden in-between the layers and the availability of a critical mass of reference data to experiment with the tools and investigate the real-world organization of these complex systems. In this we introduce a real-world layered net combining different kinds of online and offline relationships, and present an innovative methodology and related analysis tools suggesting the existence of hidden motifs traversing and correlating different representation layers. We also introduce a notion of betweenness centrality for multiple nets. While some preliminary experimental evidence is reported, our hypotheses are still largely unverified, and in our opinion this calls for the availability of new analysis methods but also new reference multi-layer social net data. Introduction Recently, large user-generated net data have become available through online social net sites like Twitter [10], Facebook [17], YouTube [6], Friendfeed [5] and several others. This phenomenon has determined a new wave of interest in Social Net Analysis (SNA), especially when applied to very large nets. However, while traditional SNA has mainly focused on the analysis of a single platform at a time, recent researches have stressed how both online and offline social experiences can hardly be simplified using a single kind of relationship between homogeneous subjects [4, 15, 20, 13, 1, 3]. Just like our offline experience is defined as a set of relationships having a specific meaning within a specific context [9], our online experience cannot be reduced to a single flat layer of connections. By contrast, it can be seen as a composite, stratified phenomenon. As it has been pointed out [16], we are what the Italian novelist Luigi Pirandello defined as one, no one and one hundred thousand at the same time. M. Magnani is with ISTI, CNR, Pisa and has done part of this while at the Department of Computer Science, Aarhus University, Denmark, Barbora Micenková is at the Department of Computer Science, Aarhus University, Denmark and L. Rossi is at the Department of Communication Studies, University of Urbino, Italy This has been supported 1 Figure 1: Different data structures to study multiple nets: super-sociomatrix, single layers, and multi-layer matrix. The power-sociomatrix, that we introduce for the first time in this paper, is made of all possible combinations of layers mathematically, it is the power-set of the super-sociomatrix Several models have been proposed to represent multiple related nets (a multilayer net). [23] define the sociomatrix of a relation R as a matrix with entries x ij representing the value of the tie from node i to node j on relation R in this paper we focus on the case where x ij {0, 1}, i.e., a tie is either present or not. A super-sociomatrix is a collection of sociomatrices where each of them corresponds to one type of relation in a multi-layer net. In Figure 1 we have represented this data structure: four layers, each one defining a net (e.g., Twitter, Facebook, FourSquare and LinkedIn connections) and composing the so-called super-sociomatrix. While the super-sociomatrix model has been available for a long time, collecting meaningful multi-layer data is a complex activity, and there is only a limited choice of tools for a specialized analysis of multi-layer nets. In fact, with a few exceptions [20, 1, 2, 14], once these nets have been constructed, they are analyzed using traditional methods, e.g., the different layers are merged and existing net centrality measures are applied. For example, Figure 2 shows how two different layers (top) correspond to a merged net (bottom) that can be analyzed using any existing SNA tool. In this paper we propose a methodological shift in the way in which multi-layer nets can be analyzed. This is then instantiated into three main research hypothein part by the Italian Ministry of Education, Universities and Research PRIN project Relazioni sociali ed identità in Rete: vissuti e narrazioni degli italiani nei siti di social net and FIRB project RBFR Figure 2: A net visualization of a super-sociomatrix: two different layers (kinds of relationships) and a representation of the multi-layer matrix obtained merging these two layers ses to be tested on a real dataset that we have collected to this aim. The fil rouge behind these hypotheses is that the identification of different semantic layers connecting a set of individuals and the analysis of the exponential number of combinations of these layers can allow the identification of hidden patterns. A strong underlying conjecture that we make is that not only it is important to consider multiple layers, as it has been suggested in several related s, but also that considering all the single layers one by one and/or the complete multi-layer net containing all connections may result in information loss. Therefore, we introduce the new concept of power-sociomatrix. Going back to Figure 1, we can see how the power-sociomatrix is made of all combinations of layers in the figure, two of the 15 elements of the power-sociomatrix are indicated, correponding to the first two and the last three layers respectively. Our general hypothesis, if verified, would have an impact on several aspects of net analysis, including the definition of distance between nodes and methods to find communities. However, as we will see in the following, so far we have been able to observe only some of its foreseen effects, leading to the necessity of additional experiments on new datasets. Research hypotheses H1 The centrality of an individual is a function of the different kinds of relationships s/he has with other individuals. For example, consider Figure 3. Looking at the multilayer net to the left, the distance between A and D is one because they are directly connected. However, if we consider all the layers hidden behind the multi-layer net (right hand side) our analysis can become much more accurate. First, we can see that the shortest path between A and D might not be the direct connection through LinkedIn, but the connection through two Facebook edges, considering that Facebook hosts many more interactions than LinkedIn and thus the two steps on Facebook might be faster to traverse than a single step on LinkedIn. In addition, if we look at node C we can see that it is in the middle of different possible paths between A and D, e.g., A has with C who goes to the cinema and has with D. In summary, 3 Figure 3: A merged multi-layer net (left) and its expanded view revealing different layers many hidden paths would contribute to the betweenness of B and C, depending on the different nets and combinations of nets they are active in. H2 Communities can emerge when a specific combination of layers is considered. Although several community detection algorithms exist, in practice they achieve good results when some more or less well-separated clusters exist. This is strictly related to the way in which community detection algorithms have been defined: some try to maximize modularity, favoring well separated clusters, some use random walk approaches, where the probability that a walker crosses two clusters is proportional to the number of edges between them, some exploit measures like betweenness, that is high when few other edges connect distinct portions of the net [8]. However, when we deal with on-line relationships, community detection becomes extremely hard. According to our hypothesis, this depends on the fact that a large number of semantically different layers are considered all-together (i.e., a super-sociomatrix), determining the co-existence of several hidden communities. In Figure 5, the idea is illustrated on a simple example. A net with 8 nodes and 3 types of edges (i.e. a 3-layer net) is depicted in the upper left corner. It is obvious that the analysis of the super-sociomatrix does not reveal any interesting patterns as there are too many edges in the graph (in fact, the graph is fully connected in this example). The same can be observed for nets of individual relation types (on the right). However, choosing two specific layers, some more evident clusters emerge (lower left part of the figure, clusters denoted by black and white nodes). H3 Different combinations of layers can be mutually dependent. As such, being active on two different nets may not have the same importance. For example, we may find that our connections on a specific net are the same people we are connected to on a combination of two other social media sites. As a consequence, staying in this net has a cost in terms of time spent but might not determine any added value with respect to the size of our social experience. Contributions In this paper we provide the following contributions: We present a real dataset that depicts five different kinds of relationships between a close community of individuals. The dataset has been collected using both 4 Figure 4: Clusterability in combinations of layers. Lower left combination of layers produces an interesting clustering traditional survey-based methods and the analysis of online profiles, and as a consequence it constitutes a multi-layer net spanning both the online and offline dimensions. We apply an existing distance function for multi-layer data [14] to our dataset. To the best of our knowledge, this is the first time this function is tested on real data. We define a new measure, called multi-layer betweenness centrality, that extends betweenness centrality taking into consideration paths crossing several different layers. Also this function is tested on real data, discussing its ability to identify central users with respect to the whole multi-layer structure. We study the variations in our ability to identify meaningful clusters depending on the specific combination of layers under consideration. We relate the concept of net similarity on a power-sociomatrix to the concept of net portfolio, i.e., the management of our stratified social interactions. Outline of the paper In the next section we introduce the first example of multi-layer net data collected using both online and offline techniques. We present both the adopted methodology and a general description of the data, that will be made publicly available as part of the ICWSM data collection initiative. Then we introduce specific analysis methods and measures taking the combinations of net layers under consideration, including a new betweenness measure for multiple nets. All the measures are experimentally validated on our real data, and we show that while for some measures we obtain a measurable effect, other basic hypotheses cannot be clearly corroborated using the currently available data. This leads to a discussion of future directions concerning both 5 methodological issues and collection of new datasets. We conclude the paper with a review of the state of the art and some concluding remarks. Dataset In this section, we are going to introduce a new real-world dataset which consists of measurements of multiple types of relations among the population and can be treated as a multi-layer graph. We first describe the methodology behind data collection, and then provide some basic statistics of the collected data. A full analysis follows, with the application of the new methods introduced in this paper. Methodology Collection of data was conducted at the Department of Computer Science at Aarhus University among the employees. The population of the study is 61 employees (out of the total number of 142) who decided to join the survey, including professors, postdoctoral researchers, PhD students and administration staff. According to [23], there are two types of variables that can be included in a net data set: structural and composition variables. Structural variables are measured on pairs of actors and they express specific ties between the pairs (e.g. friendship). Composition variables are defined for individual actors and they are measurements of various actor attributes (e.g. age). For our study, we measured 5 structural variables, namely: current ing relationships, repeated activities, regularly eating together, co-authorship of a publication, and friendship on Facebook. These variables cover different types of relations between the actors based on their interactions. All relations are dichotomous which means that they are either present or absent, without weights. Measurements of the first three variables (off-line data) were collected via a questionnaire which had been distributed among the employees on-line. The questions were of a roster format which means that each actor was presented with a complete list of other actors in the net and asked to select people with whom he/she has the aforementioned ties (ing relationship, activities, eating ). The number of choices for each question was not limited. The measurements are results of individual assessments done by the actors and therefore the relations are directed, however, we do not intend to study the aspect of personal perception of the relationships and so we decided to flatten the data into nondirectional connections. Thus, if an actor A indicated a tie to actor B, we input an edge into the net even if actor B did not indicate 6 Work Leisure Coauthor Lunch FB # of edges # of con. comp avg. vertex deg Table 1: Basic statistics computed on the sociomatrices of the 5 different relations number of edges, number of connected components and average vertex degree a tie to actor A. On the top of this, the respondents were asked to provide their user information for a couple of most widespread online nets. 77% of the respondents who filled in the questionnaire stated that they have a Facebook account and provided their username. All respondents provided answers to all questions which means that our multi-layer net is complete. Information about the co-authorship relation was obtained from the on-line DBLP bibliography database without the need to directly ask the actors. A co-authorship of at least one publication by a pair of actors resulted in an edge in the net. DBLP gets new data with a delay of several months and therefore the current ing relationships net is quite distinct from it. Moreover, current ing relationships net includes other types of interactions than cooperation on papers (e.g. also cooperation on administrative ). Friendship relations among all the actors who stated that they have a Facebook account were retrieved from the site using a custom application. Data Description We describe some common statistical measures of the single-layer nets in Tab. 1. The Co-authorship net is the smallest and less connected of all layers, Work and Lunch nets have the most edges and the highest average vertex degree can be observed for the Facebook net. The analysis of the net through a super-sociomatrix approach has been for a long time a viable approach to handle multiple nets. In the multi-layer supersociomatrix, all the layers componing our complex structure of experience are merged together producing a single layer net. We can therefore produce a summary description of this net. It counts 61 nodes and 706 edges belonging to all the 5 layers we are using in this study. Average vertex degree is and the net has a diameter of 4. Multi-layer Net Analysis Multi-layer Betweenness Our definition of betweenness is based on the existing concept of geodesic distance for multi-layer nets [13, 14]. Therefore, we first introduce this concept by example. Let us consider Figure 6, and the distance between A and D. If we assume to merge the 7 Figure 5: Visualization of the super-sociomatrix two layers (Facebook and Lunch) and apply a traditional definition of shortest path, we can see that the distance between A and D is 2 (passing through B or through C). Figure 6: Multi-layer distances Now, let us consider the two distinct layers. The first consideration is that we have 8 a potential path through B, only traversing Facebook connections, and a potential path through C, only traversing Lunch connections. Therefore, we do not longer say that passing through B or C constitutes two equivalent shortest paths: they represent in fact different and, according to the terminology introduced in [14], incomparable paths. Then, we can additionally notice that the same sequence of nodes A-B-D can also be traversed through one layer, switch layer and continue. For example, A may send a Facebook message to B, who then has with D and physically delivers the message to him/her. In summary, the potential shortest paths between A and D on a multi-layer context are: A - C - D A - B - D A - C - D A - B - D Differently from single-net cases, the multi-layer distance between A and D is not defined as a number, but as a set of paths. However, our intuition is that having a way to compute shortest paths in a multi-layer context, we can define an extended version of node betweenness as the number of multi-layer shortest paths between any two nodes that contain the input node. The first feature of this notion is that it reduces to the existing definition of betweenness when single layers are used. The second feature is that in general we may expect a node to increase its multi-layer betweenness as more arcs are added. Therefore, multilayer betweenness when compared to single layer betweenness emphasizes if there are nodes that are more or less present in more or less layers. In general, a node with many connections on different layers will have the opportunity of significantly increasing its betweenness thanks to the many alternative paths going through it, while nodes active e.g. on a single layer will more likely keep a low overall value of multi-layer centrality. Figure 7 shows a comparison of traditional betweenness and multi-layer betweenness, for every node in our net. However, given the skewed distribution of these measures a direct comparison of their absolute values is not very informative. Instead, we can compare the ranking of the nodes, e.g., take the node with the highest traditional betweenness and check if it also has one of the highest values of multi-layer betweenness. If the two rankings are very correlated, than we can conclude that our new measure is not significant because it does not highlight any effects of considering all the layers as separate entities. Figure 8 shows the changes in ranking using traditional betweenness and multilayer betweenness. In general, we can observe that these two measures are in fact very correlated. However, there are some cases where the centrality of an individual changes significantly notice that the sample consists of 61 individuals, and two nodes change their ranking of almost 20 positions, i.e., they completely change their role if the stratified structure is taken into account. At the same time, the important nodes remain more or less stable. 9 Figure 7: Comparison of absolute values of the two different measures of betweenness. Multi-layer betweenness on x-axis versus traditional betweenness measure on y-axis. Each point corresponds to an actor Figure 8: Increase/decrease of 61 individual actors in ranking according to two different measures of betweenness. Actors were sorted (from left to right) in a descending orderincrease/decrease according toin the ranking traditional of individual measure actors according of betweenness to when calculated switching from on the supersocio
Related Search
Similar documents
View more...
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks