Analysis of Graph Centralities with the Help of Shapley Values

Social network analysis as a modern theoretical approach to studying various processes of human life; applications of SNA to different fields of science; application of exact and approximate Shapley values to the assessment of centrality measures of nodes in networks.

Category: Programming, computers and cybernetics
Type: diploma thesis
Language: English
Date added: 02.09.2016
File size: 735.2 K


Hosted at http://www.allbest.ru/

Analysis of Graph Centralities with the Help of Shapley Values

Content

  • Introduction
  • 1. The Background of the Study
  • 1.1 The Problem Statement and the Scope of the Study
  • 1.2 The Professional Significance of the Study
  • 2. Literature Review
  • 3. The Methodology
  • 3.1 The Capacity Function
  • 3.1.1 The Number of Simple Paths Approach
  • 3.1.2 Price for the Length of the Simple Paths Approach
  • 3.1.3 The Number of the Edges Approach
  • 3.1.4 Non-Monotonous Function Case
  • Conclusion
  • References

Introduction

Social network analysis (SNA) is a modern theoretical approach to studying various processes of human life. SNA covers many areas of applied mathematics, such as graph theory, data analysis, decision theory, game theory, and computational mathematics, and is applicable to various fields of science, such as biology, physics, medicine, economics, and politics.

1. The Background of the Study

Generally, the analysis of networks can be viewed from two basic angles. The first one is the analysis of specific networks and relations, for instance, the analysis of transportation systems, the Internet, economic relations, social media networks, etc. The second one is the graph theory approach, which studies different structural characteristics and properties of networks considered as graphs (i.e. mathematical objects). It also solves many mathematical problems, such as finding shortest paths, detecting cliques, analyzing impact, and finding the maximum flow. Different kinds of network models, measures, methods of analysis and applications are described in [Newman 2010].

1.1 The Problem Statement and the Scope of the Study

Conventionally, a network is considered as a graph, where nodes represent agents (players, participants, etc.) and edges represent the relationships (links) between them. Such a representation allows us to analyze the network with the help of different mathematical tools and to study its numerous aspects. The main purpose of the present study is to link together the existing knowledge in different areas of mathematics, such as game theory and social network analysis. To put it more precisely, the question to be explored is how to identify key (pivotal, important) players and to detect the most powerful participants and groups of participants of a network with the help of the game theory approach.

1.2 The Professional Significance of the Study

Within the framework of the project, a number of governing principles of network analysis will be applied to random networks and compared with the methods developed in this project; then, the most valid method will be detected. The issue raised can be scientifically interesting from the viewpoint of those who study different network metrics, because it is highly important, firstly, to achieve accuracy of calculations and, secondly, to reduce the complexity of calculations.

2. Literature Review

A variety of tools and measures were developed for studying the static features of a network, which include the identification of central nodes. Several types of node characteristics exist that can be regarded as measures of importance, for example, centrality measures, transitivity, groups, etc. Moreover, as will be shown below, some social choice theory and game theory methods can be used in SNA for the identification of key players.

As is indicated in [Newman 2010], several groups of centrality measures exist, including degree centralities, closeness centralities, betweenness centralities, etc. According to [Newman 2010], the simplest centrality measure is the degree of a node, i.e. the number of edges connected to it. If a directed graph is considered, we are able to distinguish between in-degree and out-degree centralities. It is indicated in [Newman 2010] that, despite being the simplest measure, the number of in- and out-going edges of a node identifies influential players that are the most accessible to information; if a citation network is considered, this measure detects the most popular and important articles. However, it is important to mention that in majoritarian graphs, which come from social choice theory and decision making, the fewer in-degree and the more out-degree connections a vertex has, the more powerful it is.

Obviously, the degree of a node recognizes only neighborhood connections. The eigenvector centrality was introduced by Bonacich in [Bonacich 1972] to take long-distance relations into account. The relations in a network can be represented by an adjacency matrix A of size n x n, where n is the number of nodes, and the eigenvector centrality of node i is the i-th component of the eigenvector x that satisfies Ax = λx, where λ is the largest eigenvalue of A. As a result, the power of a node's neighbors impacts the power of this very node recursively, i.e. the more powerful neighbors a vertex has, the more powerful it is itself. Generally, any eigenvector of matrix A can be interpreted as a measure of centrality; however, the eigenvector that corresponds to the maximal eigenvalue is usually regarded as the most consistent one: this vector, and only this one (up to co-directional vectors), is positive and real for a non-negative irreducible matrix A (i.e. for a strongly connected graph) by the Perron-Frobenius theorem [Gantmacher, 2000].
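The recursion above can be prototyped with power iteration; this is a minimal sketch under our own naming, and it iterates A + I rather than A (a standard shift that preserves the eigenvectors of A but guarantees convergence even on bipartite graphs — an implementation detail, not part of the text):

```python
def eigenvector_centrality(adj, iters=200):
    """Power iteration on A + I: repeatedly multiply and renormalize;
    for a connected graph this converges to the eigenvector of the
    largest eigenvalue of A (the Perron-Frobenius vector)."""
    n = len(adj)
    x = [1.0 / n] * n
    for _ in range(iters):
        # y = (A + I) x ; the leading x[i] term is the identity shift
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        s = sum(y)
        x = [v / s for v in y]
    return x

# Path graph 1-2-3: the middle node gets the highest centrality.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
c = eigenvector_centrality(adj)
```

On this path graph the limit vector is proportional to (1, √2, 1), so the middle node dominates, matching the intuition that powerful neighbors make a node powerful.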

Another group of centrality measures is closeness centrality, which is described in [Bavelas 1950] and characterizes the average distance between the given node and all other nodes. In other words, for node i the measure can be written as

C(i) = (n − 1) / Σ_y d(i, y),

where d(i, y) is the length of the shortest path between nodes i and y (i.e. the distance between i and y). Top values of closeness centrality indicate the most reachable nodes; thus, this measure is significant in information or postal networks, where information is spread among participants.
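As a sketch (our own naming, assuming an unweighted connected graph given as adjacency lists), closeness in the form above can be computed with one breadth-first search per node:

```python
from collections import deque

def closeness(adj_list, i):
    """Closeness of node i: (n - 1) divided by the sum of shortest-path
    distances from i to all other nodes (distances found by BFS)."""
    n = len(adj_list)
    dist = {i: 0}
    q = deque([i])
    while q:
        u = q.popleft()
        for v in adj_list[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return (n - 1) / sum(d for v, d in dist.items() if v != i)

# Star graph with center 0: the center is the most reachable node.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

Here closeness(star, 0) is 1.0 (every other node is one step away), while a leaf scores 3/5, illustrating why high closeness marks the most reachable nodes.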

Alternatively, in [Freeman 1977] attention is drawn to the fact that the closeness centrality defined in [Bavelas 1950] has limited utility and may be applied only to specific problems, so a different approach based on distances between nodes was introduced there. The author defines the betweenness centrality measure, calculated as the number of times the given node occurs on paths between other nodes, i.e. the number of shortest paths that go through the node. A high value of betweenness centrality indicates that the node controls many connections in the network, or that the node dominates many other nodes. Undoubtedly, this measure is useful in any kind of communication network.

An extension of the betweenness centrality approach is proposed in [Zhizhchenko et al. 2015]. The method is based on the Kirchhoff rule: the centrality measure of a node (in an electric circuit) is the number of times the electric current flows through this node on its way from one chosen node to an adjoined node. The advantage of this approach is that the centrality measure considers all paths between nodes.

One obvious disadvantage of both closeness and betweenness centralities lies in the fact that they both require the calculation of shortest paths between nodes, which is a computationally complex task.

More precisely, the complexities of classical centrality measures [Das 2014] are presented in Table 1 (standard bounds for an unweighted graph; the eigenvector bound is per power-iteration step).

Table 1. Complexities of classical centrality measures (n — number of nodes, m — number of edges).

| Centrality measure | Complexity |
| Degree | O(m) |
| Eigenvector | O(m) per iteration |
| Closeness | O(nm) |
| Betweenness | O(nm) |

Another approach that is highly important in economics and financial networks is the measurement of the influence of agents, the so-called power indices. According to the definition, power indices measure the influence of an agent, or of a coalition of agents, on the other members of the community. One of the classical power indices is the Shapley-Shubik index, or the Shapley value, described in [Shapley 1953]; it was developed for n-person games and can easily be adapted to other fields of study. For agent i the Shapley-Shubik index is calculated as

Sh_i(v) = Σ_{S ⊆ N \ {i}} [|S|! (n − |S| − 1)! / n!] · (v(S ∪ {i}) − v(S)),

where N is the set of all agents, n is the number of agents in the game, S is a coalition of agents, and v(S) is the value function of coalition S, usually assumed to be monotonically increasing (the definition of the value function depends on the field of study). Formally, this index shows the average contribution of agent i to all possible coalitions that contain i.
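For a small game the formula above can be evaluated by direct enumeration of coalitions; the following minimal sketch uses our own naming, and the 3-player majority game is an illustrative assumption, not taken from the text:

```python
from itertools import combinations
from math import factorial

def shapley(players, v):
    """Exact Shapley values: enumerate every coalition S not containing i
    and weight the marginal contribution of i by |S|!(n-|S|-1)!/n!.
    Exponential in the number of players."""
    n = len(players)
    out = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (v(set(S) | {i}) - v(set(S)))
        out[i] = total
    return out

# 3-player majority game: a coalition of 2 or more players wins.
v = lambda S: 1.0 if len(S) >= 2 else 0.0
sv = shapley([1, 2, 3], v)
```

By symmetry each player receives 1/3, and the values sum to v(N) = 1, illustrating the efficiency property of the Shapley value.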

In [Myerson 1977] an adaptation of Shapley values was proposed for network graphs, where nodes are able to unite into subgraphs and these subgraphs have some characteristic function. The centrality of a node is the average contribution of the node to all subgraphs that contain it. This approach is the basis of the present work.

In [Trukhina 2012] a fair division with the help of the Myerson vector is described. The author studies two cases: a tree graph and a general graph. For a tree graph the characteristic function is as follows:

,

where S is a coalition of nodes, r is some payoff, p_k denotes the number of paths of length k, and L is the maximal distance between any pair of nodes in S. Then, the total division payoff for node i is

,

where the count is taken over paths of length k that contain node i. The author proves that this allocation procedure coincides with the Myerson value. For a general graph only the shortest paths are considered in the characteristic function; the payoff of node i in the division procedure is then the number of shortest paths of length k that contain node i. It is also proved that this allocation satisfies the Myerson conditions.

A number of exact methods were developed to calculate the Shapley value [Fatima et al. 2008]. First of all, they include the direct enumeration of all possible coalitions. This is a simple approach, but the complexity of the algorithm is exponential (there are 2^n coalitions), which means that this method is applicable only to instances with a small number of players.

Another exact method is based on generating functions [Wilf 1994]. For each player the Shapley value is calculated in terms of the coefficients of a corresponding polynomial, as given in [Algaba et al. 2003]: for a weighted voting game

,

where the coefficient counts the number of swings of player i in coalitions of size j. A swing for player i is defined as a pair of coalitions (S ∪ {i}, S) such that S ∪ {i} is a winning coalition and S is not (that means that i is critical for coalition S); the companion quantity is the number of possible ways in which j players (j ≠ i) can achieve k votes in total. This sequence forms the following generating function:

.

This method is useful for games in which many players have the same number of votes (weights); otherwise it is memory-consuming.

Finally, another exact method worth mentioning is Ieong and Shoham's method [Ieong & Shoham 2005]. This approach relies on so-called "marginal contribution networks" (MC-nets): coalitional games represented by sets of rules over coalitions. The Shapley value is computed for each group of players covered by a rule, and the approach then aggregates these values to obtain the Shapley value of the overall game. This method has polynomial time complexity, but it requires the MC-net game representation.

Summing up the above, all exact methods have unsatisfactory features (high complexity of calculation, memory consumption, limited game form, etc.). In view of this, a number of approximation methods were invented to overcome these disadvantages [Fatima et al. 2008].

The first approximate method for the Shapley value was proposed by Mann and Shapley in [Mann & Shapley 1960]. This method uses Monte-Carlo simulation in the following way [Fatima et al. 2008]: for a random coalition S, a binary random variable X ∈ {0, 1} is defined for each player i that indicates whether i swings S. The expectation of X over such coalitions is the average contribution of i, and since X is a 0-1 variable its variance is p(1 − p), where p is that expectation. Then, for an independent sample of coalitions the estimated Shapley value is the sample mean of X, with variance reduced in proportion to the sample size. The problem is that we do not know how to derive the sample; this definitely influences the effectiveness of the method. Otherwise, we would get a linear time complexity method.

Another approximate method was proposed by Owen in [Owen 1972]; it is called the multi-linear extension (MLE) method. This method is based on the condition that the Shapley value in the form

can be represented with the help of a Beta function, i. e.

implies that

where the substitution follows [Fatima et al. 2008]. Direct evaluation of the Shapley value in this form has exponential time complexity. Thus, the author suggested evaluating it as

,

where Φ is a normal distribution function, the mean and variance are those of the vote count, and w_i is the weight of i; the vote count is a random variable that indicates the number of votes cast by players who vote in the same way as i, each with probability t. The method in this form has linear time complexity.

[Leech 2003] describes a modified MLE method which combines Owen's method from [Owen 1972] with direct enumeration in order to improve the accuracy of the Shapley estimation. This method divides players into two groups: major players (those with large weights) and minor players (those with small weights). Then direct enumeration is applied to the major players and the MLE method is applied to the minor players. The larger the group of major players, the more complex the estimation of the Shapley value: the complexity is exponential in l, where l is the number of major players. However, this method is more accurate than Owen's MLE method [Fatima et al. 2008].

The last approximate method, which serves as the basis for the current work, is the method from [Zlotkin & Rosenschein 1994] based on random permutations. The index is calculated as

,

where P is a set of different permutations. The advantage of this method is that it is not required to enumerate all permutations.

Summarizing the above, a variety of centrality measures exists, and the interpretation of a measure depends on both the method of calculation and the nature of the network. Some of them are simple to compute (degree centralities), some are complicated (closeness, betweenness, etc.), but different tools have been developed with the aim of reducing complex calculations. In this work, centrality measures and the Random Permutation method of approximate computation of indices are investigated on artificial (randomly generated) networks, with the central focus on the Shapley value calculation. Other relaxations of methods that reduce the complexity of calculation are also tried, and various measures are compared.

3. The Methodology

The main purpose of this research is to investigate the possibility of applying exact and approximate Shapley values to the assessment of centrality measures of nodes in networks.

The key advantage of the Shapley value as a centrality measure is that it detects the most influential nodes among all subgroups of nodes, i.e. everybody prefers to be in a group with the most pivotal node rather than with any other node.

To apply this theory, let us consider a graph G = (V, E), where V is a set of nodes and E is a set of edges. A classical definition of the Shapley value of node i in terms of graphs is

Sh_i = Σ_{S ⊆ V \ {i}} [|S|! (|V| − |S| − 1)! / |V|!] · (v(S ∪ {i}) − v(S)), (1)

where

· S is a subset of nodes;

· v(S) is a monotonous function of a subset of nodes;

· v(S ∪ {i}) − v(S) is the contribution of node i to subset S.

An equivalent formula for Sh_i is

Sh_i = (1/|V|!) Σ_{π ∈ Π} [v(S_i(π) ∪ {i}) − v(S_i(π))], (2)

where

· π is a permutation of all nodes in a graph;

· Π is the set of all permutations of the set of nodes;

· π(k) is the node at position k in permutation π;

· S_i(π) is the subset of nodes that precede node i in permutation π.

The main challenge here is that a direct calculation of Sh_i requires the enumeration of all nonempty subsets of nodes for formula (1), or of all permutations for formula (2), which are computationally intensive tasks. For that reason an approximation method of calculation based on random permutations is used. The approximate value is

Sh_i ≈ (1/M) Σ_{π ∈ P} [v(S_i(π) ∪ {i}) − v(S_i(π))], (3)

where

· π is a permutation of all nodes in a graph;

· P is some set of permutations;

· M = |P| is a fixed number of permutations in P;

o M = |V|! means that all permutations are enumerated and formula (3) coincides with formula (2);

· π(k) is the node at position k in permutation π;

· S_i(π) is the subset of nodes that precede node i in permutation π.

For each node i the algorithm selects the subset of nodes that stand on the left side of node i in every permutation. This means that the contribution of node i to random subsets is calculated and then the average of these contributions is taken.

Here two loops occur: one over nodes and the other over permutations. Evidently, it is more rational to make the outer loop over permutations and then calculate the contributions within each permutation. Consequently, the algorithm is as follows:

· Fix a number of random permutations M;

· Set Sh_i := 0 for every node i;

· For each permutation π:

o Move from the first position of π to the last;

o For each node i calculate v(S_i(π) ∪ {i});

o v(S_i(π)) was calculated during the previous step;

o Update the marginal contribution v(S_i(π) ∪ {i}) − v(S_i(π));

o Cumulate it in Sh_i;

· Normalize Sh_i by the number of permutations M.
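The steps above can be sketched as follows (a minimal version under our own naming; the capacity function v is passed in, and the outer loop runs over permutations so that the value of the predecessor set is reused between steps):

```python
import random

def shapley_random_perms(nodes, v, M, seed=0):
    """Monte-Carlo Shapley estimate with M random permutations: sweep
    each permutation left to right and credit every node with its
    marginal contribution to the set of its predecessors."""
    rng = random.Random(seed)
    est = {i: 0.0 for i in nodes}
    for _ in range(M):
        perm = list(nodes)
        rng.shuffle(perm)
        S = set()
        prev = v(S)                 # value of the predecessor set
        for i in perm:
            S.add(i)
            cur = v(S)              # computed once, reused next step
            est[i] += cur - prev
            prev = cur
    return {i: est[i] / M for i in nodes}   # normalize by M
```

For an additive game (v(S) equal to the sum of node weights) every permutation yields the same marginal contributions, so the estimate is exact for any M — a convenient sanity check for the implementation.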

A question may arise here about the accuracy of the calculation. There can be different criteria of accuracy, for example a metric criterion d(Sh', Sh) < ε, where Sh' is the vector of approximate Shapley values, Sh is the vector of exact Shapley values, d is some metric and ε is a threshold. But in terms of centrality measures it is usually the order of values that matters; this means that a rank criterion based on some rank correlation measure (Kendall correlation, Spearman correlation, etc.) can be used instead. Due to the fact that the exact values are unknown, the ranks of approximate values computed with the current and the next portion of permutations should be compared until the rank correlation levels off. Levelling-off of ranks means that adding more permutations to the calculation changes the ranks insignificantly. This observation can be regarded as a criterion for the estimation of the optimal number of permutations in the following way:

Given a threshold and a level of reliability, the number of permutations is consequently increased until the correlation exceeds the threshold the required number of times in a row; then the optimal number of permutations is reached.

Evidently, when the threshold has just been reached, some of the ranks are still not stable. Hence, for the calculation of final ranks the median of the ranks observed over the last steps can be taken. The median value allows us to escape random jumps in ranks.

Additionally, tracks of nodes' ranks can be drawn by memorizing them at the previous steps.
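The stopping rule can be sketched as follows (our own naming; this toy Kendall correlation counts tied pairs as discordant, a simplification of the usual tie handling):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall rank correlation of two equally long score vectors:
    (concordant - discordant pairs) / total pairs.
    Tied pairs are counted as discordant (a simplification)."""
    pairs = list(combinations(range(len(x)), 2))
    s = sum(1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
            for i, j in pairs)
    return s / len(pairs)

def stable(history, eps=0.95, k=3):
    """Declare the estimates stable once the correlation between
    consecutive rank vectors exceeds eps for k runs in a row."""
    run = 0
    for prev, cur in zip(history, history[1:]):
        run = run + 1 if kendall_tau(prev, cur) >= eps else 0
        if run >= k:
            return True
    return False
```

Here `history` is the list of rank vectors recorded after each batch of permutations; the median of the last few vectors can then be taken as the final ranking, as suggested in the text.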

3.1 The Capacity Function

For the calculation of Sh, some monotonous function on subsets of nodes should be defined to estimate the contribution of each node to all subsets.

Regardless of the formula for the function, it is reasonable to save function values for some subsets of nodes. Obviously, for each permutation, when the last node is reached the capacity function on the whole graph is calculated; but instead of calculating it for every permutation, it can be calculated only once (for the first permutation) and its value saved. Similarly, when the last-but-one node is reached, the capacity function on a subgraph of all nodes except one is calculated, and similar subgraphs are met many times, etc.

Suppose that there is some limitation on the number of stored values. Then the minimal size of a subgraph for which the function value is stored can easily be defined. The number of subgraphs with n nodes is C(n, n) = 1; the number of subgraphs with n − 1 nodes is C(n, n − 1) = n; …; the number of subgraphs with k nodes is C(n, k). Then the minimal size k of a subgraph for which the function value is stored is such that the total

C(n, k) + C(n, k + 1) + … + C(n, n)

does not exceed the storage limitation, where C(n, k) is a binomial coefficient.

So, when a subgraph of at least that size is met, its function value is calculated and saved to the storage. It is convenient to store values in a dictionary where the key is the binary line of node occurrences transferred to the decimal system; by such a key, stored values are extracted when the same subgraph is met the next time.
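A sketch of this storage scheme (class and parameter names are ours; min_size plays the role of the minimal stored subgraph size):

```python
def subset_key(subset):
    """Binary occurrence line of the subset read as a decimal number:
    bit i is set iff node i belongs to the subset."""
    key = 0
    for i in subset:
        key |= 1 << i
    return key

class CachedCapacity:
    """Wrap a capacity function; values of large subgraphs
    (|S| >= min_size) are stored in a dictionary under the bitmask key
    and reused when the same subgraph is met again."""
    def __init__(self, fn, min_size):
        self.fn, self.min_size, self.store = fn, min_size, {}
    def __call__(self, subset):
        if len(subset) < self.min_size:
            return self.fn(subset)      # too small: not worth storing
        key = subset_key(subset)
        if key not in self.store:
            self.store[key] = self.fn(subset)
        return self.store[key]
```

Keying by an integer bitmask keeps the dictionary compact and makes lookups independent of the order in which the nodes of the subgraph are listed.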

Below, various capacity functions which are appropriate for different graph sizes and different problem areas are described.

3.1.1 The Number of Simple Paths Approach

The first approach is based on the enumeration of all simple paths (paths with no cycles). Here only the number of simple paths is considered, i.e. the capacity function looks like

v(S) = Σ_{x,y ∈ S} |P(x, y)|, (4)

where

· p(x, y) is a simple path between nodes x and y in the subgraph induced by S;

· P(x, y) is the set of all such simple paths between nodes x and y;

· |P(x, y)| is the capacity (the number of elements) of the set of all simple paths between nodes x and y.

Here the length of a path is not important. Such a function represents the reliability and safety of a system; for example, it can be applied to such networks as the Internet or mobile networks, where only the fact of signal access is considered and the path of a signal is ignored. Consequently, the most powerful nodes here provide the stability of the system.
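Formula (4) can be prototyped by a depth-first search over simple paths inside the induced subgraph (a sketch with our own naming; the enumeration is exponential, which matches the restriction to small graphs discussed below):

```python
def count_simple_paths(adj, x, y, allowed):
    """Number of simple paths from x to y whose nodes stay in `allowed`."""
    if x == y:
        return 1
    return sum(count_simple_paths(adj, z, y, allowed - {x})
               for z in adj[x] if z in allowed)

def capacity(adj, subset):
    """Capacity (4): total number of simple paths between all pairs of
    nodes of the subgraph induced by `subset`."""
    nodes = sorted(subset)
    return sum(count_simple_paths(adj, x, y, set(subset))
               for i, x in enumerate(nodes) for y in nodes[i + 1:])
```

On a triangle this capacity is 6 (each pair is joined by a direct edge and one detour), and on a 3-node path it is 3.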

The model is illustrated with the help of a small example (Sh here is based on formula (1)). Consider the graph with nodes V = {1, 2, 3, 4, 5} and edges

E = {(1,2), (1,3), (1,5), (2,4), (2,5), (3,4), (3,5)}.

The graph is illustrated in Figure 1.

Figure 1. Example 1.

Hence, there are 2^5 − 1 = 31 nonempty subsets of nodes; if the length of a path is unlimited, then the capacities of the subsets are as follows (see Table 2):

Table 2. Capacities of subgraphs. For every nonempty subset S the table lists its pairs of nodes with the number of simple paths between them, the capacity v(S), each node's value v(S \ {i}) and marginal contribution v(S) − v(S \ {i}) (written as v(S \ {i}) → contribution), and the Shapley weight (|S| − 1)!(n − |S|)!/n!.

| Subset | Pairs of nodes (num. of paths) | v(S) | Node i: v(S \ {i}) → contribution | Weight |
| {i}, i = 1,…,5 | — | 0 | i: 0 → 0 | 1/5 |
| {1,2} | 1-2 (1) | 1 | 1: 0 → 1; 2: 0 → 1 | 1/20 |
| {1,3} | 1-3 (1) | 1 | 1: 0 → 1; 3: 0 → 1 | 1/20 |
| {1,4} | 1-4 (0) | 0 | 1: 0 → 0; 4: 0 → 0 | 1/20 |
| {1,5} | 1-5 (1) | 1 | 1: 0 → 1; 5: 0 → 1 | 1/20 |
| {2,3} | 2-3 (0) | 0 | 2: 0 → 0; 3: 0 → 0 | 1/20 |
| {2,4} | 2-4 (1) | 1 | 2: 0 → 1; 4: 0 → 1 | 1/20 |
| {2,5} | 2-5 (1) | 1 | 2: 0 → 1; 5: 0 → 1 | 1/20 |
| {3,4} | 3-4 (1) | 1 | 3: 0 → 1; 4: 0 → 1 | 1/20 |
| {3,5} | 3-5 (1) | 1 | 3: 0 → 1; 5: 0 → 1 | 1/20 |
| {4,5} | 4-5 (0) | 0 | 4: 0 → 0; 5: 0 → 0 | 1/20 |
| {1,2,3} | 1-2 (1), 1-3 (1), 2-3 (0) | 2 | 1: 0 → 2; 2: 1 → 1; 3: 1 → 1 | 1/30 |
| {1,2,4} | 1-2 (1), 1-4 (1), 2-4 (1) | 3 | 1: 1 → 2; 2: 0 → 3; 4: 1 → 2 | 1/30 |
| {1,2,5} | 1-2 (2), 1-5 (1), 2-5 (1) | 4 | 1: 1 → 3; 2: 1 → 3; 5: 1 → 3 | 1/30 |
| {1,3,4} | 1-3 (1), 1-4 (0), 3-4 (1) | 2 | 1: 1 → 1; 3: 0 → 2; 4: 1 → 1 | 1/30 |
| {1,3,5} | 1-3 (1), 1-5 (2), 3-5 (1) | 4 | 1: 1 → 3; 3: 1 → 3; 5: 1 → 3 | 1/30 |
| {1,4,5} | 1-4 (0), 1-5 (1), 4-5 (0) | 1 | 1: 0 → 1; 4: 1 → 0; 5: 0 → 1 | 1/30 |
| {2,3,4} | 2-3 (1), 2-4 (1), 3-4 (1) | 3 | 2: 1 → 2; 3: 1 → 2; 4: 0 → 3 | 1/30 |
| {2,3,5} | 2-3 (1), 2-5 (1), 3-5 (1) | 3 | 2: 1 → 2; 3: 1 → 2; 5: 0 → 3 | 1/30 |
| {2,4,5} | 2-4 (1), 2-5 (1), 4-5 (1) | 3 | 2: 0 → 3; 4: 1 → 2; 5: 1 → 2 | 1/30 |
| {3,4,5} | 3-4 (1), 3-5 (1), 4-5 (1) | 3 | 3: 0 → 3; 4: 1 → 2; 5: 1 → 2 | 1/30 |
| {1,2,3,4} | 1-2 (1), 1-3 (2), 1-4 (1), 2-3 (1), 2-4 (1), 3-4 (1) | 7 | 1: 3 → 4; 2: 2 → 5; 3: 3 → 4; 4: 2 → 5 | 1/20 |
| {1,2,3,5} | 1-2 (3), 1-3 (1), 1-5 (2), 2-3 (1), 2-5 (1), 3-5 (1) | 9 | 1: 3 → 6; 2: 4 → 5; 3: 4 → 5; 5: 2 → 7 | 1/20 |
| {1,2,4,5} | 1-2 (2), 1-4 (2), 1-5 (1), 2-4 (1), 2-5 (1), 4-5 (1) | 8 | 1: 3 → 5; 2: 1 → 7; 4: 4 → 4; 5: 3 → 5 | 1/20 |
| {1,3,4,5} | 1-3 (1), 1-4 (0), 1-5 (2), 3-4 (1), 3-5 (1), 4-5 (1) | 6 | 1: 3 → 3; 3: 1 → 5; 4: 4 → 2; 5: 2 → 4 | 1/20 |
| {2,3,4,5} | 2-3 (2), 2-4 (2), 2-5 (2), 3-4 (2), 3-5 (2), 4-5 (2) | 12 | 2: 3 → 9; 3: 3 → 9; 4: 3 → 9; 5: 3 → 9 | 1/20 |
| {1,2,3,4,5} | 1-2 (3), 1-3 (3), 1-4 (3), 1-5 (3), 2-3 (2), 2-4 (2), 2-5 (2), 3-4 (2), 3-5 (2), 4-5 (2) | 24 | 1: 12 → 12; 2: 6 → 18; 3: 8 → 16; 4: 9 → 15; 5: 7 → 17 | 1/5 |

Then, the exact Shapley values are obtained by summing, over all subsets that contain a node, the Shapley weight times the node's marginal contribution from Table 2:

Sh = (Sh_1, Sh_2, Sh_3, Sh_4, Sh_5) = (3.850, 5.517, 4.933, 4.433, 5.267).

The components sum to v(V) = 24, as the efficiency property requires. The vector of final ranks of the Shapley values is (5, 1, 3, 4, 2), i.e. node 2 is the most powerful and node 1 the least powerful.

We can see that if we delete node 1, we are still able to reach every node from any point; hence, node 1 is the least powerful in this system. Alternatively, node 2 lies on many paths over all subsets, which results in the fact that node 2 is the most powerful and guarantees the stability of the system.

If the approach of formula (3) is used, then in most cases all permutations are considered, because the exact Sh differs only by small values for nodes 2, 3 and 5. However, for some orders of permutations, fewer than 90 permutations are required for exact rank estimation.

Evidently, the number of simple paths can be enormous. Consider a complete graph with n nodes: for any two nodes the number of simple paths between them equals Σ_{k=0}^{n−2} (n − 2)!/(n − 2 − k)!, since a path may pass through any ordered selection of the remaining n − 2 nodes. If all pairs of nodes in a complete graph are considered, the total number of simple paths is huge even for small n. Hence, some approximation algorithms are proposed. There are various ways of simplifying the model. One approach is based on limiting the maximal length of a path, so that only paths not longer than some predefined limit are considered:

v_L(S) = Σ_{x,y ∈ S} |{p ∈ P(x, y) : |p| ≤ L}|, (5)

where

· |p| is the length of a path p;

· L is the maximal path length.

Formula (5) still requires the enumeration of all simple paths (with the limitation on the length), which is a computationally complex issue. Due to this fact, this function is applied only to small graphs (about 20 nodes).
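The count of simple paths between two fixed nodes of the complete graph K_n, mentioned above, can be checked with a small stdlib sketch (the function name is ours): for each k one chooses an ordered sequence of k intermediate nodes out of the remaining n − 2.

```python
from math import factorial

def simple_paths_Kn(n):
    """Simple paths between two fixed nodes of K_n: sum over k of the
    number of ordered choices of k intermediates, (n-2)!/(n-2-k)!."""
    return sum(factorial(n - 2) // factorial(n - 2 - k)
               for k in range(n - 1))
```

Already simple_paths_Kn(10) is 109601, which illustrates how quickly full enumeration blows up and why the length cap of formula (5) is needed.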

To provide a thoughtful analysis of this approach, the exact Sh based on formula (4) (without limitations) has been calculated, and its ranks have been compared with the approximation based on formula (5). The accuracy (rank correlation) of the calculations is analyzed with respect to the limitation on the path length; the number of considered permutations is chosen by the levelling-off criterion with a fixed threshold. Figure 2 provides rank correlations of the exact Sh and Sh_L with different parameter L calculated with random permutations:

Table 3. Top-10 nodes by the exact Sh (second column) and by Sh_L for L = 1, …, 20.

| Rank | Sh | L=1 | L=2 | L=3 | L=4 | L=5 | L=6 | L=7 | L=8 | L=9 | L=10 | L=11 | L=12 | L=13 | L=14 | L=15 | L=16 | L=17 | L=18 | L=19 | L=20 |
| 1 | 9 | 8 | 9 | 10 | 9 | 9 | 9 | 9 | 9 | 9 | 8 | 9 | 9 | 10 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| 2 | 7 | 5 | 6 | 6 | 5 | 7 | 6 | 7 | 7 | 7 | 7 | 7 | 5 | 7 | 7 | 5 | 6 | 5 | 7 | 7 | 7 |
| 3 | 20 | 20 | 19 | 19 | 19 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 |
| 4 | 18 | 18 | 20 | 20 | 20 | 19 | 19 | 19 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 |
| 5 | 17 | 15 | 15 | 15 | 15 | 15 | 15 | 16 | 16 | 16 | 16 | 17 | 17 | 16 | 16 | 16 | 16 | 16 | 17 | 17 | 17 |
| 6 | 15 | 16 | 16 | 16 | 16 | 16 | 16 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 |
| 7 | 14 | 13 | 11 | 9 | 10 | 11 | 10 | 12 | 12 | 14 | 17 | 14 | 14 | 14 | 13 | 13 | 14 | 14 | 14 | 14 | 14 |
| 8 | 4 | 10 | 8 | 8 | 8 | 8 | 7 | 6 | 5 | 6 | 14 | 5 | 4 | 4 | 6 | 4 | 5 | 6 | 4 | 4 | 4 |
| 9 | 16 | 17 | 17 | 18 | 17 | 17 | 17 | 17 | 17 | 17 | 6 | 16 | 16 | 17 | 17 | 17 | 17 | 17 | 16 | 16 | 16 |
| 10 | 19 | 19 | 18 | 17 | 18 | 18 | 18 | 18 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 |

It can be concluded that L does not strongly influence the accuracy (i.e. high values of L do not guarantee exact results). Hence, parameter L can be limited by a small value, which gives another interpretation for top nodes: top nodes are pivotal in their vicinities of radius L. It should be noted that the same value of L is fixed for subgraphs of all sizes; hence, in small subgraphs all paths are considered.

The study also provides a representative analysis of the proposed method (see the example below), using a rank-based criterion of accuracy. This example considers limited maximal path lengths; it also tracks fluctuations of correlation measures and ranks of nodes (Figures 4-5) depending on the number of considered permutations. Moreover, Sh is compared with classical centrality measures (Tables 4-7).

Example Network 1:

Table 4. Graph characteristics.

| Number of nodes | 20 |
| Number of edges | 100 |
| Average degree | 5 |
| Diameter | 4 |
| Density | 0.263 |
| Number of components | 1 |

Table 5. Top-5 nodes by Sh and classical centralities.

| Rank | Sh | Degree | Eigenvector | Closeness | Betweenness |
| 1 | 20 | 20 | 20 | 20 | 20 |
| 2 | 12 | 12 | 12 | 12 | 12 |
| 3 | 14 | 14 | 14 | 18 | 18 |
| 4 | 18 | 18 | 15 | 19 | 19 |
| 5 | 15 | 2 | 2 | 2 | 2 |

Table 6. Kendall rank correlation.

|  | Degree | Eigenvector | Closeness | Betweenness |
| Sh | 0.379 | 0.400 | 0.379 | 0.284 |
| Degree |  | 0.453 | 0.432 | 0.547 |
| Eigenvector |  |  | 0.537 | 0.421 |
| Closeness |  |  |  | 0.547 |

Table 7. Spearman rank correlation.

|  | Degree | Eigenvector | Closeness | Betweenness |
| Sh | 0.490 | 0.412 | 0.501 | 0.364 |
| Degree |  | 0.550 | 0.502 | 0.717 |
| Eigenvector |  |  | 0.660 | 0.528 |
| Closeness |  |  |  | 0.677 |

The optimal number of permutations is …; the number of subgraphs is equal to 2^20 − 1 ≈ 1.05·10^6; the algorithm considers approximately … subgraphs.

3.1.2 Price for the Length of the Simple Paths Approach

The second approach is also based on the enumeration of all simple paths, but here the length of paths is taken into account in the following way:

v(S) = Σ_{x,y ∈ S} Σ_{p ∈ P(x, y)} 1/|p|, (6)

where

· p(x, y) is a simple path between nodes x and y;

· P(x, y) is the set of all simple paths between nodes x and y;

· |p| is the length of a path.

Definitely, instead of the inverse length of a path, the price for the length can have another form of a decreasing function (for example, an exponentially decaying one). In this work only function (7) is analyzed.

The most powerful nodes here represent major hubs. Such an approach can be applied to transportation systems, where we want to minimize the path length, but if one of the ties or a group of ties is "out of work" we are ready to pay for an alternative (more distant) route.

Let us consider Example 1 again. The computed results are as follows: .

Here node 1 improved its position in the system and node 4 worsened it. The reason is that node 4 is mainly reached through long paths and does not provide short passages. Conversely, despite the fact that node 1 is unreachable in some subsets, it provides many short passages, so it contributes more to subgraphs than node 4.

Apparently, distant nodes add long paths rather than short ones, so in some cases their contribution to subsets should be less than the contribution of central (hub) nodes. Hence, the inverse length of a path is considered in order to decrease the importance of long paths and increase the importance of short ones. Additionally, if a limitation on the length of a path is added and only short paths are taken into consideration, the contribution of central nodes will generally be more significant. Moreover, such a limitation permits us not to enumerate all paths between pairs of nodes, which is a computationally complex issue. So, the capacity function looks like

v_L(S) = Σ_{x,y ∈ S} Σ_{p ∈ P(x, y), |p| ≤ L} 1/|p|, (7)

where L is a limitation on the length of a path (a small value).
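Capacity (7) can be sketched by enumerating simple paths up to the length cap and weighting each by its inverse length (our own naming; path length is counted in edges):

```python
def paths_up_to(adj, x, y, allowed, limit):
    """All simple paths from x to y inside `allowed` with at most
    `limit` edges, returned as lists of nodes."""
    if x == y:
        return [[y]]
    if limit == 0:
        return []
    out = []
    for z in adj[x]:
        if z in allowed:
            for tail in paths_up_to(adj, z, y, allowed - {x}, limit - 1):
                out.append([x] + tail)
    return out

def capacity_inverse_length(adj, subset, limit):
    """Capacity (7): every simple path of length <= limit contributes
    the inverse of its length, so short paths weigh more."""
    nodes = sorted(subset)
    total = 0.0
    for i, x in enumerate(nodes):
        for y in nodes[i + 1:]:
            for p in paths_up_to(adj, x, y, set(subset), limit):
                total += 1.0 / (len(p) - 1)
    return total
```

On a 3-node path with limit 2 the value is 1 + 1 + 1/2 = 2.5: the two edges count fully, while the longer end-to-end path is discounted by its length.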

Moreover, another approximation method can be implemented here. This approach is based on a random choice of paths between pairs of nodes, with the limitation on the length of a path. Then the capacity function looks like

v'_L(S) = Σ_{x,y ∈ S} Σ_{p ∈ P'(x, y), |p| ≤ L} 1/|p|, (8)

where P'(x, y) is some random subset of all possible paths between nodes x and y. The number of paths considered can be taken as some fraction of an estimate of the number of paths between two nodes in a subgraph. As an upper bound for the number of simple paths of length at most L between nodes i and j of a subgraph, the number of all paths (including those with loops) between these two nodes can be used; it is calculated as Σ_{k=1}^{L} (A^k)_{ij}, where (A^k)_{ij} is an element of the matrix power A^k and A is the adjacency matrix of the subgraph. For small graphs (n ≤ 20) and small maximal path lengths, the number of simple paths between two nodes is only insignificantly overestimated.
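The walk-count bound can be sketched with plain nested lists (no external libraries; names are ours). Since every simple path is a walk but walks may revisit nodes, the sum over k of (A^k)_{ij} never undercounts:

```python
def matmul(A, B):
    """Product of two square integer matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def walk_count_bound(A, i, j, L):
    """Upper bound on the number of simple paths of length <= L between
    i and j: the number of walks, i.e. the sum over k = 1..L of (A^k)_{ij}."""
    P = [row[:] for row in A]          # P holds A^k, starting at k = 1
    total = P[i][j]
    for _ in range(2, L + 1):
        P = matmul(P, A)
        total += P[i][j]
    return total
```

On a triangle the bound for L = 2 equals 2, which is exact; for larger L, walks with repeated nodes make it an overestimate, in line with the remark above about small graphs and small length limits.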

For the analysis of the approach based on formula (6) and its analogues (7) and (8), the exact Sh based on (7) has been calculated, and its ranks have been compared with the approximation based on formula (8). The accuracy (rank correlation) of the calculations is analyzed with respect to the limitation on the number of paths; the number of considered permutations is chosen by the levelling-off criterion with a fixed threshold.

For subgraphs of size less than 10 nodes, formula (7) is considered (without limitations on the number of paths). Figure 6 provides rank correlations of the exact Sh and the approximation with a different share of considered paths, calculated with random permutations (for one of the graphs).

Table 8. Top-10 nodes by the exact Sh (second column) and by the approximation with a given share of considered paths.

| Rank | Sh | 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
| 1 | 10 | 12 | 11 | 10 | 10 | 10 | 10 | 11 | 10 | 10 | 10 |
| 2 | 5 | 5 | 4 | 4 | 4 | 5 | 5 | 6 | 7 | 6 | 5 |
| 3 | 19 | 19 | 19 | 19 | 19 | 20 | 19 | 19 | 19 | 20 | 19 |
| 4 | 20 | 20 | 20 | 20 | 20 | 19 | 20 | 20 | 20 | 19 | 20 |
| 5 | 15 | 15 | 15 | 16 | 15 | 15 | 15 | 16 | 15 | 15 | 15 |
| 6 | 16 | 16 | 16 | 15 | 16 | 16 | 16 | 15 | 16 | 16 | 16 |
| 7 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| 9 | 17 | 18 | 18 | 18 | 18 | 18 | 18 | 17 | 17 | 17 | 17 |
| 10 | 18 | 17 | 17 | 17 | 17 | 17 | 17 | 18 | 18 | 18 | 18 |

It can be concluded that the number of considered paths influences the accuracy even when all paths for small subgraphs are considered (i.e. the exclusion of some paths can significantly change the ranks of nodes). Experimental analysis shows that it is better not to rely on such an approximation, since it sometimes gives inexact outcomes.

As in the previous case, this approach is appropriate only for small graphs; here the function is applied to graphs with 20 nodes.

The criteria for the accuracy of calculation and the maximal path length are similar to those applied to the previous model. The graph and algorithm characteristics are as follows (information is given for one of the graphs):

Example Network 2:

Table 9. Graph characteristics.

Number of nodes      | 20
Number of edges      | 107
Average degree       | 5.35
Diameter             | 3
Density              | 0.282
Number of components | 1

Table 10. Top-5 nodes by Sv and classical centralities.

Rank | Sv | Degree | Eigenvector | Closeness | Betweenness
-----|----|--------|-------------|-----------|------------
1    | 20 | 20     | 20          | 20        | 20
2    | 16 | 16     | 16          | 8         | 8
3    | 15 | 8      | 15          | 16        | 16
4    | 8  | 15     | 8           | 9         | 11
5    | 11 | 11     | 11          | 15        | 15

Table 11. Kendall rank correlation.

            | Degree | Eigenvector | Closeness | Betweenness
Sv          | 0.663  | 0.663       | 0.432     | 0.379
Degree      |        | 0.474       | 0.453     | 0.358
Eigenvector |        |             | 0.558     | 0.316
Closeness   |        |             |           | 0.484

Table 12. Spearman rank correlation.

            | Degree | Eigenvector | Closeness | Betweenness
Sv          | 0.800  | 0.785       | 0.567     | 0.508
Degree      |        | 0.623       | 0.597     | 0.508
Eigenvector |        |             | 0.713     | 0.441
Closeness   |        |             |           | 0.609
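The Kendall and Spearman coefficients reported in Tables 11 and 12 can be computed for two tie-free rankings as follows (a self-contained sketch, not the code used in the thesis):

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall tau-a: (concordant - discordant) / total pairs, no ties."""
    pairs = list(combinations(range(len(a)), 2))
    conc = sum(1 for i, j in pairs if (a[i] - a[j]) * (b[i] - b[j]) > 0)
    disc = sum(1 for i, j in pairs if (a[i] - a[j]) * (b[i] - b[j]) < 0)
    return (conc - disc) / len(pairs)

def spearman_rho(a, b):
    """Spearman rho via the squared rank-difference formula, no ties."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

For the rankings [1, 2, 3, 4] and [2, 1, 3, 4] these formulas give tau = 2/3 and rho = 0.8.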

The optimal number of permutations is determined by the criterion described above. The total number of subgraphs of a 20-node graph is equal to $2^{20}$, and the algorithm considers only a small fraction of them.
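The reason only a fraction of subgraphs can be considered is the exponential count of coalitions: a 20-node graph has $2^{20}$ of them. A direct implementation of the exact Shapley formula, sketched below for an arbitrary capacity function v (illustrative code, not the thesis's implementation), requires on the order of $n \cdot 2^{n-1}$ evaluations of v:

```python
from itertools import combinations
from math import factorial

def exact_shapley(nodes, v):
    """Exact Shapley values by enumerating, for every node i, all
    coalitions of the remaining nodes -- O(n * 2^(n-1)) calls to v."""
    n = len(nodes)
    phi = {i: 0.0 for i in nodes}
    for i in nodes:
        rest = [j for j in nodes if j != i]
        for k in range(n):
            # weight of a size-k coalition that node i joins
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(rest, k):
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi
```

As a sanity check with the edge-counting capacity of the next subsection, every edge is shared equally by its two endpoints, so on the path graph 1–2–3 this returns 0.5, 1.0, 0.5.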

3.1.3 The Number of the Edges Approach

The third approach is based on the number of edges in a subgraph or, in other words, the number of paths of length 1. In this case the capacity function looks like

$$v(A)=\sum_{\{x,y\}\subseteq A} \mathbb{1}_{xy}, \qquad (9)$$

where $\mathbb{1}_{xy}$ is the indicator function of an edge between two nodes $x$ and $y$: it equals 1 if the edge between $x$ and $y$ exists and 0 otherwise.

The interpretation of the contribution of a node can be as follows: the bigger the value of the function is, the more consolidated the node is. The difference between the capacity function with a node and without this node indicates the number of direct connections of this node. This approach is close to the degree centrality measure, but the main difference is that it considers different groups of nodes. It can be applied to friendship networks where nodes are people and edges are their relationships. High values detect people who are communicative and active in different communities, while the classical degree centrality detects only locally central nodes.

For our small example, nodes 1, 2, 3 and 5 have the same value and consequently share one rank, while node 4 has a lower rank.
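As a sketch of how the edge-counting capacity combines with the permutation-sampling estimator used throughout this work (the function names and the choice of m are illustrative):

```python
import random

def edge_capacity(edges, coalition):
    """v(A): number of edges with both endpoints inside the coalition."""
    s = set(coalition)
    return sum(1 for x, y in edges if x in s and y in s)

def shapley_by_permutations(nodes, edges, m, rng):
    """Average marginal contributions over m random orders of the nodes."""
    phi = {v: 0.0 for v in nodes}
    order = list(nodes)
    for _ in range(m):
        rng.shuffle(order)
        coalition, prev = set(), 0
        for v in order:
            coalition.add(v)
            cur = edge_capacity(edges, coalition)
            phi[v] += cur - prev        # marginal contribution of v
            prev = cur
    return {v: phi[v] / m for v in nodes}
```

A useful sanity check: for this particular game the exact Shapley value of a node equals half its degree, because every edge is split equally between its two endpoints.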

Due to the fact that this function is not as complicated to calculate as the two mentioned above, it can be applied to graphs of middle size (in this work, graphs with up to 100 nodes are considered). In total, 60 exponentially distributed networks with different numbers of nodes have been analyzed. Hence, the complexity of calculations (i.e. the number of required permutations) can be estimated with respect to some graph characteristics. It seems reasonable to study the dependence of the optimal number of permutations on network density: density includes information about both the number of nodes and the number of edges, while a separation of these characteristics is unrepresentative. The criteria for the optimal number of permutations are represented in Table 13:
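One simple way to choose the number of permutations — offered here as an illustrative assumption based on the central limit theorem, not the exact criterion of Table 13 — is to require that the confidence interval of each estimate be narrower than a given tolerance:

```python
import math

def permutations_needed(sample_variance, tolerance, z=1.96):
    """Number of samples m such that a CLT-based confidence interval of
    half-width `tolerance` holds at the level implied by z (1.96 ~ 95%)."""
    return math.ceil((z * z * sample_variance) / (tolerance * tolerance))
```

With sample variance 1 and tolerance 0.1 at the 95% level this gives 385 permutations; denser graphs tend to have lower-variance marginal contributions and therefore need fewer samples.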

Table 13. Criteria for the optimal number of permutations.

Number of nodes, N | Criteria | Level of Reliability, LR

As can be seen, the higher the density of an exponentially distributed graph is, the smaller the number of permutation...

