The Number of Simple Paths Approach
The Problem Statement and the Scope of the Study. Complexities of classical centrality measures. Investigation of the number of simple paths approach. Kendall and Spearman correlations traces. Characteristic of the case of a Non-Monotonous function.
Рубрика | Экономика и экономическая теория |
Вид | курсовая работа |
Язык | русский |
Дата добавления | 11.02.2017 |
Размер файла | 2,9 M |
Отправить свою хорошую работу в базу знаний просто. Используйте форму, расположенную ниже
Студенты, аспиранты, молодые ученые, использующие базу знаний в своей учебе и работе, будут вам очень благодарны.
Размещено на http://www.allbest.ru/
1. Introduction
Social network analysis (SNA) is a modern theoretical approach to studying various processes of human life. SNA covers a lot of areas of applied mathematics like, graph theory, data analysis, decision theory, game theory, computational mathematics, etc., and is applicable to various fields of science like biology, physics, medicine, economics, politics, etc.
1.1. The Background of the Study
Generally, the analysis of networks can be viewed from two basic angles. The first one is an analysis of specific networks and relations, for instance, the analysis of transportation systems, the Internet, economic relations, social media networks, etc. The second one is the graph theory approach that studies different structural characteristics and properties of networks that are considered as graphs (i.e. mathematical objects). It also solves many mathematical problems like the finding of the shortest paths, cliques detection, impact analysis, the finding of the maximum flow, etc. Different kinds of network models, measures, methods of analysis and applications are described in [Newman 2010].
1.2. The Problem Statement and the Scope of the Study
Conventionally, network is considered as graph, whereas nodes represent agents (players, participants, etc.) and edges represent the relationship (links) between them. Such representation allows us to analyze the network with the help of different mathematical tools and to study numerous aspects. The main purpose of the present study is to link together the existing knowledge in different areas of mathematics such as game theory and social network analysis. To put it more precisely, the question to be explored is how to identify key (pivotal, important) players and to detect the most powerful participants and groups of participants of a network with the help of the game theory approach.
1.3. The Professional Significance of the Study
Within the framework of the project a number of governing principles of a network analysis will be applied to random networks and compared with methods that will be developed in this project; then, the most valid method will be detected. The raised issue can be scientifically interesting from the viewpoint of those who study different metrics of networks because it is highly important, firstly, to reach an accuracy of calculations and, secondly, to reduce the complexity of calculations.
correlations spearman kendall monotonous
2. Literature Review
A variety of tools and measures were developed for studying the static features of a network, which include the identification of central nodes. Several types of node characteristics exist that can be regarded as measures of importance, for example, centrality measures, transitivity, groups, etc. Moreover, as will be shown below, some social choice theory and game theory methods can be used in SNA for the identification of key players.
As is indicated in [Newman 2010], variants of centrality groups exist; that include degree centralities, closeness centralities, betweenness centralities, etc. According to [Newman 2010], the simplest centrality measure is the degree of a node that indicates the number of edges connected to it. If a directed graph is considered, then we are able to distinguish between in-degree and out-degree centralities. It is indicated in [Newman 2010] that despite the fact that it is the simplest measure, the number of in- and out- going edges of a node denotes influential players, that are the most accessible to information, or if a citation network is considered, this measure will detect the most popular and important articles. However, it is important to mention, that in majoritarian graphs, that come from social theory and decision making, the less in-degree and the more out-degree centralities the vertex has, the more powerful it is.
Obviously, a degree of a node recognizes only neighborhood connections. The eigenvector centrality was introduced by Bonacich in [Bonacich 1972] to consider the long-distance relations. The relation in a network can be represented by adjacency matrix , where is the number of nodes, and the eigenvector centrality of node is the -th component of eigenvector , that calculates as , i.e. corresponds to the largest eigenvalue of . As a result, the power of nodes' neighbors impacts on the power of this very node recursively, i.e. the more powerful neighbors the vertex has, the more powerful it is itself. Generally, any eigenvector of matrix A can be interpreted as a measure of centrality, however an eigenvector that corresponds to a maximal eigenvalue is usually regarded as the most consistent: this vector, and only this one, accurate to its co-directional vectors, is positive and real for non-negative irreducible matrix A (i.e. for a strongly-connected graph) by Perron-Frobenious theorem [Gantmacher, 2000].
Another group of centrality measures is closeness centrality that is observed in [Bavelas 1950] and characterizes an average distance between the given node and all other nodes. In other words, for node the measure is calculated as
,
where is the shortest path between nodes and (i.e. the distance between and y). Top values of closeness centrality indicate the most reachable nodes; thus, this measure is significant in information networks or in postmen relations, where information is spread among participants.
Alternatively, in [Freeman 1977] attention is drawn to the fact that the closeness centrality defined in [Bavelas 1950] has limited utility and may be applied to specific problems, so the different approach that is based on distances between nodes was introduced there. The author determines the betweenness centrality measure that is calculated as the number of times the given node occurs on paths between other nodes, i.e. the number of the shortest paths that go through the node. The high value of betweenness centrality indicates that the node controls a lot of connections in the network, or that the node dominates many other nodes. Undoubtedly, this measure is useful in any kind of communication networks.
The extension of betweenness centrality approach is proposed in [Zhizhchenko et al. 2015]. The method is based on Kirchhoff rule where the centrality measure for a node (in an electric circuit) is the number of times the electricity current flows through this node starting from one chosen node and going to an adjoined node. The advantage of this approach is that this centrality measure considers all paths between nodes.
One obvious disadvantage of both closeness and betweenness centralities lies in the fact that they both require calculation of the shortest paths between nodes, which is a quite complex issue.
More precisely, complexities of classical centrality measures [Das 2014] are represented in Table 1.
Table 1. Complexities of classical centrality measures
Centrality measure |
Complexity, n - number of nodes, m - number of edges |
|
Degree |
||
Eigenvector |
||
Closeness |
||
Betweenness |
Another approach that is highly important in economics and financial networks is the measure of influence of agents or so-called power indices. According to the definition, power indices measure the influence of an agent or of the coalition of agents on other members of community. One of the classical power indices is regarded as Shapley-Shubik index or the Shapley value that is described in [Shapley 1953]; it was developed for n-person games and can be easily adapted to other fields of study. For agent the Shapley-Shubik index is calculated as
,
where N is a set of all agents, n is the number of agents in a game; is a coalition of agents; is the value function of coalition S that is usually assumed to be monotonically increasing (the definition of the value function depends on the field of study). Formally, this index shows the average contribution of agent i to all possible coalitions that has i.
In [Myerson 1977] the adaptation of Shapley values was proposed for network-graphs where nodes are able to unite into subgraphs and these subgraphs have some characteristic functions. The centrality of a node is an average contribution of a node into all subgraphs that contain this node. This approach is a basis of the present work.
In [Trukhina 2012] a fair division with the help of the Myerson vector is described. The author studies two cases: a tree graph and a general graph. For a tree graph a characteristic function is as follows:
,
where S is a coalition of nodes, r is some payoff, is the number of paths of length k, L is a maximal distance between any pair of nodes in S. Then, a total division payoff for node i is
,
where is the number of paths of length k that contain node i. The author proves that this allocation procedure coincides with the Myerson value. For a general graph only the shortest paths are considered in a characteristic function; then payoff is the number of the shortest paths of length k that contain node i in a division procedure. It is also proved that this allocation satisfies Myerson conditions.
A number of exact methods were developed to calculate the Shapley value [Fatima et al. 2008]. First of all, they include the direct enumeration of all possible coalitions. This is a simple approach but the complexity of the algorithm is which means that this method is applicable only for instances with a small number of players.
Another exact method is based on generating functions [Wilf 1994]. For each player the Shapley value is calculated in terms of coefficients of a corresponded polynomial, as given in [Algaba et al. 2003]: for a weighted voting game
,
where is the number of swings of player i in coalitions of size j. A swing for player i is defined as a pair of coalitions (S?{i} and S) s.t. S?{i} is a winning one and S is not (that means that i is critical for coalition S). where is the number of possible ways in which j players (j ? i) can achieve k votes summarily. This sequence {} forms the following generating function:
.
This method is useful for games in which many players have the same numbers of votes (weights), otherwise it is memory consuming for computation.
Finally, another exact method worth mentioning is the Ieong and Shoham's method [Ieong & Shoham 2005]. This approach implies so-called “marginal contribution networks” (MC-net) coalitional games that are represented by sets of rules over coalitions. This means that the Shapley value is already settled to a group of players of a given coalitional game and thereupon the approach aggregates values for defining the Shapley value of the overall game. This method has a polynomial time complexity, but it requires MC-net game representation.
Summing up the above, all exact methods have unsatisfactory features (high complexity of calculation, memory consumption, limited game form, etc.). In view of this, a number of approximation methods were invented to overcome these disadvantages [Fatima et al. 2008].
The first approximate method for the Shapley value was proposed by Mann and Shapley in [Mann & Shapley 1960]. This method uses Monte-Carlo simulation in the following way [Fatima et al. 2008]: for random coalition S random variable X (for each player) is defined that indicates the significance of i (swinging): X={0,1}. The expectation of X for S is
(the average contribution of i to S). Consequently, due to the fact that X is binomial the variance of X is
.
Then, for an independent sample with respect to coalitions the estimated Shapley value is
with variance .
The problem is that we do not know how to derive a sample; it definitely influences the effectiveness of the method. Otherwise, we would get the linear time complexity method.
Another approximate method was proposed by Owen in [Owen 1972] and it is called a multi-linear extension (MLE) method.
This method is based on the condition that the Shapley value in the form
can be represented with the help of a Beta function, i.e.
implies that
where [Fatima et al. 2008]. Direct evaluation of the Shapley value in this form has an exponential time complexity. Thus, the author suggested to evaluate as
,
where is a normal distribution function, and are the mean and variance of , - the weight of i. is a random variable that indicates the number of votes set by players who vote in the same way as i with probability ч. The method in this form has linear time complexity.
[Leech 2003] describes the modified MLE method which combines Owen's method form [Owen 1972] and direct enumeration in order to improve the accuracy of the Shapley estimation. This method divides players into two groups: major players (those with large weights) and minor players (those with small weights). Then, the direct enumeration is applied to major players and MLE method is applied to minor players. The larger the group with major players the more complex it is to estimate the Shapley value; the complexity is equal to where l is the number of major players. However, this method is more accurate than Owen's MLE method [Fatima et al. 2008].
The last approximate method that serves as the basis for the current work is a method from [Zlotkin & Rosenschein 1994] that is based on random permutations. The index is calculated as
,
where Р is a set of different permutations and . The advantage of this method is that it is not required to enumerate permutations: .
Summarizing the above, a variety of centrality measures exists and the interpretation of the measure depends on both the method of calculation and the nature of a network. Some of them are simple for computation (degree centralities), some of them are complicated (closeness, betweenness, etc.), but different tools have been developed with the aim of reducing complex calculations. In this work centrality measures and the Random Permutation method of approximate computation of indices are investigated on artificial (randomly generated) networks, with the central focus on the Shapley value calculation. Here other relaxations of methods to reduce complexity of calculation are tried. Various measures are also compared.
3. The Methodology
The main purpose of this research is to investigate the possibility to apply exact and approximate Shapley values in the assessment of centrality measures of nodes in networks.
The key advantage of as a centrality measure is that it detects the most influential nodes among all subgroups of nodes, i.e. everybody prefers to be in a group with the most pivotal node than with any other node.
To apply this theory let us consider graph where is a set of nodes, and is a set of edges. A classical definition of the Shapley value in terms of graphs is
, (1)
· is a subset of nodes
· is a monotonous function of a subset of nodes
· is a contribution of node to subset
An equivalent formula for is
, (2)
· is a permutation of all nodes in a graph,
· is a set of all permutations of a set of nodes
· is a node with position in permutation
· is a subset of nodes .
The main challenge here is that a direct calculation of requires the enumeration of all nonempty subsets of nodes for the formula (1) or permutations for the formula (2) which are computationally intensive tasks. For that reason approximation method of calculation based on random permutations is used. Then approximate is
(3)
· is a permutation of all nodes in a graph,
· is some set of permutations
· is a fixed number of permutations in
o means that all permutations are enumerated and
· is a node with position in permutation
· is a subset of nodes .
For each node the algorithm selects subsets of nodes that are on the left side of node in every permutation. This means that the contribution of node into random subsets is calculated and then the average of these contributions is taken.
Here two loops occur: one is on nodes and the other one is on permutations. Evidently, it is more rational to make the outer loop on permutations and then calculate for each permutation. Consequently, the algorithm is as follows:
· Fix a number of random permutations
·
· For each permutation :
o Move from to
o For each node calculate ;
o was calculated during the previous step
o Update
o Cumulate
· Normalize to the number of permutations .
The question about the accuracy of calculation may arise here. There can be different criteria of the accuracy, for example, metric criteria , where is a vector of approximation Shapley values, is a vector of exact Shapley values, is some metric, is a threshold. But in terms of centrality measures the order of values is usually required; this means that rank criteria can be used here, where is some rank correlation measure (Kendall correlation, Spearman correlation, etc.). But due to the fact that exact values are unknown the ranks of approximation values that consider and permutations should be compared until rank correlation tends to level off. Levelling-off of ranks means that adding more permutations into calculation will change ranks insignificantly. The last notice can be regarded as criteria for the estimation of optimal number of permutations in the following way:
Given threshold and the level of reliability is consequently estimated until the correlation exceeds times running; then the optimal number of permutations
.
Evidently, when threshold is considered some of the ranks are still not stable. Hence, for the calculation of final ranks the median of all ranks from to can be calculated. The median value allows us to escape random jumps in ranks.
Additionally, tracks of node's ranks can be drawn by memorizing them on previous steps.
3.1 The Capacity Function
For calculation some monotonous function on subsets of nodes should be defined to estimate the contribution of each node into all subsets.
Regardless of the formula for the function it is reasonable to save function values for some subsets of nodes. Obviously, for each permutation when the last node is reached the capacity function on the whole graph is calculated. But instead of calculating it for every permutation it can be calculated only once (for the first permutation) and its value can be saved. Similarly, when last but one node is reached the capacity function of nodes is calculated and similar graphs are met many times if etc.
Suppose that there is some limitation on the number of stored values. Then a minimal size of a subgraph, for which the function value is stored, can be easily defined. Thus, the number of graphs with nodes is ; the number of graphs with nodes is ;…; the number of graphs with nodes is . Then the minimal size of a subgraph for which the function value is stored is s.t.
,
,
where is a binomial coefficient.
So when a subgraph with nodes is met its function value is calculated and is saved to the storage. It is convenient to store values in a dictionary where the key is a binomial line of a node occurrence which is transferred to a decimal system; by such key stored values are extracted when this subgraph is met for the next time.
Below, various capacity functions which are appropriate for different graph sizes and different problem areas are described.
3.1.1 The Number of Simple Paths Approach
The first approach is based on all simple paths enumeration (paths with no cycles). Here only the number of simple paths is considered, i.e. the capacity function looks like
, (4)
where
· is a simple path between nodes ;
· is a set of all simple paths between nodes ;
· is a capacity of a set of all simple paths between nodes .
Here the length of a path is not important. Such function represents the reliability and safety of a system and, for example, it can be applied to such networks as Internet or mobile networks where only the fact of a signal access is considered and the path of a signal is ignored. Consequently, the most powerful nodes here provide the stability of a system.
The model is illustrated with the help of a small example ( here is based on formula (1)).
Given graph
. The graph is illustrated on Figure 1.
Figure 1. Example 1.
Hence, there are nonempty subsets of nodes; if the length of a path is unlimited then the capacities of subsets are as follows (see Table 2):
Table 2. Capacities of subgraphs
Subset |
Subgraph |
Pairs of nodes |
Num of paths between pairs |
Node |
|||||
{i}, i=1,…,5 |
- |
- |
0 |
i |
0 |
0 |
1/5 |
||
{1,2} |
1-2 |
1 |
1 |
1 2 |
0 0 |
1 1 |
1/20 |
||
{1,3} |
1-3 |
1 |
1 |
1 3 |
0 0 |
1 1 |
1/20 |
||
{1,4} |
1-4 |
0 |
0 |
1 4 |
0 0 |
0 0 |
1/20 |
||
{1,5} |
1-5 |
1 |
1 |
1 5 |
0 0 |
1 1 |
1/20 |
||
{2,3} |
2-3 |
0 |
0 |
2 3 |
0 0 |
0 0 |
1/20 |
||
{2,4} |
2-4 |
1 |
1 |
2 4 |
0 0 |
1 1 |
1/20 |
||
{2,5} |
2-5 |
1 |
1 |
2 5 |
0 0 |
1 1 |
1/20 |
||
{3,4} |
3-4 |
1 |
1 |
3 4 |
0 0 |
1 1 |
1/20 |
||
{3,5} |
3-5 |
1 |
1 |
3 5 |
0 0 |
1 1 |
1/20 |
||
{4,5} |
4-5 |
0 |
0 |
4 5 |
0 0 |
0 0 |
1/20 |
||
{1,2,3} |
1-2 1-3 2-3 |
1 1 0 |
2 |
1 2 3 |
0 1 1 |
2 1 1 |
1/30 |
||
{1,2,4} |
1-2 1-4 2-4 |
1 1 1 |
3 |
1 2 4 |
1 0 1 |
2 3 2 |
1/30 |
||
{1,2,5} |
1-2 1-5 2-5 |
2 1 1 |
4 |
1 2 5 |
1 1 1 |
3 3 3 |
1/30 |
||
{1,3,4} |
1-3 1-4 3-4 |
1 0 1 |
2 |
1 3 4 |
1 0 1 |
1 2 1 |
1/30 |
||
{1,3,5} |
1-3 1-5 3-5 |
1 2 1 |
4 |
1 3 5 |
1 1 1 |
3 3 3 |
1/30 |
||
{1,4,5} |
1-4 1-5 4-5 |
0 1 0 |
1 |
1 4 5 |
0 1 0 |
1 0 1 |
1/30 |
||
{2,3,4} |
2-3 2-4 3-4 |
1 1 1 |
3 |
2 3 4 |
1 1 0 |
2 2 3 |
1/30 |
||
{2,3,5} |
2-3 2-5 3-5 |
1 1 1 |
3 |
2 3 5 |
1 1 0 |
2 2 3 |
1/30 |
||
{2,4,5} |
2-4 2-5 4-5 |
1 1 1 |
3 |
2 4 5 |
0 1 1 |
3 2 2 |
1/30 |
||
{3,4,5} |
3-4 3-5 4-5 |
1 1 1 |
3 |
3 4 5 |
0 1 1 |
3 2 2 |
1/30 |
||
{1,2,3,4} |
1-2 1-3 1-4 2-3 2-4 3-4 |
1 2 1 1 1 1 |
7 |
1 2 3 4 |
3 2 3 2 |
4 5 4 5 |
1/20 |
||
{1,2,3,5} |
1-2 1-3 1-5 2-3 2-5 3-5 |
3 1 2 1 1 1 |
9 |
1 2 3 5 |
3 4 4 2 |
6 5 5 7 |
1/20 |
||
{1,2,4,5} |
1-2 1-4 1-5 2-4 2-5 4-5 |
2 2 1 1 1 1 |
8 |
1 2 4 5 |
3 1 4 3 |
5 7 4 5 |
1/20 |
||
{1,3,4,5} |
1-3 1-4 1-5 3-4 3-5 4-5 |
1 0 2 1 1 1 |
6 |
1 3 4 5 |
3 1 4 2 |
3 5 2 4 |
1/20 |
||
{2,3,4,5} |
2-3 2-4 2-5 3-4 3-5 4-5 |
2 2 2 2 2 2 |
12 |
2 3 4 5 |
3 3 3 3 |
9 9 9 9 |
1/20 |
||
{1,2,3,4,5} |
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5 |
3 3 3 3 2 2 2 2 2 2 |
24 |
1 2 3 4 5 |
12 6 8 9 7 |
12 18 16 15 17 |
1/5 |
Then, exact Shapley values are:
.
Hence, a final vector of Shapley values is:
· .
A vector of final ranks of Shapley values is:
· .
We can see that if we delete node 1 then we are still able to reach every node from any point; hence, node 1 is the least powerful in this system. Alternatively, node 2 contains many paths over all subsets and this results in the fact that node 2 is the most powerful and guarantees the stability of a system.
If the approach of formula (3) is used then in most cases all permutations are considered because exact differs by small values for nodes 2, 3, 5. However, for some order of permutations less than 90 permutations for exact ranks estimation is required.
Evidentially, the number of simple paths can be enormous. Consider a complete graph with n nodes. Then, for any two nodes and the number of paths between them is equal to . If all pairs of nodes in a complete graph are considered, then the number of simple paths is , which is a huge number even for small n. Hence, some approximation algorithms are proposed.
There are various ways of simplifying the model. One of the approaches is based on the limitation of the maximal length of a path; hence, paths that are not greater than some predefined limit are considered:
, (5)
· || is a function of the length of a path;
· L is the maximal path length.
Formula (5) still requires the enumeration of all simple paths (with the limitation on the length) which is a computationally complex issue. Due to this fact, this function is applied to small graphs (with 20 nodes).
To provide a thoughtful analysis of this approach based on formula (4) and its analogue (5) exact (without limitations) has been calculated, the ranks of exact with approximation based on formula (5) have been compared. The accuracy (rank correlation) of calculations is analyzed with respect to the limitation on path length. The criteria for the considered number of permutations is , where , is an optimal number of permutations. Figure 2 provides rank correlations of exact and with different parameter L calculated with random permutations:
Figure 2. Kendall and Spearman rank correlations of and depending on L
Figure 2 shows that different maximal lengths of paths give different orders of nodes as compared to exact values without length limitation. Table 3 provides top-10 nodes for and for different L.
Table 3. Top-10 nodes by and with different L
Rank |
||||||||||||||||||||||
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
|||
1 |
9 |
8 |
9 |
10 |
9 |
9 |
9 |
9 |
9 |
9 |
8 |
9 |
9 |
10 |
9 |
9 |
9 |
9 |
9 |
9 |
9 |
|
2 |
7 |
5 |
6 |
6 |
5 |
7 |
6 |
7 |
7 |
7 |
7 |
7 |
5 |
7 |
7 |
5 |
6 |
5 |
7 |
7 |
7 |
|
3 |
20 |
20 |
19 |
19 |
19 |
20 |
20 |
20 |
20 |
20 |
20 |
20 |
20 |
20 |
20 |
20 |
20 |
20 |
20 |
20 |
20 |
|
4 |
18 |
18 |
20 |
20 |
20 |
19 |
19 |
19 |
18 |
18 |
18 |
18 |
18 |
18 |
18 |
18 |
18 |
18 |
18 |
18 |
18 |
|
5 |
17 |
15 |
15 |
15 |
15 |
15 |
15 |
16 |
16 |
16 |
16 |
17 |
17 |
16 |
16 |
16 |
16 |
16 |
17 |
17 |
17 |
|
6 |
15 |
16 |
16 |
16 |
16 |
16 |
16 |
15 |
15 |
15 |
15 |
15 |
15 |
15 |
15 |
15 |
15 |
15 |
15 |
15 |
15 |
|
7 |
14 |
13 |
11 |
9 |
10 |
11 |
10 |
12 |
12 |
14 |
17 |
14 |
14 |
14 |
13 |
13 |
14 |
14 |
14 |
14 |
14 |
|
8 |
4 |
10 |
8 |
8 |
8 |
8 |
7 |
6 |
5 |
6 |
14 |
5 |
4 |
4 |
6 |
4 |
5 |
6 |
4 |
4 |
4 |
|
9 |
16 |
17 |
17 |
18 |
17 |
17 |
17 |
17 |
17 |
17 |
6 |
16 |
16 |
17 |
17 |
17 |
17 |
17 |
16 |
16 |
16 |
|
10 |
19 |
19 |
18 |
17 |
18 |
18 |
18 |
18 |
19 |
19 |
19 |
19 |
19 |
19 |
19 |
19 |
19 |
19 |
19 |
19 |
19 |
It can be concluded that L does not influence the accuracy (i.e. high values of L do not guarantee exact results). Hence, parameter L can be limited by small value and get another interpretation for top nodes, for example, top nodes are pivotal in its vicinities of radius L. It should be noted that the same value of L for subgraphs of all sizes is fixed; hence, in small subgraphs all paths are considered.
The study also provides a representative analysis of the proposed method (see example below). The following criteria of accuracy of rank definition is used: , where . This example considers the maximal lengths of path at It also tracks fluctuations of correlation measures and ranks of nodes (Figures 4-5) depending on the considered number of permutations. Moreover, with classical centrality measures are compared (Tables 4 - 7).
Example Network 1:
Table 4. Graph characteristics
Number of nodes |
20 |
|
Number of edges |
100 |
|
Average degree |
5 |
|
Diameter |
4 |
|
Density |
0.263 |
|
Number of components |
1 |
Figure 3. Example Network 1:
the larger the degree of a node is, is the bigger the size of a node is.
Table 5. Top-5 nodes by Sv and classical centralities
Rank |
Nodes |
|||||
Degree |
Eigenvector |
Closeness |
Betweenness |
|||
1 |
20 |
20 |
20 |
20 |
20 |
|
2 |
12 |
12 |
12 |
12 |
12 |
|
3 |
14 |
14 |
14 |
18 |
18 |
|
4 |
18 |
18 |
15 |
19 |
19 |
|
5 |
15 |
2 |
2 |
2 |
2 |
Table 6. Kendall rank correlation
Degree |
Eigenvector |
Closeness |
Betweenness |
|||
0,379 |
0,400 |
0,379 |
0,284 |
|||
Degree |
0,453 |
0,432 |
0,547 |
|||
Eigenvector |
0,537 |
0,421 |
||||
Closeness |
0,547 |
|||||
Betweenness |
Table 7. Spearman rank correlation
Degree |
Eigenvector |
Closeness |
Betweenness |
|||
0,490 |
0,412 |
0,501 |
0,364 |
|||
Degree |
0,550 |
0,502 |
0,717 |
|||
Eigenvector |
0,660 |
0,528 |
||||
Closeness |
0,677 |
|||||
Betweenness |
Figure 4. Kendall and Spearman correlations traces
Figure 5. Nodes' ranks traces
The optimal number of permutation is The number of subgraphs is equal to . The algorithm considers approximately subgraphs.
3.1.2 Price for the Length of the Simple Paths Approach
The second approach is also based on all simple paths enumeration. But here the length of paths is considered in the following way
, (6)
· is a simple path between nodes ;
· is a set of all simple paths between nodes ;
· || is a function of the length of a path.
Definitely, instead of the inverse length of a path the price for the length can have another form of a decreasing function, for example, etc. In this work only function (7) is analyzed.
The most powerful nodes here represent major hubs. Such approach can be applied to the transportation systems where we want to minimize a path length but if one of the ties or a group of ties are “out of work” we are ready to pay for an alternative (more distant) route.
Let us consider Example 1. The results are as follows (computational calculation):
· .
Here node 1 improved its position in the system and node 4 worsened it. The reason is that node 4 is mainly achieved through long paths and doesn't provide short passages. Conversely, despite the fact that node 1 is unreachable it provides many short passages, so it contributes more in subgraphs than node 4.
Apparently, distant nodes add rather long paths than short paths, which leads to the fact that their contribution to subsets should be less than the contribution of central (hub) nodes in some cases. Hence, the inverse length of a path is considered to decrease the importance of long paths and increase the importance of short paths. Additionally, with a limitation on the length of a path added and with only short paths taken into consideration, the contribution of central nodes will be generally more significant. Moreover, such limitation permits us not to enumerate all paths between pairs of nodes, which is a computationally complex issue. So, the capacity function looks like
, (7)
where is a limitation on the length of a path (small value).
Moreover, another approximation method can be implemented here. This approach is based on a random choice of paths between pairs of nodes with the limitation on the length of a path. Then, the capacity function looks like
, (8)
where is some random subset of all possible paths between nodes x and y. The number of paths that are considered can be taken as some fraction of the estimation of the number of paths between two nodes in a subgraph, i.e. . For the upper bound of the number of simple paths between nodes i and j of the length less than in subgraph the number of all paths (including loops) between these two nodes is calculated as , where is an element of matrix , is an adjacency matrix of subgraph A. For small graphs (n ? 20) and maximal length of paths the number of simple paths between two nodes is insignificantly overestimated.
For the analysis of the approach based on formula (6) and its analogue (7) and (8) exact (with ) has been calculated, the ranks of exact with approximation based on formula (8) with have been compared. The accuracy (rank correlation) of calculations is analyzed with respect to the limitation on the number of paths. The criteria for the considered number of permutations is , where , is an optimal number of permutations. For subgraphs of size less than 10 nodes formula (7) is considered (without limitations on the number of paths). Figure 6 provides rank correlations of exact and with different parameter L calculated with random permutations (for one of the graphs).
Figure 6. Kendall and Spearman rank correlations of and depending on the number of considered paths
Figure 6 shows that the number of considered paths between two nodes in a subgraph gives different nodes' ordering as compared to exact values. Table 8 provides top-10 nodes for and for different fractions of the total number of paths between any two nodes.
Table 8. Top-10 nodes by and with different number of considered paths
Rank |
||||||||||||
10% |
20% |
30% |
40% |
50% |
60% |
70% |
80% |
90% |
100% |
|||
1 |
10 |
12 |
11 |
10 |
10 |
10 |
10 |
11 |
10 |
10 |
10 |
|
2 |
5 |
5 |
4 |
4 |
4 |
5 |
5 |
6 |
7 |
6 |
5 |
|
3 |
19 |
19 |
19 |
19 |
19 |
20 |
19 |
19 |
19 |
20 |
19 |
|
4 |
20 |
20 |
20 |
20 |
20 |
19 |
20 |
20 |
20 |
19 |
20 |
|
5 |
15 |
15 |
15 |
16 |
15 |
15 |
15 |
16 |
15 |
15 |
15 |
|
6 |
16 |
16 |
16 |
15 |
16 |
16 |
16 |
15 |
16 |
16 |
16 |
|
7 |
9 |
9 |
9 |
9 |
9 |
9 |
9 |
9 |
9 |
9 |
9 |
|
8 |
8 |
8 |
8 |
8 |
8 |
8 |
8 |
8 |
8 |
8 |
8 |
|
9 |
17 |
18 |
18 |
18 |
18 |
18 |
18 |
17 |
17 |
17 |
17 |
|
10 |
18 |
17 |
17 |
17 |
17 |
17 |
17 |
18 |
18 |
18 |
18 |
It can be concluded that the number of considered paths influences the accuracy even if all paths for small subgraphs are considered (i.e. exclusion of some paths can significantly change ranks of nodes). Experimental analysis shows that it is better not to rely on such approximation due to the fact that it sometimes gives inexact outcome.
As in the previous case this approach is appropriate only for small graphs. This function is applied to graphs with 20 nodes.
The criteria for the accuracy of calculation and maximal path length are similar to the one applied to the previous model. The graph and algorithm characteristics are as follows (information is given for one of the graphs):
Example Network 2:
Table 9. Graph characteristics
Number of nodes |
20 |
|
Number of edges |
107 |
|
Average degree |
5,35 |
|
Diameter |
3 |
|
Density |
0.282 |
|
Number of components |
1 |
Figure 7. Example Network 2:
the larger the degree of a node is the bigger the size of a node is.
Table 10. Top-5 nodes by Sv and classical centralities
Rank |
Nodes |
|||||
Degree |
Eigenvector |
Closeness |
Betweenness |
|||
1 |
20 |
20 |
20 |
20 |
20 |
|
2 |
16 |
16 |
16 |
8 |
8 |
|
3 |
15 |
8 |
15 |
16 |
16 |
|
4 |
8 |
15 |
8 |
9 |
11 |
|
5 |
11 |
11 |
11 |
15 |
15 |
Table 11. Kendall rank correlation
Degree |
Eigenvector |
Closeness |
Betweenness |
|||
0,663 |
0,663 |
0,432 |
0,379 |
|||
Degree |
0,474 |
0,453 |
0,358 |
|||
Eigenvector |
0,558 |
0,316 |
||||
Closeness |
0,484 |
|||||
Betweenness |
Table 12. Spearman rank correlation
Degree |
Eigenvector |
Closeness |
Betweenness |
|||
0,800 |
0,785 |
0,567 |
0,508 |
|||
Degree |
0,623 |
0,597 |
0,508 |
|||
Eigenvector |
0,713 |
0,441 |
||||
Closeness |
0,609 |
|||||
Betweenness |
Figure 8. Kendall and Spearman correlations traces
Figure 9. Nodes' ranks traces
The optimal number of permutation is The number of subgraphs is equal to . The algorithm considers approximately subgraphs.
3.1.3 The Number of the Edges Approach
The third approach is based on the number of edges in a subgraph or, in other words, the number of paths of length 1. In this case the capacity function looks like
, (8)
· is the indicator function of an edge between two nodes and :
The interpretation of the contribution of a node can be as follows: the bigger the value of a function is the more consolidated a node is. The difference between capacity functions with a node and without this node indicates the number of direct connections of this node. This approach is close to the degree centrality measure but the main difference is that considers different groups of nodes. This approach can be applied to friendship networks where nodes are people and edges are their relationships. High values of detects people that are communicative and active in different communities while the classical degree centrality detects only local central nodes.
For our small example nodes 1, 2, 3, 5 have the same and consequently they have one rank and node 4 has a lower rank.
Due to the fact that this function is not as complicated for the calculation as the other two mentioned above, this function can be applied to graphs of middle size (in this work graphs up to 100 nodes are considered). In total 60 exponentially distributed networks with different number of nodes have been analyzed. Hence, the complexity of calculations (i.e. the number of required permutations) can be estimated with respect to some graph characteristics. It seems reasonable to study the dependence of the optimal number of permutations on a network density: density includes information about both the number of nodes and the number of edges; a separation of these characteristics is unrepresentative. The criteria for the optimal number of permutations is represented in Table 13:
Table 13. Criteria for
Number of nodes, N |
Criteria |
Level of Reliability, LR |
|
The dependence of the optimal number of permutations on a network density is represented in Figure 10:
Figure 10. The dependence of the optimal number of permutations on a network density
As is seen from Figure 10 the higher the density of exponentially distributed graph is the less number of permutations is required for exact ranks reconstruction. Generally, an average density factor means dissimilarity of nodes characteristics, which leads to the faster rank specification. Contrary, when density is small many nodes have the similar structure and it is harder to define exact ranks for them.
As for previous functions an illustrative analysis of the 3rd function is provided.
The criteria for the accuracy of calculation is where is an optimal number of permutations. The graph and algorithm characteristics are as follows (information is given for one of the graphs):
Example Network 3:
Table 14. Graph characteristics
Number of nodes |
30 |
|
Number of edges |
149 |
|
Average degree |
4,97 |
|
Diameter |
4 |
|
Density |
0.171 |
|
Number of components |
1 |
<... |
Подобные документы
A theoretic analysis of market’s main rules. Simple Supply and Demand curves. Demand curve shifts, supply curve shifts. The problem of the ratio between supply and demand. Subsidy as a way to solve it. Effects of being away from the Equilibrium Point.
курсовая работа [56,3 K], добавлен 31.07.2013The profit function possesses several important properties that follow directly from its definition. These properties are very useful for analyzing profit-maximizing behavior. Outlining the properties of the profit function important to recognize.
анализ книги [15,2 K], добавлен 19.01.2009General characteristic of the LLC DTEK Zuevskaya TPP and its main function. The history of appearance and development of the company. Characteristics of the organizational management structure. Analysis of financial and economic performance indicators.
отчет по практике [4,2 M], добавлен 22.05.2015Establishing a favorable environment for investments, removing administrative barriers. Establishing high-technology parks. Formation of financial mechanisms to attract and support investments, tax stimulation measures. Brand promotion of Russian regions.
реферат [15,9 K], добавлен 04.06.2013A variety of economy of Kazakhstan, introduction of the international technical, financial, business standards, the introduction to the WTO. The measures planned in the new Tax code. Corporation surtax. Surtax reform. Economic growth and development.
реферат [27,2 K], добавлен 26.02.2012Identifing demographic characteristics of consumers shopping in supermarkets. Determine the factors influencing consumer’s way of shopping and the level of their satisfaction (prices, quality, services offered, etc in supermarkets and bazaars).
доклад [54,4 K], добавлен 05.05.2009The definition of term "economic security of enterprise" and characteristic of it functional components: technical and technological, intellectual and human resources component, information, financial, environmental, political and legal component.
презентация [511,3 K], добавлен 09.03.2014Organizational structure of "Samruk-Kazyna" JSC. Formation of financial resources of the Fund. Mining and power assets directorate. The characteristic stages of the process of registration of new legal entities. Cash flow from the operating activity has.
отчет по практике [2,6 M], добавлен 02.02.2015Models and concepts of stabilization policy aimed at reducing the severity of economic fluctuations in the short run. Phases of the business cycle. The main function of the stabilization policy. Deviation in the system of long-term market equilibrium.
статья [883,7 K], добавлен 19.09.2017Entrepreneurial risk: the origins and essence. The classification of business risk. Economic characteristic of entrepreneurial risks an example of joint-stock company "Kazakhtelecom". The basic ways of the risks reduction. Methods for reducing the risks.
курсовая работа [374,8 K], добавлен 07.05.2013Solving the problem of non-stationary time series. Estimating nominal exchange rate volatility ruble/dollar by using autoregressive model with distributed lags. Constructing regressions. Determination of causality between aggregate export and volatility.
курсовая работа [517,2 K], добавлен 03.09.2016The experiments related to alcohol and economic decision-making. First study attempting to test 3 sets of embedded hypotheses regarding how alcohol influences our choices. Conducting games, showing the effects of alcohol on the decision-making process.
статья [268,5 K], добавлен 04.11.2015Economic entity, the conditions of formation and functioning of the labor market as a system of social relations, the hiring and use of workers in the field of social production. Study of employment and unemployment in the labor market in Ukraine.
реферат [20,3 K], добавлен 09.05.2011Stereotypes that influence on economic relations between the European Union countries and Russia. Consequences of influence of stereotypes on economic relations between EU and Russia. Results of first attempts solving problem. General conclusion.
реферат [19,0 K], добавлен 19.11.2007Study of the basic grammatical categories of number, case and gender in modern English language with the use of a field approach. Practical analysis of grammatical categories of the English language on the example of materials of business discourse.
магистерская работа [273,3 K], добавлен 06.12.2015Применение глаголов в Present Simple, Past Simple Tense и Future Simple Tense. Образование повествовательных и вопросительных предложений. Формы настоящего времени глагола to do. Редуцированные (сокращённые) формы вспомогательных глаголов с частицей not.
контрольная работа [16,7 K], добавлен 16.06.2010The discovery of nouns. Introduction. Classification of nouns in English. Nouns and pronouns. Semantic vs. grammatical number. Number in specific languages. Obligatoriness of number marking. Number agreement. Types of number.
курсовая работа [31,2 K], добавлен 21.01.2008Investigation of the problem with non-local conditions on the characteristic and on the line of degeneracy . The solution of the modied Cauchy problem with initial data. The solution of singular integral equations. Calculation of the inner integral.
статья [469,4 K], добавлен 15.06.2015Context approach in teaching English language in Senior grades. Definition, characteristics and components of metod. Strategies and principles of context approach. The practical implementation of Context approach in teaching writing in senior grades.
дипломная работа [574,3 K], добавлен 06.06.2016The problem of category of number of nouns, Russian and English grammatical, syntactical and phonetic forms of expression. The general quantitative characteristics of words constitute the lexico-grammatical base for dividing the nounal vocabulary.
контрольная работа [40,6 K], добавлен 25.01.2011