On (De-)Centralized Communications: Part 2

On (De-)Centralized Communications: Part 1 discussed Baran’s terminology of “centralized”, “decentralized” and “distributed” networks and how it relates to how the Internet and the networks that it’s composed of are typically arranged.

The post concluded that both “decentralized” and “distributed” are valid descriptions, depending on whether you do or don’t take end nodes into account.

#Introduction

There’s reason I ended up colouring the diagram as I did, and it has to do with maths.

Figure: Networks with end, edge and core nodes

I wanted to lead up to graph theory – gently, that is. This isn’t going to be a particularly deep discussion, I’m just going to focus on notions of centrality.

The distinction in mathematics between a graph and a network is not particularly strong, depending on how you squint at it. A network is a graph, in which edges (links) and vertices (nodes) may have attributes.

Consider for example that every link in our little diagram here had a different line speed at which we can transmit data. That would be an edge attribute, and could determine the fastest path between any two nodes.

We’ll not go into that here, so I’ll use the terms interchangeably.

At any rate, there are ways to determine the centrality of both individual nodes, and entire network structures. This is what we’re going to do in this blog post.

#Centrality of Nodes

It makes sense to start with Baran’s “centralized” network. In graph theory that would be a star network. The point is, if you wanted to traverse the graph (i.e. communicate) from one of the outer nodes to another, you’d have to traverse the central node.

Figure: A centralized network or star graph

We can intuitively grasp that this central node must have a higher centrality than the other nodes.

For simplicity’s sake, let’s look at a few different types of networks or graphs. These are artificial examples, but you will find something of this shape also in real networks.

First, we’ll simplify the star graph to just five nodes. That way, we can compare it to other five node graphs. As in the above, more complex example, we can intuitively grasp that the central node must have higher centrality than the outer ones.

If we arrange the same number of nodes in a different way, in this case in a ring structure, it intuitively seems less centralized.

This is because to go from each node to another, there are two paths in either direction around the ring. If only one path is interrupted, the other still remains viable.

In addition, we can note that all nodes in this ring graph have the same centrality – or, if we turn this around, no node has a significantly higher centrality as would be the case in the star graph.

Figure: A 5 node graph forming a ring structure

But it still is possible to divide the network by losing two nodes. A fully connected graph has all nodes connected to all other nodes. Here, losing any number of nodes only affects communication with those nodes.

Intuition tells us that this an even less centralized graph, because each node’s centrality is even lower. There are always going to be paths from one node to another, even if all other nodes have been lost.

Figure: A 5 node graph with all nodes connected to each other

We can deduce that the star graph is the most centralized graph, while the fully connected is the least centralized. This also is compatible with Baran’s terminology of centralized and distributed; the ring graph would be another example of a decentralized, but not distributed network.

Graph theory, at least to the level we’re going to discuss this here, deals with notions of node centrality, i.e. determining how central a node is to the network. In the star graph structure, we can easily identify one node with high centrality, while the others seem to have equal, but low centrality.

In the other two graphs, the centrality of all nodes is the same. And yet, the fully connected graph seems less centralized.

Let’s first explore node centrality via three real-world examples. Note that

Consistency and differences between centrality measures across distinct classes of networks¹ lists at least 17 types, but there are even more! A single post cannot possibly dive into all of that, so we’ll have to make do with showing some of the more interesting cases.

#Centrality Functions

Much of the discussion will focus on centrality functions, usually named C(). In graph theory, it’s common to refer to network nodes as vertices, which leads to v as the symbol used for any given vertex, and C(v) as the centrality of that vertex.

There exists a common problem that a given centrality function is defined in a way such that the value is dependend on the number of nodes in the network. That makes it difficult to compare one network to another.

For that reason, there is often also a method for normalizing the function value to a standard range of (0, 1), where the value 1 is for the most centralized node(s).

#Degree Centralization

One of the first approaches to measuring node centrality is to simply base it on the node’s degree. The degree is simply the number of links that each node is connected by.

In the star graph, it becomes clear that the central node has a degree of N - 1, where N is the total number of nodes in the graph. That is because it is connected to every node but itself. Meanwhile all other nodes have a degree of 1.

By comparison in the ring graph, all nodes have degree 2. That is lower than the central node in the star graph, but also higher than that graph’s outer nodes.

It seems like a useful centrality measure, until you reach the fully connected graph. Here, all nodes have degree N - 1, just like the central node in the star graph. That seems wrong somehow.

Degree centralization isn’t useless because of this, but it clearly shows the limit of such an approach. We’ll skip discussing normalization here, because it makes more sense a little later on.

#Closeness Centralization

Another approach that is more complex to calculate is based on a notion of closeness. Closeness is the reciprocal of distance, and node distance is relatively easily determined. For any other node u != v, you just take the length of the shortest path between them. This is given by the function d(u, v), so closeness is given by 1 / d(u, v).

So far, so good. In order to determine the closeness value for v in the entire network, you calculate the reciprocal of the distances to all possible u in the graph:

1 / sum(u, d(u, v))

Because the divisor grows with the number of nodes in the network, the overall value shrinks. Normalization is done by multiplying the value by the number of possible u nodes, which is N - 1, so:

C(v) = (N - 1) / sum(u, d(u, v))

In the star graph, the central node is one hop away from all other nodes. That means the sum of its distances is N - 1, because it connects to N - 1 other nodes with a distance of 1 each. As a result, its C(v) is normalized to 1, as we’d expect.

The outer nodes are a little more complicated. They’re connected to the central node by distance 1, and to the other nodes by distance 2, via that central node. Of those other nodes, there are N - 2, because we’re already starting from on of the outer nodes, and can’t count the central node here. So the sum of the distances is 1 + 2 * (N - 2).

For our five node example, that would be 7. Normalized, it’d be 4 / 7, so around 0.57.

We do not have to calculate the closeness in the ring graph (to keep things simple here). We can see a number of things by just looking at the figure, namely that the closeness is not as simple here as in the star graph. The closeness of nodes depends on how close to the opposite sides of the ring they are positioned.

We can fairly easily see that nodes positioned on opposite sides of the circle have the maximum distance to each other, and neighbours the minimum distance. All nodes in between will have a distance a little higher or lower than those extremes, averaging out to a position right between them. That would be at about a quarter circle.

Intuition so tells us that the average d(u, v) is about (N - 1) / 4², and so their sum (N - 1) * (N - 1) / 4. The reciprocal, multiplied by N - 1 to normalize it, remains 4 / (N - 1)³.

The thing to note here is that the normalized value will shrink with the number of nodes in the graph. That’s what normalization is supposed to prevent!

But it makes some kind of intuitive sense for notions of closeness, because closeness of a node relative to all others does diminish if the path between them grows. In a ring graph, there is an unavoidable relationship between distance and number of nodes.

With respect to the fully connected graph, everything is at distance 1 to each other, so the sum of all distances a node can have to the others is N - 1. The reciprocal, normalized, is (N - 1) / (N - 1), which is 1. That would be the maximum theoretical value.

Comparing the above values, the closeness measure makes sense in itself: being close to many nodes does give a node higher centrality.

But is it the kind of centrality we’re looking for? I think not: what we want to know, after all, is if the loss of a node would impact communications in the network. It seems as if closeness centrality focuses on the wrong thing if nodes in a fully connected graph yield higher values than those in less connected graphs.

#Betweenness Centralization

It seems likely that betweenness centrality is a better measure of what we’re after. It yields higher values if a node is more “between” other nodes. And how does one calculate “betweenness”?

Like closeness centrality, the measure relies on a shortest path calculation. But instead of considering the shortest path between v and each of the other nodes, you pick sets of two other nodes s and t, and check if the node v lies on that path, so “between” the end points.

If D_st is the number of all shortest paths between nodes s and t, then D_st(v) is the subset that crosses node v. The “betweenness” of v relative to s and t then is simply D_st(v) / D_st, i.e. the percentage of the shortest paths crossing v.

Now – you guessed it – you do this for all possible s != t != v, and sum up the values. Normalization occurs by dividing the value by (N - 1) * (N - 2) / 2, because in an undirected graph that is the number of possible and relevant pairs not including v.

Applied to the star graph, outer nodes get a measure of 0 – no pairs will pass through them. The center node, on the other hand, lies on the shortest path of all node pairs, so will reach the maximum value of 1.

In the case of the ring graph, most node pairs will have one shortest path between them (only when there is an even number of nodes, and we select nodes on exactly opposing sides of the ring are there two options). Whether the shortest path passes through v depends on whether s and t are at most one quarter circle distant from v each, which is the case for roughly half of the node pairs.

So with one half of the node pairs we’ll get a “betweenness” of 1, with the other half 0, summed up and divided by the number of node pairs – we don’t have to put numbers into the formula to know it’s got to be 0.5.

In the fully connected graph, however, since all nodes are connected to each other, there is no shortest path other than the one directly from s to t. In other words, no shortest path can pass through v, and v’s “betweenness” is always going to be 0, for any v.

Now that seems like a centrality notion that reflects our idea of relevance to communications, huh?

#Network Centrality

Armed with this knowledge, surely we can now come up with a formula that describes how centralized the entire network is!

It turns out that is actually pretty hard…

For degree centrality, such a formula actually exists. It sums up the differences between the most central node’s degree, and the degrees of the outer nodes, and normalizes the value. This results in the star graph accurately being represented by the highest normalized value of 1.

However, in both the ring graphs and the fully connected graph, all nodes have the same centrality value, whichever measure one applies. That means their differences is always 0, and so their sum, etc. While it makes sense for the fully connected graph to have an overall centrality of 0, we’d also like to see a difference between the ring and fully connected graph, which this does not yield.

This is because some measures of centrality have more of a local than a global impact in the graph. This is well illustrated in our diagram above: any of the edge nodes have significant impact for the end nodes in their neighbourhood, but very little in other parts of the network.

A simple thing we could do, of course, is to calculate the arithmetic mean of all C(v) of all nodes. It does yield the desired values, and for the purposes of this blog post, perhaps that is enough!

If you’re interested enough, Exploiting graphlet decomposition to explain the structure of complex networks: the GHuST framework⁴ proposes a framework based on 12 metrics that supposedly yields values by which to compare different types of networks that takes those differences into account. It does so by decomposing the graph into so-called “graphlets”, i.e. small sub-graphs of specific shapes that can be counted and examined to derive information about the rest of the graph.

#Summary

What I hope this excursion into graph theory has shown is that intuitive notions of network centrality are actually rather difficult to express in sufficiently defined ways that one can have a neutral evaluation of various network structures, let alone compare those evaluations.

In part, this is because “centrality” is itself not a clearly defined concept. Instead, there are many competing concepts which yield useful information, but perhaps not the information we’re looking for in this discussion.

It should also have shown that some of those centrality concepts such as “betweenness” might get close. It certainly is the kind of thing I have in mind when I enter these kinds of conversations. And it seems to be what Baran was referring to when he contrasted “centralized” to “decentralized” and “distributed” networks.

The key point of “betweenness” is that nodes that sit between many other nodes make a network less resilient to failure. That is the principle upon which Baran argued for “distributed” networks.

But aside from different notions of what “centrality” even means, there is yet another dimension to this discussion that we’ll explore in the next part.

Stuart Oldham, Ben Fulcher, Linden Parkes, Aurina Arnatkevic̆iūtė, Chao Suo and Alex Fornito. 2019. Consistency and differences between centrality measures across distinct classes of networks. PLOS ONE 14, 7, July 2019, e0220061.
https://doi.org/10.1371/journal.pone.0220061
See also: References ↩︎
This doesn’t work well for smaller graphs, so if you’ll apply it to the five node ring graph, it’ll produce a rather different number from a real distance calculation. But the more nodes you add, the closer it gets. ↩︎
Here you can clearly see that unless N - 1 > 4, the calculation isn’t even normalizable in the described fashion! ↩︎
Rafael Espejo, Guillermo Mestre, Fernando Postigo, Sara Lumbreras, Andres Ramos, Tao Huang and Ettore Bompard. 2020. Exploiting graphlet decomposition to explain the structure of complex networks: the GHuST framework. Scientific Reports 10, 1, July 2020.
https://doi.org/10.1038/s41598-020-69795-1
See also: References ↩︎