Nigel Clay's Blog: MISG 2016

I spent the week of 1-5 February at the University of South Australia where I attended MISG 2016. This is a conference where representatives of industry present some thorny problems they are working on and we maths dudes and dudettes try to help them. There were several industry problems being worked on and I joined a group looking at inference on a knowledge base. I won't go into too much detail about what the industry client does (I'm sure you can work it out) but I will offer a tantalizing clue in that the introductory presentation was headed "Unclassified." The problem was pretty interesting though and could be summarized with two questions:

Entity resolution: How do we disambiguate a knowledge base?
Activity based intelligence: What conclusions can be drawn from a noisy knowledge base?

Given the problem domain it makes perfect sense that we were not given real information. Instead, we were given some manufactured GPS tracking data which some of us used and some of us didn't. I opted not to use the supplied data and instead used a publicly available data set of the network behind the Madrid Train Bombing.

The Madrid Train Bombing Network

My approach was graph based. Mainly because where you have a graph you can also represent it as a matrix, and where you have a matrix, you can use linear algebra to say some useful things about it. The graph is shown below. The vertices have been numbered as names would cause too much clutter. Also for my purposes I was looking at the structure of the graph and names do not assist with that.

The Madrid Train Bombing Network

So, as I said, if you have a graph you have a matrix. One of the useful things you can do with a matrix is a similarity transform. For those that are interested in a refresher on what this is, consider a matrix \(\mathbf{A}\). Find two matrices \(\tilde{\mathbf{A}}\) and \(\mathbf{P}\) such that: \[\mathbf{P}\mathbf{\tilde{A}P^{-1}}=\mathbf{A}\]
Some of you may know this as matrix factorisation, diagonalisation or eigenvector decomposition. The thing is, this transform tells you some interesting stuff. For example, if we use only the largest eigenvalue/eigenvector pair we can isolate the largest community structure in this graph:

The largest community structure in the Madrid Train bombing network

Adding Noise

Now, I wanted to know how much noise could be added to the edge weights on the graph before the largest structure was no longer able to be isolated. It turns out, for this graph at least, that the structure is fairly robust. Yes, you can swamp it with noise and make it unusable, but if the edge weights varied by \(\pm 40 \%\) I could still recover most of the useful structure with a few extra vertices added and at \(\pm 20\%\) I could recover it perfectly. There are still other things to check though, and I didn't have the time to do all of them: what out deeper structures? will it hold for other networks? how do the edge thresholds I used change the result?

Backing Noise into a Corner

If you sort the eigenvalue/eigenvector pairs from largest to smallest the similarity transform causes structure to bubble up and noise to be concentrated in the smaller eigenvalue/eigenvector pairs. It is then possible to consider the structure that manifests using the k-smallest pairs. I was interested in the order that vertices "leave" the graph. In the Madrid Train Bombing leaving out the two largest eigenvector/eigenvalue pairs causes six vertices to detach. Think about what this is saying for a bit. These guys participate in the larger structures in the graph but not the smaller structures. They're probably very important.

Remove the two largest eigenvalue/eigenvector pairs

We can continue to remove information. Keeping track of the order in which vertices detach from the graph suggests a way to prioritize any consequent actions. When 5 pairs were removed I got the following:

Removing the five largest eigenvalue/eigenvector pairs

Continuing to do this eventually the causes the graph to fall apart. The important structure is captured by the largest pairs while the smallest contain noise and unimportant structure. Effectively, the noise is backed into a corner. Or, put another way, the information in a graph can be approximated pretty well with fewer pairs than there are vertices.


Removing the 15 largest eigenvalue/eigenvector pairs

So, it is possible to deal to the noise that is present in the edge weights and it is possible to isolate vertices that are important to the structure and can use this to prioritise. But the real world is not that co-operative. In any graph of a network of individuals you can only record relationships (edges) between vertices if you actually observe them. What about those that you can't see?

Revealing what can't be seen.

Remember, I represent the graph as a matrix. If a relationship between two vertices is observed a weight is associated with the edge between them that, in a noisy and numerical sense, describes the importance of that relationship. This value appears in the matrix where the two vertices intersect. If no relationship is observed then the edge has a weight of 0 and this is also recorded in the matrix at the appropriate location. But there's structure in this particular graph - that has been shown. Does structure imply the existence of edges that were not observed? I can demonstrate this with a digital image. A digital image is nothing but a matrix where the cell values correspond to colour or to a greyscale. In the image below a (very) large proportion of the information has been removed randomly.

An image (matrix) with missing information

If you look at it you might see some areas of light and others that are dark but not much more than that. If you've got this far you might remember (its a theme I've been pushing) that the similarity transform concentrates information in the larger eigenvalue/eigenvector pairs. However you cannot do a similarity transform on a matrix with missing information. That is, unless you try to "learn" the eigenvalue/eigenvector pairs numerically. That's exactly what I did to the image above to obtain the image below.

The image (matrix) reconstructed from only those values (structure) which could be observed.

You now have a pretty good idea of what the image is. For comparison here's the original image.

The original image (matrix)

But can we do this to a graph? I removed an edge in the graph of the Madrid Train Bombing. It happens that this particular edge participates in the largest community structure and has an edgeweight of 1. By removing it we're saying we didn't observe that edge.

Running the numerical approximation to the similarity transform while treating the edge as 'missing' gave the estimate of the edge weight as 0.9812. Now, if an organisation were investigating potential relationships, the approximations could be ranked largest to smallest and used to prioritize any consequent activity. Of course, one edge does not prove anything. So I iterated though all of the edges, removing them one at a time, and testing if the edge could be put back. The result showed that important edges could be recovered and while unimportant edges were not.

Conclusions

In some graphs noise is present in the weights on the edges and in the structure of the graph itself. In concept, and provided there's not too much noise, it is possible to extract important structures in noisy graphs. It is also possible to filter out noise (or push into a corner) by using the similarity transform of the underlying matrix and this can be used to isolate important vertices using the order they detach to prioritize actions. But noise also manifests by the absence of edges. Provided there is structure in the matrix of a graph it is possible to use that structure to approximate the weights of unobserved edges. Moreover, using this estimate it is also possible to prioritize any consequent activity.

Outstanding Questions

In reality it would be unlikely that we would be presented something that looks as 'nice' as the example shown. The graph would be embedded deep within a larger structure of unimportant information. So an outstanding question is how can you isolate the important structure of a relatively small number of individuals in a vastly larger graph? This is like adding hundreds, thousands, or tens of thousands of additional vertices. Bystanders, that happened to be in the same coffee shop or stopped to tie their shoe in the wrong location. There may, or may not, be some basis by which edges in this larger graph form. Random attachment is one way, preferential attachment is another. Ramsey theory tells us that any sufficiently large set of elements will guarantee the existence of some property of interest. That, and apophenia (or, specifically in the case of visual phenomena, pareidolia), is why we see patterns in the stars, faces on Mars, Jesus on toast and subsribe to some conspiracy theories. So a sufficiently large graph would contain a lot of red herrings to sort though.

Lastly, it would be interesting to know how well the approaches I described work on other graphs. Does this just 'happen' to work on the graph I chose?

Nigel Clay's Blog

Saturday, 6 February 2016

MISG 2016

The Madrid Train Bombing Network

Adding Noise

Backing Noise into a Corner

Revealing what can't be seen.

Conclusions

Outstanding Questions

No comments:

Post a Comment