Just read two really cool books about recent scientific discoveries about the behavior of networks:
* Nexus, by Mark Buchanan, former editor of Nature magazine
* Linked, by Albert-Laszlo Barabasi, one of the scientists whose team made some of the key discoveries
It’s a small world after all
There are many versions of the party game. The Six Degrees of Kevin Bacon website looks up the number of links connecting an arbitrary celebrity with Kevin Bacon. Will Smith was in Independence Day (1996) with Harry Connick, Jr., who was in My Dog Skip (2000) with Kevin Bacon. In another version of the game, mathematicians boast of their “Erdos number” — how close they are to someone who has published a paper with Paul Erdos, the prolific and eccentric Hungarian mathematician. A variant called “Jewish geography” connects people via links through summer camps, social clubs and synagogues.
The intuitive insight that communities are “small worlds” has been quantified. Just in the last four years, scientists have developed models to describe the properties and behavior of “small-worlds” networks.
Networks can be characterized by several parameters:
* the level of clustering — how connected a given node is to nearby nodes. For example, social networks are highly clustered — one’s friends are likely to know each other
* the degree of separation, also called the diameter — how many links it takes on average to get from one node to another
* the level of hierarchy — how similar is the level of connectivity among different nodes. Do most nodes have about the same level of connection, or are some nodes much more connected than others?
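The first two parameters are easy to compute on a toy example. Here is a minimal Python sketch (my own, not from either book) that measures clustering and degrees of separation on a hypothetical six-person friendship network:

```python
from collections import deque

# A six-person friendship network (hypothetical names, my own toy example).
friends = {
    "Ana": {"Bo", "Cy", "Dee"},
    "Bo":  {"Ana", "Cy"},
    "Cy":  {"Ana", "Bo"},
    "Dee": {"Ana", "Eve"},
    "Eve": {"Dee", "Fay"},
    "Fay": {"Eve"},
}

def clustering(node):
    """Fraction of a node's friend-pairs who are also friends with each other."""
    nbrs = sorted(friends[node])
    if len(nbrs) < 2:
        return 0.0
    pairs = [(a, b) for i, a in enumerate(nbrs) for b in nbrs[i + 1:]]
    return sum(1 for a, b in pairs if b in friends[a]) / len(pairs)

def hops(src, dst):
    """Degrees of separation between two people, by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in friends[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist[dst]

print(round(clustering("Ana"), 2))   # only Bo and Cy know each other: 1/3 -> 0.33
print(hops("Cy", "Fay"))             # Cy -> Ana -> Dee -> Eve -> Fay: 4
```

A real social network would be measured the same way, just with far more nodes: average the clustering over all nodes, and average the hop count over all pairs.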
A network can become “a small world” in one of two ways:
* a small number of long-distance connections. If you take a network where most connections are local, and add just a few long-distance connections, the network quickly “links up”, making it possible to traverse vast distances in just a few hops. For example, a coffee trader in Guatemala provides a link connecting a rural coffee growing family to an urban latte-sipper in just a few steps. Research by Duncan Watts and Steven Strogatz, published in 1998, modeled the role of long-distance connections in creating the “small world” effect.
* a small number of big hubs. On the worldwide web, the Yahoo news portal has lots of links to local news sites, making it easy to find local news in many of the world’s languages in just a few clicks. This type of network, in which a few members of a set have most of the links and many members have few, is called a “scale-free network”, and can be described by a power law plotting the distribution of links among nodes. Research by Barabasi and his team, published in 1999 and more recently, modeled this pattern and found evidence of it in a variety of domains.
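The shortcut effect is easy to reproduce in a few lines of code. The sketch below is my own simplification of the Watts-Strogatz idea, not their published model: build a ring where every node knows only its nearest neighbors, then add a handful of random long-distance links and watch the average path length collapse.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """A ring of n nodes, each linked to its k nearest neighbors on each side."""
    edges = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            edges[i].add(j)
            edges[j].add(i)
    return edges

def avg_path_length(edges):
    """Mean BFS distance over all ordered pairs (assumes a connected graph)."""
    total, pairs = 0, 0
    for src in edges:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in edges[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(edges) - 1
    return total / pairs

rng = random.Random(42)
n = 200
net = ring_lattice(n, 2)            # purely local connections
before = avg_path_length(net)

for _ in range(10):                 # add just 10 long-distance shortcuts
    a, b = rng.sample(range(n), 2)
    net[a].add(b)
    net[b].add(a)
after = avg_path_length(net)

print(f"average hops: {before:.1f} before shortcuts, {after:.1f} after")
```

With only local links, crossing the ring takes dozens of hops; ten shortcuts (affecting 5% of the nodes) are enough to shrink the average dramatically. That is the coffee-trader effect in miniature.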
The “small worlds” patterns create networks that are highly resilient, yet vulnerable to certain kinds of failure.
* small worlds networks are highly resistant to random damage — if you randomly remove nodes from the internet, or species from an ecosystem, the system will continue to operate with little disturbance
* small worlds networks are vulnerable to attacks on connectors or hubs — if you take down a number of key internet hubs, or remove just a few linchpin species in an ecosystem, the connections in the system will break down.
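A rough simulation (my own sketch, not the researchers’ code) makes the asymmetry visible: grow a hub-dominated network by preferential attachment, then compare how much of it stays connected after removing 5% of nodes at random versus removing the top 5% of hubs.

```python
import random
from collections import deque

def preferential_attachment(n, m=2, seed=1):
    """Grow a network: each new node links to m existing nodes, chosen with
    probability proportional to their current degree (via a stub list)."""
    rng = random.Random(seed)
    edges = {0: {1}, 1: {0}}
    stubs = [0, 1]                    # node i appears once per link it holds
    for new in range(2, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        edges[new] = set()
        for t in targets:
            edges[new].add(t)
            edges[t].add(new)
            stubs.extend([new, t])
    return edges

def largest_component(edges, removed):
    """Size of the biggest connected cluster after deleting `removed` nodes."""
    alive = set(edges) - removed
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            size += 1
            for v in edges[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

net = preferential_attachment(500)
rng = random.Random(7)
random_loss = set(rng.sample(sorted(net), 25))    # 5% of nodes, at random
hub_loss = set(sorted(net, key=lambda i: len(net[i]), reverse=True)[:25])
print("largest cluster after random removal:", largest_component(net, random_loss))
print("largest cluster after hub removal:   ", largest_component(net, hub_loss))
```

Random removal barely dents the giant cluster, because most nodes are minor; removing the hubs strands every node whose few links all pointed at them.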
With the “small worlds” model in hand, scientists foraged for data sets and mapped the workings of small-worlds networks in a wide variety of domains:
* the web, which can be traversed with a few hyperlinks
* the internet — which can be crossed in a few hops
* electric power networks
* social networks
* ecosystems, in which a few “hub” species are predators or prey for many others.
* biochemistry — in which a few key chemicals catalyze many reactions.
* group behavior — in which fireflies start blinking in unison, and theater-goers unconsciously synchronize their applause
Relationship to other aspects of complexity theory
One of the fun things about reading the books is drawing relationships between “small worlds networks” and other aspects of complex, emergent systems, although these links are not well-developed in the books themselves.
Stuart Kauffman, a theoretical biologist, has developed a set of models to explain the natural emergence of order in open thermodynamic systems. According to Kauffman’s models, explained in his book, “At Home in the Universe”,
* self-replication is likely to emerge from a set of sufficiently diverse chemicals in high concentration
* genes code for a relatively small number of types of cells because of the network parameters of gene expression (Kauffman theorizes that it is the low coupling parameter that makes the state space of gene expression much smaller than one might expect).
* the evolution of species can be modeled by adaptive walks across “fitness landscapes”, in which organisms with better-adapted traits outcompete others, and produce descendants with the opportunity to become even more fit. Key parameters of the model include the level of randomness within the fitness landscape (in a random landscape, a small change in an organism would cause a big change in fitness; in a non-random landscape, a small change in an organism would probably cause a small change in fitness); the level of coupling among genes in an organism (this models conflicting constraints — e.g. a gene that protects against malaria also increases vulnerability to blood disease); and the level of coupling among species in the landscape. In these models, extinctions follow a “power law” distribution, with frequent extinctions of small numbers of species, and infrequent catastrophes wiping out many species at once.
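The adaptive-walk idea can be sketched in code. The following is loosely in the spirit of Kauffman’s NK model, with my own simplifications (ring-neighbor coupling, a hash-based lookup table for fitness contributions): each gene’s contribution depends on its own state plus the states of k coupled genes, and evolution is a hill-climb that flips one gene at a time.

```python
import random

def nk_fitness(genome, k, seed=0):
    """Average of per-gene contributions; each gene's contribution depends
    on its own state plus the states of its k ring-neighbors (epistasis)."""
    n = len(genome)
    total = 0.0
    for i in range(n):
        context = tuple(genome[(i + j) % n] for j in range(k + 1))
        # deterministic pseudo-random lookup table, keyed by gene and context
        total += random.Random(hash((seed, i) + context)).random()
    return total / n

def adaptive_walk(n, k, seed=0):
    """Hill-climb: flip one gene at a time, keeping only fitness-raising flips."""
    rng = random.Random(seed)
    genome = [rng.randint(0, 1) for _ in range(n)]
    steps = 0
    improved = True
    while improved:
        improved = False
        for i in rng.sample(range(n), n):
            trial = genome[:]
            trial[i] ^= 1                 # flip one gene
            if nk_fitness(trial, k, seed) > nk_fitness(genome, k, seed):
                genome, steps, improved = trial, steps + 1, True
    return nk_fitness(genome, k, seed), steps

fit_smooth, steps_smooth = adaptive_walk(16, k=0, seed=3)  # no coupling: smooth
fit_rugged, steps_rugged = adaptive_walk(16, k=8, seed=3)  # heavy coupling: rugged
print(f"k=0 walk: fitness {fit_smooth:.3f} after {steps_smooth} uphill steps")
print(f"k=8 walk: fitness {fit_rugged:.3f} after {steps_rugged} uphill steps")
```

Tuning k turns the same machinery from a smooth landscape (k=0, where each gene can be optimized independently) into a rugged one (high k, where conflicting constraints create many mediocre local optima for the walk to get stuck on).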
Kauffman’s theories about the emergence of organization and the mechanisms of evolution are fascinating and appealing. But in the absence of any but the sketchiest of empirical evidence, his work is vulnerable to the criticism that it’s computer art — the properties of his models could just be artifacts of the parameters plugged into the models.
The empirical data analyzed by Barabasi’s team about chemical reaction networks and connection patterns in ecosystems seem like early evidence that nature works in the ways that Kauffman describes. Networks such as ecosystems and the world wide web have a small number of key nodes with many connections, and a great many nodes with fewer connections. According to the network model, this will lead to an evolutionary pattern with many small extinctions of non-hub species, and some mass disasters when key species are eliminated.
The Watts and Barabasi research suggests some alternate ways to configure Kauffman’s model, creating similar results with data that fit more closely with empirical evidence.
* Kauffman’s model accounts for long evolutionary jumps — the probability that a small change in an organism results in a large change in fitness — by tuning the “randomness” of the fitness landscape. Watts achieves similar results with a seemingly simpler approach, adding just a few nodes with long-distance coupling behavior.
* Kauffman’s model tunes the average level of coupling up and down, reaching realistic behavior in a particular range of parameters. Barabasi’s model observes that the level of coupling in a network varies by a power law, and this distribution predicts the observed behavior.
Much more evidence is needed to confirm or disprove Kauffman’s theories, and to refine the models, in gene expression networks, ecological networks, and evolution. The ongoing research and modeling seems to be on the right track to find these things out.
One of the key insights of Barabasi’s team is that a “scale-free network” can be created by a simple growth pattern — if new nodes add links with slight preference for popular nodes, the hierarchical pattern will emerge. It would be interesting to see future research that looked in more detail at models of evolution and growth.
In particular, the Watts and Barabasi models focus on patterns of network wiring — the number and distance of links. There are additional interesting questions about what this network architecture means with respect to the level of influence between nodes. What is the relationship between the architecture of the network and the way the network is used to transmit information?
That’s one of the most exciting things about studying this topic — the work is nowhere near done.
The unfinished nature of the field shows up in some logical gaps in the books.
Both books explain how network growth patterns enable the rich to get richer, but that does not seem to me to be the most interesting part of the story. It is true that wealthy investors make more money, and really big sites like Yahoo and Amazon acquire the most links.
But the Pareto principle doesn’t explain whether and how the poor get rich. Google comes from nowhere, provides a better search engine, and rapidly emerges as the leading search site. And the Pareto principle doesn’t talk about the impact of providing “small-worlds” connectivity to the remote and obscure. The interesting thing about the web is not that Yahoo is popular — it’s that a quick keyword search will find sites on medieval theologians and African cooking, and a couple of clicks on Yahoo News links will get you local media in Farsi.
Also, neither book has a strong discussion of limits to network growth, or differentiates between hub systems with obvious physical limits, like airports, and those with few physical limits, like the information space of the web.
Comparison and contrast
The books are eerily similar, as if one of the writers was looking over the other one’s shoulder as he wrote. The similarity in substance is not that surprising — after all, the books explain the same papers by the same set of scientists over the same few years. What is odder is that the books contain many of the same anecdotes — tales of Erdos, the eccentric Hungarian mathematician; the inspiration of Duncan Watts by synchronized fireflies; the creation of the Oracle of Kevin Bacon. Both books have very similar sections on the internet and the network economy, with similar sweeping generalizations about impending change, and a similar lack of substance.
Buchanan is a professional writer, and the book is a little better written. His magazine instincts show — each chapter is nicely structured, starting with anecdotes about people, and uncovering some new theme. The book does a decent job with transitions — it reads like a book rather than a collection of articles. Buchanan has a PhD in physics — he’s read the primary sources, he understands the math, he enjoys the subject and he doesn’t pander to the audience.
Barabasi is a participant, not a bystander — the unique strength of the book lies in the first-hand stories of his team and their discoveries. Barabasi is proud of his achievements; he makes it very clear that the topic was not properly understood until his team started their work. A typical sentence along these lines: “Uncovering and explaining these laws has been a fascinating roller coaster ride during which we have learned more about our complex, interconnected world than was known in the last hundred years.” This is not the place to look for humility.
Both books were definitely worth reading, with clear explanations, great references to the sources, and a lot of food for thought. It is quite a thrill to read about these developments as they are happening.
The Computational Beauty of Nature, by Gary Flake, is a very nicely written walk through topics related to chaos and complexity, including fractals, chaos, artificial life, adaptive systems, and neural networks. This is THE one book for folks who want to dive into these topics one level deeper than the popular science books. Each chapter has references to the primary source books and articles, if you want to pursue the topics in greater depth.
The book’s website has a set of Java applets and C programs to run the simulations — for example, you can play with the parameters of L-System fractals to simulate different kinds of plant shapes. The source code is available to download and play with.
Flake does a lovely job of explaining the math and modeling concepts, in a manner that is comprehensible to those of us without extensive math backgrounds. Sometimes his one-page intros go a bit fast for me, but it’s easy enough to hit Google, find a relevant tutorial, then go back and finish the chapter. I needed to do this for the sections on matrix math and circuit design — this is a very pleasurable way to learn.