One of the things I have become interested in recently is the use of graph theory in drug discovery. I had taken a class in graph theory during my sophomore year and while I have forgotten some of the important things from that class, I am revising that information again from the excellent textbook that was then recommended- Alan Tucker's "Applied Combinatorics" which covers both graph theory and combinatorics.
The reason why graph theory has become exciting in drug discovery in recent times is because of the rise of the paradigm of 'systems biology'. Now when they hear this term many purists usually cringe at what they see as simply a fancy name given to an extension of well-known concepts. However, labeling a framework does not reduce its utility. The approach should be better named 'network biology' in this context. The reason why graph theory is becoming tantalizingly interesting is because of the large networks of interactions between proteins, genes, and drugs and their targets that have been unearthed in the last few years. These networks could be viewed as abstract graph theoretical networks possibly utilizing the concepts of graph theory and predictions based on the properties of these graphs could possibly help us to understand and predict. This kind of 'meta' thinking which previously was not much possible because of the lack of data can unearth interesting insights that may be missed by looking at molecular interactions alone.
The great power and beauty of mathematics has always been to employ a few simple principles and equations that can explain many diverse general phenomenon. Thus, a graph is any collection of vertices or nodes (which can represent molecules, proteins, actors, internet web pages, bacteria, social agents etc.) connected by edges (which represent interactions between the vertices). In the field of network analysis this has been manifested in a singularly interesting observation; the observation that many diverse networks, from protein-protein networks to the internet to academic citation networks, are scale-free. Scale free networks demonstrate a topology which as the name indicates is independent of the scale. From a mathematical standpoint the defining quality of scale free networks is that they follow a power law. That is, the probability of any node having k connections goes as k to some power -γ, where γ is usually a number between 2 and 3.
Thus, P(k) ~ k^-γ
The scale-free property has been observed for a remarkable number of networks, from the internet to protein-protein interactions. This property is counterintuitive since one would expect the number of connections to follow a normal or Poisson like distribution, with P(k) depending more or less exponentially on k, and nodes having a large number of connections being disproportionately small in number. The scale-free property however leads to a valuable insight; that there are a surprisingly large number of nodes or 'hubs' which are quite densely connected. This property can have huge implications. For instance it could allow us to predict the exact hubs in an internet network which could be most vulnerable to attack. In the study of protein-protein interactions, it could tantalizingly allow us to predict which protein or set of proteins to hit in order to disrupt the maximum number of interactions. A recent study on the network of FDA approved drugs and their targets suggests that this network is scale-free; this could mean that there is a privileged set of targets which are heavily connected to most drugs. Such a study could indicate both targets which could be more safely hit as well as new targets (sparsely connected nodes) which could be productively investigated. Any such speculation can of course only be guided by data, but it may be much more difficult to engage in it without looking at the big picture afforded by graphs and networks.
However the scale-free property has to be very cautiously inferred. Many networks which seem scale-free are subnetworks whose parent network may not be scale-free. Conversely a parent network that is scale-free may contain a subnetwork that is not scale-free. The usual problem with such studies is the lack of data. For instance we have still plumbed only a fraction of the total number of protein-protein interactions that may exist. We don't know if this vast ultimate network is scale-free or not. And of course, the data underlying such interactions comes from relatively indirect methods like yeast two-hybrid or reporter gene assays and its accuracy must be judiciously established.
But notwithstanding these limitations, concepts from network analysis and graph theory are having an emerging impact in drug discovery and biology. They allow us to consider a big picture view of the vast network of protein-protein, gene-gene, and drug-protein and drug-gene interactions. There are several more concepts which I am trying to understand currently. This is very much a field that is still developing, and we hope that insight from it will serve to augment the process of drug discovery in a substantial way.
Some further reading:
1. A great primer on the basics of graph theory and its applications in biology. A lot of the references at the end are readable
2. Applied Combinatorics by Alan Tucker
3. Some class notes and presentations on graph theory and its application in chemistry and biology
4. A pioneering 1999 Science paper that promoted interest in scale-free networks. The authors demonstrated that several diverse networks may be scale-free.