One of the questions seldom asked when building a model or assessing experimental data is "What's the error in that?". Unless we know the errors in the measurement of a variable, fitting predicted to experimental values may be a flawed endeavor. For instance, when one expects a linear relationship between calculated and experimental values and does not see it, it could mean either that there is a flaw in the underlying expectation or calculation (commonly deduced) or that there is a problem with the errors in the measurements (not always discussed).
Unfortunately it's not easy to find out the distribution of errors in experimental values. The law of large numbers and the central limit theorem often thwart us here; most of the time there are simply not enough values to get a handle on the type of error. But in the absence of such concrete error estimation, nature has provided us with a wonderful way to handle error: assume that the errors are normally distributed. The Gaussian or normal distribution of quantities in nature is an observation and assumption that is remarkably consistent and spectacularly universal in its application. You can apply it to people's heights, car accidents, length of nails, frequency of sex, number of photons emitted by a source and virtually any other variable. While the Gaussian distribution is not always followed (and strictly speaking it applies only when the central limit theorem holds), I personally regard it to be as much of a view into the "mind of God" as anything else.
In any case, it's thus important to calculate the distribution of errors in a dataset, Gaussian or otherwise. In biological assays where compounds are tested, this becomes especially key. An illustration of the importance of error estimation in these common assays is provided by this recent analysis of model performance by Philip Hajduk and Steve Muchmore's group at Abbott. Essentially, they estimate the standard deviations or errors in a set of in-house measurements of compound activities and look at the effect of those errors on the resulting R values when calculated activities are compared with the experimental ones. The R value or correlation coefficient is a time-tested and standard measure of fit between two datasets. The authors apply the error they have obtained in the form of "Gaussian noise" to a hypothetical set of calculated vs measured activity plots with 4, 10 and 20 points. They find that after applying the error, the R value itself adopts a Gaussian distribution that varies from 0.7 to 0.9 in the case of the 20-point measurement. This immediately tells us that any such measurement in the real world that gives us, say, an R value of 0.95 is suspicious, since the probability of such a value arising is very low (0.1%) given the errors in the underlying data.
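The noise experiment described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors' procedure: the spread of the hypothetical "true" activities (3 log units), the noise level `sigma=0.5` and the function names are all assumptions made for the example. The "calculated" values are taken to be perfect, so any drop in R comes purely from the simulated measurement error.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_r(n_points, sigma, n_trials=5000, seed=0):
    """Return the sorted R values obtained over many noisy 'experiments'."""
    rng = random.Random(seed)
    # Hypothetical true activities, evenly spaced over 3 log units (assumed).
    true = [3.0 * i / (n_points - 1) for i in range(n_points)]
    rs = []
    for _ in range(n_trials):
        # Each "measurement" is the true value plus Gaussian noise.
        measured = [t + rng.gauss(0.0, sigma) for t in true]
        rs.append(pearson_r(true, measured))
    return sorted(rs)


def summarize(sorted_rs):
    """Return the 2.5th percentile, median and 97.5th percentile of R."""
    n = len(sorted_rs)
    return sorted_rs[int(0.025 * n)], sorted_rs[n // 2], sorted_rs[int(0.975 * n)]


for n_points in (4, 10, 20):
    low, median, high = summarize(simulate_r(n_points, sigma=0.5))
    print(f"{n_points:2d} points: median R = {median:.2f}, "
          f"95% interval = [{low:.2f}, {high:.2f}]")
```

Under these assumptions the spread of R narrows as the number of points grows, which is why a single high R value from a handful of compounds says very little on its own.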
You know what should come next. The authors apply this methodology and look at several cases of calculated R values for various calculated vs measured biological activities in leading journals during 2006-2007. As they themselves say,
"It is our opinion that the majority of R-values obtained from this (small) literature sample are unsubstantiated given the properties of the underlying data."
Following this analysis they then apply similar noise to measurements for High-Throughput Screening (HTS) and Lead Optimization (LO). Unlike HTS, LO usually deals with molecules sequentially synthesized by medicinal chemists that are separated by small changes in activity. To investigate the effect of such errors, enrichment factors (EFs) are calculated for both scenarios. The EF is the proportion of active molecules found or "enriched" in the top fraction of screened molecules relative to the proportion expected from random screening, with a higher EF corresponding to better performance. The observation for HTS is that small errors give large EFs, but what is interesting is that even large errors in measurement can give modest enrichment, thus obscuring the presence of such error. For LO the dependence of enrichment on error is less, reflecting the relatively small changes in activity engendered by structure optimization.
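To make the enrichment factor concrete, here is a minimal Python sketch of how an EF can be computed and how it responds to Gaussian noise. It is not the authors' protocol: the library size, the 5% active fraction, the 5% selection fraction, the noise levels and the function names are all assumptions chosen for illustration, and only an HTS-like scenario is modelled.

```python
import random


def enrichment_factor(scores, is_active, top_fraction):
    """EF = hit rate in the top-ranked fraction / hit rate in the whole set."""
    n = len(scores)
    n_top = max(1, int(round(top_fraction * n)))
    # Rank compounds from highest to lowest score.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(is_active[i] for i in ranked[:n_top])
    total_actives = sum(is_active)
    return (hits_top / n_top) / (total_actives / n)


def simulate_ef(sigma, n_compounds=2000, active_fraction=0.05,
                top_fraction=0.05, seed=0):
    """EF obtained when true activities are ranked through noisy scores."""
    rng = random.Random(seed)
    # Hypothetical true activities (assumed standard normal).
    true = [rng.gauss(0.0, 1.0) for _ in range(n_compounds)]
    # Label the most active 5% of compounds as "actives".
    cutoff = sorted(true, reverse=True)[int(active_fraction * n_compounds) - 1]
    is_active = [t >= cutoff for t in true]
    # The score used for ranking is the true activity plus Gaussian noise.
    scores = [t + rng.gauss(0.0, sigma) for t in true]
    return enrichment_factor(scores, is_active, top_fraction)


for sigma in (0.0, 0.5, 1.0, 2.0):
    print(f"noise sigma = {sigma:.1f}: EF(top 5%) = {simulate_ef(sigma):.1f}")
```

With no noise the EF reaches its maximum for this setup (every selected compound is active), and it falls as the noise grows while staying above the random-screening value of 1, which is the sense in which a modest enrichment can hide a large measurement error.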
The take-home message from all this is of course that one needs to be aware of errors and needs to account for those errors when quantifying measures of model assessment. God is in the details and in this case his name is Carl Friedrich Gauss, who must be constantly beaming from his Göttingen grave.
Reference:
Brown, S., Muchmore, S., & Hajduk, P. (2009). Healthy skepticism: assessing realistic model performance. Drug Discovery Today, 14 (7-8), 420-427. DOI: 10.1016/j.drudis.2009.01.012