This article was first published in the April 2019 issue of the Forestry Source.
In these Biometrics Bits articles, my colleagues and I spend a lot of time thinking about measurements, statistics, and computer simulations. But there’s another source of information that we shouldn’t forget about: field experience and local knowledge.
When a timber cruise comes back higher than we’d expect, something doesn’t feel quite right. It doesn’t match up with our experience. We might be inclined to shave the estimates down a bit. Is this just fudging the numbers? In this article, I explore a rigorous and justifiable way to incorporate our experience and prior expectations into our inventory estimates. Using the power of Bayes’ Theorem (developed in the 1700s), we will gain a quantitative tool for incorporating our local forestry knowledge into our cruise workups.
Before even setting foot in a stand, we probably know at least something about it. For instance, if we’re talking about a 25-year-old natural pine stand in the Southeast, we might anticipate a basal area of about 80 ft²/acre. And our experience might lead us to expect values as high as 110 or as low as 50 ft²/acre. Basal area values outside this range might occur, but they’d be a bit unusual. An experienced forester familiar with a property could likely provide even more precise expectations without cruise data.
We can begin to transform this experience into the language of statistics. We now have an initial expectation for the average (or mean) of the basal area: 80 ft²/acre. We also have a sense of what the variability (the standard deviation) of our basal area estimate might be: +/- 5 ft²/acre. This intuition is represented graphically in Figure 1. The horizontal axis represents the amount of basal area we’d expect to find in the stand, while the vertical axis represents how probable we think it is that we would find that amount of basal area.
In forestry school, we all learned how to calculate the mean and standard deviation from a set of cruise data. These are derived purely from the cruise data and do not incorporate the expert knowledge we discussed above.
Imagine we cruised the stand described above and found a different mean, or higher standard deviation, than our prior expert opinion expected. Our immediate reaction may be to say that our prior opinion was wrong, or that perhaps stand conditions had changed since we were last out there. But what if there were issues with the cruise? It doesn’t make sense to cast aside all of our expert knowledge based on a single sample. Instead, we can get the best of both by integrating our prior knowledge with the cruise data.
Combining Cruise Data and Expert Knowledge
We can do this thanks to a simple (and quite old) statistical concept called Bayes’ Theorem, which provides a way to update our prior expert knowledge with information from cruise data to develop a new expectation that incorporates both sources of information.
This article takes a high-level look at the results of applying Bayes’ Theorem to a particular cruise. All the data and analysis code for the example below are available at github.com/SilviaTerra/BiometricsBits.
The example cruise consists of 50 plots collected in a mature pine stand in the southeastern United States. To illustrate the impact of combining expert knowledge and data, we’ll estimate basal area for this stand in three ways:
- Using only our expert knowledge: Simply estimate the mean and uncertainty using the distribution defined in Figure 1.
- Using only the cruise data: Assume we know nothing about the stand beforehand, and develop a mean and variability estimate using the methods we learned in forestry school.
- Using Bayes’ Theorem: Integrate both expert knowledge and the cruise data. In technical terms, we will be “placing an expert-knowledge prior distribution on our cruise estimate.”
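When both the prior expectation and the cruise mean are treated as roughly normal, one common way to carry out the third approach is the normal-normal conjugate update. The sketch below illustrates that assumption only; it is not necessarily the method used in the analysis code for this article. The prior mean and standard deviation come from the text above. The cruise mean, the plot-level standard deviation, and the function name combine_prior_and_cruise are hypothetical, chosen purely for illustration.

```python
import math


def combine_prior_and_cruise(prior_mean, prior_sd, cruise_mean, cruise_sd, n_plots):
    """Normal-normal conjugate update for a stand mean.

    Assumes the plot-to-plot standard deviation (cruise_sd) is known and
    that the prior on the stand mean is normal.
    """
    # Precision is the inverse of variance: higher precision = more certainty.
    prior_precision = 1.0 / prior_sd ** 2
    # Precision of the cruise mean is the inverse of its squared standard error.
    data_precision = n_plots / cruise_sd ** 2

    post_precision = prior_precision + data_precision
    # Combined mean is a precision-weighted average of the two sources.
    post_mean = (prior_precision * prior_mean
                 + data_precision * cruise_mean) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    return post_mean, post_sd


# Prior taken from the article: mean 80, standard deviation 5 (ft2/acre).
prior_mean, prior_sd = 80.0, 5.0
# HYPOTHETICAL cruise summary, not the article's data.
cruise_mean, cruise_sd, n_plots = 100.0, 30.0, 50

cruise_se = cruise_sd / math.sqrt(n_plots)
post_mean, post_sd = combine_prior_and_cruise(
    prior_mean, prior_sd, cruise_mean, cruise_sd, n_plots
)

print(f"Experience only: mean={prior_mean:.1f}, sd={prior_sd:.2f}")
print(f"Cruise only:     mean={cruise_mean:.1f}, se={cruise_se:.2f}")
print(f"Combined:        mean={post_mean:.1f}, sd={post_sd:.2f}")
```

Under these assumptions the combined mean always falls between the prior mean and the cruise mean, and the combined standard deviation is smaller than either the prior standard deviation or the cruise standard error.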
The histograms in Figure 2 represent the output of the three cases. Each is the result of thousands of simulations. Imagine we are rolling dice, but rather than the die having six faces, there are lots of different faces representing different amounts of basal area (a face for 0, 10, 20, 30—all the way up to 250 BA). Additionally, while the six sides of a die have equal probability, not all basal areas have the same chance of coming up. The chances of a particular basal area being “rolled” on our die are related to statistical distributions like those in Figure 1. In that particular distribution, we see that a basal area of 80 has the highest likelihood, and the probability of other basal areas declines the farther we get from 80.
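The weighted-die idea can be sketched in a few lines. This is a minimal illustration, assuming the Figure 1 distribution is a normal curve with mean 80 and standard deviation 5 and that the die has one face for every 10 units of basal area from 0 to 250. The helper normal_density, the seed, and the number of rolls are assumptions made for the example.

```python
import math
import random


def normal_density(x, mean, sd):
    """Height of a normal curve at x; used here only as a relative weight."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# One face per 10 units of basal area: 0, 10, 20, ..., 250.
faces = list(range(0, 251, 10))

# Weight each face by the assumed prior curve (mean 80, sd 5).
weights = [normal_density(face, 80.0, 5.0) for face in faces]

# Roll the weighted die many times; a fixed seed keeps the run repeatable.
rng = random.Random(42)
rolls = rng.choices(faces, weights=weights, k=10_000)

# Tally the rolls; the tallies are what a histogram would display.
counts = {face: rolls.count(face) for face in faces}
most_common_face = max(counts, key=counts.get)

for face in faces:
    if counts[face] > 0:
        print(f"BA {face:3d}: {counts[face]} rolls")
```

Faces near 80 come up most often, and faces far from 80 essentially never appear, which is the pattern the histogram for a tight prior shows.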
The experience-only and data-only histograms were created by “rolling dice” according to the statistical distributions defined by the means and standard deviations discussed above. For the third case, data and experience, we used Bayes’ Theorem to create a new combined distribution and used that to “weight” the die.
Examining histograms of these three simulations shows some stark differences. Recall that prior to looking at the data, we had a fairly strong expectation that mean basal area would be close to 80 ft²/acre, and the experience-only distribution reflects that with a large peak near that value. Both the data-only and data-plus-experience simulations have a higher mean than we expected given our experience (see Table 1), but note that in both cases the peak at the mean isn’t nearly as strong. Incorporating cruise data provides much more weight for values that are close, but not exactly equal, to our mean estimate. Note also that including prior experience shifted the mean in the direction of the “experience only” mean (Table 1), though the change is not particularly large. This is probably owing to the fact that this cruise is a robust sample of 50 plots within a single stand. A smaller, more variable dataset would have resulted in more weight being given to our prior expectation.
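To see how sample size and variability shift the balance between experience and data, the sketch below computes the share of the combined mean that comes from the prior. It assumes a normal prior and a normal sampling distribution for the cruise mean. The function name prior_weight and the plot-level standard deviations of 30 and 60 are hypothetical; only the prior standard deviation of 5 comes from the article.

```python
def prior_weight(prior_sd, cruise_sd, n_plots):
    """Fraction of the combined mean contributed by the prior.

    Assumes a normal prior and a normal sampling distribution for the
    cruise mean, with the plot-level standard deviation treated as known.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n_plots / cruise_sd ** 2
    return prior_precision / (prior_precision + data_precision)


prior_sd = 5.0  # from the article

# HYPOTHETICAL plot-level standard deviations and sample sizes.
for cruise_sd in (30.0, 60.0):
    for n_plots in (5, 50, 200):
        share = prior_weight(prior_sd, cruise_sd, n_plots)
        print(f"plot sd={cruise_sd:4.0f}, n={n_plots:3d}: "
              f"prior supplies {share:.0%} of the combined mean")
```

The prior’s share falls as plots are added and rises as the plots become more variable, which is the behavior described above.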
In addition to the difference in the mean, the combined distribution has a smaller standard deviation, and therefore lower uncertainty, than the other two estimates. Recall that the experience-only distribution (Figure 2) had a lot of weight at our expected mean of 80 ft²/acre. Since applying Bayes’ Theorem integrates the distribution of the data with our prior expectations, the high precision we expected results in higher precision in our final estimate.
So, what should we do with these results? In this case, the fact that the cruise mean was higher than our expected value, even after integrating our expertise, would lead us to suspect that there’s something different about this stand compared to those we’ve inventoried in the past. We may decide we are convinced by the evidence, or we may elect to collect additional information before deciding whether these results lead us to update our prior belief. It’s up to you to decide. The beauty of Bayes’ Theorem is that it gives us a rigorous framework to ask critical questions about our data, and to weigh the evidence it presents against our expert knowledge.
You may be thinking, “That’s how my intuition works already.” That’s great! The framework outlined here may help you codify this approach and make it part of your regular workflow. The intuitive insights encoded in Bayes’ Theorem may seem simple, but they have far-reaching consequences for modern forest statistics. This article has just scratched the surface of Bayesian biometrics—I’d be happy to continue the conversation if you have questions or comments. If you are interested in learning more about Bayes’ Theorem and Bayesian statistics, I recommend checking out Statistical Rethinking by Richard McElreath.