There Must be 50 Ways to Share Your Model
Every author must balance the trade-offs between the conciseness and readability of a results section. Gain practice with this simple exercise (and some examples)!
The reproducibility crisis in science isn’t just a matter of research methodology and design; seemingly insignificant decisions made during statistical analysis matter as well. These decisions range from mathematical modifications to a dataset (e.g., outlier exclusion, log transformation, mean-centering) to the structure of a statistical analysis or model. Reporting them can feel like a delicate balancing act: thoroughly describe the statistical approach, or engage readers with an entertaining narrative. Fortunately, there’s no need to choose one over the other! With some practice, you’ll find a way to do both.
Today, I’ll be sharing the exercise I developed to strengthen my statistical writing during graduate school, where I had the privilege of conducting and reporting many analyses under the watchful eyes of Jerry Frieman, Mike Young, and Jin Lee. I now invite my students to practice this exercise and consider the impact their words have on the clarity of the statistical narrative they hope to convey.
The Dataset
The data I’ll use to illustrate this exercise comes from an open dataset that ships with R in the datasets package. The airquality dataset contains daily measurements of ozone (New York State Department of Conservation) and various meteorological variables (National Weather Service) from May 1, 1973 to September 30, 1973. Today, I’ll focus on a handful of the variables:
Ozone, the mean ozone level (ppb) from 1:00 - 3:00 pm on Roosevelt Island
Solar.R, solar radiation (Langleys) in the 4000 - 7700 Å frequency band from 8:00 am - 12:00 pm in Central Park
Wind, the average wind speed (mph) at 7:00 and 10:00 am at La Guardia Airport
Temp, the maximum daily temperature (°F) at La Guardia Airport
Month (1-12) during which the measurements were captured
Day (1-31) of the month on which the measurements were captured
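For readers who want to follow along, here is a minimal sketch of loading and inspecting these variables. It uses only base R; the object name aq is my own choice for illustration.

```r
# airquality ships with base R's datasets package, so nothing needs installing.
data("airquality", package = "datasets")

# Keep the six variables described above (aq is a hypothetical name).
aq <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp", "Month", "Day")]

str(aq)             # one row per day, May 1 through September 30
colSums(is.na(aq))  # count the missing values in each column
```

Checking for missing values up front is worthwhile because any rows dropped during modeling should be reported in the write-up.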
Explaining a Model for the Reader
“If you can't explain it simply, you don't understand it well enough.”
- Albert Einstein
As a statistician, I strive to write for two audiences: other statisticians and the public. Statisticians care about the technical details. They want to know that I value their expertise; they may also wish to reproduce my analyses. The public seeks interpretation. What do the results mean? How do we know they mean what you say they do? Writing for both audiences provides the best of both worlds: technical details are presented intuitively, allowing anyone (experts, students, the public) to understand the research results and analytic approach. Therefore, I recommend that a write-up include the following pieces of information:
A Note about Statistical Jargon
Every author must balance the trade-offs between conciseness and readability. On the one hand, statistical terminology can greatly reduce the length of a write-up. On the other hand, the public may not be familiar with such terminology. An author must decide how much weight to put on each of these and strike a balance that will meet the needs of their target audience.
To illustrate these trade-offs, I’ll be using statistical terminology in some of my examples. Here are a few short definitions:
main effect: part of the analysis looked at the influence of a single predictor on the outcome.
(higher-order) interaction: part of the analysis looked at the influence of two or more variables in combination on the outcome.
full-factorial: the analysis included the main effects and all higher-order interactions.
covariate: the analysis included a predictor as a main effect to statistically control for its influence on the outcome. The statistician can now report the influence the other predictors have on the outcome above and beyond the influence that is attributable to the covariate.
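These definitions map directly onto R's formula syntax. The sketch below uses hypothetical variable names (y, a, b, c, d) and needs no data; it simply asks R which terms each formula expands to.

```r
# Hypothetical outcome y and predictors a, b, c, d; no dataset is required
# to see how a formula expands into individual terms.

# Full-factorial: a * b * c means all main effects plus every interaction.
full_factorial <- y ~ a * b * c
ff_terms <- attr(terms(full_factorial), "term.labels")
print(ff_terms)
# main effects: a, b, c; two-way: a:b, a:c, b:c; three-way: a:b:c

# Covariate: d enters as a main effect only and joins no interaction.
with_covariate <- y ~ a * b * c + d
cov_terms <- attr(terms(with_covariate), "term.labels")
print(cov_terms)
```

Listing the expanded terms is a quick way to check that a phrase like “full-factorial” in a write-up matches the model that was actually fit.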
There Must be 50 Ways to Share Your Model
When I was in graduate school, I read that Ernest Hemingway encouraged aspiring authors to write the same paragraph in different ways. Although I’ve never confirmed Hemingway as the source of this advice, I believe applying the practice greatly improved my statistical writing. Not only did it allow me to detach from my writing (“murder your darlings,” as Arthur Quiller-Couch suggests), but it also produced a variety of narrative structures and encouraged me to reflect on their strengths and weaknesses. So, in the spirit of Paul Simon, here are (almost) 50 ways to share your model.
For this exercise, I’ll be reporting a mixed-effects linear regression:
Although this model is more complex than an ordinary linear regression, readers can adapt the content here to report simpler models by omitting information about the “fixed and random effects.”
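One way to express such a model in lme4 syntax is sketched below. The structure is my reading of the example write-ups that follow, not a confirmed specification: solar radiation, wind, and temperature crossed in a full-factorial design, day as a covariate, and a random intercept for month. The object names ozone_formula and ozone_model are hypothetical.

```r
# Assumed structure, inferred from the example write-ups:
#   fixed effects : Solar.R * Wind * Temp (full-factorial) + Day (covariate)
#   random effects: an intercept that varies by Month
ozone_formula <- Ozone ~ Solar.R * Wind * Temp + Day + (1 | Month)

# Fit only when lme4 is installed. Under R's default na.action,
# lmer() drops rows with missing values.
if (requireNamespace("lme4", quietly = TRUE)) {
  ozone_model <- lme4::lmer(ozone_formula, data = airquality)
  print(summary(ozone_model))
}
```

Fitting on the raw variables may produce a warning about predictors on very different scales, which is one reason to center and scale before analysis.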
Example One
This write-up is similar to what I see in academic papers. It follows a formulaic structure (analysis, research question, model structure) and uses the passive voice in the past tense (e.g., “was used”). The write-up shares that the analysis contains higher-order interactions, which means that it also includes all the two- and three-way interactions between the variables. The phrase “main effect of day” is set apart to let us know that this variable was not involved in the interactions.
Strengths: the write-up is accurate; the structure will be familiar to academic readers.
Growth Opportunities: the research question could be made more specific (i.e., Why would readers care about ozone?); the model structure could be described more overtly (i.e., referencing the fixed and random effect structure).
Example Two
This write-up inverts the typical structure to emphasize the model structure. It also ends with the research question, making this aspect more salient in the reader’s mind (which may be helpful before reporting the significant findings). Here, the author has used the term mixed-effects instead of multi-level. Both are accurate ways to refer to the analysis, but one may be more familiar to readers than the other.
Strengths: the write-up is accurate; the structure may facilitate reader interpretation by bookending the paragraph with reminders about the research question.
Growth Opportunities: the write-up could be more overt in how it shares that day was only included as a main effect; the model structure could be described more overtly (i.e., referencing the fixed and random effect structure).
Example Three
The statistical jargon of this example produces a more concise write-up: full-factorial simply means that the model included all possible main effects and interaction terms. The example also assumes a certain amount of mind-reading on the part of the reader: it says that it “controls for fluctuations in ozone throughout the month and year” but does not specify how that was accomplished.
Strengths: the write-up is concise.
Growth Opportunities: the write-up could be improved by providing additional information about the model structure; the write-up could be modified for a public audience.
Example Four
This write-up addresses concerns about model structure by introducing new statistical jargon: the term covariate is included to explain how the statistician controlled for variance associated with the time of month. It also clarifies that the model intercept (i.e., the random effect structure) was allowed to vary by month.
Strengths: the write-up is concise; the write-up provides details about all aspects of the model structure; the write-up explains the rationale for various statistical decisions.
Growth Opportunities: the write-up could be more overt about the model structure by mentioning the fixed and random effect structures directly; the write-up could be modified for a public audience.
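For readers who prefer notation, a model with a covariate and an intercept that varies by month can be written out as below. This is a sketch under my own assumptions about the structure (full-factorial solar radiation S, wind W, and temperature T, with day D as a covariate); the symbols are illustrative, with i indexing days and j indexing months.

```latex
\begin{aligned}
\text{Ozone}_{ij} ={}& \beta_0 + u_{0j}
  + \beta_1 S_{ij} + \beta_2 W_{ij} + \beta_3 T_{ij} \\
 &+ \beta_4 S_{ij} W_{ij} + \beta_5 S_{ij} T_{ij} + \beta_6 W_{ij} T_{ij}
  + \beta_7 S_{ij} W_{ij} T_{ij} \\
 &+ \beta_8 D_{ij} + \varepsilon_{ij}, \\
u_{0j} \sim{}& \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

The terms with coefficients β₁ through β₈ make up the fixed effect structure, and the month-specific deviation u₀ⱼ is the random effect structure.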
Example Five
Here, the paragraph’s structure addresses some of the growth opportunities noted above: it mentions the fixed and random effect structures overtly and justifies additional predictors in the model’s structure (see last sentence).
Strengths: the write-up is concise; the write-up includes all information necessary to reproduce the results.
Growth Opportunities: the write-up could be modified for a public audience.
Example Six
This structure emphasizes readability and directs statistically minded readers to supplemental materials. While an academic audience may find this structure lacking, it may be appropriate for boardroom settings or public-facing announcements.
Strengths: the write-up is concise; the write-up is readable.
Growth Opportunities: the write-up may not be well-received by academic audiences.
Other Relevant and Important Statistical Details
It is also important to provide information about any mathematical modifications made during data cleaning. For example, all of the examples provided here could be prefaced with the following:
I conducted statistical analyses in R (R Core Team, 2023) using the tidyverse (Wickham et al., 2019) and lme4 (Bates et al., 2023) packages. All continuous variables were mean-centered and scaled prior to analysis. These steps reduced structural multicollinearity and improved the convergence of the analysis, respectively.
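The centering and scaling step can be sketched in base R as follows. Which columns count as continuous is my assumption here: I treat Month as a grouping variable and leave it untouched. The names continuous and aq_scaled are hypothetical.

```r
# Assumption: these five columns are the "continuous variables";
# Month is treated as a grouping variable and left as-is.
continuous <- c("Ozone", "Solar.R", "Wind", "Temp", "Day")

aq_scaled <- airquality
# scale() subtracts the column mean and divides by the column standard
# deviation, ignoring missing values when computing both.
aq_scaled[continuous] <- lapply(
  airquality[continuous],
  function(x) as.numeric(scale(x))
)

# Each transformed column should now have mean ~0 and standard deviation 1.
round(colMeans(aq_scaled[continuous], na.rm = TRUE), 10)
sapply(aq_scaled[continuous], sd, na.rm = TRUE)
```

Reporting this step matters because coefficients from scaled predictors are in standard-deviation units, which changes how readers should interpret them.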
Closing Thoughts
The language we use to share our statistical results can determine the reach of our work. Unclear analytic approaches will be, at worst, distrusted and, at best, ignored. Clear communication gives our work the widest possible reach and impact. I hope this post illustrates some of the many ways you can describe your statistical analyses.
Happy writing!
Dr. V