We’ve all asked questions about the world around us. For instance, just last week I wondered whether ibuprofen was the best medication to soothe my feet, which ached from 10 miles of walking in Chicago. My partner and I rode the L and debated whether it would be healthier to adopt a vegetarian diet. When we arrived back at our Airbnb, we pondered which documentary we’d be most likely to enjoy on Netflix.¹ These are just a few of the many questions we asked ourselves during our weekend in the city - and they’re a small fraction of the questions that all the readers of this post could generate!
Usually, when we try to answer questions about the world around us, we consider stories about our own experiences or those of the people we know. What happened the last time I took ibuprofen? Do I know anyone on a vegetarian diet? This anecdotal evidence provides an accurate representation of a single person’s experiences and helps us to make sense of the world around us. If another person - one exactly like us or the people we know - were to encounter an identical circumstance, this anecdotal evidence could serve as a good prediction of what might happen to them. Sometimes, though, we want to apply the available evidence to answer a question about another person.
Limitations of Anecdotal Evidence
Two things make it hard to use anecdotal evidence to answer questions about other people: limited generalizability and measurement error.
Generalizability
You - and the people you know - probably have more in common with one another than you do with people you don’t know. Maybe you’re around the same age or live in the same country. You might enjoy the same hobbies or have the same religious beliefs. All of these factors - and others, both seen and unseen - affect your experiences and the way you view and interact with the world around you. When we ask questions about the world around us, it’s important to make sure that our conclusions generalize well. When they do, they serve as a good prediction of what could happen to other people.
You probably intuitively understand the idea of generalizable results. Let’s imagine that you’re looking after both a snake and a baby. Research has shown that snakes really like to eat live mice. Should you feed the baby a live mouse? Of course not! Research about snakes’ favorite foods doesn’t generalize to humans because we have different taste preferences and dietary needs. If we wanted to answer questions about babies’ favorite foods, we would need to study this by collecting evidence from infants. And given how picky babies can be, we would probably want to gather this evidence from as many different infants as possible. By doing so, we could gather evidence from a representative sample (i.e., many different infants) of the population of interest (i.e., all babies around the world).
In the same way, when we answer research questions - or read about other people who did - it’s important to consider who the evidence was collected from and whether this group includes the people the conclusions are being drawn about. We call this “collecting a representative sample from the population of interest.” This information is usually found under the “participants” heading in the method section of a research paper or technical report. For example, this method section taken from my own paper makes clear that my conclusions apply to college students around 22 years old who take psychology classes (Vangsness et al., 2022). I wouldn’t want to use this evidence to draw conclusions about mid-career professionals or retired adults.
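If it helps to see the idea in numbers, here’s a minimal sketch in Python. The ages are made up rather than drawn from any real study; it simply compares a convenience sample (only the college-aged people we happen to know) with a random draw from the whole population:

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 10,000 adults (made-up numbers).
population = [random.randint(18, 90) for _ in range(10_000)]

# Convenience sample: the first 200 college-aged people we bump into.
convenience = [age for age in population if age <= 24][:200]

# Representative sample: 200 people drawn at random from everyone.
representative = random.sample(population, 200)

print(f"Population mean age:     {statistics.mean(population):.1f}")
print(f"Convenience sample mean: {statistics.mean(convenience):.1f}")  # way off
print(f"Random sample mean:      {statistics.mean(representative):.1f}")  # close
```

The random draw lands near the population average, while the convenience sample misses it by decades - which is exactly why it matters who the evidence comes from.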
Generalizability has been a major concern in scientific research, especially as it relates to marginalized populations, who are less likely to be included in a sample. For example, women (Daich et al., 2022) and people of color (Kwiatkowski et al., 2013) were historically left out of clinical trials, which meant that medications, devices, and treatments were sometimes less effective for these populations. Fortunately, the American Medical Association has made it a mission to reverse this problem and now recommends that all clinical trials include diverse representation from the population of interest (AMA, 2014).
Measurement Error
Many things can influence the way that humans provide evidence. Think back to the last time you had to tell someone how much you weighed. Did you give an honest response? Or was your answer influenced by a desire to appear larger or smaller than you actually are?² Did the weight you shared reflect your weight with your clothes and shoes on or off? Do you think that your weight would be the same if you used a different scale, or after you drank a lot of water? These are just a few examples of the many ways that measurement error can creep into the evidence that we collect about weight. Anecdotal evidence rarely takes this measurement error into account.
To reduce measurement error, it’s important to engage in a careful, systematic process when collecting our evidence. This means taking steps to improve internal validity by removing or reducing the influence of factors that could distort our measurements. For example, we might promise anonymity to the people we collect data from to ease concerns about being identified. We could also make sure our information is always collected in the same way (e.g., on the same scale, with clothes and shoes on). It also means choosing measures with high construct validity - ones that best represent what we want to know about (e.g., asking people to report their weight in lbs rather than the size of their favorite blue jeans). Think about it for a few minutes: can you come up with five other things that would affect the accuracy of a weight measurement?
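While you brainstorm, here’s a toy simulation in Python that makes the idea concrete. The “true” weight and all of the bias terms are invented for illustration; the point is simply that standardizing how we measure shrinks both the bias and the spread of our measurements:

```python
import random
import statistics

random.seed(1)  # reproducible

TRUE_WEIGHT = 150.0  # lbs; an invented "true" value for one person

def casual_report():
    # Unstandardized: different scales, clothes on or off, a social fudge.
    scale_bias = random.uniform(-3, 3)    # every bathroom scale reads a bit differently
    clothing = random.choice([0.0, 2.5])  # sometimes measured with shoes and clothes on
    fudge = random.uniform(-5, 0)         # shaving a few pounds off the answer
    return TRUE_WEIGHT + scale_bias + clothing + fudge

def standardized_measure():
    # Standardized: same calibrated scale, same conditions every time.
    return TRUE_WEIGHT + random.gauss(0, 0.5)  # only small random noise remains

casual = [casual_report() for _ in range(100)]
careful = [standardized_measure() for _ in range(100)]

print(f"Casual reports: mean={statistics.mean(casual):.1f}, sd={statistics.stdev(casual):.1f}")
print(f"Standardized:   mean={statistics.mean(careful):.1f}, sd={statistics.stdev(careful):.1f}")
```

In the first set of reports, the average is pulled below the true value and the numbers scatter widely; in the second, both problems nearly disappear.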
When we (or others) answer research questions, it’s important to provide information about how we collected our evidence. Since gathering evidence is an involved process, this information often fills large parts of the method section, under headings like “materials” or “procedure.” For example, the procedures reported in my own paper take up three single-spaced pages (Vangsness et al., 2022)! Here are just five of the things I considered when collecting my evidence:
using questions from researcher-validated surveys;
presenting the surveys, and the questions within them, in a random order (see the sketch after this list);
including special survey questions to make sure participants were paying attention;
allowing people to respond on the same 1-7 scale so I could compare responses across surveys; and
evenly administering surveys over a 16-week period so that many different people could respond.
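To make a couple of those steps concrete - randomized presentation order and attention checks - here is a rough Python sketch. The survey items, the attention-check wording, and the build_session helper are all hypothetical, not taken from the actual study:

```python
import random

# Hypothetical items; a real study would use validated survey questions.
survey_a = ["I enjoy trying new things.", "I plan ahead carefully."]
survey_b = ["I often feel rushed.", "I double-check my work."]
ATTENTION_CHECK = "To show you are paying attention, please select 3."

def build_session(seed=None):
    """Assemble one participant's session: surveys and questions in random order."""
    rng = random.Random(seed)
    surveys = [survey_a[:], survey_b[:]]  # copy so we don't shuffle the originals
    rng.shuffle(surveys)                  # randomize the order of the surveys
    for survey in surveys:
        rng.shuffle(survey)               # randomize the questions within each survey
    questions = [q for survey in surveys for q in survey]
    # Slip the attention check into a random position.
    questions.insert(rng.randrange(len(questions) + 1), ATTENTION_CHECK)
    return questions

for question in build_session(seed=7):
    print(question)  # each question would share the same 1-7 response scale
```

Seeding the generator per participant also makes each person’s question order reproducible, which helps when you need to audit or debug the procedure later.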
Empirical Evidence
As you can see, there are many factors to consider when collecting evidence! In fact, entire research methods classes are dedicated to teaching people how to go about it. We don’t have space to go into all those details; for now, it’s enough to know that empirical evidence generalizes to the population of interest and is collected in ways that reduce measurement error.
As you can imagine, empirical evidence is harder to obtain than anecdotal evidence. For one, it takes time to gather information from people beyond ourselves and those we know. It can also be costly - in both effort and money - because we need to be very careful about how we gather that information. Despite these drawbacks, empirical methods are the best way to ensure that the evidence we collect can answer questions about other people.
Continue the Journey
Want to test your knowledge of the topics featured in this article? Take our five-question quiz!
Want to support more blog posts, practice questions, and featured content? Buy me a coffee.
¹ For what it’s worth… (1) yes, this was a great choice; (2) we decided to incorporate vegetarian meals because we like them, health benefits aside; and (3) we are currently enjoying The Turning Point: The Cold War.
² Fun facts for the next time you’re at trivia: women tend to under-report their weight, while men tend to over-report it (Johannson et al., 1998). Additionally, women who are underweight tend to over-report, and those who are overweight tend to under-report (Lin et al., 2012).