There have been quite a few articles out about a new study that claims that running barefoot, as opposed to running shod, improves your memory. I thought I’d take a look at the study to see what it really said, and just what it means to improve your working memory.
Barefooters probably won’t like what I have to say.
It seems there is always some new study coming out about this, that, or the other. And there are a lot of studies about the effects of running (or even walking) barefoot. But not all studies are created equal. Sometimes they come from quackery, as with the “Earthing” studies. But even regular scientific studies that don’t rely on mystical electrical effects are not necessarily rigorous. Also, the news reports regarding the studies often don’t describe them accurately. So I wanted to look at this one.
Some of the articles about the study include Working memory is better after a barefoot run; A barefoot run might be a brain-booster, research shows; and A Barefoot Run Might Be a Brain Booster.
The study itself is An Exploratory Study Investigating the Effects of Barefoot Running on Working Memory, by Ross G. Alloway, Tracy Packiam Alloway, Peter M. Magyari, and Shelley Floyd. It appeared earlier this year in the journal “Perceptual and Motor Skills”. I will note that the study also appeared in Shelley Floyd’s Master’s Thesis, so the paper seems to be the official write-up of that.
The result of the study at least seems plausible. If you are barefoot, there is a good chance that you’ll pay closer attention to how you place your feet, in order to avoid obstacles that you would ignore if you were shod. That extra brain exercise could plausibly improve one’s memory.
It is interesting to note that this study really was about “barefoot”. All too often we see people claiming something was “barefoot” when the subjects were really wearing minshoes of some sort. That is not the case here. “Barefoot” really means barefoot. And “shod” means real running shoes, not some sort of minshoe, either.
All of the participants in the study were runners. However, the study did not note whether any of them regularly ran barefoot or in minshoes, so I think it is probably safe to assume that they did not have experience running barefoot.
Let me next address their “working memory” test. It is basically a backward digit span test: somebody reads out a random list of single digits, and the testee has to recite them back in reverse order. They start with 2 digits and, as long as the testee succeeds in reciting them backwards, they keep adding a digit (with a newly randomized list) until the testee fails. The testee gets two chances (again with a new randomized list) at each length. Their score is the length of the longest list they get right.
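The procedure as described can be sketched in code. (The `recall` callback, the seed, and the “perfect up to 5 digits” testee are my own illustration, not anything from the study.)

```python
import random

def backward_digit_span(recall, rng=random.Random(0)):
    """Run a backward digit-span test.

    `recall` is a callable that takes the presented digit list and
    returns the testee's response (a list of digits).  The span starts
    at 2 and grows by one after each success; the testee gets two
    attempts (each with a fresh random list) at every length.  The
    score is the longest length recited correctly in reverse.
    """
    length, score = 2, 0
    while True:
        for _ in range(2):                      # two chances per length
            digits = [rng.randrange(10) for _ in range(length)]
            if recall(digits) == digits[::-1]:  # must be recited backwards
                score = length
                break
        else:                                   # both attempts failed
            return score
        length += 1

# A perfect "testee" would keep going forever, so cap it for the demo:
perfect_up_to_5 = lambda d: d[::-1] if len(d) <= 5 else []
print(backward_digit_span(perfect_up_to_5))  # → 5
```

With scores coming out as small integers like this, it is easy to see why the averages in the study cluster in a narrow band around 5 to 6.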
Here is the procedure they used for testing barefoot running:
The experiment took place over two days. On the first day, as the runners went around the track they were instructed to step on poker chips that had been placed on the track. On the second day, there were no chips so they were able to run without worrying as much about their foot placement.
Before the running, all runners were instructed in how to run barefoot (presumably about not heel-striking, though that was not stated).
The runners were then divided into two groups: a BF/Shod group and a Shod/BF group. The BF/Shod group had their working memory tested, then ran barefoot for 8 minutes, stopped and had their working memory tested again, ran again (this time shod) for another 8 minutes, and ended by having their working memory tested a final time. The Shod/BF group did the same thing, except that they ran shod first and barefoot second. [They also collected data on speed and heart rate, and kept track of how many chips the runners accidentally missed as they ran. However, there were no interesting results in that data, so I will ignore it for this discussion.]
Quite frankly, this strikes me as pretty odd. Everybody ran barefoot; everybody ran shod. How could they tell a difference? Or maybe the researchers have already decided that working memory changes just don’t stick, so that only what happened last mattered? Seriously, this experiment design drives me nuts—how did they expect to find out anything useful? Well, let’s continue and look at the results.
Here is their money graph:
This is a box plot. Each box contains the middle two quartiles of the data (in other words, the middle half of the data defines the boxes). The “whiskers” show the extremes of the data (the outer quartiles). The line across the middle shows the median.
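For what it’s worth, the quartiles that define such a box are easy to compute. Here is a minimal sketch using made-up sample scores (nothing from the study’s data):

```python
import statistics

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical scores
q1, median, q3 = statistics.quantiles(sample, n=4, method="exclusive")
print(q1, median, q3)  # → 3.0 6.0 9.0
# The box spans q1..q3 (the middle half of the data), the line across
# it is the median, and the whiskers here run from min to max.
```

Note that different plotting packages draw whiskers differently (some cap them at 1.5 times the interquartile range); this paper’s plot appears to use the full extremes.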
The second thing I noticed in this plot is that the data is not well-labeled. Notice that the boxes are labeled B-S2 and B-S3 for each day. The labels don’t seem to be attached to the BF/Shod or Shod/BF categories.
I do understand the minus signs: the plot is described as showing the difference between the baseline (B) working memory test and the final working memory test. As far as I can tell, they just never analyzed anything using the middle working memory test. Even more confusing, in the paper they described the middle working memory test as Session 2 and the final working memory test as Session 3, so one is tempted to think that the S2 and S3 in the plot stand for Session 2 and Session 3, but the text of the paper says otherwise:
Baseline WM scores were subtracted from Session 3 scores and compared as a function of running condition (barefoot first vs. shod first; Figure 1).
Thus, the only thing I can figure is that the first box is Day 1 (chips) BF/Shod, the second box is Day 1 (chips) Shod/BF, the third box is Day 2 (no chips) BF/Shod, and the fourth box is Day 2 (no chips) Shod/BF. That does support the claim of the paper that Day 1 Shod/BF contained the interesting result. (It also agrees better with the numerical data presented in a table—see below.)
Also, while the text claims that the baseline scores were subtracted from Session 3 scores, the plot labels say the reverse (Session 3 subtracted from baseline). Thus, in that plot, any improvement would show as a negative number.
I said that the second thing I noticed was the poor labeling. The first thing I noticed was the incredible spread of the data. How the heck can one come to any sort of valid conclusion with such huge error bars? Yeah, there is something peculiar with the second box, but how can one know if it means anything? (I also notice that the medians in all of the boxes are simply zero—no difference at all.)
Fortunately, they also included a table with some data. Here it is (I truncated the table to remove the uninteresting, to us, heart rate and speed data):
One thing we learn from that is that working memory scores average around 6½ and have a standard deviation of a bit less than 1½ (at least for college students). We also get to see some data about the working memory scores between the two running sessions. For instance, for the BF/Shod group we see a score from right after running barefoot.
But what is so disturbing is that this data reinforces my grave concerns about the spread of the data. Those standard deviations are huge. There is no way that one can realistically get valid conclusions out of that. There is so much overlap that any differences could easily be due to chance.
We can also look at this to see what causes the effect that the researchers claim to have seen.
Their claim of an effect comes from WM1 – WM3 for the Shod/BF Day 1 data. (Note that WM1 is the same as the baseline.) The average score went from 5.06 (WM1) to 5.87 (WM3), an improvement of 0.81. But that all comes from an unusually low baseline! The baseline there is around 5, when all other scores are over 6. The Shod/BF group started with an abnormally low score (though even then that score was within one standard deviation).
Looking further along that Shod/BF line, we also see that the bulk of the improvement happened between WM1 and WM2 (that is, when they had only run shod and hadn’t run barefoot yet). That went from 5.06 to 5.62, an improvement of 0.56. After they then ran barefoot, that added just 0.25 to their scores. Is it a valid claim to say that you can get 2/3 of your memory improvement from just the fear of running barefoot, even if you don’t actually do the barefoot running?
Of course that’s silly. It’s clear that this is just from the spread of the data.
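To put rough numbers on how small these shifts are relative to the spread, here is a back-of-the-envelope Cohen’s d using the means quoted above and the roughly 1.5-point standard deviation from the table. This is my own calculation, not anything the paper did:

```python
def cohens_d(mean_diff, sd):
    """Standardized effect size: mean difference in units of the SD."""
    return mean_diff / sd

sd = 1.5                            # approximate SD of WM scores from the table
full = cohens_d(5.87 - 5.06, sd)    # WM1 -> WM3, the full claimed improvement
shod = cohens_d(5.62 - 5.06, sd)    # WM1 -> WM2, before any barefoot running
print(round(full, 2), round(shod, 2))  # → 0.54 0.37
```

So even the full claimed improvement is only around half a standard deviation, and most of that was already there before the barefoot leg of the run.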
There is no way that they have demonstrated what they claim to have demonstrated. Forgive me, but this is shoddy research (even if it claims a result we would like to believe in).
The researchers ran this data through some sort of data analysis package and claimed some (somewhat) significant results. But I wonder if this is a case of social scientists (who often don’t have much of a math background) not quite understanding how to use the package. They use the results without the bigger-picture understanding needed to sanity-check them. It’s kind of like using a calculator and not being able to recognize that a result cannot be right because its magnitude is obviously off (from having mispunched some numbers). But with those huge standard deviations, there is no way that a validly used analysis package could find significant results.
So, I don’t believe the results at all. The experiment had poor design (the BF/Shod, Shod/BF division), it was poorly reported, and significance was claimed with huge error bars.
Is it still possible that there is an effect? I suppose so, but if so, what I see here suggests that any effect would be quite small and nearly impossible to tease out during any experiment (even a better designed one).
I’d like to believe the claimed results. We’d all like to think that going barefoot is the greatest thing since sliced bread and allows us to leap tall buildings in a single bound.
But I also don’t want to make false claims. And I will not make the claim that it has been demonstrated that running barefoot can increase one’s working memory. It’s just not in the data.