Everyone who spends time with children knows how incredibly much they learn. But how can babies and young children possibly learn so much so quickly? In a recent article in Science, I describe a promising new theory about how babies and young children learn and the slew of research that supports it. The idea is that kids learn by thinking like Nate Silver, the polling analyst extraordinaire at the New York Times.
I suspect that most people who, like me, obsessively click his FiveThirtyEight blog throughout the day think of Nate as a semi-divine oracle who can tell you whether your electoral prayers will be answered. But there is a very particular kind of science behind Nate’s predictions. He uses what’s called Bayesian modeling, named after the Rev. Thomas Bayes, an 18th-century mathematician. The latest studies show that kids, at least unconsciously, are Bayesians, too.
The Bayesian idea is simple, but it turns out to be very powerful. It’s so powerful, in fact, that computer scientists are using it to design intelligent learning machines, and more and more psychologists think that it might explain human intelligence. Bayesian inference is a way to use statistical data to evaluate hypotheses and make predictions. These might be scientific hypotheses and predictions or everyday ones. If you’re Nate, they could be about whether 50- to 60-year-old voters in suburban Iowa prefer Obama or Romney for president. If you’re a 1-year-old, they could be about whether Mom would prefer to eat Goldfish crackers or raw broccoli for a snack. (In my lab, we showed that children learn about such preferences between the ages of 14 and 18 months.)
Here’s a simple bit of Bayesian election thinking. In early September, the polls suddenly improved for Obama. It could be because the convention inspired and rejuvenated Democrats. Or it could be because Romney’s overly rapid response to the Benghazi attack turned out to be a political gaffe. Or it could be because liberal pollsters deliberately manipulated the results. How could you rationally decide among those hypotheses?
Well, if the pollsters deliberately manipulated the results, that would certainly lead to a change in the numbers—in Bayesian terms, there is a high likelihood that the poll numbers will change given deliberate manipulation. If the convention was inspiring, that also usually leads to a rise in the polls—there’s also a high likelihood that the polls will change given a successful convention. It turns out that gaffes, though, especially foreign-policy gaffes, rarely lead to changes in the polls—not a high likelihood there. So conventions and manipulation are more likely to lead to changes in the polls than gaffes are.
On the other hand, it’s much less likely to begin with that the pollsters deliberately altered the polls than it is that the convention was inspiring or that Romney indeed made a gaffe (at least, unless you’ve been watching Fox News). This is what Bayesians call “the prior”—how likely you think the hypotheses are before you even consider the new data.
Combining your prior beliefs about the hypotheses and the likelihood of the data can help you (or Nate) sort through the possibilities. In this case, the inspiring convention idea is both likely to begin with and likely to have led to the change in the polls, so it wins out over the other two. And once you have that hypothesis, you can make predictions. For example, if the poll increase really was the result of the convention, you might predict that it would fade as the convention recedes.
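The weighing described above is Bayes’ rule. Each hypothesis gets a posterior probability proportional to its prior times the likelihood of the data under it, and the results are normalized so they sum to 1. Here is a minimal sketch in Python. The numbers are purely illustrative assumptions chosen to match the qualitative story. They are not Nate’s figures or any real poll data.

```python
# Bayes' rule over a small set of competing hypotheses.
# All numbers are hypothetical, chosen only to match the qualitative story:
#   prior      = how plausible each explanation is before seeing the polls
#   likelihood = P(polls shift | explanation)

def posterior(priors, likelihoods):
    """Return P(h | data) for each hypothesis h: prior * likelihood, normalized."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())  # P(data), the normalizing constant
    return {h: p / total for h, p in unnormalized.items()}

priors = {"convention": 0.50, "gaffe": 0.45, "manipulation": 0.05}
likelihoods = {"convention": 0.8, "gaffe": 0.1, "manipulation": 0.9}

post = posterior(priors, likelihoods)
for h, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{h:12s} {p:.3f}")
```

With these made-up inputs, the convention ends up at about 0.82 and the other two at about 0.09 each. Manipulation has a high likelihood but a tiny prior. The gaffe has a reasonable prior but a low likelihood. Neither can compete with a hypothesis that scores well on both.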
Bayesian reasoning lets you combine these kinds of information and draw conclusions in a precise mathematical way. It is always about probabilities, not certainties. If logical deduction gives you proofs of truths, Bayesian inference tells you the probabilities of possibilities. You won’t jump to an unlikely hypothesis right away, but if enough data accumulate, even an initially unlikely hypothesis can turn out to be right. The “47 percent” gaffe, unlike the Benghazi gaffe, really did seem to move the numbers, but it took especially strong and convincing data to get Nate to draw that conclusion.
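The claim that enough data can rescue an initially unlikely hypothesis can be seen by updating one observation at a time. In this sketch the hypothesis is “the gaffe really moved the numbers.” It starts from a skeptical prior of 0.1, and each new poll showing a shift is assumed to be somewhat more probable if the hypothesis is true (0.7) than if it is false (0.3). All three numbers are hypothetical.

```python
# Sequential Bayesian updating: each observation multiplies the odds for H
# by the likelihood ratio P(data | H) / P(data | not H).
# Hypothetical numbers: H = "the gaffe really moved the numbers".

def update(prob, p_data_if_h, p_data_if_not_h):
    """One Bayesian update of P(H) after observing a single data point."""
    numerator = prob * p_data_if_h
    return numerator / (numerator + (1 - prob) * p_data_if_not_h)

belief = 0.1  # skeptical prior
history = [belief]
for _ in range(5):  # five successive polls, each showing a shift
    belief = update(belief, p_data_if_h=0.7, p_data_if_not_h=0.3)
    history.append(belief)

print([round(b, 2) for b in history])
# prints [0.1, 0.21, 0.38, 0.59, 0.77, 0.88]
```

No single poll is decisive. The hypothesis only becomes the better bet after the third poll, which matches the idea that it takes strong, accumulated evidence to overturn a skeptical prior.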
It turns out that even very young children reason in this way. For example, my student Tamar Kushnir, now at Cornell, and I showed 4-year-olds a toy and told them that blocks made it light up. Then we gave the kids a block and asked them how to make the toy light up. Almost all the children said you should put the block on the toy—they thought, sensibly, that touching the toy with the block was very likely to make it light up. That hypothesis had a high “prior.”
Then we showed 4-year-olds that when you put a block right on the toy it did indeed make it light up, but it did so only two out of six times. But when you waved a block over the top of the toy, it lit up two out of three times. Then we just asked the kids to make the toy light up.
The children adjusted their hypotheses appropriately when they saw the statistical data, just like good Bayesians: they were now more likely to wave the block over the toy, and you could precisely predict how often they did so. What’s more, even though both actions made the toy light up twice, the 4-year-olds, only just learning to add, could unconsciously calculate that two out of three is more probable than two out of six. (In a current study, my colleagues and I have found that even 24-month-olds can do the same.)
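One way to formalize the block experiment is a beta-binomial model. Each action’s chance of lighting the toy is treated as unknown. The learner starts from prior “pseudo-counts” of successes and failures and adds the observed counts. This is a sketch, not the model used in the study. The pseudo-counts that encode the children’s initial preference for touching the toy are hypothetical.

```python
from fractions import Fraction

# Beta-binomial updating for "how likely is this action to light the toy?".
# A Beta(a, b) prior acts like having already seen a successes and b failures;
# after s successes in n new trials the posterior mean is (a + s) / (a + b + n).
# The prior pseudo-counts below are hypothetical illustrations.

def posterior_mean(prior_successes, prior_failures, successes, trials):
    """Posterior mean success rate under a Beta prior and binomial data."""
    return Fraction(prior_successes + successes,
                    prior_successes + prior_failures + trials)

# Touching starts with a favorable prior (mean 3/4); waving starts neutral (mean 1/2).
touch = posterior_mean(3, 1, successes=2, trials=6)  # 2 out of 6
wave = posterior_mean(1, 1, successes=2, trials=3)   # 2 out of 3

print(f"touch: {touch} ({float(touch):.2f})")
print(f"wave:  {wave} ({float(wave):.2f})")
```

Touching drops to 1/2 and waving rises to 3/5, so waving wins even though touching started out ahead. Both actions produced two successes. The difference comes entirely from the denominators, which is the two-out-of-three versus two-out-of-six comparison the children made.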
There are other examples of kids thinking like Nate. All polls depend on the idea of sampling. If you poll a relatively small number of voters in a truly random way, then you can figure out the choices of the other voters you didn’t poll. Even 8-month-olds seem to understand something about sampling. Fei Xu at the University of California-Berkeley and her student Vashti Garcia used a “looking-time” technique, which depends on the fact that babies look longer at unexpected events. The experimenter showed the infants an opaque box; then she closed her eyes, randomly took some colored ping-pong balls from the box, and put them in a small bin. She might take out four red balls, for example, and only one white ball. Then the babies got to see inside the box. Sometimes they saw that the box mostly had red balls in it with just a few scattered white ones, which makes sense if the 4-to-1 sample was truly random. But sometimes the box didn’t match the sample: the babies might see the experimenter take mostly red balls out of a box that was mostly white. Babies consistently looked longer when the box didn’t match the sample than when it did.
The unlikely events in this experiment weren’t impossible: you could, after all, pull mostly red balls from a mostly white box. But they were very improbable. It’s as if the infants said to themselves: “Aha! There’s less than a .05 probability that this occurred by chance!”
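How improbable is “improbable”? The binomial formula gives the chance of drawing exactly four red balls and one white ball in five random draws. This sketch assumes the box is large enough that each draw can be treated as independent. It also assumes, hypothetically, that a “mostly red” box is 80 percent red and a “mostly white” box is 20 percent red.

```python
from math import comb

# Probability of drawing a particular mix of balls at random.
# Assumption: the box is large, so draws are treated as independent
# (binomial sampling). Box proportions are hypothetical.

def sample_probability(n_red, n_white, frac_red):
    """P(exactly n_red red and n_white white in n_red + n_white random draws)."""
    n = n_red + n_white
    return comb(n, n_red) * frac_red**n_red * (1 - frac_red)**n_white

mostly_red = sample_probability(4, 1, frac_red=0.8)
mostly_white = sample_probability(4, 1, frac_red=0.2)

print(f"4 red, 1 white from a mostly red box:   {mostly_red:.4f}")
print(f"4 red, 1 white from a mostly white box: {mostly_white:.4f}")
```

Under these assumptions the matching outcome has probability 0.4096 and the mismatched one 0.0064. That makes the mismatch 64 times less probable and well below the .05 threshold in the infants’ imagined remark.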
The babies seemed to recognize whether the ping-pong ball sample was random or not, but would they use that statistical pattern to actually test hypotheses and draw deeper conclusions? Would they, like Nate, be able to tell the difference between random noise and a genuine indication about how someone would act in the future?
Tamar Kushnir and colleagues did an experiment rather like the ping-pong ball one with 20-month-olds. An experimenter took five toy frogs either from a box that held only frogs or from a box that held almost all toy ducks. Then she left the room, and an assistant gave the child a small bowl of frogs and a separate bowl of ducks. When she came back, the experimenter exclaimed, “Just what I wanted! Can you give me some?” and put her hand out between the two bowls.
When she had taken frogs from a box of all frogs, children were equally likely to give her a frog or a duck. When she had taken frogs out of the box that was almost all ducks, children gave her a frog. The children seemed to have figured out that the experimenter’s choices were just the result of random chance in the first case, but that she would consistently vote for frogs over ducks in the second. They used that discovery to predict what to give her next.
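The children’s two responses line up with a likelihood ratio. The ratio compares how probable “five frogs” is if the experimenter prefers frogs with how probable it is if she grabbed toys at random. This sketch assumes independent draws and, hypothetically, that the “almost all ducks” box is one-fifth frogs. It illustrates the logic and is not the analysis used in the study.

```python
from fractions import Fraction

# Compare two explanations of "the experimenter took out five frogs":
#   random:     she grabbed toys at random from the box
#   preference: she deliberately picked frogs
# Assumptions (hypothetical): draws are independent, and the
# "almost all ducks" box is 1/5 frogs.

def likelihood_ratio(frac_frogs, n_frogs=5):
    """P(n frogs | preference) / P(n frogs | random draws)."""
    p_if_preference = Fraction(1)       # she always picks frogs
    p_if_random = frac_frogs ** n_frogs  # chance of n frogs in a row by luck
    return p_if_preference / p_if_random

print(likelihood_ratio(Fraction(1)))     # all-frog box
print(likelihood_ratio(Fraction(1, 5)))  # mostly-duck box
```

The all-frog box gives a ratio of 1, so five frogs say nothing about her taste, which fits the children splitting their choices. The mostly-duck box gives 3125, which makes a preference for frogs by far the better explanation.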
All this statistical brilliance raises a puzzle, of course. If kids are so smart, why are adults so stupid? Why aren’t we all like Nate? Part of the answer is that Nate Silver has much more data than the rest of us, and he analyzes it with more sophisticated tools than simple Bayesian inference. Moreover, he knows that he’s doing Bayesian analysis, and the kids don’t; they just do it.
But I think there’s something deeper involved. Bayesian inference depends on the balance between “priors,” the beliefs we bring to a problem, and data. As we get older, our “priors,” rationally enough, get stronger and stronger. We rely more on what we already know, or think we know, and less on new data. In some studies we’re doing in my lab now, my colleagues and I have found that the very fact that children know less makes them able to learn more. We gave 4-year-olds and adults evidence about a toy that worked in an unusual way. The correct hypothesis about the toy had a low “prior” but was strongly supported by the data. The 4-year-olds were actually more likely to figure out the toy than the adults were.
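The odds form of Bayes’ rule shows how the same evidence can persuade one learner and leave another unmoved. In this sketch, a learner with weak expectations gives the unusual hypothesis a prior of 0.3, while a learner with strong expectations gives it 0.02. The evidence favors the hypothesis by a likelihood ratio of 10. All three numbers are hypothetical, and the sketch only mirrors the child-versus-adult contrast in spirit.

```python
# Same evidence, different priors.
# Posterior odds = prior odds * likelihood ratio.
# Hypothetical numbers: priors of 0.30 (weak expectations) and 0.02
# (strong expectations) for the unusual hypothesis; likelihood ratio of 10.

def posterior_from_odds(prior, likelihood_ratio):
    """Convert a prior probability to a posterior via the odds form of Bayes' rule."""
    posterior_odds = prior / (1 - prior) * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

evidence = 10.0
weak_prior_learner = posterior_from_odds(0.30, evidence)
strong_prior_learner = posterior_from_odds(0.02, evidence)

print(f"weak prior   (0.30) -> {weak_prior_learner:.2f}")
print(f"strong prior (0.02) -> {strong_prior_learner:.2f}")
```

The learner with weak priors ends up about 81 percent confident in the correct but unusual hypothesis. The learner with strong priors, seeing exactly the same data, stays at about 17 percent.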
As adults, we may hold our pre-existing beliefs, whether they involve the iniquities of Rasmussen or the malice of the MSM, so strongly that they become impervious to data. People often complain about the childishness of American politics. But maybe a bit more real childishness would be a good idea.