In research documented in a forthcoming paper, University of California, Berkeley, Ph.D. student Shiry Ginosar and her team demonstrate how computers could help historians analyze large amounts of visual data, using more than a century of yearbook photos of American high school seniors as their test case. In this video, made by Slate producer Aymann Ismail, you can see how their dataset’s composite images of graduating 18-year-olds change over time.
“We’re always on the lookout for visual datasets where we can separate one thing out of everything else,” Ginosar told me. “Those are always gold mines for us.” The regular nature of yearbook photos—schools have been asking students to face forward and be recorded for posterity since the early 20th century—made them a good candidate for this kind of machine-driven visual analysis, which can catch small variations in repetitive images.
Ginosar looked for images already digitized and available on the Web and found yearbooks uploaded through the Internet Archive (they have a lot) and various public library websites. After the researchers excluded photos that wouldn’t be easily legible by machine, the final dataset contained 37,921 forward-facing portraits from 115 high schools in 26 states. Ginosar stresses that the team tried to regularize the dataset as much as possible, distributing the photos across historical time and geographical region, but adds that the amount of data is still relatively small; caveats about its completeness apply.
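For readers curious how a composite portrait might be built from a set like this, here is a minimal sketch. It assumes each portrait has already been aligned and reduced to a grid of grayscale values, and it simply averages the pixels of all portraits from the same decade. The function name `decade_composites` and the input layout are hypothetical, chosen for illustration; the article does not describe the team's actual procedure.

```python
from collections import defaultdict


def decade_composites(portraits):
    """Average same-sized grayscale portraits by decade.

    `portraits` is an iterable of (year, pixels) pairs, where `pixels`
    is a list of rows and each row is a list of brightness values.
    This layout is an assumption made for the sketch.
    """
    # Group the pixel grids by the decade their yearbook was published in.
    groups = defaultdict(list)
    for year, pixels in portraits:
        groups[(year // 10) * 10].append(pixels)

    composites = {}
    for decade, images in groups.items():
        n = len(images)
        # zip(*images) lines up matching rows across portraits;
        # zip(*rows) lines up matching pixels within those rows.
        composites[decade] = [
            [sum(values) / n for values in zip(*rows)]
            for rows in zip(*images)
        ]
    return composites
```

Calling `decade_composites` on a list of `(year, pixels)` pairs returns one averaged grid per decade, which could then be rendered as an image. A plain average like this blurs anything that varies from face to face and keeps what is shared, which is why it suits regular, forward-facing photos.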
Even with this preliminary dataset, the researchers were able to demonstrate some interesting trends. They created a delightfully named “lip curvature metric” to measure smile intensity, finding that while everyone smiled more as time went on, girls always smiled more than boys. Clusters of photos of female hairstyles, included in the paper, show the prevalence of dominant fashions over time.
I asked Ginosar what other questions future researchers might apply to such a set. She said that eventually a more complete dataset of yearbook photos could answer questions about the changing ethnic makeup of a neighborhood’s population or the geographic spread of fashions within the United States. Sure, people in these photos smile more as the century proceeds, but did the smiling trend start in urban or rural areas? How did changes in women’s hairstyles spread? “If we could collect and process more data,” Ginosar said, “we could normalize our distribution and answer more questions like these.”
Ginosar and her team are looking for submissions of data to add to their set. Her contact information is available here.