The great social media cleanup of the past two years doesn't seem to have left social media much cleaner. As some of the most powerful companies in the world have struggled to ferret out viral false news, harassers, conspiracy theories, and foreign agents, Congress and the American public have begun to lose faith in internet platforms like Facebook, YouTube, Google, and Twitter. But one social website—the web’s fifth-largest site, by some metrics—has dodged the brunt of the ire: Wikipedia.
One might think Wikipedia would be a top target for those peddling conspiracy theories and counterfactual narratives. Yet somehow, the massive online encyclopedia has managed to retain its reputation for reliability, at least generally speaking. And that’s thanks to a sprawling online network of editors who work for free to pull fact from fiction in crafting the articles that provide the answers that float to the top of Google search queries.
Of course, Wikipedia has its problems. The most glaring may be that the editor community is overwhelmingly male, and likely white too. And that leads to erasures and omissions that reflect the worldview and concerns of the editor community. For instance, when Canadian physicist Donna Strickland won the Nobel Prize in Physics, it turned out Wikipedia didn’t have an entry about her. A draft had been submitted before she won the award, but the editor community hadn’t considered her sufficiently noteworthy to warrant an entry.
To talk more about how Wikipedia, a volunteer-run project, manages to be a sort of second screen for the entire internet, and to delve into some of the problems the community is facing, we spoke with the Wikimedia Foundation’s executive director, Katherine Maher, for Slate’s tech podcast If Then. An expert on technology policy across the globe, Maher discussed the role Wikipedia plays in the current debates about healthy online platforms and what the community is doing to diversify its contributors.
April Glaser: Wikipedia’s the fifth-most-popular website on the internet, according to Alexa rankings from earlier this year. And that popularity is in part due to the fact that the most popular website on the internet, Google, regularly directs people to Wikipedia at the top of its search pages. It’s a symbiotic relationship where people search Google for answers and Wikipedia is the answer that they get. More recently, YouTube and Google have begun linking to Wikipedia to provide info on topics that tend to attract false news and conspiracy theories, in their efforts to be a more reliable source of information. According to Wikipedia itself, there are over 5.7 million articles on the English version of the site thanks to nearly 35 million users, of whom fewer than 200,000 are considered active editors—meaning about 200,000 people make at least one edit a month. There are about 300 active Wikipedias in different languages around the world, with some 48 million articles in all. And this whole project is made possible thanks to volunteers who write the entries and thanks to grants and donations from the readers and the editors who use the site. I want to start by discussing the phenomenon that is Wikipedia, and that it actually seems to be largely correct. Is that correct? Am I correct about that?
Katherine Maher: Yeah. There have been numerous different studies that have shown that Wikipedia is on average as correct as any traditional encyclopedia would be, in part simply because of the volume of articles that we have—that when you do have inaccuracies they tend to be very few and far between. But also they tend to get corrected really quickly, and so as you take a look across the sites, the majority of content is correct at any given time.
Glaser: So if I were to go on there and change the birthday of President Obama, that would get corrected really quickly?
You wouldn’t be able to change the birthday of President Obama because you probably do not have enough of an edit-contribution history to be able to touch an article that is as highly scrutinized as something like Obama. So anytime we have articles that are either of top interest to folks at any given time or are in the news in any given moment, our editors take them very seriously and will protect them to make sure they don’t go ahead and get vandalized. So it would be tough to change his birthday.
Glaser: Who is editing Wikipedia? It seems that everyone wants to use it, but not everyone wants to edit it. And my understanding is that something like 90 percent of the editors are male. I don’t know the racial background of editors, but I think it’s safe to say that most are probably white or come from some kind of white-collar background. So in addition to who’s editing Wikipedia, I’d like to know also the consequences of homogeneity in the edit community.
We don’t actually know much of the background of Wikipedia editors either. We have pretty strict privacy policies. In fact, you don’t need to give us really any information to edit Wikipedia. You don’t even need an account—you can just do it anonymously. And over the years, we’ve tried different ways of surveying and sampling editors to get a better sense. I think our best-case scenario is about 20 percent of the editors identify as female, but worst case would be closer to about 10. And then in terms of ethnic and racial makeup, obviously that really depends based on what Wikipedia we’re talking about. Our Indic-language Wikipedias are primarily going to be edited by people from probably South Asia. But it is true that we tend to assume that folks editing Wikipedia have what we think of as disposable time, and disposable time tends to correlate with higher socio-economic status. How does this play out for Wikipedia? It means that we tend to have biases that reflect the composition of our editors, and I will say that those biases also tend to reflect the broader world around us.
So we talk about ourselves as a mirror held up to the world. Wikipedia is a tertiary source that is based on secondary sources, and when we go to create articles on Wikipedia, we’re very reliant on what’s already been published and what exists in the world. And so if there is a dearth of secondary sources about female scientists or African novelists, it’s going to be very hard for us to then create articles that reflect those individuals on Wikipedia itself.
Glaser: When a new public figure comes on the scene, everyone jumps to visit Wikipedia, it becomes a second screen, and a bunch of editors also jump in to get their version of the truth up there. I wrote about this for Wired a couple years ago when Merrick Garland was nominated [for the Supreme Court]. The traffic of his page soared because nobody really knew who he was unless you’re in the court scene. And then behind the scenes, the editors fought over whether to call him a judicial moderate or a strong liberal. And with so many people coming to Wikipedia for information on Garland, these descriptions really matter. How do Wikipedia editors grapple with attempts to insert their own ideological leanings?
I think that this is one of those things where the more people who have an eye on the Wikipedia article, the more accurate and neutral it tends to be. Don’t take my word for it. There’s been lots of research on this subject. The more volume of traffic, the more likely it is that someone’s going to make an edit, the more editors who are involved in the conversation, the more compact and neutral and accurate the content is going to be, the more citations, the less verbose or adjectival a description is going to be. And so it’s likely in the case of Garland—and I’m not familiar with that particular article and how that moment in time affected its composition—but it’s likely in that case, or in the case of anybody who’s under the spotlight, that if they couldn’t decide on how to describe him, they would either say, “Some people describe as ‘citation, citation, citation.’ ‘Others describe as the opposite, citation, citation, citation.’ ” Or they would not make a determination about how that description actually plays out.
And so Wikipedians will tend to present information and ask you instead to draw your conclusions rather than draw their own inferences or conclusions on topics that are difficult to be neutral around.
Will Oremus: And a lot of that discussion happens on what’s called a talk page.
That’s right. It’s almost like the newsroom behind any Wikipedia article. One of the things we like to say is, “If you’re curious about what a talk page is, go to the article for your hometown and look at the fights that people are having about the history of the town, the town hall, local celebrities, things like this.” Because it can give you the best and most immediate understanding of how talk pages actually work. Talk pages are where Wikipedia editors take the conversation, though not offline: it all happens in public, just behind the curtain. Anyone can click on the talk tab and take a look at it. Anyone can contribute to that discussion, but it’s where these differences of opinion get hammered out while articles might be paused for editing, or while folks are having robust, difficult conversations about how to frame or present something or whether something should be included in an article at all.
Oremus: You talked about how Wikipedia is better when there are more people involved in this editing process. That makes a lot of sense. How is the health of the Wikipedia editor community these days? In what direction is it trending? Is it getting livelier and healthier, or are the ranks thinning out? Is there a crisis of Wikipedia editorship? How’s it doing?
There is no crisis of Wikipedia editorship. Our editors are alive and well. No, I think that there was this interesting moment in time where people were very concerned about the trajectory of editorship, and it happened around, I want to say, 2010. Wikipedia grew very rapidly in popularity between 2001 and 2010, and then what ended up happening was a lot of that original content was filled out, at least in some of the major languages, and we started to see a decline in casual editing. But what is happening is that our numbers have really stabilized to the point where we have about a quarter-million editors every single month, and about 80,000 of those come back month after month and make significant contributions to the site. So overall our editor health is really good. What we would love to see is an increased diversification of that, and we’d love to see some of the languages that are perhaps not as robust as they should be, relative to the size of their speaker populations, grow.
So for us, it’s about maintaining the health of our current editing community but then also thinking about how do we reach people for whom we’re not there yet in their language, in their geography, or representing their sense of identity.
Glaser: I imagine if you come from a community that is not well represented in the editor community that you may be prone to harassment or feel somewhat ostracized in these tightknit talk pages where a lot of difficult conversations happen. So I’m curious about harassment on Wikipedia. Have we seen coordinated attempts to insert ideological bias or to harass people to the point where they stop maintaining certain pages?
Yeah. Absolutely. These things happen, and happen in places that you wouldn’t necessarily expect. I think that the instances of extreme harassment—the kind that you see on some of the other social platforms—we see a lot less of that because Wikipedia has rules around civility that determine whether you can participate as an editor, and if you violate those rules you will get blocked and banned by our community members. I think the bigger issue for us tends to really focus on tonality, so we’ve done some interesting research around conversational failure, and it turns out that if you start a sentence in dialogue with another editor with the word please, it actually is a really high predictor that that conversation is going to fail. Because it tends to be followed by “Please stop doing that” or “Please don’t do something that you don’t know anything about,” and so please is actually not an indicator of a necessarily positive outcome.
So what we’re trying to understand is, in a community and an ecosystem where harassment and unfriendly spaces look very different from harassment on, say, the comment section of YouTube or a Twitter thread, what can we do to facilitate more civil and respectful conversations, when we can’t necessarily automate detection by looking for bad words, for example. And so it’s really about how we create a culture of friendly interaction as opposed to policing certain instances of harassment, because we just don’t have that problem in quite the same way—which is not to say it doesn’t exist. I do want to be really careful to acknowledge that we have had instances of people who’ve been harassed on Wikipedia. It tends to be that somebody gets a bone between their teeth and really goes after an editor or a group of editors. The times that we’ve seen these things happen in a targeted way have tended to be around things that you would expect to be controversial. We were one of the sites enveloped in the whole Gamergate controversy, and we absolutely saw people go head-to-head over what that particular discussion meant, and we had a number of Wikipedia editors on all sides of the conversation who found themselves sanctioned for the way that they participated in those conversations.
Glaser: I write a lot about harassment on social media, and obviously Wikipedia is a social place where people can interact. We do not hear as much about creating a culture where people will be less prone to harass each other. It’s more about moderation, so this is really interesting.
We don’t do moderation in the same way that other social platforms do. We don’t have armies of folks sitting offshore going through content posting trying to determine if it’s harassing language or if it violates our terms of service. Our community, because it is truly a community, engages in that conversation directly, and then they have modes and means of policies to refer conversations for review and sanction as appropriate. I think that harassment is a problem, but for us it is a relatively small problem relative to the challenge of how do you create a truly inclusive space for folks when we come from a certain culture, and we come from a certain demographic background. How do you open that up so that it becomes a place where more people feel welcome?
Oremus: That’s so interesting to hear you say you don’t have moderation. I understand what you mean: You mean that Wikipedia, or the Wikimedia Foundation, isn’t going in and moderating what the editors can say or what people can add to an article. But in another sense, the whole project of Wikipedia is a project of moderation, where people are moderating what each other can say and regulating each other’s speech in various ways. It reminds me, you talked about the social platforms, and it reminds me of the difficulties that the big social networks are having right now with misinformation, conspiracy theories, fake news, all that sort of thing. And they talk about, Well, we can’t be an arbiter of truth. Or maybe in Facebook’s case: We’re trying, but it’s really hard to be an arbiter of truth. Wikipedia is at its core an arbiter of truth—that’s what you guys do. So why do you think they’re having such a tough time with it, and would you have any advice for the people running those platforms?
I think one thing that’s really different from us is, from the beginning, it’s been a community-driven project. We don’t set editorial policies for Wikipedia. The community sets them, and the community has evolved these editorial policies over time in order to assess information quality and also the standards that they want in their spaces, to tie it back to the conversation around friendly spaces and contribution. But specifically for content moderation there are a few really core policies that drive the way that Wikipedia articles are created, and I think the reason that they are effective is that they’re clear, there are only three of them, they’re fairly easy to understand, there are tons of examples for how they work, there are lots of different eyeballs focused on ensuring that those policies are upheld, and it all happens in the open. The policies around accuracy of information require that we cite back to what we call reliable sources. It means that people can’t just put out fringe theories based on what their interests are. They have to find citations and information. It has to be peer-reviewed, or published, or have some editorial scrutiny.
These are the policies that have created a sense of accuracy and accountability on Wikipedia, and accountability not just for the editors but accountability to the public who reads this content. And I think that’s just so completely different from the way these other platforms work. Another thing that I point to is we don’t have divergent forking narratives or feeds that you sign up for. When you come to a Wikipedia article, you’re looking at the exact same thing whether you’re sitting on the other side of the continent from me or if you’re sitting in the next office over. That doesn’t afford us the space to shift narratives based on what your interest is or what an algorithm suggests that you might like. We have to be open and publicly accountable for what is published no matter what your perspective or viewpoint actually is.
It’s funny you refer to Wikipedia as an arbiter of truth. We actually don’t agree with that characterization. What we would say is that Wikipedia reflects knowledge as it exists at any given moment in time. That is, knowledge is constantly being constructed, and it’s constantly being deconstructed. And so edits are made to Wikipedia, content is removed from Wikipedia, knowledge changes dramatically over time, and what Wikipedia offers is just an aggregate understanding of what we know about a topic at any given moment based on what’s been published or what common consensus says. I always use the example of Copernicus or Galileo. However many hundreds of years ago, had they written an article, we’d have some really strong articles about how the sun revolves around the Earth. But we, as hopefully humanity, have learned a lot more about our solar system, and now we know that the Earth revolves around the sun. So knowledge is a living thing, but it’s not necessarily about trying to get to some understanding of truth. It’s more just a representation of what we can all agree on at any point in time.
Where I start to find this really powerful is less on things that are settled, like heliocentrism versus geocentrism, but more about how our history—and understanding of culture, and understanding of politics, and understanding of representation—is constantly evolving. Wikipedia’s edited 350 times a minute, which essentially means that every minute there are 350 opportunities to challenge what it is that we know, and how it is that it’s been assembled, and who has contributed to that knowledge base, and whose voices are included, and how it is that we might change that over time. So I think of Wikipedia not as an arbiter of truth but really a living contestation for how knowledge is formed and created, which is why we always say, “Don’t trust Wikipedia. Read it with a critical eye. Check the citations. And if you see something, contribute to it.” Because the way that we form knowledge is by contributing to it together and building on what’s come before.
Glaser: It’s true that there are all of these ways to gain social clout and to gain social trust in the Wikipedia community, but sometimes people edit Wikipedia anonymously, or it’s their first time, and sometimes it’s funny. And it’s something that you guys call vandalism. And I want to ask about that, because I saw this meme going around a few years ago—I think it was an actual screenshot from Wikipedia, and it was edited to say that Charlie Sheen was half-man, half-cocaine. And it was changed quickly back, I’m sure, but I think this happens a lot, and I’m curious: How does Wikipedia contend with vandalism, and is there any particularly funny one that comes to mind?
Just the other day, somebody tweeted about the fact that there’s a Wikipedia article that’s a list of fictional states, and nestled in there was Wyoming, a fictional state made up for tourism revenue by Idaho. I thought that was funny. As soon as I retweeted it, it was gone. But we often see these sorts of vandalism. I think that they are funny. I also know that they can be quite annoying for our Wikipedia editors. I like to think of them as ways of demonstrating that Wikipedia is a living project and as a reminder to folks that you can go ahead and get in there and edit. We actually have more Wikipedians than we’d like to admit who first started because of vandalism. They came in to mess with the site, and they realized, “Oh, that vandalism didn’t stay up for very long. I’m curious how that works,” and then got involved in that way. In general, there are different ways that vandalism works. I don’t think this will surprise anyone, but some of the highest volume of vandalism tends to happen during school hours, and it tends to be bad words.
But we have bots that scrub the site from end to end and remove instances of poop that shouldn’t belong in a sentence and the like. The other forms of vandalism tend to get reverted very quickly. Congress is notorious for vandalizing Wikipedia; in fact, I believe congressional IP addresses are blocked from editing this week because people abused their privileges. And we’ve seen that happen. Wikipedia editors tend to keep a very close eye on what we call our recent-changes feed, and there are folks who consider themselves just to be vandalism patrollers who are always looking for things that are a little suspicious. We give this a boost by having machine-learning systems that can identify whether an edit is likely good or bad and help editors triage, in order to keep pace with the 350 edits a minute. Because it’s a pretty huge volume of activity on the sites at any given time.
Glaser: With Wikipedia now serving as a fact-checker for YouTube’s most polarizing conspiracy-theory videos, is there a fear that people will see these videos about how climate change isn’t real and then click on the Wikipedia link and edit the article to incorporate the counterfactual information they just saw?
This was something that we were concerned about. Obviously any time a major platform turns the worst of the internet against our sites, we worry about what the implications are for our editing community. Our editing community actually took it all in stride. They said, “We’ve got means by which we monitor these pages. We know how to deal with vandalism. We’ve been doing this for 17 years. We’ll let you know when it’s a problem.” And we went back and said pretty much the same thing to YouTube. We said, “We’ll let you know when it’s a problem, and if it does become a problem we’d appreciate some support around this. But overall it seems as though it’s something that is working out.” Our mission overall is to get knowledge out there and to be the correct place for information—to have it be as accurate as possible. And if it is a tool in the arsenal of ensuring a more accurate and fact-based internet, then I think we’re probably all for it.
Glaser: It’s fascinating how the community is able to morph and absorb more responsibility as more people start to use the internet and as more large platforms start to rely, and continue to rely, on Wikipedia as a source of information.
I think what I would say there is that because it works in this instance, it doesn’t mean that we’re going to be the catchall for all of the worst bits of the internet. In reality, anytime that there is an intermediary layer between people who are reading Wikipedia content and where the content itself is created, we see that as a risk. Part of the promise of Wikipedia is that anybody anywhere can go in and check where does the information come from, when was it added, what’s the edit history, what’s the discussion on the talk page, where does the citation go, and so whenever there’s an intermediary layer that sits between our readers and our contributors, we view that as a breakdown of the trust and the promise model that Wikipedia offers in terms of accountability and transparency, but we also view it as a risk factor to the sites as a whole because of how Wikipedia works when people stop by to read it. Wikipedia works when a reader’s like, “Oh, I think that information isn’t accurate. That probably needs an update,” or “That probably could use a different citation.”
And so the volume of traffic to our sites is actually the way in which Wikipedia stays up to date and makes sure that our content is constantly expanding, and so if that information is being siphoned off and presented in different ways and in different places, that actually does create a risk for us. So I think there’s a tension between how do we make sure information’s available in the most useful ways, such as the referral back with the YouTube videos, but also making sure that information is not taken completely out of context and presented in a way that ultimately chokes off the way that Wikipedia works.
Oremus: You actually just answered part of the question that I was going to ask, which is: Part of the future of computing on the internet seems to be this move toward voice assistants. Whether it’s on a smart speaker, like an Amazon Echo or Google Home, or it’s Siri on your phone, or talking to the Google Assistant on your phone, for a lot of those, when you ask Siri or Alexa or Google a question, the answer you get will be content from Wikipedia. So that’s a way that people are presumably going to be using Wikipedia more and more, but maybe not even know that they are using Wikipedia—and certainly not visit the site, where they might run into a fundraising appeal or get involved in that way. Is that a big concern for you going forward, and are you hopeful that donations from those big platforms … I saw that Amazon recently gave $1 million to the Wikimedia Foundation maybe partly for that reason.
Does that have to become more of your business model now if these platforms are going to be siphoning off your information or siloing your information in that way?
I have many thoughts and responses to your question. One thing to note is that there’s the immediate value that these platforms get out of having Wikipedia as a resource from which they can pull answers and information to provide to their users. But the other part that most people aren’t as aware of is that Wikipedia is also this massive computational resource for many different platforms in terms of the way that they’re developing machine learning, the way that they’re training their A.I. assistants, and the way that they approach natural language processing. And so we view ourselves as a resource that should be supported by industry as a whole—not just because we provide a transactional value to them, because Amazon or Siri or whatever can answer your question, but because we’ve actually created a tremendous resource just in terms of data modeling and support that these companies can go out and train and do advanced computational science around. And there is nobody contributing back in proportion to the value we’ve created in that space.
The reason that we think that should engender long-term support is because we are essentially the commons as a resource for the entire internet that entire business models have been created around. And if you don’t actually support that commons, it’s not going to exist at some point, and that’s going to be really problematic for a business model that depends on its existence, particularly as companies are pushing into new and different markets. It’s increasingly the case that they’re looking to sites like Wikipedia, which have content that’s available in those local languages as a baseline to assess the market maturity, and whether people are using the open web, and whether they’re creating content in those languages. So overall supporting Wikipedia so that we’re out there and accessible to more people and accessible to more users in more languages, and so that our content is diverse and reflective of the entire world and not just North America or the male experience, is ultimately a good thing, not just for us, and not just for our readers, but for the internet as a whole.
In terms of what that support looks like, I think that it’s not just about monetary support. We are very proud of the fact that 85 percent of our donations come from small-dollar donations from individual users. The average donation is $15, and the remaining 15 percent tends to be more traditional foundation donations. We don’t want to be entirely beholden to large corporations giving us money, but we do feel as though having some sort of sustainable model of support, whether it is engineering support, or in-kind support, or just thinking about what the product decisions actually mean in terms of their implications for how people can contribute to and access Wikipedia, that’s the conversation that we want to be having with these different platforms. At the end of the day, we create a tremendous amount of value in the world, and we want to make sure that value is being recognized and being supported and sustained, because it’s very easy to make a series of decisions that in aggregate could really damage that value, and I don’t think anybody intentionally wants to hurt Wikipedia.
Glaser: No, Wikipedia has a tremendous amount of value for so many different parties. Can you tell us about efforts to expand Wikipedia in other languages? Also, what’s the second-biggest language after English? And is Wikipedia available in China?
So there are a whole host of Wikipedias in smaller languages. They tend to be secondary languages within countries, or indigenous languages, or noncolonial languages, and we at the Wikimedia Foundation actually place a great deal of emphasis on supporting communities that are doing work in these languages. We have grants that are available to community members who are organizing and doing events around outreach and growth of smaller-language projects. We call them our emerging community projects. And the whole idea is that we don’t pick winners and losers among languages. The fact that Spanish and German and English and French are enormous doesn’t mean that that is sufficient to cover the entire world. We want people to be able to access content in Zulu and in whatever the indigenous languages and identities are through which they’re seeking information.
We know that’s actually critical to the way that cultures continue to live, the way that identities continue to live. There’s the example of the Welsh Wikipedia, where the national government of Wales has a Wikipedian in residence who is there just to make sure that Welsh continues to be a living, thriving Wikipedia language, in large part because they know that having Wikipedia exist in Welsh is actually a marker as to whether other internet services will index Welsh as a living digital language. And of course we can all understand that if your language goes away or doesn’t exist in a digitized form, your identity starts to go away too. So we’re really supportive of this. We think this is part of our cultural diversity in recognizing that the sum of all knowledge requires all sorts of different understandings of knowledge, all sorts of linguistic bases for knowledge.
Glaser: Is Wikipedia available in China?
Wikipedia is not currently available in China or in Turkey. But of course Wikipedia does continue to be built by Chinese speakers. We have Chinese speakers who are outside of China, in Taiwan and Hong Kong of course, but also other Chinese speakers, including folks from mainland China, who edit Wikipedia from outside the country as well as from inside the country using circumvention technologies. So the Chinese Wikipedia is not as large as we would like it to be, relative to the number of Chinese speakers, but we remain optimistic that it will continue to be built and will be there if and when China ever decides to unblock us. We would love that.