The head of global security for Pfizer Inc. told Marketplace last month that “Viagra is probably the number one spammed product out there.” This statement reminded the Explainer of one of the column’s “Unanswered Questions” from 2010: Do they have special spam filters at Pfizer?
Not exactly. All spam filters are customized to a certain extent based on usage patterns; beyond that, Pfizer seems to employ the same old bag of tricks. The pharmaceutical company told the Explainer that 80 percent to 90 percent of email sent to employees is filtered out as junk, which is about par for the course. Presumably, the legitimate messages Pfizerites receive contain more spammy words, like Viagra, sale, and penis, than the typical office inbox does. To make sure these get through, the company says it “uses a mix of pattern matching, content matching, and sender authenticity measures.” That last technique is probably the first line of defense: white lists that let through email from approved senders (like those with addresses ending in @pfizer.com), no matter the content, and blacklists that jettison email from known spam offenders.
There are two main types of spam filter: source-based filters, like white lists and blacklists, and content-based filters, which examine the message itself, identifying suspicious phrases (cheap rolex), patterns (mortgage appearing near pre-approved), or attachments (e.g., those with file names ending in .exe). Content-based filters depend on specialized dictionaries and also learn from user preferences (when users mark a message as “spam” or “not spam”).
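For readers who think in code, the two filter types can be sketched in a few lines of Python. This is only an illustration of the ordering described above (source checks first, content checks second), not Pfizer's or any email provider's actual system. The function name classify, the lists, and the example addresses are all made up for the sketch, and a real filter would verify that a sender is genuine rather than trusting the address on the message.

```python
import re

# Hypothetical lists and rules, chosen only for illustration.
ALLOWED_DOMAINS = {"pfizer.com"}            # white list
BLOCKED_SENDERS = {"deals@spam.example"}    # blacklist
SUSPICIOUS_PHRASES = ("cheap rolex",)
RISKY_EXTENSIONS = (".exe",)

# "mortgage" within a few words of "pre-approved", in either order.
NEAR_PATTERN = re.compile(
    r"\bmortgage\b(?:\W+\w+){0,5}?\W+pre-approved\b"
    r"|\bpre-approved\b(?:\W+\w+){0,5}?\W+mortgage\b",
    re.IGNORECASE,
)


def classify(sender, body, attachments=()):
    """Return "inbox" or "junk" for a message (illustrative rules only)."""
    sender = sender.lower()
    domain = sender.rsplit("@", 1)[-1]

    # Source-based checks run first and ignore the content entirely.
    if domain in ALLOWED_DOMAINS:
        return "inbox"
    if sender in BLOCKED_SENDERS:
        return "junk"

    # Content-based checks examine the message itself.
    text = body.lower()
    if any(phrase in text for phrase in SUSPICIOUS_PHRASES):
        return "junk"
    if NEAR_PATTERN.search(text):
        return "junk"
    if any(name.lower().endswith(RISKY_EXTENSIONS) for name in attachments):
        return "junk"
    return "inbox"
```

In this sketch a message from an @pfizer.com address reaches the inbox even if it mentions a cheap Rolex, because the white list is consulted before the content is ever read.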
An unspoken rule for spam-filter designers is that “people should be able to talk about Viagra if they want to.” For this reason, major email providers, such as Gmail, use tons of different signals—not just keywords—to determine whether a message is spam. So just because your biochemistry professor’s latest email uses the word Viagra doesn’t mean the message is heading straight to your junk folder.
Even so, Pfizer faces a big challenge getting its outbound email into its recipients’ inboxes. Because so many filters learn user preferences, marking a scam Viagra email as spam increases the chance that an email from Pfizer about the drug will also end up labeled as junk. There’s not a lot Pfizer or any other company can do to avoid this, aside from making its own emails as different from spam as possible. Techniques include reminding the recipients that the message is an advertisement they once requested (in some fashion) and adding a prominent “unsubscribe” link.
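The learning effect can be shown with a toy word counter. The sketch below is a simplified, Bayesian-flavored scorer written for this explanation; the class name TinyFilter, its methods, and the smoothing constants are assumptions, and production filters weigh many more signals than single words.

```python
from collections import Counter


class TinyFilter:
    """Toy per-word scorer that learns from "spam" / "not spam" marks."""

    def __init__(self):
        self.spam_counts = Counter()   # messages marked spam containing each word
        self.ham_counts = Counter()    # messages marked not spam containing each word
        self.spam_msgs = 0
        self.ham_msgs = 0

    def mark(self, text, is_spam):
        # Count each distinct word once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_msgs += 1

    def word_spamminess(self, word):
        # Smoothed share of spam vs. non-spam messages containing the word.
        # 0.5 means no evidence either way; values near 1.0 look spammy.
        word = word.lower()
        p_spam = (self.spam_counts[word] + 1) / (self.spam_msgs + 2)
        p_ham = (self.ham_counts[word] + 1) / (self.ham_msgs + 2)
        return p_spam / (p_spam + p_ham)


f = TinyFilter()
before = f.word_spamminess("viagra")
f.mark("cheap viagra now", is_spam=True)
after = f.word_spamminess("viagra")
print(before < after)  # the word now counts against any message that uses it
```

Each “spam” click on a scam message nudges the word's score upward for every later message that contains it, including a legitimate one; a “not spam” click on a genuine message pulls the score back down.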
Some spammers, bordering on forgers, now try to replicate the look and wording of legitimate pharmaceutical advertisements, say by including a fake “unsubscribe” link. And although not even the most cleverly worded spam can get through if the sender’s email address or IP address lands on one of the many popular blacklists, especially ruthless spammers hijack “clean” computers to send junk mail by proxy. (In fact, this reportedly happened to a slew of Pfizer computers in 2007.) On the other hand, the cat-and-mouse game spammers play can sometimes benefit legitimate senders. To evade filters, spammers often use creative corruptions of blacklisted words, such as v|agra or vi@gra. Filters that learn from what they see then come to associate the misspellings, rather than the correctly spelled word, with junk, which makes nonspam emails that spell Viagra properly more likely to find their recipients.
Explainer thanks Jennifer Kokell of Pfizer Inc.; Pradeep Kyasanur of Google; and Jonathan Zdziarski, author of Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification.