In George Bernard Shaw’s Pygmalion, phonetics professor Henry Higgins bets that he can teach Eliza Doolittle, a Cockney flower girl, enough proper English to pass for a duchess. A little over 100 years after Pygmalion’s publication, Microsoft launched a Twitter bot named Tay with the goal of understanding how millennials communicate and, in doing so, created a bot that could pass for a millennial.

Microsoft’s Tay was a Twitter bot, similar to this chatbot a user is interacting with. By Mariscal2014 (Own work) [CC BY-SA 4.0], via Wikimedia Commons

Much as Eliza Doolittle’s speech changed under Henry Higgins’s tutelage, Tay’s language became more millennial-like the more she communicated with the Twitter-sphere. Unfortunately for Microsoft, Tay was not being taught “the rain in Spain stays mainly in the plain” by an acerbic British phoneticist, but by a much more nefarious breed of Internet users: trolls.

Chatbots are a form of artificial intelligence that can imitate human language and interact with humans, and they’ve come a long way since Joseph Weizenbaum’s appropriately named bot ELIZA was parodying therapists back in the 1960s. ELIZA required 200 lines of code, but building Tay was a little more complicated. While Microsoft has not published the details of Tay’s creation, its researchers discussed building a Twitter bot in a paper presented at an Association for Computational Linguistics (ACL) conference last year. Alexander Rush, an assistant professor of computer science at the Harvard School of Engineering and Applied Sciences (SEAS), explained the main steps needed to architect a learning, data-driven Twitter bot like Tay.

You begin by collecting millions of Twitter exchanges. The bot then learns to communicate by playing a game over and over. It takes a tweet from its collection and generates hundreds of candidate responses. It gives each candidate a score based on how likely that response is to replicate the original Twitter exchange. It then answers the tweet with the highest-scoring response and checks how well that response matched the original reply. This is repeated until the bot develops a model for responding to humans and is unleashed onto the world, where it continues its learning process.
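
For readers who like to see the idea in code, the game described above can be sketched in a few lines of Python. This is only a toy illustration and not Microsoft’s method: the four-exchange corpus, the word-pair features, and the simple “reward the real reply, penalize the wrong guess” update are all assumptions made for the example, and the candidate responses are simply drawn from the replies already in the collection rather than generated from scratch.

```python
from collections import defaultdict

# Hypothetical toy corpus of (tweet, reply) pairs; a real bot would use millions.
EXCHANGES = [
    ("how are you today", "doing great thanks"),
    ("what is your favorite food", "pizza all day"),
    ("did you watch the game", "yes what a finish"),
    ("good morning", "morning sunshine"),
]


def features(tweet, reply):
    """Pair every tweet word with every reply word (an assumed, simplistic feature set)."""
    return [(t, r) for t in tweet.split() for r in reply.split()]


def score(weights, tweet, reply):
    """Sum the learned weights of all word pairs; higher means a more plausible reply."""
    return sum(weights[f] for f in features(tweet, reply))


def best_reply(weights, tweet, candidates):
    """Score every candidate and return the highest-scoring one."""
    return max(candidates, key=lambda reply: score(weights, tweet, reply))


def train(exchanges, rounds=5):
    """Play the guess-the-reply game repeatedly, adjusting weights after each miss."""
    weights = defaultdict(float)  # unseen word pairs score 0.0
    candidates = [reply for _, reply in exchanges]
    for _ in range(rounds):
        for tweet, actual in exchanges:
            guess = best_reply(weights, tweet, candidates)
            if guess != actual:
                # Assumed perceptron-style update: reward the real reply,
                # penalize the reply the bot wrongly preferred.
                for f in features(tweet, actual):
                    weights[f] += 1.0
                for f in features(tweet, guess):
                    weights[f] -= 1.0
    return weights, candidates


weights, candidates = train(EXCHANGES)
print(best_reply(weights, "good morning", candidates))
```

After a few rounds the toy bot picks the original reply for every tweet in its collection. The same loop also hints at what went wrong with Tay: a bot trained this way prefers whatever replies its data rewards, so the quality of its conversation depends entirely on the exchanges it learns from.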

Advances in the 1980s allowed machines to “learn” from their exchanges with human users, and progress in the last few years has allowed researchers to architect bots that can train and develop predictive models for human language. Microsoft used these advances successfully in XiaoIce, a chatbot released in 2014 that’s used by millions in China, and probably expected to repeat that success with Tay. Tay’s transformation from teenager to troll occurred not because of any missteps in her design, but rather because Microsoft miscalculated how much mischief Internet trolls can wreak. Going forward, Microsoft will have to greatly expand the number of blacklisted words for its Twitter bots, covering everything from Hitler to Donald Trump.


Special thanks to Dr. Alexander Rush for his time, patience and insightful commentary on Tay, bots and artificial intelligence.

Further Reading

Tay, Microsoft’s AI chatbot, gets a crash course in racism from Twitter (The Guardian)

Learning from Tay’s Introduction (Microsoft Research)

Man < Machine: Computer Beats Top Human Go Player (SITN)

Managing Correspondent

Fernanda Ferreira
