By Paul Flamburis
It’s that time of year again: Babel Street Hackathon 2022 has come and gone, and it’s safe to say it was a resounding success! Now, for the first time, we are sharing the story of each team’s project. First up is a Rosette®-integrated AI agent trained to play text adventure games. This project comes from Team 7, also known as TAG Team, a mix of Babel Street employees and co-ops.
A sort of precursor to the modern video game, a text adventure game is one in which the player navigates a world and completes goals using only a text-based user interface. If you’d like an example, you can play The Hitchhiker’s Guide to the Galaxy, a famous and famously difficult text adventure game from 1984.
While AI agents can be trained to play many computer games remarkably well, they are notoriously bad at playing text adventure games. An agent playing this type of game has to generate valid input (such as “take key”), make choices that result in game progress, and make multiple correct choices in the correct order to fulfill a larger goal. We human players take these things for granted, but how does an AI agent learn, for example, that it needs to put food inside an oven before the command “cook food with oven” becomes valid? Furthermore, if you change the oven to a campfire, you now need to put food over it to cook it. Ovens and campfires can serve the same function, but they are not synonymous. An AI agent must understand the difference between them to interact with each one correctly.
Teaching AI to play text games might seem like all play and no work, but it’s actually a great way to train AI to use language more meaningfully: these games require AI agents to understand not only what words mean, but how the things they represent can interact. In fact, TAG Team’s game of choice, TextWorld, was chosen for this very reason. TextWorld is an open-source text adventure game generator created by Microsoft Research Montreal in 2018 for the explicit purpose of “train[ing] and test[ing] AI agents in skills such as language understanding, affordance extraction, memory and planning, exploration and more” using reinforcement learning.
It works like this: the team uses TextWorld to generate a number of text games. They then train their AI agent by letting it play these games. During training, the agent’s performance doesn’t matter; reinforcement learning is a trial-and-error process by which an AI learns how to make decisions that maximize reward. The team then uses TextWorld to generate new games for the AI to play; its performance on these games indicates how “smart” it is after being trained. TAG Team set out to execute this process twice: once with the default agent, and again with their own modified version.
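To make that loop concrete, here is a minimal sketch of how a single episode might be run against a generated game using TextWorld’s gym-style Python interface. The random policy and the game file path are placeholders for illustration, not the team’s actual agent or setup:

```python
import random

import gym
import textworld
import textworld.gym


def play_episode(game_file, policy, max_steps=50):
    # Ask TextWorld to expose the list of valid ("admissible") commands
    # each turn, so the policy has something to choose from.
    request_infos = textworld.EnvInfos(admissible_commands=True)
    env_id = textworld.gym.register_game(
        game_file, request_infos=request_infos, max_episode_steps=max_steps
    )
    env = gym.make(env_id)

    obs, infos = env.reset()
    score, done = 0, False
    while not done:
        command = policy(obs, infos)                 # agent picks a command
        obs, score, done, infos = env.step(command)  # game responds with text
    return score


# Stand-in "agent": pick a random admissible command each turn. A real
# reinforcement-learning agent would learn to prefer rewarding commands.
random_policy = lambda obs, infos: random.choice(infos["admissible_commands"])

# "games/train_01.z8" is a placeholder for a game generated with TextWorld.
print(play_episode("games/train_01.z8", random_policy))
```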
What made TAG Team’s agent special was integration with Rosette® Base Linguistics (RBL), specifically the part-of-speech tagging feature. RBL uses statistical modeling to identify a word’s part of speech during the lemmatization process. With this upgrade under the hood, the team’s special agent identified all nouns, verbs, and adjectives in the game’s response text (results of the last command, what the room looks like, things like that). It then filtered out all other words before passing the filtered text to the agent as input for its next choice. The team’s hope was that this would reduce “noise” and allow the agent to look only at the important information, making it more efficient.
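As a rough illustration of that filtering step, the sketch below keeps only nouns, verbs, and adjectives. The `toy_pos_tag` tagger is a made-up stand-in for the RBL call; the real RBL interface is not reproduced here:

```python
KEEP = {"NOUN", "VERB", "ADJ"}


def filter_observation(text, pos_tag):
    """Keep only nouns, verbs, and adjectives from game response text."""
    return " ".join(tok for tok, pos in pos_tag(text) if pos in KEEP)


# Toy tagger for demonstration only; the project used RBL's statistical
# part-of-speech tagging instead of a hand-written lexicon like this.
def toy_pos_tag(text):
    lexicon = {"see": "VERB", "rusty": "ADJ", "key": "NOUN",
               "wooden": "ADJ", "table": "NOUN"}
    return [(tok, lexicon.get(tok.strip(".,").lower(), "OTHER"))
            for tok in text.split()]


print(filter_observation("You see a rusty key on the wooden table.", toy_pos_tag))
# -> "see rusty key wooden table."
```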
At the end of the 3-day project, TAG Team was unable to create an AI agent that performed better than the default agent. However, they emphasized the monumental difficulty of the task. Teaching AI to read and generate natural language well enough to play these games is an enormous challenge, and not one that can be solved by filtering out some words and calling it a day. But it was an interesting experiment that provides insight into the relationship between input verbosity and agent performance. Perhaps the ingredients to a smarter agent lie elsewhere.
The team had no shortage of ideas for things they would have implemented given more time. One idea that was discussed early in the project was to use a semantic similarity endpoint to help the agent figure out which items could potentially interact with other items. For example, a key would have a higher semantic similarity with a lock than, say, a coat hanger. In a room with a locked door, a key, and a coat hanger, the agent could use semantic similarity to determine that it should probably try unlocking the door with the key. On the other hand, this approach would require careful thought about potential conflicts with the goals of the game. What if, in the same scenario, the goal is to hang the coat hanger on the doorknob? One would have to figure out how to weigh the semantic similarity of two objects against the likelihood of an object contributing to the overall goal.
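A back-of-the-envelope sketch of that weighing problem might look like the following. The scores here are invented for illustration; a real system would get similarity from a semantic similarity endpoint and would need some way to estimate goal relevance:

```python
# Toy scores for illustration only.
SIMILARITY = {("key", "door"): 0.8, ("coat hanger", "door"): 0.2}
GOAL_RELEVANCE = {("key", "door"): 0.6, ("coat hanger", "door"): 0.9}


def score_action(pair, alpha=0.5):
    # alpha balances object-object similarity against goal progress.
    return alpha * SIMILARITY[pair] + (1 - alpha) * GOAL_RELEVANCE[pair]


candidates = [("key", "door"), ("coat hanger", "door")]
print(max(candidates, key=score_action))  # ('key', 'door') at alpha = 0.5
```

Lowering alpha shifts the decision toward goal relevance: with these toy numbers, at alpha = 0.2 the coat hanger wins instead, which is exactly the kind of tension the team anticipated.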
Teaching AIs to play text adventure games is a serious undertaking, but if TAG Team’s lofty goals and enthusiasm in the face of impossible odds are any indication, it’s also one that won’t be abandoned anytime soon. Zork masters beware: someday you may find yourself dethroned by your very own computer.