By Paul Flamburis

Rebuses have captivated puzzle fanatics for centuries. Until recently, all of those fanatics have been human. At Babel Street Hackathon 2022, Team 🎱 changed the game by creating a program that solves emoji rebuses, and in doing so paved the way to creating software that can truly understand emoji.

From left to right: Lance Nathan, Julien Cherry, Tulasi Holdridge, Pooja Shantaram Nangude

Even if you don’t know what a rebus is, you’ve almost certainly encountered one. A rebus is a series of pictures and single letters that, taken together, represent a word or phrase. For example, a rebus might consist of a picture of a bee and a picture of a leaf, and the solution would be “belief.” The first known rebus was inscribed on a 5,000-year-old Sumerian cuneiform tablet, and the use of rebuses as puzzles dates back at least as far as 16th-century France. These days, they turn up everywhere from bottle caps to email forwards.

A particularly interesting new medium for rebuses is emoji, an increasingly common component of natural text. While rebuses have been around for millennia, emoji weren’t invented until the late 1990s, and they still represent a new frontier for natural language processing. Since solving a rebus requires juggling multiple possible interpretations of each image, designing a program to solve emoji rebuses can be seen as a microcosm of a much larger problem: teaching software to meaningfully interpret emoji as they appear in natural text. Take the bee emoji, for example. How do we teach a program when to interpret it semantically (as the noun “bee”) and when to interpret it phonetically (as the sound “be,” as in “belief”)? Team 🎱 made it their mission to rise to this challenge.

The front end of the team’s program consisted of an emoji picker that allowed the user to create and submit a rebus. While processing the submitted emoji, the program would perform some basic parsing to handle emoji with modifiers or ZWJ sequences, which are used to select a particular variant of an emoji. In Unicode, which has included emoji since 2010, every basic emoji is represented by a code point, such as U+1F44D (👍). This code point can be followed by a modifier to specify something like skin tone. A special invisible character called a zero width joiner (ZWJ) can also be used to build a sequence that combines multiple emoji into a single variant. For example, 👩+ZWJ+🦰 would be displayed as 👩‍🦰. (Rosette Base Linguistics can actually “lemmatize” certain emoji by removing modifier sequences and certain emoji following a ZWJ.)
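To make the parsing step concrete, here is a minimal Python sketch of how an emoji sequence might be reduced to its base character. It is not the team’s code or the Rosette implementation; the function name `base_emoji` is hypothetical, and dropping everything after the first ZWJ is a deliberate simplification (the article notes that only certain emoji following a ZWJ are removed).

```python
# Illustrative sketch only: not the team's parser or Rosette's lemmatizer.
ZWJ = "\u200d"                    # zero width joiner
VARIATION_SELECTOR_16 = "\ufe0f"  # requests emoji-style presentation
# Unicode skin tone modifiers occupy U+1F3FB through U+1F3FF.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}


def base_emoji(sequence: str) -> str:
    """Reduce an emoji sequence to a base form (hypothetical helper).

    Assumption: keeping only the part before the first ZWJ is good enough
    for keyword lookup. A real system would decide case by case.
    """
    head = sequence.split(ZWJ, 1)[0]
    return "".join(
        ch
        for ch in head
        if ch not in SKIN_TONE_MODIFIERS and ch != VARIATION_SELECTOR_16
    )


woman_red_hair = "\U0001F469" + ZWJ + "\U0001F9B0"  # woman + ZWJ + red hair
thumbs_up_medium = "\U0001F44D\U0001F3FD"           # thumbs up + skin tone

print(base_emoji(woman_red_hair) == "\U0001F469")    # True: plain woman emoji
print(base_emoji(thumbs_up_medium) == "\U0001F44D")  # True: plain thumbs up
```

Reducing variants this way means a single keyword lookup can serve every skin tone or hair style of the same emoji, at the cost of discarding whatever meaning the variant carried.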

Once the emoji were processed, keywords would be picked for each one using the emoji names from Unicode and Emojipedia. Possible solutions would be generated from these keywords and scored on their semantic similarity to the keywords used to generate them. Another important factor in the score was the overall relevance of the solution, which was determined using a combination of the wordfreq Python library and a good old-fashioned list of well-known phrases, such as popular idioms and movie titles. It was critical to balance these factors against each other; if a solution’s overall relevance outweighed its similarity to the keywords, the program might settle on a common phrase that has little to do with the emoji in question. Striking this balance was the key to helping the program understand the emoji themselves, as well as the intention behind them.
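The balancing act described above can be sketched as a weighted sum. This is only an illustration under stated assumptions: the article does not say how the team combined the two factors, so the `Candidate` type, the `score` and `rank` functions, the default weight, and every number below are hypothetical. In practice the relevance value might be derived from something like wordfreq’s `zipf_frequency`, rescaled to the 0 to 1 range.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A possible rebus solution (hypothetical structure)."""
    text: str
    similarity: float  # semantic similarity to the emoji keywords, 0..1
    relevance: float   # how common or well known the phrase is, 0..1


def score(candidate: Candidate, similarity_weight: float = 0.7) -> float:
    """Blend the two factors; the 0.7 default is an arbitrary illustration."""
    return (similarity_weight * candidate.similarity
            + (1.0 - similarity_weight) * candidate.relevance)


def rank(candidates: List[Candidate],
         similarity_weight: float = 0.7) -> List[Candidate]:
    """Return candidates sorted from best to worst score."""
    return sorted(candidates,
                  key=lambda c: score(c, similarity_weight),
                  reverse=True)


# Made-up scores for the rebus "honey pot + moon".
candidates = [
    Candidate("honeymoon", similarity=0.9, relevance=0.5),
    Candidate("full moon", similarity=0.5, relevance=0.9),
]

print(rank(candidates)[0].text)                         # honeymoon
print(rank(candidates, similarity_weight=0.2)[0].text)  # full moon
```

With similarity weighted heavily, the solution that actually matches both emoji wins. Shift the weight toward relevance and a more common phrase that ignores the honey pot takes over, which is the failure mode the paragraph above warns about.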

The end result excelled at solving rebuses that use multiple emoji to represent the components of a compound word, such as 🍯🌙 (honeymoon) or 🐝🍁 (belief). It even generally succeeded when a rebus used a combination of phonetic and semantic similarity to its solution, such as 👀🐢 (sea shell).

However, the program struggled with unclear word boundaries, such as three-emoji rebuses with two-word solutions (🔗📥🏞️, Linkin Park). It also fared poorly with rebuses whose solutions contain stopwords, such as 🌧️🐱🐶 (raining cats and dogs). Even so, it was a fantastic accomplishment, all the more so considering that Team 🎱 had the lowest combined tenure at Babel Street of all the teams.

The team also had no shortage of ideas for improvements they might have implemented with more time. A more complete database of possible solutions would be an obvious, although time-consuming, improvement. A particularly enticing improvement would be the use of word embeddings to identify words that are semantically similar to emoji. A word embedding is a representation of a word in vector space; the closer two word vectors are in that space, the more semantically similar the words are. This is how the Rosette semantic similarity REST endpoint works. Emoji actually have their own embeddings, and while it would take a bit of work to get these to play nicely with English word embeddings, implementing them in the team’s program could help avoid losing information in the jump from emoji to keyword.
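As a rough illustration of the embedding idea, the Python sketch below measures closeness with cosine similarity, one common choice, and picks the word nearest to an emoji. It assumes the emoji and word vectors already share one space, which the article notes would take some work to achieve. The three-dimensional vectors are invented for the example, and `nearest_word` is a hypothetical helper, not part of Rosette or the team’s program.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_word(emoji_vector: Sequence[float],
                 word_vectors: Dict[str, Sequence[float]]) -> str:
    """Return the word whose vector is most similar to the emoji's vector."""
    return max(word_vectors,
               key=lambda w: cosine_similarity(emoji_vector, word_vectors[w]))


# Toy vectors, invented for illustration; real embeddings have
# hundreds of dimensions and come from a trained model.
bee_emoji_vector = (0.9, 0.1, 0.2)
word_vectors = {
    "bee": (0.8, 0.2, 0.1),
    "park": (0.1, 0.9, 0.3),
}

print(nearest_word(bee_emoji_vector, word_vectors))  # bee
```

Comparing vectors directly like this would let the program skip the intermediate keyword and match an emoji against candidate words in one step.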

Emoji may be relatively new on the language scene, but they aren’t going anywhere soon. As they become more common, it’s going to be more important for the NLP world to create software that can understand what emoji mean in context. When that happens, you might have the Babel Street Hackathon 2022 to thank.