Wait a Second: Playing Hanabi without Giving Hints

# Wait a Second: Playing Hanabi without Giving Hints

## Markus Eger

### University of Costa Rica www.slothlab.info

## Daniel Gruss

### Graz University of Technology

---

# A Short Introduction

---

# Who are we?

---

# Side-Channels

<blockquote class="twitter-tweet">Most definitions of "side channel" that I've seen so far are not very useful. Here's a proper definition of the word "side channel" that no one challenged so far: Side channel = "Obtaining meta-data and deriving data from it."— Daniel Gruss (@lavados) <a href="https://twitter.com/lavados/status/1156982866414379008?ref_src=twsrc%5Etfw">August 1, 2019</a></blockquote>

</center>

---

# How did this paper come about?

* I have been working on Hanabi and other games for a while

* Daniel is working on side-channel attacks, including those involving timing

* We were playing a game of Hanabi, when Daniel remarked how the timing of actions could (and does!) convey information

* So we decided to figure out how well this could work

---

# What this paper isn't

* A new breakthrough in AI research

* ... or security research

* An attack on the integrity of the CIG/CoG (or any other) Hanabi competition

---

# What this paper is

* A collaboration between game AI and security research

* An interesting take on Hanabi

* Surprisingly effective

* (Hopefully) thought-provoking

---

# Hanabi

---

# Hanabi

* Cooperative card game

* Cards in 5 colors, with ranks 1 to 5

* Players can't see their own cards

* Communication is restricted
]

---

# Hanabi

* The goal of the game is to play the cards in each color in ascending order

* Players have to decide which card to play without knowing which cards they have

* A player may give another player a *hint*:
    - Tell them about all cards of one color
    - Tell them about all cards of one rank
    
* The number of hints is limited

* The game ends when the players run out of cards in the deck

---

# Challenges

* Players have to determine which hints to give, and when

* They also have to determine how to interpret other player's hints

* Human players are really good at this

* Google Brain and Deep Mind have called Hanabi "a new frontier for AI Research"

* There is also a Hanabi competition at CIG/CoG

---

# A Timing Side-Channel In Hanabi

---

# Another Perspective

* Let's assume a two-player game for now

* Hints in Hanabi are used to encode a message to the other player

* That message is "this card can/should be played" or "this card can/should be discarded"

* For 5 cards in the other player's hand that gives us (at least) 10 possible messages

* What if we encode these same messages in another way?

---

# Timing

* The CIG/CoG Hanabi competition used to give competitors 40ms to complete their turn

* We split these 40ms into 10 time slices of 4ms each, each corresponding to one message

* To transmit a message, the agent intentionally waits a certain amount of time, until their action is performed in the time slice corresponding to that message

* To receive a message, the agent measures how long the other player needed for their action, and which message corresponds to that time slice

* What action does an agent perform? The one they were "told" to perform

---

# An Example

<img src="/assets/img/hintstatus.jpg" width="100%"/>
.smallc[(Image Source: rulesofplay.co.uk)]

.mmediumt[We want to tell the other player to play their 3rd card, which corresponds to the time slice of 8-12ms, so we sleep for 8ms and then perform our action.

The other player than knows that they should play the 3rd card, they determine which card we should play/discard and then play their third card after the corresponding time.
]

---

# Which Message to Transmit?

* The first message corresponds to the time slice of 0-4ms

* The agent therefore has very little time to actually make a decision

* We use a simple priority list: Prefer playable cards in ascending order over discardable cards in descending order

* "Discardable" cards are those that were already played (preferred) or those of which duplicates still exist in the game

---

# Duplicate Plays

* In some cases it may happen that the message an agent transmits by playing a card tells the other player to play the exact same card

* We can avoid such cases by avoiding playing two cards consecutively

* If an agent is told to play a card by the other agent playing a card, they will give a random hint instead (if there are hints available)

* Since we do not use the hints for anything, this is basically a "skip" operation

---

# More Than Two Players

* When there are more than two players, there are multiple possible receivers for a message

* For our agent, we fixed the communication to always move in turn order, i.e. each player sends a message to the next player in turn order

* Agents need to measure how long each individual player takes, and decode the message from that information

---

# Results

---

# Results

* We performed several experiments with our agents

* Main response was the score the agents achieved

* We compared different numbers of players, time given to the agent, and if the agent used "skip" actions or not

* Since system scheduling can interfere with the timing, we were also interested in the percentage of messages that were received correctly

---

# 2 Player Game

<img src="/assets/img/histogram.png" width="100%"/>
2 Players, 40ms/turn not using skips (left), and using skips (right), 10000 games each

Mean score without skips: 19.4 (stddev: 5.34) 
Mean score with skips: 23.2 (stddev: 2.1)

---

# Transmission Error Rates (2 players)

More time results in lower error rates, but the effect is more pronounced with less than 20ms (requiring an accuracy of more than `\(\pm 1\)`ms)

---

# Other Results

* For comparison, we let the agents communicate the messages explicitly, also resulting in 23.2 points on average (stddev: 2.1)

* The state of the art in 2 player Hanabi is 23.9 points (stddev: 2.8) by Foerster et al., using learned action encoding

* Our agents achieved a mean 23.3 (stddev 1.94) score with 3 players, 22.64 (stddev 1.95) with 4 players and 21.7 (stddev 1.9) with 5 players

* The main challenge with more players is the end game and planning how to play the last few cards optimally, which our agent does not do

---

# Conclusion

---

# Conclusion

* Timing can be used as a very powerful communication channel

* In Hanabi we can even completely ignore the built-in communication system/hints

* Our agent achieved almost state-of-the art performance

* 40ms are more than enough to encode a variety of messages

---

# Discussion

* We don't suggest to use our agent or method in the CoG Hanabi competition

* If you use Machine Learning to learn a communication model, be careful not to inadvertantly include timing information in the inputs

* Note that timing can (and is) also be used by humans in communication (including in Hanabi!)

* Could we utilize this to improve communication between an AI agent and a human player?

---

# Thank you

## www.github.com/yawgmoth/pyhanabi