+ - 0:00:00
Notes for current slide
Notes for next slide

Wait a Second: Playing Hanabi without Giving Hints

Markus Eger

University of Costa Rica
www.slothlab.info

Daniel Gruss

Graz University of Technology

1 / 27

A Short Introduction

2 / 27

Who are we?


Daniel Gruss
Security Researcher


Markus Eger
Game AI researcher

3 / 27

Side-Channels




4 / 27

How did this paper come about?

  • I have been working on Hanabi and other games for a while

  • Daniel is working on side-channel attacks, including those involving timing

  • We were playing a game of Hanabi, when Daniel remarked how the timing of actions could (and does!) convey information

  • So we decided to figure out how well this could work

5 / 27

What this paper isn't

  • A new breakthrough in AI research

  • ... or security research

  • An attack on the integrity of the CIG/CoG (or any other) Hanabi competition

6 / 27

What this paper is

  • A collaboration between game AI and security research

  • An interesting take on Hanabi

  • Surprisingly effective

  • (Hopefully) thought-provoking

7 / 27

Hanabi

8 / 27

Hanabi

  • Cooperative card game

  • Cards in 5 colors, with ranks 1 to 5

  • Players can't see their own cards

  • Communication is restricted

9 / 27

Hanabi

  • The goal of the game is to play the cards in each color in ascending order

  • Players have to decide which card to play without knowing which cards they have

  • A player may give another player a hint:

    • Tell them about all cards of one color
    • Tell them about all cards of one rank
  • The number of hints is limited

  • The game ends when the players run out of cards in the deck

10 / 27

Challenges

  • Players have to determine which hints to give, and when

  • They also have to determine how to interpret other player's hints

  • Human players are really good at this

  • Google Brain and Deep Mind have called Hanabi "a new frontier for AI Research"

  • There is also a Hanabi competition at CIG/CoG

11 / 27

A Timing Side-Channel In Hanabi

12 / 27

Another Perspective

  • Let's assume a two-player game for now

  • Hints in Hanabi are used to encode a message to the other player

  • That message is "this card can/should be played" or "this card can/should be discarded"

  • For 5 cards in the other player's hand that gives us (at least) 10 possible messages

  • What if we encode these same messages in another way?

13 / 27

Timing

  • The CIG/CoG Hanabi competition used to give competitors 40ms to complete their turn

  • We split these 40ms into 10 time slices of 4ms each, each corresponding to one message

  • To transmit a message, the agent intentionally waits a certain amount of time, until their action is performed in the time slice corresponding to that message

  • To receive a message, the agent measures how long the other player needed for their action, and which message corresponds to that time slice

  • What action does an agent perform? The one they were "told" to perform

14 / 27

An Example

(Image Source: rulesofplay.co.uk)

We want to tell the other player to play their 3rd card, which corresponds to the time slice of 8-12ms, so we sleep for 8ms and then perform our action.

The other player than knows that they should play the 3rd card, they determine which card we should play/discard and then play their third card after the corresponding time.

15 / 27

Which Message to Transmit?

  • The first message corresponds to the time slice of 0-4ms

  • The agent therefore has very little time to actually make a decision

  • We use a simple priority list: Prefer playable cards in ascending order over discardable cards in descending order

  • "Discardable" cards are those that were already played (preferred) or those of which duplicates still exist in the game

16 / 27

Duplicate Plays

  • In some cases it may happen that the message an agent transmits by playing a card tells the other player to play the exact same card

  • We can avoid such cases by avoiding playing two cards consecutively

  • If an agent is told to play a card by the other agent playing a card, they will give a random hint instead (if there are hints available)

  • Since we do not use the hints for anything, this is basically a "skip" operation

17 / 27

More Than Two Players

  • When there are more than two players, there are multiple possible receivers for a message

  • For our agent, we fixed the communication to always move in turn order, i.e. each player sends a message to the next player in turn order

  • Agents need to measure how long each individual player takes, and decode the message from that information

18 / 27

Results

19 / 27

Results

  • We performed several experiments with our agents

  • Main response was the score the agents achieved

  • We compared different numbers of players, time given to the agent, and if the agent used "skip" actions or not

  • Since system scheduling can interfere with the timing, we were also interested in the percentage of messages that were received correctly

20 / 27

2 Player Game

2 Players, 40ms/turn
not using skips (left), and using skips (right), 10000 games each

Mean score without skips: 19.4 (stddev: 5.34)
Mean score with skips: 23.2 (stddev: 2.1)

21 / 27

Transmission Error Rates (2 players)

More time results in lower error rates, but the effect is more pronounced with less than 20ms (requiring an accuracy of more than \(\pm 1\)ms)

22 / 27

Other Results

  • For comparison, we let the agents communicate the messages explicitly, also resulting in 23.2 points on average (stddev: 2.1)

  • The state of the art in 2 player Hanabi is 23.9 points (stddev: 2.8) by Foerster et al., using learned action encoding

  • Our agents achieved a mean 23.3 (stddev 1.94) score with 3 players, 22.64 (stddev 1.95) with 4 players and 21.7 (stddev 1.9) with 5 players

  • The main challenge with more players is the end game and planning how to play the last few cards optimally, which our agent does not do

23 / 27

Conclusion

24 / 27

Conclusion

  • Timing can be used as a very powerful communication channel

  • In Hanabi we can even completely ignore the built-in communication system/hints

  • Our agent achieved almost state-of-the art performance

  • 40ms are more than enough to encode a variety of messages

25 / 27

Discussion

  • We don't suggest to use our agent or method in the CoG Hanabi competition

  • If you use Machine Learning to learn a communication model, be careful not to inadvertantly include timing information in the inputs

  • Note that timing can (and is) also be used by humans in communication (including in Hanabi!)

  • Could we utilize this to improve communication between an AI agent and a human player?

26 / 27

Thank you

www.github.com/yawgmoth/pyhanabi

27 / 27

A Short Introduction

2 / 27
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow