The Thread I’m About to Pull
Every society we know of, going back to the dawn of recorded history, has played competitive games. Competing against each other is an integral part of human culture. Whether you grew up thousands of years ago in Sumer, in what is today Kyrgyzstan in 1341, or in modern-day Vancouver, chances are you have played games; a significant fraction of everyone who has ever lived has.
Relatively recently, games have taken on a new role as testbeds for software agents. A software agent is an artificial intelligence program designed to take actions (i.e., to have agency) in response to the state of its environment. The Wikipedia article on agency defines it like this:
Agents are goal-directed entities that are able to monitor their environment to select and perform efficient means-ends actions that are available in a given situation to achieve an intended goal.
Agency is of primary interest in artificial intelligence, as choosing to take certain actions instead of others to try to achieve a goal is an important part of what we mean when we use the word intelligence. I also snuck in a loaded word there: ‘choosing’. For people, choosing how to act is tied to our sense of self and our feeling of being a thing that has free will and the capacity to choose among options. I’m using the same word for the selection of actions by a software agent. Whether it’s the right word, I don’t know!
Games are superb arenas for trying out new ways to create software agents. They provide us with a customizable ‘pocket universe’ that is simpler than the real world, but still has some of its key features. DeepMind has a long history of leading in this area using reinforcement learning as the basis for its agents, from its early work on Atari games to mastering games like Go.
Unlike agents in the real world (like you!), game-playing agents have:
One clear and simple goal — win the game!!! Sometimes our own goals are simple… like when we are also trying to win a game. But in general, it’s hard to know how to think of people’s behaviors in terms of goals. We obviously have them, and somehow they are grounded in our evolutionary history, but … it’s complicated.
An easy way to score how ‘good’ their agency is: do they beat other agents at the game? What’s their Elo rating? (For people, scoring how good our agency is, maybe through IQ tests or SAT scores, is poorly understood, because intelligence is about how well we achieve goals, and see point 1 about not having clarity on what people’s goals really are in practice.)
A simple (compared to reality!) world to understand and navigate
A small discrete set of actions that they can… choose… from
For example, in chess an agent’s goal is clear (win the game) and singular: it has no other considerations to deal with, just win and that’s it. The world a chess-playing agent inhabits is just the chess board, the pieces, the rules of the game, and the sequence of moves made up to the current game state. It doesn’t need to know anything about the world other than this in order to figure out what move to make to maximize its chances of winning. It doesn’t have to know there are three spatial dimensions, or what cheese is, or what roses smell like, or anything else. At any given board state, the agent gets to choose one of the allowed moves (of which there are only a small number). And that’s it!
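To make this concrete, here’s a minimal sketch in Python of the agent’s-eye view of chess: the board is the whole world, the legal moves are the only choices, and the winner is trivial to read off a finished game. It assumes the third-party python-chess library, and the random ‘agent’ is just a stand-in for something smarter.

```python
import random

import chess  # third-party library: pip install python-chess

board = chess.Board()            # the entire "world" the agent inhabits
while not board.is_game_over():  # keep playing until the game has ended
    actions = list(board.legal_moves)  # the small, discrete set of allowed moves
    move = random.choice(actions)      # a (very dumb) agent "choosing" an action
    board.push(move)                   # deterministic transition to the next state

# Evaluating who won a finished chess game is trivial:
print(board.result())  # "1-0", "0-1", or "1/2-1/2"
```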
Alright. So games are cool, they’re awesome testbeds for building and testing software agents, and they provide a way to directly measure a number (e.g., an Elo rating) related to a kind of goal-seeking intelligence. But what does any of this have to do with quantum computation?
First let me introduce some terminology. We are going to focus on games that are:
Fully Observable: The agent knows everything about the game state, including all the moves made in the game up to the current point
Zero-Sum: One player’s win is the other player’s loss; draws are possible
Deterministic: A specific move from a specific current state always leads to the same next state
Model-Based: We can ‘hallucinate’ a game from any state (we can construct a perfect digital model of the game and its rules)
Games like chess, checkers, and Go have all these properties. A game like poker doesn’t (we don’t see our opponents’ cards, so it’s only partially observable, not fully observable).
I’ve mentioned the concept of game state a couple of times; let me try to be more specific about what this is. In games that satisfy the four points above, we can always write down information that perfectly specifies everything an agent needs to know to be able to make its next move. For example, in chess I can provide the current game state by giving you the list of moves the players have made so far (check out this excellent description of how game state is written down for chess). We can do something similar for all of the kinds of games that fit the four criteria listed above.
In these kinds of games, there are special game states called terminal states. A terminal state is one in which the game has ended, and it’s time to figure out who won.
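Here’s a minimal sketch of what all of this looks like as code. None of these names come from a real library; they’re just one way to write down the four properties plus terminal states. ‘Model-based’ shows up as the fact that we can ‘hallucinate’ games simply by calling next_state repeatedly from any state we like.

```python
from typing import Hashable, List, Optional


class Game:
    """A hypothetical interface for games with the four properties above."""

    def initial_state(self) -> Hashable:
        """The starting state; fully observable, it contains everything an agent needs."""
        raise NotImplementedError

    def legal_actions(self, state: Hashable) -> List[Hashable]:
        """The small, discrete set of moves available from this state."""
        raise NotImplementedError

    def next_state(self, state: Hashable, action: Hashable) -> Hashable:
        """Deterministic: the same (state, action) pair always yields the same next state."""
        raise NotImplementedError

    def is_terminal(self, state: Hashable) -> bool:
        """True if the game has ended and it's time to figure out who won."""
        raise NotImplementedError

    def winner(self, terminal_state: Hashable) -> Optional[int]:
        """Zero-sum: +1 if player one won, -1 if player two won, None for a draw."""
        raise NotImplementedError
```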
What I would like to do is introduce a new kind of game, one which has the four properties above but adds just one more:
5. Given a terminal state, evaluating who won the game requires computing the solution to a problem for which quantum supremacy has been claimed.
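As a purely illustrative sketch of property 5, imagine a subclass of the hypothetical Game interface above whose terminal states encode instances of a supremacy-class problem. The helpers circuit_from_state and estimate_circuit_output are made up; they stand in for whatever supremacy experiment (for example, random circuit sampling) decides the winner.

```python
class QuantumScoredGame(Game):
    """Illustrative only: playing is cheap, but scoring a finished game is the hard part."""

    def winner(self, terminal_state):
        # The terminal state encodes an instance of a problem where quantum
        # supremacy has been claimed (e.g., a random quantum circuit).
        circuit = self.circuit_from_state(terminal_state)  # hypothetical helper

        # Cheap to estimate on a quantum processor, believed to be exponentially
        # expensive to compute classically; that gap is the whole point.
        score = estimate_circuit_output(circuit)  # hypothetical helper

        if score > 0:
            return +1   # player one wins
        if score < 0:
            return -1   # player two wins
        return None     # draw
```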
Why do this?
If it’s hard to determine who won a completed game, it’s difficult to build software agents to play the game well. Any gap between classical and quantum performance on evaluating who won a game is massively amplified, as this function must be called many, many times to train an agent.
An agent with access to a quantum computer with a valid quantum supremacy claim can learn to play these kinds of games much faster than any agent trained only using conventional computers, because it’s a lot cheaper for the quantum computer to evaluate terminal states, and we’ve designed the game such that terminal state evaluation is exactly where the quantum supremacy claim applies.
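Here’s a rough sketch of where that amplification comes from. A self-play training loop has to score every finished game it generates (and search-based agents call the evaluator many more times inside their lookahead), so any per-call gap between quantum and classical evaluation gets multiplied by the number of training games. The agent object and its choose/learn_from methods are hypothetical, following the interface sketched earlier.

```python
def train(game: Game, agent, num_games: int = 1_000_000) -> None:
    """Self-play training loop; `agent` is a hypothetical object with choose/learn_from."""
    for _ in range(num_games):
        state = game.initial_state()
        while not game.is_terminal(state):
            action = agent.choose(game, state)      # pick one of the legal actions
            state = game.next_state(state, action)  # deterministic transition

        # Called once per training game: cheap with a quantum computer,
        # prohibitively expensive classically by construction of the game.
        outcome = game.winner(state)
        agent.learn_from(outcome)
```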
I like this formulation because it converts an abstract, difficult-to-understand thing (who is better at solving a specific, exotic, seemingly useless quantum computing problem, like random circuit sampling or simulating magnets) into a very concrete, easy-to-understand thing (who wins a game).
By the way, these types of games should also be very difficult for people (or more generally, any agent that doesn’t use quantum computing to think). If a quantum computer is exhibiting quantum supremacy for computing the winner, then presumably no person can learn to play the game as well as an agent trained using a quantum computer either.
This is pretty intense. Agents trained using quantum computers should be able to learn to play these games better than is possible even in principle without access to them. An agent with access to a quantum computer should always beat any classical agent at the game.
Because of the relationship between goal-seeking and intelligence, if we can show that certain categories of goals can only be achieved if an agent has access to a quantum computer, then we can define a new sort of intelligence that is categorically different from, and superior to, any intelligence that doesn’t have access to a quantum computer.
If I can show you one example where an agent with access to a quantum computer will always beat one without such access, who knows how many other things like this there are. It’s possible that many of the things we might find ourselves competing over with agents of this type are like this. Something to think about…
That’s the connection between quantum supremacy and game-playing agents I’m going to try to unravel here, and the core idea behind the kinds of games I am going to build agents for.