Quantum Puzzles on Utility Scale Devices

Dr James Wootton
13 min read · Aug 21, 2024

2018 was an exciting time in the field of quantum computing. New examples of quantum hardware were popping up all the time, and there were lots of claims being made about the impressive things that they’d soon be able to do.

One problem was that it was not easy for non-specialists to assess these claims, or to compare one quantum computer with another. For that reason I came up with something that I hoped would help: puzzles that give people hands-on experience with a piece of quantum hardware. It was a game I called Quantum Awesomeness.

Now it’s 2024, and my description of 2018 might seem strangely familiar. There is still a steady stream of new quantum devices, and the public could probably still use some help to understand how much those devices have improved in the last six years. So I dusted off my old puzzle-generating software, and ran it on the biggest and best of today’s devices.

It was a quick and preliminary set of jobs. Though I saw many ways I could improve the process, I just stuck with the old version. I also didn’t tinker with fancy settings to squeeze out the best possible performance. This gives us the clearest comparison between the IBM Quantum devices of 2018 and 2024, at least at the specific task of making the puzzles that I’ll describe below.

To present these results, I made a talk. But for those of you who won’t hear me talk the talk, I have turned it into a blog. Here are the slides, each followed by what I’d say about it.

This is the title slide. So hello and thanks for coming! This work was done back when I was at IBM, in the dim and distant times of last month. As such, for this image and all that follow: Reprint Courtesy of IBM Corporation ©.

I already told you the backstory in the introduction to this blog. But this slide nevertheless gives me the opportunity to give you links to my old blog post about this work, and the paper I wrote with the details of how the puzzles are made.

Before we start talking about qubits and entanglement, let’s look at the puzzles. As you can see from the examples above, they look like a grid of numbers.

The grid itself is one that will be very recognizable to quantum computing experts, but let’s leave that to the next slide and focus on the numbers. If you look closely, you should see that each of the numbers is the same as, or very similar to, one of its neighbours. By going through the whole puzzle you can see that they form a set of pairs. The aim of the game is simply to look at the numbers and figure out the pairs.

The game has multiple rounds with increasing difficulty. This difficulty comes from the numbers drifting from their ideal values, which makes the pairing more ambiguous. You can see this in the Round 4 example above. Though it is quite lightly perturbed, the four numbers on the right do take a moment to think about.

Now let’s look at how the game creates the puzzles, and how the numbers are generated with the help of a quantum computer.

First let’s talk about the grid. This is a representation of the quantum device itself, with the coloured circles representing its qubits. The connections of the grid represent which pairs of qubits can be manipulated in entangling operations. In case this is all new to you, a qubit is the quantum version of a bit and is the building block of quantum hardware. The entangling operations are the building blocks of quantum software. So just by looking at this grid, the player gets a lot of information about the device.

To generate a puzzle we then make use of those entangling operations. First the game randomly chooses a set of non-overlapping pairs. Next it chooses a set of random angles, with one for each pair. Then for each pair it applies a set of quantum gates, which make use of the corresponding angle. These have the effect of creating a particular set of entangled states.
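
For anyone who wants to see this in code, here’s a minimal sketch in Qiskit of one way to create such a pair. It’s not necessarily the exact gate sequence my game uses, but it produces a state with the same behaviour: an ry rotation puts one qubit into a superposition set by the angle, and a CNOT then entangles it with its partner.

```python
import numpy as np
from qiskit import QuantumCircuit

def entangle_pair(qc, a, b, theta):
    """Prepare cos(theta/2)|00> + sin(theta/2)|11> on qubits a and b."""
    qc.ry(theta, a)  # superposition on qubit a, weighted by the angle
    qc.cx(a, b)      # entangle qubit b with qubit a

# one pair at a random angle (the game does this for every pair it chooses)
theta = np.random.uniform(0, np.pi)
qc = QuantumCircuit(2, 2)
entangle_pair(qc, 0, 1, theta)
qc.measure([0, 1], [0, 1])
```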

This is all very nice, of course. But to get the benefit of the entanglement here we need to look at it. So we will measure the qubits, which forces them to commit to being a nice simple 0 or 1. The entangled states we create have two important features when measuring the qubits:

1. The probability of returning the output 1 is the same for each qubit (and depends on the angle chosen for the pair);

2. Their outputs will always agree.

For now we’ll just use the first of these. We run the process many times to calculate the probability of a 1 on each qubit, turn these probabilities into percentages, and then use them as the numbers in our puzzle. So when we see two coloured circles with 20s in them, this means that an entangled pair was created for which it happens 20% of the time that the qubits both output a 1.
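
In code, turning the measurement results into puzzle numbers is just a matter of counting. Here’s a rough sketch that assumes a Qiskit-style counts dictionary; the exact rounding my game uses may differ.

```python
def percentages(counts, num_qubits):
    """Convert a counts dict like {'00': 812, '11': 188} into the
    percentage of shots on which each qubit output a 1."""
    shots = sum(counts.values())
    ones = [0] * num_qubits
    for bitstring, freq in counts.items():
        for q in range(num_qubits):
            if bitstring[-1 - q] == '1':  # Qiskit bitstrings are little-endian
                ones[q] += freq
    return [round(100 * n / shots) for n in ones]

print(percentages({'00': 812, '11': 188}, 2))  # [19, 19]: a pair of 19s
```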

For Round 2 the process of creating a puzzle is very similar, but with some extra steps. For the simplest version of the game, which is the one we are considering here, these extra steps are just to give the quantum computer more chances to go wrong. Any imperfections will cause the numbers to drift from their intended values, and hence increase the difficulty.

To be specific, here’s how the game sets up Round 2:

1. Apply the gates that create the entangled pairs of Round 1;

2. Apply gates to undo the entangled pairs of Round 1;

3. Choose new random pairs and angles for Round 2;

4. Apply the gates that create the entangled pairs of Round 2.

The same logic holds for all subsequent rounds. They all consist of remaking and unmaking all previous rounds before finishing with the one that’s actually wanted.
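
As a sketch, here’s what that recipe looks like if each step is kept completely separate, using the entangle_pair helper from earlier and assuming the pairs and angles for each round are stored in a list called rounds. As you’ll see in a moment, this isn’t quite what the game really does.

```python
from qiskit import QuantumCircuit

def round_circuit(rounds, num_qubits):
    """rounds[k] is a list of (a, b, theta) tuples chosen for round k;
    the last entry is the round we actually want to see."""
    qc = QuantumCircuit(num_qubits, num_qubits)
    for pairs in rounds[:-1]:
        # remake the earlier round...
        for a, b, theta in pairs:
            entangle_pair(qc, a, b, theta)
        # ...and then unmake it again
        for a, b, theta in pairs:
            qc.cx(a, b)
            qc.ry(-theta, a)
    # finish with the round that's actually wanted
    for a, b, theta in rounds[-1]:
        entangle_pair(qc, a, b, theta)
    qc.measure(range(num_qubits), range(num_qubits))
    return qc
```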

In practice, the game actually does something slightly different to what is described above. Rather than steps 1 and 2 being completely separate, it partly combines them. Rather than go into the details of how this is done, I’ll just explain why. When steps 1 and 2 are done separately, four entangling gates must be done in succession. When partially combined, only two are required. Since these gates are one of the main sources of imperfections in a quantum computation, partially combining the two steps reduces the errors by a significant amount. This was an important reduction when I first ran these puzzles in 2018, since the quality of the gates was a lot lower. Though it’s not so necessary now, I didn’t want to make any big updates. So this is also the way I ran it on current devices too.

As I’ve said, I wanted to make these puzzles so that people can play them and get a sense of what a quantum device is like. Nevertheless, the temptation to calculate things and plot graphs is too strong, and I must yield!

So here’s what I calculate. The pairs consist of two numbers that should be the same. So how similar are they? I calculate the difference between the two numbers and average it for all the pairs. I call this the fuzziness of the numbers.

The pairs also have a randomly chosen angle, which should determine the numbers that they display. Conversely, we should be able to infer what the angle is by looking at the numbers and doing some maths. So how similar is this inferred angle to the actual value? I again calculate the differences and average them. Rather unimaginatively, I call this the difference.
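
In code form, both quantities are simple averages. The sketch below assumes each pair is stored with its chosen angle and its two displayed percentages (the names are just illustrative), and that the numbers relate to the angle via prob(1) = sin²(θ/2), as in the earlier sketch.

```python
import numpy as np

def fuzziness(pairs):
    """Average gap between the two numbers of each pair."""
    return np.mean([abs(p['percent_a'] - p['percent_b']) for p in pairs])

def difference(pairs):
    """Average gap between each pair's chosen angle and the angle
    inferred from its numbers, using prob(1) = sin^2(theta/2)."""
    diffs = []
    for p in pairs:
        prob = (p['percent_a'] + p['percent_b']) / 200  # mean, as a fraction
        inferred = 2 * np.arcsin(np.sqrt(prob))
        diffs.append(abs(inferred - p['theta']))
    return np.mean(diffs)
```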

Finally, I look at how solvable the puzzles are. I have a background in quantum error correction, where we love to solve a problem called minimum weight perfect matching (MWPM). Whatever actual problem we need to solve, we will try to fit it into an MWPM-shaped hole. So I used it to make an algorithm to solve the puzzles, and look at how many pairs it gets right. This is the matching success.
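
Here’s roughly how such a solver can look, using networkx rather than the code from the paper: weight each connection of the grid by how badly its two numbers disagree, then ask for the matching with the smallest total disagreement.

```python
import networkx as nx

def solve_puzzle(couplings, numbers):
    """couplings: the (a, b) qubit pairs connected on the device;
    numbers: dict mapping each qubit to its displayed percentage."""
    G = nx.Graph()
    for a, b in couplings:
        # negate the disagreement, so that a maximum weight matching
        # corresponds to the pairing with the smallest total disagreement
        G.add_edge(a, b, weight=-abs(numbers[a] - numbers[b]))
    matching = nx.max_weight_matching(G, maxcardinality=True)
    return {tuple(sorted(pair)) for pair in matching}
```

The matching success is then just the fraction of pairs in this guess that also appear in the actual pairing.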

Now that we know what we want to graph, let’s graph it. Here are some results from simulations, where we get a normal computer to pretend to be a quantum computer. The puzzles here were those for a 5 qubit device, so that it was small enough to easily simulate. I even added in some imperfections, with each element of the process doing something incorrect with probability q. By repeating the simulations for different values of q, we can see the effect of errors on the results. For each of these plots, results are shown for different numbers of rounds, with the number of rounds on the x axis.
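
If you’d like to reproduce something similar, here’s one way to set up such a noisy simulation with Qiskit Aer. I’m using a simple depolarizing error of strength q on every gate as a stand-in for ‘each element doing something incorrect with probability q’; the error model behind the plots above may differ in its details.

```python
from qiskit_aer import AerSimulator
from qiskit_aer.noise import NoiseModel, depolarizing_error

def noisy_backend(q):
    """Simulator in which every gate suffers depolarizing noise of strength q."""
    noise_model = NoiseModel()
    noise_model.add_all_qubit_quantum_error(depolarizing_error(q, 1), ['ry'])
    noise_model.add_all_qubit_quantum_error(depolarizing_error(q, 2), ['cx'])
    return AerSimulator(noise_model=noise_model)

backend = noisy_backend(0.05)
# counts = backend.run(qc, shots=8192).result().get_counts()
```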

For me, the most noticeable landmark occurs in the fuzziness plot. The fuzziness is low in Round 1, because there haven’t yet been many errors to perturb the numbers. It then begins to increase over the first few rounds, as the errors build up and the numbers start to drift. But at very high rounds, there is so much noise that each qubit effectively becomes a coin flip. The numbers will therefore all come out around 50%, whether they are part of the same pair or not. The fuzziness therefore becomes small again, even though it is for a bad reason.

Between the low fuzziness of Round 1 and Round ∞, there is a peak. I take this as an informal marker of where things really start to go wrong. I call it, The Peak of Doom!

As we see from the simulated results, the strength of the errors determines where we find the peak of doom. For q<5%, the peak doesn’t even appear on our plots. For q=10% it is at Round 4. For q=20% it is as early as Round 2!

Now let’s see what happens for a real device. These are results from a device initially known as ibm_qx5, a 16 qubit device from 2017 accessible over the cloud. This was one of the first devices to be named after a place, which has now become the tradition for IBM Quantum devices. So despite the fact that these 16 qubits lived in upstate New York, it was renamed ibm_rueschlikon after the research lab in Switzerland. Since that’s where I started working soon after taking these results, that’s the name I’ll use.

You’ll notice there are two sets of results, in blue and orange. Those in blue are for the raw values of the numbers, where we directly use the percentages for getting the outcome 1 on each qubit as described above. Those in orange are for when I attempt to clean them up a bit, using the fact that the qubits of the same pair should be perfectly correlated. This process is explained in the original paper, so check that out if you want to know more.

In any case, both sets of results tell the same story. The device makes a heroic attempt, but nevertheless doesn’t make it much past Round 2 before the puzzles get pretty unsolvable. Even if you just randomly guessed pairs, you’d get them right around 40% of the time. We see our algorithm doing the same from around Round 3 onward, so the numbers obviously aren’t providing many clues about what the qubits should be doing.

But this project is all about looking at the actual puzzles, not just graphs. So let’s see some examples.

Here’s Round 2, both with and without the cleanup. If you look at the correct solution, you can maybe convince yourself that it is indeed correct. But it’s definitely not as clear as it should be for such an early puzzle.

Now let’s see some results from 2024. There are some similarities between the way these puzzles are set up and the way the layer fidelity is measured on IBM Quantum devices. This is a measure of how noisy the systems are, so we might expect that the device with the best layer fidelity should perform the best. At the time of running, this was ibm_torino.

The progress made since 2018 can be easily seen in the results, but also in the number of qubits: Rather than a mere 16 qubits, this device has 133! The peak of doom has been pushed out to almost Round 20. Since each round requires two layers of entangling gates to be applied, that’s an almost 10x improvement since 2018 in both the number of qubits and the depth of entangling gates for which we can get good results.

I also tried it on a 127 qubit device: ibm_kyiv. Despite its lower layer fidelity, the results were even better. Here we see the peak of doom pushed out as far as Round 50. That’s a depth of 100 entangling gates across a 127 qubit device. It’s very encouraging to see such a large quantum computation getting such nice results!

We also see this in the success probability of our algorithm. Again with this device, random guesses would be correct around 40% of the time. Even for the raw values, our algorithm remains noticeably more successful than this all the way up to Round 50.

While I was looking through the results from ibm_torino and ibm_kyiv, a brand new device came online: ibm_fez. Like ibm_torino, this is an example of IBM Quantum’s new Heron architecture. But this time it raises the bar to 156 qubits. So how well does it do playing Quantum Awesomeness?

Well, you probably know already because the results are in the slide above. It does basically just as well as ibm_kyiv. Perhaps even a smidge better. But with even more qubits!

Again, we are getting carried away with graphs. Let’s look at some puzzles instead. These are from ibm_kyiv, because the slightly smaller size makes them a little more manageable on the page.

One thing to note before we look at a puzzle: for some devices it isn’t possible to pair up all the qubits. In those cases, the game will pair some qubits and leave others unpaired. The aim of the game is then not just to figure out the pairing, but also to figure out which qubits are left out. This is true for the heavy-hex layout of current IBM Quantum devices, so keep this in mind when solving these puzzles.

Now let’s begin with an example from Round 1.

Here everything is nice, clear and obvious. In the top-left corner, for example, the pairs labelled AK and AR are clearly correct, and the qubits with the values 11 and 12 are clearly unpaired.

Now let’s skip to Round 10.

Still pretty nice and clear. The qubits in the top-right corner seem to have accumulated a lot of errors and are already hinting at the problems to come. But otherwise, all is well.

Now let’s see what Round 50 looks like.

The numbers here are all approaching 50%, which shows that errors are definitely introducing a lot of randomness. Nevertheless, there’s enough signal left to pick out some likely pairs. The ones I’ve circled are those that agree by at most 1%, and which disagree with their neighbours by more than this. This already gives us a lot to get started with.
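
The circling can be done by eye, but here’s a quick sketch of the same heuristic in code (an illustrative reconstruction, not what the game itself does): keep a connection if its two numbers agree to within 1%, and neither of its qubits agrees that closely with any other neighbour.

```python
def easy_pairs(couplings, numbers, threshold=1):
    """Connections whose two numbers agree to within `threshold` percent,
    while neither qubit agrees that closely with any other neighbour."""
    neighbours = {}
    for a, b in couplings:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    picked = []
    for a, b in couplings:
        if abs(numbers[a] - numbers[b]) <= threshold and all(
            abs(numbers[q] - numbers[n]) > threshold
            for q in (a, b)
            for n in neighbours[q] - {a, b}
        ):
            picked.append((a, b))
    return picked
```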

These are just a few rounds from a single game. If you want to see other games and other samples, you can find the data here.

So now we come to the conclusion slide. I ran something back in 2018 on all the quantum devices I could get my hands on. I found that the quantum computers of the time weren’t really up to the task. This was despite the fact that it wasn’t even a very hard task: the device spends most of its time applying gates that should have no effect, given the initial state of the system.

Then, in 2024, I thought I’d quickly run the same process again on modern devices, finding much better results. With over 100 qubits and a depth of 100 entangling gates, we finally have Quantum Awesomeness!

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “IBM Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

JRW was sponsored in part by the Army Research Office under Grant Number W911NF-21-1-0002. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

Dr James Wootton

Finding creative applications for quantum computers at Moth Quantum.