## How to benchmark a quantum computer

Quantum machines offer the possibility of performing certain computations much faster than their classical counterparts. However, how to define and measure quantum speedup is a topic of debate. Rønnow *et al.* describe methods for fairly evaluating the difference in computational power between classical and quantum processors. They define various types of quantum speedup and consider quantum processors that are designed to solve a specific class of problems.

*Science*, this issue p. 420

## Abstract

The development of small-scale quantum devices raises the question of how to fairly assess and detect quantum speedup. Here, we show how to define and measure quantum speedup and how to avoid pitfalls that might mask or fake such a speedup. We illustrate our discussion with data from tests run on a D-Wave Two device with up to 503 qubits. By using random spin glass instances as a benchmark, we found no evidence of quantum speedup when the entire data set is considered and obtained inconclusive results when comparing subsets of instances on an instance-by-instance basis. Our results do not rule out the possibility of speedup for other classes of problems and illustrate the subtle nature of the quantum speedup question.

The interest in quantum computing originates in the potential of a quantum computer to solve certain computational problems much faster than is possible classically. Examples are the factoring of integers (*1*) and the simulation of quantum systems (*2*, *3*). In these examples, a quantum algorithm is exponentially faster than the best known classical algorithm. According to the extended Church-Turing thesis, all classical computers are equivalent up to polynomial factors (*4*). Similarly, all proposed models of quantum computation are polynomially equivalent, so that a finding of exponential quantum speedup will be model-independent. In other cases, in particular on small devices or when the quantum speedup is polynomial, defining and detecting quantum speedup becomes more subtle.

Denoting the time used by a specific classical device or algorithm to solve a problem of size *N* by *C*(*N*) and the time used on the quantum device by *Q*(*N*), we define quantum speedup as the asymptotic behavior of the ratio
(1)for *N* → ∞. Subtleties appear in the choice of classical algorithms and in defining *C*(*N*) and *Q*(*N*) if the runtime depends not just on the size *N* of a problem but also on the specific problem instance and in extrapolating to the asymptotic limit.

Depending on our knowledge of classical algorithms for a given problem, we may consider four different types of quantum speedup. The optimal scenario is one of a provable quantum speedup, where there exists a proof that no classical algorithm can outperform a given quantum algorithm. The best known example is Grover’s search algorithm (*5*), which, in the query complexity setting, exhibits a provable quadratic speedup over the best possible classical algorithm (*6*). A strong quantum speedup was defined in (*7*) by using the performance of the best classical algorithm for *C*(*N*), whether such an algorithm is known or not. Unfortunately, the performance of the best classical algorithm is unknown for many interesting problems. In the case of factoring, for example, a proof of a classical super-polynomial lower bound is not known. A less ambitious goal is therefore desirable, and thus one usually defines quantum speedup (without additional adjectives) by comparing it with the best available classical algorithm instead of the best possible classical algorithm.

A weaker scenario is one where a quantum algorithm is designed to make use of quantum effects, but it is not known whether these quantum effects provide an advantage over classical algorithms or where a device is a putative or candidate quantum information processor. To capture this scenario, which is of central interest to us in this work, we define limited quantum speedup as a speedup obtained when compared specifically with classical algorithms that “correspond” to the quantum algorithm in the sense that they implement the same algorithmic approach, but on classical hardware. A natural example is quantum annealing (*8*, *9*) or adiabatic quantum optimization (*10*) implemented on a candidate physical quantum information processor versus corresponding classical algorithms such as simulated annealing (SA) (*11*) (which performs annealing on a classical Monte Carlo simulation of the Ising spin glass and makes no use of quantum effects) or simulated quantum annealing (SQA) (*12*, *13*) (a classical algorithm mimicking quantum annealing in a path-integral quantum Monte Carlo simulation). In this comparison, a limited quantum speedup would be a demonstration that quantum effects improve the annealing algorithm.

To illustrate the subtleties in detecting quantum speedup, even after a classical reference algorithm is chosen, we compared the performance of an experimental 503-qubit D-Wave Two (DW2, D-Wave Systems, Incorporated, Burnaby, Canada) device to classical algorithms and analyzed the evidence for quantum speedup on the benchmark problem of random spin glass instances. Specifically, we considered the problem of finding the ground state of an Ising spin glass model described by a “problem Hamiltonian”
(2)with *N* binary variables . The local fields {*h _{i}*} and couplings {

*J*} are fixed and define a problem instance of the Ising model. The spins occupy the vertices of a graph with edge set . We will consider the distributions of the time to solution over many random spin glass problems instances with zero local fields on the Chimera graph realized by the DW2 device [see supplementary materials (

_{ij}*14*) for a definition of that graph and why we choose

*h*= 0]. This problem is NP-hard (

_{i}*15*), and all known classical algorithms scale super-polynomially not only for the hardest but also for typical instances. Although quantum mechanics is not expected to reduce the super-polynomial scaling to polynomial, a quantum algorithm might still scale better with problem size

*N*than any classical algorithm.

The D-Wave devices (*16*–*19*) are designed to be physical realizations of quantum annealing using superconducting flux qubits and programmable fields {*h _{i}*} and couplings {

*J*}. Quantum annealing is implemented by initializing the system in the ground state of a transverse magnetic field , where the σ’s denote the Pauli matrices;

_{ij}*H*is turned off adiabatically, while

_{X}*H*

_{Ising}is turned on simultaneously. A measurement of each qubit in the computational basis then determines the ground state of

*H*

_{Ising}with a probability affected by many factors, including thermal excitations and implementation errors (

*20*). Tests on D-Wave One (DW1) (

*21*) and DW2 devices (

*22*) have shown that for small problem sizes the device correlates well with the predictions of a quantum master equation. For larger problem sizes, a 108-qubit DW1 device correlated well with SQA (

*20*), which indicates that, despite decoherence and coupling to a thermal bath, the behavior of the device is consistent with it actually performing quantum annealing (

*23*,

*24*). However, for these benchmarks both SQA and the DW1 device are also well described by a semiclassical mean-field version of SQA (

*25*), which raises the question of whether quantum effects play an important role. The approach adopted here, of seeking evidence of a (limited) quantum speedup, directly addresses the crucial question of whether large-scale quantum effects create a potential for the devices to outperform classical algorithms. To test this possibility, we compared the performance of a DW2 device to two “corresponding” classical algorithms: SA and SQA.

Because quantum speedup concerns the asymptotic scaling of *S*(*N*), we considered the subtleties of estimating it from small *N* and inefficiencies at small *N* that can fake or mask a speedup. In the context of annealing, the optimal choice of the annealing time *t*_{a} turns out to be crucial for estimating asymptotic scaling. To illustrate this, we first considered the time to solution using SA and SQA run at different fixed *t*_{a} values, independent of *N*. Figure 1A shows the scaling of the median total annealing time [over 1000 different random instances on the D-Wave Chimera graph; see supplementary materials (*14*)] for SQA to find a solution at least once with probability *P* = 0.99. Corresponding times for SA are shown in fig. S10. We observed that at constant *t*_{a}, as long as *t*_{a} is long enough to find the ground state almost every time, the scaling of the total effort is at first relatively flat. The total effort then rose more rapidly, once one reached *N* values for which the chosen *t*_{a} is too short, and the success probabilities were thus low, requiring many repetitions. Extrapolations to *N* → ∞ need to consider the lower envelope of all curves, which corresponds to choosing an optimal annealing time for each *N*.

Figure 1B demonstrates that, when using fixed *t*_{a} values, no conclusion can be drawn from annealing (simulated or in a device) about the asymptotic scaling. The initial slow increase at constant *t*_{a} is misleading, and instead the optimal annealing time needs to be used for each *N*. To illustrate this, we show (Fig. 1B) the real “speedup” ratio of the scaling of SA and SQA (actually a slowdown) and a fake speedup resulting from a constant and excessively long *t*_{a} for SQA. Because SA outperforms SQA on our benchmark set, it is our algorithm of choice in the comparisons with the DW2 reported below.

A related issue is the scaling of hardware resources (computational gates and memory) with problem size, which must be identical for the devices we compare. A device whose hardware resources scale as *N* can achieve an intrinsic parallel speedup compared with a fixed size device. Such is the case for the DW2, which uses *N* (out of 512) qubits, couplers, and classical logical control gates to solve a spin glass instance with *N* spin variables in time *T*_{DW}(*N*). Considering quantum speedup for *N* → ∞, we need to compare a (hypothetical) larger DW2 device with the number of qubits and couplers growing as to a (hypothetical) classical device with gates or processing units. Because SA (and SQA) are perfectly parallelizable for the bipartite Chimera graphs realized by the DW2, we can relate the scaling of the time *T*_{C}(*N*) on such a device to the time *T*_{SA}(*N*) for SA on a fixed-size classical central processing unit (CPU) by *T*_{C}(*N*) ~ *T*_{SA}/*N* and obtain

We lastly addressed the question of how to measure time when the time to solution depends on the specific problem instance. When a device is used as a tool for solving computational problems, the question of interest is to determine which device is better for almost all possible problem instances. If instead the focus is on the underlying physics of a device, then it might suffice to find a subclass of instances where a speedup is exhibited. These two questions lead to different quantities of interest.

To illustrate the considerations, we now turn to our results. Figure 2 shows the scaling of the time to find the ground state for various quantiles, from the easiest instances (1%) to the hardest (99%), comparing the DW2 and SA. We chose the values of the couplings *J _{ij}* from 2

*r*discrete values {

*n*/

*r*}, with

*n*∈ ±{1, …,

*r*− 1,

*r*}, and call

*r*the range. Because we do not a priori know the hardness of a given problem instance, we have to assume the worst case and perform a sufficient number of repetitions

*R*to be able to solve even the hardest problem instances. Hence, the scaling for the highest quantiles is the most informative. Here, we considered only the pure annealing times, ignoring setup and readout times that scale subdominantly [see (

*14*) for wall-clock results].

We observed for both the DW2 and the SA, for sufficiently large *N*, that the total time to solution for each quantile *q* scaled as (with *c _{q} *> 0 a constant) as reported previously for SA and SQA (

*20*). The origin of the exponent is well understood for exact solvers as reflecting the treewidth of the Chimera graph (

*14*,

*26*). Whereas the SA code was run at a for each

*N*, the DW2 has a minimal

*t*

_{a}= 20 μs, which is longer than the optimal time for all problem sizes (

*14*). Therefore, the observed slope of the DW2 data can only be taken as a lower bound for the asymptotic scaling. With this in mind, we observed similar scaling for SA and the DW2 for .

How can we probe for a speedup in light of this similar scaling? With algorithms such as SA or quantum annealing, where the time to solution depends on the problem instance, it is impractical to experimentally find the hardest problem instance. If instead we target a fraction of *q*% of the instances, then we should consider the *q*th quantile in the scaling plots shown in Fig. 2. The appropriate speedup quantity is then the ratio of these quantiles (RofQ). Denoting a quantile *q* of a random variable *X* by [*X*]* _{q}*, we define this as
(4)Plotting this quantity for the DW2 versus SA in Fig. 3, A and B, we found no evidence for a limited quantum speedup in the interesting regime of large

*N*and large

*q*(almost all instances). That is, although for all quantiles and for both ranges the initial slope was positive, when

*N*and

*q*became large enough we observed a turnaround and eventually a negative slope. Although we observed a positive slope for quantiles smaller than the median, this is of limited interest because we have not been able to identify a priori which instances will be easy. Taking into account that because of the fixed suboptimal annealing times the speedup defined in Eq. 4 is an upper bound, we conclude that there is no evidence of a speedup over SA for this particular benchmark.

* measures the speedup while comparing different sets of instances for DW and SA, each determined by the respective quantile. Now we consider instead whether there is a speedup for a (potentially small) subset of the same problem instances. To this end, we study the scaling of the ratios of the time to solution for individual instances and display in Fig. 3, C and D, the scaling of various quantiles of the ratio (QofR)
(5)For **r* = 7, all the quantiles bend down for sufficiently large *N*, so that there is no evidence of a limited quantum speedup. There does seem to be an indication of such a speedup compared to SA in the low quantiles for *r* = 1, that is, for those instances whose speedup ratio was high. However, the instances contributing here are not run at , and more work is needed to establish that the potential *r* = 1 speedup result persists for those instances for which one can be sure that *t*_{a} is optimal.

Next we consider the distribution of solution times at a fixed problem size. This does not address the speedup question, because no scaling can be extracted, but illuminates instead the question of correlation between the performance of the DW2 and SA. To this end, we performed individual comparisons for each instance and show in Fig. 4, A and B, the time to solution for the same instances for the DW2 and SA. We observed a wide scatter [in agreement with the DW1 results of (*20*)] and found that, although the DW2 is sometimes up to 10 times faster in pure annealing time, there are many cases where it is ≥100 times slower.

It is not yet known whether a quantum annealer or even a perfectly coherent adiabatic quantum optimizer can exhibit (limited) quantum speedup at all, although there are promising indications from theory (*27*), simulation (*13*), and experiments on spin glass materials (*9*). Experimental tests are thus important. There are several candidate explanations for the absence of a clear quantum speedup in our tests. Perhaps quantum annealing simply does not provide any advantages over simulated (quantum) annealing or other classical algorithms for the problem class we have studied (*28*), or perhaps the noisy implementation in the DW2 cannot realize quantum speedup and is thus not better than classical devices. Alternatively, a speedup might be masked by calibration errors, improvements might arise from error correction (*29*), or other problem classes might exhibit a speedup. Future studies will probe these alternatives and aim to determine whether one can find a class of problem instances for which an unambiguous speedup over classical hardware can be observed.

Although we used specific processors and algorithms for illustration, the considerations about a reliable determination of quantum speedup presented here are general. For any speedup analysis, use of the same scaling of hardware resources for both quantum and classical devices is required to disentangle parallel and quantum speedup. And, for any quantum algorithm where the runtime must be determined experimentally, a careful extrapolation to large problem sizes is important to avoid mistaking inefficiencies at small problem sizes for signs of quantum speedup.

## Supplementary Materials

## References and Notes

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵See supplementary materials on
*Science*Online. The supplementary materials provide details about the classical simulation algorithms, the DW2 device, the procedure used to determine annealing times, and additional data on scaling with problem size. - ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
**Acknowledgments:**We thank N. Allen, M. Amin, E. Farhi, M. Mohseni, H. Neven, and C. McGeoch for useful discussions and comments and I. Zintchenko for providing the an_ss_ge_nf_bp simulated annealing code before publication of the code with (*30*). This project was supported by the Swiss National Science Foundation through the National Center of Competence in Research Quantum Science and Technology, the Army Research Office (ARO) Multidisciplinary University Research Initiative (MURI) grant no. W911NF11-1-0268, ARO grant no. W911NF-12-1-0523, the Lockheed Martin Corporation, and Microsoft Research. Simulations were performed on clusters at Microsoft Research and ETH Zurich an on supercomputers of the Swiss Center for Scientific Computing. We acknowledge hospitality of the Aspen Center for Physics, supported by NSF grant PHY-1066293.