## Reducing quantum overhead

A quantum computer is expected to outperform its classical counterpart in certain tasks. One such task is the factorization of large integers, the technology that underpins the security of bank cards and online privacy. Using a small-scale quantum computer comprising five trapped calcium ions, Monz *et al.* implement a scalable version of Shor's factorization algorithm. With the function of ions being recycled and the architecture scalable, the process is more efficient than previous implementations. The approach thus provides the potential for designing a powerful quantum computer, but with fewer resources.

*Science*, this issue p. 1068

## Abstract

Certain algorithms for quantum computers are able to outperform their classical counterparts. In 1994, Peter Shor came up with a quantum algorithm that calculates the prime factors of a large number vastly more efficiently than a classical computer. For general scalability of such algorithms, hardware, quantum error correction, and the algorithmic realization itself need to be extensible. Here we present the realization of a scalable Shor algorithm, as proposed by Kitaev. We factor the number 15 by effectively employing and controlling seven qubits and four “cache qubits” and by implementing generalized arithmetic operations, known as modular multipliers. This algorithm has been realized scalably within an ion-trap quantum computer and returns the correct factors with a confidence level exceeding 99%.

Shor’s algorithm for factoring integers (*1*) is one example in which a quantum computer (QC) outperforms the most efficient known classical algorithms. Experimentally, its implementation is highly demanding (*2*–*7*) because it requires both a sufficiently large quantum register and high-fidelity control. Such challenging requirements raise the question of whether optimizations and experimental shortcuts are possible. Optimizations, especially system-specific or architectural optimizations, are certainly possible, but for a demonstration of Shor’s algorithm in a scalable manner, special care must be taken to not oversimplify the implementation—for instance, by employing knowledge about the solution before the actual experimental application (*8*).

How does Shor’s algorithm work? First, we consider a classical factoring recipe, assuming that the number we want to factor is *N* = 15. We pick a random number (the base)—say, *a* = 7. We evaluate whether the greatest common divisor gcd(*a*, *N*) = 1; if not, a factor is already determined. This is the case for *a* = {3, 5, 6, 9, 10, 12}. Next, we calculate the modular exponentiation *a ^{x}* mod

*N*for

*x*= 0, 1, 2… and find its period

*r*: the first value of

*x*> 0 such that

*a*mod

^{x}*N*= 1. Given

*r*, finding the factors of

*N*requires calculating the greatest common divisors of

*a*

^{r}^{/2}± 1 and

*N*, which is efficiently possible with a classical approach—for instance, using Euclid’s algorithm. For our example (

*N*= 15,

*a*= 7), the modular exponentiation yields 1, 7, 4, 13, 1,…, which has a period of 4. The greatest common divisors of

*a*

^{r}^{/2}± 1 = 7

^{4/2}± 1 = {48, 50} and

*N*= 15 are {3, 5}, the nontrivial factors of

*N*. In this example, the cases

*a*= {4, 11, 14} have period

*r*= 2 and require a single multiplication step (

*a*

^{2}mod

*N*= 1), which is considered an “easy” case (

*8*). Note that the periodicity for a chosen

*a*cannot be predicted.

How can this recipe be implemented in a QC? A QC also has to calculate *a ^{x}* mod

*N*in a computational register for

*x*= 0, 1, 2… and then extract

*r*. Using the quantum Fourier transform (QFT) applied to the period register, the period of

*a*mod

^{x}*N*can be extracted from a number of measurements not increasing with the size of the number to be factored.

What are the requirements and challenges of implementing Shor’s algorithm? We first focus on the period register and subsequently address modular exponentiation in the computational register. Factoring *N*, an *n* = [log_{2}(*N*)]-bit number (with the quantity in brackets rounded up to next integer number), requires a minimum of *n* qubits in the computational register (to store the results of *a ^{x}* mod

*N*) and generally about 2

*n*qubits in the period register (

*9*,

*10*). Thus, even a seemingly simple example, such as factoring 15 (an

*n*= 4-bit number), requires 3

*n*= 12 qubits. These qubits then have to be manipulated with high-fidelity gate operations. Given the current state-of-the-art control over quantum systems (

*11*), such an approach would probably yield an unsatisfactory performance. However, a full quantum implementation of this part of the algorithm is not necessary. As noted by Kitaev (

*12*), if only the classical information of the QFT (such as the period

*r*) is of interest, 2

*n*qubits subject to a QFT can be replaced by a single qubit. Still, this approach requires qubit recycling (specifically, in-sequence single-qubit readout and state reinitialization) paired with feed-forward behavior to compensate for the reduced system size.

In the following, Kitaev’s QFT will be referred to as KQFT^{(}^{M}^{)}. It replaces a QFT acting on *M* qubits with a semiclassical QFT acting repeatedly on a single qubit. Similar applications of Kitaev’s approach to a semiclassical QFT in quantum algorithms have been investigated (*13*–*15*). For the implementation of Shor’s algorithm, Kitaev’s approach provides a reduction from the previous *n* computational qubits and 2*n* QFT qubits (in total, 3*n* qubits) to only *n* computational qubits and 1 KQFT^{(2}^{n}^{)} qubit (in total, *n* + 1 qubits).

The second key ingredient of Shor’s algorithm—and a notably more challenging aspect than the QFT—is modular exponentiation, which admits the following general simplifications.

1) Considering Kitaev’s approach (Fig. 1), the input state (in decimal representation) is subject to a conditional multiplication based on the most significant bit *k* of the period register. At most, there will be two results after this first step. It follows that, for the very first step, it is sufficient to implement an optimized operation that conditionally maps . Considering the importance of a high-fidelity multiplication (with its performance being fed forward to all subsequent qubits), this efficient simplification improves the overall performance of experimental realizations.

2) Subsequent multipliers can similarly be replaced with maps by considering only possible outputs of the previous multiplications. However, using such maps will become intractable, as the number of input and output states to be considered grows exponentially with the number of steps: After *n* steps, 2* ^{n}* >

*N*possible outcomes need to be considered, a numerical task as challenging as factoring

*N*by classical means. Thus, controlled full modular multipliers should be implemented. Figure 2 shows the experimentally obtained truth table for the modular multiplier 2 mod 15 [see also (

*16*) for modular multipliers with bases {7, 8, 11, 13}]. These quantum circuits can be efficiently derived from classical procedures by using a variety of standard techniques for reversible quantum arithmetic and local logic optimization (

*17*,

*18*).

3) The very last multiplier allows one more simplification: Considering that the results of the modular exponentiation are not required for Shor’s algorithm (as only the period encoded in the period register is of interest), the last multiplier only has to create the correct number of correlations between the period register and the computation register. Local operations after the conditional (entangling) operations may be discarded to facilitate the final multiplication without affecting the results of the implementation.

4) In rare cases, certain qubits are not subject to operations in the computation. Thus, these qubits can be removed from the algorithm entirely.

For large-scale quantum computation, optimization steps 1, 3, and 4 will only marginally affect the performance of the implementation. These steps represent merely a small subset of the entire computation, which mainly consists of the full modular multipliers. Thus, the realization of these modular multipliers is a core requirement for the implementations of a scalable Shor algorithm.

Furthermore, Kitaev’s approach requires in-sequence measurements, qubit recycling to reset the measured qubit, feed-forward behavior of gate settings on the basis of previous measurement results, and controlled quantum operations—tasks that have not been realized in a combined experiment to date.

We demonstrate these techniques in our realization of Shor’s algorithm in an ion-trap QC, with five ^{40}Ca^{+} ions in a linear Paul trap. The qubit is encoded in the ground state and the metastable state (where *m* denotes the Zeeman sublevel) [for more details, see (*16*, *19*)]. Unitary operations, illustrated in Fig. 1, are decomposed into primitive components, such as two-target controlled-NOT (C-NOT) and C-SWAP gates (or gates with global symmetries, such as the four-target C-NOT gate employed here), from which an adaptation of the gradient-ascent pulse engineering algorithm (*20*) can efficiently derive an equivalent sequence of laser pulses acting on only the relevant qubits. The problem with this approach is that the resulting sequence generally includes operations acting on all qubits. Implementing the optimized three-qubit operations for a five-ion string therefore requires decoupling the remaining qubits from the computation space. We spectroscopically decouple qubits by transferring any information from to and from to . Here, the subspace serves as a readily available “quantum cache” to store and retrieve quantum information for the purpose of facilitating quantum computations.

Finally, to complete the toolbox necessary for Kitaev’s approach to Shor’s algorithm, we also implement (i) single-qubit readout, by encoding all other qubits in the subspace and subsequent electron shelving (*21*) on the transition; (ii) feed-forward behavior, by storing counts detected during the single-qubit readout (*22*) in a classical register and subsequent conditional laser pulses; and (iii) state reinitialization, using optical pumping for the ion, and Raman cooling (*23*, *24*) for the motional state of the ion string. The pulse sequences and additional information on the implementation of the modular multipliers are available in (*16*).

The measurement results for base *a* = {2, 7, 8, 11, 13} with period *r* = {4, 4, 4, 2, 4} are shown in Fig. 3. To quantify the performance of the implementation, previous realizations focused mainly on the squared statistical overlap (SSO) (*25*), the classical equivalent to the Uhlmann fidelity (*10*). Although we achieved an SSO of {0.968(1), 0.964(1), 0.966(1), 0.901(1), 0.972(1)} for the case of *a* = {2, 7, 8, 11, 13}, we argue that this does not answer the question “What is the period?” Shor’s algorithm allows one to deduce the period with high probability from a single-shot measurement, as the output of the QFT (*x*) is, in the exact case, a ratio of integers, where the denominator gives the desired period. This period is extracted by using a continued fraction expansion applied to *x*/2* ^{k}*, a good approximation of the ideal case when

*k*, the number of qubits, is sufficiently large. In our realizations with bases

*a*= {2, 7, 8, 11, 13}, the probabilities (and their error estimates in parentheses) to obtain output states that allow the derivation of the correct period are {56(2), 51(2), 54(2), 47(2), 50(2)}%. Thus, to obtain a confidence level of >99% for the periodicity, the experiment has to run about eight times.

We have presented the realization of Kitaev’s vision of Shor’s algorithm based on scalable building blocks with three-digit resolution to factor *N* = 15, using bases {2, 7, 8, 11, 13}. To do this, we successfully employed a semiclassical QFT combined with single-qubit readout, feed-forward behavior, and qubit recycling. Compared with the traditional algorithm, our realization of Shor’s algorithm reduces the required number of qubits by nearly a factor of 3. Furthermore, the entire quantum register has been subject to the computation in a “black-box” fashion. Employing the equivalent of a quantum cache by spectroscopic decoupling facilitated the derivation of the necessary pulse sequences to achieve high-fidelity results. We envision that our scalable algorithm implementation will be combined with a scalable trap architecture (*26*) and quantum error correction to enable arbitrary long quantum computation.

## Supplementary Materials

## References and Notes

**Acknowledgments:**We acknowledge support from the Austrian Science Fund (FWF), through the SFB FoQus (FWF project no. F4002-N16); the European Commission (AQUTE), the NSF Interdisciplinary Quantum Information Science and Engineering (iQuISE) Integrative Graduate Education and Research Traineeship (IGERT); and the Institut für Quantenoptik und Quanteninformation. E.A.M. is a recipient of a DOC Fellowship of the Austrian Academy of Sciences. This research was funded by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), through Army Research Office grant W911NF-10-1-0284. All statements of fact, opinion, or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of IARPA, the ODNI, or the U.S. government. T.M., D.N., and P.S. developed the research, on the basis of theoretical ideas derived with R.R., S.X.W., and I.L.C; T.M., D.N., E.A.M., M.F.B., P.S., and S.X.W. performed the experiments; T.M. and D.N. analyzed the data; T.M., D.N., E.A.M., P.S., M.F.B., and R.B. contributed to the experiment; T.M., D.N., R.R., M.F.B., I.L.C., and R.B. wrote the manuscript; and all authors contributed to discussions about the results and the manuscript. We declare no competing financial interests.