Debunking algorithmic qubits

March 1, 2024

Executive Summary: Quantinuum’s H-Series computers have the highest performance in the industry, verified by multiple widely adopted benchmarks including quantum volume.  We demonstrate that an alternative benchmark called algorithmic qubits is deeply flawed, hiding computer performance behind a plurality voting trick and gate compilations that are not widely useful.

Recently, a new benchmark called algorithmic qubits (AQ) has started to be confused with quantum volume measurements. Quantum volume (QV) was specifically designed to be hard to “game”; the algorithmic qubits test, however, turns out to be very susceptible to tricks that can make a quantum computer look much better than it actually is. While it is not clear what can be done to fix the algorithmic qubits test, it is already clear that it is much easier to pass than QV and is a poor substitute for measuring performance. It is also important to note that algorithmic qubits are not the same as logical qubits, which are necessary for full fault-tolerant quantum computing.

Fig. 1: Simulations of the algorithmic qubits (AQ) test with only two-qubit gate errors for two hypothetical machines. The machines are identical except that one has much higher two-qubit gate fidelity. The test was run with three different options: (Base) running the exact circuits as specified by the algorithmic qubits GitHub repository, (Gate compilation) running circuits with custom pytket compiler passes to reduce two-qubit gate counts, and (Gate compilation + plurality voting) running the compiled circuits and also applying plurality voting error mitigation with voting over 25 random variants, each with 100 shots. Note that the quantum volume (QV) of the machines most closely tracks the “base” case without compilation and plurality voting, but even that base case of AQ can overestimate the QV of the machine.

To make this point clear, we simulated what algorithmic qubits data would look like for two machines, one clearly much higher performing than the other. We applied two tricks that are typically used when sharing algorithmic qubits results: gate compilation and error mitigation with plurality voting. From the data above, you can see how these tricks are misleading without further information. For example, if you compare data from the higher-fidelity machine without any compilation or plurality voting (bottom left) to data from the inferior machine with both tricks (top right), you may incorrectly believe the inferior machine is performing better. Unfortunately, this inaccurate and misleading comparison has been made in the past. It is important to note that algorithmic qubits uses a subset of algorithms from a QED-C paper that introduced a suite of application-oriented tests and created a repository to test available quantum computers. Importantly, that work explicitly forbids the compilation and error mitigation techniques that are causing the issue here.
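To see how plurality voting can flatter noisy hardware, consider the toy sketch below. It is illustrative only and does not use the actual AQ circuits: it simply votes over 25 random variants of 100 shots each, matching the settings used for Fig. 1. Even when each individual shot returns the correct bitstring only a minority of the time, the voted answer is almost always correct, so a mitigated result says little about the underlying gate fidelity.

```python
import random
from collections import Counter

def run_variant(ideal: str, shots: int, p_correct: float) -> Counter:
    """Toy noisy sampler: each shot returns the ideal bitstring with
    probability p_correct, otherwise a uniformly random n-bit string."""
    n = len(ideal)
    counts = Counter()
    for _ in range(shots):
        if random.random() < p_correct:
            counts[ideal] += 1
        else:
            counts[format(random.getrandbits(n), f"0{n}b")] += 1
    return counts

def plurality_vote(ideal: str, n_variants: int = 25, shots: int = 100,
                   p_correct: float = 0.3) -> str:
    """Each random variant 'votes' with its most frequent outcome; the
    final answer is the most common vote across all variants."""
    votes = Counter()
    for _ in range(n_variants):
        counts = run_variant(ideal, shots, p_correct)
        votes[counts.most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0]

# Even with only ~30% raw per-shot success, the voted answer is almost
# always the ideal bitstring, masking the underlying error rate.
print(plurality_vote("1011"))
```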

As a demonstration of the perils of AQ as a benchmark, we look at data obtained on Quantinuum’s H2-1 system as well as publicly available data from IonQ’s Forte system.

Fig. 2: Algorithmic qubit data with gate compilation but without plurality voting error mitigation. Data from smaller qubit and gate counts was omitted from the Quantinuum data as those points do not tend to influence the AQ score. H2-1 has a measured quantum volume of 2^16. Based on this publicly available data from Forte, combined with the AQ simulation data above, we estimate the Forte quantum volume is around 2^5, although spread in qubit fidelities and details of circuit compilation could skew this estimate.

We reproduce data without any error mitigation from IonQ’s publicly released data in association with a preprint posted to the arXiv, and compare it to data taken on our H2-1 device. Without error mitigation, IonQ Forte achieves an AQ score of 9, whereas Quantinuum H2-1 achieves AQ of 26. Here you can clearly see improved circuit fidelities on the H2-1 device, as one would expect from the higher reported two-qubit gate fidelities (average 99.816(5)% for Quantinuum’s H2-1 vs 99.35% for IonQ’s Forte). However, after you apply error mitigation (in this case, plurality voting) to both sets of data, the picture changes substantially, hiding each underlying computer’s true capabilities.
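The size of that unmitigated gap is roughly what the reported gate fidelities predict. As an illustrative back-of-envelope estimate (not taken from the measured data, and using arbitrary example gate counts), if two-qubit gate errors dominate, the probability that a circuit runs without error falls off as the gate fidelity raised to the number of two-qubit gates:

```python
# Illustrative only: assumes errors come solely from two-qubit gates and
# compound independently. The gate counts below are arbitrary examples.
for name, fidelity in [("H2-1", 0.99816), ("Forte", 0.9935)]:
    for n_2q_gates in (100, 500):
        success = fidelity ** n_2q_gates
        print(f"{name}: {n_2q_gates} two-qubit gates -> "
              f"~{success:.0%} chance of an error-free run")
```

At a few hundred entangling gates, a scale typical of the larger AQ test circuits, this difference in gate fidelity translates into roughly an order-of-magnitude difference in raw circuit success, which plurality voting then largely masks.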

Fig. 3: Algorithmic qubit data with gate compilation and plurality voting error mitigation. For the H2-1 data, plurality voting is done over 25 variants, each with 20 shots, for every test and qubit number. For Forte, it is not clear to us exactly what plurality voting strategy was employed.

Here the H2-1 algorithmic performance still exceeds that of Forte (based on the publicly released data), but the perceived gap has been narrowed by error mitigation.

“Error mitigation, including plurality voting, may be a useful tool for some near-term quantum computing but it doesn’t work for every problem and it’s unlikely to be scalable to larger systems. In order to achieve the lofty goals of quantum computing we’ll need serious device performance upgrades. If we allow error mitigation in benchmarking it will conflate the error mitigation with the underlying device performance. This will make it hard for users to appreciate actual device improvements that translate to all applications and larger problems,” explained Dr. Charlie Baldwin, a leader in Quantinuum’s benchmarking efforts.

There are other issues with the algorithmic qubits test. The circuits used in the test can be reduced to very easy-to-run circuits with basic quantum circuit compilation tools that are freely available in packages like pytket. For example, the largest phase estimation and amplitude estimation tests required to pass AQ=32 are specified with 992 and 868 entangling gates respectively, but applying pytket optimization reduces the circuits to 141 and 72 entangling gates. This reduction is only possible due to choices made in constructing the benchmarks and will not be universally available when using the algorithms in applications. Since AQ reports the precompiled gate counts, this may also lead users to expect a machine to be able to run many more entangling gates than is actually possible on the benchmarked hardware.
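As a minimal sketch of the kind of reduction involved (not the actual AQ circuits, nor necessarily the exact passes used for the figures above), a standard pytket pass such as FullPeepholeOptimise will collapse redundant entangling-gate patterns like the toy circuit below:

```python
from pytket import Circuit
from pytket.circuit import OpType
from pytket.passes import FullPeepholeOptimise

# Toy circuit with redundant structure; real benchmark circuits contain
# similar patterns that a generic optimiser can remove.
circ = Circuit(3)
circ.CX(0, 1).CX(0, 1)            # back-to-back CNOTs cancel to identity
circ.H(2).CX(1, 2).CX(1, 2).H(2)  # the same again, wrapped in Hadamards

print("Entangling gates before:", circ.n_gates_of_type(OpType.CX))  # expected: 4
FullPeepholeOptimise().apply(circ)
print("Entangling gates after: ", circ.n_gates_of_type(OpType.CX))  # expected: 0
```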

What makes a good quantum benchmark? Quantum benchmarking is extremely useful for charting hardware progress and providing roadmaps for future development. However, it is an evolving field and still an open area of research. At Quantinuum we believe in testing the limits of our machine with a variety of different benchmarks to learn as much as possible about the errors present in our system and how they affect different circuits. We are open to working with the larger community on refining benchmarks and creating new ones as the field evolves.

To learn more about the algorithmic qubits benchmark and its issues, please watch this video, where Dr. Charlie Baldwin walks us through the details, starting at 32:40.

The Quantinuum data from the H2-1 machine are available here.
