Hardware PQC Benchmark

Live ML-DSA and ML-KEM throughput measurement for hardware readiness scoring

Enabled via -hardware-benchmark flag — local mode only

Overview

The standard quantum readiness hardware assessment scores CPUs based on static properties: architecture, instruction sets, core count, and estimated clock speed. These properties tell you whether a CPU is architecturally capable of running PQC algorithms, but they cannot tell you whether it is fast enough to sustain PQC workloads at production throughput.

The Hardware PQC Benchmark fills that gap by running real cryptographic operations on the host CPU during the scan. It measures actual key generation, signing, verification, encapsulation, and decapsulation throughput for the NIST-standardized ML-DSA and ML-KEM algorithm families using github.com/cloudflare/circl. Each operation is run for a fixed measurement window (default 100 ms), and the resulting ops/sec is compared against a minimum pass threshold derived from production deployment requirements.

Local mode only. The benchmark runs on the scanner host itself. It is only available during local mode scans (-mode local). It has no effect in remote scan mode.
Overhead. 15 algorithm/operation pairs × 100 ms each adds approximately 1.5 seconds to the quantum readiness assessment. This is why the feature is opt-in via a flag rather than enabled by default.

Enabling the benchmark

Add -hardware-benchmark to any local scan command:

# Local scan with hardware benchmarks enabled
./tychon-pqc-scanner -hardware-benchmark -output-dir /tmp/results

# Combined with full local scan
./tychon-pqc-scanner -fullscan -hardware-benchmark -output-dir /tmp/results

# Quantum readiness only (fastest)
./tychon-pqc-scanner -disable-port-scan -hardware-benchmark -output-dir /tmp/results

The -disable-quantum-readiness flag takes precedence: if it is specified together with -hardware-benchmark, the benchmark does not run.

How it works

  1. Key generation outside the timing loop. For sign, verify, encap, and decap operations, keys and ciphertexts are generated before timing starts so that only the target operation is measured. Keygen benchmarks generate a fresh key pair on every iteration.
  2. Fixed-window iteration. Each operation runs in a tight loop for windowMs milliseconds (default 100). The loop checks time.Now().Before(deadline) after each operation. This avoids synthetic iteration counts that may not reflect sustained throughput.
  3. Throughput and latency. After the window expires, ops/sec = iterations / elapsed_seconds and latency_µs = elapsed_seconds × 1,000,000 / iterations.
  4. Pass/fail per operation. Each result is compared against a fixed threshold. The suite-level overall_passed is true only if all 15 operations pass.
  5. Scoring penalty. If overall_passed is false, the hardware CPU sub-score (20 points max) is reduced by 4 points, with a floor of 0. The static score is still computed as usual; the benchmark penalty is then subtracted from it.

Algorithm thresholds

Thresholds represent the minimum ops/sec required for a pass. They are calibrated to reflect realistic PQC deployment requirements: server-grade TLS handshake rates, code-signing pipelines, and certificate authority throughput.

| Algorithm | Security level | Operation | Threshold (ops/sec) | Notes |
|---|---|---|---|---|
| ML-DSA-44 | NIST L2 | keygen | 500 | Dilithium2 equivalent |
| ML-DSA-44 | NIST L2 | sign | 500 | |
| ML-DSA-44 | NIST L2 | verify | 1,000 | Verify is faster than sign |
| ML-DSA-65 | NIST L3 | keygen | 300 | Dilithium3 equivalent |
| ML-DSA-65 | NIST L3 | sign | 300 | |
| ML-DSA-65 | NIST L3 | verify | 600 | |
| ML-DSA-87 | NIST L5 | keygen | 200 | Dilithium5 equivalent |
| ML-DSA-87 | NIST L5 | sign | 200 | |
| ML-DSA-87 | NIST L5 | verify | 400 | |
| ML-KEM-768 | NIST L3 | keygen | 2,000 | Kyber768 equivalent |
| ML-KEM-768 | NIST L3 | encap | 2,000 | TLS key exchange |
| ML-KEM-768 | NIST L3 | decap | 2,000 | TLS key exchange |
| ML-KEM-1024 | NIST L5 | keygen | 1,000 | Kyber1024 equivalent |
| ML-KEM-1024 | NIST L5 | encap | 1,000 | |
| ML-KEM-1024 | NIST L5 | decap | 1,000 | |
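
For readers who want to re-check results offline, the thresholds above can be expressed as a lookup table. The sketch below is an assumption about how such a table might look in Go; the key type, the thresholds map, and the passed function are hypothetical names, and only the numeric values come from this page.

```go
package main

import "fmt"

// key identifies one algorithm/operation pair. (Hypothetical type.)
type key struct{ Algorithm, Operation string }

// thresholds mirrors the table above: minimum ops/sec required to pass.
var thresholds = map[key]float64{
	{"ML-DSA-44", "keygen"}: 500, {"ML-DSA-44", "sign"}: 500, {"ML-DSA-44", "verify"}: 1000,
	{"ML-DSA-65", "keygen"}: 300, {"ML-DSA-65", "sign"}: 300, {"ML-DSA-65", "verify"}: 600,
	{"ML-DSA-87", "keygen"}: 200, {"ML-DSA-87", "sign"}: 200, {"ML-DSA-87", "verify"}: 400,
	{"ML-KEM-768", "keygen"}: 2000, {"ML-KEM-768", "encap"}: 2000, {"ML-KEM-768", "decap"}: 2000,
	{"ML-KEM-1024", "keygen"}: 1000, {"ML-KEM-1024", "encap"}: 1000, {"ML-KEM-1024", "decap"}: 1000,
}

// passed reports whether a measured throughput meets the threshold for the
// given pair. Unknown pairs return an error instead of silently passing.
func passed(alg, op string, opsPerSec float64) (bool, error) {
	t, ok := thresholds[key{alg, op}]
	if !ok {
		return false, fmt.Errorf("no threshold for %s/%s", alg, op)
	}
	return opsPerSec >= t, nil
}

func main() {
	ok, err := passed("ML-DSA-44", "keygen", 3241.7)
	fmt.Println(ok, err) // prints: true <nil>
}
```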

Scoring impact

The hardware benchmark adds a dynamic correction to the CPU sub-score inside the 40-point hardware assessment. Without the flag, CPU scoring is static (architecture 8 pts, ISA 7 pts, cores 3 pts, clock 2 pts = 20 pts max). With the flag:

| Benchmark outcome | CPU score change | Effect on overall score |
|---|---|---|
| All 15 operations pass (overall_passed = true) | No change | No change |
| One or more operations fail (overall_passed = false) | −4 points (floor 0) | Up to −4 pts on 100-pt scale |

The penalty is applied additively on top of the static CPU score, not as a replacement. A machine that passes all static checks but has slow PQC throughput will be penalized; a machine that already scores low statically will be floored at 0 for the CPU sub-score.
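
The adjustment described above amounts to a subtraction with a floor. The following sketch shows that arithmetic under the stated rules; the cpuScore function and its parameters are hypothetical and are not taken from the scanner's source.

```go
package main

import "fmt"

// benchmarkPenalty is the number of points removed from the CPU sub-score
// when any benchmarked operation misses its threshold.
const benchmarkPenalty = 4

// cpuScore applies the benchmark correction to a static CPU score (0 to 20).
// If the benchmark did not run or every operation passed, the static score
// is returned unchanged; otherwise 4 points are subtracted, floored at 0.
func cpuScore(staticScore int, benchmarkRan, overallPassed bool) int {
	if !benchmarkRan || overallPassed {
		return staticScore
	}
	adjusted := staticScore - benchmarkPenalty
	if adjusted < 0 {
		adjusted = 0
	}
	return adjusted
}

func main() {
	fmt.Println(cpuScore(20, true, true))   // prints: 20
	fmt.Println(cpuScore(20, true, false))  // prints: 16
	fmt.Println(cpuScore(3, true, false))   // prints: 0
	fmt.Println(cpuScore(12, false, false)) // prints: 12
}
```

The third call shows why the effect on the overall score is "up to" 4 points: a CPU that scores 3 statically can only lose 3.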

Output fields — JSON

The benchmark data appears in the JSON report under quantum_readiness.hardware_score.details.pqc_benchmark. All fields are absent when -hardware-benchmark is not specified.

{
  "quantum_readiness": {
    "hardware_score": {
      "details": {
        "pqc_benchmark": {
          "overall_passed": true,
          "measurement_ms": 100,
          "timestamp_utc": "2026-05-10T14:32:07Z",
          "results": [
            {
              "algorithm": "ML-DSA-44",
              "operation": "keygen",
              "ops_per_sec": 3241.7,
              "latency_microsec": 308.5,
              "passed": true,
              "threshold": 500
            },
            {
              "algorithm": "ML-DSA-44",
              "operation": "sign",
              "ops_per_sec": 3187.2,
              "latency_microsec": 313.8,
              "passed": true,
              "threshold": 500
            },
            ...
          ]
        }
      }
    }
  }
}

| Field | Type | Description |
|---|---|---|
| pqc_benchmark.overall_passed | bool | True when all 15 algorithm/operation results met their threshold. |
| pqc_benchmark.measurement_ms | int | Measurement window per operation in milliseconds. |
| pqc_benchmark.timestamp_utc | time | RFC 3339 UTC timestamp when benchmarks were run. |
| pqc_benchmark.results[].algorithm | string | Algorithm name: ML-DSA-44, ML-DSA-65, ML-DSA-87, ML-KEM-768, ML-KEM-1024. |
| pqc_benchmark.results[].operation | string | keygen, sign, verify, encap, or decap. |
| pqc_benchmark.results[].ops_per_sec | float64 | Measured throughput in operations per second. |
| pqc_benchmark.results[].latency_microsec | float64 | Average operation latency in microseconds. |
| pqc_benchmark.results[].passed | bool | True if ops_per_sec ≥ threshold. |
| pqc_benchmark.results[].threshold | float64 | Minimum ops/sec required to pass for this algorithm/operation. |
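
A consumer of the JSON report could decode these fields as sketched below. The struct and function names (Benchmark, Result, parseBenchmark, failedOperations) are hypothetical; only the JSON keys and types come from the table above. The pointer field is nil when the report contains no pqc_benchmark object, which is the case when -hardware-benchmark was not specified.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Result is one algorithm/operation measurement.
type Result struct {
	Algorithm       string  `json:"algorithm"`
	Operation       string  `json:"operation"`
	OpsPerSec       float64 `json:"ops_per_sec"`
	LatencyMicrosec float64 `json:"latency_microsec"`
	Passed          bool    `json:"passed"`
	Threshold       float64 `json:"threshold"`
}

// Benchmark mirrors the pqc_benchmark object.
type Benchmark struct {
	OverallPassed bool      `json:"overall_passed"`
	MeasurementMs int       `json:"measurement_ms"`
	TimestampUTC  time.Time `json:"timestamp_utc"`
	Results       []Result  `json:"results"`
}

// report models only the path down to pqc_benchmark; other keys are ignored.
type report struct {
	QuantumReadiness struct {
		HardwareScore struct {
			Details struct {
				PQCBenchmark *Benchmark `json:"pqc_benchmark"`
			} `json:"details"`
		} `json:"hardware_score"`
	} `json:"quantum_readiness"`
}

// parseBenchmark returns nil (and no error) when the benchmark is absent.
func parseBenchmark(data []byte) (*Benchmark, error) {
	var r report
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	return r.QuantumReadiness.HardwareScore.Details.PQCBenchmark, nil
}

// failedOperations lists "algorithm/operation" for every result below threshold.
func failedOperations(b *Benchmark) []string {
	var out []string
	if b == nil {
		return out
	}
	for _, r := range b.Results {
		if !r.Passed {
			out = append(out, r.Algorithm+"/"+r.Operation)
		}
	}
	return out
}

func main() {
	// Trimmed to the first result from the example report above.
	data := []byte(`{"quantum_readiness":{"hardware_score":{"details":{"pqc_benchmark":{
		"overall_passed":true,"measurement_ms":100,"timestamp_utc":"2026-05-10T14:32:07Z",
		"results":[{"algorithm":"ML-DSA-44","operation":"keygen","ops_per_sec":3241.7,
		"latency_microsec":308.5,"passed":true,"threshold":500}]}}}}}`)
	b, err := parseBenchmark(data)
	if err != nil {
		panic(err)
	}
	fmt.Println("overall_passed:", b.OverallPassed, "failed:", failedOperations(b))
}
```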

Flat NDJSON fields

In Flat NDJSON (tychon.quantum_readiness event) the benchmark data is flattened using the naming pattern quantum_readiness.hardware.pqc_benchmark.<alg>.<op>.<metric> where <alg> uses underscores (e.g. ml_dsa_44).

quantum_readiness.hardware.pqc_benchmark.overall_passed       → true
quantum_readiness.hardware.pqc_benchmark.measurement_ms       → 100
quantum_readiness.hardware.pqc_benchmark.timestamp_utc        → "2026-05-10T14:32:07Z"
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.keygen.ops_per_sec      → 3241.7
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.keygen.latency_microsec → 308.5
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.keygen.passed           → true
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.sign.ops_per_sec        → 3187.2
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.sign.latency_microsec   → 313.8
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.sign.passed             → true
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.verify.ops_per_sec      → 6104.1
...
quantum_readiness.hardware.pqc_benchmark.ml_kem_768.encap.ops_per_sec      → 18420.3
quantum_readiness.hardware.pqc_benchmark.ml_kem_768.encap.latency_microsec → 54.3
quantum_readiness.hardware.pqc_benchmark.ml_kem_768.encap.passed           → true

The complete field set covers all 15 algorithm/operation pairs × 3 metrics (ops_per_sec, latency_microsec, passed) = 45 flat fields, plus the 3 summary fields.
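
The naming pattern can be reproduced with a small helper, which is useful when building queries or dashboards against the flat fields. This is a sketch under the assumption that the algorithm key is simply the lowercased name with hyphens replaced by underscores, as the ml_dsa_44 example suggests; flatten and algKey are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Result carries the three per-operation metrics that are flattened.
type Result struct {
	Algorithm, Operation       string
	OpsPerSec, LatencyMicrosec float64
	Passed                     bool
}

const prefix = "quantum_readiness.hardware.pqc_benchmark"

// algKey converts an algorithm name such as "ML-DSA-44" to "ml_dsa_44".
func algKey(name string) string {
	return strings.ReplaceAll(strings.ToLower(name), "-", "_")
}

// flatten emits three flat fields per result, following the pattern
// <prefix>.<alg>.<op>.<metric>.
func flatten(results []Result) map[string]any {
	out := make(map[string]any, len(results)*3)
	for _, r := range results {
		base := prefix + "." + algKey(r.Algorithm) + "." + r.Operation
		out[base+".ops_per_sec"] = r.OpsPerSec
		out[base+".latency_microsec"] = r.LatencyMicrosec
		out[base+".passed"] = r.Passed
	}
	return out
}

func main() {
	fields := flatten([]Result{
		{Algorithm: "ML-DSA-44", Operation: "keygen", OpsPerSec: 3241.7, LatencyMicrosec: 308.5, Passed: true},
	})
	// Sort keys so the output order is stable.
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s → %v\n", k, fields[k])
	}
}
```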

Elasticsearch mapping

The pqc_benchmark object is explicitly mapped under quantum_readiness.hardware in artifact/elasticsearch_mappings.go. Apply the mapping before ingesting benchmark data to prevent dynamic mapping from choosing incorrect types for the float throughput fields.

"quantum_readiness.hardware.pqc_benchmark": {
  "overall_passed":  boolean
  "measurement_ms":  integer
  "timestamp_utc":   date
  "ml_dsa_44.keygen.ops_per_sec":       float
  "ml_dsa_44.keygen.latency_microsec":  float
  "ml_dsa_44.keygen.passed":            boolean
  ... (same pattern for all 15 algorithm/operation pairs)
}

Update an existing index with: PUT /<index>/_mapping using the full mapping body from the Elasticsearch deployment guide.

Interpreting results

overall_passed = true

All 15 operations met their minimum throughput threshold. The hardware is capable of sustaining PQC workloads at production rates. No CPU score penalty is applied.

overall_passed = false

One or more operations are below threshold. Inspect per-operation passed fields to identify bottlenecks. A −4 pt CPU score penalty is applied to the quantum readiness assessment.

Common failure patterns

| Pattern | Likely cause | Recommendation |
|---|---|---|
| All operations fail on an older x86-64 CPU | Pre-AVX2 CPU lacking hardware acceleration for lattice arithmetic | Plan hardware refresh; prioritize CPUs with AVX2 (Intel Haswell or later, AMD Zen or later) |
| ML-DSA passes, ML-KEM fails | Rare; KEM is generally faster than DSA | Check for CPU throttling, thermal limits, or competing workloads during scan |
| Only the higher security levels (ML-DSA-87, ML-KEM-1024) fail | CPU meets L2/L3 requirements but not L5 | Acceptable for most deployments, but note the gap: CNSA 2.0 requires ML-KEM-1024 |
| Results vary between scans | System load during scan affects throughput | Run during a maintenance window or with -disable-port-scan to reduce contention |

Note on measurement accuracy. The benchmark runs operations sequentially, not in parallel, so it measures single-threaded throughput, which is the relevant metric for TLS handshake performance on an individual connection. Multi-threaded throughput generally increases with core count, but usually less than linearly, because shared caches, memory bandwidth, and thermal limits reduce the per-core rate under load.