Hardware PQC Benchmark - TYCHON Documentation

Overview

The standard quantum readiness hardware assessment scores CPUs based on static properties: architecture, instruction sets, core count, and estimated clock speed. These properties tell you whether a CPU is architecturally capable of running PQC algorithms, but they cannot tell you whether it is fast enough to sustain PQC workloads at production throughput.

The Hardware PQC Benchmark fills that gap by running real cryptographic operations on the host CPU during the scan. It measures actual key generation, signing, verification, encapsulation, and decapsulation throughput for the NIST-standardized ML-DSA and ML-KEM algorithm families using github.com/cloudflare/circl. Each operation is run for a fixed measurement window (default 100 ms), and the resulting ops/sec is compared against a minimum pass threshold derived from production deployment requirements.

Local mode only. The benchmark runs on the scanner host itself, so it is available only during local mode scans (-mode local). The flag has no effect in remote scan mode.

Overhead. At the default window, running 15 algorithm/operation pairs for 100 ms each adds approximately 1.5 seconds to the quantum readiness assessment. This is why the feature is opt-in via a flag rather than enabled by default.

Enabling the benchmark

Add -hardware-benchmark to any local scan command:

# Local scan with hardware benchmarks enabled
./tychon-pqc-scanner -hardware-benchmark -output-dir /tmp/results

# Combined with full local scan
./tychon-pqc-scanner -fullscan -hardware-benchmark -output-dir /tmp/results

# Quantum readiness only (fastest)
./tychon-pqc-scanner -disable-port-scan -hardware-benchmark -output-dir /tmp/results

The -disable-quantum-readiness flag takes precedence: if it is specified together with -hardware-benchmark, the benchmark does not run.

How it works

Key generation outside the timing loop. For sign, verify, encap, and decap operations, keys and ciphertexts are generated before timing starts so that only the target operation is measured. Keygen benchmarks generate a fresh key pair on every iteration.
Fixed-window iteration. Each operation runs in a tight loop for windowMs milliseconds (default 100). The loop checks time.Now().Before(deadline) after each operation. This avoids synthetic iteration counts that may not reflect sustained throughput.
Throughput and latency. After the window expires, ops/sec = iterations / elapsed_seconds and latency_µs = elapsed_seconds × 1,000,000 / iterations.
Pass/fail per operation. Each result is compared against a fixed threshold. The suite-level overall_passed is true only if all 15 operations pass.
Scoring penalty. If overall_passed is false, the hardware CPU sub-score is reduced by up to 4 points (out of 20 max, floored at 0). The static architecture/ISA score is still computed as usual; the benchmark penalty is subtracted from it afterwards.
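The fixed-window loop and the throughput and latency formulas above can be sketched as follows. This is a minimal illustration, not the scanner's actual code: the `measure` function and `Result` type are hypothetical names, and a SHA-256 hash stands in for the PQC operation so the example needs only the standard library.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"time"
)

// Result holds the metrics derived from one fixed-window measurement.
// (Hypothetical type, for illustration only.)
type Result struct {
	Iterations    int
	OpsPerSec     float64
	LatencyMicros float64
}

// measure calls op repeatedly until the window has elapsed, checking the
// deadline after each call, then derives throughput and average latency.
func measure(op func(), window time.Duration) Result {
	start := time.Now()
	deadline := start.Add(window)
	iterations := 0
	for {
		op()
		iterations++
		if !time.Now().Before(deadline) {
			break
		}
	}
	elapsed := time.Since(start).Seconds()
	return Result{
		Iterations:    iterations,
		OpsPerSec:     float64(iterations) / elapsed,
		LatencyMicros: elapsed * 1e6 / float64(iterations),
	}
}

func main() {
	// Stand-in workload: the input buffer is prepared outside the timed
	// loop, mirroring how keys and ciphertexts are generated before timing.
	buf := make([]byte, 1024)
	r := measure(func() { _ = sha256.Sum256(buf) }, 100*time.Millisecond)
	fmt.Printf("%d iterations, %.1f ops/sec, %.2f µs/op\n",
		r.Iterations, r.OpsPerSec, r.LatencyMicros)
}
```

Because the loop always completes the operation in flight before checking the deadline, the elapsed time can slightly exceed the window; dividing by the measured elapsed time rather than the nominal window keeps the ops/sec figure honest.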

Algorithm thresholds

Thresholds represent the minimum ops/sec required for a pass. They are calibrated to reflect realistic PQC deployment requirements: server-grade TLS handshake rates, code-signing pipelines, and certificate authority throughput.

Algorithm	Security Level	Operation	Threshold (ops/sec)	Notes
ML-DSA-44	NIST L2	keygen	500	Dilithium2 equivalent
ML-DSA-44	NIST L2	sign	500
ML-DSA-44	NIST L2	verify	1,000	Verify is faster than sign
ML-DSA-65	NIST L3	keygen	300	Dilithium3 equivalent
ML-DSA-65	NIST L3	sign	300
ML-DSA-65	NIST L3	verify	600
ML-DSA-87	NIST L5	keygen	200	Dilithium5 equivalent
ML-DSA-87	NIST L5	sign	200
ML-DSA-87	NIST L5	verify	400
ML-KEM-768	NIST L3	keygen	2,000	Kyber768 equivalent
ML-KEM-768	NIST L3	encap	2,000	TLS key exchange
ML-KEM-768	NIST L3	decap	2,000	TLS key exchange
ML-KEM-1024	NIST L5	keygen	1,000	Kyber1024 equivalent
ML-KEM-1024	NIST L5	encap	1,000
ML-KEM-1024	NIST L5	decap	1,000
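To make the pass rule concrete, the table can be expressed as a lookup keyed by algorithm and operation. The sketch below copies the threshold values from the table; the `key` type and the `passed` and `overallPassed` functions are hypothetical names chosen for illustration, and treating an unknown or missing pair as a failure is an assumption.

```go
package main

import "fmt"

// key identifies one algorithm/operation pair.
type key struct{ Algorithm, Operation string }

// thresholds holds the minimum ops/sec values from the table above.
var thresholds = map[key]float64{
	{"ML-DSA-44", "keygen"}: 500, {"ML-DSA-44", "sign"}: 500, {"ML-DSA-44", "verify"}: 1000,
	{"ML-DSA-65", "keygen"}: 300, {"ML-DSA-65", "sign"}: 300, {"ML-DSA-65", "verify"}: 600,
	{"ML-DSA-87", "keygen"}: 200, {"ML-DSA-87", "sign"}: 200, {"ML-DSA-87", "verify"}: 400,
	{"ML-KEM-768", "keygen"}: 2000, {"ML-KEM-768", "encap"}: 2000, {"ML-KEM-768", "decap"}: 2000,
	{"ML-KEM-1024", "keygen"}: 1000, {"ML-KEM-1024", "encap"}: 1000, {"ML-KEM-1024", "decap"}: 1000,
}

// passed reports whether a measured throughput meets its threshold.
// Unknown pairs are treated as failures (an assumption of this sketch).
func passed(alg, op string, opsPerSec float64) bool {
	minOps, ok := thresholds[key{alg, op}]
	return ok && opsPerSec >= minOps
}

// overallPassed is true only if every pair in the table has a
// measurement at or above its threshold.
func overallPassed(measured map[key]float64) bool {
	for k, minOps := range thresholds {
		v, ok := measured[k]
		if !ok || v < minOps {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(passed("ML-DSA-44", "sign", 3187.2))
	fmt.Println(passed("ML-KEM-768", "encap", 1500))
}
```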

Scoring impact

The hardware benchmark adds a dynamic correction to the CPU sub-score inside the 40-point hardware assessment. Without the flag, CPU scoring is static (architecture 8 pts, ISA 7 pts, cores 3 pts, clock 2 pts = 20 pts max). With the flag:

Benchmark outcome	CPU score change	Effect on overall score
All 15 operations pass (`overall_passed = true`)	No change	No change
One or more operations fail (`overall_passed = false`)	−4 points (floor 0)	Up to −4 pts on 100-pt scale

The penalty is subtracted from the static CPU score rather than replacing it. A machine that passes all static checks but has slow PQC throughput loses 4 points; a machine whose static CPU score is already below 4 is floored at 0 rather than going negative.
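The adjustment described above amounts to a few lines of arithmetic. The following sketch assumes the penalty is a flat 4 points with a floor of 0, as the table states; the function and constant names are hypothetical.

```go
package main

import "fmt"

// benchmarkPenalty is the number of points removed from the CPU
// sub-score when any benchmark operation fails.
const benchmarkPenalty = 4

// adjustCPUScore applies the benchmark penalty to a static CPU score
// (0 to 20). The score is unchanged when the benchmark did not run or
// when every operation passed, and it never drops below 0.
func adjustCPUScore(static int, benchmarkRan, overallPassed bool) int {
	if !benchmarkRan || overallPassed {
		return static
	}
	adjusted := static - benchmarkPenalty
	if adjusted < 0 {
		return 0
	}
	return adjusted
}

func main() {
	fmt.Println(adjustCPUScore(20, true, true))  // all operations passed
	fmt.Println(adjustCPUScore(20, true, false)) // at least one failure
	fmt.Println(adjustCPUScore(3, true, false))  // floored at 0
}
```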

Output fields — JSON

The benchmark data appears in the JSON report under quantum_readiness.hardware_score.details.pqc_benchmark. All fields are absent when -hardware-benchmark is not specified.

{
  "quantum_readiness": {
    "hardware_score": {
      "details": {
        "pqc_benchmark": {
          "overall_passed": true,
          "measurement_ms": 100,
          "timestamp_utc": "2026-05-10T14:32:07Z",
          "results": [
            {
              "algorithm": "ML-DSA-44",
              "operation": "keygen",
              "ops_per_sec": 3241.7,
              "latency_microsec": 308.5,
              "passed": true,
              "threshold": 500
            },
            {
              "algorithm": "ML-DSA-44",
              "operation": "sign",
              "ops_per_sec": 3187.2,
              "latency_microsec": 313.8,
              "passed": true,
              "threshold": 500
            },
            ...
          ]
        }
      }
    }
  }
}

Field	Type	Description
`pqc_benchmark.overall_passed`	bool	True when all 15 algorithm/operation results met their threshold.
`pqc_benchmark.measurement_ms`	int	Measurement window per operation in milliseconds.
`pqc_benchmark.timestamp_utc`	time	RFC 3339 UTC timestamp when benchmarks were run.
`pqc_benchmark.results[].algorithm`	string	Algorithm name: `ML-DSA-44`, `ML-DSA-65`, `ML-DSA-87`, `ML-KEM-768`, `ML-KEM-1024`.
`pqc_benchmark.results[].operation`	string	`keygen`, `sign`, `verify`, `encap`, or `decap`.
`pqc_benchmark.results[].ops_per_sec`	float64	Measured throughput in operations per second.
`pqc_benchmark.results[].latency_microsec`	float64	Average operation latency in microseconds.
`pqc_benchmark.results[].passed`	bool	True if `ops_per_sec ≥ threshold`.
`pqc_benchmark.results[].threshold`	float64	Minimum ops/sec required to pass for this algorithm/operation.
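A consumer of the report could decode these fields with Go structs along the following lines. The JSON keys come from the table above; the Go type names and the `failingOps` helper are hypothetical, and the sketch models only the path down to pqc_benchmark, ignoring the rest of the report.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// OpResult mirrors one entry of pqc_benchmark.results.
type OpResult struct {
	Algorithm       string  `json:"algorithm"`
	Operation       string  `json:"operation"`
	OpsPerSec       float64 `json:"ops_per_sec"`
	LatencyMicrosec float64 `json:"latency_microsec"`
	Passed          bool    `json:"passed"`
	Threshold       float64 `json:"threshold"`
}

// PQCBenchmark mirrors the pqc_benchmark object.
type PQCBenchmark struct {
	OverallPassed bool       `json:"overall_passed"`
	MeasurementMs int        `json:"measurement_ms"`
	TimestampUTC  time.Time  `json:"timestamp_utc"`
	Results       []OpResult `json:"results"`
}

// report models only the path down to pqc_benchmark. The pointer stays
// nil when the object is absent (benchmark not enabled).
type report struct {
	QuantumReadiness struct {
		HardwareScore struct {
			Details struct {
				PQCBenchmark *PQCBenchmark `json:"pqc_benchmark"`
			} `json:"details"`
		} `json:"hardware_score"`
	} `json:"quantum_readiness"`
}

// failingOps returns "algorithm/operation" labels for every result that
// did not pass. ran is false when the report has no benchmark data.
func failingOps(data []byte) (ran bool, failures []string, err error) {
	var r report
	if err := json.Unmarshal(data, &r); err != nil {
		return false, nil, err
	}
	b := r.QuantumReadiness.HardwareScore.Details.PQCBenchmark
	if b == nil {
		return false, nil, nil
	}
	for _, res := range b.Results {
		if !res.Passed {
			failures = append(failures, res.Algorithm+"/"+res.Operation)
		}
	}
	return true, failures, nil
}

func main() {
	sample := []byte(`{"quantum_readiness":{"hardware_score":{"details":{"pqc_benchmark":{
		"overall_passed":false,"measurement_ms":100,"timestamp_utc":"2026-05-10T14:32:07Z",
		"results":[{"algorithm":"ML-DSA-87","operation":"sign","ops_per_sec":150.0,
		"latency_microsec":6666.7,"passed":false,"threshold":200}]}}}}}`)
	ran, failures, err := failingOps(sample)
	fmt.Println(ran, failures, err)
}
```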

Flat NDJSON fields

In Flat NDJSON (tychon.quantum_readiness event) the benchmark data is flattened using the naming pattern quantum_readiness.hardware.pqc_benchmark.<alg>.<op>.<metric> where <alg> uses underscores (e.g. ml_dsa_44).

quantum_readiness.hardware.pqc_benchmark.overall_passed       → true
quantum_readiness.hardware.pqc_benchmark.measurement_ms       → 100
quantum_readiness.hardware.pqc_benchmark.timestamp_utc        → "2026-05-10T14:32:07Z"
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.keygen.ops_per_sec      → 3241.7
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.keygen.latency_microsec → 308.5
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.keygen.passed           → true
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.sign.ops_per_sec        → 3187.2
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.sign.latency_microsec   → 313.8
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.sign.passed             → true
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.verify.ops_per_sec      → 6104.1
...
quantum_readiness.hardware.pqc_benchmark.ml_kem_768.encap.ops_per_sec      → 18420.3
quantum_readiness.hardware.pqc_benchmark.ml_kem_768.encap.latency_microsec → 54.3
quantum_readiness.hardware.pqc_benchmark.ml_kem_768.encap.passed           → true

The complete field set covers all 15 algorithm/operation pairs × 3 metrics (ops_per_sec, latency_microsec, passed) = 45 flat fields, plus the 3 summary fields.
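The naming pattern can be reproduced mechanically, which is useful when building queries or dashboards against the flat fields. The sketch below derives the names from the pattern and algorithm list documented on this page; the helper names are hypothetical, and the lower-casing rule is inferred from the ml_dsa_44 example.

```go
package main

import (
	"fmt"
	"strings"
)

const prefix = "quantum_readiness.hardware.pqc_benchmark."

// pairs lists the benchmarked algorithms and their operations.
var pairs = []struct {
	alg string
	ops []string
}{
	{"ML-DSA-44", []string{"keygen", "sign", "verify"}},
	{"ML-DSA-65", []string{"keygen", "sign", "verify"}},
	{"ML-DSA-87", []string{"keygen", "sign", "verify"}},
	{"ML-KEM-768", []string{"keygen", "encap", "decap"}},
	{"ML-KEM-1024", []string{"keygen", "encap", "decap"}},
}

// flatField builds one flat field name, lower-casing the algorithm and
// replacing hyphens with underscores (ML-DSA-44 becomes ml_dsa_44).
func flatField(alg, op, metric string) string {
	a := strings.ReplaceAll(strings.ToLower(alg), "-", "_")
	return prefix + a + "." + op + "." + metric
}

// allFlatFields returns the 3 summary fields followed by the 45
// per-operation fields.
func allFlatFields() []string {
	fields := []string{
		prefix + "overall_passed",
		prefix + "measurement_ms",
		prefix + "timestamp_utc",
	}
	for _, p := range pairs {
		for _, op := range p.ops {
			for _, metric := range []string{"ops_per_sec", "latency_microsec", "passed"} {
				fields = append(fields, flatField(p.alg, op, metric))
			}
		}
	}
	return fields
}

func main() {
	for _, f := range allFlatFields() {
		fmt.Println(f)
	}
}
```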

Elasticsearch mapping

The pqc_benchmark object is explicitly mapped under quantum_readiness.hardware in artifact/elasticsearch_mappings.go. Apply the mapping before ingesting benchmark data to prevent dynamic mapping from choosing incorrect types for the float throughput fields.

"quantum_readiness.hardware.pqc_benchmark": {
  "overall_passed":  boolean
  "measurement_ms":  integer
  "timestamp_utc":   date
  "ml_dsa_44.keygen.ops_per_sec":       float
  "ml_dsa_44.keygen.latency_microsec":  float
  "ml_dsa_44.keygen.passed":            boolean
  ... (same pattern for all 15 algorithm/operation pairs)
}

Update an existing index with: PUT /<index>/_mapping using the full mapping body from the Elasticsearch deployment guide.

Interpreting results

overall_passed = true

All 15 operations met their minimum throughput threshold. The hardware is capable of sustaining PQC workloads at production rates. No CPU score penalty is applied.

overall_passed = false

One or more operations are below threshold. Inspect the per-operation passed fields to identify bottlenecks. A CPU score penalty of 4 points (floored at 0) is applied to the quantum readiness assessment.

Common failure patterns

Pattern	Likely cause	Recommendation
All operations fail on an older x86-64 CPU	Pre-AVX2 CPU lacking vector instructions that accelerate lattice arithmetic	Plan hardware refresh; prioritize CPUs with AVX2 (Intel Haswell or later, AMD Zen or later)
ML-DSA passes, ML-KEM fails	Rare; KEM is generally faster than DSA	Check for CPU throttling, thermal limits, or competing workloads during scan
Only the highest security levels (ML-DSA-87, ML-KEM-1024) fail	CPU meets L2/L3 requirements but not L5	Acceptable for most deployments, but CNSA 2.0 requires ML-KEM-1024 and ML-DSA-87, so note the gap where that profile applies
Results vary between scans	System load during scan affects throughput	Run during a maintenance window or with `-disable-port-scan` to reduce contention

Note on measurement accuracy. The benchmark runs operations sequentially, not in parallel, so it measures single-threaded throughput. This is the relevant metric for TLS handshake performance on individual connections. Aggregate multi-threaded throughput generally scales with core count, though not always linearly, so treat the single-threaded figure multiplied by core count as an upper bound.
