Live ML-DSA and ML-KEM throughput measurement for hardware readiness scoring
Enabled via -hardware-benchmark flag — local mode only
The standard quantum readiness hardware assessment scores CPUs based on static properties: architecture, instruction sets, core count, and estimated clock speed. These properties tell you whether a CPU is architecturally capable of running PQC algorithms, but they cannot tell you whether it is fast enough to sustain PQC workloads at production throughput.
The Hardware PQC Benchmark fills that gap by running real cryptographic operations
on the host CPU during the scan. It measures actual key generation, signing, verification, encapsulation,
and decapsulation throughput for the NIST-standardized ML-DSA and ML-KEM algorithm families using
github.com/cloudflare/circl. Each operation is run for a fixed measurement window
(default 100 ms), and the resulting ops/sec is compared against a minimum pass threshold derived
from production deployment requirements.
The benchmark runs only in local scan mode (-mode local). It has no effect in remote scan mode.
Add -hardware-benchmark to any local scan command:
    # Local scan with hardware benchmarks enabled
    ./tychon-pqc-scanner -hardware-benchmark -output-dir /tmp/results

    # Combined with full local scan
    ./tychon-pqc-scanner -fullscan -hardware-benchmark -output-dir /tmp/results

    # Quantum readiness only (fastest)
    ./tychon-pqc-scanner -disable-port-scan -hardware-benchmark -output-dir /tmp/results
The -disable-quantum-readiness flag takes precedence: if both are specified,
the benchmark is never run.
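The flag rules above can be summarized as a small decision function. This is a minimal sketch, not the scanner's actual code: the name shouldRunBenchmark and its boolean parameters are hypothetical and only mirror the documented behavior (local mode only, with -disable-quantum-readiness taking precedence).

```go
package main

import "fmt"

// shouldRunBenchmark is a hypothetical helper that mirrors the documented
// flag rules. It is not taken from the scanner's source.
func shouldRunBenchmark(localMode, hardwareBenchmark, disableQuantumReadiness bool) bool {
	if !localMode {
		return false // -hardware-benchmark has no effect in remote scan mode
	}
	if disableQuantumReadiness {
		return false // -disable-quantum-readiness takes precedence
	}
	return hardwareBenchmark
}

func main() {
	fmt.Println(shouldRunBenchmark(true, true, false))  // true
	fmt.Println(shouldRunBenchmark(true, true, true))   // false
	fmt.Println(shouldRunBenchmark(false, true, false)) // false
}
```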
Each operation runs in a loop for windowMs milliseconds (default 100). The loop checks
time.Now().Before(deadline) after each operation. Measuring over a fixed time window avoids synthetic
iteration counts that may not reflect sustained throughput.
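The time-window loop can be sketched as follows. This is an illustration under stated assumptions, not the scanner's implementation: the measure function is a hypothetical name, and a SHA-256 hash stands in for the ML-DSA and ML-KEM calls so that the example needs only the standard library. It assumes a positive window.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"time"
)

// sink keeps the stand-in result alive so the work is not optimized away.
var sink [32]byte

// measure runs op repeatedly until the window has elapsed, checking the
// deadline after each operation, and reports throughput in ops/sec and
// mean latency in microseconds. Assumes window > 0.
func measure(op func(), window time.Duration) (opsPerSec, latencyMicros float64) {
	start := time.Now()
	deadline := start.Add(window)
	count := 0
	for {
		op()
		count++
		if !time.Now().Before(deadline) {
			break
		}
	}
	elapsed := time.Since(start).Seconds()
	opsPerSec = float64(count) / elapsed
	latencyMicros = elapsed * 1e6 / float64(count)
	return opsPerSec, latencyMicros
}

func main() {
	msg := make([]byte, 1024)
	// Stand-in workload (assumption): a real benchmark would call the
	// PQC key generation, signing, or encapsulation routine here.
	ops, lat := measure(func() { sink = sha256.Sum256(msg) }, 100*time.Millisecond)
	fmt.Printf("%.1f ops/sec, %.1f µs/op\n", ops, lat)
}
```

Because both figures come from the same count and elapsed time, latency in microseconds is simply 1,000,000 divided by ops/sec.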
overall_passed is true only if all 15 operations pass.
When overall_passed is false,
the hardware CPU sub-score is reduced by up to 4 points (out of 20 max): a 4-point penalty, floored at 0. The static
architecture/ISA score is preserved; the benchmark penalty is applied on top of it.
Thresholds represent the minimum ops/sec required for a pass. They are calibrated to reflect realistic PQC deployment requirements: server-grade TLS handshake rates, code-signing pipelines, and certificate authority throughput.
| Algorithm | Security Level | Operation | Threshold (ops/sec) | Notes |
|---|---|---|---|---|
| ML-DSA-44 | NIST L2 | keygen | 500 | Dilithium2 equivalent |
| ML-DSA-44 | NIST L2 | sign | 500 | |
| ML-DSA-44 | NIST L2 | verify | 1,000 | Verify is faster than sign |
| ML-DSA-65 | NIST L3 | keygen | 300 | Dilithium3 equivalent |
| ML-DSA-65 | NIST L3 | sign | 300 | |
| ML-DSA-65 | NIST L3 | verify | 600 | |
| ML-DSA-87 | NIST L5 | keygen | 200 | Dilithium5 equivalent |
| ML-DSA-87 | NIST L5 | sign | 200 | |
| ML-DSA-87 | NIST L5 | verify | 400 | |
| ML-KEM-768 | NIST L3 | keygen | 2,000 | Kyber768 equivalent |
| ML-KEM-768 | NIST L3 | encap | 2,000 | TLS key exchange |
| ML-KEM-768 | NIST L3 | decap | 2,000 | TLS key exchange |
| ML-KEM-1024 | NIST L5 | keygen | 1,000 | Kyber1024 equivalent |
| ML-KEM-1024 | NIST L5 | encap | 1,000 | |
| ML-KEM-1024 | NIST L5 | decap | 1,000 | |
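For readers who want to re-check results outside the scanner, the table can be expressed as a lookup. This is a sketch only: the type opKey, the thresholds map, and the passed function are hypothetical names, while the values are copied from the table above and the comparison follows the documented rule (ops/sec greater than or equal to the threshold).

```go
package main

import "fmt"

// opKey identifies one algorithm/operation pair (hypothetical type).
type opKey struct{ Algorithm, Operation string }

// thresholds holds the minimum ops/sec values from the table.
var thresholds = map[opKey]float64{
	{"ML-DSA-44", "keygen"}: 500, {"ML-DSA-44", "sign"}: 500, {"ML-DSA-44", "verify"}: 1000,
	{"ML-DSA-65", "keygen"}: 300, {"ML-DSA-65", "sign"}: 300, {"ML-DSA-65", "verify"}: 600,
	{"ML-DSA-87", "keygen"}: 200, {"ML-DSA-87", "sign"}: 200, {"ML-DSA-87", "verify"}: 400,
	{"ML-KEM-768", "keygen"}: 2000, {"ML-KEM-768", "encap"}: 2000, {"ML-KEM-768", "decap"}: 2000,
	{"ML-KEM-1024", "keygen"}: 1000, {"ML-KEM-1024", "encap"}: 1000, {"ML-KEM-1024", "decap"}: 1000,
}

// passed reports whether a measured throughput meets its threshold.
func passed(algorithm, operation string, opsPerSec float64) (bool, error) {
	minOps, ok := thresholds[opKey{algorithm, operation}]
	if !ok {
		return false, fmt.Errorf("no threshold for %s %s", algorithm, operation)
	}
	return opsPerSec >= minOps, nil
}

func main() {
	ok, err := passed("ML-KEM-768", "encap", 18420.3)
	fmt.Println(ok, err) // true <nil>
}
```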
The hardware benchmark adds a dynamic correction to the CPU sub-score inside the 40-point hardware assessment. Without the flag, CPU scoring is static (architecture 8 pts, ISA 7 pts, cores 3 pts, clock 2 pts = 20 pts max). With the flag:
| Benchmark outcome | CPU score change | Effect on overall score |
|---|---|---|
| All 15 operations pass (overall_passed = true) | No change | No change |
| One or more operations fail (overall_passed = false) | −4 points (floor 0) | Up to −4 pts on 100-pt scale |
The penalty is applied additively on top of the static CPU score, not as a replacement. A machine that passes all static checks but has slow PQC throughput will be penalized; a machine that already scores low statically will be floored at 0 for the CPU sub-score.
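The adjustment described above amounts to a subtraction with a floor. The sketch below is an assumption about how this could look in code, not the scanner's source: cpuScore and its parameters are hypothetical names, and only the 4-point penalty and the floor of 0 come from this document.

```go
package main

import "fmt"

// benchmarkPenalty is the documented CPU sub-score penalty.
const benchmarkPenalty = 4

// cpuScore returns the CPU sub-score after the benchmark correction.
// static is the sum of the static components (max 20). The penalty applies
// only when the benchmark ran and at least one operation failed.
func cpuScore(static int, benchmarkRan, overallPassed bool) int {
	if !benchmarkRan || overallPassed {
		return static
	}
	score := static - benchmarkPenalty
	if score < 0 {
		score = 0 // floor at 0 for machines that already score low
	}
	return score
}

func main() {
	fmt.Println(cpuScore(20, true, true))   // 20
	fmt.Println(cpuScore(20, true, false))  // 16
	fmt.Println(cpuScore(3, true, false))   // 0
	fmt.Println(cpuScore(20, false, false)) // 20
}
```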
The benchmark data appears in the JSON report under
quantum_readiness.hardware_score.details.pqc_benchmark.
All fields are absent when -hardware-benchmark is not specified.
{
"quantum_readiness": {
"hardware_score": {
"details": {
"pqc_benchmark": {
"overall_passed": true,
"measurement_ms": 100,
"timestamp_utc": "2026-05-10T14:32:07Z",
"results": [
{
"algorithm": "ML-DSA-44",
"operation": "keygen",
"ops_per_sec": 3241.7,
"latency_microsec": 308.5,
"passed": true,
"threshold": 500
},
{
"algorithm": "ML-DSA-44",
"operation": "sign",
"ops_per_sec": 3187.2,
"latency_microsec": 313.8,
"passed": true,
"threshold": 500
},
...
]
}
}
}
}
}
| Field | Type | Description |
|---|---|---|
| pqc_benchmark.overall_passed | bool | True when all 15 algorithm/operation results met their threshold. |
| pqc_benchmark.measurement_ms | int | Measurement window per operation in milliseconds. |
| pqc_benchmark.timestamp_utc | time | RFC 3339 UTC timestamp when benchmarks were run. |
| pqc_benchmark.results[].algorithm | string | Algorithm name: ML-DSA-44, ML-DSA-65, ML-DSA-87, ML-KEM-768, ML-KEM-1024. |
| pqc_benchmark.results[].operation | string | keygen, sign, verify, encap, or decap. |
| pqc_benchmark.results[].ops_per_sec | float64 | Measured throughput in operations per second. |
| pqc_benchmark.results[].latency_microsec | float64 | Average operation latency in microseconds. |
| pqc_benchmark.results[].passed | bool | True if ops_per_sec ≥ threshold. |
| pqc_benchmark.results[].threshold | float64 | Minimum ops/sec required to pass for this algorithm/operation. |
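A consumer of the JSON report might decode these fields as shown below. This is a minimal sketch: the Go type and function names (Result, PQCBenchmark, parseBenchmark, failing) are hypothetical, the JSON keys follow the field table above, and the sample values in main are invented for illustration. A nil result indicates that pqc_benchmark is absent, as happens when -hardware-benchmark is not specified.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Result mirrors one entry of pqc_benchmark.results.
type Result struct {
	Algorithm       string  `json:"algorithm"`
	Operation       string  `json:"operation"`
	OpsPerSec       float64 `json:"ops_per_sec"`
	LatencyMicrosec float64 `json:"latency_microsec"`
	Passed          bool    `json:"passed"`
	Threshold       float64 `json:"threshold"`
}

// PQCBenchmark mirrors the pqc_benchmark object.
type PQCBenchmark struct {
	OverallPassed bool      `json:"overall_passed"`
	MeasurementMs int       `json:"measurement_ms"`
	TimestampUTC  time.Time `json:"timestamp_utc"` // RFC 3339
	Results       []Result  `json:"results"`
}

// report walks the documented path to pqc_benchmark.
type report struct {
	QuantumReadiness struct {
		HardwareScore struct {
			Details struct {
				PQCBenchmark *PQCBenchmark `json:"pqc_benchmark"`
			} `json:"details"`
		} `json:"hardware_score"`
	} `json:"quantum_readiness"`
}

// parseBenchmark returns nil (and no error) when pqc_benchmark is absent.
func parseBenchmark(data []byte) (*PQCBenchmark, error) {
	var r report
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	return r.QuantumReadiness.HardwareScore.Details.PQCBenchmark, nil
}

// failing lists the algorithm/operation pairs that missed their threshold.
func failing(b *PQCBenchmark) []string {
	var out []string
	for _, r := range b.Results {
		if !r.Passed {
			out = append(out, r.Algorithm+" "+r.Operation)
		}
	}
	return out
}

func main() {
	// Invented sample data for illustration only.
	sample := []byte(`{"quantum_readiness":{"hardware_score":{"details":{"pqc_benchmark":{
		"overall_passed":false,"measurement_ms":100,"timestamp_utc":"2026-05-10T14:32:07Z",
		"results":[{"algorithm":"ML-DSA-87","operation":"sign","ops_per_sec":150.2,
		"latency_microsec":6657.8,"passed":false,"threshold":200}]}}}}}`)
	b, err := parseBenchmark(sample)
	if err != nil || b == nil {
		fmt.Println("no benchmark data:", err)
		return
	}
	fmt.Println(b.OverallPassed, failing(b)) // false [ML-DSA-87 sign]
}
```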
In Flat NDJSON (tychon.quantum_readiness event) the benchmark data is
flattened using the naming pattern
quantum_readiness.hardware.pqc_benchmark.<alg>.<op>.<metric>
where <alg> uses underscores (e.g. ml_dsa_44).
quantum_readiness.hardware.pqc_benchmark.overall_passed → true
quantum_readiness.hardware.pqc_benchmark.measurement_ms → 100
quantum_readiness.hardware.pqc_benchmark.timestamp_utc → "2026-05-10T14:32:07Z"
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.keygen.ops_per_sec → 3241.7
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.keygen.latency_microsec → 308.5
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.keygen.passed → true
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.sign.ops_per_sec → 3187.2
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.sign.latency_microsec → 313.8
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.sign.passed → true
quantum_readiness.hardware.pqc_benchmark.ml_dsa_44.verify.ops_per_sec → 6104.1
...
quantum_readiness.hardware.pqc_benchmark.ml_kem_768.encap.ops_per_sec → 18420.3
quantum_readiness.hardware.pqc_benchmark.ml_kem_768.encap.latency_microsec → 54.3
quantum_readiness.hardware.pqc_benchmark.ml_kem_768.encap.passed → true
The complete field set covers all 15 algorithm/operation pairs × 3 metrics (ops_per_sec, latency_microsec, passed) = 45 flat fields, plus the 3 summary fields.
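The naming pattern and the field count can be reproduced with a few lines of Go. This sketch assumes the algorithm segment is the lowercased algorithm name with hyphens replaced by underscores, which matches the examples above; flatKey and allFlatKeys are hypothetical helper names, not part of the scanner.

```go
package main

import (
	"fmt"
	"strings"
)

const prefix = "quantum_readiness.hardware.pqc_benchmark"

// flatKey builds <prefix>.<alg>.<op>.<metric>, e.g. ML-DSA-44 -> ml_dsa_44.
func flatKey(algorithm, operation, metric string) string {
	alg := strings.ReplaceAll(strings.ToLower(algorithm), "-", "_")
	return strings.Join([]string{prefix, alg, operation, metric}, ".")
}

// allFlatKeys enumerates the 3 summary fields plus 15 pairs x 3 metrics.
func allFlatKeys() []string {
	keys := []string{
		prefix + ".overall_passed",
		prefix + ".measurement_ms",
		prefix + ".timestamp_utc",
	}
	suites := []struct {
		algorithms []string
		operations []string
	}{
		{[]string{"ML-DSA-44", "ML-DSA-65", "ML-DSA-87"}, []string{"keygen", "sign", "verify"}},
		{[]string{"ML-KEM-768", "ML-KEM-1024"}, []string{"keygen", "encap", "decap"}},
	}
	metrics := []string{"ops_per_sec", "latency_microsec", "passed"}
	for _, s := range suites {
		for _, alg := range s.algorithms {
			for _, op := range s.operations {
				for _, m := range metrics {
					keys = append(keys, flatKey(alg, op, m))
				}
			}
		}
	}
	return keys
}

func main() {
	fmt.Println(flatKey("ML-KEM-768", "encap", "ops_per_sec"))
	fmt.Println(len(allFlatKeys())) // 48
}
```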
The pqc_benchmark object is explicitly mapped under
quantum_readiness.hardware in artifact/elasticsearch_mappings.go.
Apply the mapping before ingesting benchmark data to prevent dynamic mapping from
choosing incorrect types for the float throughput fields.
"quantum_readiness.hardware.pqc_benchmark": {
"overall_passed": boolean
"measurement_ms": integer
"timestamp_utc": date
"ml_dsa_44.keygen.ops_per_sec": float
"ml_dsa_44.keygen.latency_microsec": float
"ml_dsa_44.keygen.passed": boolean
... (same pattern for all 15 algorithm/operation pairs)
}
Update an existing index with:
PUT /<index>/_mapping using the full mapping body from the
Elasticsearch deployment guide.
overall_passed = true: all 15 operations met their minimum throughput threshold. The hardware is capable of sustaining PQC workloads at production rates. No CPU score penalty is applied.
overall_passed = false: one or more operations are below threshold. Inspect the per-operation
passed fields to identify bottlenecks. A −4 pt CPU score
penalty (floored at 0) is applied to the quantum readiness assessment.
| Pattern | Likely cause | Recommendation |
|---|---|---|
| All operations fail on an older x86-64 CPU | Pre-AVX2 CPU lacking vector instructions that speed up lattice arithmetic | Plan a hardware refresh; prioritize CPUs with AVX2 (Intel Haswell or later, AMD Zen or later) |
| ML-DSA passes, ML-KEM fails | Rare; KEM is generally faster than DSA | Check for CPU throttling, thermal limits, or competing workloads during scan |
| Only the higher security levels (ML-DSA-87, ML-KEM-1024) fail | CPU meets L2/L3 requirements but not L5 | Acceptable for most deployments, but CNSA 2.0 requires ML-KEM-1024 and ML-DSA-87, so record the gap |
| Results vary between scans | System load during scan affects throughput | Run during a maintenance window or with -disable-port-scan to reduce contention |