Open Claim Registry
91 verified claims, listed and linked
Every quantitative result on this site maps to a claim with a passing, deterministic test. Each claim below has a short plain-English description, an ID you can search for in the test suite, and a link to the case study where applicable. We also list 1 near-pass claim that fell short of its pre-registered gate, kept here for transparency.
Core engine guarantees
11 claims
- C-CORE-001Pass
Default ingest auto-selects symbol width via the discovery curve. Fixed widths are opt-in.
- C-CORE-002Pass
Reads raw bytes exactly as given and replays them byte-for-byte at the chosen symbol width.
- C-CORE-003Pass
Anchor alignment scans bit offsets 0 to 7 with deterministic tie-breaking. Same input, same anchor, every run.
- C-CORE-004Pass
Identity seed comes only from the geometric signature after anchor alignment. No external sources.
- C-CORE-005Pass
Cryptographic hashes are bookkeeping metadata only. They never decide identity, deduplication, or claim gating.
- C-CORE-006Pass
Primitive reuse stays inside one ledger. Two separate ledgers cannot share IDs or store entries.
- C-CORE-007Pass
Scrubbing low-frequency primitives leaves replay intact for everything that was kept.
- C-CORE-008Pass
Determinism, replay truth, and corruption rejection all pass together. Any break marks the run as failed.
- C-CORE-009Pass
Persisted ledgers carry a version magic. Files without it, or with a wrong version, are rejected on load.
- C-CORE-010Pass
In verifying mode, corrupted ledger data triggers an explicit error. There is no silent fallback.
- C-CORE-011Pass
Calling the engine from Python yields bit-identical seeds and discovery rates to the Rust core. 50 inputs, 3 runs each.
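The replay guarantee in C-CORE-002 can be illustrated with a toy model. The sketch below is not the engine's implementation: the names `ingest` and `replay`, the zero-bit padding, and the choice to store the original byte length are all assumptions made for illustration. It splits the input bit stream into fixed-width symbols, assigns IDs in first-encounter order, and rebuilds the original bytes from the resulting timeline.

```python
def bytes_to_bits(data: bytes) -> str:
    """Render bytes as a string of '0'/'1' characters, most significant bit first."""
    return "".join(f"{b:08b}" for b in data)


def ingest(data: bytes, width: int):
    """Split the bit stream into fixed-width symbols (hypothetical toy ledger).

    Returns (vocab, timeline, n_bytes). IDs are assigned in first-encounter order.
    """
    bits = bytes_to_bits(data)
    bits += "0" * ((-len(bits)) % width)  # assumption: pad the tail with zero bits
    vocab = {}      # bit pattern -> primitive ID
    timeline = []   # sequence of primitive IDs, in input order
    for i in range(0, len(bits), width):
        symbol = bits[i:i + width]
        if symbol not in vocab:
            vocab[symbol] = len(vocab)
        timeline.append(vocab[symbol])
    return vocab, timeline, len(data)


def replay(vocab, timeline, n_bytes: int) -> bytes:
    """Rebuild the original bytes from the timeline, dropping the padding bits."""
    by_id = {pid: symbol for symbol, pid in vocab.items()}
    bits = "".join(by_id[pid] for pid in timeline)[: n_bytes * 8]
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


data = b"hello, hello"
for width in (1, 3, 8, 13, 16):
    assert replay(*ingest(data, width)) == data  # byte-for-byte round trip
```

Widths that do not divide the bit length (3 and 13 here) still round-trip because the stored byte length tells `replay` how many padding bits to discard.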
Discovery findings
46 claims
- C-DISCOV-001Pass
Independent samples of the same data type produce overlapping primitive vocabularies. Random data does not.
- C-DISCOV-002Pass
The vocabulary is found in one forward pass. BPE needs at least three corpus passes for the same inputs.
- C-DISCOV-003Pass
Two streams with identical Shannon entropy can have very different discovery rates.
- C-DISCOV-004Pass
Joining different data types produces a measurable jump in unknown patterns at the boundary.
- C-DISCOV-005Pass
Byte-exact replay, structural decomposition, ordered vocabulary, vocabulary reuse, and single-pass operation, all in one tool.
- C-DISCOV-006Pass
A vocabulary learned from one sample of a data type carries over to other samples of the same type.
- C-DISCOV-007Pass
When data of one type is followed by data of a different type, the rate of new patterns spikes at the boundary.
- C-DISCOV-008Pass
The geometric signature changes less under structure-preserving edits than under structure-breaking ones.
- C-DISCOV-009Pass
Reuse rate correlates with established complexity measures and varies with scale in ways those measures do not.
- C-DISCOV-010Pass
The ratio of discovery rates at different symbol widths separates structured from random data with AUC over 0.90.
- C-DISCOV-011Pass
Primitive IDs are assigned in first-encounter order. The timeline preserves the order each pattern first appeared.
- C-DISCOV-012Pass
Phase-12-to-17 structural results hold on real filesystem files, not just generated data.
- C-DISCOV-013Pass
For structured data, vocabulary grows slower than the input length. For random data, it grows roughly linearly.
- C-DISCOV-014Pass
Without any training, the engine's scale-ratio classifies structured vs random data within 5 points of a trained classifier.
- C-DISCOV-015Pass
A vocabulary built from text data flags injected anomalies (DNA, binary, random) without configuration or thresholds.
- C-DISCOV-016Pass
For structured data, vocabulary discovery saturates within a bounded multiple of the final vocabulary size.
- C-DISCOV-017Pass
On 50 isoentropy pairs, the engine separates structured from shuffled with AUC over 0.90. Entropy stays at chance.
- C-DISCOV-018Pass
In a streaming text background, the engine flags anomalies engineered to have identical Shannon entropy to the surrounding data.
- C-DISCOV-019Pass
The engine simultaneously detects anomalies and replays the exact bytes of flagged windows from one record.
See the case study →
- C-DISCOV-020Pass
On noisy-vocabulary sequences, structural discovery rate separates structured from shuffled while byte-level entropy is blind.
- C-DISCOV-021Pass
On harder partial-discrimination data, three engine variants are characterised with non-degenerate confidence intervals.
- C-DISCOV-022Pass
Detects segment reordering, partial bit corruption, and cross-domain drift. Adversarial reuse fails by design (documented scope boundary).
- C-DISCOV-023Pass
Geometric-signature mutation sensitivity exceeds 0.80 across 5 data types. Inputs under 16 bytes have a known low-sensitivity floor.
- C-DISCOV-024Pass
The engine is the only tested system meeting all five of: byte-exact replay, enumerable vocabulary, no pre-training, isoentropy discrimination, geometric identity.
- C-DISCOV-025Pass
On 1,000 isoentropy pairs, the engine separates structured from shuffled with AUC over 0.90. Shannon entropy stays at chance.
See the case study →
- C-DISCOV-026Pass
BPE learns a finite set of merge rules. The engine enumerates every fixed-width pattern seen during training. Cross-domain coverage advantage measured.
- C-DISCOV-027Pass
Flags structural type boundaries in streaming data with no prior reference window. A gzip-NCD baseline needs at least 5 reference chunks.
- C-DISCOV-028Pass
Single-byte edits produce high primitive reuse across document versions. SHA-256 reports binary same/different and nothing in between.
- C-DISCOV-029Pass
Distinct memory/correctness tradeoff: exact replay + enumerable vocabulary + structural identity, at higher storage cost than gzip.
- C-DISCOV-030Pass
Adding grammar-based codes (Sequitur, Re-Pair) to the comparison still leaves the engine the only system meeting all five tested properties.
- C-DISCOV-031Pass
The five-property combination proof holds on real project files (Python, Rust, JSON, Markdown, config), not just synthetic data.
- C-DISCOV-032Pass
At fixed-32 symbol width, discovery rate cleanly separates 6 levels of isoentropy data. Shannon entropy is blind to all of them.
- C-DISCOV-033Pass
Independent samples of structured data types produce near-identical vocabularies at fixed-16 symbol width.
- C-DISCOV-034Pass
Structured data reaches vocabulary saturation within 64 KB. Random data does not saturate in that range.
- C-DISCOV-035Pass
For fixed symbol width, the engine's vocabulary is exactly the set of patterns present in the input. No phantoms, no missing.
- C-DISCOV-036Pass
Joining two byte-aligned segments produces a vocabulary equal to the union of the two parts. A formal closure property.
- C-DISCOV-037Pass
Reversing the order of fixed-width blocks produces the same vocabulary but a reversed timeline. Vocabulary captures what; timeline captures where.
- C-DISCOV-038Pass
The discovery-rate curve correlates with established complexity measures and separates isoentropy inputs.
- C-DISCOV-039Pass
Vocabulary growth is not a strict power law for structured data, but it separates structured from random by orders of magnitude.
- C-DISCOV-040Pass
Primitive frequency is not strictly Zipfian, but the fitted Zipf exponent strongly discriminates by data type.
- C-DISCOV-041Pass
Timeline autocorrelation peaks at the structural period for periodic data and is flat for random data.
- C-DISCOV-042Pass
As sample size grows, two independent vocabularies converge toward the same full alphabet. Convergence is monotonic for structured data.
- C-DISCOV-043Pass
Vocabulary saturation has a critical point that varies by data type. Random data saturates too, but very late.
- C-DISCOV-044Pass
Auto-curve picks symbol widths around 14 to 16 regardless of the natural unit size. It optimises for entropy, not structural boundaries.
- C-DISCOV-045Pass
A 5D structural complexity profile classifies five data types with 100% accuracy on 250 samples (zero-shot, nearest-centroid).
See the case study →
- C-DISCOV-047Pass
At a 17-bit non-byte-aligned pattern, the engine separates structured from shuffled with AUC = 0.98. Gzip stays at chance.
See the case study →
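Several of the discovery claims rest on isoentropy pairs: two inputs with the same byte histogram, and therefore the same byte-level Shannon entropy, but different structure. The sketch below shows how such a pair can be built by shuffling. It uses the count of distinct 16-bit patterns as a crude stand-in for the engine's discovery rate. That proxy, and the function names `byte_entropy` and `distinct_symbols`, are assumptions for illustration and not the engine's actual metric.

```python
import math
import random
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def distinct_symbols(data: bytes, width_bytes: int = 2) -> int:
    """Number of distinct non-overlapping fixed-width patterns (toy proxy)."""
    last_start = len(data) - width_bytes + 1
    return len({data[i:i + width_bytes] for i in range(0, last_start, width_bytes)})


structured = b"the quick brown fox " * 200

# Shuffling keeps the byte histogram intact and destroys the ordering.
items = list(structured)
random.Random(0).shuffle(items)
shuffled = bytes(items)

# Same histogram, so byte-level entropy cannot tell the two apart.
assert Counter(structured) == Counter(shuffled)
assert math.isclose(byte_entropy(structured), byte_entropy(shuffled))

# The structured stream reuses a handful of 16-bit patterns; the shuffle does not.
print(distinct_symbols(structured), distinct_symbols(shuffled))
```

The structured stream repeats every 20 bytes, so it contains only 10 distinct 16-bit patterns, while the shuffled stream produces many more despite having identical entropy.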
Method-layer governance
2 claims
- C-METHOD-001Pass
The decision pipeline produces a 14-stage gate-validated, hash-chained ledger entry for every analysis request.
- C-METHOD-002Pass
On five compound tasks (detect+localise+replay, audit chain, single-pass, vocabulary continuity, per-primitive trail), the engine passes all five. Three baselines fail at least two each.
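For readers unfamiliar with hash-chained ledgers, a minimal sketch follows. It is a generic construction, not the decision pipeline's format: the entry layout, the all-zero genesis hash, the canonical-JSON rule (sorted keys, compact separators), and the names `append_entry` and `verify` are assumptions. Each entry's hash covers its payload and the previous entry's hash, so editing any earlier entry invalidates the chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumption: placeholder "previous hash" for the first entry


def canonical(obj) -> bytes:
    """Serialise to a stable byte form: sorted keys, no extra whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def entry_hash(prev: str, payload: dict) -> str:
    """SHA-256 over the canonical form of the previous hash plus the payload."""
    return hashlib.sha256(canonical({"prev": prev, "payload": payload})).hexdigest()


def append_entry(chain: list, payload: dict) -> dict:
    """Append a payload, linking it to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)}
    chain.append(entry)
    return entry


def verify(chain: list) -> bool:
    """Recompute every hash and check that each entry links to its predecessor."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
            return False
        prev = entry["hash"]
    return True


chain = []
append_entry(chain, {"stage": 1, "gate": "pass"})
append_entry(chain, {"stage": 2, "gate": "pass"})
assert verify(chain)

chain[0]["payload"]["gate"] = "fail"  # tamper with an earlier entry
assert not verify(chain)
```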
Bit-flip audit
1 claim
- C-AUDIT-001Pass
A single-byte flip in a structured file produces a deterministic, measurable change in the structural record.
Statistical characterisation
2 claims
- C-STAT-001Pass
Effect sizes characterised honestly. Gzip's effect size on this task exceeds the engine's, and we report it.
- C-STAT-002Pass
Isoentropy discrimination mapped across 5 lengths and 3 widths. All 15 cells exceed AUC = 0.70.
Performance
1 claim
- C-PERF-001Pass
Ingest time scales roughly linearly with input size. The engine is 13 to 21 times slower than gzip, and its stored output is 13 to 21 times larger. Honest characterisation.
Novel-frontier research
23 claims
- C-NOVEL-001Pass
Inputs whose bit length is not a multiple of the symbol width still replay correctly via padding.
- C-NOVEL-002Pass
Symbol width 1 round-trips: 1 or 2 primitives, timeline length equals the bit count.
- C-NOVEL-003Pass
Symbol width 128 is accepted and replays correctly across 50 inputs of varying length.
- C-NOVEL-004Pass
At width 4, histogram-matched permutations produce near-identical vocabularies but distinct seeds.
- C-NOVEL-005Pass
The engine's signature is identical across selection widths (anchor uses width 8). Selection width still drives ledger state.
- C-NOVEL-006Pass
Inputs with all-zero or all-one suffixes round-trip exactly at non-aligned widths.
- C-NOVEL-007Pass
All-zero inputs produce a fixed sentinel signature. A single set bit anywhere produces a non-sentinel signature.
- C-NOVEL-008Pass
Initial primitive count is fixed at ledger construction. Scrubbing never mutates it. Discovery rate after scrub stays consistent.
- C-NOVEL-009Pass
With zero_point=False and simulate=True, the engine loads a persisted ledger and writes nothing.
- C-NOVEL-010Pass
Save and load of an empty ledger is idempotent. A reloaded empty engine matches a fresh one.
- C-NOVEL-011Pass
Eight independent engines on the same input produce byte-identical seeds and ledger files. Wall-clock under 4x single-thread.
- C-NOVEL-012Pass
Batch signature calls produce identical per-item dicts to serial calls across 200 diverse inputs.
- C-NOVEL-013Pass
NFC vs NFD Unicode normalisation produces distinct byte sequences and distinct seeds. The semantic pipeline does not collapse them.
- C-NOVEL-014Pass
For the same paragraph in UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, the reuse-ratio ordering is preserved (20 of 20 paragraphs).
- C-NOVEL-015Pass
Decision-hash matches an external SHA-256 of canonical JSON. Tampered ledger entries are detected on re-read.
- C-NOVEL-016Pass
The decision pipeline always produces stages_executed in canonical order across 50 diverse requests.
- C-NOVEL-017Pass
The semantic pipeline terminates deterministically on inputs stacked with 6+ noise transforms.
- C-NOVEL-018Pass
Licence-status active flag matches a 14-day grace window around days_remaining = 0.
- C-NOVEL-019Pass
The causality gate's verdicts match an external re-derivation across 20 adversarial requests.
- C-NOVEL-020Pass
The adaptive-floor switch (between 2 and 16) is observable when the memory watermark changes for the same low-entropy inputs.
- C-NOVEL-021Pass
Across 1 million 128-byte inputs at fixed-8, seed-collision distribution is characterised. Max cluster size stays below 1 million.
- C-NOVEL-022Pass
At fixed-16 width, the engine separates Rule-30 streams from PRNG streams by discovery rate. Byte-histogram entropy cannot.
- C-NOVEL-023Pass
Pi, e, root-2 in binary produce statistically indistinguishable discovery rates. Consistent with the normality conjecture.
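The premise behind C-NOVEL-013 is easy to check with the Python standard library: the same visible text yields different UTF-8 bytes under NFC and NFD normalisation. The snippet below shows only that byte-level difference. It does not call the engine, and the sample string is an arbitrary choice.

```python
import unicodedata

text = "caf\u00e9"  # "café" written with a precomposed e-acute

nfc = unicodedata.normalize("NFC", text).encode("utf-8")
nfd = unicodedata.normalize("NFD", text).encode("utf-8")

# NFC keeps one code point for the accented letter; NFD splits it into
# a base letter plus a combining acute accent (U+0301).
assert nfc == b"caf\xc3\xa9"
assert nfd == b"cafe\xcc\x81"
assert nfc != nfd and len(nfd) == len(nfc) + 1
```

Any system that derives identity from raw bytes will see these as two different inputs unless it normalises first.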
UPFM hypothesis claims
6 claims
- C-UPFM-001aPass
Per-window discovery-rate decay follows an exponential shape and dominates a coupon-collector null on every tested domain.
- C-UPFM-001bPass
After normalising by entropy, pattern depth, and reuse, the decay constant collapses across admissible domains within ~6x.
- C-UPFM-002Pass
Leading-padded vs trailing-padded inputs produce distinct primitive stores on 6 of 7 domains.
- C-UPFM-004cPass
Engineered multi-scale inputs hit the author-specified cross-scale coupling floor on all 5 fractal/harmonic types.
- C-UPFM-006Pass
Scrub reduction ratio correlates with byte entropy after controlling for input length.
- C-UPFM-007bNear-pass
Within the engine's declared noise-class contract, semantic composition converges on 97.97% of trials. The pre-registered gate is 98%, so the claim falls 1 trial short.