Open Claim Registry

91 verified claims, listed and linked

Every quantitative result on this site maps to a claim with a passing, deterministic test. Each claim below has a short plain-English description, an ID you can search for in the test suite, and a link to the case study where applicable. We also list 1 near-pass claim that fell short of its pre-registered gate, kept here for transparency.

Core (11) · Discovery (46) · Method (2) · Audit (1) · Statistical (2) · Performance (1) · Novel (23) · UPFM Hypotheses (6)

Core engine guarantees

11 claims

C-CORE-001
Default ingest auto-selects symbol width via the discovery curve. Fixed widths are opt-in.
Pass
C-CORE-002
Reads raw bytes exactly as given and replays them byte-for-byte at the chosen symbol width.
Pass
C-CORE-003
Anchor alignment scans bit offsets 0 to 7 with deterministic tie-breaking. Same input, same anchor, every run.
Pass
C-CORE-004
Identity seed comes only from the geometric signature after anchor alignment. No external sources.
Pass
C-CORE-005
Cryptographic hashes are bookkeeping metadata only. They never decide identity, deduplication, or claim gating.
Pass
C-CORE-006
Primitive reuse stays inside one ledger. Two separate ledgers cannot share IDs or store entries.
Pass
C-CORE-007
Scrubbing low-frequency primitives leaves replay intact for everything that was kept.
Pass
C-CORE-008
Determinism, replay truth, and corruption rejection all pass together. Any break marks the run as failed.
Pass
C-CORE-009
Persisted ledgers carry a version magic. Files without it, or with a wrong version, are rejected on load.
Pass
C-CORE-010
In verifying mode, corrupted ledger data triggers an explicit error. There is no silent fallback.
Pass
C-CORE-011
Calling the engine from Python yields bit-identical seeds and discovery rates to the Rust core. 50 inputs, 3 runs each.
Pass
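
To illustrate the kind of check C-CORE-009 describes, here is a minimal sketch of a version-magic header that is validated on load. The magic bytes, the version number, the header layout, and the names dump_ledger, load_ledger, and LedgerFormatError are all assumptions made for illustration. They are not the engine's real file format or API.

```python
import struct

# Hypothetical header layout, for illustration only: 4-byte magic + 2-byte version.
MAGIC = b"LDGR"          # assumed magic bytes, not the engine's real value
SUPPORTED_VERSION = 1    # assumed version number


class LedgerFormatError(ValueError):
    """Raised when a persisted ledger fails the header check."""


def dump_ledger(payload: bytes, version: int = SUPPORTED_VERSION) -> bytes:
    """Prefix the payload with the magic and a little-endian 16-bit version."""
    return MAGIC + struct.pack("<H", version) + payload


def load_ledger(blob: bytes) -> bytes:
    """Return the payload, or raise if the magic or version is wrong."""
    header_len = len(MAGIC) + 2
    if len(blob) < header_len or blob[:len(MAGIC)] != MAGIC:
        raise LedgerFormatError("missing or wrong version magic")
    (version,) = struct.unpack("<H", blob[len(MAGIC):header_len])
    if version != SUPPORTED_VERSION:
        raise LedgerFormatError(f"unsupported ledger version {version}")
    return blob[header_len:]
```

The loader raises instead of returning a default, which mirrors the "no silent fallback" wording of C-CORE-010.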

Discovery findings

46 claims

C-DISCOV-001
Independent samples of the same data type produce overlapping primitive vocabularies. Random data does not.
Pass
C-DISCOV-002
The vocabulary is found in one forward pass. BPE needs at least three corpus passes for the same inputs.
Pass
C-DISCOV-003
Two streams with identical Shannon entropy can have very different discovery rates.
Pass
C-DISCOV-004
Joining different data types produces a measurable jump in unknown patterns at the boundary.
Pass
C-DISCOV-005
One tool provides byte-exact replay, structural decomposition, ordered vocabulary, vocabulary reuse, and single-pass operation together.
Pass
C-DISCOV-006
A vocabulary learned from one sample of a data type carries over to other samples of the same type.
Pass
C-DISCOV-007
When data of one type is followed by data of a different type, the rate of new patterns spikes at the boundary.
Pass
C-DISCOV-008
The geometric signature changes less under structure-preserving edits than under structure-breaking ones.
Pass
C-DISCOV-009
Reuse rate correlates with established complexity measures and varies with scale in ways those measures do not.
Pass
C-DISCOV-010
The ratio of discovery rates at different symbol widths separates structured from random data with AUC over 0.90.
Pass
C-DISCOV-011
Primitive IDs are assigned in first-encounter order. The timeline preserves the order each pattern first appeared.
Pass
C-DISCOV-012
Phase-12-to-17 structural results hold on real filesystem files, not just generated data.
Pass
C-DISCOV-013
For structured data, vocabulary grows more slowly than the input length. For random data, it grows roughly linearly.
Pass
C-DISCOV-014
Without any training, the engine's scale-ratio classifies structured vs random data within 5 points of a trained classifier.
Pass
C-DISCOV-015
A vocabulary built from text data flags injected anomalies (DNA, binary, random) without configuration or thresholds.
Pass
C-DISCOV-016
For structured data, vocabulary discovery saturates within a bounded multiple of the final vocabulary size.
Pass
C-DISCOV-017
On 50 isoentropy pairs, the engine separates structured from shuffled with AUC over 0.90. Entropy stays at chance.
Pass
C-DISCOV-018
In a streaming text background, the engine flags anomalies engineered to have identical Shannon entropy to the surrounding data.
Pass
C-DISCOV-019
The engine simultaneously detects anomalies and replays the exact bytes of flagged windows from one record.
See the case study →
Pass
C-DISCOV-020
On noisy-vocabulary sequences, structural discovery rate separates structured from shuffled while byte-level entropy is blind.
Pass
C-DISCOV-021
On harder partial-discrimination data, three engine variants are characterised with non-degenerate confidence intervals.
Pass
C-DISCOV-022
Detects segment reordering, partial bit corruption, and cross-domain drift. Adversarial reuse fails by design (documented scope boundary).
Pass
C-DISCOV-023
Geometric-signature mutation sensitivity exceeds 0.80 across 5 data types. Inputs under 16 bytes have a known low-sensitivity floor.
Pass
C-DISCOV-024
The engine is the only tested system meeting all five of: byte-exact replay, enumerable vocabulary, no pre-training, isoentropy discrimination, geometric identity.
Pass
C-DISCOV-025
On 1,000 isoentropy pairs, the engine separates structured from shuffled with AUC over 0.90. Shannon entropy stays at chance.
See the case study →
Pass
C-DISCOV-026
BPE learns a finite set of merge rules. The engine enumerates every fixed-width pattern seen during training. Cross-domain coverage advantage measured.
Pass
C-DISCOV-027
Flags structural type boundaries in streaming data with no prior reference window. A gzip-NCD baseline needs at least 5 reference chunks.
Pass
C-DISCOV-028
Single-byte edits produce high primitive reuse across document versions. SHA-256 reports binary same/different and nothing in between.
Pass
C-DISCOV-029
Distinct memory/correctness tradeoff: exact replay + enumerable vocabulary + structural identity, at higher storage cost than gzip.
Pass
C-DISCOV-030
Adding grammar-based codes (Sequitur, Re-Pair) to the comparison still leaves the engine the only system meeting all five tested properties.
Pass
C-DISCOV-031
The five-property combination proof holds on real project files (Python, Rust, JSON, Markdown, config), not just synthetic data.
Pass
C-DISCOV-032
At fixed-32 symbol width, discovery rate cleanly separates 6 levels of isoentropy data. Shannon entropy is blind to all of them.
Pass
C-DISCOV-033
Independent samples of structured data types produce near-identical vocabularies at fixed-16 symbol width.
Pass
C-DISCOV-034
Structured data reaches vocabulary saturation within 64 KB. Random data does not saturate in that range.
Pass
C-DISCOV-035
For fixed symbol width, the engine's vocabulary is exactly the set of patterns present in the input. No phantoms, no missing.
Pass
C-DISCOV-036
Joining two byte-aligned segments produces a vocabulary equal to the union of the two parts. A formal closure property.
Pass
C-DISCOV-037
Reversing the order of fixed-width blocks produces the same vocabulary but a reversed timeline. Vocabulary captures what; timeline captures where.
Pass
C-DISCOV-038
The discovery-rate curve correlates with established complexity measures and separates isoentropy inputs.
Pass
C-DISCOV-039
Vocabulary growth is not a strict power law for structured data, but it separates structured from random by orders of magnitude.
Pass
C-DISCOV-040
Primitive frequency is not strictly Zipfian, but the fitted Zipf exponent strongly discriminates by data type.
Pass
C-DISCOV-041
Timeline autocorrelation peaks at the structural period for periodic data and is flat for random data.
Pass
C-DISCOV-042
As sample size grows, two independent vocabularies converge toward the same full alphabet. Convergence is monotonic for structured data.
Pass
C-DISCOV-043
Vocabulary saturation has a critical point that varies by data type. Random data saturates too, but very late.
Pass
C-DISCOV-044
Auto-curve picks symbol widths around 14 to 16 regardless of the natural unit size. It optimises for entropy, not structural boundaries.
Pass
C-DISCOV-045
A 5D structural complexity profile classifies five data types with 100% accuracy on 250 samples (zero-shot, nearest-centroid).
See the case study →
Pass
C-DISCOV-047
At a 17-bit non-byte-aligned pattern, the engine separates structured from shuffled with AUC = 0.98. Gzip stays at chance.
See the case study →
Pass
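
Several of the fixed-width claims above (C-DISCOV-011, C-DISCOV-035, C-DISCOV-036, C-DISCOV-037) can be pictured with a toy model. The sketch below is not the engine. It assumes byte-aligned blocks of a fixed width, uses a plain dict as the vocabulary, and the names ingest and replay are hypothetical.

```python
from typing import Dict, List, Tuple


def ingest(data: bytes, width_bytes: int = 2) -> Tuple[Dict[bytes, int], List[int]]:
    """Split data into fixed-width blocks and assign IDs in first-encounter order.

    Returns (vocabulary, timeline). The timeline holds one ID per block.
    """
    if len(data) % width_bytes:
        raise ValueError("toy sketch handles whole blocks only")
    vocab: Dict[bytes, int] = {}
    timeline: List[int] = []
    for i in range(0, len(data), width_bytes):
        block = data[i:i + width_bytes]
        if block not in vocab:
            vocab[block] = len(vocab)   # next free ID = order of first appearance
        timeline.append(vocab[block])
    return vocab, timeline


def replay(vocab: Dict[bytes, int], timeline: List[int]) -> bytes:
    """Rebuild the original bytes from the vocabulary and the timeline."""
    by_id = {pid: block for block, pid in vocab.items()}
    return b"".join(by_id[pid] for pid in timeline)


vocab, timeline = ingest(b"ABABCDAB")
print(timeline)                  # [0, 0, 1, 0]
print(replay(vocab, timeline))   # b'ABABCDAB'
```

In this toy model the vocabulary of two joined byte-aligned segments is the union of their separate vocabularies, and reversing the block order leaves the set of patterns unchanged while the timeline changes.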

Method-layer governance

2 claims

C-METHOD-001
The decision pipeline produces a 14-stage gate-validated, hash-chained ledger entry for every analysis request.
Pass
C-METHOD-002
On five compound tasks (detect+localise+replay, audit chain, single-pass, vocabulary continuity, per-primitive trail), the engine passes all five. Three baselines fail at least two each.
Pass
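
As a rough picture of what "hash-chained" means in C-METHOD-001, the sketch below links each entry to its predecessor with SHA-256 over canonical JSON. The field names (body, prev, hash), the all-zero genesis value, and the helper names are assumptions for illustration. The real 14-stage pipeline and its ledger schema are not shown here.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first entry's predecessor


def entry_hash(body: dict, prev_hash: str) -> str:
    """SHA-256 over canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps({"body": body, "prev": prev_hash},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_entry(chain: list, body: dict) -> None:
    """Append an entry whose hash covers its body and the previous hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"body": body, "prev": prev, "hash": entry_hash(body, prev)})


def verify_chain(chain: list) -> bool:
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(entry["body"], prev):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the previous one, editing any stored entry changes the value every later entry is expected to carry, so verification fails from that point on.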

Bit-flip audit

1 claim

C-AUDIT-001
A single-byte flip in a structured file produces a deterministic, measurable change in the structural record.
Pass

Statistical characterisation

2 claims

C-STAT-001
Effect sizes characterised honestly. Gzip's effect size on this task exceeds the engine's, and we report it.
Pass
C-STAT-002
Isoentropy discrimination mapped across 5 lengths and 3 widths. All 15 cells exceed AUC = 0.70.
Pass
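
For readers unfamiliar with the term, an isoentropy pair is two inputs with the same byte histogram, and therefore the same byte-level Shannon entropy, but different structure. The sketch below builds one such pair by shuffling. Its discovery_rate function is a toy proxy (distinct fixed-width blocks divided by total blocks) chosen for illustration. It is an assumption, not the engine's actual measure, and the sample text is arbitrary.

```python
import math
import random
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte."""
    n = len(data)
    # Sorting the counts makes the floating-point sum order-independent.
    return -sum(c / n * math.log2(c / n) for c in sorted(Counter(data).values()))


def discovery_rate(data: bytes, width_bytes: int = 2) -> float:
    """Toy proxy: distinct fixed-width blocks divided by total blocks."""
    blocks = [data[i:i + width_bytes]
              for i in range(0, len(data) - width_bytes + 1, width_bytes)]
    if not blocks:
        raise ValueError("input shorter than one block")
    return len(set(blocks)) / len(blocks)


structured = b"the quick brown fox jumps over the lazy dog. " * 200
rng = random.Random(0)  # fixed seed so the shuffle is repeatable
shuffled = bytes(rng.sample(structured, len(structured)))

print(byte_entropy(structured), byte_entropy(shuffled))      # identical values
print(discovery_rate(structured), discovery_rate(shuffled))  # shuffled is higher
```

Shuffling preserves every byte count, so entropy cannot tell the two inputs apart, while the block-level proxy can.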

Performance

1 claim

C-PERF-001
Ingest time scales roughly linearly with input size. The engine is 13 to 21 times slower than gzip, and its stored output is 13 to 21 times larger. Honest characterisation.
Pass

Novel-frontier research

23 claims

C-NOVEL-001
Inputs whose bit length is not a multiple of the symbol width still replay correctly via padding.
Pass
C-NOVEL-002
Symbol width 1 round-trips: 1 or 2 primitives, timeline length equals the bit count.
Pass
C-NOVEL-003
Symbol width 128 is accepted and replays correctly across 50 inputs of varying length.
Pass
C-NOVEL-004
At width 4, histogram-matched permutations produce near-identical vocabularies but distinct seeds.
Pass
C-NOVEL-005
The engine's signature is identical across selection widths (anchor uses width 8). Selection width still drives ledger state.
Pass
C-NOVEL-006
Inputs with all-zero or all-one suffixes round-trip exactly at non-aligned widths.
Pass
C-NOVEL-007
All-zero inputs produce a fixed sentinel signature. A single set bit anywhere produces a non-sentinel signature.
Pass
C-NOVEL-008
Initial primitive count is fixed at ledger construction. Scrubbing never mutates it. Discovery rate after scrub stays consistent.
Pass
C-NOVEL-009
With zero_point=False and simulate=True, the engine loads a persisted ledger and writes nothing.
Pass
C-NOVEL-010
Save and load of an empty ledger is idempotent. A reloaded empty engine matches a fresh one.
Pass
C-NOVEL-011
Eight independent engines on the same input produce byte-identical seeds and ledger files. Wall-clock time stays under 4x the single-thread time.
Pass
C-NOVEL-012
Batch signature calls produce identical per-item dicts to serial calls across 200 diverse inputs.
Pass
C-NOVEL-013
NFC vs NFD Unicode normalisation produces distinct byte sequences and distinct seeds. The semantic pipeline does not collapse them.
Pass
C-NOVEL-014
For the same paragraph in UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, the reuse-ratio ordering is preserved (20 of 20 paragraphs).
Pass
C-NOVEL-015
Decision-hash matches an external SHA-256 of canonical JSON. Tampered ledger entries are detected on re-read.
Pass
C-NOVEL-016
The decision pipeline always produces stages_executed in canonical order across 50 diverse requests.
Pass
C-NOVEL-017
The semantic pipeline terminates deterministically on inputs stacked with 6+ noise transforms.
Pass
C-NOVEL-018
Licence-status active flag matches a 14-day grace window around days_remaining = 0.
Pass
C-NOVEL-019
The causality gate's verdicts match an external re-derivation across 20 adversarial requests.
Pass
C-NOVEL-020
The adaptive-floor switch (between 2 and 16) is observable when the memory watermark changes for the same low-entropy inputs.
Pass
C-NOVEL-021
Across 1 million 128-byte inputs at fixed-8, seed-collision distribution is characterised. Max cluster size stays below 1 million.
Pass
C-NOVEL-022
At fixed-16 width, the engine separates Rule-30 streams from PRNG streams by discovery rate. Byte-histogram entropy cannot.
Pass
C-NOVEL-023
Pi, e, root-2 in binary produce statistically indistinguishable discovery rates. Consistent with the normality conjecture.
Pass
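
The width edge cases in C-NOVEL-001, C-NOVEL-002, and C-NOVEL-006 can be pictured with a small round-trip sketch. It assumes big-endian bit order and zero padding at the tail, with the original bit length stored alongside the symbols. Those choices, and the names to_symbols and from_symbols, are assumptions for illustration and may differ from the engine's actual padding scheme.

```python
from typing import List, Tuple


def to_symbols(data: bytes, width: int) -> Tuple[List[int], int]:
    """Split the bit stream into width-bit symbols, zero-padding the tail.

    Returns (symbols, bit_length) so the padding can be removed on replay.
    """
    if width < 1:
        raise ValueError("width must be at least 1 bit")
    bit_length = len(data) * 8
    pad = (-bit_length) % width            # bits needed to fill the last symbol
    value = int.from_bytes(data, "big") << pad
    total = bit_length + pad
    mask = (1 << width) - 1
    symbols = [(value >> shift) & mask
               for shift in range(total - width, -1, -width)]
    return symbols, bit_length


def from_symbols(symbols: List[int], width: int, bit_length: int) -> bytes:
    """Concatenate the symbols, drop the tail padding, and return the bytes."""
    value = 0
    for symbol in symbols:
        value = (value << width) | symbol
    pad = len(symbols) * width - bit_length
    value >>= pad
    return value.to_bytes(bit_length // 8, "big")


symbols, n_bits = to_symbols(b"\x01\x02\x03", 7)   # 24 bits, padded to 28
print(len(symbols))                                # 4
print(from_symbols(symbols, 7, n_bits))            # b'\x01\x02\x03'
```

At width 1 this sketch yields one symbol per bit with values 0 or 1, which matches the shape described in C-NOVEL-002.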

UPFM hypothesis claims

6 claims

C-UPFM-001a
Per-window discovery-rate decay follows an exponential shape and dominates a coupon-collector null on every tested domain.
Pass
C-UPFM-001b
After normalising by entropy, pattern depth, and reuse, the decay constant collapses across admissible domains within ~6x.
Pass
C-UPFM-002
Leading-padded vs trailing-padded inputs produce distinct primitive stores on 6 of 7 domains.
Pass
C-UPFM-004c
Engineered multi-scale inputs hit the author-specified cross-scale coupling floor on all 5 fractal/harmonic types.
Pass
C-UPFM-006
Scrub reduction ratio correlates with byte entropy after controlling for input length.
Pass
C-UPFM-007b
Within the engine's declared noise-class contract, semantic composition converges on 97.97% of trials. The pre-registered gate is 98%, so the result falls 1 trial short.
Near-pass