Open Claim Registry

91 verified claims, listed and linked

Every quantitative result on this site maps to a claim with a passing, deterministic test. Each claim below has a short plain-English description, an ID you can search for in the test suite, and a link to the case study where applicable. We also list one near-pass claim that fell short of its pre-registered gate, kept here for transparency.

Core engine guarantees

11 claims

  • C-CORE-001

    Default ingest auto-selects symbol width via the discovery curve. Fixed widths are opt-in.

    Pass
  • C-CORE-002

    Reads raw bytes exactly as given and replays them byte-for-byte at the chosen symbol width.

    Pass
  • C-CORE-003

    Anchor alignment scans bit offsets 0 to 7 with deterministic tie-breaking. Same input, same anchor, every run.

    Pass
  • C-CORE-004

    Identity seed comes only from the geometric signature after anchor alignment. No external sources.

    Pass
  • C-CORE-005

    Cryptographic hashes are bookkeeping metadata only. They never decide identity, deduplication, or claim gating.

    Pass
  • C-CORE-006

    Primitive reuse stays inside one ledger. Two separate ledgers cannot share IDs or store entries.

    Pass
  • C-CORE-007

    Scrubbing low-frequency primitives leaves replay intact for everything that was kept.

    Pass
  • C-CORE-008

    Determinism, replay truth, and corruption rejection all pass together. Any break marks the run as failed.

    Pass
  • C-CORE-009

    Persisted ledgers carry a version magic. Files without it, or with a wrong version, are rejected on load.

    Pass
  • C-CORE-010

    In verifying mode, corrupted ledger data triggers an explicit error. There is no silent fallback.

    Pass
  • C-CORE-011

    Calling the engine from Python yields bit-identical seeds and discovery rates to the Rust core. 50 inputs, 3 runs each.

    Pass
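
C-CORE-009 and C-CORE-010 describe a load path that rejects files without the expected version magic and raises an explicit error instead of falling back silently. The following is a minimal Python sketch of that kind of check. The magic bytes, version number, header layout, and all function names are hypothetical and chosen only for illustration; this registry does not specify the real on-disk format.

```python
import struct

# Hypothetical header layout (an assumption, not the engine's real format):
# 4 magic bytes followed by a little-endian unsigned 16-bit version.
MAGIC = b"LDGR"
SUPPORTED_VERSION = 1
HEADER = struct.Struct("<4sH")


class LedgerFormatError(ValueError):
    """Raised when a persisted ledger fails the header check."""


def dump_ledger(payload: bytes) -> bytes:
    """Prefix the payload with the magic and version."""
    return HEADER.pack(MAGIC, SUPPORTED_VERSION) + payload


def load_ledger(blob: bytes) -> bytes:
    """Return the payload, or raise explicitly; never fall back silently."""
    if len(blob) < HEADER.size:
        raise LedgerFormatError("file too short to contain a header")
    magic, version = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise LedgerFormatError("missing or wrong magic")
    if version != SUPPORTED_VERSION:
        raise LedgerFormatError(f"unsupported version {version}")
    return blob[HEADER.size:]
```

Raising a dedicated exception type keeps the failure visible to the caller, which is the behaviour the two claims describe.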

Discovery findings

46 claims

  • C-DISCOV-001

    Independent samples of the same data type produce overlapping primitive vocabularies. Random data does not.

    Pass
  • C-DISCOV-002

    The vocabulary is found in one forward pass. BPE needs at least three corpus passes for the same inputs.

    Pass
  • C-DISCOV-003

    Two streams with identical Shannon entropy can have very different discovery rates.

    Pass
  • C-DISCOV-004

    Joining different data types produces a measurable jump in unknown patterns at the boundary.

    Pass
  • C-DISCOV-005

    The engine combines byte-exact replay, structural decomposition, ordered vocabulary, vocabulary reuse, and single-pass operation in one tool.

    Pass
  • C-DISCOV-006

    A vocabulary learned from one sample of a data type carries over to other samples of the same type.

    Pass
  • C-DISCOV-007

    When data of one type is followed by data of a different type, the rate of new patterns spikes at the boundary.

    Pass
  • C-DISCOV-008

    The geometric signature changes less under structure-preserving edits than under structure-breaking ones.

    Pass
  • C-DISCOV-009

    Reuse rate correlates with established complexity measures and varies with scale in ways those measures do not.

    Pass
  • C-DISCOV-010

    The ratio of discovery rates at different symbol widths separates structured from random data with AUC over 0.90.

    Pass
  • C-DISCOV-011

    Primitive IDs are assigned in first-encounter order. The timeline preserves the order each pattern first appeared.

    Pass
  • C-DISCOV-012

    Phase-12-to-17 structural results hold on real filesystem files, not just generated data.

    Pass
  • C-DISCOV-013

    For structured data, vocabulary grows more slowly than the input length. For random data, it grows roughly linearly.

    Pass
  • C-DISCOV-014

    Without any training, the engine's scale-ratio classifies structured vs random data within 5 points of a trained classifier.

    Pass
  • C-DISCOV-015

    A vocabulary built from text data flags injected anomalies (DNA, binary, random) without configuration or thresholds.

    Pass
  • C-DISCOV-016

    For structured data, vocabulary discovery saturates within a bounded multiple of the final vocabulary size.

    Pass
  • C-DISCOV-017

    On 50 isoentropy pairs, the engine separates structured from shuffled with AUC over 0.90. Entropy stays at chance.

    Pass
  • C-DISCOV-018

    In a streaming text background, the engine flags anomalies engineered to have identical Shannon entropy to the surrounding data.

    Pass
  • C-DISCOV-019

    The engine simultaneously detects anomalies and replays the exact bytes of flagged windows from one record.

    See the case study →
    Pass
  • C-DISCOV-020

    On noisy-vocabulary sequences, structural discovery rate separates structured from shuffled while byte-level entropy is blind.

    Pass
  • C-DISCOV-021

    On harder partial-discrimination data, three engine variants are characterised with non-degenerate confidence intervals.

    Pass
  • C-DISCOV-022

    Detects segment reordering, partial bit corruption, and cross-domain drift. Adversarial reuse fails by design (documented scope boundary).

    Pass
  • C-DISCOV-023

    Geometric-signature mutation sensitivity exceeds 0.80 across 5 data types. Inputs under 16 bytes have a known low-sensitivity floor.

    Pass
  • C-DISCOV-024

    The engine is the only tested system meeting all five of: byte-exact replay, enumerable vocabulary, no pre-training, isoentropy discrimination, geometric identity.

    Pass
  • C-DISCOV-025

    On 1,000 isoentropy pairs, the engine separates structured from shuffled with AUC over 0.90. Shannon entropy stays at chance.

    See the case study →
    Pass
  • C-DISCOV-026

    BPE learns a finite set of merge rules. The engine enumerates every fixed-width pattern seen during training. The resulting cross-domain coverage advantage is measured.

    Pass
  • C-DISCOV-027

    Flags structural type boundaries in streaming data with no prior reference window. A gzip-NCD baseline needs at least 5 reference chunks.

    Pass
  • C-DISCOV-028

    Single-byte edits produce high primitive reuse across document versions. SHA-256 reports binary same/different and nothing in between.

    Pass
  • C-DISCOV-029

    The engine occupies a distinct memory/correctness tradeoff: exact replay, enumerable vocabulary, and structural identity, at a higher storage cost than gzip.

    Pass
  • C-DISCOV-030

    Adding grammar-based codes (Sequitur, Re-Pair) to the comparison still leaves the engine the only system meeting all five tested properties.

    Pass
  • C-DISCOV-031

    The five-property combination proof holds on real project files (Python, Rust, JSON, Markdown, config), not just synthetic data.

    Pass
  • C-DISCOV-032

    At fixed-32 symbol width, discovery rate cleanly separates 6 levels of isoentropy data. Shannon entropy is blind to all of them.

    Pass
  • C-DISCOV-033

    Independent samples of structured data types produce near-identical vocabularies at fixed-16 symbol width.

    Pass
  • C-DISCOV-034

    Structured data reaches vocabulary saturation within 64 KB. Random data does not saturate in that range.

    Pass
  • C-DISCOV-035

    For a fixed symbol width, the engine's vocabulary is exactly the set of patterns present in the input. No phantom patterns, none missing.

    Pass
  • C-DISCOV-036

    Joining two byte-aligned segments produces a vocabulary equal to the union of the two parts. A formal closure property.

    Pass
  • C-DISCOV-037

    Reversing the order of fixed-width blocks produces the same vocabulary but a reversed timeline. Vocabulary captures what; timeline captures where.

    Pass
  • C-DISCOV-038

    The discovery-rate curve correlates with established complexity measures and separates isoentropy inputs.

    Pass
  • C-DISCOV-039

    Vocabulary growth is not a strict power law for structured data, but it separates structured from random by orders of magnitude.

    Pass
  • C-DISCOV-040

    Primitive frequency is not strictly Zipfian, but the fitted Zipf exponent strongly discriminates by data type.

    Pass
  • C-DISCOV-041

    Timeline autocorrelation peaks at the structural period for periodic data and is flat for random data.

    Pass
  • C-DISCOV-042

    As sample size grows, two independent vocabularies converge toward the same full alphabet. Convergence is monotonic for structured data.

    Pass
  • C-DISCOV-043

    Vocabulary saturation has a critical point that varies by data type. Random data saturates too, but very late.

    Pass
  • C-DISCOV-044

    Auto-curve picks symbol widths around 14 to 16 regardless of the natural unit size. It optimises for entropy, not structural boundaries.

    Pass
  • C-DISCOV-045

    A 5D structural complexity profile classifies five data types with 100% accuracy on 250 samples (zero-shot, nearest-centroid).

    See the case study →
    Pass
  • C-DISCOV-047

    At a 17-bit non-byte-aligned pattern, the engine separates structured from shuffled with AUC = 0.98. Gzip stays at chance.

    See the case study →
    Pass
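
Several of the claims above (C-DISCOV-003, -011, -035, -036, -037) can be illustrated with a toy model. The sketch below is not the engine: it assumes byte-aligned fixed-width blocks, skips anchor alignment and the geometric signature entirely, and every function name is hypothetical. It only shows how a first-encounter vocabulary, a timeline, and a discovery rate might be computed, and why shuffling bytes leaves byte-histogram Shannon entropy unchanged while the discovery rate moves.

```python
import math
import random
from collections import Counter


def ingest(data: bytes, width_bytes: int = 2):
    """Split data into fixed-width blocks; return (vocabulary, timeline).

    vocabulary lists distinct blocks in first-encounter order, so a block's
    index is its ID. timeline is the sequence of IDs, one per block.
    Assumes len(data) is a multiple of width_bytes.
    """
    ids = {}
    timeline = []
    for i in range(0, len(data), width_bytes):
        block = data[i:i + width_bytes]
        # setdefault assigns the next free ID only on first encounter.
        timeline.append(ids.setdefault(block, len(ids)))
    return list(ids), timeline


def replay(vocabulary, timeline) -> bytes:
    """Rebuild the input byte-for-byte from vocabulary and timeline."""
    return b"".join(vocabulary[i] for i in timeline)


def shannon_entropy(data: bytes) -> float:
    """Byte-histogram Shannon entropy in bits per byte."""
    n = len(data)
    # Sorting the counts makes the float sum independent of byte order.
    return -sum(c / n * math.log2(c / n) for c in sorted(Counter(data).values()))


def discovery_rate(data: bytes, width_bytes: int = 2) -> float:
    """Fraction of blocks that were new when first seen."""
    vocabulary, timeline = ingest(data, width_bytes)
    return len(vocabulary) / len(timeline)


structured = b"GET /index.html " * 64        # 1024 bytes, period 16
shuffled_bytes = bytearray(structured)
random.Random(0).shuffle(shuffled_bytes)      # same histogram, order destroyed
shuffled = bytes(shuffled_bytes)

print(shannon_entropy(structured) == shannon_entropy(shuffled))
print(discovery_rate(structured), discovery_rate(shuffled))
```

In this toy model the vocabulary is by construction exactly the set of blocks present, joining two block-aligned inputs yields the union of their vocabularies, and reversing the block order changes the timeline but not the vocabulary set.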

Method-layer governance

2 claims

  • C-METHOD-001

    The decision pipeline produces a 14-stage gate-validated, hash-chained ledger entry for every analysis request.

    Pass
  • C-METHOD-002

    On five compound tasks (detect+localise+replay, audit chain, single-pass, vocabulary continuity, per-primitive trail), the engine passes all five. Three baselines fail at least two each.

    Pass
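
C-METHOD-001 refers to a hash-chained ledger entry per analysis request. As a rough illustration of what hash chaining means, the sketch below links each entry to the hash of the previous one and detects tampering on re-read. The field names, the genesis value, and the particular canonical JSON form (sorted keys, compact separators) are assumptions made for this example; the registry does not state the engine's actual entry schema or its 14 stages.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain


def canonical(obj) -> bytes:
    """One possible canonical JSON form: sorted keys, no extra whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def append_entry(ledger: list, record: dict) -> dict:
    """Append a record whose hash covers both the record and the prior hash."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"prev": prev, "record": record}
    entry = {**body, "hash": hashlib.sha256(canonical(body)).hexdigest()}
    ledger.append(entry)
    return entry


def verify(ledger: list) -> bool:
    """Recompute every hash and check each link back to the genesis value."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev:
            return False
        body = {"prev": entry["prev"], "record": entry["record"]}
        if entry["hash"] != hashlib.sha256(canonical(body)).hexdigest():
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the previous hash, editing any earlier record invalidates that entry and every link after it.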

Bit-flip audit

1 claim

  • C-AUDIT-001

    A single-byte flip in a structured file produces a deterministic, measurable change in the structural record.

    Pass

Statistical characterisation

2 claims

  • C-STAT-001

    Effect sizes are characterised honestly. Gzip's effect size on this task exceeds the engine's, and we report it.

    Pass
  • C-STAT-002

    Isoentropy discrimination mapped across 5 lengths and 3 widths. All 15 cells exceed AUC = 0.70.

    Pass
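
Many claims on this page quote an AUC threshold. For readers unfamiliar with the measure, AUC can be computed directly as the probability that a randomly chosen positive sample scores above a randomly chosen negative one, with ties counted as half. The function below is a plain-Python sketch of that definition; it is not taken from the test suite, and the registry does not say which implementation the tests use.

```python
def auc(positives, negatives) -> float:
    """Probability that a random positive scores above a random negative.

    Ties count as half. This equals the normalised Mann-Whitney U statistic.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 means the score carries no ranking information (chance), and 1.0 means every positive outranks every negative.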

Performance

1 claim

  • C-PERF-001

    Ingest time scales roughly linearly with input size. The engine is 13 to 21 times slower than gzip and 13 to 21 times larger in storage. Honest characterisation.

    Pass

Novel-frontier research

23 claims

  • C-NOVEL-001

    Inputs whose bit length is not a multiple of the symbol width still replay correctly via padding.

    Pass
  • C-NOVEL-002

    Symbol width 1 round-trips: 1 or 2 primitives, timeline length equals the bit count.

    Pass
  • C-NOVEL-003

    Symbol width 128 is accepted and replays correctly across 50 inputs of varying length.

    Pass
  • C-NOVEL-004

    At width 4, histogram-matched permutations produce near-identical vocabularies but distinct seeds.

    Pass
  • C-NOVEL-005

    The engine's signature is identical across selection widths (anchor uses width 8). Selection width still drives ledger state.

    Pass
  • C-NOVEL-006

    Inputs with all-zero or all-one suffixes round-trip exactly at non-aligned widths.

    Pass
  • C-NOVEL-007

    All-zero inputs produce a fixed sentinel signature. A single set bit anywhere produces a non-sentinel signature.

    Pass
  • C-NOVEL-008

    Initial primitive count is fixed at ledger construction. Scrubbing never mutates it. Discovery rate after scrub stays consistent.

    Pass
  • C-NOVEL-009

    With zero_point=False and simulate=True, the engine loads a persisted ledger and writes nothing.

    Pass
  • C-NOVEL-010

    Save and load of an empty ledger is idempotent. A reloaded empty engine matches a fresh one.

    Pass
  • C-NOVEL-011

    Eight independent engines on the same input produce byte-identical seeds and ledger files. Wall-clock time stays under 4x the single-thread time.

    Pass
  • C-NOVEL-012

    Batch signature calls produce identical per-item dicts to serial calls across 200 diverse inputs.

    Pass
  • C-NOVEL-013

    NFC vs NFD Unicode normalisation produces distinct byte sequences and distinct seeds. The semantic pipeline does not collapse them.

    Pass
  • C-NOVEL-014

    For the same paragraph in UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, the reuse-ratio ordering is preserved (20 of 20 paragraphs).

    Pass
  • C-NOVEL-015

    Decision-hash matches an external SHA-256 of canonical JSON. Tampered ledger entries are detected on re-read.

    Pass
  • C-NOVEL-016

    The decision pipeline always produces stages_executed in canonical order across 50 diverse requests.

    Pass
  • C-NOVEL-017

    The semantic pipeline terminates deterministically on inputs stacked with 6+ noise transforms.

    Pass
  • C-NOVEL-018

    Licence-status active flag matches a 14-day grace window around days_remaining = 0.

    Pass
  • C-NOVEL-019

    The causality gate's verdicts match an external re-derivation across 20 adversarial requests.

    Pass
  • C-NOVEL-020

    The adaptive-floor switch (between 2 and 16) is observable when the memory watermark changes for the same low-entropy inputs.

    Pass
  • C-NOVEL-021

    Across 1 million 128-byte inputs at fixed-8, seed-collision distribution is characterised. Max cluster size stays below 1 million.

    Pass
  • C-NOVEL-022

    At fixed-16 width, the engine separates Rule-30 streams from PRNG streams by discovery rate. Byte-histogram entropy cannot.

    Pass
  • C-NOVEL-023

    Pi, e, and root-2 in binary produce statistically indistinguishable discovery rates. This is consistent with the normality conjecture.

    Pass
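
C-NOVEL-001, C-NOVEL-002, and C-NOVEL-006 concern symbol widths that do not divide the input's bit length. One way such a round trip can work is sketched below, assuming trailing zero padding and a recorded original bit length. Both of those choices, and the function names, are assumptions for illustration; the registry says only that replay works "via padding".

```python
def to_symbols(data: bytes, width: int):
    """Split data into width-bit symbols, zero-padding the final one.

    Returns (symbols, bit_length) so the padding can be stripped on replay.
    """
    bit_length = len(data) * 8
    bits = "".join(f"{b:08b}" for b in data)
    bits += "0" * ((-bit_length) % width)  # pad up to a multiple of width
    symbols = [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]
    return symbols, bit_length


def from_symbols(symbols, width: int, bit_length: int) -> bytes:
    """Rebuild the original bytes, discarding the padding bits."""
    bits = "".join(f"{s:0{width}b}" for s in symbols)[:bit_length]
    return bytes(int(bits[i:i + 8], 2) for i in range(0, bit_length, 8))
```

Storing the original bit length is what keeps inputs with all-zero suffixes unambiguous in this sketch: padding zeros are cut by position, not by value. At width 1 every symbol is a single bit, so there are at most two distinct symbols and the symbol count equals the bit count.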

UPFM hypothesis claims

6 claims

  • C-UPFM-001a

    Per-window discovery-rate decay follows an exponential shape and dominates a coupon-collector null on every tested domain.

    Pass
  • C-UPFM-001b

    After normalising by entropy, pattern depth, and reuse, the decay constant collapses across admissible domains to within ~6x.

    Pass
  • C-UPFM-002

    Leading-padded vs trailing-padded inputs produce distinct primitive stores on 6 of 7 domains.

    Pass
  • C-UPFM-004c

    Engineered multi-scale inputs hit the author-specified cross-scale coupling floor on all 5 fractal/harmonic types.

    Pass
  • C-UPFM-006

    Scrub reduction ratio correlates with byte entropy after controlling for input length.

    Pass
  • C-UPFM-007b

    Within the engine's declared noise-class contract, semantic composition converges on 97.97% of trials. The pre-registered gate is 98%, so the claim falls 1 trial short.

    Near-pass