Transparency Report

Where the dataset is, what we know it can and can't do, what we are working on next, and what we will not do. Generated at build time from the live dataset and the most recent audit-script run.

Dataset at a glance

753 group profiles across 15 category labels
276 glossary terms covering the BITE model, Lifton's eight criteria, Lalich's bounded choice, recovery vocabulary, and group-specific terminology
28 long-form blog posts
5 trust-and-policy pages (this is one of them)
4 audience-facing tools (self-assessment, compare, quizzes, courses)

CLCI band distribution

Band	Range	Count	Share
Extreme	31–40	99	13.1%
High	21–30	255	33.9%
Moderate	11–20	240	31.9%
Low / reference	0–10	159	21.1%
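The band boundaries and shares in the table can be reproduced with a small lookup. This is a minimal sketch, not the site's build code: the names `clci_band` and `band_shares` are hypothetical, and it assumes CLCI scores are integers from 0 to 40.

```python
# Band boundaries copied from the table above; the structure is illustrative.
BANDS = [
    ("Extreme", 31, 40),
    ("High", 21, 30),
    ("Moderate", 11, 20),
    ("Low / reference", 0, 10),
]


def clci_band(score: int) -> str:
    """Return the band label for an integer CLCI score (assumed 0-40)."""
    for label, low, high in BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"CLCI score out of range: {score}")


def band_shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-band counts to percentage shares, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts as published in the table above.
counts = {"Extreme": 99, "High": 255, "Moderate": 240, "Low / reference": 159}
shares = band_shares(counts)
```

Each share is rounded independently, so a column of shares may differ from 100% by a tenth of a point.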

Confidence rating distribution

High confidence: 374 (49.7%) — court records, government investigations, peer-reviewed academic work, multiple long-form investigations
Medium confidence: 282 (37.5%) — credible journalism plus testimony, limited primary record
Low confidence: 97 (12.9%) — fragmentary anecdotal reports or contested claims, ratings marked provisional

Editorial quality tier distribution

Computed by the Round 33 content-quality audit script. Quality tier drives the Round 38 noindex roll-out — entries marked thin or merge-candidate will get robots: noindex until upgraded.

Tier	Count	Share
`complete`	710	94.3%
`needs-sources`	43	5.7%
`thin`	0	0.0%
`merge-candidate`	0	0.0%
`noindex`	0	0.0%

Last audit run: 27 July 2026 at 11:27.
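The tier-to-noindex rule described in this section could be applied at build time along the following lines. This is a sketch under stated assumptions: `robots_directive` and `NOINDEX_TIERS` are hypothetical names, treating the `noindex` tier itself as non-indexable is an assumption (the report names only thin and merge-candidate), and pairing the directive with `follow` is likewise an illustrative choice.

```python
# Tiers the report says will receive noindex until upgraded.
# Adding the "noindex" tier itself is an assumption of this sketch.
NOINDEX_TIERS = {"thin", "merge-candidate", "noindex"}
KNOWN_TIERS = {"complete", "needs-sources"} | NOINDEX_TIERS


def robots_directive(quality_tier: str) -> str:
    """Return the content value of a robots meta tag for one entry."""
    if quality_tier not in KNOWN_TIERS:
        raise ValueError(f"unknown quality tier: {quality_tier}")
    if quality_tier in NOINDEX_TIERS:
        return "noindex, follow"
    return "index, follow"
```

With the current distribution, where every entry is `complete` or `needs-sources`, a rule like this would leave all 753 entries indexable.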

Freshness

753 entries reviewed this calendar year
753 of 753 entries carry a lastReviewed date

We aim to re-review every entry for factual accuracy at least once every 24 months. Entries with newer court findings, government actions, or major investigative reporting take priority.
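A check against the 24-month target could look like the sketch below. It assumes each entry's `lastReviewed` value has already been parsed into a `datetime.date`; the function names are hypothetical and the whole-calendar-month arithmetic is one reasonable reading of "24 months", not a documented rule.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def is_due_for_review(last_reviewed: date, today: date, max_months: int = 24) -> bool:
    """True once max_months whole calendar months have elapsed since review."""
    return months_between(last_reviewed, today) >= max_months
```

Entries flagged by such a check would still need manual ordering, since the priority rule for new court findings or government actions is editorial rather than date-based.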

What we have done in the last 12 months

Built out the dataset from ~150 to 753 entries across 32 named “rounds” of audit-and-expansion work.
Added 5 trust-and-policy pages, including this transparency report and an AI-use-disclosure page (Rounds 32-33).
Added Source / EvidenceItem / ScoreJustification data-model scaffolding (Round 32) for the Round 38 per-entry structured-source migration.
Split the sitemap into 5 named shards, added WebSite, Organization, FAQPage, DefinedTerm JSON-LD (Round 32).
Fixed a crawl-discovery bug that limited static HTML to 60 group cards instead of all 753 (Round 32).

What we have not done yet

Source-URL enrichment. Only 0.3% of source strings currently include URLs; the rest are name-only citations to court records, books, and journalism that we have not yet linked.
The 7 audience hubs and 24 topic hubs proposed in the long-term roadmap.
Country-specific landing pages (the brief proposes 15).
The 100-post blog expansion across 6 thematic clusters.
Formal duplicate-entry merges — 5 high-confidence merge candidates flagged by the Round 33 audit have not yet been merged or alias-linked.
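A figure like the source-URL share in the first item above can be computed by counting the source strings that contain an http(s) URL. The sketch below is illustrative only: `url_share` is a hypothetical name, the pattern is a deliberately loose assumption, and the sample strings are invented rather than taken from the dataset.

```python
import re

# Loose match for an http(s) URL anywhere in a source string (assumed pattern).
URL_PATTERN = re.compile(r"https?://\S+")


def url_share(sources: list[str]) -> float:
    """Percentage of source strings that contain at least one URL."""
    if not sources:
        return 0.0
    with_url = sum(1 for s in sources if URL_PATTERN.search(s))
    return round(100 * with_url / len(sources), 1)


# Invented sample strings, not taken from the dataset.
sample = [
    "Example v. Example (hypothetical court record)",
    "Hypothetical investigative article, https://example.org/report",
    "Hypothetical monograph (2001)",
    "Hypothetical government inquiry report",
]
```

On this invented sample, one string in four carries a URL, so `url_share(sample)` returns 25.0.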

Correction handling (the past year)

We do not have automated tooling to count correction requests yet — this section will become numeric once the corrections workflow stabilises in the GitHub issue tracker. Until then, the substantive practice has been:

Every correction received is acknowledged within seven days.
Routine fact corrections (dates, names, role changes) typically process within two weeks.
Score-change requests require a substantive, evidence-based comparison case and may take longer.
No correction request has been declined for political or organisational reasons in the past year.

Known limitations

Heuristic source classification. The Round 33 source audit classifies sources by regex; about 61% of source strings fall into the “other” bucket because their format doesn't match any of the more specific patterns. The Round 38 per-entry migration will replace this with formal sourceType assignment.
English-language dominance. The dataset over-indexes on English-language scholarship and journalism. Korean, Japanese, Indian, and Latin American entries draw on translated and English-language sources but not native-language primary material.
US- and UK-jurisdiction bias. Court-record and government-source coverage is strongest for these jurisdictions. Coverage of Asian and African legal systems is thinner.
Selection bias in what gets covered. Highly publicised cases are easier to source than equally harmful but unpublicised ones. The dataset is more comprehensive at the top of the spectrum (CLCI 30+) than at the bottom.
No interview-based primary research. We rely on publicly available material; we do not conduct primary survivor interviews ourselves.
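The heuristic classification named in the first limitation can be pictured as an ordered list of regular expressions with a fallback bucket. The patterns and bucket names below are invented for illustration (only "other" comes from the report) and are not the Round 33 audit script's actual rules.

```python
import re

# Hypothetical patterns, checked in order; the first match wins.
PATTERNS = [
    ("court-record", re.compile(r"\bv\.\s|\bcourt\b|\bindictment\b", re.I)),
    ("government", re.compile(r"\b(inquiry|commission|ministry|department)\b", re.I)),
    ("academic", re.compile(r"\b(journal|university press|doi)\b", re.I)),
    ("journalism", re.compile(r"\b(times|guardian|bbc|reuters)\b", re.I)),
]


def classify_source(source: str) -> str:
    """Return the first matching bucket, or "other" when nothing matches."""
    for label, pattern in PATTERNS:
        if pattern.search(source):
            return label
    return "other"


def other_share(sources: list[str]) -> float:
    """Percentage of source strings that fall through to the "other" bucket."""
    if not sources:
        return 0.0
    misses = sum(1 for s in sources if classify_source(s) == "other")
    return round(100 * misses / len(sources), 1)
```

Any citation whose wording falls outside the pattern list lands in "other", which is why a format-based heuristic leaves a large unclassified remainder and why an explicit per-entry sourceType is the more reliable approach.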

Roadmap

The forward roadmap is documented in the audit files under /docs/. High-level priorities:

Round 34 — Profile redesign with Evidence Matrix and per-axis Score Justification cards (depends on Round 32 data-model extension).
Round 35 — Audience hubs for family, recovery, researchers, journalists, students (5 hubs × ~10 sub-pages each).
Round 36 — Topic hubs and country hubs.
Round 37 — Static tools (BITE self-assessment v2, comparison, red-flag checklist, conversation builder, etc.) and first 20 of the 100-post blog expansion.
Round 38 — Per-entry source-URL enrichment and qualityTier noindex roll-out.
Round 39+ — llms-files split, sensitivity audit pass, remaining blog content.

Funding and conflicts

CLCI Hub is independently operated and does not accept advertising. There are no sponsors, no affiliate revenue, and no advocacy-organisation funding. Operational costs are covered by the operator. We do not accept gifts, paid placements, or promoted listings of any kind. Trust-architecture details are documented in the Editorial Policy.

This page is regenerated on every site build. Numbers reflect the dataset state at build time. See also: Editorial Policy · Source Policy · Corrections · AI Use Disclosure