Transparency Report
Where the dataset is, what we know it can and can't do, what we are working on next, and what we will not do. Generated at build time from the live dataset and the most recent audit-script run.
Dataset at a glance
- 683 group profiles across 15 category labels
- 258 glossary terms covering the BITE model, Lifton's eight criteria, Lalich's bounded choice, recovery vocabulary, and group-specific terminology
- 22 long-form blog posts
- 5 trust-and-policy pages (this is one of them)
- 4 audience-facing tools (self-assessment, compare, quizzes, courses)
CLCI band distribution
| Band | Range | Count | Share |
|---|---|---|---|
| Extreme | 31–40 | 81 | 11.9% |
| High | 21–30 | 215 | 31.5% |
| Moderate | 11–20 | 228 | 33.4% |
| Low / reference | 0–10 | 159 | 23.3% |
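The band table can be read as a simple lookup from score to label. The following is a minimal sketch, assuming CLCI scores are integers from 0 to 40; the names `BANDS`, `clci_band`, and `band_shares` are illustrative and not taken from the site's build code.

```python
from collections import Counter

# Band boundaries copied from the table above. All names here are
# hypothetical; integer scores in 0..40 are an assumption.
BANDS = [
    ("Low / reference", 0, 10),
    ("Moderate", 11, 20),
    ("High", 21, 30),
    ("Extreme", 31, 40),
]


def clci_band(score: int) -> str:
    """Return the band label for an integer CLCI score in 0..40."""
    for label, low, high in BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"CLCI score out of range: {score}")


def band_shares(scores: list[int]) -> dict[str, float]:
    """Percentage of entries per band, rounded to one decimal place."""
    counts = Counter(clci_band(s) for s in scores)
    total = len(scores)
    if total == 0:
        return {label: 0.0 for label, _, _ in BANDS}
    return {label: round(100 * counts[label] / total, 1) for label, _, _ in BANDS}
```

Because each share is rounded independently, the Share column can sum to slightly more or less than 100%.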
Confidence rating distribution
- High confidence: 339 (49.6%) — court records, government investigations, peer-reviewed academic work, multiple long-form investigations
- Medium confidence: 252 (36.9%) — credible journalism plus testimony, limited primary record
- Low confidence: 92 (13.5%) — fragmentary anecdotal reports or contested claims, ratings marked provisional
Editorial quality tier distribution
Computed by the Round 33 content-quality audit script. Quality tier drives the Round 38 noindex roll-out — entries marked thin or merge-candidate will get robots: noindex until upgraded.
| Tier | Count | Share |
|---|---|---|
| complete | 212 | 31.0% |
| needs-sources | 207 | 30.3% |
| thin | 254 | 37.2% |
| merge-candidate | 10 | 1.5% |
| noindex | 0 | 0.0% |
Last audit run: 19 May 2026 at 06:48.
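The rule that ties quality tier to the planned noindex roll-out can be sketched as follows. This is an illustration under stated assumptions, not the audit script itself: `NOINDEX_TIERS`, `robots_directive`, and `noindex_total` are hypothetical names, and the sketch covers only the two tiers this section names as noindex candidates.

```python
from typing import Optional

# Tiers this section says will get robots: noindex until upgraded.
# Hypothetical constant name; the real script may store this differently.
NOINDEX_TIERS = frozenset({"thin", "merge-candidate"})

# Tier counts copied from the table above.
TIER_COUNTS = {
    "complete": 212,
    "needs-sources": 207,
    "thin": 254,
    "merge-candidate": 10,
    "noindex": 0,
}


def robots_directive(quality_tier: str) -> Optional[str]:
    """Return "noindex" for tiers slated for de-indexing, otherwise None."""
    return "noindex" if quality_tier in NOINDEX_TIERS else None


def noindex_total(tier_counts: dict[str, int]) -> int:
    """Count the entries whose tier maps to a noindex directive."""
    return sum(n for tier, n in tier_counts.items() if robots_directive(tier))
```

With the counts above, `noindex_total(TIER_COUNTS)` gives 264 entries, roughly 38.7% of the 683 profiles, assuming no entry is upgraded before the roll-out.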
Freshness
- 94 entries reviewed this calendar year
- 94 of 683 entries carry a lastReviewed date
We aim to re-review every entry for factual accuracy within 24 months. Entries with newer court findings, government actions, or major investigative reporting take priority.
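A check for the 24-month target might look like the sketch below. It assumes each entry's lastReviewed value is available as a date (or missing), and that an entry with no date counts as due; the function names and that treatment of missing dates are assumptions, not documented behaviour.

```python
from datetime import date
from typing import Optional

# Target window stated in the text above.
REVIEW_WINDOW_MONTHS = 24


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def is_due_for_review(last_reviewed: Optional[date], today: date) -> bool:
    """True if the entry has never been reviewed or the window has elapsed."""
    if last_reviewed is None:
        return True  # assumption: undated entries are treated as due
    return months_between(last_reviewed, today) >= REVIEW_WINDOW_MONTHS
```

Passing `today` explicitly keeps the check deterministic, which suits a page whose numbers are fixed at build time.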
What we have done in the last 12 months
- Built out the dataset from ~150 to 683 entries across 32 named “rounds” of audit-and-expansion work.
- Added 5 trust-and-policy pages, this transparency report, and an AI-use-disclosure page (Round 32-33).
- Added Source / EvidenceItem / ScoreJustification data-model scaffolding (Round 32) for the Round 38 per-entry structured-source migration.
- Split the sitemap into 5 named shards, added WebSite, Organization, FAQPage, DefinedTerm JSON-LD (Round 32).
- Fixed a crawl-discovery bug that limited static HTML to 60 group cards instead of all 683 (Round 32).
What we have not done yet
- Source-URL enrichment — only 0.3% of source strings currently bear URLs; the rest are name-only citations to court records, books, and journalism that we have not yet linked.
- The 7 audience hubs and 24 topic hubs proposed in the long-term roadmap.
- Country-specific landing pages (the brief proposes 15).
- The 100-post blog expansion across 6 thematic clusters.
- Formal duplicate-entry merges — 5 high-confidence merge candidates flagged by the Round 33 audit have not yet been merged or alias-linked.
Correction handling (the past year)
We do not have automated tooling to count correction requests yet — this section will become numeric once the corrections workflow stabilises in the GitHub issue tracker. Until then, the substantive practice has been:
- Every correction received is acknowledged within seven days.
- Routine fact corrections (dates, names, role changes) are typically processed within two weeks.
- Score-change requests require a substantive evidence comparison case and may take longer.
- No correction request has been declined for political or organisational reasons in the past year.
Known limitations
- Heuristic source classification. The Round 33 source audit classifies sources by regex; about 61% of source strings fall into the “other” bucket because their format doesn't match any of the more-specific patterns. Round 38 per-entry migration will replace this with formal sourceType assignment.
- English-language dominance. The dataset over-indexes on English-language scholarship and journalism. Korean, Japanese, Indian, and Latin American entries draw on translated and English-language sources but not native-language primary material.
- US- and UK-jurisdiction bias. Court-record and government-source coverage is strongest for these jurisdictions. Coverage of Asian and African legal systems is thinner.
- Selection bias in what gets covered. Highly publicised cases are easier to source than equally harmful but unpublicised ones. The dataset is more comprehensive at the top of the spectrum (CLCI 30+) than at the bottom.
- No interview-based primary research. We rely on publicly-available material; we do not conduct primary survivor interviews ourselves.
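To show why regex-based source classification leaves many strings in the “other” bucket, here is a minimal sketch of a first-match classifier. The patterns and bucket names are invented for illustration; they are not the Round 33 audit script's actual patterns, which this page does not publish.

```python
import re

# Hypothetical patterns and bucket names, for illustration only.
# Order matters: the first pattern that matches wins.
PATTERNS = [
    ("court-record", re.compile(r"\bv\.\s|\bcourt\b|\btribunal\b", re.IGNORECASE)),
    ("government", re.compile(r"\b(department|ministry|commission|inquiry)\b", re.IGNORECASE)),
    ("academic", re.compile(r"\b(journal|university press|doi)\b", re.IGNORECASE)),
    ("url", re.compile(r"https?://", re.IGNORECASE)),
]


def classify_source(source: str) -> str:
    """Return the first matching bucket, or "other" when nothing matches."""
    for bucket, pattern in PATTERNS:
        if pattern.search(source):
            return bucket
    return "other"
```

A name-only citation such as a book title with an author carries none of these surface cues, so it falls through to “other”. That is the limitation an explicit per-source type field would remove.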
Roadmap
The forward roadmap is documented in the audit files under /docs/. High-level priorities:
- Round 34 — Profile redesign with Evidence Matrix and per-axis Score Justification cards (depends on Round 32 data-model extension).
- Round 35 — Audience hubs for family, recovery, researchers, journalists, students (5 hubs × ~10 sub-pages each).
- Round 36 — Topic hubs and country hubs.
- Round 37 — Static tools (BITE self-assessment v2, comparison, red-flag checklist, conversation builder, etc.) and first 20 of the 100-post blog expansion.
- Round 38 — Per-entry source-URL enrichment and qualityTier noindex roll-out.
- Round 39+ — llms-files split, sensitivity audit pass, remaining blog content.
Funding and conflicts
CLCI Hub is independently operated and does not accept advertising. There are no sponsors, no affiliate revenue, and no advocacy-organisation funding. Operational costs are covered by the operator. We do not accept gifts, paid placements, or promoted listings of any kind. Trust-architecture details are documented in the Editorial Policy.
This page is regenerated on every site build. Numbers reflect the dataset state at build time. See also: Editorial Policy · Source Policy · Corrections · AI Use Disclosure