Methodology limitations

The CLCI Hub dataset is intentionally an editorial reference, not a research instrument. This page documents what that distinction means in practice: what the dataset does well, what it does not do, and the specific biases researchers should know about when using it.

Scoring is editorial, not measurement

CLCI scores are produced by reviewing public sources and assigning BITE-axis sub-scores plus a signed modifier. There is no inter-rater reliability study, no calibrated rubric beyond the per-axis descriptions, and no claim of measurement invariance across categories. Two careful reviewers might produce CLCI scores differing by 4–6 points for the same group. The scores are useful for relative orientation across groups; they are not a metric in the technical sense.
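For readers who compare scores programmatically, one way to respect this reviewer-to-reviewer spread is to treat small gaps as ties. The following is a minimal sketch, not part of any CLCI tooling: the function name compare_scores and the default tolerance of 6 points (the upper end of the range quoted above) are assumptions made for illustration.

```python
def compare_scores(score_a: float, score_b: float, tolerance: float = 6.0) -> str:
    """Compare two CLCI scores, treating gaps within `tolerance` as ties.

    The default of 6 is the upper end of the 4 to 6 point reviewer spread
    described above. It is an illustrative assumption, not a calibrated
    threshold.
    """
    gap = score_a - score_b
    if abs(gap) <= tolerance:
        return "indistinguishable"
    return "a_higher" if gap > 0 else "b_higher"


# Hypothetical scores, chosen only to show the three outcomes.
print(compare_scores(62, 58))  # gap of 4 falls inside the tolerance
print(compare_scores(80, 60))  # gap of 20 falls outside it
print(compare_scores(40, 60))
```

A banded comparison like this supports coarse orientation (clearly higher, clearly lower, too close to call) without implying that a one-point difference carries meaning.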

See /methodology/bite-model and /methodology/scoring-appeals for the framing this site uses.

Source-density flags are heuristic

The boolean fields hasCourtRecords, hasAcademicSources, hasInvestigativeJournalism, hasExMemberSources, and hasOfficialStatements were set by pattern-matching the free-text sources[] arrays for indicative keywords. These flags are approximate: an entry may, for example, cite court records under a phrase that the keyword match does not catch, in which case the flag understates the source base. A subsequent editorial pass on structuredSources[] will correct individual entries.
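To make the failure mode concrete, the sketch below shows how a keyword heuristic of this kind can miss a source. It is an illustration only: the keyword lists, the function name derive_flags, and the sample source strings are all hypothetical, and the actual patterns used to build the dataset are not documented on this page.

```python
# Hypothetical keyword lists, one per flag. These are assumptions for
# illustration and are not the patterns used to build the dataset.
FLAG_KEYWORDS = {
    "hasCourtRecords": ("court", "lawsuit", "indictment"),
    "hasAcademicSources": ("journal", "university press", "peer-reviewed"),
    "hasInvestigativeJournalism": ("investigation", "investigative"),
    "hasExMemberSources": ("ex-member", "former member", "memoir"),
    "hasOfficialStatements": ("official statement", "press release"),
}


def derive_flags(sources: list[str]) -> dict[str, bool]:
    """Set each flag to True if any of its keywords appears in the sources.

    Matching is a case-insensitive substring search over the joined
    free-text entries, so unusual phrasing produces false negatives.
    """
    text = " ".join(sources).lower()
    return {
        flag: any(keyword in text for keyword in keywords)
        for flag, keywords in FLAG_KEYWORDS.items()
    }


# Made-up source strings. The second one is a court document, but it
# contains none of the hasCourtRecords keywords, so the flag stays False.
example_sources = [
    "Example Author (2019). A study of the group. Example University Press.",
    "United States v. Example, 2021 sentencing memorandum",
]
flags = derive_flags(example_sources)
print(flags["hasAcademicSources"])  # True
print(flags["hasCourtRecords"])     # False
```

For this reason a false flag is better read as "no matching phrase found" than as "no such source exists".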

Dataset coverage is uneven

Anglophone coverage is stronger than non-Anglophone coverage. Groups primarily reported in English-language sources are over-represented.
High-profile cases (Scientology, Jehovah's Witnesses, FLDS, NXIVM) have multi-decade source bases; new or regional groups frequently have only one or two sources.
The dataset has stronger coverage of Christian-tradition high-control groups than of other-faith and political-ideological categories.
Recent online-first groups are systematically under-represented because their source material is newer and less consolidated.

Confidence ratings are coarse

The three-level confidence rating (High / Medium / Low) reflects editorial judgement about the volume and reliability of available sources. It is not a probability and has no numeric backing. A “high confidence” rating means the source base is substantial and reasonably convergent; it does not mean the score is precise.

Per-entry structured evidence is sparse

The structured evidence[] and structuredSources[] fields are populated for a minority of entries; most entries still carry only the free-text sources[] array. Researchers requiring structured citation should treat the structured fields as an ongoing migration rather than a complete reference.
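Code that consumes the dataset during this migration may want to prefer the structured field where it exists and fall back to the free-text array otherwise. The sketch below assumes each entry is a dictionary with optional structuredSources and sources keys. The helper names citations_for and structured_coverage are hypothetical, and the internal shape of a structured source is not specified here, so items are passed through untouched.

```python
def citations_for(entry: dict) -> tuple[str, list]:
    """Return (kind, citations), preferring structuredSources when populated.

    `kind` is "structured" or "free_text" so callers can tell which
    field the citations came from. Items are returned as-is because
    their internal shape is not assumed here.
    """
    structured = entry.get("structuredSources") or []
    if structured:
        return "structured", list(structured)
    return "free_text", list(entry.get("sources") or [])


def structured_coverage(entries: list[dict]) -> float:
    """Fraction of entries with a non-empty structuredSources field."""
    if not entries:
        return 0.0
    populated = sum(1 for entry in entries if entry.get("structuredSources"))
    return populated / len(entries)


# Made-up entries for illustration only.
entries = [
    {"sources": ["free-text citation A"], "structuredSources": [{"title": "A"}]},
    {"sources": ["free-text citation B"]},
    {"sources": ["free-text citation C"], "structuredSources": []},
]
print(citations_for(entries[1]))
print(structured_coverage(entries))
```

Reporting the coverage fraction alongside any analysis makes it clear how much of the result rests on structured citations and how much on free text.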

Editorial framing

CLCI Hub is written in a deliberately non-sensational editorial voice (see /editorial-policy). The dataset is not neutral in the strong sense; groups with documented patterns of coercive control are described as such. Where this framing affects the score, the scoreJustification field should make the editorial reasoning visible.

Right-of-reply is offered via /right-of-reply; corrections via /corrections.

Citing the dataset in academic work

See /research/citation-guide for the recommended citation form. We ask that researchers also cite the limitations documented here and avoid using CLCI scores as if they were calibrated measurements.