clarity-gate — quality + safety report
In the Skillier index (antigravity__clarity-gate) · scanned 2026-06-03 · engine: builtin+triage
1 heuristic flag to review
Heuristic flags from the builtin scanner, which is known to over-flag (it trips on legitimate env-reading integrations, security skills, and library .eval calls). This is NOT an authoritative malicious verdict; re-scan with SkillSpector for the authoritative result.
About this skill
Pre-ingestion verification for epistemic quality in RAG systems. Ensures documents are properly qualified before entering knowledge bases. Produces CGD (Clarity-Gated Documents) and validates SOT (Source of Truth) files.
📄 Read the SKILL.md
---
# agentskills.io compliant frontmatter
name: clarity-gate
risk: unknown
source: community
version: 2.1.3
description: >
  Pre-ingestion verification for epistemic quality in RAG systems.
  Ensures documents are properly qualified before entering knowledge bases.
  Produces CGD (Clarity-Gated Documents) and validates SOT (Source of Truth) files.
author: Francesco Marinoni Moretto
license: CC-BY-4.0
repository: https://github.com/frmoretto/clarity-gate
triggers:
- clarity gate
- check for hallucination risks
- can an LLM read this safely
- review for equivocation
- verify document clarity
- pre-ingestion check
- cgd verify
- sot verify
capabilities:
- document-verification
- epistemic-quality
- rag-preparation
- cgd-generation
- sot-validation
outputs:
- type: cgd
extension: .cgd.md
spec: docs/CLARITY_GATE_FORMAT_SPEC.md
spec_version: "2.1"
---
# Clarity Gate v2.1
**Purpose:** Pre-ingestion verification system that enforces epistemic quality before documents enter RAG knowledge bases. Produces Clarity-Gated Documents (CGD) compliant with the Clarity Gate Format Specification v2.1.
**Core Question:** "If another LLM reads this document, will it mistake assumptions for facts?"
**Core Principle:** *"Detection finds what is; enforcement ensures what should be. In practice: find the missing uncertainty markers before they become confident hallucinations."*
---
## What's New in v2.1
| Feature | Description |
|---------|-------------|
| **Claim Completion Status** | PENDING/VERIFIED determined by field presence (no explicit status field) |
| **Source Field Semantics** | Actionable source (PENDING) vs. what-was-found (VERIFIED) |
| **Claim ID Format Guidance** | Hash-based IDs preferred, collision analysis for scale |
| **Body Structure Requirements** | HITL Verification Record section mandatory when claims exist |
| **New Validation Codes** | E-ST10, W-ST11, W-HC01, W-HC02, E-SC06 (FORMAT_SPEC); E-TB01-07 (SOT validation) |
| **Bundled Scripts** | `claim_id.py` and `document_hash.py` for deterministic computations |
---
## Specifications
This skill implements and references:
| Specification | Version | Location |
|---------------|---------|----------|
| Clarity Gate Format (Unified) | v2.1 | docs/CLARITY_GATE_FORMAT_SPEC.md |
**Note:** As of v2.0, CGD and SOT are unified into a single `.cgd.md` format. An SOT is now a CGD with an optional `tier:` block.
---
## Validation Codes
Clarity Gate defines validation codes for structural and semantic checks per FORMAT_SPEC v2.1:
### HITL Claim Validation (§1.3.2-1.3.3)
| Code | Check | Severity |
|------|-------|----------|
| **W-HC01** | Partial `confirmed-by`/`confirmed-date` fields | WARNING |
| **W-HC02** | Vague source (e.g., "industry reports", "TBD") | WARNING |
| **E-SC06** | Schema error in `hitl-claims` structure | ERROR |
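The two warnings above, together with the field-presence rule for PENDING/VERIFIED, could be sketched roughly as follows. This is an illustrative sketch, not the skill's validator: the dict layout, the `source` key name, the `check_claim` helper, and the contents of `VAGUE_SOURCES` are assumptions (only the two example phrases come from the table).

```python
# Hypothetical helper: the claim layout and the vague-source list are assumptions.
VAGUE_SOURCES = {"tbd", "industry reports"}  # the two examples given for W-HC02


def check_claim(claim):
    """Return (status, warning_codes) for one hitl-claims entry (a dict)."""
    has_by = bool(claim.get("confirmed-by"))
    has_date = bool(claim.get("confirmed-date"))

    warnings = []
    # W-HC01: exactly one of the two confirmation fields is present.
    if has_by != has_date:
        warnings.append("W-HC01")
    # W-HC02: the source is too vague to act on.
    source = str(claim.get("source", "")).strip().lower()
    if source in VAGUE_SOURCES:
        warnings.append("W-HC02")

    # Assumed reading of "determined by field presence":
    # VERIFIED only when both confirmation fields are filled in.
    status = "VERIFIED" if has_by and has_date else "PENDING"
    return status, warnings
```

For instance, `check_claim({"source": "TBD"})` reports a pending claim with a vague source, while an entry carrying both confirmation fields is treated as verified.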
### Body Structure (§1.2.1)
| Code | Check | Severity |
|------|-------|----------|
| **E-ST10** | Missing `## HITL Verification Record` when claims exist | ERROR |
| **W-ST11** | Table rows don't match `hitl-claims` count | WARNING |
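A minimal sketch of the two body-structure checks, assuming the record is a Markdown pipe table with one header row and one separator row directly under the `## HITL Verification Record` heading. The `check_body_structure` name and the row-counting approach are assumptions for illustration; the normative rules live in FORMAT_SPEC §1.2.1.

```python
def check_body_structure(body, claim_count):
    """Return validation codes for the document body (hypothetical helper)."""
    if claim_count == 0:
        return []  # the section is only mandatory when claims exist

    lines = body.splitlines()
    heading = "## HITL Verification Record"
    start = next((i for i, line in enumerate(lines) if line.strip() == heading), None)
    if start is None:
        return ["E-ST10"]

    # Collect pipe-table lines until the next level-2 heading.
    table_lines = []
    for line in lines[start + 1:]:
        if line.startswith("## "):
            break
        if line.strip().startswith("|"):
            table_lines.append(line)

    data_rows = table_lines[2:]  # assumed: skip header row and separator row
    return [] if len(data_rows) == claim_count else ["W-ST11"]
```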
### SOT Table Validation (§3.1)
| Code | Check | Severity |
|------|-------|----------|
| **E-TB01** | No `## Verified Claims` section | ERROR |
| **E-TB02** | Table has no data rows | ERROR |
| **E-TB03** | Required columns missing | ERROR |
| **E-TB04** | Column order wrong | ERROR |
| **E-TB05** | Empty cell in required column | ERROR |
| **E-TB06** | Invalid date format in Verified column | ERROR |
| **E-TB07** | Verified date in future (beyond 24h grace) | ERROR |
**Note:** Additional validation codes may be defined in RFC-001 (clarification document) but are not part of the normative FORMAT_SPEC.
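As an illustration of E-TB06 and E-TB07, the sketch below checks a single Verified cell. The date format is not stated here, so ISO `YYYY-MM-DD` is an assumption, as are the `check_verified_date` name and the choice to interpret the date as midnight UTC.

```python
from datetime import date, datetime, timedelta, timezone


def check_verified_date(cell, now):
    """Return 'E-TB06', 'E-TB07', or None for one Verified cell (hypothetical helper)."""
    try:
        verified = date.fromisoformat(cell.strip())  # assumed format: YYYY-MM-DD
    except ValueError:
        return "E-TB06"

    # Assumption: compare the start of the verified day (UTC) against now + 24h grace.
    verified_start = datetime(verified.year, verified.month, verified.day, tzinfo=timezone.utc)
    if verified_start > now + timedelta(hours=24):
        return "E-TB07"
    return None
```

An empty cell also fails parsing here; a fuller validator would presumably report E-TB05 for that case first.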
---
## Bundled Scripts
This skill includes Python scripts for deterministic computations per FORMAT_SPEC.
### scripts/claim_id.py
Computes stable, hash-based claim IDs for HITL tracking (per §1.3.4).
```bash
# Generate claim ID (single quotes keep the shell from expanding $9 inside "$99")
python scripts/claim_id.py 'Base price is $99/mo' 'api-pricing/1'
# Output: claim-75fb137a
# Run test vectors
python scripts/claim_id.py --test
```
**Algorithm:**
1. Normalize text (strip + collapse whitespace)
2. Concatenate with location using pipe delimiter
3. SHA-256 hash, take first 8 hex chars
4. Prefix with "claim-"
**Test vectors:**
- `claim_id("Base price is $99/mo", "api-pricing/1")` → `claim-75fb137a`
- `claim_id("The API supports GraphQL", "features/1")` → `claim-eb357742`
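The four algorithm steps can be sketched in a few lines. This is a reading of the description above, not the bundled script: the `text|location` ordering and UTF-8 encoding are assumptions, and the sketch has not been checked against the test vectors, so use `scripts/claim_id.py --test` as the reference.

```python
import hashlib


def claim_id(text, location):
    """Sketch of a stable claim ID: normalized text + '|' + location, hashed."""
    # 1. Normalize: strip and collapse runs of whitespace to single spaces.
    normalized = " ".join(text.split())
    # 2. Concatenate with the location using a pipe delimiter (order assumed).
    payload = f"{normalized}|{location}"
    # 3. SHA-256, first 8 hex characters.
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # 4. Prefix with "claim-".
    return "claim-" + digest[:8]
```

Because whitespace is normalized before hashing, reflowing a claim across lines does not change its ID, while moving it to a different location does.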
### scripts/document_hash.py
Computes document SHA-256 hash per FORMAT_SPEC §2.2-2.4 with full canonicalization.
```bash
# Compute hash
python scripts/document_hash.py my-doc.cgd.md
# Output: 7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730
# Verify existing hash
python scripts/document_hash.py --verify my-doc.cgd.md
# Output: PASS: Hash verified: 7d865e...
# Run normalization tests
python scripts/document_hash.py --test
```
**Algorithm (per §2.2-2.4):**
1. Extract content between opening `---\n` and `<!-- CLARITY_GATE_END -->`
2. Remove `document-sha256` line from YAML frontmatter ONLY (with multiline continuation support)
3. Canonicalize:
   - Strip trailing whitespace per line
   - Collapse 3+ consecutive newlines to 2
   - Normalize final newline (exactly 1 LF)
   - UTF-8 NFC normalization
4. Compute SHA-256
**Cross-platform normalization:**
- BOM removed if present
- CRLF to LF (Windows)
- CR to LF (old Mac)
- Boundary detection (prevents hash computation on content outside CGD structure)
- Whitespace variations produce identical hashes (deterministic across platforms)
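A simplified sketch of the canonicalization and hashing steps follows. It is not the bundled `document_hash.py`: it assumes the hashed span runs from the opening `---` up to (not including) the end marker, handles only a single-line `document-sha256` entry (no multiline continuation), and the function names are illustrative.

```python
import hashlib
import re
import unicodedata

END_MARKER = "<!-- CLARITY_GATE_END -->"


def canonicalize(text):
    """Apply the four canonicalization steps to LF-normalized text."""
    text = "\n".join(line.rstrip() for line in text.split("\n"))  # trailing whitespace
    text = re.sub(r"\n{3,}", "\n\n", text)                        # 3+ newlines -> 2
    text = text.rstrip("\n") + "\n"                               # exactly one final LF
    return unicodedata.normalize("NFC", text)


def document_hash(raw):
    """Sketch: SHA-256 over the canonicalized CGD content."""
    # Cross-platform normalization: drop BOM, convert CRLF and CR to LF.
    text = raw.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    if not text.startswith("---\n") or END_MARKER not in text:
        raise ValueError("CGD boundaries not found")
    content = text[: text.index(END_MARKER)]

    # Split at the closing frontmatter fence so the hash line is removed
    # from the YAML frontmatter only, never from the body.
    fm_end = content.find("\n---", 3)
    if fm_end == -1:
        raise ValueError("unterminated frontmatter")
    frontmatter, body = content[:fm_end], content[fm_end:]
    kept = [ln for ln in frontmatter.split("\n") if not ln.startswith("document-sha256:")]

    canonical = canonicalize("\n".join(kept) + body)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Removing the `document-sha256` line before hashing is what lets the hash be stored inside the same file it covers.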
---
## The Key Distinction
Existing tools like UnScientify and HedgeHunter (CoNLL-2010) **detect** uncertainty markers already present in text ("Is uncertainty expressed?").
Clarity Gate **enforces** their presence where epistemically required ("Should uncertainty be expressed but isn't?").
| Tool Type | Question | Example |
|-----------|----------|---------|
| **Detection** | "Does this text contain hedges?" | UnScientify/HedgeHunter find "may", "possibly" |
| **Enforcement** | "Should this claim be hedged but isn't?" | Clarity Gate flags "Revenue will be $50M" |
---
## Critical Limitation
> **Clarity Gate verifies FORM, not TRUTH.**
>
> This skill checks whether claims are properly marked as uncertain—it cannot verify if claims are actually true.
>
> **Risk:** An LLM can hallucinate facts INTO a document, then "pass" Clarity Gate by adding source markers to false claims.
>
> **Solution:** HITL (Human-In-The-Loop) verification is **MANDATORY** before declaring PASS.
---
## When to Use
- Before ingesting documents into RAG systems
- Before sharing documents with other AI systems
- After writing specifications, state docs, or methodology descriptions
- When a document contains projections, estimates, or hypotheses
- Before publishing claims that haven't been validated
- When handing off documentation between LLM sessions
---
## The 9 Verification Points
### Relationship to Spec Suite
The 9 Verification Points guide **semantic review** — content quality checks that require judgment (human or AI). They answer questions like "Should this claim be hedged?" and "Are these numbers consistent?"
When review completes, output a CGD file conforming to CLARITY_GATE_FORMAT_SPEC.md. The C/S rules in CLARITY_GATE_FORMAT_SPEC.md validate **file structure**, not semantic content.
**The connection:**
1. Semantic findings (9 points) determine what issues exist
2. Issues are recorded in CGD state fields (`clarity-status`, `hitl-status`, `hitl-pending-count`)
3. State consistency is enforced by structural rules (C7-C10)
*Example: If Point 5 (Data Consistency) finds conflicting numbers, you'd mark `clarity-status: UNCLEAR` until resolved. Rule C7 then ensures you can't claim `REVIEWED` while still `UNCLEAR`.*
---
### Epistemic Checks (Core Focus: Points 1-4)
**1. HYPOTHESIS vs FACT LABELING**
Every claim must be clearly marked as validated or hypothetical.
| Fails | Passes |
|-------|--------|
| "Our architecture outperforms competitors" | "Our architecture outperforms competitors [benchmark data in Table 3]" |
| "The model achieves 40% improvement" | "The model achieves 40% improvement [measured on dataset X]" |
**Fix:** Add markers: "PROJECTED:", "HYPOTHESIS:", "UNTESTED:", "(estimated)", "~", "?"
---
**2. UNCERTAINTY MARKER ENFORCEMENT**
Forward-looking statements require qualifiers.
| Fails | Passes |
|-------|--------|
| "Revenue will be $50M by Q4" | "Revenue is **projected** to be $50M by Q4" |
| "The feature will reduce churn" | "The feature is **expected** to reduce churn" |
**Fix:** Add "projected", "estimated", "expected", "designed to", "intended to"
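Point 2 is a judgment call, but a crude pre-filter can surface candidates for review. The sketch below is a naive heuristic under stated assumptions, not the skill's method: it only looks for the word "will" and treats the qualifiers listed in the fix above as sufficient hedging. The function name and sentence splitting are illustrative.

```python
import re

# Qualifiers taken from the fix list above.
QUALIFIERS = ("projected", "estimated", "expected", "designed to", "intended to")
FUTURE = re.compile(r"\bwill\b", re.IGNORECASE)


def unhedged_future_sentences(text):
    """Return sentences that use 'will' without any listed qualifier."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        s for s in sentences
        if FUTURE.search(s) and not any(q in s.lower() for q in QUALIFIERS)
    ]
```

Anything it returns is a candidate, not a verdict; a reviewer still decides whether the statement needs hedging.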
---
**3. ASSUMPTION VISIBILITY**
Implicit assumptions that affect interpretation must be explicit.
| Fails | Passes |
|-------|--------|
| "The system scales linearly" | "The system scales linearly [assuming <1000 concurrent users]" |
| "Response time is 50ms" | "Response time is 50ms [under standard load conditions]" |
**Fix:** Add bracketed conditions: "[assuming X]", "[under conditions Y]", "[when Z]"
---
**4. AUTHORITATIVE-LOOKING UNVALIDATED DATA**
Tables with specific percentages and checkmarks look like measured data, even when the values are guesses.
**Red flag:** Tables with specific numbers (89%, 95%, 100%) without sources
**Fix:** Add "(guess)", "(est.)", "?" to numbers. Add explicit warning: "PROJECTED VALUES - NOT MEASURED"
---
### Data Quality Checks (Complementary: Points 5-7)
**5. DATA CONSISTENCY**
Scan for conflicting numbers, dates, or facts within the document.
**Red flag:** "500 users" in one section, "750 users" in another
**Fix:** Reconcile conflicts or explicitly note the discrepancy with explanation.
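A rough way to surface this kind of conflict is to group every "number + noun" pair by noun and report nouns that appear with more than one value. This is only a sketch under simple assumptions (integer counts, a single-word unit immediately after the number); the `conflicting_counts` name is illustrative.

```python
import re
from collections import defaultdict

# Matches e.g. "500 users" or "1,200 requests" (integer followed by one word).
COUNT = re.compile(r"\b(\d[\d,]*)\s+([a-z]+)\b", re.IGNORECASE)


def conflicting_counts(text):
    """Map each noun to its set of values when more than one value appears."""
    seen = defaultdict(set)
    for number, noun in COUNT.findall(text):
        seen[noun.lower()].add(int(number.replace(",", "")))
    return {noun: values for noun, values in seen.items() if len(values) > 1}
```

A hit is only a prompt to look: two different user counts may be legitimate if they refer to different dates or scopes, which is exactly the discrepancy the fix asks you to explain.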
---
**6. IMPLICIT CAUSATION**
Claims that imply causation without evidence.
**Red flag:** "Shorter prompts improve response quality" (plausible but unproven)
**Fix:** Reframe as hypothesis: "Shorter prompts MAY improve response quality (hypothesis, not validated)"
---
**7. FUTURE STATE AS PRESENT**
Describing planned/hoped outcomes as if already achieved.
**Red flag:** "The system processes 10,000 requests per second" (when it hasn't been built)
**Fix:** Use future/conditional: "The system is DESIGNED TO process..." or "TARGET: 10,000 rps"
---
### Verification Routing (Points 8-9)
**8. TEMPORAL COHERENCE**
Document dates and timestamps must be internally consistent and plausible.
| Fails | Passes |
|-------|--------|
| "Last Updated: December 2024" (when the current date is in 2026) | "Last Updated: January 2026" |
| v1.0.0 dated 2024-12-23, v1.1.0 dated 2024-12-20 | Versions in chronological order |
**Sub-checks:**
1. Document date vs current date
2. Internal chronology (versions, events in order)
3. Reference freshness ("current", "now", "today" claims)
**Fix:** Update dates, add "as of [date]" qualifiers, flag stale claims
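Sub-check 2 (internal chronology) is mechanical enough to sketch. The example assumes the changelog has already been parsed into `(version, ISO date)` pairs listed in version order; that input shape and the function name are assumptions.

```python
from datetime import date


def out_of_order_versions(history):
    """Return (earlier_version, later_version) pairs whose dates run backwards.

    history: list of (version, 'YYYY-MM-DD') tuples in version order.
    """
    problems = []
    for (prev_version, prev_date), (next_version, next_date) in zip(history, history[1:]):
        if date.fromisoformat(next_date) < date.fromisoformat(prev_date):
            problems.append((prev_version, next_version))
    return problems
```

Applied to the failing row in the table above, it reports the v1.0.0/v1.1.0 pair because the later version carries the earlier date.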
---
**9. EXTERNALLY VERIFIABLE CLAIMS**
Specific numbers that could be fact-checked should be flagged for verification.
| Type | Example | Risk |
|------|---------|------|
| Pricing | "Costs ~$0.005 per call" | API pricing changes |
| Statistics | "Papers average 15-30 equations" | May be wildly off |
| Rates/ratios | "40% of researchers use X" | Needs citation |
| Competitor claims | "No competitor offers Y" | May be outdated |
**Fix options:**
1. Add source with date
2. Add uncertainty marker
3. Route to HITL or external search
4. Generalize ("low cost" instead of "$0.005")
---
## The Verification Hierarchy
```
Claim Extracted --> Does Source of Truth Exist?
                                 |
                 +---------------+---------------+
                YES                             NO
                 |                               |
         Tier 1: Automated                 Tier 2: HITL
    Consistency & Verification        Two-Round Verification
                 |                               |
           PASS / BLOCK        Round A → Round B → APPROVE / REJECT
```
### Tier 1: Automated Verification
**A. Internal Consistency**
- Figure vs. Text contradictions
… (truncated)
Graded independently by Skillproof — nothing to sell the author. Quality is mechanical + corpus-grounded; safety flags are heuristic (builtin+triage), not a malicious verdict.