bleu — quality + safety report

Name: bleu — quality + safety report
Item: bleu
Rating: 92
Author: Skillproof

In the Skillier index (davila7__bleu) · scanned 2026-06-03 · engine: builtin+triage

Quality

92/100

Safety

3 heuristic flags to review

Heuristic flags from the builtin scanner, which is known to over-flag (it trips on legitimate env-reading integrations, security skills, and library .eval calls). This is NOT an authoritative malicious verdict — re-scan with SkillSpector for the authoritative result. Run the authoritative scan →

📇 This skill is in the Skillier index (curated · deduped · quality-filtered). Install Skillier to route & load it into your AI client.

Quality notes

Skill is large (~8602 tokens)

medium · quality · body

→ Tighten to the essential procedure; move long reference material to linked files.

About this skill

Use this skill whenever a developer wants to turn an idea into a complete, production-ready, end-to-end system plan BEFORE writing any code. Trigger on 'plan this system', 'design the architecture for', 'help me blueprint', 'deep plan for X', 'break this idea into components', 'expand into action…

📄 Read the SKILL.md

---
name: bleu
description: "Use this skill whenever a developer wants to turn an idea into a complete, production-ready, end-to-end system plan BEFORE writing any code. Trigger on 'plan this system', 'design the architecture for', 'help me blueprint', 'deep plan for X', 'break this idea into components', 'expand into action points', 'full implementation plan', or when the user pastes a project idea wanting architecture, components, pipelines, and file-level execution mapped out. Casual phrasing also triggers: 'help me think this through end-to-end', 'plan before coding'. Also covers living-workspace patterns: self-improving knowledge bases, reflection loops with auditor agents, four-agent teams, schema-as-code, wiki health scoring. **Resume triggers**: 'where did we leave off', 'continue this plan', 'resume my blueprint' - rehydrates state from disk via SESSION.md/NEXT.md/decisions/. Web research is mandatory every invocation."
license: MIT
metadata:
author: Nirvaan Lagishetty
version: "1.0.0"
source: https://github.com/Nirvaan05/Bleu-plugin
---

# Bleu

Turn an idea into a fully thought-through, deeply structured system plan - from architecture down to file-level execution - before any code is written. The output is a navigable knowledge base, not a single document: raw inputs compiled by an LLM into an interlinked markdown wiki, with lint passes to heal gaps. No RAG, no vector store, no embeddings - the whole plan fits in a modern context window and every claim is traceable to a file a human can open, edit, or delete.

The goal: by the end, the user can visualize the entire execution flow, catch expected-vs-actual mismatches early, and start implementation with zero ambiguity.

## Why this skill exists

Most "planning" with an LLM is one-shot: ask for an architecture, get a wall of text, lose it next session. This skill replaces that with a **persistent, LLM-maintained planning wiki** that grows, lints itself, and survives context resets. It's deliberately heavy on structure because the failure mode of light planning is discovering the architectural hole in week three.

The strongest single argument for the skill, worth memorizing:

> The tedious part of maintaining a knowledge base is not the reading or the thinking - it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass.

That's the bet. Every other design choice in this skill serves it. Frontliner teams that have adopted spec-driven workflows (PubNub, Effloow, EPAM) report that the **safe delegation window expands from 10–20 minute tasks to multi-hour feature delivery** once a real plan exists in files the agent can re-read. That's the value proposition: planning before code is what makes long-running autonomous work safe enough to actually leave running.

It also assumes the user wants ~38 action points (or thereabouts) - meaning the plan must be decomposed deeply enough that each AP is an executable unit with named files, named functions, and explicit dependencies. Anything vaguer than that and the skill isn't done yet. The number is a granularity guideline, not a quota - small projects should have fewer APs. The Phase 0 intake sizes the workflow to the project. Don't sledgehammer a nut.

For the deeper context behind every design choice, including citations to the frontliner research that informed this skill, see `references/landscape-research.md`.

## Operating principles

Hold these the entire time. They override any instinct to move faster.

- **Plan, don't code.** No implementation until the blueprint is signed off. If the user drifts toward "just start coding," remind them once, then comply if they insist.
- **Be proactively suggestive, not reactive.** Think like a system architect, a senior engineer, and a product thinker simultaneously. Challenge the user's assumptions where they're weak. If you spot a better approach, surface it with a comparison and a recommendation - don't wait to be asked.
- **Continuous web research is mandatory.** Not a one-time pass. Every phase researches what's relevant to that phase. Every claim that came from research gets a citation. See `references/research-and-citations.md`.
- **Files outlast context.** Everything goes into the planning workspace as markdown. The conversation is ephemeral; the workspace is the deliverable.
- **Treat the chat as stateless and the workspace as stateful.** Chats die - context windows fill, the user runs `/clear`, terminals crash. Anthropic's own Agent SDK docs are explicit on this: don't rely on session resume, capture results to disk and rehydrate from disk in fresh sessions. Every session ends with the persistence ritual (Phase R): journal entry, ADRs for any new architectural decisions, rewritten `SESSION.md` and `NEXT.md`. Every session starts by reading those same files first. See `references/session-persistence.md`.
- **Lint relentlessly.** Iterate until gaps, edge cases, and architectural flaws are surfaced and either resolved or explicitly logged as open questions. "Done" means the user agrees it's near-perfect, not that you ran out of ideas.
- **Adversarial evaluation, not self-evaluation.** Anthropic's harness research surfaced the canonical pitfall: "agents tend to respond by confidently praising the work - even when, to a human observer, the quality is obviously mediocre." Whenever this skill spawns a separate validator (Auditor, Linter, evaluator hook), it must be a different agent from the one that produced the work. Same agent both proposing and approving = self-praise. Production teams from Anthropic to PubNub enforce this strictly.
- **Write for the gap, not the overview.** ETH Zurich's AGENTbench paper (Feb 2026) found that LLM-generated CLAUDE.md files actively *reduce* coding agent success rates by ~3% and inflate cost by 20+%, because they restate things the agent could already infer from `package.json` and the README. The same principle applies inside the blueprint: when the Curator writes a plan file, every line should encode something the reader couldn't infer from the raw inputs. Every restated fact is taking attention away from a missing one.
- **Audit your harness as models improve.** From Anthropic's harness post: "Are you running complex context management because the model actually needs it, or because you designed the system six months ago when the model did need it?" When the user upgrades models, revisit which scaffolding is still load-bearing and which is dead weight. Sonnet 4.5 needed context resets; Opus 4.6 dropped them. Don't carry yesterday's workarounds into tomorrow's runs.
- **Contamination control.** Keep human-curated artifacts (`README.md`, ADRs, the actual codebase) separate from `blueprint/`. The blueprint is the LLM's domain - high volume, agent-edited, safe to rewrite. Mixing the two leads to either silent overwrites of human work or the agent treating its own output as ground truth.
- **Start simpler than you think you need to.** Across every frontline source, the loudest message is the same. Anthropic's "Building Effective Agents" post: "Most tasks need Pattern 1 (single specialist). Add complexity only when it demonstrably improves results." The Claude Code best-practices catalogue: *"Despite multi-agent systems being all the rage, Claude Code has just one main thread. I highly doubt your app needs a multi-agent system."* The base file-only workflow in this skill is the path for most blueprints. `references/claude-code-integration.md` and `references/advanced-architecture.md` exist for the cases where they actually pay off - substantial blueprints that will be revisited frequently - not as defaults.
- **Match granularity to scope.** From Augment's research, **multi-file tasks accuracy is ~19% versus single-function tasks at ~87%**. Smaller scope dramatically improves agent success rate. Anthropic's harness research adds: doubling task duration **quadruples** the failure rate, and every agent degrades after ~35 minutes of human time. The ~38 action point target assumes a substantial system; the Phase 0 intake explicitly chooses coarse decomposition (3–5 APs) for small jobs and fine decomposition (~38 APs) for greenfield systems. Don't sledgehammer a nut, and don't tweezer a tree.
- **Ground truth beats LLM opinion.** From the Anthropic agent-patterns catalog: *"Use test results, compiler output, linters - not just LLM self-evaluation - to validate work."* Whenever the Linter or Auditor runs, it should consult `.claude/rules/blueprint-schema.md` and the actual filesystem state, not vibe-check the proposals. Same applies to research: cite the source, don't paraphrase from memory.
- **The Curator owns the wiki, not you.** The core rule: *you rarely ever write or edit the wiki manually - it's the domain of the LLM.* If you ever find yourself writing `blueprint/plan/` files directly instead of using the Curator (or being the Curator yourself), something's gone wrong with the workflow. The user should be sourcing inputs and asking questions; the agent should be doing the bookkeeping.

## The workspace

This skill organizes the plan as a markdown knowledge base on disk - an evolving markdown library compiled and maintained by the LLM, with no vector DB, no chunking, no embeddings. Read `references/knowledge-base-pattern.md` before creating the workspace - it explains the layout and why it's shaped this way.

Default layout:

```
blueprint/
├── README.md ← entry point + how to navigate
├── SESSION.md ← current snapshot - read this FIRST on resume
├── NEXT.md ← imperative next actions - read SECOND on resume
├── journal.md ← append-only session history
├── index.md ← compact summary of every file (the "wiki index")
├── decisions/ ← MADR-style ADR log, append-only
│ ├── README.md ← ADR index with status table
│ ├── ADR-001-<slug>.md
│ ├── ADR-002-<slug>.md
│ └── ...
├── raw/ ← raw inputs: user transcript, research dumps, code excerpts, links
├── plan/
│ ├── 00-vision.md ← problem, goals, non-goals, success criteria
│ ├── 01-architecture.md ← system diagram, layers, data flow, key decisions
│ ├── 02-pipelines.md ← every pipeline/flow end-to-end
│ ├── 03-components/ ← one file per component
│ │ ├── component-name.md ← logic, responsibilities, dependencies, interfaces
│ │ └── ...
│ ├── 04-data-model.md ← entities, schemas, storage, migrations
│ ├── 05-integrations.md ← external services, APIs, auth, rate limits
│ ├── 06-non-functional.md ← perf, security, observability, cost, scaling
│ └── 07-risks-open-questions.md
├── action-points/ ← ~38 APs, one file each (AP-01.md … AP-38.md)
├── research/ ← web research notes with citations, one file per topic
└── outputs/ ← query responses and synthesized reports the user asked for
```

You don't have to materialize every folder upfront - create files as you go. But the structure should converge on this shape.

**Session persistence is non-negotiable.** `SESSION.md`, `NEXT.md`, `journal.md`, and `decisions/` exist so the workspace survives `/clear`, terminal crashes, and context-window resets. Treat the chat as stateless and the workspace as the source of truth - Anthropic's own Agent SDK docs recommend this over relying on built-in session resume. Every session ends with the persistence ritual (see Phase R below). Read `references/session-persistence.md` for the full pattern, ADR template, and resume protocol.

`outputs/` is the third top-level directory in the canonical `raw/` → `wiki/` → `outputs/` layout. The Cura

… (truncated)

Scan or optimize your own skill →

Want a live grade + an embeddable README badge? Run your skill through the free scanner.

Graded independently by Skillproof — nothing to sell the author. Quality is mechanical + corpus-grounded; safety flags are heuristic (builtin+triage), not a malicious verdict.