tolerance-stack-analysis — quality + safety report

Item: tolerance-stack-analysis
Rating: 98
Author: Skillproof

In the Skillier index (local__tolerance-stack-analysis) · scanned 2026-06-03 · engine: builtin+triage

Quality

98/100

Safety

✓ Clean — no heuristic safety flags surfaced.

Heuristic flags come from the builtin scanner, which is known to over-flag (it trips on legitimate env-reading integrations, security skills, and library .eval calls). This is NOT an authoritative malicious verdict; re-scan with SkillSpector for the authoritative result.

Quality notes

No example

low · quality · body

→ Add at least one worked example (input → expected action/output).

About this skill

📄 Read the SKILL.md

---
name: tolerance-stack-analysis
description: Force a tolerance-stack reframe whenever a user is staring at a system whose pain comes from too many parts multiplying their variances together. Use this skill aggressively for hardware design, mechanical assemblies, microservices-vs-monolith debates, build-system fragmentation, data-pipeline stage proliferation, vendor-chain sprawl, or any "too many moving parts" complaint. Trigger on phrases like "we have fifty different services", "each team optimized their piece", "we picked the best part for each job", "the seams are killing us", or any moment a team is treating individually-correct local choices as if they sum to a correct global system. Also fires when a roadmap adds parts/services/stages instead of combining them, when interface code (adapters, sealants, shims, ETL glue) is growing faster than feature code, or when "best of breed per component" is the implicit strategy. Trigger eagerly even when the user does not name Musk or the framework.
stacks_with:
  - best-part-is-no-part
---

# Tolerance Stack Analysis

> "If you've got a whole bunch of separate parts and each of them has a given tolerance—even if that tolerance is tight, like 0.2 millimeter tolerance—but if you've got fifty parts…you have to multiply the variances together. You'll end up with a huge variance between cars. That's one of the reasons it's better to combine parts rather than have more individual parts."
> — Elon Musk, *The Book of Elon* (Chapter: Simplicity Wins (tolerance subsection))

## What this skill captures

Variance compounds across parts; it does not average out. A system of fifty parts, each held to a tight individual tolerance, still produces a "huge variance" at the assembly level because the errors accumulate across every joint, interface, and handoff. Worse, when each part is independently optimized ("what's the best material for this one part?") you end up with dissimilar metals, mismatched interfaces, and a "Frankenstein situation" held together by rivets, spot welds, resin, and sealant. Musk's fix on Model Y was brutal: stop joining parts and cast the rear body as a single piece ("no gaps, no sealant, no dissimilar metals"), which deleted 30 percent of the body shop and 300 robots.
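
A minimal sketch of the arithmetic behind the quote, assuming a one-dimensional stack of fifty parts at ±0.2 mm each. The two formulas are the conventional worst-case and root-sum-square (RSS) stack-up calculations, which are not taken from the source. In both, per-part tolerances accumulate by summation rather than literal multiplication, but the conclusion is the same: assembly spread grows with part count. The function names and the single-piece comparison are hypothetical.

```python
import math


def worst_case_stack(tolerances_mm):
    """Worst-case stack-up: every part sits at its limit in the same direction."""
    return sum(abs(t) for t in tolerances_mm)


def rss_stack(tolerances_mm):
    """Root-sum-square stack-up: assumes independent, centered part errors."""
    return math.sqrt(sum(t * t for t in tolerances_mm))


fifty_parts = [0.2] * 50  # fifty parts at +/-0.2 mm each (figures from the quote)
one_piece = [0.2]         # hypothetical single combined piece at the same tolerance

print(f"50 parts, worst case: +/-{worst_case_stack(fifty_parts):.2f} mm")  # +/-10.00 mm
print(f"50 parts, RSS:        +/-{rss_stack(fifty_parts):.2f} mm")         # +/-1.41 mm
print(f"1 piece:              +/-{worst_case_stack(one_piece):.2f} mm")    # +/-0.20 mm
```

Even under the optimistic RSS assumption, the fifty-part stack is roughly seven times looser than any single part. Holding each part tighter shrinks the total only proportionally, while combining parts removes terms from the sum.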

The same pattern shows up in software: fifty microservices each at 99.9% availability, all on the request path and failing independently, give you a system at roughly 95% (0.999^50 ≈ 0.951); fifty pipeline stages each with a small data-loss rate compound the same way; fifty vendor integrations each "within SLA" produce an unreliable product. The value you get from this skill: stop debating which individual part to tighten and start asking which parts to *delete by combining*.
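
The software version of the same calculation can be sketched as follows. It assumes every part sits in series on the critical path and that failures are independent; real systems with redundancy or correlated failures will differ. The function names and the 0.1% per-stage loss rate are hypothetical.

```python
def serial_availability(per_part, n_parts):
    """Availability of a chain where every part must be up for a request to succeed."""
    return per_part ** n_parts


def serial_retention(loss_rate, n_stages):
    """Fraction of records that survive every stage of a pipeline."""
    return (1.0 - loss_rate) ** n_stages


HOURS_PER_YEAR = 24 * 365

for n in (1, 10, 50):
    a = serial_availability(0.999, n)
    downtime_h = (1.0 - a) * HOURS_PER_YEAR
    print(f"{n:>2} services at 99.9% -> {a:.2%} available, ~{downtime_h:.0f} h/year down")

# Hypothetical pipeline: fifty stages, each dropping 0.1% of records.
print(f"50 stages at 0.1% loss -> {serial_retention(0.001, 50):.2%} of records retained")
```

With these inputs the fifty-service chain lands near 95.1% availability, which is why the per-part number says little about the system-level number.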

## When to use this skill

- A hardware assembly meets every individual spec but the finished product rattles, leaks, drifts, or varies unit-to-unit.
- A microservices/SOA architecture is being debated, expanded, or defended on "separation of concerns" grounds while integration cost dominates the roadmap.
- A data pipeline keeps growing stages (ingest → clean → enrich → dedupe → enrich-again → score → serve) and the seams are where failures live.
- A team is proudly picking "the best component for each job" and the integration code (adapters, shims, ETL, sealant, glue) is outgrowing the feature code.
- A vendor chain has too many links and each one is "within SLA" but the end-to-end SLA is unacceptable.
- A roadmap solves quality problems by tightening tolerances on individual parts instead of reducing part count.

## The how-to

1. **Count the parts and the interfaces.** Write down N (parts/services/stages) and the number of interfaces between them. Interfaces are where variance multiplies and where sealant lives.
   > "If you've got fifty parts…you have to multiply the variances together."
   > — *The Book of Elon*
   If you cannot enumerate the parts and interfaces in five minutes, the system is already too complex to reason about — that itself is the finding.

2. **Multiply the variances, do not average them.** For each part, write its individual error rate, failure rate, latency, or availability. Then compound them across the chain (multiply availabilities, sum latencies, multiply error-free rates). The system-level number is almost always shocking versus the per-part number.
   > "Even if that tolerance is tight, like 0.2 millimeter tolerance…you'll end up with a huge variance."
   > — *The Book of Elon*
   This is the step that ends the "but each piece is within spec" defense.

3. **Name the Frankenstein — list every sealant, adapter, and shim.** Inventory the glue holding the parts together: rivets, spot welds, resin, sealant, adapters, ETL jobs, anti-corruption layers, schema translators, retry wrappers. Glue-to-part ratio is your tell.
   > "You need better sealant to prevent galvanic corrosion. You've got to join some with rivets, some with spot welds, some with resin, or resin and spot welds. Then it looks like a Frankenstein situation all together."
   > — *The Book of Elon*
   If the glue is growing faster than the parts, you are not maintaining a system, you are maintaining its seams.

4. **Diagnose the "right answers to the wrong questions" failure.** Ask whether each part was independently optimized in isolation. If fifty engineers each picked the "best material" / "best framework" / "best database" for their slice, you have fifty locally-correct decisions that are collectively wrong.
   > "Fifty different times for fifty different parts, an engineer would ask, 'What's the best material to make this part out of?' Of course, they would get fifty different answers. They were all true individually, but not true collectively."
   > — *The Book of Elon*
   Local optimization is how variance gets injected; you have to re-pose the question at the system level.

5. **Combine — design the single cast piece.** Identify which group of parts can be collapsed into one. The goal is not a tighter tolerance, it is a *deleted interface*. One piece eliminates the variance term entirely.
   > "It's way better to have a single piece, casted. Then you have no gaps, no sealant, no dissimilar metals."
   > — *The Book of Elon*
   The Model Y rear casting deleted 30% of the body shop and 300 robots — the win is not incremental, it is structural.

6. **Count what you deleted, not what you kept.** Track the part/service/stage count and the interface count as the headline metric of the refactor. More robots, more services, more stages are not figures of merit.
   > "There's roughly a thousand robots on the Model 3 body line. Which, by the way, is not a figure of merit…You want fewer things, not more."
   > — *The Book of Elon*
   If the redesign did not reduce N, you simplified the wrong layer.
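
The steps above can be sketched as a small inventory script. This is one possible worked example, not the author's method: the `System` model, the stage names (borrowed from the pipeline example earlier in this skill), the glue items, the 0.999 availabilities, and the assumption that the combined `prepare` stage is as reliable as any single stage are all hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class System:
    parts: dict[str, float]          # part name -> availability in [0, 1] (hypothetical)
    interfaces: set[frozenset[str]]  # undirected links between two parts
    glue: dict[str, frozenset[str]]  # glue item -> the interface it patches


def link(a, b):
    return frozenset((a, b))


def summarize(s: System) -> dict:
    """Steps 1-3: count parts, interfaces, and glue; compound the per-part rates."""
    return {
        "parts": len(s.parts),
        "interfaces": len(s.interfaces),
        "glue": len(s.glue),
        # Assumes a serial chain with independent failures.
        "availability": prod(s.parts.values()),
    }


def combine(s: System, group: set[str], name: str, availability: float) -> System:
    """Step 5: collapse `group` into one part; interfaces inside the group vanish."""
    def rewire(edge):
        return frozenset(name if p in group else p for p in edge)

    parts = {p: a for p, a in s.parts.items() if p not in group}
    parts[name] = availability
    # An edge whose two ends both fall inside the group shrinks to one node and is dropped.
    interfaces = {rewire(e) for e in s.interfaces if len(rewire(e)) == 2}
    glue = {g: rewire(e) for g, e in s.glue.items() if len(rewire(e)) == 2}
    return System(parts, interfaces, glue)


stages = ["ingest", "clean", "enrich", "dedupe", "enrich_again", "score", "serve"]
before = System(
    parts={stage: 0.999 for stage in stages},
    interfaces={link(a, b) for a, b in zip(stages, stages[1:])},
    glue={
        "schema_translator": link("clean", "enrich"),
        "retry_wrapper": link("enrich", "dedupe"),
        "etl_job": link("dedupe", "enrich_again"),
    },
)
after = combine(before, {"clean", "enrich", "dedupe", "enrich_again"}, "prepare", 0.999)

# Step 6: report what was deleted, not what was kept.
b, a = summarize(before), summarize(after)
for key in ("parts", "interfaces", "glue"):
    print(f"{key}: {b[key]} -> {a[key]} (deleted {b[key] - a[key]})")
print(f"availability: {b['availability']:.4f} -> {a['availability']:.4f}")
# parts: 7 -> 4 (deleted 3)
# interfaces: 6 -> 3 (deleted 3)
# glue: 3 -> 0 (deleted 3)
# availability: 0.9930 -> 0.9960
```

The glue count falls to zero because every glue item patched an interface that no longer exists, which is the "deleted interface" outcome step 5 asks for.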

## Common failure modes

- **Tightening tolerances instead of reducing parts.** Spending another quarter holding each component to a stricter spec when the real fix is to delete the joint between them.
- **"Best of breed per component" architecture.** Picking the optimal database, queue, framework, and vendor for each slice and discovering the integration is a Frankenstein. Locally true, collectively false.
- **Glue growth disguised as progress.** Adding adapters, anti-corruption layers, ETL jobs, schema translators, retry middleware — these are sealant, and sealant is evidence the design is wrong.
  > "Then it looks like a Frankenstein situation all together." — *The Book of Elon*
- **Counting parts as a virtue.** "We now have 200 microservices" / "1,000 robots on the line" treated as scale achievements rather than warning signs.
- **Mistaking per-component SLAs for system SLAs.** Defending the system because each piece "meets spec," ignoring that the variances multiply.
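
To put a number on the first and last failure modes in this list, the sketch below inverts the compounding calculation: given a system-level availability target, how tight must each part be? It assumes a serial chain with independent failures, and `required_per_part` is a hypothetical helper name.

```python
def required_per_part(system_target, n_parts):
    """Per-part availability needed so that n serial parts still meet the system target."""
    return system_target ** (1.0 / n_parts)


target = 0.999  # hypothetical end-to-end goal
for n in (50, 10, 1):
    per_part = required_per_part(target, n)
    allowed_failure = 1.0 - per_part
    print(f"{n:>2} parts: each may fail at most {allowed_failure:.6%} of the time")
```

At fifty parts each component gets about one fiftieth of the failure budget it would have alone, so tightening every part to that level is usually far more expensive than removing parts from the chain.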

## When NOT to use this skill

- Genuinely independent systems with no shared output: parts that never feed the same artifact do not compound variance into it.
- Early-stage prototypes where modularity is buying you cycle-time and the integration cost is still small; combine *after* you know which parts are load-bearing, not before.
- Regulated or safety-critical boundaries where the interface exists deliberately (e.g., a hardware isolator, a security trust boundary, an audited API surface). Deleting those interfaces creates risk, not simplicity.
- Organizational seams that exist because two teams must ship independently — the fix there is org design (Conway's Law), not a casting.

## Source

The Book of Elon by Eric Jorgenson (2026, Scribe Media). Chapter: "Simplicity Wins (tolerance subsection)" (in "Designing the Organization").

Graded independently by Skillproof — nothing to sell the author. Quality is mechanical + corpus-grounded; safety flags are heuristic (builtin+triage), not a malicious verdict.