autonomous-agents — quality + safety report

Name: autonomous-agents — quality + safety report
Item: autonomous-agents
Rating: 88
Author: Skillproof

In the Skillier index (antigravity__autonomous-agents) · scanned 2026-06-03 · engine: builtin+triage

Quality

88/100

Safety

✓ Clean — no heuristic safety flags surfaced.

Heuristic flags from the builtin scanner, which is known to over-flag (it trips on legitimate env-reading integrations, security skills, and library .eval calls). This is NOT an authoritative malicious verdict — re-scan with SkillSpector for the authoritative result. Run the authoritative scan →

📇 This skill is in the Skillier index (curated · deduped · quality-filtered). Install Skillier to route & load it into your AI client.

Quality notes

Skill is large (~7356 tokens)

medium · quality · body

→ Tighten to the essential procedure; move long reference material to linked files.

No explicit trigger / 'when to use'

low · quality · body

→ Add a 'When to use' section or 'Use this when …' line listing trigger conditions.

No example

low · quality · body

→ Add at least one worked example (input → expected action/output).

About this skill

Autonomous agents are AI systems that can independently decompose

📄 Read the SKILL.md

---
name: autonomous-agents
description: Autonomous agents are AI systems that can independently decompose
  goals, plan actions, execute tools, and self-correct without constant human
  guidance. The challenge isn't making them capable - it's making them reliable.
  Every extra decision multiplies failure probability.
risk: unknown
source: vibeship-spawner-skills (Apache 2.0)
date_added: 2026-02-27
---

# Autonomous Agents

Autonomous agents are AI systems that can independently decompose goals,
plan actions, execute tools, and self-correct without constant human guidance.
The challenge isn't making them capable - it's making them reliable. Every
extra decision multiplies failure probability.

This skill covers agent loops (ReAct, Plan-Execute), goal decomposition,
reflection patterns, and production reliability. Key insight: compounding
error rates kill autonomous agents. A 95% success rate per step drops to
60% by step 10. Build for reliability first, autonomy second.

2025 lesson: The winners are constrained, domain-specific agents with clear
boundaries, not "autonomous everything." Treat AI outputs as proposals,
not truth.

## Principles

- Reliability over autonomy - every step compounds error probability
- Constrain scope - domain-specific beats general-purpose
- Treat outputs as proposals, not truth
- Build guardrails before expanding capabilities
- Human-in-the-loop for critical decisions is non-negotiable
- Log everything - every action must be auditable
- Fail safely with rollback, not silently with corruption

## Capabilities

- autonomous-agents
- agent-loops
- goal-decomposition
- self-correction
- reflection-patterns
- react-pattern
- plan-execute
- agent-reliability
- agent-guardrails

## Scope

- multi-agent-systems → multi-agent-orchestration
- tool-building → agent-tool-builder
- memory-systems → agent-memory-systems
- workflow-orchestration → workflow-automation

## Tooling

### Frameworks

- LangGraph - When: Production agents with state management Note: 1.0 released Oct 2025, checkpointing, human-in-loop
- AutoGPT - When: Research/experimentation, open-ended exploration Note: Needs external guardrails for production
- CrewAI - When: Role-based agent teams Note: Good for specialized agent collaboration
- Claude Agent SDK - When: Anthropic ecosystem agents Note: Computer use, tool execution

### Patterns

- ReAct - When: Reasoning + Acting in alternating steps Note: Foundation for most modern agents
- Plan-Execute - When: Separate planning from execution Note: Better for complex multi-step tasks
- Reflection - When: Self-evaluation and correction Note: Evaluator-optimizer loop

## Patterns

### ReAct Agent Loop

Alternating reasoning and action steps

**When to use**: Interactive problem-solving, tool use, exploration

# REACT PATTERN:

"""
The ReAct loop:
1. Thought: Reason about what to do next
2. Action: Choose and execute a tool
3. Observation: Receive result
4. Repeat until goal achieved

Key: Explicit reasoning traces make debugging possible
"""

## Basic ReAct Implementation
"""
from langchain.agents import create_react_agent
from langchain_openai import ChatOpenAI

# Define the ReAct prompt template
react_prompt = '''
Answer the question using the following format:

Question: the input question
Thought: reason about what to do
Action: tool_name
Action Input: input to the tool
Observation: result of the action
... (repeat Thought/Action/Observation as needed)
Thought: I now know the final answer
Final Answer: the answer
'''

# Create the agent
agent = create_react_agent(
    llm=ChatOpenAI(model="gpt-4o"),
    tools=tools,
    prompt=react_prompt,
)

# Execute with step limit
result = agent.invoke(
    {"input": query},
    config={"max_iterations": 10}  # Prevent runaway loops
)
"""

## LangGraph ReAct (Production)
"""
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.postgres import PostgresSaver

# Production checkpointer
checkpointer = PostgresSaver.from_conn_string(
    os.environ["POSTGRES_URL"]
)

agent = create_react_agent(
    model=llm,
    tools=tools,
    checkpointer=checkpointer,  # Durable state
)

# Invoke with thread for state persistence
config = {"configurable": {"thread_id": "user-123"}}
result = agent.invoke({"messages": [query]}, config)
"""

### Plan-Execute Pattern

Separate planning phase from execution

**When to use**: Complex multi-step tasks, when full plan visibility matters

# PLAN-EXECUTE PATTERN:

"""
Two-phase approach:
1. Planning: Decompose goal into subtasks
2. Execution: Execute subtasks, potentially re-plan

Advantages:
- Full visibility into plan before execution
- Can validate/modify plan with human
- Cleaner separation of concerns

Disadvantages:
- Less adaptive to mid-task discoveries
- Plan may become stale
"""

## LangGraph Plan-Execute
"""
from langgraph.prebuilt import create_plan_and_execute_agent

# Planner creates the task list
planner_prompt = '''
For the given objective, create a step-by-step plan.
Each step should be atomic and actionable.
Format: numbered list of steps.
'''

# Executor handles individual steps
executor_prompt = '''
You are executing step {step_number} of the plan.
Previous results: {previous_results}
Current step: {current_step}
Execute this step using available tools.
'''

agent = create_plan_and_execute_agent(
    planner=planner_llm,
    executor=executor_llm,
    tools=tools,
    replan_on_error=True,  # Re-plan if step fails
)

# Human approval of plan
config = {
    "configurable": {
        "thread_id": "task-456",
    },
    "interrupt_before": ["execute"],  # Pause before execution
}

# First call creates plan
plan = agent.invoke({"objective": goal}, config)

# Review plan, then continue
if human_approves(plan):
    result = agent.invoke(None, config)  # Continue from checkpoint
"""

## Decomposition Strategies
"""
# Decomposition-First: Plan everything, then execute
# Best for: Stable tasks, need full plan approval

# Interleaved: Plan one step, execute, repeat
# Best for: Dynamic tasks, learning as you go

def interleaved_execute(goal, max_steps=10):
    state = {"goal": goal, "completed": [], "remaining": [goal]}

    for step in range(max_steps):
        # Plan next action based on current state
        next_action = planner.plan_next(state)

        if next_action == "DONE":
            break

        # Execute and update state
        result = executor.execute(next_action)
        state["completed"].append((next_action, result))

        # Re-evaluate remaining work
        state["remaining"] = planner.reassess(state)

    return state
"""

### Reflection Pattern

Self-evaluation and iterative improvement

**When to use**: Quality matters, complex outputs, creative tasks

# REFLECTION PATTERN:

"""
Self-correction loop:
1. Generate initial output
2. Evaluate against criteria
3. Critique and identify issues
4. Refine based on critique
5. Repeat until satisfactory

Also called: Evaluator-Optimizer, Self-Critique
"""

## Basic Reflection
"""
def reflect_and_improve(task, max_iterations=3):
    # Initial generation
    output = generator.generate(task)

    for i in range(max_iterations):
        # Evaluate output
        critique = evaluator.critique(
            task=task,
            output=output,
            criteria=[
                "Correctness",
                "Completeness",
                "Clarity",
            ]
        )

        if critique["passes_all"]:
            return output

        # Refine based on critique
        output = generator.refine(
            task=task,
            previous_output=output,
            critique=critique["feedback"],
        )

    return output  # Best effort after max iterations
"""

## LangGraph Reflection
"""
from langgraph.graph import StateGraph

def build_reflection_graph():
    graph = StateGraph(ReflectionState)

    # Nodes
    graph.add_node("generate", generate_node)
    graph.add_node("reflect", reflect_node)
    graph.add_node("output", output_node)

    # Edges
    graph.add_edge("generate", "reflect")
    graph.add_conditional_edges(
        "reflect",
        should_continue,
        {
            "continue": "generate",  # Loop back
            "end": "output",
        }
    )

    return graph.compile()

def should_continue(state):
    if state["iteration"] >= 3:
        return "end"
    if state["score"] >= 0.9:
        return "end"
    return "continue"
"""

## Separate Evaluator (More Robust)
"""
# Use different model for evaluation to avoid self-bias
generator = ChatOpenAI(model="gpt-4o")
evaluator = ChatOpenAI(model="gpt-4o-mini")  # Different perspective

# Or use specialized evaluators
from langchain.evaluation import load_evaluator
evaluator = load_evaluator("criteria", criteria="correctness")
"""

### Guardrailed Autonomy

Constrained agents with safety boundaries

**When to use**: Production systems, critical operations

# GUARDRAILED AUTONOMY:

"""
Production agents need multiple safety layers:
1. Input validation
2. Action constraints
3. Output validation
4. Cost limits
5. Human escalation
6. Rollback capability
"""

## Multi-Layer Guardrails
"""
class GuardedAgent:
    def __init__(self, agent, config):
        self.agent = agent
        self.max_cost = config.get("max_cost_usd", 1.0)
        self.max_steps = config.get("max_steps", 10)
        self.allowed_actions = config.get("allowed_actions", [])
        self.require_approval = config.get("require_approval", [])

    async def execute(self, goal):
        total_cost = 0
        steps = 0

        while steps < self.max_steps:
            # Get next action
            action = await self.agent.plan_next(goal)

            # Validate action is allowed
            if action.name not in self.allowed_actions:
                raise ActionNotAllowedError(action.name)

            # Check if approval needed
            if action.name in self.require_approval:
                approved = await self.request_human_approval(action)
                if not approved:
                    return {"status": "rejected", "action": action}

            # Estimate cost
            estimated_cost = self.estimate_cost(action)
            if total_cost + estimated_cost > self.max_cost:
                raise CostLimitExceededError(total_cost)

            # Execute with rollback capability
            checkpoint = await self.save_checkpoint()
            try:
                result = await self.agent.execute(action)
                total_cost += self.actual_cost(action)
                steps += 1
            except Exception as e:
                await self.rollback_to(checkpoint)
                raise

            if result.is_complete:
                break

        return {"status": "complete", "total_cost": total_cost}
"""

## Least Privilege Principle
"""
# Define minimal permissions per task type
TASK_PERMISSIONS = {
    "research": ["web_search", "read_file"],
    "coding": ["read_file", "write_file", "run_tests"],
    "admin": ["all"],  # Rarely grant this
}

def create_scoped_agent(task_type):
    allowed = TASK_PERMISSIONS.get(task_type, [])
    tools = [t for t in ALL_TOOLS if t.name in allowed]
    return Agent(tools=tools)
"""

## Cost Control
"""
# Context length grows quadratically in cost
# Double context = 4x cost

def trim_context(messages, max_tokens=4000):
    # Keep system message and recent messages
    system = messages[0]
    recent = messages[-10:]

    # Summarize middle if needed
    if len(messages) > 11:
        middle = messages[1:-10]
        summary = summarize(middle)
        return [system, summary] + recent

    return messages
"""

### Durable Execution Pattern

Agents that survive failures and resume

**When to use**: Long-running tasks, production systems, multi-day processes

# DURABLE EXECUTION:

"""
Production agents must:
- Survive server restarts
- Resume from exact point of failure
- Handle hours/days of runtime
- Allow human intervention mid-process

LangGraph 1.0 provides this nativel

… (truncated)

Scan or optimize your own skill →

Want a live grade + an embeddable README badge? Run your skill through the free scanner.

Graded independently by Skillproof — nothing to sell the author. Quality is mechanical + corpus-grounded; safety flags are heuristic (builtin+triage), not a malicious verdict.