autonomous-agents — quality + safety report
In the Skillier index (antigravity__autonomous-agents) · scanned 2026-06-03 · engine: builtin+triage
✓ Clean — no heuristic safety flags surfaced.
Heuristic flags from the builtin scanner, which is known to over-flag (it trips on legitimate env-reading integrations, security skills, and library .eval calls). This is NOT an authoritative malicious verdict; re-scan with SkillSpector for the authoritative result.
📄 Read the SKILL.md
---
name: autonomous-agents
description: Autonomous agents are AI systems that can independently decompose
goals, plan actions, execute tools, and self-correct without constant human
guidance. The challenge isn't making them capable - it's making them reliable.
Every extra decision multiplies failure probability.
risk: unknown
source: vibeship-spawner-skills (Apache 2.0)
date_added: 2026-02-27
---
# Autonomous Agents
Autonomous agents are AI systems that can independently decompose goals,
plan actions, execute tools, and self-correct without constant human guidance.
The challenge isn't making them capable - it's making them reliable. Every
extra decision multiplies failure probability.
This skill covers agent loops (ReAct, Plan-Execute), goal decomposition,
reflection patterns, and production reliability. Key insight: compounding
error rates kill autonomous agents. A 95% success rate per step drops to
60% by step 10. Build for reliability first, autonomy second.
2025 lesson: The winners are constrained, domain-specific agents with clear
boundaries, not "autonomous everything." Treat AI outputs as proposals,
not truth.
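The compounding figure above is plain arithmetic once each step is assumed to succeed independently with the same probability. That independence is an assumption made for illustration (real agent steps are often correlated), and the function names below are illustrative. A minimal sketch:

```python
import math


def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


def max_steps_for(per_step: float, target: float) -> int:
    """Largest step count whose end-to-end success stays at or above target."""
    if not 0 < per_step < 1 or not 0 < target <= 1:
        raise ValueError("per_step must be in (0, 1) and target in (0, 1]")
    return math.floor(math.log(target) / math.log(per_step))


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        rate = chain_success(0.95, n)
        print(f"{n:>2} steps at 95% per step -> {rate:.1%} end to end")
```

Read the other way round, `max_steps_for(0.95, 0.8)` suggests that an agent needing 80% end-to-end reliability at 95% per step has a budget of only a handful of steps, which is one argument for constraining scope.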
## Principles
- Reliability over autonomy - every step compounds error probability
- Constrain scope - domain-specific beats general-purpose
- Treat outputs as proposals, not truth
- Build guardrails before expanding capabilities
- Human-in-the-loop for critical decisions is non-negotiable
- Log everything - every action must be auditable
- Fail safely with rollback, not silently with corruption
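As one possible reading of the "log everything" principle, the sketch below wraps a tool so that each call, result, and failure is appended to an audit trail. `AUDIT_LOG`, `audited`, and `divide` are hypothetical names, and an in-memory list stands in for whatever durable store a real deployment would use.

```python
import functools
import json
import time

# Hypothetical in-memory sink; production code would write to durable storage.
AUDIT_LOG: list = []


def audited(tool):
    """Wrap a tool so every call is recorded, whether it succeeds or fails."""

    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        entry = {
            "tool": tool.__name__,
            "args": repr(args),
            "kwargs": repr(kwargs),
            "ts": time.time(),
        }
        try:
            result = tool(*args, **kwargs)
            entry["status"] = "ok"
            entry["result"] = repr(result)
            return result
        except Exception as exc:
            entry["status"] = "error"
            entry["error"] = repr(exc)
            raise  # Fail loudly; the log entry is still written below
        finally:
            AUDIT_LOG.append(json.dumps(entry))

    return wrapper


@audited
def divide(a, b):
    """Toy tool used only to demonstrate the wrapper."""
    return a / b
```

Writing the entry in `finally` means a failed call is logged before the exception propagates, so the trail never silently skips an action.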
## Capabilities
- autonomous-agents
- agent-loops
- goal-decomposition
- self-correction
- reflection-patterns
- react-pattern
- plan-execute
- agent-reliability
- agent-guardrails
## Scope
- multi-agent-systems → multi-agent-orchestration
- tool-building → agent-tool-builder
- memory-systems → agent-memory-systems
- workflow-orchestration → workflow-automation
## Tooling
### Frameworks
- LangGraph - When: Production agents with state management. Note: 1.0 released Oct 2025; checkpointing, human-in-the-loop.
- AutoGPT - When: Research/experimentation, open-ended exploration. Note: Needs external guardrails for production.
- CrewAI - When: Role-based agent teams. Note: Good for specialized agent collaboration.
- Claude Agent SDK - When: Anthropic ecosystem agents. Note: Computer use, tool execution.
### Patterns
- ReAct - When: Reasoning + Acting in alternating steps. Note: Foundation for most modern agents.
- Plan-Execute - When: Separate planning from execution. Note: Better for complex multi-step tasks.
- Reflection - When: Self-evaluation and correction. Note: Evaluator-optimizer loop.
## Patterns
### ReAct Agent Loop
Alternating reasoning and action steps
**When to use**: Interactive problem-solving, tool use, exploration
# REACT PATTERN:
"""
The ReAct loop:
1. Thought: Reason about what to do next
2. Action: Choose and execute a tool
3. Observation: Receive result
4. Repeat until goal achieved
Key: Explicit reasoning traces make debugging possible
"""
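The four steps above can be sketched without any framework. This is a minimal illustration, not a library API: `scripted_model` is a hypothetical stand-in for an LLM call so the loop runs offline, and `calculator` is a toy tool.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def calculator(expression: str) -> str:
    """Toy tool: evaluates 'left op right', for example '6 * 7'."""
    left, op, right = expression.split()
    return str(OPS[op](float(left), float(right)))


TOOLS = {"calculator": calculator}
ACTION_RE = re.compile(r"Action: (\w+)\nAction Input: (.+)")
FINAL_RE = re.compile(r"Final Answer: (.+)")


def scripted_model(transcript: str) -> str:
    """Hypothetical stand-in for an LLM: picks a reply from the transcript."""
    if "Observation:" not in transcript:
        return ("Thought: I need to multiply.\n"
                "Action: calculator\nAction Input: 6 * 7")
    observation = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: I now know the final answer\nFinal Answer: {observation}"


def react(question: str, model=scripted_model, max_steps: int = 5):
    """Run Thought -> Action -> Observation until a final answer appears."""
    transcript = f"Question: {question}\n"
    trace = []  # Explicit reasoning trace, kept for debugging
    for _ in range(max_steps):
        reply = model(transcript)
        trace.append(reply)
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip(), trace
        action = ACTION_RE.search(reply)
        if action is None:
            observation = "Could not parse an action."
        else:
            name, tool_input = action.group(1), action.group(2).strip()
            tool = TOOLS.get(name)
            observation = tool(tool_input) if tool else f"Unknown tool: {name}"
        transcript += f"{reply}\nObservation: {observation}\n"
    raise RuntimeError(f"No final answer within {max_steps} steps")
```

Calling `react("What is 6 * 7?")` returns the answer together with the trace, and the `max_steps` cap is what keeps a confused model from looping forever.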
## Basic ReAct Implementation
"""
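from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
# Define the ReAct prompt template.
# create_react_agent requires the {tools}, {tool_names} and
# {agent_scratchpad} variables to be present in the prompt.
react_prompt = PromptTemplate.from_template('''
Answer the question using the following tools:
{tools}
Use the following format:
Question: the input question
Thought: reason about what to do
Action: one of [{tool_names}]
Action Input: input to the tool
Observation: result of the action
... (repeat Thought/Action/Observation as needed)
Thought: I now know the final answer
Final Answer: the answer
Question: {input}
Thought:{agent_scratchpad}
''')
# Create the agent (`tools` is your list of LangChain tools)
agent = create_react_agent(
    llm=ChatOpenAI(model="gpt-4o"),
    tools=tools,
    prompt=react_prompt,
)
# The step limit is an AgentExecutor setting, not an invoke config key
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10,  # Prevent runaway loops
)
result = executor.invoke({"input": query})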
"""
## LangGraph ReAct (Production)
"""
import os
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.postgres import PostgresSaver
# Production checkpointer. from_conn_string is a context manager,
# so the agent calls stay inside the `with` block.
with PostgresSaver.from_conn_string(os.environ["POSTGRES_URL"]) as checkpointer:
    checkpointer.setup()  # Creates the checkpoint tables on first run
    agent = create_react_agent(
        model=llm,
        tools=tools,
        checkpointer=checkpointer,  # Durable state
    )
    # Invoke with thread for state persistence
    config = {"configurable": {"thread_id": "user-123"}}
    result = agent.invoke({"messages": [("user", query)]}, config)
"""
### Plan-Execute Pattern
Separate planning phase from execution
**When to use**: Complex multi-step tasks, when full plan visibility matters
# PLAN-EXECUTE PATTERN:
"""
Two-phase approach:
1. Planning: Decompose goal into subtasks
2. Execution: Execute subtasks, potentially re-plan
Advantages:
- Full visibility into plan before execution
- Can validate/modify plan with human
- Cleaner separation of concerns
Disadvantages:
- Less adaptive to mid-task discoveries
- Plan may become stale
"""
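The two phases and the human check on the plan can be sketched in plain Python. All names here (`plan_and_execute`, `approve`, `replan`, `max_replans`) are hypothetical; the planner and executor are passed in as callables so the control flow is visible without committing to a framework.

```python
from typing import Callable, List, Optional, Tuple

Results = List[Tuple[str, str]]


def plan_and_execute(
    goal: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str], str],
    approve: Callable[[List[str]], bool],
    replan: Optional[Callable[[str, Results, str], List[str]]] = None,
    max_replans: int = 1,
) -> dict:
    """Plan once, get the plan approved, then run steps in order."""
    steps = plan(goal)
    if not approve(steps):  # Full plan is visible before anything runs
        return {"status": "rejected", "plan": steps, "results": []}

    results: Results = []
    queue = list(steps)
    replans = 0
    while queue:
        step = queue.pop(0)
        try:
            results.append((step, execute(step)))
        except Exception as exc:
            if replan is None or replans >= max_replans:
                return {"status": "failed", "failed_step": step,
                        "error": str(exc), "results": results}
            replans += 1
            # The old plan is stale: ask for a new tail from this point
            queue = replan(goal, results, step)
    return {"status": "complete", "plan": steps, "results": results}
```

Bounding `max_replans` keeps the "plan may become stale" case from turning into an unbounded retry loop. Whether a replanned tail should go back through `approve` is a design choice this sketch leaves open.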
## LangGraph Plan-Execute
"""
# NOTE: langgraph.prebuilt does not appear to ship a
# create_plan_and_execute_agent factory. Treat it here as a hypothetical
# helper you build yourself (for example a StateGraph with "plan" and
# "execute" nodes, compiled with a checkpointer).
from my_agents import create_plan_and_execute_agent  # hypothetical module
# Planner creates the task list
planner_prompt = '''
For the given objective, create a step-by-step plan.
Each step should be atomic and actionable.
Format: numbered list of steps.
'''
# Executor handles individual steps
executor_prompt = '''
You are executing step {step_number} of the plan.
Previous results: {previous_results}
Current step: {current_step}
Execute this step using available tools.
'''
agent = create_plan_and_execute_agent(
    planner=planner_llm,
    executor=executor_llm,
    tools=tools,
    replan_on_error=True,  # Re-plan if step fails
)
# Human approval of plan
config = {
    "configurable": {
        "thread_id": "task-456",
    },
}
# First call creates the plan and pauses before the "execute" node.
# In LangGraph, interrupt_before is passed to invoke() (or compile()),
# not placed inside the config dict.
plan = agent.invoke({"objective": goal}, config, interrupt_before=["execute"])
# Review plan, then continue
if human_approves(plan):
    result = agent.invoke(None, config)  # Continue from checkpoint
"""
## Decomposition Strategies
"""
# Decomposition-First: Plan everything, then execute
# Best for: Stable tasks, need full plan approval
# Interleaved: Plan one step, execute, repeat
# Best for: Dynamic tasks, learning as you go
def interleaved_execute(goal, max_steps=10):
    state = {"goal": goal, "completed": [], "remaining": [goal]}
    for step in range(max_steps):
        # Plan next action based on current state
        next_action = planner.plan_next(state)
        if next_action == "DONE":
            break
        # Execute and update state
        result = executor.execute(next_action)
        state["completed"].append((next_action, result))
        # Re-evaluate remaining work
        state["remaining"] = planner.reassess(state)
    return state
"""
### Reflection Pattern
Self-evaluation and iterative improvement
**When to use**: Quality matters, complex outputs, creative tasks
# REFLECTION PATTERN:
"""
Self-correction loop:
1. Generate initial output
2. Evaluate against criteria
3. Critique and identify issues
4. Refine based on critique
5. Repeat until satisfactory
Also called: Evaluator-Optimizer, Self-Critique
"""
## Basic Reflection
"""
def reflect_and_improve(task, max_iterations=3):
    # Initial generation
    output = generator.generate(task)
    for i in range(max_iterations):
        # Evaluate output
        critique = evaluator.critique(
            task=task,
            output=output,
            criteria=[
                "Correctness",
                "Completeness",
                "Clarity",
            ]
        )
        if critique["passes_all"]:
            return output
        # Refine based on critique
        output = generator.refine(
            task=task,
            previous_output=output,
            critique=critique["feedback"],
        )
    return output  # Best effort after max iterations
"""
## LangGraph Reflection
"""
from langgraph.graph import StateGraph, START, END
# ReflectionState (with "iteration" and "score" keys) and the
# generate_node / reflect_node / output_node functions are defined elsewhere.
def build_reflection_graph():
    graph = StateGraph(ReflectionState)
    # Nodes
    graph.add_node("generate", generate_node)
    graph.add_node("reflect", reflect_node)
    graph.add_node("output", output_node)
    # Edges
    graph.add_edge(START, "generate")  # Entry point, required to compile
    graph.add_edge("generate", "reflect")
    graph.add_conditional_edges(
        "reflect",
        should_continue,
        {
            "continue": "generate",  # Loop back
            "end": "output",
        }
    )
    graph.add_edge("output", END)
    return graph.compile()
def should_continue(state):
    if state["iteration"] >= 3:
        return "end"
    if state["score"] >= 0.9:
        return "end"
    return "continue"
"""
## Separate Evaluator (More Robust)
"""
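from langchain_openai import ChatOpenAI
# Use a different model for evaluation to reduce self-bias
generator = ChatOpenAI(model="gpt-4o")
evaluator = ChatOpenAI(model="gpt-4o-mini")  # Different perspective
# Or use specialized evaluators
from langchain.evaluation import load_evaluator
# "correctness" is judged against a reference answer, so it needs the
# labeled variant; the reference-free "criteria" evaluator rejects it.
evaluator = load_evaluator("labeled_criteria", criteria="correctness")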
"""
### Guardrailed Autonomy
Constrained agents with safety boundaries
**When to use**: Production systems, critical operations
# GUARDRAILED AUTONOMY:
"""
Production agents need multiple safety layers:
1. Input validation
2. Action constraints
3. Output validation
4. Cost limits
5. Human escalation
6. Rollback capability
"""
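Layers 1 and 3 from the list above (input and output validation) can be sketched as thin checks around the agent call. The limit, the credential pattern, and every name below are illustrative assumptions rather than a complete policy.

```python
import re
from dataclasses import dataclass

MAX_GOAL_CHARS = 2000  # Assumed limit, tune per deployment
# Assumed pattern: flags strings that look like 'password=...' or 'api_key: ...'
SECRET_PATTERN = re.compile(
    r"(?i)\b(api[_-]?key|password|secret)\s*[:=]\s*\S+"
)


@dataclass
class ValidationResult:
    ok: bool
    reason: str = ""


def validate_input(goal: str) -> ValidationResult:
    """Layer 1: reject goals that are empty or implausibly large."""
    if not goal.strip():
        return ValidationResult(False, "empty goal")
    if len(goal) > MAX_GOAL_CHARS:
        return ValidationResult(False, "goal too long")
    return ValidationResult(True)


def validate_output(text: str) -> ValidationResult:
    """Layer 3: block output that appears to leak a credential."""
    if SECRET_PATTERN.search(text):
        return ValidationResult(False, "output appears to contain a credential")
    return ValidationResult(True)


def guarded_call(goal: str, agent) -> dict:
    """Run `agent` (any callable taking a goal) between the two checks."""
    checked = validate_input(goal)
    if not checked.ok:
        return {"status": "rejected_input", "reason": checked.reason}
    output = agent(goal)
    checked = validate_output(output)
    if not checked.ok:
        return {"status": "blocked_output", "reason": checked.reason}
    return {"status": "ok", "output": output}
```

Returning a status dictionary instead of raising keeps a blocked result in the same shape as a successful one, which makes it easy to log or escalate.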
## Multi-Layer Guardrails
"""
class ActionNotAllowedError(Exception):
    pass
class CostLimitExceededError(Exception):
    pass
class GuardedAgent:
    # request_human_approval, estimate_cost, actual_cost, save_checkpoint
    # and rollback_to are integration points to implement per deployment.
    def __init__(self, agent, config):
        self.agent = agent
        self.max_cost = config.get("max_cost_usd", 1.0)
        self.max_steps = config.get("max_steps", 10)
        self.allowed_actions = config.get("allowed_actions", [])  # Deny by default
        self.require_approval = config.get("require_approval", [])
    async def execute(self, goal):
        total_cost = 0.0
        steps = 0
        while steps < self.max_steps:
            # Get next action
            action = await self.agent.plan_next(goal)
            # Validate action is allowed
            if action.name not in self.allowed_actions:
                raise ActionNotAllowedError(action.name)
            # Check if approval needed
            if action.name in self.require_approval:
                approved = await self.request_human_approval(action)
                if not approved:
                    return {"status": "rejected", "action": action}
            # Estimate cost
            estimated_cost = self.estimate_cost(action)
            if total_cost + estimated_cost > self.max_cost:
                raise CostLimitExceededError(total_cost)
            # Execute with rollback capability
            checkpoint = await self.save_checkpoint()
            try:
                result = await self.agent.execute(action)
            except Exception:
                await self.rollback_to(checkpoint)
                raise
            total_cost += self.actual_cost(action)
            steps += 1
            if result.is_complete:
                return {"status": "complete", "total_cost": total_cost}
        # Step budget exhausted without finishing: do not report success
        return {"status": "max_steps_exceeded", "total_cost": total_cost}
"""
## Least Privilege Principle
"""
# Define minimal permissions per task type
TASK_PERMISSIONS = {
    "research": ["web_search", "read_file"],
    "coding": ["read_file", "write_file", "run_tests"],
    "admin": ["all"],  # Rarely grant this
}
def create_scoped_agent(task_type):
    allowed = TASK_PERMISSIONS.get(task_type, [])  # Unknown task type: no tools
    if "all" in allowed:
        # "all" is a wildcard, not a tool name, so handle it explicitly
        tools = list(ALL_TOOLS)
    else:
        tools = [t for t in ALL_TOOLS if t.name in allowed]
    return Agent(tools=tools)
"""
## Cost Control
"""
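# In an agent loop every step re-sends the history so far, so cumulative
# input tokens grow roughly quadratically with the number of steps:
# double the steps = about 4x the total tokens.
def trim_context(messages, keep_recent=10):
    # Keep system message and recent messages
    system = messages[0]
    recent = messages[-keep_recent:]
    # Summarize the middle if there is one
    if len(messages) > keep_recent + 1:
        middle = messages[1:-keep_recent]
        summary = summarize(middle)  # summarize() must return a message
        return [system, summary] + recent
    return messages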
"""
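One way to read the cost note above: in an agent loop each step re-sends the conversation so far, so total input tokens add up like 1 + 2 + ... + n. The sketch below assumes a fixed number of tokens per turn, which is a simplification, and compares an untrimmed history with a sliding window. The function name and the `window` parameter are illustrative.

```python
from typing import Optional


def cumulative_input_tokens(
    steps: int, tokens_per_turn: int, window: Optional[int] = None
) -> int:
    """Total input tokens over a run where step k re-sends earlier turns.

    With window=None the whole history is re-sent at every step;
    otherwise only the most recent `window` turns are kept.
    """
    total = 0
    for k in range(1, steps + 1):
        turns_in_prompt = k if window is None else min(k, window)
        total += turns_in_prompt * tokens_per_turn
    return total


if __name__ == "__main__":
    for n in (10, 20, 40):
        full = cumulative_input_tokens(n, 500)
        trimmed = cumulative_input_tokens(n, 500, window=5)
        print(f"{n:>2} steps: full history {full:>7,} tokens, "
              f"5-turn window {trimmed:>6,} tokens")
```

Under these assumptions, doubling the run from 10 to 20 steps raises the untrimmed total from 27,500 to 105,000 tokens, while the windowed total grows only linearly once the window is full.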
### Durable Execution Pattern
Agents that survive failures and resume
**When to use**: Long-running tasks, production systems, multi-day processes
# DURABLE EXECUTION:
"""
Production agents must:
- Survive server restarts
- Resume from exact point of failure
- Handle hours/days of runtime
- Allow human intervention mid-process
LangGraph 1.0 provides this nativel
… (truncated)
Graded independently by Skillproof — nothing to sell the author. Quality is mechanical + corpus-grounded; safety flags are heuristic (builtin+triage), not a malicious verdict.