โ† All posts
🦞 OpenClaw · 2026-04-21 · 6 min

What Your AI Agent Actually Does When You're Not Watching

I traced 50 AI agent runs and found patterns nobody talks about.

The Numbers

  • โ—40% of tool calls were unnecessary โ€” the agent asked the same question 3 different ways
  • โ—The average "simple research task" costs $0.08 โ€” but 1 in 10 costs $0.50+ due to looping
  • โ—23% of URLs in agent output were hallucinated (not from any search result)
  • โ—Adding ONE sentence to the system prompt cut costs by 35%

What a Loop Looks Like

Here's a real trace from an agent asked to "find the latest AI news":

Step 1: User sends prompt.
Step 2: web_search("AI news") → 3 results.
Step 3: web_search("AI news") → same 3 results.
Step 4: web_search("AI news today") → similar results.
Steps 5-8: four more search variations.
Step 9: finally writes the summary.

2 minutes. $0.34. 8 searches. It could've been done in 2 searches and $0.03.

The Biggest Insight

The model (GPT-4, Claude, etc.) is the least interesting part of an AI agent. The architecture around the model (tools, memory, skills, config) determines whether your agent works or wastes money.

A $0.002-per-1K-token model with good tooling outperforms a $0.06-per-1K-token model with bad tooling. Every time.

What to Trace on Every Run

1. Duration Per Step

Not just total time, but time per tool call. You'll find one step consistently takes 60% of the session.
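One way to capture per-step timing is a small context manager wrapped around each tool call. This is a minimal sketch, not part of any agent framework: `StepTimer` and the step names are hypothetical, and `time.sleep` stands in for a real tool call.

```python
import time
from contextlib import contextmanager


class StepTimer:
    """Collects wall-clock duration for each named step in a run (hypothetical helper)."""

    def __init__(self):
        self.durations = []  # list of (step_name, seconds)

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the tool call raised.
            self.durations.append((name, time.perf_counter() - start))

    def shares(self):
        """Return each step name's fraction of the total recorded time."""
        total = sum(seconds for _, seconds in self.durations)
        if total == 0:
            return {}
        out = {}
        for name, seconds in self.durations:
            out[name] = out.get(name, 0.0) + seconds / total
        return out


timer = StepTimer()
with timer.step("web_search"):
    time.sleep(0.01)  # stand-in for a real tool call
```

Sorting `timer.shares()` by value surfaces the step that dominates the session.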

2. Cost Per Step

Cost per step is that step's token count multiplied by the model's price per token. Most people only see total cost. Per-step cost reveals that the system prompt alone accounts for 30-50% of input tokens.
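The arithmetic is simple enough to sketch. The example below assumes prices quoted per 1,000 tokens and a system prompt that is re-sent on every step; `StepUsage`, the function names, and any prices passed in are illustrative, not taken from a real provider's price list.

```python
from dataclasses import dataclass


@dataclass
class StepUsage:
    """Token counts for one step of a run (hypothetical record shape)."""
    name: str
    input_tokens: int
    output_tokens: int


def step_cost(step, input_price_per_1k, output_price_per_1k):
    """Dollar cost of one step: tokens times price, separately for input and output."""
    return (step.input_tokens / 1000 * input_price_per_1k
            + step.output_tokens / 1000 * output_price_per_1k)


def system_prompt_share(steps, system_prompt_tokens):
    """Fraction of all input tokens spent on the system prompt.

    Assumes the full system prompt is included in the input of every step.
    """
    total_input = sum(s.input_tokens for s in steps)
    if total_input == 0:
        return 0.0
    return system_prompt_tokens * len(steps) / total_input
```

Running `step_cost` over every step and sorting by the result shows where the money goes, and `system_prompt_share` gives a quick check on how much of the input is fixed overhead.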

3. Tool Call Patterns

Which tools, how many times, any repeats with identical arguments? The repeat-with-same-args pattern is the #1 cost driver.
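A rough way to answer those questions after a run is to count calls per tool and per (tool, arguments) pair. This sketch assumes each call is logged as a `(tool_name, args_dict)` tuple with JSON-serializable arguments; that log shape and the function names are assumptions for illustration.

```python
import json
from collections import Counter


def call_key(tool, args):
    """Stable key for a call: tool name plus arguments serialized with sorted keys."""
    return (tool, json.dumps(args, sort_keys=True))


def summarize_calls(calls):
    """Return (calls per tool, calls that were repeated with identical arguments)."""
    per_tool = Counter(tool for tool, _ in calls)
    per_call = Counter(call_key(tool, args) for tool, args in calls)
    repeats = {key: count for key, count in per_call.items() if count > 1}
    return per_tool, repeats


# The first three searches from the trace above, in the assumed log shape.
calls = [
    ("web_search", {"query": "AI news"}),
    ("web_search", {"query": "AI news"}),
    ("web_search", {"query": "AI news today"}),
]
per_tool, repeats = summarize_calls(calls)
```

Serializing with `sort_keys=True` means two calls whose arguments differ only in key order still count as the same call.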

4. URL Verification

Are output URLs from search results or hallucinated? Automated checks catch this on every run.
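A minimal version of that check compares the URLs in the final output against every URL that appeared in a search result during the run. The sketch below uses exact string matching and a deliberately simple regex, so it will flag URLs that differ only by a trailing slash or tracking parameter; the function names are hypothetical.

```python
import re

# Simple URL matcher: stops at whitespace, angle brackets, parentheses, and square brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(text):
    """Return the set of URLs in text, with trailing sentence punctuation stripped."""
    return {url.rstrip(".,;:!?\"'") for url in URL_PATTERN.findall(text)}


def unverified_urls(output_text, search_result_texts):
    """URLs in the agent's output that never appeared in any search result."""
    seen = set()
    for text in search_result_texts:
        seen |= extract_urls(text)
    return extract_urls(output_text) - seen
```

Any non-empty result is a candidate hallucination worth flagging on the trace.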

5. Loop Detection

Same tool called 3+ times with the same arguments = guaranteed loop. This should be an automatic alert.
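The rule above can also run live, while the agent is still working, so the alert fires on the third identical call instead of after the bill arrives. This is a sketch under the assumption that tool arguments are JSON-serializable; `LoopDetector` is a hypothetical name, and the default threshold of 3 comes from the rule stated here.

```python
import json
from collections import Counter


class LoopDetector:
    """Flags a tool call once it has been made `threshold` times with identical arguments."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, tool, args):
        """Record one call; return True if this exact call has reached the threshold."""
        key = (tool, json.dumps(args, sort_keys=True))
        self.counts[key] += 1
        return self.counts[key] >= self.threshold


detector = LoopDetector()
alerts = [detector.record("web_search", {"query": "AI news"}) for _ in range(3)]
# alerts is [False, False, True]: the third identical call trips the alert.
```

What happens on an alert is a separate design choice: stopping the run, injecting a reminder into the context, or just logging it.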

6. Security Checks

Internal network access? API key leaks? Sensitive files? These should run on every trace, not just when you're worried.
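A basic audit pass can scan each step's text for these three signals. The patterns below are illustrative and far from exhaustive: the `sk-` prefix and the path list are assumptions chosen for the example, and hostnames are not resolved, so an internal service behind a DNS name would be missed.

```python
import ipaddress
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s<>()]+")
# Illustrative secret shapes only: a generic "sk-" style key and an AWS access key ID.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),
    re.compile(r"AKIA[0-9A-Z]{16}"),
]
# Illustrative list of paths an agent usually has no reason to touch.
SENSITIVE_PATHS = (".env", "/etc/passwd", "/etc/shadow", ".ssh/", "id_rsa")


def is_internal_url(url):
    """True if the URL's host is localhost or a private, loopback, or link-local IP."""
    try:
        host = urlparse(url).hostname
        if host is None:
            return False
        if host == "localhost":
            return True
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # malformed URL or a plain hostname (DNS lookup is out of scope)
    return ip.is_private or ip.is_loopback or ip.is_link_local


def audit_text(text):
    """Return a list of human-readable findings for one step's input or output."""
    findings = []
    for url in URL_PATTERN.findall(text):
        if is_internal_url(url):
            findings.append(f"internal network access: {url}")
    if any(pattern.search(text) for pattern in SECRET_PATTERNS):
        findings.append("possible API key in text")  # the key itself is not echoed
    for path in SENSITIVE_PATHS:
        if path in text:
            findings.append(f"sensitive file reference: {path}")
    return findings
```

Running `audit_text` over every tool argument and tool result keeps the check cheap enough to leave on permanently.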

7. Quality Score

Did the agent complete the task? Empty output? Refusal? Automated evals catch these patterns.
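The cheapest checks here need no model at all. The sketch below covers only the empty-output and refusal cases with string heuristics; the marker list is an assumption, it will produce false positives on answers that open with "I can't find X, but...", and judging actual task completion needs a stronger evaluator than this.

```python
# Illustrative refusal phrases; a real list would be tuned against your own traces.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "i am unable", "i won't")


def evaluate_output(output):
    """Classify a final answer as 'empty', 'refusal', or 'ok' using simple heuristics."""
    text = (output or "").strip()
    if not text:
        return "empty"
    # Only inspect the opening, where refusals usually appear.
    opening = text[:200].lower()
    if any(marker in opening for marker in REFUSAL_MARKERS):
        return "refusal"
    return "ok"
```

Tracking the share of runs in each bucket over time gives a baseline quality signal before any heavier evals are added.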

You Can't Fix What You Can't See

AI agents need observability. Same as APIs. Same as servers. Same as databases. The tooling for web services took 20 years to mature. Agent observability is at day one.

Free, no signup. Works with OpenClaw, MyClaw, KiloClaw. Waterfall timeline, cost breakdown, 8 auto-evals, security audit, AI debugging: all instant.
