FRAMEWORK · Apr 15, 2026 · 2 min read

The 5-Step Agentic AI Tool Evaluation Framework

Before you spend a dollar on another agentic AI tool, run it through this five-question checklist. If it can't pass, it's vaporware.

Omri Dan · Nomadan founder

The market is flooded. Every week, three new "agentic" tools land in your inbox with a screencast, a Stripe link, and a launch discount.

Most of them are not agentic. They are clever prompts in a wrapper. Some of them are useful anyway. Most are not.

Here's the five-step filter I run every tool through before recommending it to a client. Use it. It will save you a lot of money.

1. Does it actually take action, or just generate text?

A real agent does work. It books the meeting, sends the email, updates the row, files the ticket. If the tool stops at generating a draft and hands the actual action back to a human every time, that is fine. But it is a copilot, not an agent. Price it accordingly.

The test. Ask the vendor: "Show me one full loop from trigger to side effect, with no human in the middle." If all they can show you is a chat window, it's a copilot.

2. Is the human-in-the-loop real or theater?

The honest agentic systems pause for human approval at the dangerous step: the send, the spend, the delete. The dishonest ones either skip the pause (high-risk) or pause on every step including the trivial ones (theater that makes the product useless).

Watch a full session before you buy. Count the approvals. If it asks you to approve every output token, run.
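Here is a minimal sketch of what a real approval gate can look like, so you know what you are counting. The action names and the `approve` callback are hypothetical placeholders, not any vendor's API. The point is the shape: trivial steps run straight through, and only the send, the spend, and the delete wait for a human.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action names; a real product would define its own risk policy.
DANGEROUS_ACTIONS = {"send_email", "spend_money", "delete_record"}


@dataclass
class Action:
    name: str
    payload: dict


def needs_approval(action: Action) -> bool:
    """Only the irreversible steps (send, spend, delete) pause for a human."""
    return action.name in DANGEROUS_ACTIONS


def run_plan(plan: list[Action], approve: Callable[[Action], bool]) -> list[str]:
    """Run each step in order, asking `approve` only before dangerous actions."""
    log = []
    for action in plan:
        if needs_approval(action) and not approve(action):
            log.append(f"blocked:{action.name}")
            break  # a rejected step stops the plan; nothing after it runs
        log.append(f"done:{action.name}")
    return log
```

With a three-step plan of lookup, draft, and send, this gate asks for exactly one approval. If the product you are watching asks for three, that is the theater.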

3. Where does your data go?

Three questions, in order:

  1. Is my data used to train the underlying model?
  2. Is my data shared with sub-processors I haven't seen named?
  3. Can I export and delete everything in one click?

If you can't get clean answers in writing, you don't have an enterprise-grade tool. You have a hobby project with your data in it.

4. What happens when it fails?

Agents fail. The mature products tell you exactly how: timeouts, retries, dead-letter queues, alerting. The immature ones go quiet.

Ask: "What does the failure mode look like, and how do I find out?" If the answer is "you'll see it in the logs," they don't have an alerting story. If the answer is "we email you within 5 minutes," you have a vendor.
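For reference, the failure path of a mature product looks roughly like the sketch below: retry with backoff, then park the failed job and tell a human. This is an illustration under assumptions, not any vendor's implementation. `run_with_retries`, the `dead_letters` list, and the `alert` callback are hypothetical names, and the timeout is assumed to be enforced inside `task` itself.

```python
import time
from typing import Callable, Optional


def run_with_retries(
    task: Callable[[], str],
    dead_letters: list,
    alert: Callable[[str], None],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> Optional[str]:
    """Try `task` up to `max_attempts` times, then dead-letter it and alert."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:  # includes timeouts raised inside the task
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff: base_delay, then 2x, then 4x, ...
                time.sleep(base_delay * 2 ** (attempt - 1))
    # Out of attempts: keep the failure for inspection and notify a human.
    dead_letters.append({"error": repr(last_error), "attempts": max_attempts})
    alert(f"task failed after {max_attempts} attempts: {last_error!r}")
    return None
```

The last three lines are the ones to ask about. A product that retries but never alerts is the one that goes quiet.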

5. Could you replicate 80% of it in an afternoon?

This is the question vendors hate. Look at the workflow on offer. Could you stand up something like it with an AI Gateway, a function call, and a Zapier-style trigger? If yes, the question becomes whether the vendor's polish, support, and reliability are worth the markup.

Sometimes they are. Often they aren't.
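To make the afternoon concrete, here is a rough sketch of the whole loop: a trigger comes in, a model picks a function, the function produces the side effect. Everything named here is an assumption for illustration. `fake_model` stands in for whatever gateway call you would actually make, and `file_ticket` stands in for a real ticketing API.

```python
import json
from typing import Callable


def file_ticket(title: str, priority: str) -> dict:
    """Stand-in side effect; a real version would call a ticketing API."""
    return {"status": "filed", "title": title, "priority": priority}


# Registry of functions the model is allowed to call.
TOOLS: dict[str, Callable[..., dict]] = {"file_ticket": file_ticket}


def fake_model(event: dict) -> str:
    """Placeholder for the gateway call: returns a tool call as JSON text."""
    priority = "high" if "outage" in event["text"].lower() else "low"
    return json.dumps(
        {"tool": "file_ticket", "args": {"title": event["text"], "priority": priority}}
    )


def handle_trigger(event: dict, model: Callable[[dict], str] = fake_model) -> dict:
    """One full loop: trigger in, model picks a tool, tool does the work."""
    call = json.loads(model(event))
    tool = TOOLS.get(call["tool"])
    if tool is None:
        raise ValueError(f"model requested unknown tool: {call['tool']}")
    return tool(**call["args"])
```

Calling `handle_trigger({"text": "Checkout outage in EU"})` files a high-priority ticket with no human in the middle. What this sketch leaves out (approvals, retries, alerting, audit logs) is roughly what the vendor's markup has to pay for.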

The shortcut

If a vendor can answer all five questions clearly in fifteen minutes of a sales call, it's a real product. If the conversation drifts into "trust the magic" territory on any of the five, you've found vaporware.

Either way, you keep your money.
