FRAMEWORK · Apr 15, 2026 · 2 min read

The 5-Step Agentic AI Tool Evaluation Framework

Before you spend a dollar on another agentic AI tool, run it through this five-question checklist. If it can't pass, it's vaporware.

Omri Dan · Nomadan founder

The market is flooded. Every week, three new "agentic" tools land in your inbox with a screencast, a Stripe link, and a launch discount.

Most of them are not agentic. They are clever prompts in a wrapper. Some of them are useful anyway. Most are not.

Here's the five-step filter I run every tool through before recommending it to a client. Use it. It will save you a lot of money.

1. Does it actually take action, or just generate text?

A real agent does work. It books the meeting, sends the email, updates the row, files the ticket. If the tool stops at generating a draft and hands the actual action back to a human every time, that is fine. But it is a copilot, not an agent. Price it accordingly.

The test. Ask the vendor: "Show me one full loop from trigger to side effect, with no human in the middle." If they show you a chat window, it's a copilot.

2. Is the human-in-the-loop real or theater?

The honest agentic systems pause for human approval at the dangerous step: the send, the spend, the delete. The dishonest ones either skip the pause (high risk) or pause on every step, including the trivial ones (theater that makes the product useless).

Watch a full session before you buy. Count the approvals. If it asks you to approve every output token, run.
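If you want a picture of what a real gate looks like, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Action` type, the `DANGEROUS` set, and the `approve` callback are hypothetical names, not any vendor's API. The point is only that the pause sits on the send, the spend, and the delete, and nowhere else.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical list of action kinds that need a human. The article names
# the send, the spend, and the delete; a real product would define its own.
DANGEROUS = {"send", "spend", "delete"}


@dataclass
class Action:
    kind: str    # e.g. "draft", "lookup", "send"
    detail: str  # human-readable description shown to the approver


def run_plan(actions: List[Action], approve: Callable[[Action], bool]) -> List[str]:
    """Run actions in order, asking for approval only on dangerous kinds."""
    log = []
    for action in actions:
        if action.kind in DANGEROUS and not approve(action):
            # The human said no: record it and move on without acting.
            log.append(f"blocked: {action.kind}")
            continue
        # Trivial steps, and approved dangerous ones, run without friction.
        log.append(f"done: {action.kind}")
    return log


def count_approvals(actions: List[Action]) -> int:
    """How many times this plan would interrupt a human."""
    return sum(1 for a in actions if a.kind in DANGEROUS)
```

Run a plan of a draft, a lookup, and a send through this and `count_approvals` returns 1. If the tool you are watching asks more often than that for the same work, you are looking at theater.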

3. Where does your data go?

Three questions, in order:

Is my data used to train the underlying model?
Is my data shared with sub-processors I haven't seen named?
Can I export and delete everything in one click?

If you can't get clean answers in writing, you don't have an enterprise-grade tool. You have a hobby project with your data in it.

4. What happens when it fails?

Agents fail. The mature products tell you exactly how: timeouts, retries, dead-letter queues, alerting. The immature ones go quiet.

Ask: "What does the failure mode look like, and how do I find out?" If the answer is "you'll see it in the logs," they don't have an alerting story. If the answer is "we email you within 5 minutes," you have a vendor.
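For reference, the failure story above fits in a few lines. This is a rough sketch, not a production pattern: `run_with_retries`, the `dead_letters` list, and the `alert` callback are hypothetical names, and the sketch assumes the task raises an exception when it times out rather than enforcing a timeout itself.

```python
import time
from typing import Any, Callable, List


def run_with_retries(
    task: Callable[[Any], Any],
    payload: Any,
    dead_letters: List[dict],
    alert: Callable[[str], None],
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> Any:
    """Try a task a few times; on final failure, park the job and tell a human."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(payload)
        except Exception as exc:  # includes a TimeoutError raised by the task
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff between attempts: 1x, 2x, 4x the base delay.
                time.sleep(base_delay * 2 ** (attempt - 1))
    # Out of attempts: keep the payload so nothing is silently lost.
    dead_letters.append({"payload": payload, "error": repr(last_error)})
    # The step immature products skip: someone finds out.
    alert(f"task failed after {max_attempts} attempts: {last_error!r}")
    return None
```

The last two statements are the whole difference between "you'll see it in the logs" and a vendor. In practice `alert` would be an email or a pager call; here it is whatever function you pass in.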

5. Could you replicate 80% of it in an afternoon?

This is the question vendors hate. Look at the workflow on offer. Could you stand up something like it with an AI Gateway, a function call, and a Zapier-style trigger? If yes, the question becomes whether the vendor's polish, support, and reliability are worth the markup.

Sometimes they are. Often they aren't.
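To make the afternoon test concrete, here is roughly what the skeleton of such a replica looks like. Treat it as a sketch under loud assumptions: `fake_model` stands in for a real model call behind a gateway, `file_ticket` stands in for a real side effect, and `on_trigger` stands in for whatever your trigger service calls. None of these names come from a real product.

```python
import json
from typing import Callable, Dict, List


def fake_model(prompt: str) -> str:
    """Stand-in for a model call. Returns a function call encoded as JSON."""
    return json.dumps({"name": "file_ticket", "arguments": {"title": prompt[:40]}})


# Stand-in for the system of record the agent writes to.
TICKETS: List[str] = []


def file_ticket(title: str) -> str:
    """The side effect: create a ticket and return its id."""
    TICKETS.append(title)
    return f"ticket-{len(TICKETS)}"


# Registry of functions the model is allowed to call.
TOOLS: Dict[str, Callable[..., str]] = {"file_ticket": file_ticket}


def on_trigger(event: dict, model: Callable[[str], str] = fake_model) -> str:
    """One full loop: trigger in, model picks a function, side effect out."""
    call = json.loads(model(event["text"]))
    tool = TOOLS[call["name"]]  # raises KeyError if the model names an unknown tool
    return tool(**call["arguments"])
```

Calling `on_trigger({"text": "Printer on floor 3 is down"})` files one ticket with no human in the middle. What this sketch leaves out (approvals, retries, alerting, auth, an audit trail) is exactly the polish and reliability you would be paying the vendor for.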

The shortcut

If a tool can answer all five questions clearly in fifteen minutes of a sales call, it's a real product. If the conversation drifts into "trust the magic" territory at any of the five, you've found vaporware.

Either way, you keep your money.
