The 5-Step Agentic AI Tool Evaluation Framework
Before you spend a dollar on another agentic AI tool, run it through this five-question checklist. If it can't pass, it's vaporware.

Omri Dan · Nomadan founder
The market is flooded. Every week, three new "agentic" tools land in your inbox with a screencast, a Stripe link, and a launch discount.
Most of them are not agentic. They are clever prompts in a wrapper. Some of them are useful anyway. Most are not.
Here's the five-step filter I run every tool through before recommending it to a client. Use it. It will save you a lot of money.
1. Does it actually take action, or just generate text?
A real agent does work. It books the meeting, sends the email, updates the row, files the ticket. If the tool stops at generating a draft and hands the actual action back to a human every time, that is fine. But it is a copilot, not an agent. Price it accordingly.
The test: ask the vendor, "Show me one full loop from trigger to side effect, with no human in the middle." If they show you a chat window, it's a copilot.
2. Is the human-in-the-loop real or theater?
The honest agentic systems pause for human approval at the dangerous step: the send, the spend, the delete. The dishonest ones either skip the pause (high risk) or pause on every step, including the trivial ones (theater that makes the product useless).
Watch a full session before you buy. Count the approvals. If it asks you to approve every output token, run.
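To make "count the approvals" concrete, here is a minimal sketch of an approval gate that pauses only on the dangerous steps. Everything in it is an assumption for illustration: the action names, the DANGEROUS_ACTIONS set, and the run_session helper are hypothetical, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical risk tier: only these actions require a human sign-off.
DANGEROUS_ACTIONS = {"send_email", "spend_money", "delete_record"}


@dataclass
class Step:
    action: str
    payload: dict


def run_session(steps: list[Step], approve: Callable[[Step], bool]) -> dict:
    """Run every step, pausing for approval only on dangerous actions."""
    summary = {"executed": [], "blocked": [], "approvals_requested": 0}
    for step in steps:
        if step.action in DANGEROUS_ACTIONS:
            summary["approvals_requested"] += 1
            if not approve(step):
                # The human said no: record it and skip the side effect.
                summary["blocked"].append(step.action)
                continue
        summary["executed"].append(step.action)
    return summary


session = [
    Step("read_calendar", {}),
    Step("draft_email", {"to": "client@example.com"}),
    Step("send_email", {"to": "client@example.com"}),
]

# An approver that says yes to everything, standing in for a human click.
result = run_session(session, approve=lambda step: True)
```

In this session the gate asks for one approval out of three steps, on the send. A tool that asks zero times is skipping the pause. A tool that asks three times is doing theater.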
3. Where does your data go?
Three questions, in order:
- Is my data used to train the underlying model?
- Is my data shared with sub-processors I haven't seen named?
- Can I export and delete everything in one click?
If you can't get clean answers in writing, you don't have an enterprise-grade tool. You have a hobby project with your data in it.
4. What happens when it fails?
Agents fail. The mature products tell you exactly how: timeouts, retries, dead-letter queues, alerting. The immature ones go quiet.
Ask: "What does the failure mode look like, and how do I find out?" If the answer is "you'll see it in the logs," they don't have an alerting story. If the answer is "we email you within 5 minutes," you have a vendor.
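For reference, this is roughly the shape of a failure story worth paying for: bounded retries with backoff, a dead-letter queue for what still fails, and an alert that reaches a human. It is a sketch under assumptions, not a production design. The run_with_retries helper, the list used as a dead-letter queue, and the alert callback are all hypothetical, and per-call timeouts are left out for brevity.

```python
import time
from typing import Any, Callable


def run_with_retries(
    task: Callable[[dict], Any],
    payload: dict,
    dead_letters: list,
    alert: Callable[[str], None],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> Any:
    """Try a task a few times; on final failure, dead-letter it and alert."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(payload)
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff: base_delay, then 2x, then 4x, ...
                time.sleep(base_delay * 2 ** (attempt - 1))

    # Retries exhausted: park the payload so nothing is silently lost.
    dead_letters.append(
        {"payload": payload, "error": repr(last_error), "attempts": max_attempts}
    )
    # The alert callback stands in for email, pager, or chat notification.
    alert(f"Task failed after {max_attempts} attempts: {last_error!r}")
    return None
```

The point of the sketch is the last two calls. A vendor whose answer is "check the logs" has the retry loop at best. The one who emails you has the dead-letter queue and the alert too.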
5. Could you replicate 80% of it in an afternoon?
This is the question vendors hate. Look at the workflow on offer. Could you stand up something like it with an AI gateway, a function call, and a Zapier-style trigger? If yes, the question becomes whether the vendor's polish, support, and reliability are worth the markup.
Sometimes they are. Often they aren't.
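To calibrate what "an afternoon" buys you, here is a rough sketch of the trigger, function call, side effect loop. Everything here is a stand-in I made up for illustration: fake_model replaces a real model call through a gateway, TICKETS replaces a real ticketing system, and handle_trigger is the function a webhook would invoke.

```python
from typing import Callable

# In-memory stand-in for a real ticketing system (hypothetical).
TICKETS: list[dict] = []


def create_ticket(title: str) -> str:
    """Side effect: file a ticket and return its id."""
    TICKETS.append({"title": title})
    return f"ticket-{len(TICKETS)}"


def noop() -> str:
    return "nothing to do"


# Registry of functions the model is allowed to call.
TOOLS: dict[str, Callable[..., str]] = {"create_ticket": create_ticket, "noop": noop}


def fake_model(event: dict) -> dict:
    """Stand-in for an LLM call that returns a function call as JSON-like data."""
    if event.get("type") == "new_lead":
        return {
            "name": "create_ticket",
            "arguments": {"title": f"Follow up with {event['email']}"},
        }
    return {"name": "noop", "arguments": {}}


def handle_trigger(event: dict, model: Callable[[dict], dict] = fake_model) -> str:
    """Trigger -> model picks a function -> function runs the side effect."""
    call = model(event)
    tool = TOOLS[call["name"]]
    return tool(**call["arguments"])
```

Calling handle_trigger({"type": "new_lead", "email": "a@example.com"}) files one ticket with no human in the middle. What this sketch does not have is auth, approvals, retries, alerting, or support, and that gap is exactly what the vendor's markup has to justify.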
The shortcut
If a tool can answer all five questions clearly in fifteen minutes of a sales call, it's a real product. If the conversation drifts into "trust the magic" territory at any of the five, you've found vaporware.
Either way, you keep your money.