An agentic system's economics are decided before the agent runs, in how the data is modeled. A semantic model built to cost the least to query lets everything above it scale.
Context for this engagement. The data-layer readiness assessment is complete and a Snowflake Cortex build is live in an initial market. The foundation is in motion. What follows is the approach to optimize it, across its technical, architectural, and economic dimensions.
The token bill is mostly written at the data layer. A well-structured semantic model produces an output at the lowest token cost and the best quality. A thin or unmodeled layer makes the agent work harder, and pay more, on every call. This is a data engineering decision first.
It helps to hold two ideas apart. How the data is modeled, the shape of the tables, is one layer. How an agent works a problem is another. The two are easy to run together, and doing so is what makes the architecture conversation harder than it needs to be. This piece stays on the layer that controls cost.
A comparison between an older approach and a newer one shows which behaves better. It does not show which is more economical. Cost is a separate axis, and it is the one this methodology is built around.
Most such comparisons weigh output quality and latency. The component most often left out is token consumption, and it is the one that decides which approach stays affordable as usage grows. Measuring it is what turns a question of which behaves better into a question of which scales.
The reason it matters is in the shape of the bill.
The unit price falls, the volume rises, the bill grows. The metric that matters is not total token spend but cost per output, the price of each answer or generated result, and that number is set almost entirely by how the data is modeled before the agent runs.
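The arithmetic behind that distinction can be sketched as follows. Every number here is hypothetical, chosen only to show the shape of the bill; the `Period` type and its field names are illustrative and do not come from any billing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """One billing period. All values used below are hypothetical."""
    price_per_million_tokens: float  # unit price in dollars
    tokens_per_output: int           # tokens consumed to produce one answer
    outputs: int                     # answers produced in the period


def total_bill(p: Period) -> float:
    """Total spend: tokens consumed times the unit price."""
    return p.tokens_per_output * p.outputs * p.price_per_million_tokens / 1_000_000


def cost_per_output(p: Period) -> float:
    """Spend divided by outputs: the price of each answer."""
    return total_bill(p) / p.outputs


# Illustrative scenario: the unit price halves while usage grows tenfold.
before = Period(price_per_million_tokens=10.0, tokens_per_output=20_000, outputs=1_000)
after = Period(price_per_million_tokens=5.0, tokens_per_output=20_000, outputs=10_000)

# Same usage as `after`, with an assumed 4x cut in tokens per answer
# from a better-modeled data layer.
modeled = Period(price_per_million_tokens=5.0, tokens_per_output=5_000, outputs=10_000)

for label, period in [("before", before), ("after", after), ("modeled", modeled)]:
    print(label, total_bill(period), cost_per_output(period))
```

With these assumed inputs the bill rises from $200 to $1,000 even though the unit price halved, and cost per output only falls from $0.20 to $0.10. Cutting tokens per answer is what moves both numbers: the bill drops to $250 and cost per output to $0.025.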
Market figures are cited from public 2026 sources for context. They are not Hakkoda or IBM measurements.
Two principles sit underneath the work. First, do not use AI for everything: the parts of the system that should be deterministic stay in code, where they are cheaper, faster, and safer. Second, model the data so it costs the least to query. From there, cost per output is set in four layers, and the order matters. Each layer removes work the agent would otherwise pay to do at runtime.
The semantic model is the foundation. It serves whatever agent approach runs on top, at the lowest token cost and the best output quality of any choice available. A bare model forces the agent to interpret it from scratch on every call, and that interpretation is where the tokens go. Get this layer right and every layer above it gets cheaper.
Skills are prebuilt capabilities attached to specific kinds of work. With a skill in place, the agent stops reasoning through how to read the model for that task and runs a known-good procedure instead. Tokens per call drop, reliability rises. Modular skill architectures have reportedly cut token cost by 60 to 90 percent with no loss of output quality. Skills also compound: each one built for one engagement is a reusable accelerator for the next.
Above the skills library sit locked, tested prompts, built against the specific model and data, not generic templates. Model Context Protocol (MCP) integrations define how the agent connects to tools, passes context, and hands off between steps, so it does not improvise its connections on every call. Together, prompts and protocols cut spend, cut latency, and leave less room for an expensive wrong turn.
With the data modeled and the skills, prompts, and protocols in place, the last layer sends each task to the right model: the most economical one that clears the bar, capable enough to be reliably correct and no costlier than that requires. Snowflake orchestrates across cost tiers out of the box, on one principle, the shallowest sufficient capability per query. Simple retrieval does not pay frontier prices; complex synthesis earns them. Running a single top-tier model for everything overspends by an estimated 40 to 85 percent.
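The selection rule itself fits in a few lines. The sketch below illustrates the principle only and is not Snowflake's routing logic; the tier names, capability scores, and per-call costs are hypothetical, and how a task's required capability is estimated is left out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str             # hypothetical tier label, not a real model name
    capability: int       # higher means more capable (illustrative scale)
    cost_per_call: float  # illustrative relative cost


# Assumed tiers for the example.
TIERS = [
    ModelTier("small", 1, 1.0),
    ModelTier("medium", 2, 5.0),
    ModelTier("frontier", 3, 25.0),
]


def pick_tier(required_capability: int, tiers: list[ModelTier] = TIERS) -> ModelTier:
    """Return the cheapest tier whose capability clears the bar."""
    sufficient = [t for t in tiers if t.capability >= required_capability]
    if not sufficient:
        raise ValueError("no tier clears the required capability")
    return min(sufficient, key=lambda t: t.cost_per_call)


# A simple lookup resolves on the cheapest tier; synthesis earns the top one.
print(pick_tier(1).name)
print(pick_tier(3).name)
```

Filtering on capability first and minimizing cost second keeps correctness as the constraint and price as the objective, which is the order the paragraph above describes.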
Stood up, the methodology becomes a running system. It is not a fixed pipeline: the router sends each request down the lightest path that fits, retrieval fans out in parallel when the question needs it, and only the hard cases run the full loop.
Every request enters a lightweight router, a shallow reason-then-act step that picks the lightest mode that resolves it reliably. Most routine traffic resolves here, without an expensive reasoning loop, and a large share of the token bill is decided at this one step.
When a question needs real context, the system retrieves in parallel: many tool calls against the semantic model and its skills, deduplicated by source and ranked by evidence quality, then reduced to the strongest set. Public enterprise-agent research calls this schedule explore then exploit, broad first and focused after. The parallel calls also cross-check one another, so a single unreliable result is caught rather than carried forward. Reasoning then runs on that filtered context only, offloading intermediate state to a store instead of its own history, which keeps a long task from inflating the context window and the bill.
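A minimal sketch of that fan-out, deduplicate, rank, and reduce sequence follows. The `Evidence` type, its `quality` score, and the retriever callables are assumptions made for illustration; a real system would supply its own tools and scoring. The cross-checking between calls and the external state store are not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Evidence:
    source: str     # identifier of the table, view, or document it came from
    text: str
    quality: float  # hypothetical evidence-quality score in [0, 1]


# A retriever is any tool call that takes the question and returns evidence.
Retriever = Callable[[str], list[Evidence]]


def retrieve(question: str, retrievers: list[Retriever], keep: int = 3) -> list[Evidence]:
    # Explore: fan out all tool calls in parallel.
    with ThreadPoolExecutor(max_workers=len(retrievers) or 1) as pool:
        batches = list(pool.map(lambda r: r(question), retrievers))

    # Deduplicate by source, keeping the highest-quality hit for each source.
    best: dict[str, Evidence] = {}
    for batch in batches:
        for ev in batch:
            if ev.source not in best or ev.quality > best[ev.source].quality:
                best[ev.source] = ev

    # Exploit: rank by evidence quality and reduce to the strongest set.
    return sorted(best.values(), key=lambda e: e.quality, reverse=True)[:keep]
```

Only the list returned by `retrieve` would be passed on to the reasoning step, so the context the model pays for is bounded by `keep` rather than by how many tools were called.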
Roles are bounded, which keeps the system testable and stops it from spawning work without end. In a data system they map directly to the data path.
Access follows the same discipline. Read and search sit with retrieval, write and execution with tightly governed workers, and a pure reasoning step may hold no tools at all. Least privilege keeps the blast radius of any one component small.
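One way to express that split in code is a grant table checked before every tool call. This is a sketch under assumptions: the role names and the `authorize` helper are hypothetical, and a production system would more likely rely on the platform's own access controls.

```python
from enum import Flag, auto


class Permission(Flag):
    NONE = 0
    READ = auto()
    SEARCH = auto()
    WRITE = auto()
    EXECUTE = auto()


# Hypothetical role names; the grants mirror the split described above.
ROLE_GRANTS = {
    "retrieval": Permission.READ | Permission.SEARCH,
    "worker": Permission.WRITE | Permission.EXECUTE,
    "reasoner": Permission.NONE,  # pure reasoning step holds no tools
}


def authorize(role: str, needed: Permission) -> None:
    """Raise unless the role holds every permission the tool call needs."""
    granted = ROLE_GRANTS.get(role, Permission.NONE)  # unknown roles get nothing
    if needed & granted != needed:
        raise PermissionError(f"role {role!r} lacks a required permission")


authorize("retrieval", Permission.READ)  # allowed, returns silently
```

Defaulting unknown roles to no permissions means a misconfigured or newly added component fails closed instead of inheriting access.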
An architecture that works in a notebook holds in production only with the operating discipline around it. Day to day, the work is governance, guardrails, and observability, and none of it is model judgment.
The controls that protect cost and correctness are enforced in code, which is cheaper, faster, and safer than asking an agent to enforce them.
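As one possible control of this kind, a hard per-request budget can be enforced in plain code before each agent step runs. The class name, the limits, and the choice of steps and tokens as the capped quantities are assumptions for illustration, not a list of the controls the methodology prescribes.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a request would pass a hard limit."""


class RequestBudget:
    """Hard caps on steps and tokens for one request (limits are hypothetical)."""

    def __init__(self, max_steps: int, max_tokens: int) -> None:
        self.max_steps = max_steps
        self.max_tokens = max_tokens
        self.steps = 0
        self.tokens = 0

    def charge(self, tokens: int) -> None:
        """Record one agent step; refuse it if either cap would be passed."""
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.tokens + tokens > self.max_tokens:
            raise BudgetExceeded("token limit reached")
        self.steps += 1
        self.tokens += tokens


budget = RequestBudget(max_steps=8, max_tokens=50_000)
budget.charge(1_200)  # called by the orchestrator before each step
```

The check costs no tokens and cannot be talked out of its limit, which is the point of keeping it outside the model.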
The system emits a step-by-step execution trace, and operators watch agent success rates, tool-call failure rates, time to first token, and tokens consumed by agent type. Without traces, the system is a box that works until it does not. This is also where data-layer weakness shows up first, as latency in the data agent and cost that rises without an obvious cause.
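Two of those measures can be derived directly from the trace. The sketch below assumes a simplified trace record; the `TraceStep` fields and the agent-type labels are hypothetical and would be replaced by whatever the tracing layer actually emits.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceStep:
    agent_type: str    # e.g. "router", "data", "reasoner" (hypothetical labels)
    tokens: int        # tokens consumed by this step
    tool_called: bool  # whether the step invoked a tool
    tool_failed: bool  # whether that tool call failed


def tokens_by_agent_type(trace: list[TraceStep]) -> dict[str, int]:
    """Sum tokens consumed per agent type across the trace."""
    totals: dict[str, int] = defaultdict(int)
    for step in trace:
        totals[step.agent_type] += step.tokens
    return dict(totals)


def tool_failure_rate(trace: list[TraceStep]) -> float:
    """Share of tool calls that failed; 0.0 when no tool was called."""
    calls = [s for s in trace if s.tool_called]
    if not calls:
        return 0.0
    return sum(s.tool_failed for s in calls) / len(calls)
```

Tracking tokens per agent type over time is what makes an unexplained rise in the data agent's share visible as a data-layer problem instead of a general cost increase.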
The architecture and operating patterns in these two sections describe current public framing for enterprise agent systems. They are the reference design this methodology applies, not a specific delivered build or a Hakkoda or IBM measurement.
Readiness is a property of the data model. The question to ask is whether the data is modeled for how agents will actually query it. Two markers answer it quickly.
Data quality runs as events, not on a schedule. Event-based checks through Snowflake Data Metric Functions catch an issue the moment it appears, where scheduled checks drift out of step with the data and let bad inputs reach the agent before anyone notices.
The semantic model is clean, and that cleanliness is not cosmetic. Gaps in the semantic layer do not stay there. They surface downstream as latency in the data agent and as cost on the observability dashboard. Resolving them in the model is cheaper than paying for them on every call.
Read this way, the choice between an older approach and a newer one is made on the axis that matters: which is modeled so the data costs the least to query, at the quality the work requires.
None of this is a new category. It is an existing efficiency discipline applied to the data and model layer.
Frugal AI, achieving high impact with minimal resources, traces to the Allen Institute's 2019 Green AI work and has been advanced through French national AI policy and Cambridge Judge Business School's Frugal AI Hub, which evaluates AI systems on return on investment and total cost of ownership rather than raw capability. The overkill reflex it warns against is not only running a frontier model on a simple question. It is running any agent on data it should not have to interpret at runtime. Modeling that interpretation away is the frugal move.
Read against this discipline, the methodology is one continuous idea: spend tokens where they change the decision, and engineer the data layer so the rest is close to free.
The pattern is not theoretical. Hakkoda builds the data foundation: governed semantic models on Snowflake, grounded text-to-SQL, and the data engineering that makes agentic systems reliable. IBM builds the orchestration and routing layer on watsonx, with the governance to keep production systems accountable.
The data modeling layer is the first decision and the most consequential one. A semantic model built to minimize token cost, paired with a skills library, a tested prompt repository, and a clean protocol layer, is what makes any agent architecture economical. The choices above it are tools on that foundation, not substitutes for it. So the question to carry into a build is simple: is the data modeled so the output costs the fewest tokens, and are the skills and protocols in place to remove interpretation overhead at runtime? When the answer is yes, the rest of the architecture gets cheaper to settle, and the choice between an older approach and a newer one is made on performance and cost at once.
First pass for review. Confirm all client-specific framing and every proof point before this goes to the client. Client metrics labeled directional, estimated, projected, or structural are not confirmed realized results, and IBM-estimated figures carry illustrative-only qualification. Frugal AI sources: Cambridge Judge Business School Frugal AI Hub (frugalai.org); Allen Institute, Green AI, 2019; French national AI policy. Frugal AI is an existing efficiency discipline, not a new category.