What private AI actually costs.
An honest framing piece. We don't publish a price list because pilots vary several-fold across firm shapes — a number that's right for a 5-person law office is wrong for a 50-person therapy group. Here's how we think about the cost so you walk into a Readiness Call already informed.
Why cloud-AI bills creep.
Cloud AI looks cheap when you start. A team of five running ChatGPT Plus is $100 a month. A few API calls per day are a rounding error. Then adoption succeeds.
What changes:
- Staff use AI throughout the day instead of for occasional lookups.
- Tools that started as helpers move into production workflows. Token volume jumps.
- Frontier-model price changes hit you mid-quarter. Forecasting becomes guesswork.
- Consumer-tier rate limits start to bite. The vendor walks you up to the enterprise tier.
- Enterprise-tier minimums (annual commits, seat counts) come in well above the consumer-tier number you scoped against.
This isn't a problem with cloud AI as a category. It's a problem with using consumer-priced cloud AI as a budgeting baseline for a real production workload. Once consumption is steady and high, fixed-cost local hardware is the cheaper, more predictable path. That's where the math turns.
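To make "where the math turns" concrete, here is a minimal break-even sketch. Only the $100-a-month figure comes from the example above; the hardware cost, the retainer, and the enterprise-tier bill are hypothetical placeholders, not Wilcoe quotes, and the function name is ours.

```python
from math import ceil

def break_even_month(cloud_monthly, local_upfront, local_monthly):
    """First month in which cumulative local cost is at or below
    cumulative cloud cost, or None if local never catches up."""
    monthly_saving = cloud_monthly - local_monthly
    if monthly_saving <= 0:
        # Cloud is cheaper every month, so the upfront cost is never recovered.
        return None
    return ceil(local_upfront / monthly_saving)

# Hypothetical local stack: 6000 upfront, 400 a month to run.
# Five consumer seats at 100 a month: local never pays back.
early = break_even_month(cloud_monthly=100, local_upfront=6000, local_monthly=400)

# Hypothetical enterprise-tier bill of 1500 a month: local pays back in month 6.
steady = break_even_month(cloud_monthly=1500, local_upfront=6000, local_monthly=400)

print(early, steady)  # None 6
```

The point of the sketch is the shape, not the figures: at consumer-tier spend there is no break-even at all, and it only appears once the monthly cloud bill clears the cost of running the local stack.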
The shape of a Wilcoe Private AI engagement.
Three buckets. Different shapes. Different time horizons.
Bucket 1 — the Readiness Sprint (one-time, fixed-fee).
Two weeks of focused work to scope your firm's private-AI pilot. We inventory sensitive workflows, retention rules, document systems, approval steps, IT capacity, and existing AI use. We translate your compliance frame (HIPAA / ABA / IRS+FTC) into operating policy. We propose the pilot's hardware shape, first workflow, and rollout plan. You leave with a written pilot scope and a sized estimate for the rest.
This is paid because the work is real. It's also bounded — the price is fixed before we start, and the deliverable is a document plus a sized pilot scope you can take to other vendors if you want.
Bucket 2 — the pilot deployment (one-time, scoped from the sprint).
Hardware procurement, install, security configuration, network segmentation, identity, logging, encrypted backup, model setup, the first knowledge-layer index, the first vertical copilot, mandatory review gates, and team training. Lands inside 90 days from the sprint kickoff.
The price moves with the firm. A solo law office pilot is small. A multi-office therapy group pilot is larger. The Readiness Sprint produces the firm-specific number.
Bucket 3 — the managed retainer (ongoing, monthly).
Patching, model allowlists, prompt-template management, logging, access reviews, backup checks, incident response playbooks, quarterly workflow tuning, and team training. This is the moat — the thing most SMBs can't run themselves and the thing that turns "we have AI" into "we have AI we can defend in a regulator conversation."
Retainer scales with managed scope: number of users, number of workflows, sensitivity profile, and how much we handle vs. how much your IT person handles.
The five levers that move your specific number.
1. Firm size and concurrency.
A five-person office running occasional AI tasks needs one Mac mini. A 50-person therapy group with concurrent users on every workflow needs a Mac Studio cluster. Hardware shape moves linearly with concurrent active users, not headcount.
2. Sensitivity profile.
The more your work touches PHI, privileged communications, or restricted financial data, the more we invest in segmentation, encryption, audit logging, and breach-response playbooks. A clinic running patient notes through the appliance gets more configuration than a small marketing-services firm running internal drafting.
3. Cloud-fallback policy.
"Local by default, cloud allowed with policy" is cheaper to operate than "local-only, no cloud, no exceptions." Strict cloud-off pushes more work onto the local stack — bigger hardware, more model tuning, more workflow design. Most firms land on a middle path tailored to their compliance frame.
4. Number of workflows in the first wave.
One workflow live in 90 days is a clean pilot. Three workflows in 90 days is a project. Five workflows is a program. The Readiness Sprint produces a prioritized list; pilots launch the top one and add the rest in subsequent quarters.
5. IT-handoff scope.
If you have an in-house IT person or an MSP we can hand operations off to, our retainer is lighter. If we're the on-call team for the appliance, the retainer is heavier. We charge for the work, not for the title.
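The first lever, hardware scaling with concurrent active users rather than headcount, can be sketched as a simple ceiling calculation. The per-machine capacity below is a hypothetical placeholder, not a benchmark; the real figure depends on the model and workflow and is what the Readiness Sprint sizes.

```python
from math import ceil

def units_needed(concurrent_users, users_per_unit):
    """Machines required to serve the peak number of simultaneous users.
    users_per_unit is an assumed capacity per machine."""
    if users_per_unit <= 0:
        raise ValueError("users_per_unit must be positive")
    # Even a firm with almost no concurrent use still needs one machine.
    return max(1, ceil(concurrent_users / users_per_unit))

# Two hypothetical 50-person firms, assuming 4 concurrent users per machine.
light_use = units_needed(concurrent_users=2, users_per_unit=4)
heavy_use = units_needed(concurrent_users=30, users_per_unit=4)

print(light_use, heavy_use)  # 1 8
```

Same headcount, very different hardware: the input that matters is how many people are using the appliance at the same moment.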
Why we don't publish a price list.
We've quoted pilots that varied 5x across firm shapes and 3x within the same vertical depending on workflow count and sensitivity profile. Any published number that fit one of those quotes would be wrong for the next caller. A cheap engagement that doesn't fit is wasteful for everyone: your team won't use it, the workflow won't survive a compliance review, and you'll resent the line item six months later.
So we size every engagement in the Readiness Sprint, where the answer is real.
What you get from a Readiness Call.
Before any sprint scope is signed:
- A verbal sized estimate by tier — small / mid / large — so you can budget the right order of magnitude.
- A workflow shortlist for your firm — the two or three highest-leverage places to start.
- A compliance frame walkthrough — how HIPAA / ABA / IRS+FTC actually applies to your private-AI architecture.
- A written sprint scope you can compare against any other proposal.
It's a 30-minute call. We don't pitch — we listen, scope, and price.
Want the firm-specific number?
Book a Readiness Call. Or take the 90-second readiness check first to map yourself to a deployment shape.