On-Premises AI

definition

On-premises AI is artificial intelligence — language models, document retrieval, transcription, and automation — deployed and run entirely on hardware the business owns and controls, so no data leaves the building and there are no metered per-token or per-seat cloud fees.

the problem

The most valuable AI use cases sit on top of a business's most sensitive data: contracts, patient records, financials, source code, internal knowledge. Sending that to a third-party cloud API is a privacy, compliance, and IP problem — and for regulated industries it can be a non-starter.

Cloud AI also bills by the token, the seat, and the minute. Costs are unpredictable, they scale against you the more useful the tool becomes, and you're locked to a vendor who can change pricing or models out from under you.

Teams assume running capable AI locally is too slow or too hard. With modern open models and the right hardware it isn't — but sizing the machine, choosing the model, and wiring it into a real workflow reliably is genuine engineering.

how stride solves it

Stride deploys AI on hardware you own: a private language model running locally, retrieval over your own documents (RAG) so the model answers from your knowledge, or document and workflow automation — all inside your network, with no third-party API keys and no data leaving the building.
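
The retrieval step can be sketched in a few lines. This is a minimal illustration, not Stride's implementation: it assumes simple word-count vectors and cosine similarity, whereas a real deployment would more likely use an embedding model and a vector index. The document ids, texts, and function names (`retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of up to k documents most similar to the question."""
    q_vec = Counter(tokenize(question))
    scored = [
        (cosine(q_vec, Counter(tokenize(text))), doc_id)
        for doc_id, text in docs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents with no overlap at all rather than padding the context.
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(question: str, docs: dict[str, str], k: int = 2) -> str:
    """Assemble what the local model would receive: retrieved context, then the question."""
    context = "\n".join(
        f"[{doc_id}] {docs[doc_id]}" for doc_id in retrieve(question, docs, k)
    )
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"


# Hypothetical in-memory documents standing in for a company's own files.
DOCS = {
    "pto-policy": "Employees accrue paid time off monthly and request leave through their manager.",
    "expense-policy": "Submit expense reports with receipts within thirty days of purchase.",
    "security-policy": "Laptops must use full disk encryption and a screen lock.",
}
```

Calling `build_prompt("How do I submit expense receipts?", DOCS, k=1)` places only the expense policy in the context, which is the property that keeps the model answering from your own knowledge. Everything here runs in-process, so nothing is sent over the network.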

We scope the use case first, then handle model selection and hardware sizing, deployment on your iron, and integration into the workflow that needs it, all exposed through a clean API your other tools can call.

Voice and transcription (Local AI Voice) are one common application of this; private document search, internal copilots, and on-prem automation are others. The pattern is the same: modern AI leverage, kept entirely within your own walls.

what we build

  • A private internal copilot that answers from a company's own policy and procedure documents, running on an in-office server
  • On-device clinical-notes transcription that replaced a per-minute cloud vendor
  • Retrieval over a law firm's matter files so attorneys can query precedent without documents leaving the firm
  • Document-classification and extraction automation for a finance team, run on-prem so records never touch a third-party API

architecture

Capable AI, entirely inside your network

  Your docs / data ──▶  Local model (LLM / Whisper / Piper)
        │                        │
        ▼                        ▼
  Retrieval index ───▶  Answer / transcript / action
        │
        ▼
  Your app / agent  ◀───── REST / WebSocket

  [ everything runs on your hardware — no cloud calls, no API keys ]

  • No data ever leaves your hardware; no third-party API keys, no per-token meter.
  • Runs on commodity x86 and Apple Silicon. We size the machine during scoping; a GPU helps for larger models but isn't always required.
  • Ships as containerized services your team can run, monitor, and update.
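
The REST edge of the diagram above might look roughly like the sketch below. It is an illustration under stated assumptions: the `/ask` route, the JSON field names, and `run_local_model` are hypothetical, and the service uses only Python's standard library so the example runs without extra installs (the stack named in this document includes FastAPI, which a real service would more plausibly use). It binds to the loopback address and bypasses any configured proxy, so the request never leaves the machine.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def run_local_model(prompt: str) -> str:
    """Placeholder for the on-box model call (hypothetical); swap in the real runtime."""
    return f"(local model reply to: {prompt})"


class AskHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/ask":  # "/ask" is an illustrative route, not from the source
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        answer = run_local_model(payload.get("question", ""))
        body = json.dumps({"answer": answer}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_server(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Serve on loopback in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer((host, port), AskHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(server: ThreadingHTTPServer, question: str) -> str:
    """Client side: roughly what 'your app / agent' would do over REST."""
    host, port = server.server_address[:2]
    request = urllib.request.Request(
        f"http://{host}:{port}/ask",
        data=json.dumps({"question": question}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler ignores any system proxy, so the call stays local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.load(response)["answer"]
```

A caller would run `server = start_server()`, then `ask(server, "What is our leave policy?")`, and finally `server.shutdown()`. No API key appears anywhere because there is no external provider to authenticate against.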

typical stack

  • Open-weight LLMs (Llama / Qwen class)
  • Whisper
  • Piper TTS
  • RAG / vector retrieval
  • Python
  • FastAPI
  • Docker

common questions

Does any of our data leave our hardware?

No. The entire stack runs on hardware you own, inside your network. There are no cloud API calls and no third-party keys — which is the whole point for privacy-, IP-, and compliance-sensitive work.

What hardware do we need?

It depends on the model and use case. Smaller models and transcription run fine on commodity x86 or Apple Silicon; larger language models benefit from a GPU. We size the machine with you during scoping so you buy what you actually need.
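
A first-pass sizing estimate comes down to simple arithmetic: the memory for a model's weights is its parameter count times the bytes stored per parameter. The sketch below shows that calculation. The `overhead` factor of 1.2 (for context cache and runtime buffers) is an assumed rule of thumb, not a measured figure, and the function names are illustrative; real sizing should be confirmed against the actual model and workload.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in decimal GB.

    params (in billions) * 1e9 * (bits / 8) bytes, divided by 1e9 bytes per GB.
    """
    return params_billions * bits_per_weight / 8


def estimated_memory_gb(
    params_billions: float, bits_per_weight: int, overhead: float = 1.2
) -> float:
    """Weights plus an ASSUMED overhead factor for cache, activations, and buffers."""
    return weight_memory_gb(params_billions, bits_per_weight) * overhead


def fits(params_billions: float, bits_per_weight: int, available_gb: float) -> bool:
    """Rough check of whether a model fits in the given RAM or VRAM budget."""
    return estimated_memory_gb(params_billions, bits_per_weight) <= available_gb


# A 7-billion-parameter model: 16-bit weights vs. 4-bit quantized weights.
print(weight_memory_gb(7, 16))  # 14.0
print(weight_memory_gb(7, 4))   # 3.5
```

The gap between those two results is why quantized smaller models can run on commodity machines while larger or unquantized models tend to call for a GPU with more memory.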

How is this different from using a cloud AI API?

No per-token or per-seat billing, no vendor lock-in, and no data leaving your environment. You trade a metered, externally hosted dependency for a fixed asset you own and control.
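
The metered-versus-fixed trade can be framed as a break-even calculation. The sketch below is illustrative only: the function name is hypothetical, and every figure passed to it is a placeholder input, not a quote or a real price. It assumes a one-time hardware cost, a steady monthly cloud bill, and a steady monthly on-prem running cost (power, maintenance).

```python
import math
from typing import Optional


def break_even_months(
    hardware_cost: float,
    monthly_cloud_cost: float,
    monthly_onprem_cost: float,
) -> Optional[int]:
    """Whole months until cumulative cloud spend exceeds or matches on-prem spend.

    Returns None when on-prem running costs are not lower than the cloud bill,
    because the hardware then never pays for itself.
    """
    monthly_saving = monthly_cloud_cost - monthly_onprem_cost
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_cost / monthly_saving)


# Placeholder inputs for illustration, not real prices.
print(break_even_months(12_000, 1_500, 300))  # 10
```

Because metered usage tends to grow as a tool becomes more useful, a rising cloud bill shortens the break-even period, while the on-prem hardware cost stays fixed.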

Is local voice part of this?

Yes — Local AI Voice (private speech-to-text and text-to-speech) is one application of on-premises AI. It's offered as a focused, lower-cost deploy for teams that specifically need voice and transcription on their own iron.

doc. v2026.05.r1
On-Premises AI · Stride Techworks