THE CANADIAN INFERENCE ENGINE
P95 < 200ms LOCAL LATENCY
OAI-COMPATIBLE DROP-IN SDK CONTRACT
© 2026 AXE TECHNOLOGIES INC.

ECHO

The Canadian inference engine. OpenAI-compatible. Multi-model routing across the Edge family.

ECHO · /0.1

OpenAI-Compatible. Canadian by Default.

Drop-in replacement for cloud AI APIs. Same endpoints, same SDKs, same developer experience. But every token stays on your infrastructure. Every model is yours.

  • /v1/chat/completions — streaming inference
  • /v1/models — model registry
  • /v1/embeddings — vector generation
  • /health — system status
$ curl https://echo.castle.local/v1/chat/completions \
    -H "Authorization: Bearer $CASTLE_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "edge-1",
      "stream": true,
      "messages": [{ "role": "user", "content": "Analyze Q3 defense procurement" }]
    }'

data: {"choices":[{"delta":{"content":"Based on"}}]}
data: {"choices":[{"delta":{"content":" the latest"}}]}
data: {"choices":[{"delta":{"content":" PSPC data"}}]}
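The `data:` lines in the response above are server-sent events, each carrying one fragment of the answer. The following is a minimal sketch of how a client might reassemble the streamed text. It assumes every event holds an OpenAI-style chunk with `choices[0].delta.content`, and that the stream may end with a `data: [DONE]` sentinel as OpenAI's API does; the page does not confirm the sentinel for Echo. The function name `collect_stream_text` is hypothetical.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the delta.content fragments found in OpenAI-style SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# The three events shown in the curl output, plus the assumed sentinel.
sample = [
    'data: {"choices":[{"delta":{"content":"Based on"}}]}',
    'data: {"choices":[{"delta":{"content":" the latest"}}]}',
    'data: {"choices":[{"delta":{"content":" PSPC data"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # Based on the latest PSPC data
```

In practice the official SDKs do this parsing for you; the sketch only shows what travels over the wire.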

Quick Start

OpenAI-compatible. Drop in your existing code. Every token stays in your custody.

# Streaming inference: OpenAI-compatible endpoint
curl -X POST https://<your-instance>/v1/chat/completions \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "casa-mini-9b",
      "messages": [{"role": "user", "content": "Analyze this document for PII"}],
      "stream": true
    }'

# Python: uses the official OpenAI SDK
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-instance>/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="casa-mini-9b",
    messages=[{"role": "user", "content": "Analyze this document for PII"}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

// TypeScript: uses the official OpenAI SDK
import OpenAI from 'openai';

const echo = new OpenAI({
  baseURL: 'https://<your-instance>/v1',
  apiKey: 'YOUR_API_KEY',
});

const stream = await echo.chat.completions.create({
  model: 'casa-mini-9b',
  messages: [{ role: 'user', content: 'Analyze this document for PII' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Works with any OpenAI-compatible SDK. Point base_url (baseURL in the TypeScript SDK) at your Echo instance and every token stays on your infrastructure.
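The other endpoints Echo lists (/v1/models, /v1/embeddings, /health) can presumably be called the same way. The sketch below builds, but does not send, one request for each using only the Python standard library. Several things here are assumptions: the host `echo.example.internal` is a placeholder, the helper `build_request` is hypothetical, the embeddings body follows OpenAI's `model` plus `input` shape, the model id is reused from the Quick Start (the page does not say which models serve embeddings), and the page does not state whether /health requires a key.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://echo.example.internal"  # placeholder host, not a real instance
API_KEY = "YOUR_API_KEY"


def build_request(path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for an Echo-style endpoint."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    # urllib uses POST when data is present and GET otherwise.
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


list_models = build_request("/v1/models")
embed = build_request(
    "/v1/embeddings",
    {"model": "casa-mini-9b", "input": "Analyze this document for PII"},  # assumed body shape
)
health = build_request("/health")

for req in (list_models, embed, health):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a live instance.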

Canadian Models. Trained on Your Data.

Casanova 2.0
70B Parameters
Flagship. Complex reasoning, research, strategic analysis. 8 tok/s on consumer hardware.
Geralt 2.0
11B Parameters
Balanced. Code generation, document analysis, reasoning. 25 tok/s.
Anakin 8B
8B Parameters
Fast. Chat, tool calling, real-time inference. 44 tok/s.
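To put the quoted throughput figures in perspective, here is a rough back-of-the-envelope calculation of how long each model would take to generate a response. It assumes the quoted rates hold steadily for the whole response and ignores prompt processing and queueing; the 500-token response length is an arbitrary illustration, and `generation_seconds` is a hypothetical helper.

```python
# Throughput figures quoted above, in tokens per second.
TOKENS_PER_SECOND = {"Casanova 2.0": 8, "Geralt 2.0": 25, "Anakin 8B": 44}


def generation_seconds(model: str, output_tokens: int) -> float:
    """Estimated decode time, ignoring prompt processing and queueing."""
    return output_tokens / TOKENS_PER_SECOND[model]


for name in TOKENS_PER_SECOND:
    print(f"{name}: {generation_seconds(name, 500):.1f} s for 500 tokens")
# Casanova 2.0: 62.5 s for 500 tokens
# Geralt 2.0: 20.0 s for 500 tokens
# Anakin 8B: 11.4 s for 500 tokens
```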

Intelligent Routing. Automatic Model Selection.

Every request is classified in real time. The right model for the right task. Always.

Request (User Query) → Classifier (Maestro) → Router (Best Model)
Classification time: <50ms. Response time: <200ms P95.
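The classify-then-route flow can be illustrated with a toy example. The page does not describe how Maestro classifies requests, so the keyword rules below are only a stand-in, and the model ids in `ROUTES` are hypothetical (the page names the models but not their API identifiers). The task descriptions in the comments come from the model list above.

```python
from dataclasses import dataclass

# Hypothetical model ids, mapped to the strengths the page lists for each model.
ROUTES = {
    "reasoning": "casanova-2.0",  # flagship: complex reasoning, research
    "code": "geralt-2.0",         # balanced: code generation, document analysis
    "chat": "anakin-8b",          # fast: chat, tool calling
}


@dataclass
class Decision:
    task: str
    model: str


def classify(query: str) -> str:
    """Toy keyword classifier standing in for the real one."""
    text = query.lower()
    if any(word in text for word in ("function", "bug", "refactor", "code")):
        return "code"
    if any(word in text for word in ("analyze", "strategy", "research", "compare")):
        return "reasoning"
    return "chat"


def route(query: str) -> Decision:
    """Classify the query, then pick the model registered for that task."""
    task = classify(query)
    return Decision(task=task, model=ROUTES[task])


print(route("Refactor this function to remove the bug"))
print(route("Analyze Q3 defense procurement"))
print(route("Hi, what can you do?"))
```

A production classifier would more likely be a small model than a keyword list; the point of the sketch is only the two-stage shape, where classification happens once and the router is a plain lookup.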

Security First. Verified by Design.

When you unplug from the internet, cloud AI goes dark. Echo keeps running.
Model Name Sanitization
Path traversal blocked. Model names validated against whitelist.
Stack Trace Suppression
Zero information leakage. Errors sanitized before transmission.
System Prompt Protection
Injection-proof. Prompts immutable, role-based enforcement.
Constant-Time Comparison
Timing attack resistant. Key comparison in constant time.
Air-Gappable
Works offline. Zero dependencies on external APIs or cloud.
FIPS 140-2 Path
Cryptographic foundation ready for certification.
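Three of the controls above (model name sanitization, constant-time key comparison, and stack trace suppression) can be sketched in a few lines of Python. This is an illustration of the general techniques, not Echo's implementation: the allowlist contents, the name pattern, the error body shape, and all function names are assumptions. `hmac.compare_digest` is the standard library's constant-time comparison.

```python
import hmac
import logging
import re

log = logging.getLogger("echo.sketch")

# Hypothetical allowlist; a real deployment would load it from the model registry.
ALLOWED_MODELS = frozenset({"casa-mini-9b", "edge-1"})
# Plain identifiers only: no slashes, so "../" style paths never match.
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}")


def validate_model_name(name: str) -> str:
    """Reject anything that is not a plain, allowlisted model id."""
    if not _SAFE_NAME.fullmatch(name) or name not in ALLOWED_MODELS:
        raise ValueError("unknown model")  # generic message, no internal detail
    return name


def keys_match(presented: str, expected: str) -> bool:
    """Compare API keys without leaking the mismatch position through timing."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def safe_error(exc: Exception) -> dict:
    """Log the full exception server-side; return a body with no traceback."""
    log.error("request failed", exc_info=exc)
    return {"error": {"message": "request failed", "type": "invalid_request"}}


print(validate_model_name("edge-1"))
try:
    validate_model_name("../../etc/passwd")
except ValueError as err:
    print("rejected:", err)
print(keys_match("abc", "abc"), keys_match("abc", "abd"))
```

Checking the pattern before the allowlist is redundant by design: either check alone would reject a traversal attempt, and keeping both means a mistake in one does not open the path.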

From Zero to Inference in Four Hours

1. Install: Docker pull. One command. Echo runs on any Linux, macOS, or air-gapped system.
2. Load Models: Point to your GGUF models or pull from AXE Model Hub.
3. Query: curl the endpoint. Same API you already know.
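Between loading models and sending the first query, it can help to wait until the instance reports healthy. The sketch below polls a probe until it succeeds. It assumes that /health answers with HTTP 200 when the service is ready, which the page does not specify; `http_probe` and `wait_until_healthy` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <url> answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(
    probe: Callable[[], bool], attempts: int = 30, delay: float = 1.0
) -> bool:
    """Call probe until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # no sleep after the final failed attempt
    return False


# Demonstration with a fake probe that succeeds on the third call.
results = iter([False, False, True])
print(wait_until_healthy(lambda: next(results), attempts=5, delay=0))  # True
```

Against a live instance the probe would be something like `lambda: http_probe("https://<your-instance>/health")`.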
Request access to a private deployment →

/ Contact · we read every inquiry

Talk to AXE.

Demos, partnerships, government RFPs, technical questions. A person reads every form. You hear from someone — not a queue.


Replies within one business day · Knox audit chain records every inquiry