echo.route(model=edge-1, stream=true)
↳ shield.verify(scope=inference) ok
↳ tower.policy(actor=jamie) allow
↳ crown.retrieve(top_k=5) 0.82
↳ edge-1.generate(tokens=1847)
↳ knox.audit.append(0x3a8f...)

THE CANADIAN INFERENCE ENGINE

P95 < 200ms LOCAL LATENCY

OAI-COMPATIBLE DROP-IN SDK CONTRACT

ECHO

The Canadian inference engine. OpenAI-compatible. Multi-model routing across the Edge family.

ECHO · /0.1

OpenAI-Compatible. Canadian by Default.

Drop-in replacement for cloud AI APIs. Same endpoints, same SDKs, same developer experience. But every token stays on your infrastructure. Every model is yours.

/v1/chat/completions — streaming inference
/v1/models — model registry
/v1/embeddings — vector generation
/health — system status
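As a rough sketch of how these four paths might be addressed from a client, the snippet below builds (but does not send) requests with Python's standard library. The host `echo.castle.local` comes from the curl example on this page. The helper name `build_request`, the embedding model name, and the sample input are illustrative assumptions. The embeddings body follows the OpenAI request shape that Echo describes itself as compatible with, and whether `/health` requires a key is not stated here.

```python
import json
import urllib.request

BASE_URL = "https://echo.castle.local"  # host from the curl example; replace with your instance


def build_request(path, api_key, payload=None):
    """Build (but do not send) a request for one of the Echo endpoints.

    `build_request` is a hypothetical helper, not part of any Echo SDK.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    # urllib uses POST when a body is present and GET otherwise.
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


health = build_request("/health", "YOUR_API_KEY")
models = build_request("/v1/models", "YOUR_API_KEY")
embeddings = build_request(
    "/v1/embeddings",
    "YOUR_API_KEY",
    # Assumed model name; use whatever /v1/models reports on your instance.
    {"model": "your-embedding-model", "input": "Q3 procurement summary"},
)

for req in (health, models, embeddings):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it. The sketch stops short of that so it can be read and run without a live instance.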

$ curl https://echo.castle.local/v1/chat/completions \
    -H "Authorization: Bearer $CASTLE_KEY" \
    -d '{
      "model": "edge-1",
      "stream": true,
      "messages": [{ "role": "user", "content": "Analyze Q3 defense procurement" }]
    }'

data: {"choices":[{"delta":{"content":"Based on"}}]}
data: {"choices":[{"delta":{"content":" the latest"}}]}
data: {"choices":[{"delta":{"content":" PSPC data"}}]}
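The `data:` lines in the response above use the server-sent-event framing familiar from OpenAI-style streaming. For clients that read the raw stream rather than going through an SDK, a minimal sketch of reassembling the text might look like this. The function name `collect_stream_text` is hypothetical, and the `[DONE]` sentinel is the OpenAI convention, assumed rather than confirmed for Echo.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta content from OpenAI-style `data: {...}` stream lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel in the OpenAI API (assumed here)
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# The three chunks shown in the curl output, plus the assumed sentinel.
sample = [
    'data: {"choices":[{"delta":{"content":"Based on"}}]}',
    'data: {"choices":[{"delta":{"content":" the latest"}}]}',
    'data: {"choices":[{"delta":{"content":" PSPC data"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # Based on the latest PSPC data
```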

Quick Start

OpenAI-compatible. Drop in your existing code. Every token stays in your custody.

# Streaming inference: OpenAI-compatible endpoint
curl -X POST https://<your-instance>/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "casa-mini-9b",
    "messages": [{"role": "user", "content": "Analyze this document for PII"}],
    "stream": true
  }'

# Python: uses the official OpenAI SDK
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-instance>/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="casa-mini-9b",
    messages=[{"role": "user", "content": "Analyze this document for PII"}],
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

// TypeScript: uses the official OpenAI SDK
import OpenAI from 'openai';

const echo = new OpenAI({
  baseURL: 'https://<your-instance>/v1',
  apiKey: 'YOUR_API_KEY',
});

const stream = await echo.chat.completions.create({
  model: 'casa-mini-9b',
  messages: [{ role: 'user', content: 'Analyze this document for PII' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Works with any OpenAI-compatible SDK. Swap base_url to your Echo instance and every token stays on your infrastructure.

Canadian Models. Trained on Your Data.

Casanova 2.0

70B Parameters

Flagship. Complex reasoning, research, strategic analysis. 8 tok/s on consumer hardware.

Geralt 2.0

11B Parameters

Balanced. Code generation, document analysis, reasoning. 25 tok/s.

Anakin 8B

8B Parameters

Fast. Chat, tool calling, real-time inference. 44 tok/s.

Intelligent Routing. Automatic Model Selection.

Every request is classified in real time. The right model for the right task. Always.

Request (User Query) → Classifier (Maestro) → Router (Best Model)

Classification time: <50ms. Response time: <200ms P95.
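To make the classify-then-route flow concrete, here is a deliberately simplified sketch. It is not how Maestro works, since this page does not describe the classifier's internals. A keyword lookup stands in for the classifier. The task categories are drawn from the model descriptions above, and the model identifiers are placeholders rather than confirmed IDs.

```python
# Task categories mirror the model descriptions on this page; the
# identifiers on the right are placeholders, not confirmed model IDs.
ROUTES = {
    "reasoning": "casanova-2.0",
    "code": "geralt-2.0",
    "chat": "anakin-8b",
}

# Toy keyword lists standing in for a real classifier.
KEYWORDS = {
    "code": ("function", "bug", "refactor", "stack trace"),
    "reasoning": ("analyze", "strategy", "research", "compare"),
}


def classify(query):
    """Return a task label for the query; defaults to 'chat'."""
    text = query.lower()
    for label, words in KEYWORDS.items():
        if any(word in text for word in words):
            return label
    return "chat"


def route(query):
    """Pick a model for the query from its task label."""
    return ROUTES[classify(query)]


print(route("Refactor this function to remove the bug"))  # geralt-2.0
print(route("Analyze Q3 defense procurement"))            # casanova-2.0
print(route("Hi, what time is it in Ottawa?"))            # anakin-8b
```

A production classifier would more likely be a small model than a keyword table, but the contract is the same: a label comes out of the classifier, and the router maps that label to a model.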

Security First. Verified by Design.

When you unplug from the internet, cloud AI goes dark. Echo keeps running.

Model Name Sanitization

Path traversal blocked. Model names validated against whitelist.

Stack Trace Suppression

Zero information leakage. Errors sanitized before transmission.

System Prompt Protection

Injection-proof. Prompts immutable, role-based enforcement.

Constant-Time Comparison

Timing attack resistant. Key comparison in constant time.

Air-Gappable

Works offline. Zero dependencies on external APIs or cloud.

FIPS 140-2 Path

Cryptographic foundation ready for certification.
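Three of the measures above (model name validation, constant-time key comparison, and error sanitization) are common enough patterns to sketch in a few lines. The snippet below is an illustration under stated assumptions, not Echo's implementation. The function names, the name pattern, and the error body shape are all hypothetical. The allowlist holds the two model names used in this page's examples.

```python
import hmac
import re

ALLOWED_MODELS = {"edge-1", "casa-mini-9b"}  # names from this page's examples
# Assumed naming rule: lowercase letters, digits, dot, dash, underscore.
MODEL_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{0,63}")


def validate_model_name(name):
    """Return the name if it is well-formed and allowlisted, else raise ValueError."""
    # The pattern has no '/' or '\\', so path separators never pass;
    # the explicit '..' check is belt and braces.
    if not MODEL_NAME_PATTERN.fullmatch(name) or ".." in name:
        raise ValueError("unknown model")
    if name not in ALLOWED_MODELS:
        raise ValueError("unknown model")
    return name


def keys_match(presented, expected):
    """Compare API keys without revealing where the first mismatch occurs."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def sanitize_error(exc):
    """Return (status, body) with a fixed message; never include the traceback."""
    if isinstance(exc, ValueError):
        return 400, {"error": {"message": "invalid request"}}
    return 500, {"error": {"message": "internal error"}}


print(validate_model_name("edge-1"))
print(keys_match("secret-key", "secret-key"))
print(sanitize_error(RuntimeError("connection string leaked here")))
```

`hmac.compare_digest` is the standard-library primitive for this kind of comparison. Using `==` on strings can return early at the first differing character, which is the timing signal the constant-time check is meant to remove.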

From Zero to Inference in Four Hours

Install

Docker pull. One command. Echo runs on any Linux, macOS, or air-gapped system.

Load Models

Point to your GGUF models or pull from AXE Model Hub.

Query

curl the endpoint. Same API you already know.

Ready to deploy Canadian inference?

Launch Echo →

/ Contact · we read every inquiry

Talk to .

Demos, partnerships, government RFPs, technical questions. A person reads every form. You hear from someone — not a queue.