Technical deep dive

How Memo AI actually works.

Seven AI model tiers (including Auto-router). Persistent memory, RAG knowledge base, voice loop and Artifacts panel in v2.0. 40+ engines across 9 infrastructure partners. FLUX.2 klein for image editing, Gemini for live search, DeepSeek V3.2 for frontier reasoning. £0/month.

The request flow.

Every message you send travels through 8 stages. Most stages complete in under 200 milliseconds.

01

User sends message

Frontend captures input, attachments (PDF/Excel/Word/images), voice, or drag-drop. Streams via SSE.

02

Quota check

Per-user, per-model daily limit verified. 1K Smart · 500 Reasoner · 1K Live · 5K Fast · 800 Coder · 500 Vision.

03

Intent detection

Auto-detect: image generation? Image editing? Web search? Vision? Document analysis? Code? Plain chat?

04

Cascade routing

Selected model's cascade fires. Multiple API connections across 7 providers rotate via round-robin — no single connection is always first.

05

Inter-model fallback

If a provider returns a 429 or 5xx response, the request falls through to the next cascade step in <50ms. Smart has 11 steps, Reasoner 9, Fast 9, Coder 9, Vision 8, Live 4.

06

Stream tokens

Real-time SSE streaming. <think> blocks are separated and hidden behind a toggle. Each reply shows its backend label and latency.

07

Markdown rendering

Custom parser: tables, syntax-highlighted code blocks, lists, images. Theme-aware styling in light and dark mode.

08

Persist & log

Chat history saved to encrypted Supabase. File attachments persist in Cloudflare R2. Events logged for admin dashboard.
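
Stage 06 separates <think> blocks from the visible answer. A minimal sketch of that split, assuming the reasoning arrives wrapped in literal <think>…</think> tags and working on a completed reply rather than a live stream. The function name `split_reasoning` is illustrative, not Memo AI's actual code.

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(reply: str) -> tuple[str, str]:
    """Return (visible_text, hidden_reasoning) for a completed reply."""
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(reply))
    visible = THINK_RE.sub("", reply).strip()
    return visible, reasoning


visible, reasoning = split_reasoning(
    "<think>User wants a total.</think>The total is 42."
)
print(visible)    # shown in the chat bubble
print(reasoning)  # shown only when the toggle is opened
```

A streaming version would need to buffer partial tags across chunks. This sketch skips that on purpose.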

Model parameters.

Specs for every brain inside Memo AI — total parameters, active parameters, context window and primary strength.

Smart
DeepSeek V3.2 (685B MoE)
Total params: 685B
Active: 37B
Context: 128K
Knowledge: April 2025
General · Writing · Reasoning
Reasoner
DeepSeek V3.2 + GPT-OSS 120B
Total params: 685B
Active: 37B
Context: 128K
Knowledge: December 2024
Code · Logic · Maths
Live
Gemini 2.5 Flash + Search
Total params: not listed
Active: not listed
Context: 1M
Knowledge: Real-time
Current events · Web search
Fast
Cerebras Llama 3.1 8B
Total params: 8B
Active: 8B
Context: 128K
Knowledge: December 2023
Speed · Quick lookups
Coder
DeepSeek V3.2 (685B MoE)
Total params: 685B
Active: 37B
Context: 128K
Knowledge: April 2025
Code · Refactor · Debug
Vision
SambaNova Llama 4 Maverick
Total params: 400B
Active: 17B
Context: 128K
Knowledge: August 2024
Image OCR · Visual analysis

Model benchmarks.

How our primary AI engines compare to the industry leaders. These are real benchmark scores from published evaluations — not marketing claims.

Benchmark | DeepSeek V3.2 | GPT-4o | Claude Sonnet | Gemini 2.5
MMLU (knowledge) | 87.1% | 88.7% | 88.7% | 90.0%
HumanEval (code) | 92.7% | 90.2% | 92.0% | 89.5%
MATH-500 (maths) | 90.2% | 76.6% | 78.3% | 83.2%
SWE-bench (coding) | 42.0% | 33.2% | 49.0% | 38.8%
Arena ELO (overall) | 1318 | 1287 | 1271 | 1299
Context window | 128K | 128K | 200K | 1M
Cost to you | £0 | £20+/user | £18+/user | £18+/user

DeepSeek V3.2 (685B MoE, 37B active) is the primary Smart, Reasoner, and Coder engine in Memo AI. It is a mixture-of-experts model: each token activates only 37 billion parameters, selected from the full 685B. In the table above it outperforms GPT-4o on maths (MATH-500, 90.2% vs 76.6%) and code (HumanEval, 92.7% vs 90.2%). Memo AI gets it for free via SambaNova Cloud.
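
To put the mixture-of-experts figures in proportion, here is a short arithmetic sketch using only the two parameter counts quoted above. The variable names are illustrative.

```python
# Parameter counts in billions, as quoted for DeepSeek V3.2 above.
TOTAL_PARAMS_B = 685
ACTIVE_PARAMS_B = 37

# Share of the model's weights used for any single token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"{active_fraction:.1%} of parameters are active per token")
```

Roughly one parameter in eighteen is used for each token, which is why a 685B model can be served at speeds closer to a much smaller dense model.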

Cascade depth & capacity.

How many free-tier engines each model can fall back through, and how many messages each user gets per day. More depth = more resilient. More capacity = more freedom.

Cascade depth
Engines per model. If one fails, the next fires in <50ms.
Smart: 11 engines
Coder: 9 engines
Reasoner: 9 engines
Fast: 9 engines
Vision: 8 engines
Live: 4 engines
Daily capacity
Messages per user per day (10 active users)
Fast: 5,000 /day
Auto: 2,000 /day
Smart: 1,000 /day
Live: 1,000 /day
Coder: 800 /day
Reasoner: 500 /day
Vision: 500 /day
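
The per-user daily limits above could be enforced with a counter keyed by user, model and day. This is a minimal in-memory sketch, not Memo AI's actual quota check. `QuotaTracker` and `try_consume` are hypothetical names, and a real deployment would presumably keep the counts in the database rather than in process memory.

```python
from collections import defaultdict
from datetime import date

# Per-user daily limits, copied from the Daily capacity figures above.
DAILY_LIMITS = {
    "fast": 5000, "auto": 2000, "smart": 1000, "live": 1000,
    "coder": 800, "reasoner": 500, "vision": 500,
}


class QuotaTracker:
    """Counts messages per (user, model, day) and refuses once the limit is hit."""

    def __init__(self, limits: dict[str, int]):
        self.limits = limits
        self.used: dict[tuple[str, str, date], int] = defaultdict(int)

    def try_consume(self, user: str, model: str, day: date) -> bool:
        key = (user, model, day)
        if self.used[key] >= self.limits[model]:
            return False  # quota exhausted for today
        self.used[key] += 1
        return True


tracker = QuotaTracker(DAILY_LIMITS)
allowed = tracker.try_consume("sam@example.com", "vision", date(2026, 5, 12))
```

Because the day is part of the key, counts reset naturally at the date boundary without a cleanup job.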

Multi-provider redundancy.

Every provider runs independently. If Groq goes down entirely, SambaNova picks up. If SambaNova is busy, Cerebras fires. If all primary providers fail, OpenRouter's 17 :free models catch the request. Total redundancy.

API key distribution
Multiple connections per provider. Each rotates per request to spread load.
Google (Gemini 2.5 Flash, Flash Lite, Gemini 3 Flash Preview): 11 connections
Tavily (web search and research for Live mode): 6 connections
SambaNova (DeepSeek V3.2/V3.1, Llama 4 Maverick): 4 connections
Cerebras (Qwen 3 235B, Llama 3.1 8B at 2,000 tok/sec): 4 connections
Cloudflare (FLUX.2 klein 9B/4B image gen + R2 file storage): 4 connections
OpenRouter (GPT-OSS, Nemotron, GLM-4.5, Gemma 4, Arcee Trinity): 3 connections
Groq (GPT-OSS 120B/20B, Llama 3.3/3.1, Qwen 3, Scout): 2 connections
Anthropic (Claude Haiku 4.5, final OCR fallback): 1 connection
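
Rotating connections per request is classic round-robin. This is a minimal sketch of the idea, assuming each provider keeps a simple list of connection labels. `ConnectionPool` and the key names are placeholders, not the real configuration.

```python
from itertools import cycle


class ConnectionPool:
    """Hands out a provider's connections in strict rotation."""

    def __init__(self, connections: list[str]):
        if not connections:
            raise ValueError("a pool needs at least one connection")
        self._rotation = cycle(connections)

    def next_connection(self) -> str:
        return next(self._rotation)


# Two placeholder connections, matching the count listed for Groq above.
groq_pool = ConnectionPool(["groq-key-1", "groq-key-2"])
picks = [groq_pool.next_connection() for _ in range(3)]
print(picks)
```

With strict rotation no single connection is always first, so rate-limit pressure spreads evenly across a provider's connections.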

Throughput per provider.

Combined requests per minute (RPM) and tokens per minute (TPM) across all connections per provider. Higher = more concurrent users supported without rate limits.

Requests per minute
Combined RPM across all connections
Groq: 540 rpm
Cerebras: 120 rpm
SambaNova: 40 rpm
Gemini: 120 rpm
OpenRouter: 40 rpm
Tokens per minute
Combined TPM capacity
Groq: 2.7M tpm
Cerebras: 4.0M tpm
SambaNova: 2.0M tpm
Gemini: 3.0M tpm
OpenRouter: 1.0M tpm
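
Adding the per-provider figures gives a rough ceiling for the whole stack. The sketch below only sums the numbers listed above. It assumes every provider could serve any request, which overstates real capacity because each model tier uses only some of them.

```python
# Combined requests per minute, per provider (from the list above).
RPM = {"Groq": 540, "Cerebras": 120, "SambaNova": 40, "Gemini": 120, "OpenRouter": 40}

# Combined tokens per minute, in millions (from the list above).
TPM_MILLIONS = {"Groq": 2.7, "Cerebras": 4.0, "SambaNova": 2.0, "Gemini": 3.0, "OpenRouter": 1.0}

total_rpm = sum(RPM.values())
total_tpm_millions = round(sum(TPM_MILLIONS.values()), 1)

print(f"{total_rpm} rpm, {total_tpm_millions}M tpm across all providers")
```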

Performance by the numbers.

Time to first token (lower is faster) and tokens per second (higher is faster). Measured live via scripts/full-key-audit.mjs on 12 May 2026.

Time to first token
Lower = snappier feel
Smart: 813 ms
Reasoner: 850 ms
Live: 2291 ms
Fast: 41 ms
Vision: 44 ms
Tokens per second
Higher = streams faster
Fast: 2000 tok/s
Vision: 180 tok/s
Reasoner: 120 tok/s
Smart: 85 tok/s
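
The two measurements combine into a rough end-to-end estimate: time to first token plus output length divided by streaming speed. This sketch assumes a constant streaming rate and ignores network overhead. Live is left out because no tokens-per-second figure is listed for it.

```python
# Measured figures from the two lists above.
TTFT_MS = {"Smart": 813, "Reasoner": 850, "Fast": 41, "Vision": 44}
TOKENS_PER_SEC = {"Smart": 85, "Reasoner": 120, "Fast": 2000, "Vision": 180}


def estimated_seconds(tier: str, output_tokens: int) -> float:
    """Rough time to finish a reply: first-token wait plus streaming time."""
    return TTFT_MS[tier] / 1000 + output_tokens / TOKENS_PER_SEC[tier]


for tier in ("Fast", "Smart"):
    print(f"{tier}: ~{estimated_seconds(tier, 500):.2f} s for a 500-token reply")
```

For long replies the tokens-per-second figure dominates. For short ones, time to first token is most of what you feel.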

Token economics.

Total daily processing capacity across all models, all providers, all engines combined. And what it costs Memo Fashion.

🧠
37
Live models
Across Groq, Cerebras, SambaNova, Gemini, OpenRouter & Cloudflare
📊
25M
Tokens / day
Combined free-tier throughput — ~90 novels every day
🔑
32
Failover connections
Groq · Cerebras · SambaNova · Gemini · OpenRouter · Cloudflare · Tavily
💰
£0
Cost / month
Free-tier infrastructure, including FLUX.2 klein image gen + edit

Memo AI vs the rest.

How Memo AI stacks up against the leading commercial AI products. We deliberately built the features that matter most for Memo Fashion's workflow.

Feature | Memo AI | ChatGPT | Claude | Gemini
Real-time streaming
7 models in one app
Web search built in
Image generation (FLUX.2)
Image editing (FLUX.2 klein)
Document upload (10 files)
Excel/Word/PDF download
Attachments persist in chat
Memory across all chats
50 saved conversations
Visible reasoning trace
Temporary chat
Light & dark mode
PWA installable
Voice input
Drag & drop files
Custom branding
Fully owned & in-house
Cost to Memo Fashion | £0 | £20+/user | £18+/user | £18+/user

Daily limits.

Realistic limits for 15 staff on 100% free-tier infrastructure, with team totals calculated for 10 active users. Every model cascades 4 to 11 engines deep. If one provider is rate-limited, the next fires in under 50ms. You'll never notice.

Smart
1,000 messages / user / day
10,000 total across 10 active users · SambaNova DeepSeek V3.2 (685B MoE) · Groq GPT-OSS 120B · Llama 4 Maverick · DeepSeek V3.1 · Cerebras Qwen 3 235B · Groq + OpenRouter fallbacks
Reasoner
500 messages / user / day
5,000 total across 10 active users · SambaNova DeepSeek V3.2 · DeepSeek V3.1 · V3.1-cb · Groq GPT-OSS 120B · Cerebras Qwen 3 235B · Llama 4 Maverick
Live
1,000 messages / user / day
10,000 total across 10 active users · Gemini (Flash + Lite + Gemini 3) with 11 rotating connections · Tavily search
Fast
5,000 messages / user / day
50,000 total across 10 active users · Cerebras Llama 3.1 8B (2,000 tok/sec) · Groq GPT-OSS 20B (41ms) · Llama 3.1 8B · OpenRouter
Vision
500 messages / user / day
5,000 total across 10 active users · SambaNova Llama 4 Maverick · Gemini 2.5 Flash + Flash Lite · Groq Scout · Gemma 4 · Nemotron VL
Coder
800 messages / user / day
8,000 total across 10 active users · SambaNova DeepSeek V3.2 · Groq GPT-OSS 120B · Cerebras Qwen 3 235B · DeepSeek V3.1 · Groq Qwen 3 32B · OpenRouter :free (GLM-4.5, GPT-OSS, Qwen3 Coder, Arcee)
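
The team totals quoted for each tier equal the per-user limit multiplied by 10, the active-user count given with the Daily capacity chart. A quick sketch of that arithmetic, with illustrative names:

```python
# Active-user count stated alongside the Daily capacity chart.
ACTIVE_USERS = 10

# Per-user daily limits for each tier listed above.
PER_USER_LIMIT = {
    "Smart": 1000, "Reasoner": 500, "Live": 1000,
    "Fast": 5000, "Vision": 500, "Coder": 800,
}

team_totals = {tier: limit * ACTIVE_USERS for tier, limit in PER_USER_LIMIT.items()}

for tier, total in team_totals.items():
    print(f"{tier}: {total:,} messages / day")
```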

Never fails.

Every mode has a deep cascade of free-tier engines. If Groq is down, SambaNova fires. If SambaNova is busy, Cerebras takes over. If all primary providers exhaust, OpenRouter :free models catch the request. Under 50ms to switch. Users never see an error — just a different engine label.

SMART CASCADE (11 STEPS)
1. SambaNova DeepSeek V3.2 (685B)
2. Groq GPT-OSS 120B
3. SambaNova Llama 4 Maverick
4. SambaNova DeepSeek V3.1
5. Cerebras Qwen 3 235B
6. Groq Llama 3.3 70B
7. OpenRouter GPT-OSS 120B :free
8. OpenRouter Nemotron 120B :free
9. OpenRouter GLM-4.5 Air :free
10. OpenRouter Arcee Trinity :free
11. OpenRouter Gemma 3 12B :free
FAST CASCADE (9 STEPS)
1. Cerebras Llama 3.1 8B (2,000 tok/sec)
2. Groq GPT-OSS 20B (41ms)
3. Groq Llama 3.1 8B Instant
4. OpenRouter Nemotron Nano 9B :free
5. OpenRouter Liquid LFM 2.5 :free
6. OpenRouter GPT-OSS 20B :free
7. OpenRouter Gemma 3 4B :free
8. OpenRouter Gemma 3n 4B :free
9. OpenRouter Gemma 3n 2B :free
LIVE CASCADE (4 STEPS)
1. Gemini 2.5 Flash + Google Search
2. Gemini 2.5 Flash Lite + Search
3. Gemini 3 Flash Preview + Search
4. Groq GPT-OSS 120B + Tavily
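
The cascades above share one pattern: try each engine in order and move on when a provider answers with 429 or a 5xx status. This is a minimal sketch of that loop, assuming each step is a plain callable. `ProviderError`, `run_cascade` and the stub engines are hypothetical, not Memo AI's actual routing code.

```python
from typing import Callable


class ProviderError(Exception):
    """Raised by a step when the provider responds with an HTTP error."""

    def __init__(self, status: int):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    # Rate limits (429) and server errors (5xx) trigger the next step.
    return status == 429 or 500 <= status <= 599


def run_cascade(steps: list[tuple[str, Callable[[str], str]]], prompt: str) -> tuple[str, str]:
    """Return (engine_label, reply) from the first step that succeeds."""
    last_error = None
    for label, call in steps:
        try:
            return label, call(prompt)
        except ProviderError as err:
            if not is_retryable(err.status):
                raise  # e.g. a 400 is a caller bug, not a capacity problem
            last_error = err
    raise RuntimeError("all cascade steps failed") from last_error


# Stub engines standing in for real API calls.
def rate_limited(prompt: str) -> str:
    raise ProviderError(429)


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


label, reply = run_cascade(
    [("SambaNova DeepSeek V3.2", rate_limited), ("Groq GPT-OSS 120B", healthy)],
    "hi",
)
print(label, "->", reply)
```

Returning the label with the reply is what lets the interface show which engine actually answered.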

Your data stays yours.

Chat history is encrypted in Supabase (PostgreSQL). File attachments are stored in Cloudflare R2 with per-user isolation. No data is sent to consumer AI products (ChatGPT, Claude.ai, Gemini app, Copilot). All processing routes through API-only inference endpoints with explicit no-training contracts — Anthropic Claude Haiku is used only as an emergency OCR fallback for receipts, under Anthropic's API privacy policy (zero retention, no training).

🔒
Encrypted
Chat history
Supabase PostgreSQL with RLS policies
☁️
R2
File storage
Cloudflare R2 · 10GB free · unlimited egress
🚫
Zero
Training on your data
API-only inference — providers cannot use your inputs
🏢
Owned
By Memo Fashion
Source code, deployment, data — 100% internal
