How Memo AI actually works.
Seven AI model tiers (including Auto-router). Persistent memory, RAG knowledge base, voice loop and Artifacts panel in v2.0. 40+ engines across 9 infrastructure partners. FLUX.2 klein for image editing, Gemini for live search, DeepSeek V3.2 for frontier reasoning. £0/month.
The request flow.
Every message you send travels through 8 stages. Most complete in under 200 milliseconds.
User sends message
Frontend captures input, attachments (PDF/Excel/Word/images), voice, or drag-drop. Streams via SSE.
Quota check
Per-user, per-model daily limit verified. 1K Smart · 500 Reasoner · 1K Live · 5K Fast · 800 Coder · 500 Vision.
Intent detection
Auto-detect: image generation? Image editing? Web search? Vision? Document analysis? Code? Plain chat?
Cascade routing
Selected model's cascade fires. Multiple API connections across 7 providers rotate via round-robin — no single connection is always first.
Inter-model fallback
If provider returns 429/5xx, falls through to the next cascade step in <50ms. Smart has 11 steps, Reasoner 9, Fast 9, Coder 9, Vision 8, Live 4.
Stream tokens
Real-time SSE streaming. <think> blocks are separated and hidden behind a toggle. Backend label + latency shown per reply.
Markdown rendering
Custom parser: tables, syntax-highlighted code blocks, lists, images. Theme-aware styling in light and dark mode.
Persist & log
Chat history saved to encrypted Supabase. File attachments persist in Cloudflare R2. Events logged for admin dashboard.
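To make stages 2 and 4 concrete, here is a minimal sketch of a per-user, per-model quota check and a round-robin connection picker. It is an illustration under stated assumptions, not Memo AI's actual code: the class names, the in-memory counter and the connection labels are hypothetical, and only the daily limits come from the quota-check stage above.

```python
from collections import defaultdict
from itertools import cycle

# Daily per-user limits as quoted in the quota-check stage.
DAILY_LIMITS = {
    "smart": 1000, "reasoner": 500, "live": 1000,
    "fast": 5000, "coder": 800, "vision": 500,
}


class QuotaTracker:
    """Hypothetical in-memory counter; a real service would persist and reset it daily."""

    def __init__(self, limits):
        self.limits = limits
        self.used = defaultdict(int)  # (user, model) -> messages sent today

    def try_consume(self, user, model):
        """Count the message and return True if the user is still under the limit."""
        key = (user, model)
        if self.used[key] >= self.limits[model]:
            return False
        self.used[key] += 1
        return True


class RoundRobin:
    """Hands out connections in rotation so no single one is always first."""

    def __init__(self, connections):
        self._cycle = cycle(connections)

    def next(self):
        return next(self._cycle)


if __name__ == "__main__":
    tracker = QuotaTracker(DAILY_LIMITS)
    print(tracker.try_consume("alice", "vision"))  # first message of the day is allowed

    # Connection labels are placeholders, not real provider keys.
    rotation = RoundRobin(["conn-a", "conn-b", "conn-c"])
    print([rotation.next() for _ in range(4)])
```

The rotation wraps around after the last connection, which is what keeps any single connection from always taking the first request.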
Model parameters.
Specs for every brain inside Memo AI — total parameters, active parameters, context window and primary strength.
Model benchmarks.
How our primary AI engines compare to the industry leaders. These are real benchmark scores from published evaluations — not marketing claims.
| Benchmark | DeepSeek V3.2 | GPT-4o | Claude Sonnet | Gemini 2.5 |
|---|---|---|---|---|
| MMLU (knowledge) | 87.1% | 88.7% | 88.7% | 90.0% |
| HumanEval (code) | 92.7% | 90.2% | 92.0% | 89.5% |
| MATH-500 (maths) | 90.2% | 76.6% | 78.3% | 83.2% |
| SWE-bench (coding) | 42.0% | 33.2% | 49.0% | 38.8% |
| Arena Elo (overall) | 1318 | 1287 | 1271 | 1299 |
| Context window | 128K | 128K | 200K | 1M |
| Cost to you | £0 | £20+/user | £18+/user | £18+/user |
DeepSeek V3.2 (685B MoE, 37B active) is the primary Smart, Reasoner, and Coder engine in Memo AI. It is a mixture-of-experts model: only 37 billion of its 685 billion parameters are active for any given token, so per-token compute is closer to that of a 37B model while the full parameter set stays available for routing. In the table above it outperforms GPT-4o on maths (MATH-500, 90.2% vs 76.6%) and code (HumanEval, 92.7% vs 90.2%). Memo AI gets it for free via SambaNova Cloud.
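As a quick worked example of what "37B active out of 685B" means, the arithmetic below computes the share of parameters used per token. Both figures come from the paragraph above; the function name is just for illustration.

```python
TOTAL_PARAMS_B = 685   # total parameters, in billions
ACTIVE_PARAMS_B = 37   # parameters active per token, in billions


def active_fraction(active_b, total_b):
    """Share of the model's parameters that take part in processing one token."""
    return active_b / total_b


share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"{share:.1%} of parameters are active per token")  # prints "5.4% ..."
```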
Cascade depth & capacity.
How many free-tier engines each model can fall back through (left), and how many messages each user gets per day (right). More depth = more resilient. More capacity = more freedom.
Multi-provider redundancy.
Every provider runs independently. If Groq goes down entirely, SambaNova picks up. If SambaNova is busy, Cerebras fires. If all primary providers fail, OpenRouter's 17 :free models catch the request. Total redundancy.
Throughput per provider.
Combined requests per minute (RPM) and tokens per minute (TPM) across all connections per provider. Higher = more concurrent users supported without rate limits.
Performance by the numbers.
Time to first token (lower is faster) and tokens per second (higher is faster). Measured live via scripts/full-key-audit.mjs on 12 May 2026.
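For readers who want to see how these two numbers can be derived from a token stream, here is a rough sketch. It is not the contents of scripts/full-key-audit.mjs; the function name, the injectable clock and the choice to measure throughput over the decode phase only (tokens after the first, divided by the time between first and last token) are all assumptions.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (time_to_first_token_seconds, tokens_per_second) for one stream."""
    start = clock()
    first_at = None
    last_at = None
    count = 0
    for _ in tokens:
        last_at = clock()          # timestamp of the token that just arrived
        if first_at is None:
            first_at = last_at     # the first token marks the end of the wait
        count += 1
    if first_at is None:
        raise ValueError("stream produced no tokens")
    ttft = first_at - start
    decode_time = last_at - first_at
    # A single-token stream has no decode phase, so report 0.0 rather than divide by zero.
    tps = (count - 1) / decode_time if decode_time > 0 else 0.0
    return ttft, tps


if __name__ == "__main__":
    def fake_stream():
        # Stand-in for an SSE token stream; the sleep imitates network latency.
        for word in ["Hello", ",", " world"]:
            time.sleep(0.01)
            yield word

    ttft, tps = measure_stream(fake_stream())
    print(f"TTFT {ttft * 1000:.0f} ms, {tps:.0f} tokens/s")
```

Passing the clock as a parameter keeps the measurement testable without real network calls.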
Token economics.
Total daily processing capacity across all models, all providers, all engines combined. And what it costs Memo Fashion.
Memo AI vs the rest.
How Memo AI stacks up against the leading commercial AI products. We deliberately built the features that matter most for Memo Fashion's workflow.
| Feature | Memo AI | ChatGPT | Claude | Gemini |
|---|---|---|---|---|
| Real-time streaming | ✓ | ✓ | ✓ | ✓ |
| 7 models in one app | ✓ | ✗ | ✗ | ✗ |
| Web search built in | ✓ | ✓ | ✗ | ✓ |
| Image generation (FLUX.2) | ✓ | ✓ | ✗ | ✓ |
| Image editing (FLUX.2 klein) | ✓ | ✓ | ✗ | ✓ |
| Document upload (10 files) | ✓ | ✗ | ✓ | ✗ |
| Excel/Word/PDF download | ✓ | ✓ | ✗ | ✗ |
| Attachments persist in chat | ✓ | ✓ | ✓ | ✗ |
| Memory across all chats | ✓ | ✓ | ✓ | ✗ |
| 50 saved conversations | ✓ | ✓ | ✓ | ✓ |
| Visible reasoning trace | ✓ | ✓ | ✓ | ✗ |
| Temporary chat | ✓ | ✓ | ✗ | ✗ |
| Light & dark mode | ✓ | ✓ | ✓ | ✓ |
| PWA installable | ✓ | ✓ | ✗ | ✗ |
| Voice input | ✓ | ✓ | ✗ | ✗ |
| Drag & drop files | ✓ | ✓ | ✓ | ✗ |
| Custom branding | ✓ | ✗ | ✗ | ✗ |
| Fully owned & in-house | ✓ | ✗ | ✗ | ✗ |
| Cost to Memo Fashion | £0 | £20+/user | £18+/user | £18+/user |
Daily limits.
Realistic limits for 15 staff on 100% free-tier infrastructure. Cascade depth ranges from 4 engines (Live) to 11 (Smart); if one provider is rate-limited, the next fires in under 50ms, so the switch normally shows up only as a different engine label.
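As a back-of-the-envelope check, the sketch below adds up the per-model daily limits quoted in the quota-check stage and multiplies by the 15 staff. It is an upper bound that assumes every person uses every quota in full, which is unlikely in practice; the variable names are illustrative.

```python
# Per-user daily limits from the quota-check stage of the request flow.
DAILY_LIMITS = {
    "smart": 1000, "reasoner": 500, "live": 1000,
    "fast": 5000, "coder": 800, "vision": 500,
}
STAFF = 15

per_user = sum(DAILY_LIMITS.values())   # messages one person could send per day
team_total = per_user * STAFF           # ceiling for the whole team per day

print(f"{per_user:,} messages per user per day")   # 8,800
print(f"{team_total:,} messages per day for {STAFF} staff")  # 132,000
```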
Never fails.
Every mode has a deep cascade of free-tier engines. If Groq is down, SambaNova fires. If SambaNova is busy, Cerebras takes over. If all primary providers are exhausted, OpenRouter :free models catch the request. Switching takes under 50ms. Users never see an error, just a different engine label.
2. Groq GPT-OSS 120B
3. SambaNova Llama 4 Maverick
4. SambaNova DeepSeek V3.1
5. Cerebras Qwen 3 235B
6. Groq Llama 3.3 70B
7. OpenRouter GPT-OSS 120B :free
8. OpenRouter Nemotron 120B :free
9. OpenRouter GLM-4.5 Air :free
10. OpenRouter Arcee Trinity :free
11. OpenRouter Gemma 3 12B :free
2. Groq GPT-OSS 20B (41ms)
3. Groq Llama 3.1 8B Instant
4. OpenRouter Nemotron Nano 9B :free
5. OpenRouter Liquid LFM 2.5 :free
6. OpenRouter GPT-OSS 20B :free
7. OpenRouter Gemma 3 4B :free
8. OpenRouter Gemma 3n 4B :free
9. OpenRouter Gemma 3n 2B :free
2. Gemini 2.5 Flash Lite + Search
3. Gemini 3 Flash Preview + Search
4. Groq GPT-OSS 120B + Tavily
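The fallback behaviour behind these cascades can be sketched as a simple loop: try each step in order and move on whenever a provider answers 429 or 5xx. This is a minimal illustration, assuming each step is a callable that returns an HTTP status and a body; the `Step` type, the stub providers and the error handling are hypothetical, and the engine labels are borrowed from the lists above purely as names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    label: str                                # engine label shown with the reply
    call: Callable[[str], Tuple[int, str]]    # returns (HTTP status, body)


def is_retryable(status: int) -> bool:
    """Rate limits (429) and server errors (5xx) trigger the next cascade step."""
    return status == 429 or 500 <= status <= 599


def run_cascade(steps: List[Step], prompt: str) -> Tuple[str, str]:
    """Return (engine label, reply) from the first step that succeeds."""
    for step in steps:
        status, body = step.call(prompt)
        if is_retryable(status):
            continue                          # fall through to the next engine
        if status == 200:
            return step.label, body
        raise RuntimeError(f"{step.label} returned non-retryable status {status}")
    raise RuntimeError("every cascade step was rate-limited or failing")


if __name__ == "__main__":
    # Stub providers standing in for real API calls.
    cascade = [
        Step("Groq GPT-OSS 120B", lambda p: (429, "")),
        Step("SambaNova Llama 4 Maverick", lambda p: (503, "")),
        Step("Cerebras Qwen 3 235B", lambda p: (200, f"echo: {p}")),
    ]
    label, reply = run_cascade(cascade, "hello")
    print(label, "->", reply)
```

In this sketch an exhausted cascade raises an error; a production router would decide how to surface that case to the user.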
Your data stays yours.
Chat history is encrypted in Supabase (PostgreSQL). File attachments are stored in Cloudflare R2 with per-user isolation. No data is sent to consumer AI products (ChatGPT, Claude.ai, Gemini app, Copilot). All processing routes through API-only inference endpoints with explicit no-training contracts. Anthropic Claude Haiku is used only as an emergency OCR fallback for receipts, under Anthropic's API privacy policy (zero retention, no training).
Explore more.
Dive deeper into Memo AI — browse the models, see the journey, or start chatting right now.
Ready to try it?
You're already part of the team. Sign in with your Memo email and start chatting.