Technical deep dive

How Memo AI actually works.

Five AI models. One unified interface. A custom-built cascade architecture, real-time streaming, and zero-cost operation. Here's the full picture.

The request flow.

Every message you send travels through 8 stages. Most stages complete in under 200 milliseconds.

01

User sends message

The frontend captures typed input, attachments, voice, or drag-and-drop files. The reply streams back over Server-Sent Events (SSE).
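As a rough illustration of what streaming over SSE involves, the sketch below parses a text stream into event payloads following the standard SSE framing (`data:` lines, events separated by a blank line). The function name `parse_sse` and the sample payloads are hypothetical; Memo AI's real client code is not shown in this page.

```python
def parse_sse(stream_text: str) -> list:
    """Return the data payload of each complete event in an SSE text stream."""
    events, data_lines = [], []
    for line in stream_text.splitlines():
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                events.append("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The spec strips a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ":") and other fields are ignored here.
    return events
```

Each returned string would correspond to one chunk of the reply; an event that is not yet terminated by a blank line is held back, as the SSE format requires.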

02

Quota check

Per-user, per-model rate limit verified against company-wide hardware capacity.
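A minimal sketch of a per-user, per-model quota check, assuming a sliding-window counter. The class name, the limit, and the window length are illustrative assumptions, and the company-wide capacity check mentioned above is left out for brevity.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class QuotaChecker:
    """Sliding-window limiter keyed by (user, model). Limits are illustrative."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # (user, model) -> request timestamps

    def allow(self, user: str, model: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[(user, model)]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Keying on the `(user, model)` pair means that hitting the limit on one model does not block the same user on another.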

03

Intent detection

Auto-detect: image generation? web search? vision? document analysis? plain chat?
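One simple way to approximate this step is a rule-based classifier. The sketch below is an assumption for illustration only: the keyword lists, file extensions, and intent labels are hypothetical, and the real detector may work quite differently.

```python
import re

IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".webp")
DOC_EXTS = (".pdf", ".docx", ".xlsx", ".csv")


def detect_intent(text, attachments=()):
    """Pick one of five intents from the message text and attachment names."""
    names = [a.lower() for a in attachments]
    # Attachments are checked first: an uploaded file is the strongest signal.
    if any(n.endswith(IMAGE_EXTS) for n in names):
        return "vision"
    if any(n.endswith(DOC_EXTS) for n in names):
        return "document_analysis"
    lowered = text.lower()
    if re.search(r"\b(draw|generate an image|create an image|illustrate)\b", lowered):
        return "image_generation"
    if re.search(r"\b(latest|today|news|current|search the web)\b", lowered):
        return "web_search"
    return "plain_chat"
```

For example, `detect_intent("Draw a red dress")` returns `"image_generation"`, while a message with no matching keyword or attachment falls back to `"plain_chat"`.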

04

Cascade routing

The selected model's primary engine is tried first. If it is rate-limited, the request rotates through 9 API keys.

05

Inter-model fallback

If all 9 keys are exhausted, the request falls through to the next model in the cascade chain, in under 50 ms.
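Stages 04 and 05 together can be sketched as two nested loops: rotate through the keys of the current model, then fall through to the next model. The names `cascade`, `RateLimited`, and the `call` callback are hypothetical stand-ins; no real provider API is assumed.

```python
class RateLimited(Exception):
    """Raised by a provider call when a key has hit its limit."""


def cascade(prompt, chain, call):
    """Return (model, reply) from the first key that is not rate-limited.

    chain: list of (model_name, [api_keys]) in fallback order.
    call:  function(model_name, api_key, prompt) -> str, raising RateLimited.
    """
    for model, keys in chain:
        for key in keys:
            try:
                return model, call(model, key, prompt)
            except RateLimited:
                continue  # rotate to the next key for this model
        # All keys for this model are exhausted: fall through to the next model.
    raise RuntimeError("every model in the cascade is rate-limited")
```

Returning the model name alongside the reply lets the caller show which model actually answered after a fallback.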

06

Stream tokens

The reasoning trace is separated from the answer: <think> blocks are captured and hidden behind a toggle.
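Separating the trace while tokens are still arriving is slightly tricky because a tag can be split across two chunks. The sketch below is one possible approach, not Memo AI's actual parser: the hypothetical `ThinkSplitter` buffers any tail that could be the start of a tag and routes everything else to either the answer or the reasoning trace.

```python
class ThinkSplitter:
    """Route streamed text into answer and reasoning, split on <think> tags."""

    OPEN, CLOSE = "<think>", "</think>"

    def __init__(self):
        self.answer = []
        self.reasoning = []
        self._buf = ""
        self._in_think = False

    def feed(self, chunk: str) -> None:
        self._buf += chunk
        while True:
            tag = self.CLOSE if self._in_think else self.OPEN
            sink = self.reasoning if self._in_think else self.answer
            idx = self._buf.find(tag)
            if idx != -1:
                # Full tag found: flush text before it and switch mode.
                sink.append(self._buf[:idx])
                self._buf = self._buf[idx + len(tag):]
                self._in_think = not self._in_think
                continue
            # Hold back a tail that could be the start of a tag split across chunks.
            keep = 0
            for n in range(min(len(tag) - 1, len(self._buf)), 0, -1):
                if tag.startswith(self._buf[-n:]):
                    keep = n
                    break
            cut = len(self._buf) - keep
            sink.append(self._buf[:cut])
            self._buf = self._buf[cut:]
            return

    def finish(self):
        """Flush the buffer and return (answer, reasoning)."""
        (self.reasoning if self._in_think else self.answer).append(self._buf)
        self._buf = ""
        return "".join(self.answer), "".join(self.reasoning)
```

A UI could render the answer list as it grows and keep the reasoning list behind the toggle.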

07

Markdown rendering

Custom parser handles tables, code blocks, lists. Theme-aware styling.

08

Persist & log

Saved to an encrypted Supabase database. The event is logged for the admin dashboard, and sources are cited.

Model parameters.

Specs for every brain inside Memo AI: total parameters, active parameters, context window, knowledge cutoff, and primary strength.

Smart
Kimi K2 (1T MoE)
Total params: 1T
Active: 32B
Context: 131K
Knowledge: April 2025
General · Writing · Reasoning
Reasoner
GPT-OSS 120B
Total params: 117B
Active: 5.1B
Context: 128K
Knowledge: October 2024
Code · Logic · Maths
Live
Gemini 2.5 Flash
Total params: Undisclosed
Active: Undisclosed
Context: 1M
Knowledge: Real-time
Current events · Web search
Fast
Llama 3.1 8B Instant
Total params: 8B
Active: 8B
Context: 128K
Knowledge: December 2023
Speed · Quick lookups
Vision
Llama 4 Scout 17B
Total params: 109B
Active: 17B
Context: 128K
Knowledge: August 2024
Image OCR · Visual analysis

Performance by the numbers.

Time to first token (lower is faster) and tokens per second (higher is faster). Measured across thousands of production requests.

Time to first token
Lower = snappier feel
Smart: 320 ms
Reasoner: 480 ms
Live: 850 ms
Fast: 95 ms
Vision: 410 ms

Tokens per second
Higher = streams faster
Fast: 750 tok/s
Vision: 180 tok/s
Reasoner: 120 tok/s
Smart: 85 tok/s
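The two metrics combine into a rough estimate of how long a full reply takes: time to first token plus output length divided by throughput. The sketch below uses the figures listed above; the function name and the 300-token reply length are assumptions for illustration. Live is omitted from the throughput table because no tokens-per-second figure is given for it.

```python
TTFT_MS = {"Smart": 320, "Reasoner": 480, "Live": 850, "Fast": 95, "Vision": 410}
TOKENS_PER_SEC = {"Fast": 750, "Vision": 180, "Reasoner": 120, "Smart": 85}


def estimated_response_ms(model: str, output_tokens: int) -> float:
    """Estimate total reply time: first-token latency plus streaming time."""
    streaming_ms = output_tokens * 1000 / TOKENS_PER_SEC[model]
    return TTFT_MS[model] + streaming_ms


for name in ("Fast", "Smart"):
    print(name, round(estimated_response_ms(name, 300)), "ms")
```

For a 300-token reply this gives about 495 ms on Fast and about 3,849 ms on Smart, which shows that throughput, not first-token latency, dominates for longer answers.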

Token economics.

Total daily processing capacity across all models, all keys, all engines combined. And what it costs Memo Fashion.

165K requests / day
Combined capacity across 5 models

16.2M tokens / day
~60 novels' worth of text every day

27 failover keys
9 keys × 3 providers

£0 cost / month
Free-tier infrastructure
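The headline figures can be cross-checked with a little arithmetic. The sketch below only derives numbers from the ones stated here; the variable names are illustrative, and the per-request average applies only if both daily caps were reached at the same time.

```python
requests_per_day = 165_000
tokens_per_day = 16_200_000
keys_per_provider, providers = 9, 3

# 9 keys on each of 3 providers.
failover_keys = keys_per_provider * providers

# Average token budget per request if both daily caps are fully used.
avg_tokens_per_request = tokens_per_day / requests_per_day

# Novel length implied by the "~60 novels" comparison.
tokens_per_novel = tokens_per_day / 60

print(failover_keys, round(avg_tokens_per_request, 1), int(tokens_per_novel))
```

This prints `27 98.2 270000`: 27 keys in total, roughly 98 tokens per request at full utilisation of both caps, and about 270,000 tokens per novel in the comparison.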

Memo AI vs the rest.

How Memo AI stacks up against the leading commercial AI products. We deliberately built the features that matter most for Memo Fashion's workflow.

Feature | Memo AI | ChatGPT | Claude | Gemini
Real-time streaming
5 models in one app
Web search built in
Image generation
Document upload (10 files)
Excel/Word/PDF download
Visible reasoning trace
Temporary chat
Light & dark mode
PWA installable
Voice input
Drag & drop files
Custom branding
Fully owned & in-house
Cost to Memo Fashion | £0 | £20+/user | £18+/user | £18+/user

Ready to try it?

You're already part of the team. Sign in with your Memo email and start chatting.

Sign in to Memo AI →