Billing
Memcone uses a compute unit model — you pay for work done, not for storage or number of memories.
Trial
Every new account gets 200 compute units to try Memcone — no credit card required. At typical usage patterns, that covers 50–200 API calls depending on call type.
Pricing
After the trial: $0.80 per 1,000 compute units.
| Call | Units | Notes |
|---|---|---|
| POST /v1/context (cache hit) | 1 | Served from Redis — fast and cheap |
| POST /v1/context (cache miss) | 3 | Full retrieval + cache population |
| POST /v1/context?mode=fresh | 5 | Bypasses cache, always recomputes |
| POST /v1/remember | 3 | Extraction, embedding, contradiction check |
| POST /v1/recall | 1 | Semantic search, no caching |
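The table can be turned into a small lookup for quick estimates. The sketch below is illustrative only: the key names (`context_hit`, `context_miss`, and so on) and the helper functions are made up for this example and are not part of the Memcone API; only the unit values, the $0.80 rate, and the 200-unit trial come from this page.

```python
# Unit costs copied from the pricing table; the key names are hypothetical labels.
UNIT_COST = {
    "context_hit": 1,    # POST /v1/context (cache hit)
    "context_miss": 3,   # POST /v1/context (cache miss)
    "context_fresh": 5,  # POST /v1/context?mode=fresh
    "remember": 3,       # POST /v1/remember
    "recall": 1,         # POST /v1/recall
}
PRICE_PER_1000_UNITS = 0.80  # USD, post-trial rate
TRIAL_UNITS = 200            # one-time trial allowance


def units_for(calls: dict[str, int]) -> int:
    """Total compute units for a mapping of call type -> number of calls."""
    return sum(UNIT_COST[kind] * count for kind, count in calls.items())


def cost_usd(units: int) -> float:
    """Post-trial cost in USD for a given number of compute units."""
    return units * PRICE_PER_1000_UNITS / 1000


def trial_calls(kind: str) -> int:
    """How many calls of a single type the trial allowance covers."""
    return TRIAL_UNITS // UNIT_COST[kind]
```

For example, `trial_calls("remember")` gives 66 and `trial_calls("recall")` gives 200, which shows how much trial coverage depends on call type.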
Estimating your bill
A typical SaaS AI app with 1,000 active users sending 10 messages/day:
- 10,000 /v1/remember calls × 3 units = 30,000 units
- 10,000 /v1/context calls at 70% hit rate:
  - 7,000 hits × 1 unit = 7,000
  - 3,000 misses × 3 units = 9,000
  - Subtotal: 16,000 units
- Total: ~46,000 units/day
Over a 30-day month, that's 1,380,000 units. At $0.80 per 1,000 units, the bill comes to about $1,104/month for 1,000 active users, or ~$1.10/user/month.
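The same arithmetic can be written as a function so you can plug in your own numbers. This is a rough sketch, not an official calculator: it assumes, as the example does, one /v1/remember call and one /v1/context call per message, no mode=fresh or /v1/recall traffic, and a 30-day month. The function name and return keys are invented for illustration.

```python
UNITS_REMEMBER = 3
UNITS_CONTEXT_HIT = 1
UNITS_CONTEXT_MISS = 3
PRICE_PER_1000_UNITS = 0.80  # USD


def monthly_estimate(users: int, messages_per_day: int,
                     hit_rate: float, days: int = 30) -> dict:
    """Estimate monthly units and cost for a chat-style workload.

    Assumes each message triggers one remember call and one context call.
    """
    calls = users * messages_per_day
    remember_units = calls * UNITS_REMEMBER

    hits = round(calls * hit_rate)
    misses = calls - hits
    context_units = hits * UNITS_CONTEXT_HIT + misses * UNITS_CONTEXT_MISS

    daily_units = remember_units + context_units
    monthly_units = daily_units * days
    monthly_cost = monthly_units * PRICE_PER_1000_UNITS / 1000
    return {
        "daily_units": daily_units,
        "monthly_units": monthly_units,
        "monthly_cost_usd": monthly_cost,
        "cost_per_user_usd": monthly_cost / users,
    }


estimate = monthly_estimate(users=1000, messages_per_day=10, hit_rate=0.70)
print(estimate)
```

With the inputs from the example (1,000 users, 10 messages/day, 70% hit rate), this reproduces the 46,000 units/day and roughly $1,104/month figures.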
Reducing costs
The single biggest lever is cache hit rate. Every cache hit costs 1 unit instead of 3–5.
Cache hit rate improves when:
- Users return to the same scopeId across sessions (the cache is persistent)
- The task string is consistent for similar calls (small punctuation and casing changes are normalized, but different intent still creates a new cache key)
- You call remember in batches rather than per-message where possible
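To see how much hit rate matters, the average cost of a /v1/context call can be expressed as a weighted mix of the hit and miss prices. The sketch below uses the 1-unit and 3-unit figures from the pricing table and ignores mode=fresh calls; the function name is made up for this example.

```python
def context_units_per_call(hit_rate: float,
                           hit_units: int = 1,
                           miss_units: int = 3) -> float:
    """Average units per /v1/context call at a given cache hit rate (0.0 to 1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0.0 and 1.0")
    return hit_rate * hit_units + (1.0 - hit_rate) * miss_units


# Compare a few hit rates for 10,000 context calls per day.
for rate in (0.5, 0.7, 0.9):
    daily_units = 10_000 * context_units_per_call(rate)
    print(f"hit rate {rate:.0%}: about {daily_units:,.0f} context units/day")
```

For the example workload above, raising the hit rate from 70% to 90% cuts context usage from about 16,000 to about 12,000 units/day, or roughly $96 less per 30-day month at $0.80 per 1,000 units.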
Check your current hit rate with GET /v1/usage.
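A minimal sketch of calling that endpoint from Python's standard library follows. Everything except the GET /v1/usage path is an assumption: the base URL is a placeholder, Bearer-token authentication is a guess, and the response shape is not documented on this page, so the code returns the decoded JSON without reading any specific field.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the real API host from your dashboard.
BASE_URL = "https://api.memcone.example"


def build_usage_request(api_key: str) -> urllib.request.Request:
    """Build a GET /v1/usage request. The Bearer auth scheme is an assumption."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/usage",
        method="GET",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
    )


def fetch_usage(api_key: str) -> dict:
    """Send the request and return the decoded JSON body as-is."""
    with urllib.request.urlopen(build_usage_request(api_key), timeout=10) as resp:
        return json.load(resp)
```

Calling `fetch_usage("YOUR_API_KEY")` and printing the result is a reasonable first step to find out which fields report the hit rate.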
Billing cycle
- Monthly billing
- Usage tracked from account creation date
- Trial units are one-time, not monthly
- View current usage in the dashboard
Paying
Upgrade your plan in the billing dashboard. Payments are processed by Stripe.
MCP rate limits
MCP tool traffic is rate-limited separately from compute-unit billing: 60 requests per minute per API key, shared across all MCP tools. See the MCP docs and the rate limit callout on API Keys.
MCP remember, recall, and context calls still consume compute units on the same meter as direct POST /v1/* calls. Free-tier limits are checked against that combined total at request time.
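If your client sends bursts of MCP tool calls, a small client-side limiter can help you stay under the 60 requests per minute cap. The sketch below uses a sliding window; this page does not say whether the server counts requests in a sliding or a fixed window, so treat that as an assumption. The class and method names are illustrative and not part of any Memcone SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds span."""

    def __init__(self, max_requests: int = 60, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock    # injectable so the limiter can be tested
        self._sent = deque()   # timestamps of requests still inside the window

    def wait_time(self) -> float:
        """Seconds until the next request is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def try_acquire(self) -> bool:
        """Record a request if one is allowed now; return whether it was."""
        if self.wait_time() > 0:
            return False
        self._sent.append(self._clock())
        return True
```

Use one limiter per API key, call `try_acquire()` before each MCP tool call, and sleep for `wait_time()` seconds when it returns False.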