benchmarks
Public benchmark snapshots for Memcone on BEAM.
We track judged answer quality, prompt footprint, context footprint, and latency against a simple full-transcript replay baseline.
judged accuracy:    39.0%  (100 judged questions)
prompt compression: 50.4x  (473.9 avg prompt tokens vs 23,886.5 for replay)
memory compression: 59.5x  (449.4 avg memory tokens)
context latency:    614 ms (remember avg 4,798 ms)
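The compression multiples above are just the ratio of replay tokens to Memcone tokens. A quick check, using the averages reported in this snapshot (the replay memory figure of 26,750.1 comes from the comparison table below):

```python
# Average token counts from the published snapshot.
MEMCONE_PROMPT_TOKENS = 473.9
REPLAY_PROMPT_TOKENS = 23_886.5
MEMCONE_MEMORY_TOKENS = 449.4
REPLAY_MEMORY_TOKENS = 26_750.1  # replay memory avg from the comparison table

# Compression multiple = replay tokens / Memcone tokens.
prompt_compression = REPLAY_PROMPT_TOKENS / MEMCONE_PROMPT_TOKENS
memory_compression = REPLAY_MEMORY_TOKENS / MEMCONE_MEMORY_TOKENS

print(f"prompt compression: {prompt_compression:.1f}x")  # 50.4x
print(f"memory compression: {memory_compression:.1f}x")  # 59.5x
```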
                   Memcone      Full transcript replay
accuracy           39.0%        53.0%
prompt avg         473.9 tok    23,886.5 tok
memory avg         449.4 tok    26,750.1 tok
model latency      1,458 ms     3,011 ms
context latency    614 ms       n/a
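If Memcone's context retrieval and model call run sequentially per question (an assumption; the snapshot does not state how the two latencies compose), the end-to-end comparison against replay looks like this:

```python
# Per-question latencies (ms) from the comparison table.
memcone_model_ms = 1_458
memcone_context_ms = 614
replay_model_ms = 3_011  # replay has no separate context step (n/a)

# Assumption (not stated in the snapshot): the Memcone context step and
# model call are sequential, so per-question latency is their sum.
memcone_total_ms = memcone_model_ms + memcone_context_ms

print(f"Memcone end-to-end: {memcone_total_ms} ms vs replay {replay_model_ms} ms")
```

Under that assumption Memcone would still answer faster end-to-end (2,072 ms vs 3,011 ms) despite the extra retrieval stage.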
run history
Apr 24, 2026: Phase 5 tightened (10x10), 45.0% (latest)
Mar 2026:     Phase 4 semantic baseline, 49.1% (prior best)
Feb 2026:     Lexical retrieval, ~38% (lexical floor, worst run)
Accuracy has climbed well above the lexical floor (~38%) but dipped in the latest run (45.0% vs the 49.1% prior best) and still trails the replay baseline. The token efficiency advantage (~50.4x fewer prompt tokens than replay) is consistent across all runs and is the primary product metric.
dataset
Source: Mohammadta/BEAM. Latest snapshot includes 10 conversations, 500 turns, and 100 judged questions.
setup
Both strategies use the same answer model: Memcone answers from compressed memory, while the baseline answers from a full-transcript replay.
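The setup can be sketched as two ways of building context for one shared model. This is an illustrative sketch only: the function names and interfaces below are hypothetical stand-ins, not the Memcone API.

```python
def answer_model(context: str, question: str) -> str:
    # Stand-in for the single shared answer model; both strategies call it.
    return f"answer to {question!r} given {len(context)} chars of context"

def answer_with_replay(question: str, transcript: list[str]) -> str:
    # Baseline: feed the entire conversation transcript as context
    # (~23,886.5 prompt tokens on average in this snapshot).
    return answer_model("\n".join(transcript), question)

def answer_with_memcone(question: str, memories: list[str]) -> str:
    # Memcone: feed only compressed memory entries as context
    # (~449.4 memory tokens on average). "memories" is a hypothetical
    # stand-in for whatever the compressed store returns.
    return answer_model("\n".join(memories), question)
```

Only the context differs between the two paths, so the accuracy gap and the token gap in the table above isolate the effect of compression.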
judging
Predictions are scored with a BEAM-aligned rubric judge and published as product benchmark snapshots.
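Judged accuracy then reduces to the fraction of judged questions the rubric judge marks correct. A minimal sketch, assuming one boolean verdict per question (the actual verdict format is not specified here):

```python
def judged_accuracy(verdicts: list[bool]) -> float:
    # verdicts: one rubric-judge pass/fail per judged question.
    return 100.0 * sum(verdicts) / len(verdicts)

# e.g. 39 of the 100 judged questions marked correct -> 39.0%
verdicts = [True] * 39 + [False] * 61
print(f"{judged_accuracy(verdicts):.1f}%")  # 39.0%
```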