MemAgent on vime#
MemAgent (Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent) is an RL-trained memory agent workflow for very long documents. It splits a document into chunks, reads them sequentially, and compresses key information into a fixed-size memory that is overwritten at every step. After all chunks are processed, the model answers using only the problem statement and the memory, with the final answer in \boxed{}. Because the memory size stays constant, every step sees a bounded context and total inference cost scales linearly, O(N), with document length, without changing the model architecture or positional encodings.
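The read-and-overwrite loop described above can be sketched in a few lines. This is a minimal illustration, not the example's actual rollout code: `generate` is a hypothetical callable standing in for the model, the prompt wording is invented, and chunking runs over a pre-tokenized list for simplicity.

```python
from typing import Callable, List


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def run_mem_agent(
    problem: str,
    document_tokens: List[str],
    chunk_size: int,
    generate: Callable[[str], str],  # hypothetical stand-in for the model call
) -> str:
    """Read chunks in order, overwriting a single memory string each turn."""
    memory = "No previous memory"
    for chunk in split_into_chunks(document_tokens, chunk_size):
        prompt = (
            f"Problem: {problem}\n"
            f"Memory: {memory}\n"
            f"Section: {' '.join(chunk)}\n"
            "Update the memory with information useful for the problem."
        )
        # Overwrite policy: the new memory replaces the old one entirely.
        # A real system would bound its size with a generation-length cap.
        memory = generate(prompt)
    final_prompt = (
        f"Problem: {problem}\n"
        f"Memory: {memory}\n"
        "Answer the problem and put the final answer in \\boxed{}."
    )
    # The final turn sees only the problem and the memory, never the document.
    return generate(final_prompt)
```

With N chunks the loop issues N + 1 model calls, each over a bounded prompt, which is where the linear scaling comes from.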
The paper trains end-to-end with Multi-Conv DAPO (this example uses GRPO) on multi-turn, context-independent trajectories with verifiable rewards on HotpotQA and RULER. Qwen2.5-7B trained on 32K documents generalizes to million-token QA with near-lossless performance on RULER-HQA.
This example reproduces that pipeline on vime: HotpotQA multi-turn rollout, GRPO training, and RULER-HQA evaluation, with vLLM as the inference backend.
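To make the training signal concrete, the sketch below shows one common way to compute GRPO-style group-relative advantages and attach them to every turn of an unrolled multi-turn trajectory. It is an illustration under assumptions, not this example's training code: the function names and the sample dictionary layout are hypothetical, and the population standard deviation is used here although implementations differ on that detail.

```python
from statistics import mean, pstdev
from typing import Dict, List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's group of sampled trajectories."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def unroll(trajectories: List[List[str]], rewards: List[float]) -> List[Dict]:
    """Flatten multi-turn trajectories into per-turn samples.

    Every turn of a trajectory inherits that trajectory's advantage, so the
    outcome reward of the final answer also credits the memory updates.
    """
    advantages = group_advantages(rewards)
    samples = []
    for traj_id, (turns, adv) in enumerate(zip(trajectories, advantages)):
        for turn_id, response in enumerate(turns):
            samples.append(
                {"traj": traj_id, "turn": turn_id,
                 "response": response, "advantage": adv}
            )
    return samples
```

Because each turn is trained as an independent conversation, the unrolled samples can be batched like ordinary single-turn data.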
Files#
| File | Description |
|---|---|
| | Multi-turn MemAgent rollout + HotpotQA reward |
| | vLLM router client for multi-turn turns |
| | Unroll trajectories for GRPO training |
| | HotpotQA parquet/HF → JSONL |
| | RULER-HQA evaluation script |
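A verifiable HotpotQA reward of the kind mentioned above can be sketched as "extract the \boxed{} answer, normalize, compare". The version below is an assumption about the general shape, not the reward implemented in this example: the function names are hypothetical, and the normalization follows the widely used SQuAD-style recipe (lowercase, drop punctuation and articles).

```python
import re
import string
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, handling nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    out = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
    return None  # unbalanced braces


def normalize(answer: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    answer = answer.lower()
    answer = "".join(ch for ch in answer if ch not in string.punctuation)
    answer = re.sub(r"\b(a|an|the)\b", " ", answer)
    return " ".join(answer.split())


def exact_match_reward(response: str, gold: str) -> float:
    """1.0 if the boxed answer matches the gold answer after normalization."""
    predicted = extract_boxed(response)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(gold) else 0.0
```

A response without a \boxed{} answer scores 0.0 in this sketch, which also penalizes trajectories that never commit to a final answer.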
Launch scripts#
Run from the vime repo root (cd vime):
| Script | Purpose |
|---|---|
| | GRPO training (default 100 steps) |
| | RULER-HQA eval (vLLM serve + |
| | Megatron |
| | Check/download |
Shared setup lives in _common.sh (paths, MemAgent env vars, Ray launch).
Quick start#
Data download#
Training and evaluation data come from the MemAgent HuggingFace dataset BytedTsinghua-SIA/hotpotqa.
Training set: download the train split and convert it to vime JSONL:
pip install datasets huggingface_hub pandas pyarrow
# Optional: use a mirror if huggingface.co is slow
export HF_ENDPOINT=https://hf-mirror.com
mkdir -p /data/datasets/hotpotqa_slime
python examples/mem_agent/prepare_data.py \
    --hf-dataset BytedTsinghua-SIA/hotpotqa \
    --hf-split train \
    --output /data/datasets/hotpotqa_slime/train.jsonl
If you already have a local hotpotqa_train.parquet, pass --input instead of --hf-dataset.
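For readers unfamiliar with the target format, JSONL is simply one JSON object per line. The sketch below shows that shape using only the standard library; it is not the logic of prepare_data.py, and the `prompt` and `label` field names in the usage lines are hypothetical placeholders rather than the schema vime expects.

```python
import json
from typing import Dict, Iterable


def write_jsonl(records: Iterable[Dict], path: str) -> int:
    """Write one JSON object per line and return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path: str) -> list:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A round trip such as `write_jsonl([{"prompt": "...", "label": "..."}], "train.jsonl")` followed by `read_jsonl("train.jsonl")` is a quick way to sanity-check a converted file before training.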
Eval set (RULER-HQA): download the eval_{50,100,200,...}.json files into DATA_ROOT (default /data/datasets/hotpotqa_hf):
mkdir -p /data/datasets/hotpotqa_hf
# Download all default lengths (50 … 6400)
bash examples/mem_agent/prepare-eval-data.sh --download
# Or only the lengths you need
LENGTHS="50 200 800" bash examples/mem_agent/prepare-eval-data.sh --download
To check which files are present without downloading:
LENGTHS="50 200 800" bash examples/mem_agent/prepare-eval-data.sh
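The same presence check can be done from Python, for example inside a notebook or a pre-flight script. This is a small sketch that assumes only the eval_<length>.json naming described above; `missing_eval_files` is a hypothetical helper, not part of the example.

```python
from pathlib import Path
from typing import Iterable, List


def missing_eval_files(data_root: str, lengths: Iterable[int]) -> List[str]:
    """Return the eval_<length>.json names that do not exist under data_root."""
    root = Path(data_root)
    return [
        f"eval_{n}.json"
        for n in lengths
        if not (root / f"eval_{n}.json").is_file()
    ]


if __name__ == "__main__":
    # Same lengths as the shell example; adjust the path for your cluster.
    print(missing_eval_files("/data/datasets/hotpotqa_hf", [50, 200, 800]))
```

An empty list means every requested length is available and evaluation can start without a download step.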
Run pipeline#
cd vime
# 1. Training (100 steps by default; set TRAIN_DATA if you used a different path)
bash examples/mem_agent/run-qwen3-4b-train.sh
# 2. Eval (after converting the checkpoint to HF format)
CONVERT=1 SINGLE_ITER=iter_0000099 bash examples/mem_agent/run-eval.sh
# Baseline (untrained Qwen3-4B)
MODEL_PATH=/data/models/Qwen3-4B SAVE_FILE=Qwen3-4B-base bash examples/mem_agent/run-eval.sh
Environment variables#
Paths (override for your cluster):
HF_CKPT, TORCH_DIST, TRAIN_DATA, SAVE_PATH, DATA_ROOT
Training:
NUM_ROLLOUT, MEM_CHUNK_TOKENS, MEM_MAX_MEMORY, MEM_MAX_FINAL, MEM_MAX_CHUNKS
Eval:
MODEL_PATH, LENGTH (e.g. "50 200 800"), CONVERT, SINGLE_ITER, SAVE_FILE
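The chunking variables determine how many model turns a single trajectory takes. The arithmetic below is a sketch under stated assumptions: it treats MEM_CHUNK_TOKENS as the number of document tokens per chunk and MEM_MAX_CHUNKS as a hard cap on chunks per trajectory, and the fallback values 4096 and 16 are hypothetical placeholders, not the defaults set in the launch scripts.

```python
import math
import os


def planned_turns(doc_tokens: int, chunk_tokens: int, max_chunks: int) -> int:
    """Memory-update turns (one per chunk, capped) plus one final answer turn."""
    chunks = min(math.ceil(doc_tokens / chunk_tokens), max_chunks)
    return chunks + 1


# Hypothetical fallback values; the launch scripts define the real defaults.
chunk_tokens = int(os.environ.get("MEM_CHUNK_TOKENS", "4096"))
max_chunks = int(os.environ.get("MEM_MAX_CHUNKS", "16"))

# Turns needed for a 32,000-token document under the current settings.
print(planned_turns(32_000, chunk_tokens, max_chunks))
```

Under these assumptions, smaller chunks mean more turns per trajectory and therefore more unrolled training samples per rollout.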
Example paths#
export HF_CKPT=/data/models/Qwen3-4B
export TORCH_DIST=/data/models/Qwen3-4B_torch_dist
export TRAIN_DATA=/data/datasets/hotpotqa_slime/train.jsonl
export SAVE_PATH=/data/models/MemAgent_Qwen3-4B-RL
export DATA_ROOT=/data/datasets/hotpotqa_hf
bash examples/mem_agent/run-qwen3-4b-train.sh