MemAgent on vime

MemAgent (Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent) is an RL-based memory agent workflow for very long documents. It splits a document into chunks, reads them sequentially, and compresses the key information into a fixed-size memory that is overwritten at every step. After all chunks are processed, the model answers using only the problem statement and the memory, with the final answer in \boxed{}. Because the memory size stays constant, inference cost scales linearly, O(N), with document length, and neither the model architecture nor the positional encodings need to change.
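
The read-and-overwrite loop described above can be sketched in a few lines. This is a minimal illustration, not the code in rollout.py: the function names, the whitespace "tokenizer", and the toy stand-ins for the model calls are all assumptions made for the example.

```python
from typing import Callable, List


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def run_memory_agent(
    problem: str,
    tokens: List[str],
    chunk_size: int,
    update_memory: Callable[[str, str, str], str],
    answer: Callable[[str, str], str],
) -> str:
    """Read chunks in order, overwriting a single memory string each turn."""
    memory = ""  # one slot; every turn replaces it entirely
    for chunk in split_into_chunks(tokens, chunk_size):
        # One model call per chunk: the prompt holds problem + memory + chunk only.
        memory = update_memory(problem, memory, " ".join(chunk))
    # The final turn sees the problem and the memory, never the document itself.
    return answer(problem, memory)


# Toy stand-ins for the LLM calls (hypothetical, for illustration only).
def toy_update(problem: str, memory: str, chunk: str) -> str:
    # Overwrite the memory when the chunk looks relevant, otherwise keep it.
    return chunk if "Paris" in chunk else memory


def toy_answer(problem: str, memory: str) -> str:
    return "\\boxed{Paris}" if "Paris" in memory else "\\boxed{unknown}"


doc = "filler text here . the capital of France is Paris . more filler text".split()
result = run_memory_agent("What is the capital of France?", doc, 4, toy_update, toy_answer)
print(result)
```

The number of model calls is the number of chunks plus one, and each call has a bounded prompt, which is where the linear cost comes from.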

The paper trains end-to-end with Multi-Conv DAPO (this example uses GRPO) on multi-turn, context-independent trajectories with verifiable rewards on HotpotQA and RULER. Qwen2.5-7B trained on 32K documents generalizes to million-token QA with near-lossless performance on RULER-HQA.
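
Since every turn of a rollout is trained as its own context-independent conversation, the outcome reward has to reach each of them. The sketch below shows one way to do that with GRPO-style group normalization: compute one advantage per trajectory, then copy it to every turn. It assumes the population standard deviation and a flat per-turn sample layout; the field names are hypothetical and custom_convert.py may do this differently.

```python
from statistics import mean, pstdev
from typing import Dict, List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantage: (r - mean) / (std + eps) within one prompt group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


def unroll(trajectories: List[List[str]], rewards: List[float]) -> List[Dict]:
    """Flatten multi-turn trajectories into per-turn samples sharing one advantage."""
    advantages = group_advantages(rewards)
    samples = []
    for traj_id, (turns, adv) in enumerate(zip(trajectories, advantages)):
        for turn_id, response in enumerate(turns):
            samples.append(
                {"traj": traj_id, "turn": turn_id, "response": response, "advantage": adv}
            )
    return samples


# Two rollouts of the same question: the first answered correctly, the second did not.
group = [["memory v1", "memory v2", "final answer"], ["memory v1", "final answer"]]
for sample in unroll(group, rewards=[1.0, 0.0]):
    print(sample)
```

When every rollout in a group gets the same reward, the advantages are all zero and the group contributes no gradient signal.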

This example reproduces that pipeline on vime: HotpotQA multi-turn rollout, GRPO training, and RULER-HQA evaluation, with vLLM as the inference backend.

Files

| File | Description |
| --- | --- |
| rollout.py | Multi-turn MemAgent rollout + HotpotQA reward |
| rollout_client.py | vLLM router client for multi-turn turns |
| custom_convert.py | Unroll trajectories for GRPO training |
| prepare_data.py | HotpotQA parquet/HF → JSONL |
| eval_ruler_hqa.py | RULER-HQA evaluation script |
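
For orientation, a verifiable reward on a \boxed{} answer can be as simple as the sketch below: extract the last boxed span, normalize it, and compare it with the gold answers. The normalization rules and the 0/1 scoring are assumptions for illustration; the reward actually implemented in rollout.py may be stricter or more lenient.

```python
import re
import string
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth, out = 1, []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)  # matching close brace found
        out.append(ch)
    return None  # unbalanced braces


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    answer = answer.lower()
    answer = "".join(ch for ch in answer if ch not in string.punctuation)
    answer = re.sub(r"\b(a|an|the)\b", " ", answer)
    return " ".join(answer.split())


def reward(response: str, gold_answers: List[str]) -> float:
    """1.0 if the boxed answer matches any gold answer after normalization."""
    predicted = extract_boxed(response)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) in {normalize(g) for g in gold_answers} else 0.0


print(reward("It is \\boxed{The Eiffel Tower}.", ["Eiffel Tower"]))
```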

Launch scripts

Run from vime repo root (cd vime):

| Script | Purpose |
| --- | --- |
| run-qwen3-4b-train.sh | GRPO training (default 100 steps) |
| run-eval.sh | RULER-HQA eval (vLLM serve + eval_ruler_hqa.py) |
| convert-to-hf.sh | Megatron iter_* → HuggingFace |
| prepare-eval-data.sh | Check/download eval_{length}.json |

Shared setup lives in _common.sh (paths, MemAgent env vars, Ray launch).

Quick start

Data download

Training and evaluation data come from the MemAgent HuggingFace dataset BytedTsinghua-SIA/hotpotqa.

Training set. Download the train split and convert it to vime JSONL:

pip install datasets huggingface_hub pandas pyarrow

# Optional: use a mirror if huggingface.co is slow
export HF_ENDPOINT=https://hf-mirror.com

mkdir -p /data/datasets/hotpotqa_slime

python examples/mem_agent/prepare_data.py \
  --hf-dataset BytedTsinghua-SIA/hotpotqa \
  --hf-split train \
  --output /data/datasets/hotpotqa_slime/train.jsonl

If you already have a local hotpotqa_train.parquet, pass --input instead of --hf-dataset.
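
After conversion it can be worth confirming that the output really is JSONL, that is, one standalone JSON object per line. The helper below is a small sanity check under that assumption only; it does not know or validate the fields that prepare_data.py writes.

```python
import json
from typing import Iterable, List


def check_jsonl(lines: Iterable[str]) -> List[int]:
    """Return the 1-based numbers of lines that are not a single JSON object."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)
            continue
        if not isinstance(record, dict):
            bad.append(lineno)  # valid JSON, but not an object
    return bad


# Usage against the file produced above:
# with open("/data/datasets/hotpotqa_slime/train.jsonl", encoding="utf-8") as f:
#     print(check_jsonl(f))
```

An empty list means every non-blank line parsed as a JSON object.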

Eval set (RULER-HQA). Download the eval_{50,100,200,...}.json files into DATA_ROOT (default /data/datasets/hotpotqa_hf):

mkdir -p /data/datasets/hotpotqa_hf

# Download all default lengths (50 … 6400)
bash examples/mem_agent/prepare-eval-data.sh --download

# Or only the lengths you need
LENGTHS="50 200 800" bash examples/mem_agent/prepare-eval-data.sh --download

To check which files are present without downloading:

LENGTHS="50 200 800" bash examples/mem_agent/prepare-eval-data.sh
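
The same presence check is easy to reproduce from Python, for example inside a notebook. This sketch only relies on the eval_{length}.json naming shown above; the function name is hypothetical and it does not download anything.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def eval_file_status(data_root: str, lengths: List[int]) -> Dict[int, bool]:
    """Map each requested length to whether eval_{length}.json exists in data_root."""
    root = Path(data_root)
    return {n: (root / f"eval_{n}.json").is_file() for n in lengths}


# Demo in a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "eval_50.json").write_text("[]")
    status = eval_file_status(tmp, [50, 200])

for length, present in status.items():
    print(f"eval_{length}.json: {'ok' if present else 'MISSING'}")
```

In practice you would pass your DATA_ROOT and the lengths you plan to evaluate instead of the temporary directory.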

Run pipeline

cd vime

# 1. Training (100 steps by default; set TRAIN_DATA if you used a different path)
bash examples/mem_agent/run-qwen3-4b-train.sh

# 2. Eval (after converting the checkpoint to HF)
CONVERT=1 SINGLE_ITER=iter_0000099 bash examples/mem_agent/run-eval.sh

# Baseline (untrained Qwen3-4B)
MODEL_PATH=/data/models/Qwen3-4B SAVE_FILE=Qwen3-4B-base bash examples/mem_agent/run-eval.sh

Environment variables

Paths (override for your cluster):

  • HF_CKPT, TORCH_DIST, TRAIN_DATA, SAVE_PATH, DATA_ROOT

Training:

  • NUM_ROLLOUT, MEM_CHUNK_TOKENS, MEM_MAX_MEMORY, MEM_MAX_FINAL, MEM_MAX_CHUNKS

Eval:

  • MODEL_PATH, LENGTH (e.g. "50 200 800"), CONVERT, SINGLE_ITER, SAVE_FILE
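
To reason about how the training variables interact, the sketch below derives a rough rollout budget from them. It assumes, from the names alone, that MEM_CHUNK_TOKENS is the tokens read per turn, MEM_MAX_MEMORY and MEM_MAX_FINAL cap the generated memory and final answer, and MEM_MAX_CHUNKS caps the number of chunks. Check _common.sh for the actual meaning and defaults; the values in the demo are illustrative, not the defaults.

```python
from typing import Dict, Mapping


def rollout_budget(env: Mapping[str, str]) -> Dict[str, int]:
    """Derive rough per-rollout limits from the MEM_* variables (assumed semantics)."""
    chunk_tokens = int(env["MEM_CHUNK_TOKENS"])
    max_memory = int(env["MEM_MAX_MEMORY"])
    max_final = int(env["MEM_MAX_FINAL"])
    max_chunks = int(env["MEM_MAX_CHUNKS"])
    return {
        # Longest document that fits without truncation.
        "max_document_tokens": chunk_tokens * max_chunks,
        # One memory-update turn per chunk, plus the final answer turn.
        "max_turns": max_chunks + 1,
        # Upper bound on tokens generated across the whole rollout.
        "max_generated_tokens": max_memory * max_chunks + max_final,
    }


# Illustrative values only; pass os.environ to use your real settings.
budget = rollout_budget(
    {
        "MEM_CHUNK_TOKENS": "5000",
        "MEM_MAX_MEMORY": "1024",
        "MEM_MAX_FINAL": "256",
        "MEM_MAX_CHUNKS": "8",
    }
)
print(budget)
```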

Example paths

export HF_CKPT=/data/models/Qwen3-4B
export TORCH_DIST=/data/models/Qwen3-4B_torch_dist
export TRAIN_DATA=/data/datasets/hotpotqa_slime/train.jsonl
export SAVE_PATH=/data/models/MemAgent_Qwen3-4B-RL
export DATA_ROOT=/data/datasets/hotpotqa_hf

bash examples/mem_agent/run-qwen3-4b-train.sh