Well-Executed Reasoning RL
- Skywork Open Reasoner 1 Technical Report
- AceReason‑Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
- GLM‑4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
- Seed1.5‑Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
- AM‑Thinking‑v1: Advancing the Frontier of Reasoning at 32B Scale
- MiniMax‑M1: Scaling Test‑Time Compute Efficiently with Lightning Attention
- Hunyuan A13B Technical Report
- POLARIS: A POst-training recipe for scaling reinforcement Learning on Advanced ReasonIng modelS
- DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level
- Your Efficient RL Framework Secretly Brings You Off-Policy RL Training
Infrastructure & More Good Stuff
Agentic-Related
- SFR‑DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents
- https://arxiv.org/abs/2509.10446
- AgentGym‑RL: Training LLM Agents for Long‑Horizon Decision Making through Multi‑Turn Reinforcement Learning
- WebExplorer: Explore and Evolve for Training Long‑Horizon Web Agents
- DeepSWE: Training a Fully Open-sourced State-of-the-Art Coding Agent by Scaling RL
- SWE‑Swiss: A Multi‑Task Fine‑Tuning and RL Recipe for High Performance Issue Resolution
General RL and Cool Ideas
- Writing‑Zero: Bridge the Gap Between Non‑verifiable Tasks and Verifiable Rewards
- Kimi K2: Open Agentic Intelligence
- Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
- The Majority is not always right: RL training for solution aggregation
- DuPO: Enabling Reliable LLM Self‑Verification via Dual Preference Optimization
- Inference‑Time Scaling for Generalist Reward Modeling