C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
[CVPR 2026] TeamHOI: Learning a Unified Policy for Cooperative Human-Object Interactions with Any Team Size
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling