Reinforcement - Moozonian

http://arxiv.org/abs/2210.07730v1

DroneARchery: Human-Drone Interaction through Augmented Reality w...

We propose a novel concept of augmented reality (AR) human-drone interaction driven by RL-based swarm behavior to achieve intuitive and immersive control of a swarm formation of unmanned aerial vehicl...

http://arxiv.org/abs/1006.5224v1

Natural rubber-clay nanocomposites: mechanical and structural pro...

The mechanical properties of non-vulcanized natural rubber and dialyzed natural rubber-clay nanocomposites have been studied by uniaxial deformations to evaluate the reinforcement efficiency of the cl...

http://arxiv.org/abs/2508.16474v1

Reinforcement Learning-based Control via Y-wise Affine Neural Net...

This work presents a novel reinforcement learning (RL) algorithm based on Y-wise Affine Neural Networks (YANNs). YANNs provide an interpretable neural network which can exactly represent known piecewi...

http://arxiv.org/abs/2507.20150v1

The Policy Cliff: A Theoretical Analysis of Reward-Policy Maps in...

Reinforcement learning (RL) plays a crucial role in shaping the behavior of large language and reasoning models (LLMs/LRMs). However, it often produces brittle and unstable policies, leading to critic...

http://arxiv.org/abs/1805.07813v4

Learning Real-World Robot Policies by Dreaming

Learning to control robots directly based on images is a primary challenge in robotics. However, many existing reinforcement learning approaches require iteratively obtaining millions of robot samples...

https://github.com/datamllab/rlcard

datamllab/rlcard

Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO. (⭐ 3412)

https://www.mayoclinichealthsystem.org/hometown-health/speaking-of-health/parenting-a-child-or-teen-with-adhd

Parenting a child, teen with ADHD - Mayo Clinic Health System

Sep 13, 2022 · Parenting a child or teen with ADHD can be difficult, but behavioral parent training using positive reinforcement can help.

http://arxiv.org/abs/2507.10619v1

Meta-Reinforcement Learning for Fast and Data-Efficient Spectrum ...

The dynamic allocation of spectrum in 5G / 6G networks is critical to efficient resource utilization. However, applying traditional deep reinforcement learning (DRL) is often infeasible due to its imm...

http://arxiv.org/abs/2307.08780v5

Discounted-Sum Automata with Multiple Discount Factors

Discounting the influence of future events is a key paradigm in economics and it is widely used in computer-science models, such as games, Markov decision processes (MDPs), reinforcement learning, and...

http://arxiv.org/abs/1810.07207v1

Reinforcement Learning Decoders for Fault-Tolerant Quantum Comput...

Topological error correcting codes, and particularly the surface code, currently provide the most feasible roadmap towards large-scale fault-tolerant quantum computation. As such, obtaining fast and f...

http://arxiv.org/abs/2506.02507v3

AURA: Autonomous Upskilling with Retrieval-Augmented Agents

Designing reinforcement learning curricula for agile robots traditionally requires extensive manual tuning of reward functions, environment randomizations, and training configurations. We introduce AU...

http://arxiv.org/abs/2409.11191v1

Linear Jamming Bandits: Learning to Jam 5G-based Coded Communicat...

We study jamming of an OFDM-modulated signal which employs forward error correction coding. We extend this to leverage reinforcement learning with a contextual bandit to jam a 5G-based system implemen...

https://github.com/vwxyzjn/cleanrl

vwxyzjn/cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG) (⭐ 9222)

http://arxiv.org/abs/2405.07087v1

Auditing an Automatic Grading Model with deep Reinforcement Learn...

We explore the use of deep reinforcement learning to audit an automatic short answer grading (ASAG) model. Automatic grading may decrease the time burden of rating open-ended items for educators, but ...

http://arxiv.org/abs/2506.02355v2

Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpeni...

Reinforcement learning is emerging as a primary driver for improving language model reasoning capabilities. A fundamental question is whether current reinforcement learning algorithms -- such as Group...

http://arxiv.org/abs/2407.08065v1

Towards Interpretable Foundation Models of Robot Behavior: A Task...

Foundation models are a promising path toward general-purpose and user-friendly robots. The prevalent approach involves training a generalist policy that, like a reinforcement learning policy, uses ob...

http://arxiv.org/abs/2602.17062v1

Retaining Suboptimal Actions to Follow Shifting Optima in Multi-A...

Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underly...

http://arxiv.org/abs/2410.09362v1

SeRA: Self-Reviewing and Alignment of Large Language Models using...

Direct alignment algorithms (DAAs), such as direct preference optimization (DPO), have become popular alternatives for Reinforcement Learning from Human Feedback (RLHF) due to their simplicity, effici...

http://arxiv.org/abs/2103.02315v1

Reinforcement Learning Control of a Forestry Crane Manipulator

Forestry machines are heavy vehicles performing complex manipulation tasks in unstructured production forest environments. Together with the complex dynamics of the on-board hydraulically actuated cra...

http://arxiv.org/abs/2602.00403v1

DROGO: Default Representation Objective via Graph Optimization in...

In computational reinforcement learning, the default representation (DR) and its principal eigenvector have been shown to be effective for a wide variety of applications, including reward shaping, cou...