Total: 1
The Generative Flow Network (GFlowNet) is a probabilistic framework in which an agent learns a stochastic policy and flow functions to sample objects with probability proportional to an unnormalized reward function. GFlowNets share a strong connection with reinforcement learning (RL) that typically aims to maximize reward. A number of recent works explored connections between GFlowNets and maximum entropy (MaxEnt) RL, which incorporates entropy regularization into the standard RL objective. However, the relationship between GFlowNets and standard RL remains largely unexplored, despite the inherent similarities in their sequential decision-making nature.While GFlowNets can discover diverse solutions through specialized flow-matching objectives, connecting them to standard RL can simplify their implementation through well-established RL principles and also improve RL's capabilities in diverse solution discovery (a critical requirement in many real-world applications), and bridging this gap can further unlock the potential of both fields. In this paper, we bridge this gap by revealing a fundamental connection between GFlowNets and one of the most basic components of RL -- policy evaluation. Surprisingly, we find that the value function obtained from evaluating a uniform policy is closely associated with the flow functions in GFlowNets. Building upon these insights, we introduce a rectified random policy evaluation (RPE) algorithm, which achieves the same reward-matching effect as GFlowNets based on simply evaluating a fixed random policy, offering a new perspective. Empirical results across extensive benchmarks demonstrate that RPE achieves competitive results compared to previous approaches, shedding light on the previously overlooked connection between (non-MaxEnt) RL and GFlowNets.