Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path (2008)

by András Antos, Csaba Szepesvári, Rémi Munos
Venue: Machine Learning Journal (2008) 71:89-129