Leonid Peshkin and Sayan Mukherjee, Bounds on sample size for policy evaluation in Markov environments, Proceedings of the Fourteenth Annual Conf. on Computational Learning Theory, 2001, pp. 608-15.
....use importance sampling to estimate Q-values for MDPs with function approximation for the case where all data have been collected using a single policy. [Meuleau et al., 2001] use importance sampling for POMDPs, but to modify the REINFORCE algorithm [Williams, 1992], which ignores past trials. [Peshkin and Mukherjee, 2001] consider estimators very similar to the ones developed here and prove theoretical PAC bounds for them. This paper differs from previous work in that it allows multiple sampling policies, uses normalized estimators for POMDP problems, derives exact bias and variance formulas for normalized and ....
Peshkin, L. and Mukherjee, S. (2001). Bounds on sample size for policy evaluation in Markov environments. In Fourteenth Annual Conference on Computational Learning Theory.