Leonid Peshkin and Sayan Mukherjee, Bounds on sample size for policy evaluation in Markov environments, Proceedings of the Fourteenth Annual Conf. on Computational Learning Theory, 2001, pp. 608-15.


This paper is cited in the following contexts:
Policy Improvement for POMDPs using Normalized Importance Sampling - Shelton (2001)   (1 citation)

....use importance sampling to estimate Q values for MDPs with function approximation for the case where all data have been collected using a single policy. [Meuleau et al. 2001] uses importance sampling for POMDPs, but to modify the REINFORCE algorithm [Williams, 1992] which ignores past trials. [Peshkin and Mukherjee, 2001] considers estimators very similar to the ones developed here and prove theoretical PAC bounds for them. This paper differs from previous work in that it allows multiple sampling policies, uses normalized estimators for POMDP problems, derives exact bias and variance formulas for normalized and ....

Peshkin, L. and Mukherjee, S. (2001). Bounds on sample size for policy evaluation in Markov environments. In Fourteenth Annual Conference on Computational Learning Theory.


Reinforcement Learning by Policy Search - Peshkin (2001)   (7 citations)  Self-citation (Peshkin)

No context found.


