Solving deterministic policy (PO)MDPs using expectation-maximisation and antifreeze (2009)

by David Barber, Tom Furmston
Venue: In European Conference on Machine Learning (LEMIR workshop)