Online Learning and Planning in Resource Conservation Games

Publication Type: 
Authors: 
Yundi Qian
William B. Haskell
Albert Xin Jiang
Milind Tambe
Description/Abstract: 
Protecting our environment and natural resources is a major global challenge. “Protectors” (law enforcement agencies) try to protect these natural resources, while “extractors” (criminals) seek to exploit them. In many domains, such as illegal fishing, the extractors know more about the distribution and richness of the resources than the protectors do, making it extremely difficult for the protectors to optimally allocate their assets for patrol and interdiction. Fortunately, extractors carry out frequent illegal extractions, so protectors can learn the richness of resources by observing the extractors’ behavior. This paper presents an approach for allocating protector assets based on learning from extractors. We make the following four specific contributions: (i) we model resource conservation as a repeated game and transform this repeated game into a POMDP, which cannot be solved by the latest general POMDP solvers due to its exponential state space; (ii) in response, we propose GMOP, a dedicated algorithm that combines Gibbs sampling with Monte Carlo tree search for online planning in this POMDP; (iii) for a specific class of our game, we speed up the GMOP algorithm without sacrificing solution quality, and we provide a heuristic that trades solution quality for lower computational cost; (iv) we explore the continuous utility scenario, where the POMDP becomes a continuous-state POMDP, and provide a solution in special cases.