Research
Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning
We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general memory bounded approach to partially observable open-loop planning. SYMBOL maintains an adaptive stack of Thompson Sampling bandits, whose size is bounded by the planning horizon and can be automatically...
Subgoal-Based Temporal Abstraction in Monte-Carlo Tree Search
We propose an approach to general subgoal-based temporal abstraction in MCTS. Our approach locally approximates a set of available macro-actions for each state, requiring only a generative model and a subgoal predicate. For that, we modify the expansion step of...
Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies (Extended Abstract)
We propose Strong Emergent Policy (STEP) approximation, a scalable approach to learn strong decentralized policies for cooperative MAS with a distributed variant of policy iteration. For that, we use function approximation to learn from action recommendations of a decentralized multi-agent...
Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling
State-of-the-art approaches to partially observable planning like POMCP are based on stochastic tree search. While these approaches are computationally efficient, they may still construct search trees of considerable size, which could limit the performance due to restricted memory resources. In...
Leveraging Statistical Multi-Agent Online Planning with Emergent Value Function Approximation
Making decisions is a great challenge in distributed autonomous environments due to enormous state spaces and uncertainty. Many online planning algorithms rely on statistical sampling to avoid searching the whole state space, while still being able to make acceptable decisions....