AI bandits on a budget

Posted by a.hay on 4 February 2014 - 2:02pm

By Long Tran-Thanh, researcher at the School of Electronics and Computer Science, University Of Southampton, and ORCHID.

This article is part of our series: a day in the software life, in which we ask researchers from all disciplines to discuss the tools that make their research possible.

One of the goals of artificial intelligence research is to build an autonomous machine that can make its own decisions. This would be able to act without a human needing to operate it and under a wide range of conditions. It would, however, face a big problem. If the machine has to make a choice and does not know if that choice will pay off or not, what should it choose and how? The machine needs to find the right balance between what we AI researchers know as exploitation and exploration. That is to say, between getting the best payoff there and then and a potential benefit in the future.

One answer comes in the form of multi-armed bandits. These are problems used by researchers to find the best way to model the trade-off between exploration and exploitation. As the name suggests, this alludes to gamblers having to pick the right fruit machine at the right time in order to get the most reward.  One bandit model consists of a slot machine with a choice of arms to pull. By choosing to pull an arm, the AI machine gets a fixed reward. Each arm, however, may give more or less reward than the others. The answer is an algorithm, or list of instructions, that lets the machine work out how to get the most reward by making the right choices. This can work very well, but its main flaw is that it can only choose what arm to pull, not whether that choice is a good one in the long run.

The answer lies in imposing a budget on the machine. This means the machine only has a limited number of chances to pull the right arm before it runs out of that budget. My idea was inspired by two existing models. The first of these is how wireless sensor networks (WSNs) learn the optimal range of actions they can do before their batteries run out. Also, online ads, where firms need to explore the click-through rate for their banners and exploit how many of these they need to get the most from their outlay.

Since the old bandit model can’t do this, the answer lies in the form of new algorithms that I helped develop with colleagues at ECS as part of my work in the Orchid Project. This looks into how humans and machines will be able to collaborate with one another in the future. The algorithms I wrote will allow for machines to perform well on their own in a wide range of cases. However, this needed to be pressure tested and so shown to work. With that in mind, we set the algorithms to work on a series of research problems.

The first of these was to find out how to make unmanned aerial vehicles, or UAVs, more autonomous and so work with far less input from their user. Once we used the new budget bandit model, the UAVs learned how to choose the best action even when they were not sure of the situations they found themselves in.

Next, we looked into improving long term information collection. This is how researchers get the data they need for fields such as area surveillance, as well as habitat monitoring and disaster response management. We began by splitting the information collection process in two. Firstly, we found the most economical use of energy (the budget in this case) which was solved by our bandit model. We also showed the best way to decentralise the gathering of data via a new algorithm that allowed for this. The end result was able to outdo the best methods in use at the present time.

Finally, we showed how to make a group of workers improve their output even when the budget (in this case, the funds for hiring more help) was limited. The model we used was able to spot which workers were able to get the most done and when, and its effectiveness was, again, better than any other method - 300% times so, in fact. It was also able to work well even when there was a worst case scenario.

Our work on this last case was well received. It went on to win the Best Paper (Honourable Mention) Award at ECAI 2012. My next step will be to research better budget bandit algorithms and how they can be best used for crowdsourcing.

Share this page