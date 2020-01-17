When the future is uncertain, future reward can be represented as a probability distribution. Some possible futures are good (teal), others are bad (red). Distributional reinforcement learning can learn about this distribution over predicted rewards through a variant of the TD (temporal-difference) algorithm. Credit: Nature (2020). DOI: 10.1038/s41586-019-1924-6

A team of researchers from DeepMind, University College London, and Harvard University has found that lessons learned from applying learning techniques to AI systems may help explain how reward pathways work in the brain. In their paper published in the journal Nature, the team describes comparing distributional reinforcement learning in a computer with dopamine processing in the mouse brain, and what they learned from it.

Previous research has shown that dopamine produced in the brain is involved in reward processing: it is produced when something good happens, and its expression results in feelings of pleasure. Some studies have also suggested that the neurons that respond to dopamine all respond in the same way, so that an event causes a person or a mouse to feel either good or bad. Other studies have suggested that the neuronal response is more of a gradient. In this new effort, the researchers found evidence supporting the latter theory.

Distributional reinforcement learning is a type of machine learning based on reinforcement. It is often used when designing systems that play games such as StarCraft II or Go. Such a system keeps track of good moves versus bad moves and learns to reduce the number of bad moves, improving its performance the more it plays. But it does not treat all good and bad moves the same: each move is weighted as it is recorded, and the weights are part of the calculations used when choosing future moves.

Researchers have noted that humans seem to use a similar strategy to improve their level of play. The researchers in London suspected that the way the brain carries out reward processing was likely similar to the way these AI systems do. To find out if they were correct, they carried out experiments with mice. They inserted devices into the animals' brains that were capable of recording responses from individual dopamine neurons. The mice were then trained to carry out a task in which they were rewarded for responding in the desired way.

The mouse neuron responses revealed that the neurons did not all respond in the same way, as prior theory had predicted. Instead, they responded in reliably different ways, an indication that the levels of pleasure the mice were experiencing were more of a gradient, as the team had predicted.

Distributional TD learns value estimates for many different parts of the distribution of rewards. Which part a particular estimate covers is determined by the type of asymmetric update applied to that estimate. (a) A "pessimistic" cell would amplify negative updates and ignore positive updates; an "optimistic" cell would amplify positive updates and ignore negative updates. (b) This results in a diversity of pessimistic or optimistic value estimates, shown here as points along the cumulative distribution of rewards, which capture (c) the full distribution of rewards. Credit: Nature (2020). DOI: 10.1038/s41586-019-1924-6
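
The asymmetric updates described in the caption above can be illustrated with a minimal sketch. This is not the code used in the Nature paper: it assumes a simplified one-step setting in which each "cell" sees the same stream of rewards, and the function name `asymmetric_td`, the learning rates, and the two-outcome reward are all hypothetical choices made for illustration.

```python
import random


def asymmetric_td(rewards, alpha_plus, alpha_minus, value=0.0):
    """One-step TD-style learning with separate learning rates for
    positive and negative prediction errors (illustrative only)."""
    for r in rewards:
        delta = r - value  # reward prediction error
        # Scale the update differently depending on the sign of the error.
        rate = alpha_plus if delta > 0 else alpha_minus
        value += rate * delta
    return value


rng = random.Random(0)
# Hypothetical uncertain reward: a small payout (1.0) or a large one (10.0),
# each with probability 0.5.
rewards = [rng.choice([1.0, 10.0]) for _ in range(20000)]

# Hypothetical cells as (alpha_plus, alpha_minus) pairs.
cells = {
    "pessimistic": (0.002, 0.018),  # negative errors weigh more
    "neutral": (0.010, 0.010),      # ordinary symmetric TD
    "optimistic": (0.018, 0.002),   # positive errors weigh more
}

estimates = {
    name: asymmetric_td(rewards, a_plus, a_minus)
    for name, (a_plus, a_minus) in cells.items()
}

for name, v in estimates.items():
    print(f"{name:12s} {v:.2f}")
```

Under these assumptions, the symmetric cell should settle near the mean reward (about 5.5), while the pessimistic and optimistic cells should settle nearer the low and high outcomes respectively. Taken together, a population of such estimates carries information about the shape of the reward distribution rather than only its average.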


More information:

Will Dabney et al. A distributional code for value in dopamine-based reinforcement learning, Nature (2020). DOI: 10.1038/s41586-019-1924-6

© 2020 Science X Network

Citation:

AI learning technique may illustrate function of reward pathways in the brain (2020, January 17), retrieved 17 January 2020 from https://techxplore.com/news/2020-01-ai-technique-function-reward-pathways.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.