Attempting a toy model of vertebrate understanding

Tag: perseveration

Essay 22 issues: subthalamic nucleus simulation

The essay 22 simulation explored a striatum model where two decision paths competed, odor seeking vs random exploration, using dopamine to bias between them. This model resembled striatum theories like [Bariselli et al. 2019] that consider the striatum’s direct and indirect paths as competing between approach and avoidant actions.

Issues in essay 22 include both neuroscience divergence and simulation problems. Although the simulation is a loose functional model, that laxity isn’t infinite and it may have gone too far from the neuroscience.

Adenosine and perseveration

Seeking and foraging have a perseveration problem: the animal must eventually give up on a failed cue, or it will remain stuck forever. The give-up circuit in essay 22 uses the lateral habenula (Hb.l) to integrate search time until it reaches a threshold to give up. An alternative circuit in the striatum itself involves the indirect path (S.d2), the D2 dopamine receptor and adenosine, with a behaviorally relevant time scale.

When fast neurotransmitters are on the order of 10 milliseconds, creating a timeout on the order of a few minutes is a challenge. Two possible solutions in that timescale are long term potentiation (LTP) where “long” means about 20 minutes, and astrocyte calcium accumulation, which is also about 10 to 20 minutes.

Adenosine receptors (A2r) in the striatum indirect path (S.d2) measure broad neural activity from ATP byproducts that accumulate in the intercellular space. Over about 10 minutes, those A2r can enhance the indirect path, either by raising internal calcium ion (Ca) in the astrocytes or via LTP. Enhancing the indirect path (exploration) eventually causes a switch from the direct path (seeking) to exploration, essentially giving up on the seeking.
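The adenosine timeout can be sketched as a slow accumulator. This is a minimal sketch and not the essay 22 code: `AdenosineTimer`, `Action`, and all the constants are hypothetical, chosen only to show a give-up time that spans many ticks. The rates are powers of two so the f32 arithmetic stays exact.

```rust
/// Action selected by the competing striatal paths (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Seek,    // direct path wins: keep following the cue
    Explore, // indirect path wins: give up on the cue
}

/// Slow accumulator standing in for adenosine build-up at S.d2.
pub struct AdenosineTimer {
    level: f32, // adenosine proxy, clamped to 0..=1
}

impl AdenosineTimer {
    const RISE: f32 = 1.0 / 64.0;   // added per tick while the cue is active (assumed)
    const DECAY: f32 = 1.0 / 128.0; // removed per tick while the cue is absent (assumed)
    const GIVE_UP: f32 = 0.5;       // level where the indirect path takes over (assumed)

    pub fn new() -> Self {
        Self { level: 0.0 }
    }

    /// Advance one tick and return the action selected for that tick.
    pub fn tick(&mut self, cue_present: bool) -> Action {
        if cue_present {
            self.level = (self.level + Self::RISE).min(1.0);
        } else {
            self.level = (self.level - Self::DECAY).max(0.0);
        }
        if cue_present && self.level < Self::GIVE_UP {
            Action::Seek
        } else {
            Action::Explore
        }
    }
}
```

With these assumed values the animal seeks for 31 ticks of continuous cue before switching to exploration, and the level only recovers while the cue is absent.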

Ventral striatum

Although the essay models the dorsal striatum (S.d), the ventral striatum (S.v, aka nucleus accumbens) is more associated with exploration and food seeking. In particular, the olfactory path for food seeking goes through S.v, while midbrain motor actions use S.d. In salamanders, the striatum only processes midbrain (“collo-”) thalamic inputs, while olfactory and direct senses (“lemno-”) go to the cortex [Butler 2008]. Assuming the salamander path is more primitive, the essay’s use of S.d in the model is likely a mistake.

But S.v raises a new issue because S.v doesn’t use the subthalamus (H.stn) [Humphries and Prescott 2010], although that model only applies to the S.v shell (S.sh), not the S.v core (S.core).

Ventral striatum pathway. MLR midbrain locomotive region, P.v ventral pallidum, S.sh ventral striatum shell, Vta ventral tegmental area.

In the above diagram of a striatum shell circuit, an odor-seek path is possible through the ventral tegmental area (Vta) but there is no space for an alternate explore path.

Low dopamine and perseveration

[Rutledge et al. 2009] investigates dopamine in the context of Parkinson’s disease (PD), which exhibits perseveration as a symptom. PD is a low dopamine condition, and adding dopamine resolves the perseveration. That result is the opposite of essay 22’s dopamine model, where low dopamine resolved perseveration.

Now, it’s possible that give-up perseveration and Parkinson’s perseveration are two different symptoms, or it’s possible that the complete absence of dopamine differs from low tonic dopamine, but in either case, the essay 22 model is too simple to explain the striatum’s dopamine use.

Dopamine burst vs tonic

Dopamine in the striatum has two modes: phasic (burst) and tonic. Essay 22 uses tonic dopamine, not phasic. The striatum uses phasic dopamine to switch attention and orient to a new salient stimulus. The phasic dopamine circuit is more complicated than the tonic system because it requires coordination with acetylcholine (ACh) from the midbrain laterodorsal tegmentum (V.ldt) and pedunculopontine (V.ppt) nuclei.

A question for the essays is whether that phasic burst is primitive to the striatum, or a later addition, possibly adding an interrupt for orientation to an earlier non-interruptible striatum.

Explore semantics

The word “explore” is used differently in behavioral ecology and in reinforcement learning, despite both using foraging-like tasks. These essays have been using explore in the behavioral ecology meaning, which may cause confusion with the reinforcement learning sense. The difference centers on a fixed strategy (policy) compared with changing strategies.

In behavioral ecology, foraging is literal foraging, animals browsing or hunting in a place and moving on (giving up) if the place doesn’t have food [Owen-Smith et al. 2010]. “Exploring” is moving on from an unproductive place, but the policy (strategy) remains constant because moving on is part of the strategy. The policy for when to stay and when to go [Headon et al. 1982] often follows the marginal value theorem [Charnov 1976], which specifies when the animal should move on.
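The stay-or-go rule of the marginal value theorem can be sketched in a few lines: leave the patch when its current intake rate falls to the average rate for the habitat. This is an illustrative sketch only. The exponentially depleting patch and all function names are assumptions, and `habitat_rate` is treated as a given input, whereas in [Charnov 1976] it is the long-run average over all patches including travel time.

```rust
/// Marginal value theorem rule: move on when the patch's current
/// intake rate has dropped to the habitat's average intake rate.
pub fn should_leave(patch_rate: f64, habitat_rate: f64) -> bool {
    patch_rate <= habitat_rate
}

/// Hypothetical depleting patch: the intake rate decays exponentially
/// with the time already spent in the patch.
pub fn patch_rate(initial_rate: f64, depletion_time: f64, t: f64) -> f64 {
    initial_rate * (-t / depletion_time).exp()
}

/// First whole time step at which the rule says to give up on the patch,
/// or None if the patch stays worthwhile through `max_steps`.
pub fn give_up_step(
    initial_rate: f64,
    depletion_time: f64,
    habitat_rate: f64,
    max_steps: u32,
) -> Option<u32> {
    (0..=max_steps).find(|&step| {
        let rate = patch_rate(initial_rate, depletion_time, f64::from(step));
        should_leave(rate, habitat_rate)
    })
}
```

The policy itself never changes here: giving up is a fixed part of the strategy, which is the behavioral ecology sense of exploring.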

In contrast, reinforcement learning (RL) uses “explore” to mean changing the policy (strategy). For example, in a two-armed bandit situation (two slot machines), the RL policy is either using machine A or using machine B, or a fixed probabilistic ratio, not a timeout and give-up policy. In that context, exploring means changing the policy, not merely switching machines.

[Kacelnik et al. 2011] points out that the two-choice economic model doesn’t match vertebrate animal behavior, because vertebrates use an accept-reject decision [Cisek and Hayden 2022]. So, while the two-armed bandit may be useful in economics, it’s not a natural decision model for vertebrates.

Avoidance (nicotinic receptors in M.ip)

The simulation uncovered a foraging problem, where the animal remained around an odor patch it had given up on, because the give-up strategy reverts to random search. Instead, the animal should leave the current place and only resume search when it’s far away.

Path of simulated animal after giving up on a food odor.

In the diagram above, the animal remains near the abandoned food odor. The tight circles are the earlier seek before giving up, and the random path afterwards is the continued search. A better strategy would leave the green odor plume and explore other areas of the space.

As a possible circuit, the medial habenula (Hb.m) projection to the interpeduncular nucleus (M.ip) uses both glutamate and ACh as neurotransmitters, where ACh amplifies neural output. For low signals without ACh, the animal approaches the object, but high signals with ACh switch approach to avoidance. This avoidance switching is managed by the nicotinic receptor (nAChR), which is studied for nicotine addiction [Lee et al. 2019].

An interesting future essay might explore using nicotinic aversion to improve foraging by leaving an abandoned odor plume.

References

Bariselli S, Fobbs WC, Creed MC, Kravitz AV. A competitive model for striatal action selection. Brain Res. 2019 Jun 15;1713:70-79.

Butler, Ann. (2008). Evolution of the thalamus: A morphological and functional review. Thalamus & Related Systems. 4. 35 – 58.

Charnov, Eric L. Optimal foraging, the marginal value theorem. Theoretical population biology 9.2 (1976): 129-136.

Cisek P, Hayden BY. Neuroscience needs evolution. Philos Trans R Soc Lond B Biol Sci. 2022 Feb 14;377(1844):20200518.

Headon T, Jones M, Simonon P, Strummer J (1982) Should I Stay or Should I Go. On Combat Rock. CBS Epic.

Humphries MD, Prescott TJ. The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward. Prog Neurobiol. 2010 Apr;90(4):385-417.

Kacelnik A, Vasconcelos M, Monteiro T, Aw J. 2011. Darwin’s ‘tug-of-war’ vs. starlings’ ‘horse-racing’: how adaptations for sequential encounters drive simultaneous choice. Behav. Ecol. Sociobiol. 65, 547-558.

Lee HW, Yang SH, Kim JY, Kim H. The Role of the Medial Habenula Cholinergic System in Addiction and Emotion-Associated Behaviors. Front Psychiatry. 2019 Feb 28

Owen-Smith N, Fryxell JM, Merrill EH. Foraging theory upscaled: the behavioural ecology of herbivore movement. Philos Trans R Soc Lond B Biol Sci. 2010 Jul 27;365(1550):2267-78. 

Rutledge RB, Lazzaro SC, Lau B, Myers CE, Gluck MA, Glimcher PW. Dopaminergic drugs modulate learning rates and perseveration in Parkinson’s patients in a dynamic foraging task. J Neurosci. 2009 Dec 2

15: Seeking Food: Perseveration and Habituation

Essay 15 adds food-seeking to the simulated slug. Before this change, the essay 14 slug didn’t seek food from a distance, but it did slow when it was above food to improve feeding efficiency. The slug had consummatory behavior but no food-approach behavior. Because the slug didn’t seek food, it only found food when it randomly crossed a food tile. Most of its movement was random, except for avoiding obstacles.

In the screenshot above, the slug is moving forward with no food senses and no food approach. Its turns are for obstacle avoidance. The food squares have higher visitation because of the slower movement over food. Notice that all areas are visited, although there is a statistical variation because of the obstacle.

Although the world is tiled for simulation simplicity, the slug’s direction, location, and movement are floating-point based. The simulation isn’t an integer world. This continuous model means that timing and turn radius matter.

The turning radius affects behavior in combination with timing, like the movement-persistence circuit in essay 14. The tuning affects the heat map. Some turning choices result in the animal spending more time turning in corners when the dopamine runs out. This turn-radius dependence occurs in animals as well. The larval zebrafish has 13 stereotyped basic movements, and its turns have different stereotyped angles depending on the activity.
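As a rough illustration of a continuous body on a tiled world, the sketch below keeps position and heading as floats and only converts to a tile when needed. It is not the simulation’s code: `Body`, `step`, `tile`, `turn_radius`, and the unit-sized tiles are all assumptions.

```rust
/// Hypothetical continuous slug body on a grid of unit-sized tiles.
pub struct Body {
    pub x: f32,
    pub y: f32,
    pub dir: f32, // heading in radians
}

impl Body {
    /// Turn by `turn` radians, then move `speed` along the new heading.
    pub fn step(&mut self, speed: f32, turn: f32) {
        self.dir += turn;
        self.x += speed * self.dir.cos();
        self.y += speed * self.dir.sin();
    }

    /// Tile under the body, found by flooring the continuous position.
    pub fn tile(&self) -> (i32, i32) {
        (self.x.floor() as i32, self.y.floor() as i32)
    }
}

/// Approximate radius of the circle traced when turning `turn` radians
/// per tick at `speed` per tick (small-angle approximation).
pub fn turn_radius(speed: f32, turn: f32) -> f32 {
    speed / turn.abs()
}
```

Under these assumptions, a slug moving 0.5 tiles per tick and turning 0.25 radians per tick circles with a radius of about 2 tiles, so the same turn command covers different tiles at different speeds.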

Food approach

Food seeking adds complexity to the neural circuit, but it still uses a Braitenberg vehicle architecture. (See the discussion on odor tracking for avoided complexity.) Odor sensors stimulate muscles in uncrossed connections to approach the odor. Touch sensors use crossed connections to avoid obstacles. For simultaneous odor and touch, an additional command neuron layer resolves the conflict to favor touch.
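The wiring described above can be sketched as a single function. This is a minimal sketch rather than the simulation’s circuit: `Senses`, `Muscles`, and `command` are hypothetical names, and it assumes that driving a muscle on one side turns the slug toward that side.

```rust
/// Sensor readings for each side of the slug (hypothetical layout).
#[derive(Debug, Clone, Copy, Default)]
pub struct Senses {
    pub odor_left: f32,
    pub odor_right: f32,
    pub touch_left: bool,
    pub touch_right: bool,
}

/// Muscle drive for each side; the slug turns toward the stronger side.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Muscles {
    pub left: f32,
    pub right: f32,
}

/// Command-neuron layer: touch overrides odor when both are active.
pub fn command(senses: Senses) -> Muscles {
    if senses.touch_left || senses.touch_right {
        // Crossed connections: touch on one side drives the opposite
        // muscle, turning the slug away from the obstacle.
        Muscles {
            left: if senses.touch_right { 1.0 } else { 0.0 },
            right: if senses.touch_left { 1.0 } else { 0.0 },
        }
    } else {
        // Uncrossed connections: odor on one side drives the same-side
        // muscle, turning the slug toward the odor.
        Muscles {
            left: senses.odor_left,
            right: senses.odor_right,
        }
    }
}
```

Note that nothing in this circuit can break an odor’s attraction once it is detected, apart from touching an obstacle.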

Circuit for obstacle avoidance and food approach for simulated slug.

The command neurons correspond to vertebrate reticulospinal neurons (B.rs) in the hindbrain. Interestingly, the zebrafish circuit does seem to have direct connections from the touch sensors to B.rs neurons, exactly as pictured. In contrast, the path from odor receptors to B.rs neurons is a longer, more complicated path.

For the slug’s evolutionary parallel, the odor’s attractiveness is hardcoded, as if evolution has selected an odor that leads to food. Even single-celled organisms follow attractive chemicals and avoid repelling chemicals, and in mammals some odors are hardcoded as attractive or repelling. For now, no learning is occurring.

Perseveration (tunnel vision)

Unfortunately, this simple food circuit has an immediate, possibly fatal, problem. Once the slug detects an odor, it’s stuck moving toward it because our circuit can’t break the attraction. Although the slug never stops, it orbits the food scent, always turning toward it. The hoped-for improvement of following an odor to find food is a disaster.

In psychology, this inability to switch away is called perseveration, which is similar to tunnel vision but more pathological. Once a goal is started, the person can’t break away. In reinforcement learning terminology, the inability to switch is like an animal stuck on exploiting and incapable of breaking away to explore.

In the screenshot, the heat map shows the slug stuck on a single food tile. The graph shows the slug turning counter-clockwise, slowed down (sporadically arrested) over the food.

To solve the problem, one option is to re-enable satiation for the simulation, as was added in essay 14, but satiation only solves the problem if the tile has enough food to satiate the animal. Unfortunately, the tile might have a lingering odor but no food, or possibly an evolutionary food odor that’s unreliable, only signaling food 25% of the time. Instead, we’ll introduce habituation: the slug will become bored of the odor and start ignoring it for a time.

In the fruit fly the timescale for habituation is about 20 minutes to build up and 20-40 minutes to recover. The exact timing is likely adjustable because of the huge variability in biochemical receptors. So, 20 minutes probably isn’t a hard constant across different animals or circuits for habituation, but more of a range between a few minutes and an hour or so.

Because the minutes-to-hour timescale for habituation is much longer than the 5ms to 2s range for neurotransmitters, the biochemical implementation is very different. Habituation seems to occur by adding receptors to increase reception and/or adding more neurotransmitter generators to produce a bigger signal.

Fruit fly odor habituation

[Das 2011] studies the biochemical circuit for fruit fly odor habituation. The following diagram tries to capture the essence of the circuit. The main, unhabituated connection is from the olfactory sensory neuron (ORN) to the projection neuron (PN), which projects to the mushroom body. The main fast neurotransmitter for insects is acetylcholine (ACh), represented by beige.

The key player in the circuit is the NMDA receptor on the PN neuron’s dendrite, which works with the inhibitory LN1 GABA neuron to increase inhibition over time to habituate the odor.

The LN1 neuron drives habituation. Its GABA neurotransmitter release inhibits PN, which reduces the olfactory signal. Because LN1 itself can be inhibited, this circuit allows for a quick reversal of habituation. Habituation adds to simple inhibition by increasing the synapse strength (weight) over time when it’s used and decreasing the weight when it’s idle.

An NMDA receptor needs both a chemical stimulus (glutamate and glycine) and a voltage stimulus (post-synaptic activation, PN in this case). When activated, it triggers a long biochemical chain with many genetic variations to change synapse weight. In this case, it triggers a retrograde neurotransmitter (such as nitric oxide, NO) to the pre-synaptic LN1 axon, directing it to add new GABA release vesicles. Because adding new vesicles takes time (20 minutes) and removing the vesicles also takes time (20-40 minutes), habituation adds a longer, behaviorally useful timescale that will help solve the odor perseveration problem.
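The use-dependent weight change can be sketched as a coincidence rule on an inhibitory synapse. This is a loose functional sketch, not a model from [Das 2011]: `InhibitorySynapse`, its methods, and the rates are hypothetical, and a single boolean stands in for the chemical stimulus.

```rust
/// Hypothetical LN1 -> PN inhibitory synapse whose weight grows with use.
pub struct InhibitorySynapse {
    weight: f32, // inhibitory strength, clamped to 0..=1
}

impl InhibitorySynapse {
    const GROW: f32 = 1.0 / 32.0;   // per-tick strengthening (assumed)
    const SHRINK: f32 = 1.0 / 64.0; // per-tick recovery, slower than growth (assumed)

    pub fn new() -> Self {
        Self { weight: 0.0 }
    }

    /// NMDA-like coincidence rule: the weight grows only when the chemical
    /// stimulus is present AND the post-synaptic PN is active; otherwise
    /// the weight decays back toward zero.
    pub fn update(&mut self, transmitter: bool, pn_active: bool) {
        if transmitter && pn_active {
            self.weight = (self.weight + Self::GROW).min(1.0);
        } else {
            self.weight = (self.weight - Self::SHRINK).max(0.0);
        }
    }

    /// PN output after subtracting the weighted LN1 inhibition.
    pub fn pn_output(&self, orn_input: f32, ln1_rate: f32) -> f32 {
        (orn_input - self.weight * ln1_rate).max(0.0)
    }
}
```

Because `ln1_rate` multiplies the weight, silencing LN1 restores the PN output immediately without erasing the learned weight, which mirrors the quick reversal described above.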

Slug with habituation

The next slug simulation adds trivial habituation following the example of the fruit fly. The simulation adds a value when the slug senses an odor and decrements the value when the slug doesn’t detect an odor. When habituation crosses a threshold, the odor sensor is cut off from the command neurons. The behavior and heat map look like the following:

As the heat map shows, habituation does solve the fatal problem of perseveration for food approach. The slug visits all the food nodes without having any explicit directive to visit multiple nodes. Adding a simple habituation circuit creates behavior that looks like an explore vs exploit pattern without the simulation being designed around reinforcement learning. Explore vs exploit emerges naturally from the problem itself.

In the screenshot, the bright tile has no intrinsic meaning. Because habituation increases near any food tile, it can only decay when the slug is away from food. That bright tile is near a big gap that lets habituation drop and odor tracking recharge.

The code looks like the following:

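impl Habituation {
  // Build habituation while the food odor is sensed; let it decay otherwise.
  fn update_habituation(&mut self, is_food_sensor: bool) {
    if is_food_sensor {
      // Odor present: increment, saturating at 1.
      self.food = (self.food + Self::INC).min(1.);
    } else {
      // Odor absent: decrement, bottoming out at 0.
      self.food = (self.food - Self::DEC).max(0.);
    }
  }

  // The odor sensor drives the command neurons only while below threshold.
  fn is_active(&self) -> bool {
    self.food < Self::THRESHOLD
  }
}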
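To see how `INC`, `DEC`, and `THRESHOLD` set the timing, the tick counts can be computed directly. The helper names below are hypothetical and the example values are assumptions, since the essay doesn’t state the constants it used.

```rust
/// Ticks of continuous odor needed to reach the cut-off, starting from
/// zero habituation. Mirrors `is_active`, which requires food < THRESHOLD.
pub fn ticks_to_habituate(inc: f32, threshold: f32) -> u32 {
    (threshold / inc).ceil() as u32
}

/// Ticks away from the odor needed to drop back below the cut-off,
/// starting from full habituation (1.0).
pub fn ticks_to_recover(dec: f32, threshold: f32) -> u32 {
    ((1.0 - threshold) / dec).floor() as u32 + 1
}
```

For example, with an assumed `INC` of 1/64, `DEC` of 1/128, and `THRESHOLD` of 0.5, the slug habituates after 32 ticks of odor and needs 65 odor-free ticks to recover from full habituation, loosely echoing the fruit fly’s slower recovery.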

Discussion

The essays aren’t designed as solutions; they’re designed as thought experiments. So, their main value is generally the unexpected implementation details or issues that come up in the simulation. For example, how habituation thresholds and timing affect food approach. The bright tile in the last heat map occurs because that food source is isolated from other food sources.

If the slug passes near food sources, the odor will continually recharge habituation and it won’t decay enough to re-enable chemotaxis. The slug needs to be away from food for some time for odor tracking to re-enable. In theory, this behavior could be a problem for an animal. Suppose the odor range is very large and the animal is very slow, like a slug. If simple habituation occurs, the slug might habituate to the odor before it reaches the food, making it give up too soon.

As a possible solution, the LN1 inhibitory neuron that implements habituation could itself be disabled, although that brings back the issue of the animal getting stuck. But perhaps LN1 would be diminished instead of being cut off, giving the animal persistence without devolving into perseveration.
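One way to picture the diminished option is a gain on the LN1 inhibition. This is only a sketch of the idea: `odor_drive` and `ln1_gain` are hypothetical, and the simulation as described uses a hard threshold instead.

```rust
/// Odor signal passed on to the command neurons.
/// `habituation` is the accumulated level in 0..=1.
/// `ln1_gain` scales the LN1 inhibition: 0.0 disables LN1 entirely
/// (perseveration returns), 1.0 applies full habituation, and values
/// in between only diminish it.
pub fn odor_drive(odor: f32, habituation: f32, ln1_gain: f32) -> f32 {
    let inhibition = (habituation * ln1_gain).clamp(0.0, 1.0);
    odor * (1.0 - inhibition)
}
```

With `ln1_gain` at 0.5, a fully habituated slug still passes half of the odor signal, so it keeps some pull toward the food without being locked onto it.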

Odor vs actual food

Another potential issue is the detection of food itself as opposed to just its odor. If the food tile has food, the animal should have more patience than if the tile is empty with just the food odor. That scenario might explain why other habituation circuits include serotonin or dopamine as modulators. If food is actually present, there should be less habituation.
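A modulator of this kind could be sketched as a slower habituation rate while food is actually detected. The function and constants below are hypothetical and are not part of the essay’s simulation.

```rust
/// One habituation update where detected food slows the build-up,
/// standing in for a serotonin- or dopamine-like modulator.
pub fn habituation_step(level: f32, odor: bool, food_present: bool) -> f32 {
    const INC: f32 = 1.0 / 64.0;   // build-up per tick with odor only (assumed)
    const DEC: f32 = 1.0 / 128.0;  // decay per tick without odor (assumed)
    const FOOD_SCALE: f32 = 0.25;  // build-up is 4x slower over real food (assumed)

    if odor {
        let inc = if food_present { INC * FOOD_SCALE } else { INC };
        (level + inc).min(1.0)
    } else {
        (level - DEC).max(0.0)
    }
}
```

With these assumed values the slug tolerates a tile with real food four times longer than a tile with only a lingering odor.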

The need for precise values raises the issue of calibration, because evolution likely can’t precisely calibrate cutoff values or calibrate one neuron to another. Some of the habituation and related synapse adjustment may simply be calibration, adjusting neuron amplification to the system. In a sense, that calibration would be learning but perhaps atypical learning.

References

Das, Sudeshna, et al. “Plasticity of local GABAergic interneurons drives olfactory habituation.” Proceedings of the National Academy of Sciences 108.36 (2011): E646-E654.

Shen Y, Dasgupta S, Navlakha S. Habituation as a neural algorithm for online odor discrimination. Proc Natl Acad Sci U S A. 2020 Jun 2;117(22):12402-12410. doi: 10.1073/pnas.1915252117. Epub 2020 May 19. PMID: 32430320; PMCID: PMC7275754.
