Attempting a toy model of vertebrate understanding

Tag: foraging

Essay 33: Klinotaxis

When seeking an odor, vertebrate swimming undulates left and right, naturally moving the nose perpendicular to the body motion. This lateral motion can help navigation if odor sampling can be coordinated with the movement, enabling a spatiotemporal gradient calculation along the path of the nose movement. This lateral sampling over time is called klinotaxis (“leaning navigation”) or weathervaning.

Essay 24 and essay 25 explored head-direction navigation as inspired by the fruit fly Drosophila fan-shaped body and ellipsoid body. The idea was to use head direction to translate egocentric movement into an allocentric memory of past samples, independent of the current body direction. In contrast, klinotaxis uses an egocentric system, where the lateral motion is relative to the current direction, not an independent compass or map-like system.

Klinotaxis in Drosophila larva and C. elegans

Klinotaxis has been largely studied in the fruit fly Drosophila larva and the roundworm C. elegans. Drosophila larvae have a distinct “cast” movement, where they pause and wave their heads side to side, either a single time (1-cast) or multiple times (n-cast) [Zhao et al 2017]. Larval movements break down into five major types [Gomez-Marin and Louis 2014]:

  • Forward
  • Backward
  • Stop
  • Turn
  • Cast

C. elegans has two major seek movements: pirouettes and weathervaning [Lockery 2011]. Pirouettes are a u-turn when the animal is moving away from the odor. Weathervaning is a side-to-side head movement that manages turning.

Both systems are temporal gradient systems, requiring measurements at different times and a memory of the older measurement [Chen X and Engert 2014]. Klinotaxis requires a basic form of memory [Karpenko et al 2020], but the comparison can be a simple ON or OFF result [Lockery 2011]. Pirouettes use a gradient parallel to body motion and reverse direction when the animal is moving away from the odor [Iino and Yoshida 2009]. Weathervaning uses a gradient perpendicular to body motion, measured with a lateral head movement [Lockery 2011].
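As a minimal sketch of the temporal comparison described above, the following keeps one older sample in memory and reports a binary ON or OFF result. The class name and the single-sample memory are illustrative assumptions, not a model of any specific circuit.

```python
class TemporalGradient:
    """Compares the current odor sample with one remembered older sample."""

    def __init__(self):
        self.previous = None  # memory of the older measurement (assumed single sample)

    def update(self, sample: float) -> bool:
        """Return True (ON) if the odor is rising, False (OFF) otherwise."""
        rising = self.previous is not None and sample > self.previous
        self.previous = sample
        return rising


sensor = TemporalGradient()
results = [sensor.update(s) for s in [0.1, 0.2, 0.2, 0.15, 0.3]]
```

The same comparison can serve either strategy: sampled along the body axis it supports the pirouette decision, and sampled during lateral head movement it supports weathervaning.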

This klinotaxis contrasts with a bilateral spatial navigation that compares two lateral sensors [Chen X and Engert 2014], such as bilateral eyes, ears, or nostrils. In Drosophila larva, odor turning is proportional to the lateral gradient more than the parallel gradient [Martinez 2014]. The odor navigation is not simply bilateral because disabling one side of O.sn (olfactory sensory neuron) only minimally impairs navigation [Gomez-Marin and Louis 2014].

As a slight digression, let’s return to the adult Drosophila navigation, because the structure can be a useful analogy for understanding vertebrate klinotaxis navigation, despite using a different allocentric system.

Adult Drosophila FSB

Below is a rough sketch of the Drosophila navigation circuit, focused on the fan-shaped body [Hulse et al 2021]. The ellipsoid body (EB) and protocerebral bridge (PB) calculate head direction and sort it into 18 columns. This head direction is allocentric, independent of the animal’s current direction, like a compass direction or a map. Input from odor areas like the mushroom body (MB) and lateral horn (LH) is organized into 9 rows. The fan-shaped body combines these 18 head direction columns and 9 sense data rows into a memory table.

Drosophila navigation
Drosophila navigation, focusing on head direction from PB, odor data from MB and LH, and allocentric table of FB. EB ellipsoid body, FB fan-shaped body, LH lateral horn, MB mushroom body, PB protocerebral bridge.

Motor navigation reads out from the fan-shaped-body table. These motor commands include left and right, but also include a separate u-turn command [Westeinde et al 2024]. Although this allocentric navigation system differs from egocentric klinotaxis, its motor output includes both the left vs right from weathervaning and the u-turn from pirouette.

The previous essay 24 and essay 25 attempts followed this model. As the animal moves in space, the model saved the forward odor gradient according to the current head direction. By comparing stored values for other head directions, the animal would improve its heading toward the direction with the strongest odor.

The fan-shaped body then becomes a record of samples of all the older directions that the animal had measured. Output is then calculated for left (PFL3L), right (PFL3R), and u-turn (PFL2) signals [Westeinde et al 2024]. The current head direction is represented as a sinusoidal neural pattern and combined with the stored values to produce an output.
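One way to picture this readout is the rough sketch below, which combines a cosine-shaped heading pattern with a stored table of 18 columns. The cosine form, the 90 degree and 180 degree offsets, the one-dimensional table, and the convention that angles increase toward the left are all simplifying assumptions for illustration, not measured values.

```python
import math

N_COLUMNS = 18  # head-direction columns, as in the text


def bump(heading: float, shift: float = 0.0) -> list[float]:
    """Sinusoidal activity over the columns, peaked at heading + shift (radians)."""
    return [
        math.cos(heading + shift - 2 * math.pi * i / N_COLUMNS)
        for i in range(N_COLUMNS)
    ]


def readout(table: list[float], heading: float) -> dict[str, float]:
    """Combine the stored table with shifted copies of the heading pattern."""
    def drive(shift: float) -> float:
        return sum(t * b for t, b in zip(table, bump(heading, shift)))

    return {
        "left": drive(+math.pi / 2),   # hypothetical PFL3L-like offset
        "right": drive(-math.pi / 2),  # hypothetical PFL3R-like offset
        "u_turn": drive(math.pi),      # hypothetical PFL2-like offset
    }


def choose(table: list[float], heading: float) -> str:
    """Pick the motor command with the strongest drive."""
    out = readout(table, heading)
    return max(out, key=out.get)


# Strongest stored odor sits 80 degrees to the left of the current heading.
table = [0.0] * N_COLUMNS
table[4] = 1.0
command = choose(table, 0.0)
```

The sketch has no forward command, so a goal directly ahead leaves the three drives nearly tied; it only illustrates how one stored table can feed left, right, and u-turn outputs.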

This system was only partially successful for the essay. Although it was an improvement over no memory, because the animal was continually moving in space, the table was always obsolete. Even when the table memory times out to represent loss in accuracy as the animal moves, the rapid obsolescence made navigation difficult, particularly as the animal neared the target.

So, this essay simplifies the circuit and lowers the ambition. Instead of trying to record every direction and keeping a perfect allocentric compass direction, the animal could simply save its left and right oscillation as it swims naturally.

Vertebrate Hb.m and R.ip

The vertebrate Hb.m (medial habenula) to R.ip (interpeduncular nucleus) circuit is used for phototaxis [Chen X and Engert 2014], chemotaxis [Chen WY et al 2019] and thermotaxis [Palieri et al 2024]. In a clever experiment creating a virtual light circle, Chen and Engert show that zebrafish phototaxis is not simply comparing light between the eyes for a spatial gradient (tropotaxis) but is a temporally-based gradient (klinotaxis), relying on a short-term memory of the previous light. This phototaxis uses the Hb.m to R.ip circuit [Chen X and Engert 2014].

Vertebrate olfactory klinotaxis circuit. Ob (olfactory bulb), Hb.m (medial habenula), P.ldt (laterodorsal tegmental nucleus), R.dtg (dorsal tegmental nucleus of Gudden), R.ip (interpeduncular nucleus), R.rs (reticulospinal), V.mr (median raphe)

Head direction from R.dtg (dorsal tegmental nucleus) tiles R.ip vertically [Petrucco et al 2023], while olfactory and light input is organized horizontally [Chen WY et al 2019], [Zaupa et al 2021]. After combining the odor with the head direction and comparing with the stored values, R.ip sends motor commands to R.rs (reticulospinal) using P.ldt (laterodorsal tegmental nucleus) and V.mr (median raphe). The head direction input from R.dtg resembles the Drosophila fan-shaped body, but instead of 18 columns, R.ip has only 6, three to a side [Petrucco et al 2023].

Essay 25 explored a model which used the Drosophila fan-shaped body allocentric navigation in R.ip with some limited but not overwhelming success. Instead, this essay will try a different interpretation, where R.ip is only storing side to side weathervaning of the head while swimming, instead of a full 360 degree table like Drosophila.

Vertebrate klinotaxis

As a different approach, suppose the head direction to R.ip is not an allocentric map-making coordinator as in the adult Drosophila, but a simpler egocentric weathervaning or casting coordinator, storing only the lateral gradient from head direction changes from natural swimming, or possibly deliberate larger turns like casting to gather wider lateral gradient information.

Klinotaxis simplifies the need for precise head direction. Instead of the Drosophila 18 head direction columns calibrated to the outside world, we use only three, two lateral and one central, that only require motor efference copies of left and right muscle turns. Studies from the zebrafish R.ip suggest three columns to a side, which isn’t connected to the vestibular system [Petrucco et al 2023]. To me, this suggests that the head direction might not be an allocentric signal that requires precise direction, but a simple egocentric lateral measurement, which doesn’t need vestibular information.

Vertebrate klinotaxis circuit. Hb.m (medial habenula), Ob (olfactory bulb), R.dtg (dorsal tegmental nucleus), R.ip (interpeduncular nucleus).

The above diagram illustrates the system. Olfactory samples arrive through Hb.m and head direction arrives from R.dtg. Like the Drosophila fan-shaped body, R.ip combines odor samples with lateral head movement into a simple memory table, and it reads out left and right motor commands. A similar system can save odor measurements parallel to body movement, using velocity instead of head direction, to trigger a u-turn when the animal is moving away from the odor.
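The memory table described above can be sketched as three bins indexed by the lateral head position. The class name, the smoothing factor, and the "straight" fallback are hypothetical choices for illustration, not details of the essay's simulation.

```python
class LateralTable:
    """Three-bin egocentric memory: odor sampled at left, center, and right."""

    BINS = ("left", "center", "right")

    def __init__(self, decay: float = 0.5):
        self.decay = decay  # hypothetical smoothing factor for older samples
        self.values = {b: 0.0 for b in self.BINS}

    def store(self, head_side: str, odor: float) -> None:
        """Blend a new sample into the bin chosen by the motor efference copy."""
        old = self.values[head_side]
        self.values[head_side] = self.decay * old + (1 - self.decay) * odor

    def turn_command(self) -> str:
        """Binary readout: turn toward the side with the stronger stored odor."""
        left, right = self.values["left"], self.values["right"]
        if left > right:
            return "left"
        if right > left:
            return "right"
        return "straight"


table = LateralTable()
table.store("left", 0.2)
table.store("right", 0.6)
command = table.turn_command()
```

Because the bin is chosen by the animal's own left or right movement, no compass calibration is needed, which is the simplification the text argues for.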

Discussion

Compared to the parallel-only gradient, allocentric system of essay 25, this lateral navigation is far simpler and more effective. Even with only three bins compared to the 8 bins in essay 25, the lateral weathervaning turned out to be more effective and less brittle. If R.ip does implement a lateral klinotaxis system like this essay, it’s plausible that the 6 directions reported by [Petrucco et al 2023] are sufficient for accurate seek navigation. In contrast, those 6 directions seem insufficient for an allocentric navigation compared to the Drosophila 18 directions.

Interestingly, the pirouette is also highly effective, even without lateral klinotaxis. In the simulation, when the animal moves away from the odor source, it makes a u-turn. This system serves to ratchet the animal closer and closer to the target. Even when most of the movement is random, the pirouette locks in any improvement. The pirouette itself is also simple, requiring only two averages: a short average that tracks the odor across a single swim cycle and a long average that uses two swim cycles. When the short average has a stronger odor value than the long average, the animal is moving toward the odor.
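The two-average comparison can be sketched as below. The number of samples per swim cycle and the class name are assumptions, and the essay's simulation may compute its averages differently.

```python
from collections import deque


class Pirouette:
    """Triggers a u-turn when the short odor average drops below the long one."""

    def __init__(self, cycle_samples: int = 4):
        self.short = deque(maxlen=cycle_samples)      # one swim cycle
        self.long = deque(maxlen=2 * cycle_samples)   # two swim cycles

    def update(self, odor: float) -> bool:
        """Add a sample and return True if the animal should make a u-turn."""
        self.short.append(odor)
        self.long.append(odor)
        if len(self.long) < self.long.maxlen:
            return False  # not enough history yet
        short_avg = sum(self.short) / len(self.short)
        long_avg = sum(self.long) / len(self.long)
        return short_avg < long_avg  # odor is fading, so the animal is moving away


p = Pirouette(cycle_samples=2)
approaching = [p.update(x) for x in [1.0, 2.0, 3.0, 4.0]]
leaving = [p.update(x) for x in [3.0, 2.0, 1.0]]
```

In this run the u-turn signal stays off while the odor rises and switches on one sample after the odor starts to fall, which is the ratchet behavior described above.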

In both cases, the simulation used a binary ON/OFF for the motor command instead of attempting finer precision from the gradient. This simple ON/OFF strategy was sufficient for the simulation. A C. elegans study suggested that ON-OFF coding was energy efficient, and the worm rarely orients perfectly to the gradient [Lockery 2011].

References

Chen WY, Peng XL, Deng QS, Chen MJ, Du JL, Zhang BB. Role of Olfactorily Responsive Neurons in the Right Dorsal Habenula-Ventral Interpeduncular Nucleus Pathway in Food-Seeking Behaviors of Larval Zebrafish. Neuroscience. 2019 Apr 15;404:259-267. 

Chen X, Engert F. Navigational strategies underlying phototaxis in larval zebrafish. Front Syst Neurosci. 2014 Mar 25;8:39.

Gomez-Marin A, Louis M. Multilevel control of run orientation in Drosophila larval chemotaxis. Front Behav Neurosci. 2014;8:38. doi: 10.3389/fnbeh.2014.00038.

Hulse BK, Haberkern H, Franconville R, Turner-Evans D, Takemura SY, Wolff T, … Jayaraman V. A connectome of the Drosophila central complex reveals network motifs suitable for flexible navigation and context-dependent action selection. Elife. 2021;10.

Iino Y, Yoshida K. Parallel use of two behavioral mechanisms for chemotaxis in Caenorhabditis elegans. J Neurosci. 2009 Apr 29;29(17):5370-80. 

Karpenko S, Wolf S, Lafaye J, Le Goc G, Panier T, Bormuth V, Candelier R, Debrégeas G. From behavior to circuit modeling of light-seeking navigation in zebrafish larvae. Elife. 2020 Jan 2;9:e52882. 

Lockery SR. The computational worm: spatial orientation and its neuronal basis in C. elegans. Curr Opin Neurobiol. 2011 Oct;21(5):782-90. 

Martinez D. Klinotaxis as a basic form of navigation. Front Behav Neurosci. 2014 Aug 14;8:275. 

Palieri V, Paoli E, Wu YK, Haesemeyer M, Grunwald Kadow IC, Portugues R. The preoptic area and dorsal habenula jointly support homeostatic navigation in larval zebrafish. Curr Biol. 2024 Feb 5;34(3):489-504.e7.

Petrucco L, Lavian H, Wu YK, Svara F, Štih V, Portugues R. Neural dynamics and architecture of the heading direction circuit in zebrafish. Nat Neurosci. 2023 May;26(5):765-773. 

Westeinde EA, Kellogg E, Dawson PM, Lu J, Hamburg L, Midler B, Druckmann S, Wilson RI. Transforming a head direction signal into a goal-oriented steering command. Nature. 2024 Feb;626(8000):819-826. 

Zaupa M, Naini SMA, Younes MA, Bullier E, Duboué ER, Le Corronc H, Soula H, Wolf S, Candelier R, Legendre P, Halpern ME, Mangin JM, Hong E. Trans-inhibition of axon terminals underlies competition in the habenulo-interpeduncular pathway. Curr Biol. 2021 Nov 8;31(21):4762-4772.e5. 

Zhao W, Gong C, Ouyang Z, Wang P, Wang J, Zhou P, Zheng N, Gong Z. Turns with multiple and single head cast mediate Drosophila larval light avoidance. PLoS One. 2017 Jul 11;12(7):e0181193. 

Essay 31: Striatum as Timeout

Let’s return to the task of essay 16 on give-up time in foraging, which covered food search with a timeout. At first the animal uses a general roaming search and if it smells a food odor, it switches to a targeted seek following the odor with chemotaxis. If the animal finds food in the odor plume, it eats the food, but if it doesn’t find food, it will eventually give up and avoid the local area before returning to the roaming search.

Search state machine. Roam is the starting state, switching to seek when it detects odor, and switching to avoid after a timeout.
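The state machine in the caption can be written as a small transition function. The `avoid_done` input, which returns the animal to roaming, is an assumed name for the end of the avoidance period described in the text.

```python
from enum import Enum


class State(Enum):
    ROAM = "roam"
    SEEK = "seek"
    AVOID = "avoid"


def next_state(state: State, odor: bool, timed_out: bool, avoid_done: bool) -> State:
    """Roam until an odor appears, seek until a timeout, avoid, then roam again."""
    if state is State.ROAM and odor:
        return State.SEEK
    if state is State.SEEK and timed_out:
        return State.AVOID
    if state is State.AVOID and avoid_done:
        return State.ROAM
    return state


state = State.ROAM
state = next_state(state, odor=True, timed_out=False, avoid_done=False)   # SEEK
state = next_state(state, odor=True, timed_out=True, avoid_done=False)    # AVOID
```

Eating is left out of this sketch; it only covers the three states named in the caption.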

For another attempt at the problem, let’s take the striatum (basal ganglia) as implementing the timeout portion of this task using the neurotransmitter adenosine as a timeout signal and incorporating the multiple action path discussion from essay 30 on RTPA. Adenosine is a byproduct of ATP breakdown and is a measure of cellular activity. With sufficiently high adenosine, the striatum switches from the active seek path to an avoidance path. These circuits are where caffeine works to suppress the adenosine timeout, allowing for longer concentration.

Mollusk navigation

As mentioned in essay 30, the mollusk sea slug has a food search circuit with a similar logic to what we need here. The animal seeks food odors when it’s hungry, but it avoids food odors when it’s not hungry [Gillette and Brown 2015].

Mollusk food search circuit, modulated by hunger.
Mollusk food search circuit, illustrating a hunger-modulated switchboard. When the animal is not hungry, the switchboard reverses the odor to motor links turning it away from food.

This essay uses the same idea but replaces the hunger modulation with a timeout. When the timeout occurs, the circuit switches from a food seek action path to a food avoid action path.
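As a rough sketch of the switchboard idea, the function below routes the stronger odor side to a turn command and reverses the routing after a timeout. The bilateral inputs follow the mollusk example; the function name and the "straight" fallback are illustrative assumptions.

```python
def switchboard(odor_left: float, odor_right: float, timed_out: bool) -> str:
    """Turn toward the stronger odor, or away from it once the timeout is set."""
    if odor_left == odor_right:
        return "straight"
    toward = "left" if odor_left > odor_right else "right"
    away = "right" if toward == "left" else "left"
    return away if timed_out else toward


seek_turn = switchboard(0.7, 0.2, timed_out=False)
avoid_turn = switchboard(0.7, 0.2, timed_out=True)
```

The sensory input is identical in both calls; only the modulating signal changes which motor link is active.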

Odor action paths

Two odor-following action paths exist in the lamprey, one using Hb.m (medial habenula) and one using V.pt (posterior tuberculum). The Hb.m path is a chemotaxis path following a temporal gradient, while the V.pt path projects to MLR (midbrain locomotor region). The lamprey Ob.m (medial olfactory bulb) projects to both Hb.m and V.pt, which each project to different locomotor paths [Derjean et al 2010]: Hb.m to R.ip (interpeduncular nucleus) and V.pt to MLR. The zebrafish also has Ob projections to Hb and V.pt [Imamura et al 2020], [Kermen et al 2013].

Dual odor-seeking action paths in the lamprey and zebrafish. Hb (habenula), Ob.m (medial olfactory bulb), V.pt (posterior tuberculum).

Further complicating the paths, the Hb.m itself contains both an odor seeking path and an odor avoiding path [Beretta et al 2012], [Chen et al 2019]. Similarly, Hb.m has dual action paths for social winning and losing [Okamoto et al 2021]. So, this essay could use the dual paths in Hb.m instead of contrasting Hb.m with V.pt, but the larger contrast should make the simulation easier to follow.

This essay’s simulation makes some important simplifications. The Hb to R.ip path is a temporal gradient path used for chemotaxis, phototaxis and thermotaxis. In a real-world marine environment, odor diffusion and water turbulence are much more complicated, producing more clumps and making a simple gradient ascent more difficult [Hengenius et al 2021]. Because this essay is only focused on the switchboard effect, this simplification should be fine.

Striatum action paths with adenosine timeout

The timeout circuit uses the striatum, which has two paths: one selecting the main action, and the second either stopping the action, or selecting an opposing action [Zhai et al 2023]. The two paths are distinguished by their responsiveness to dopamine with S.d1 (striatal projection with D1 G-s stimulating) or S.d2 (striatal projection with D2 G-i inhibiting) marking the active and alternate paths respectively. This model is a simplification of the mammalian striatum where the two paths interact in a more complicated fashion [Cui et al 2013].

Essay odor seek with timeout circuit. The seek path flows from Ob, through S.d1 to P.v to V.pt. The avoid path flows from Ob, through S.d2 to P.v to Hb. Ad (adenosine), Hb (habenula), Ob (olfactory bulb), P.v (ventral pallidum), S.d1 (striatum D1 projection neuron), S.d2 (striatum D2 projection neuron), V.pt (posterior tuberculum)

As mentioned, the two action paths are the seek path from Ob to V.pt and the avoid path from Ob to Hb. For the timeout and switchboard, the Ob has a secondary projection to the striatum. Although this circuit is meant as a proto-vertebrate simplification, Ob does project to S.ot (olfactory tubercle) and to the equivalent in zebrafish [Kermen et al 2013].

The timeout is managed by adenosine, which is a neurotransmitter derived from ATP and a measure of neural activity. The striatum has three sub-circuits for this kind of functionality, which I’ll cover in order of complexity.

S.d1 and adenosine inhibition

The first circuit only uses the direct S.d1 path and adenosine as a timeout mechanism. When the animal follows an odor, the Ob to S.d1 signal enables the seek action. As a timeout, ATP from neural activity degrades to adenosine and the buildup of adenosine is a decent measure of activity over time. The longer the animal seeks, the more adenosine builds up. If the Ob projection axon contains an A1i (adenosine G-i inhibitory) receptor, the adenosine will inhibit the release of glutamate from Ob, which will eventually self-disable the seek action.

S.d1 action path inhibited by adenosine buildup as a timeout. A1i (adenosine G-i inhibitory receptor), Ad (adenosine), mGlu5q (metabotropic glutamate G-q receptor), Ob (olfactory bulb), S.d1 (D1-type striatal projection neuron)

In practice, the striatum uses astrocytes to manage the glutamate release. An astrocyte that envelops the synapse measures glutamate release with an mGlu5q (metabotropic glutamate with G-q/11 binding) receptor and accumulates internal calcium [Cavaccini et al 2020]. The astrocyte’s calcium triggers an adenosine release as a gliotransmitter, making the adenosine level a timeout measure of glutamate activity. The presynaptic A1i receptor then inhibits the Ob signal. The timeframe is on the order of 5 to 20 minutes with a recovery of about 60 minutes, although the precise timing is probably variable. Interestingly, the timeout is a log function instead of a linear measure of activity [Ma et al 2022].

This circuit doesn’t depend on the postsynaptic S.d1 firing [Cavaccini et al 2020], which contrasts with the next LTD (long term depression) circuit which only inhibits the axon if the S.d1 projection neuron fires.
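A minimal sketch of this first circuit treats adenosine as a leaky accumulator that scales down the Ob signal. The rise and decay rates, the saturating build-up, and the step-based timing are placeholder assumptions; they are not fitted to the minute-scale timing or the log relationship reported above.

```python
class AdenosineTimer:
    """Leaky accumulator standing in for astrocyte-released adenosine."""

    def __init__(self, rise: float = 0.1, decay: float = 0.02):
        self.rise = rise      # hypothetical build-up per step of Ob activity
        self.decay = decay    # hypothetical recovery per step
        self.level = 0.0      # adenosine level, kept between 0 and 1

    def step(self, ob_signal: float) -> float:
        """Advance one step and return the Ob signal after A1i-like inhibition."""
        self.level += self.rise * ob_signal * (1.0 - self.level)
        self.level -= self.decay * self.level
        return ob_signal * (1.0 - self.level)


timer = AdenosineTimer()
seeking = [timer.step(1.0) for _ in range(50)]   # sustained odor following
peak = timer.level
resting = [timer.step(0.0) for _ in range(100)]  # no odor, adenosine recovers
```

With sustained input the passed signal shrinks step by step, and with no input the adenosine level decays back toward zero, so the seek action disables itself and later recovers.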

S.d1 presynaptic LTD using eCB

S.d1 self-activating LTD uses retrotransmission to inhibit its own input using eCB (endocannabinoids) as a neurotransmitter. Like the astrocyte in the previous circuit, S.d1 uses an mGlu5q receptor to trigger eCB release, but it also requires that S.d1 fire, as triggered by the NMDA glutamate receptor. The axon receives the eCB retrotransmission with a CB1i (cannabinoid G-i inhibitory) receptor, which triggers presynaptic LTD [Shen et al 2008], [Wu et al 2015]. Like the previous circuit, the timeframe seems to be on the order of 10 minutes, lasting for 30 to 60 minutes.

S.d1 LTD circuit. A coincidence of glutamate detection with mGlu5q and S.d1 activation with NMDA triggers eCB release, which activates CB1i leading to presynaptic LTD. CB1i (cannabinoid G-i inhibitory receptor), mGlu5q (glutamate G-q receptor), Ob (olfactory bulb), S.d1 (striatum D1-type projection neuron).

This circuit inhibits itself over time without using adenosine or astrocytes. In the full striatum circuit, high dopamine levels suppress this LTD, meaning that dopamine inhibits the timeout [Shen et al 2008].

The next circuit adds the S.d2 path, which uses adenosine and self-activity to trigger postsynaptic LTP.

S.d2 postsynaptic LTP via A2a.s

Consider a third circuit that has the benefits of both previous circuits because it uses adenosine as a timer managed by astrocytes and is also specific to postsynaptic activity. In addition, it allows for a second action path, changing the circuit from a Go/NoGo system to a Go/Avoid action pair. This circuit uses LTP (long term potentiation) on the S.d2 striatum neurons.

Timeout circuit using postsynaptic LTP at the S.d2 neuron and adenosine as a timeout signal. As adenosine accumulates, it stimulates S.d2, which both disables S.d1 and drives the avoid path. A2a.s (adenosine G-s stimulatory receptor), Ad (adenosine), mGlu5q (glutamate G-q metabotropic receptor), Ob (olfactory bulb), S.d1 (striatum D1-type projection neuron), S.d2 (striatum D2-type projection neuron)

When the odor first arrives, Ob activates the S.d1 path, seeking toward the odor. S.d1 is activated instead of S.d2 because of dopamine. In this simple model, the Ob itself could provide the initial dopamine like C. elegans odor-detecting neurons or the tunicate’s coronal cells or the dual glutamate and dopamine neurons in Vta (ventral tegmental area).

As time goes on, adenosine from the astrocyte builds up, which activates the S.d2 A2a.s (adenosine G-s stimulatory receptor) until it overcomes dopamine suppression and increases the S.d2 activity with LTP [Shen et al 2008]. Once S.d2 activates, it suppresses S.d1 [Chen et al 2023] and drives the avoid path.
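The Go/Avoid selection can be sketched as a comparison between adenosine drive and dopamine suppression on S.d2. Treating both as plain numbers on the same scale, with a simple subtraction, is an assumption for illustration only.

```python
def select_path(ob_signal: float, dopamine: float, adenosine: float) -> str:
    """Pick the active striatal path for an odor input."""
    if ob_signal <= 0.0:
        return "none"  # no odor input, so neither path is driven
    # A2a.s-like excitation of S.d2 competes with D2-like dopamine suppression.
    sd2_drive = adenosine - dopamine
    # Once S.d2 wins, it suppresses S.d1 and drives the avoid path.
    return "avoid" if sd2_drive > 0.0 else "seek"


early = select_path(ob_signal=1.0, dopamine=0.5, adenosine=0.1)
late = select_path(ob_signal=1.0, dopamine=0.5, adenosine=0.8)
```

Early in the seek, dopamine keeps S.d2 quiet and the seek path runs; after adenosine builds up past the dopamine level, the same odor input drives avoidance instead.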

The combination of these circuits looks like it’s precisely what the essay needs.

Simulation

In the simulation, when the animal is hunting food and finds a food odor plume, it directly seeks toward the center and eats if it finds food. In the screenshot below, the animal is eating.

Simulation showing the animal eating food after seeking the odor plume.

Satiation disables the food seek. This might sound obvious, but hunger gating of food seeking requires specific satiety circuits for any seek path that’s food specific, which means the involvement of H.l (lateral hypothalamus) and related areas like H.arc (arcuate hypothalamus) and H.pv (periventricular hypothalamus). And, of course, the simulation requires simulation code to only enable food odor seek when the animal is searching for food.

The next screenshot shows the central problem of the essay, when the animal seeks a food odor but there’s no food at the center.

Screenshot showing the animal stuck in the middle of the food odor plume before the timeout.

Without a timeout, the animal circles the center of the food odor plume endlessly. After a timeout, the animal actively leaves the plume and avoids that specific odor until the timeout decays.

Screenshot showing the animal escaping from the odor plume after the timeout.

This system is somewhat complex because of the need for hysteresis. A too-simple solution with a single threshold can oscillate, because as soon as the animal starts leaving, the timeout decays, which then re-enables the food seek, which then quickly times out, repeating. Instead, the system needs to make re-enabling of the food seek more difficult after a timeout.

But that adds a secondary issue: if restarting a food seek requires the timeout to fall below a lower threshold, then the threshold must be raised while the seek is in progress. In other words, sustaining a seek must be easier than starting one. This hysteresis and seek sustain presumably needs to be handled by the actual striatum circuit.
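The two-threshold idea can be sketched as a small gate over the timeout level. The threshold values and names are hypothetical, and the essay's simulation may implement its hysteresis differently.

```python
class SeekGate:
    """Two-threshold gate: seeking stops at a high level and restarts at a low one."""

    def __init__(self, stop_above: float = 0.8, restart_below: float = 0.2):
        self.stop_above = stop_above        # timeout level that ends an ongoing seek
        self.restart_below = restart_below  # level the timeout must decay to before re-seeking
        self.seeking = True

    def update(self, timeout_level: float) -> bool:
        """Return True while the seek path is enabled."""
        if self.seeking and timeout_level > self.stop_above:
            self.seeking = False
        elif not self.seeking and timeout_level < self.restart_below:
            self.seeking = True
        return self.seeking


gate = SeekGate()
levels = [0.5, 0.9, 0.7, 0.5, 0.3, 0.1, 0.5]
enabled = [gate.update(level) for level in levels]
```

Note that the level 0.5 keeps an ongoing seek alive at the start and end of the run but does not restart a seek in the middle, which is what prevents the oscillation of a single threshold.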

Discussion

I think this essay shows that using the striatum for an action timeout for food seek is a plausible application. The circuit is relatively simple and is effective, improving search by avoiding failed areas.

However, the simulation does raise some issues, particularly the hysteresis problem. If the striatum does provide a timeout along these lines, it must somehow solve the hysteresis problem. While the animal is seeking, the ongoing LTP/LTD inhibition should use a high threshold to stop seeking, but once avoidance starts, there needs to be a high threshold to return to seeking to avoid oscillations between the two action paths.

Because LTD/LTP is a relatively long chemical process (minutes) internal to the neurons, as opposed to an instant switch in the simulation, the delay itself might be sufficient to solve the oscillation problem. It’s also possible that some of the more complicated parts of the circuit, such as P.ge (globus pallidus) and its feedback to the striatum or H.stn (subthalamic nucleus) might affect the sustain of seek or breaking it and so control the hysteresis problem.

The simulation also reinforced the absolute requirement that action paths need to be modulated by internal state like hunger. For the seek paths, both Hb.m and V.pt are heavily modulated by H.l and other hypothalamic hunger and satiety signals.

As expected, the simulation also illustrated the need for context information separate from the target odor. While the food odor is timed out, the animal can’t search the other odor plume because this essay’s animal can’t distinguish between the odor plumes, and therefore avoids both odors. With a long timeout and many odor plumes, this delays the food search. A future enhancement is to add context to the timeout. If the animal can time out a specific odor plume, it can search alternatives even if the food odor itself is identical.

References

Beretta CA, Dross N, Gutierrez-Triana JA, Ryu S, Carl M. Habenula circuit development: past, present, and future. Front Neurosci. 2012 Apr 23;6:51. 

Cavaccini A, Durkee C, Kofuji P, Tonini R, Araque A. Astrocyte Signaling Gates Long-Term Depression at Corticostriatal Synapses of the Direct Pathway. J Neurosci. 2020 Jul 22;40(30):5757-5768. 

Chen JF, Choi DS, Cunha RA. Striatopallidal adenosine A2A receptor modulation of goal-directed behavior: Homeostatic control with cognitive flexibility. Neuropharmacology. 2023 Mar 15;226:109421. 

Chen WY, Peng XL, Deng QS, Chen MJ, Du JL, Zhang BB. Role of Olfactorily Responsive Neurons in the Right Dorsal Habenula-Ventral Interpeduncular Nucleus Pathway in Food-Seeking Behaviors of Larval Zebrafish. Neuroscience. 2019 Apr 15;404:259-267. 

Cui G, Jun SB, Jin X, Pham MD, Vogel SS, Lovinger DM, Costa RM. Concurrent activation of striatal direct and indirect pathways during action initiation. Nature. 2013 Feb 14;494(7436):238-42.

Derjean D, Moussaddy A, Atallah E, St-Pierre M, Auclair F, Chang S, Ren X, Zielinski B, Dubuc R. A novel neural substrate for the transformation of olfactory inputs into motor output. PLoS Biol. 2010 Dec 21;8(12):e1000567. 

Gillette R, Brown JW. The Sea Slug, Pleurobranchaea californica: A Signpost Species in the Evolution of Complex Nervous Systems and Behavior. Integr Comp Biol. 2015 Dec;55(6):1058-69. 

Hengenius JB, Connor EG, Crimaldi JP, Urban NN, Ermentrout GB. Olfactory navigation in the real world: Simple local search strategies for turbulent environments. J Theor Biol. 2021 May 7;516:110607.

Imamura F, Ito A, LaFever BJ. Subpopulations of Projection Neurons in the Olfactory Bulb. Front Neural Circuits. 2020 Aug 28;14:561822. 

Kermen F, Franco LM, Wyatt C, Yaksi E. Neural circuits mediating olfactory-driven behavior in fish. Front Neural Circuits. 2013 Apr 11;7:62.

Ma L, Day-Cooney J, Benavides OJ, Muniak MA, Qin M, Ding JB, Mao T, Zhong H. Locomotion activates PKA through dopamine and adenosine in striatal neurons. Nature. 2022 Nov;611(7937):762-768.

Okamoto H, Cherng BW, Nakajo H, Chou MY, Kinoshita M. Habenula as the experience-dependent controlling switchboard of behavior and attention in social conflict and learning. Curr Opin Neurobiol. 2021 Jun;68:36-43. 

Shen W, Flajolet M, Greengard P, Surmeier DJ. Dichotomous dopaminergic control of striatal synaptic plasticity. Science. 2008 Aug 8;321(5890):848-51. 

Wu YW, Kim JI, Tawfik VL, Lalchandani RR, Scherrer G, Ding JB. Input- and cell-type-specific endocannabinoid-dependent LTD in the striatum. Cell Rep. 2015 Jan 6;10(1):75-87. 

Zhai S, Cui Q, Simmons DV, Surmeier DJ. Distributed dopaminergic signaling in the basal ganglia and its relationship to motor disability in Parkinson’s disease. Curr Opin Neurobiol. 2023 Dec;83:102798.

Essay 26: Ignoring distracting odors

I’ve been ignoring distracting cues in the previous essays for simplification. Since the simulated animal only encountered a single odor at a time, it never needed to select one and ignore the other. In essay 26, I’ll implement a very simple first approximation to ignoring distractors, using the P.bf (basal forebrain) control of the Ob (olfactory bulb) as a switchboard to let the selected odor through and inhibit the ignored distractor.

Simulated animal (triangle) encountering two odor plumes (circles).

In the diagram above, the animal (triangle) is seeking food using the purple odor cue as a gradient direction. When it encounters the distractor odor in blue, it should ignore the distractor, otherwise the two odors will mingle into an incorrect summed gradient and the animal will seek in the wrong direction [Cisek 2022].

Temporal chemotaxis

For essay 26, I’m switching chemotaxis (odor seeking) to use the apical temporal gradient search, using Hb.m (medial habenula) and B.ip (interpeduncular nucleus) like the phototaxis in essay 24. The apical system follows the chimera brain model of [Tosches and Arendt 2013], which suggests that odor senses and actions are distinct systems from bilateral tactile senses. For the essays, the shift is from a bilateral, Braitenberg-like [Braitenberg 1984] system to a modulated random walk like the bacterial tumble-and-run.

Olfactory tumble-and-run system using Hb.m and B.ip for temporal gradient direction, and B.rs for the modulated random walk. B.ip interpeduncular nucleus, B.rs hindbrain reticulospinal motor area, Hb.m medial habenula, Ob olfactory bulb.

The above diagram shows the problem with distractor odors. Because the tumble-and-run system uses a single temporal gradient, it necessarily adds both odors together for its input. The summed input goes to the Hb.m (medial habenula) and B.ip (interpeduncular nucleus) system to modulate the random walk direction.

When the animal crosses into the overlapping distractor odor, it will follow the combined signal, distracted from the original seek target. To avoid distraction, the system can either amplify the current odor A, or inhibit the distractors like odor B.
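The distractor problem can be illustrated with a small tumble-and-run sketch in which the temporal gradient only sees the sum of all odors. The tumble rates, the uniform random heading in degrees, and the function names are assumptions for illustration.

```python
import random


def summed_input(odors: dict[str, float]) -> float:
    """Without gating, the temporal gradient circuit sees all odors added together."""
    return sum(odors.values())


def tumble_probability(rising: bool, base: float = 0.5, reduced: float = 0.1) -> float:
    """Tumble less often while the summed odor is rising (hypothetical rates)."""
    return reduced if rising else base


def next_heading(heading: float, previous: float, current: float,
                 rng: random.Random) -> float:
    """Keep running on the current heading, or tumble to a random new one."""
    if rng.random() < tumble_probability(current > previous):
        return rng.uniform(0.0, 360.0)
    return heading


# Target odor A is fading while distractor B rises as the animal enters its plume.
before = summed_input({"A": 0.5, "B": 0.0})
after = summed_input({"A": 0.4, "B": 0.3})
misled = after > before
```

Here the summed signal rises even though the target odor A is falling, so the animal keeps running in a direction that takes it away from A.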

Analogy with nucleus isthmi

An earlier essay 19 also had an attention / distractor problem, with a different issue of action consistency, and used a zebrafish circuit in P.ni (nucleus isthmi) as a solution. In larval zebrafish P.ni works together with OT (optic tectum) to sustain attention on prey during a hunt [Henriques et al 2019]. P.ni is an ACh (acetylcholine neurotransmitter) and GABA (inhibiting neurotransmitter) system that both amplifies the predicted prey location and inhibits surrounding areas.

Nucleus isthmi circuit as adapted by essay 19. ACh acetylcholine, OT optic tectum, P.ni nucleus isthmi.

In the above diagram for the essay 19 circuit, a simultaneous left and right touch would select one action at random and sustain that choice for subsequent movement with the P.ni positive feedback circuit. The outputs are crossed because it’s an avoidance circuit: an obstacle on the left triggers a right turn.

Importantly, the positive feedback is modulatory; it doesn’t trigger an action by itself. At a synapse level, ACh triggers mAChR (ACh metabotropic receptor, Gs stimulatory type) on the sensor axon, amplifying the sensor’s neurotransmitter release. The ACh and mAChR act as the decay timer, because they have a slow time constant on the order of a few seconds. If the sensor doesn’t stimulate the circuit, as when successfully avoiding the obstacle, the attention will decay over a few seconds, resetting the system to its original state.

A similar function applies to Ob and P.bf (basal forebrain), where P.bf acts like P.ni to sustain attention to the selected odor. “Basal forebrain” is a general name for a collection of functionally-related subcortical areas in the ventral (“basal”) forebrain, all pallidal-like (P). The specific P areas for the Ob are P.hdb (horizontal diagonal band) and Po.me (magnocellular preoptic area), but I’ll use P.bf for simplicity.

Olfactory bulb as a switchboard

In this model, Ob acts like a switchboard controlled by P.bf. P.bf selects attended odor paths in Ob, where Ob either passes the odor signal to its destination or inhibits the signal if it’s a distractor. P.bf opens and closes gated circuits in Ob.

Although the architecture of the Ob and P.bf circuit resembles the P.ni circuit, Ob appears to rely more heavily on inhibitory GABA for the gating operation, while ACh is also important [Böhm et al 2020], [De Saint Jan 2022], [Nunez-Parra et al 2020]. Since this essay is a first-cut, simplified model, I’m using a single signal that represents a gating attention / inhibition signal, and glossing over the ACh vs GABA distinction.

Olfactory bulb switchboard using basal forebrain to gate selected odors. B.ip interpeduncular nucleus, B.rs reticulospinal motor, Hb.m medial habenula, Omt mitral/tufted output cells, Osn olfactory sensor neurons, P.bf basal forebrain.

In the above diagram where the switchboard selects odor A and inhibits odor B, the apical seek circuit receives only odor A’s signal. P.bf gates odors from Osn (olfactory sensory neurons) to Omt (mitral/tufted output cells), which then add to form a single signal for the temporal gradient tumble-and-run seek. For simplicity, I’ve shown the P.bf ACh and GABA signal as a simple gating control.

Once the system detects odor A, P.bf configures the switchboard to pass through A and inhibit other odors, locking out the distractor. Because the selecting signals are modulators, they don’t drive a signal until an odor signal arrives. Like the P.ni circuit, attention will time out as ACh and its slow mAChR signaling decay. When the animal leaves the odor plume, the system resets because the absence of odor A collapses the feedback loop.

Although the essay’s switchboard is an improvement over the naive summation of odor signals, it’s still quite limited. There’s no active selection of a best odor, and the system can’t switch to a better odor cue. Also, since the global give-up circuit isn’t integrated with P.bf, giving up on odor A can’t select odor B. Instead the animal must leave the plume and reset the system.

Slightly more complete Ob switchboard

The Ob is a surprisingly complex system; it’s not just a simple odor system. In addition to P.bf, Opir (olfactory piriform cortex) also modulates the Ob system. Ob itself has lateral inhibition between Omt (mitral/tufted output cells), and that inhibition is plastic, so Ob learns to discriminate odors itself. Ob also receives modulatory input from the serotonin and noradrenaline systems.

In the real Ob, many Osn for the same odor feed into a single Ogl (olfactory glomeruli), which provides input to several Omt, all representing the same odor. Each odor feature has its own Ogl system, several hundred in mammals (two in the essay simulation). Ogl is where the neuropil of the Osn axons meet the Omt dendrites, in a fan-in to fan-out system. Also, each Ogl has many inhibitory Opg (periglomerular inhibitors) with multiple variations, and each Omt has several inhibitory Ogc (olfactory granule cells). The basic fan-in and fan-out structure looks like the following diagram.

Olfactory bulb glomerulus fan-in and fan-out system. B.ip interpeduncular nucleus, B.rs reticulospinal motor, Hb.m medial habenula, Ogc olfactory granule cell inhibitor, Ogl olfactory glomerulus, Omt olfactory mitral/tufted output, Opg olfactory periglomerular inhibitor, Osn olfactory sensor neuron.

The switchboard diagram below focuses on the ACh and GABA control from P.bf. It combines multiple Osn, Opg, Omt and Ogc into single items.

Partial olfactory bulb switchboard circuit. B.ip interpeduncular nucleus, B.rs reticulospinal motor, Hb.m medial habenula, Ogc olfactory granule cell, Ogl olfactory glomeruli, Omt olfactory mitral/tufted output, Opg olfactory periglomerular inhibitor, Opir olfactory piriform cortex, Osn olfactory sensory neuron.

To break down the diagram, the core of the switchboard circuit is the Osn to Ogl to Omt to output path; everything else is gating to select or inhibit the signal.

Odor gating happens in two locations: modulating Omt’s input dendrite tree in Ogl by Opg and modulating Omt’s output by Ogc (olfactory granule cell). Because each Omt’s input Ogl is shared by several Omt, the Opg inhibition likely affects many or all Omt for a single Ogl. In contrast, the Ogc inhibition is individual, and the Omt and Ogc circuit creates and manages gamma oscillations, which amplify the signal and reduce its noise.

Although I’m not planning on touching cortical areas for many essays, Opir (olfactory piriform cortex) modulates the Ob switchboard through a circuit similar to P.bf’s, with some differences. Since the Opir input to the many Ogc and many Ogl is not odor selective [Boyd et al 2015], Ogc must learn the meaning of the Opir input through plasticity.

Global give-up circuit

The essay’s task engagement and give-up circuit currently uses H.l (lateral hypothalamus) and Hb.l (lateral habenula) with V.dr (dorsal raphe serotonin) [Hikosaka 2010], [Chowdhury and Yamanaka 2016]. When a seek fails, Hb.l suppresses H.l, H.l ends the seek, and the animal moves on [Post et al 2022].

Global give-up circuit. H.l lateral hypothalamus, Hb.l lateral habenula, V.dr dorsal raphe, 5HT serotonin.

Because the global give-up circuit is entirely disconnected from the olfactory selective attention from the essay, giving up means giving up on all odors, not just the current attended odor.

Simulation

For this essay, I refactored much of the simulation code to clean up ideas from previous essays. A new hindbrain module manages the main locomotion like the zebrafish hindbrain motor area [Dunn et al 2016], which is possibly different from the tetrapod / amniote locomotion in the midbrain. Because the essay animal is currently more primitive than amniotes, this simplification seemed appropriate and makes the code organization clearer.

Olfactory locomotion is now random-walk based following apical tumble-and-run, as opposed to the earlier bilateral path through Vta (ventral tegmental area / posterior tuberculum) and OT (tectum). In zebrafish both paths exist, which I might explore later, but this essay is restricted to the apical temporal gradient search.

The seek mode now slows the animal and adjusts the Levy walk parameters to simulate ARS (area restricted search). As I’ll cover in the problems section, switching to seek mode is still hardcoded.
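As a rough illustration of that seek-mode change, the sketch below samples run lengths from a truncated power law and switches to a slower speed and a steeper exponent while seeking. The exponents, speeds, step bounds, and the names `levy_step`, `walk_params`, and `mean_run` are assumptions made for this example, not values from the simulation.

```python
import random


def levy_step(rng, mu=2.0, min_step=1.0, max_step=100.0):
    """Sample a run length from a truncated power law p(l) ~ l**-mu (mu != 1)."""
    # Inverse-transform sampling of the truncated power-law CDF.
    u = rng.random()
    a = min_step ** (1.0 - mu)
    b = max_step ** (1.0 - mu)
    return (a + u * (b - a)) ** (1.0 / (1.0 - mu))


def walk_params(seeking):
    """Hypothetical parameters: seek mode is slower and favors short runs."""
    if seeking:
        return {"speed": 0.5, "mu": 3.0}  # area restricted search
    return {"speed": 1.0, "mu": 2.0}      # default exploration


def mean_run(seeking, n=2000, seed=1):
    """Average run length over n samples, to compare the two modes."""
    rng = random.Random(seed)
    mu = walk_params(seeking)["mu"]
    return sum(levy_step(rng, mu=mu) for _ in range(n)) / n
```

With these assumed values the seek-mode runs are much shorter on average, so the animal stays close to where it picked up the odor.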

I split the habenula seek from habenula give-up (Hb.m from Hb.l) and pulled the gradient seek and head direction from B.ip into the habenula seek. Conceptually, the habenula seek code now represents Hb.m and B.ip as a single complex.

Simulated odor seeking with target attention and distractor inhibition.

In the screenshot above, the animal is making a u-turn to return to the food when the odor gradient (blue semicircle) is opposite the head direction (black semicircle). In the upper right, the green box outlined in red represents the attended green odor signal, while the white box outlined in blue represents the suppressed blue odor. Despite the Osn naively sensing both blue and green odors because the animal is in the overlap area, only the green odor passes through Omt to the seek system.

The square borders around the odor color represent P.bf modulation. Red is attended (100% pass through), blue is inhibited (10% pass through), and grey is unmodulated (50% pass through).
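The gating described above can be sketched as a small state machine. This is a minimal sketch, not the simulation's code: the class name `ObSwitchboard`, the `hold_ticks` timeout standing in for the slow ACh decay, and the rule that the first detected odor wins are all assumptions. Only the three pass-through fractions come from the description above.

```python
# Pass-through fractions from the display description: attended, inhibited, unmodulated.
GAIN = {"attend": 1.0, "inhibit": 0.1, "none": 0.5}


class ObSwitchboard:
    """Hypothetical sketch of P.bf gating odor channels between Osn and Omt."""

    def __init__(self, odors, hold_ticks=30):
        self.mode = {odor: "none" for odor in odors}
        self.hold_ticks = hold_ticks  # assumed stand-in for the slow ACh decay
        self.target = None
        self.timer = 0

    def _lock(self, odor):
        # Attend the detected odor and inhibit every other channel.
        self.target = odor
        self.timer = self.hold_ticks
        self.mode = {o: ("attend" if o == odor else "inhibit") for o in self.mode}

    def _reset(self):
        self.target = None
        self.mode = {o: "none" for o in self.mode}

    def step(self, osn):
        """osn maps odor name to sensed strength; returns the summed Omt output."""
        # Output uses the gating in place before this tick's feedback update.
        output = sum(GAIN[self.mode[o]] * osn.get(o, 0.0) for o in self.mode)
        if self.target is None:
            present = [o for o in self.mode if osn.get(o, 0.0) > 0.0]
            if present:
                self._lock(present[0])  # assumption: first detected odor wins
        elif osn.get(self.target, 0.0) > 0.0:
            self.timer = self.hold_ticks  # odor present: feedback loop refreshed
        else:
            self.timer -= 1  # odor absent: attention decays
            if self.timer <= 0:
                self._reset()
        return output
```

In this sketch the first tick in a plume passes the odor at the unmodulated 50%, after which the attended odor passes fully and any overlapping distractor is cut to 10% until the attended odor has been absent for `hold_ticks` ticks.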

In the diamond-shaped homunculus, the bright blue triangle represents the u-turn nudge.

As the goal vector shows, the guessed goal direction isn’t very accurate, particularly when the animal is making a turn. Currently, the animal continues to update its guess even in the middle of a turn when the odor data and averages are not appropriate for the current direction.

References

Böhm E, Brunert D, Rothermel M. Input dependent modulation of olfactory bulb activity by HDB GABAergic projections. Sci Rep. 2020 Jul 1;10(1):10696. 

Boyd AM, Kato HK, Komiyama T, Isaacson JS. Broadcasting of cortical activity to the olfactory bulb. Cell Rep. 2015 Feb 24;10(7):1032-9.

Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. Cambridge, MA: MIT Press. “Vehicles – the MIT Press”

Chowdhury S, Yamanaka A. Optogenetic activation of serotonergic terminals facilitates GABAergic inhibitory input to orexin/hypocretin neurons. Sci Rep. 2016;6:36039

Cisek P. Evolution of behavioural control from chordates to primates. Philos Trans R Soc Lond B Biol Sci. 2022 Feb 14

De Saint Jan D. Target-specific control of olfactory bulb periglomerular cells by GABAergic and cholinergic basal forebrain inputs. Elife. 2022 Feb 28;11:e71965.

Dunn, Timothy, Yu Mu, Sujatha Narayan, Owen Randlett, Eva A Naumann, Chao-Tsung Yang, Alexander F Schier, Jeremy Freeman, Florian Engert, Misha B Ahrens (2016) Brain-wide mapping of neural activity controlling zebrafish exploratory locomotion eLife 5:e12741

Henriques PM, Rahman N, Jackson SE, Bianco IH. Nucleus Isthmi Is Required to Sustain Target Pursuit during Visually Guided Prey-Catching. Curr Biol. 2019 Jun 3;29(11):1771-1786.e5. 

Hikosaka O. The habenula: from stress evasion to value-based decision-making. Nat Rev Neurosci. 2010 Jul;11(7):503-13. 

Nunez-Parra A, Cea-Del Rio CA, Huntsman MM, Restrepo D. The Basal Forebrain Modulates Neuronal Response in an Active Olfactory Discrimination Task. Front Cell Neurosci. 2020 Jun 5;14:141. 

Post RJ, Bulkin DA, Ebitz RB, Lee V, Han K, Warden MR. Tonic activity in lateral habenula neurons acts as a neutral valence brake on reward-seeking behavior. Curr Biol. 2022 Oct 24;32(20):4325-4336.e5.

Tosches, Maria Antonietta, and Detlev Arendt. The bilaterian forebrain: an evolutionary chimaera. Current opinion in neurobiology 23.6 (2013): 1080-1089.

Essay 22 issues: subthalamic nucleus simulation

The essay 22 simulation explored a striatum model where two decision paths competed: odor seeking vs random exploration, using dopamine to bias between exploration and seeking. This model resembled striatum theories like [Bariselli et al. 2019] that consider the striatum’s direct and indirect paths as competing between approach and avoidant actions.

Issues in essay 22 include both neuroscience divergence and simulation problems. Although the simulation is a loose functional model, that laxity isn’t infinite and it may have gone too far from the neuroscience.

Adenosine and perseveration

Seeking and foraging have a perseveration problem: the animal must eventually give up on a failed cue, or it will remain stuck forever. The give-up circuit in essay 22 uses the lateral habenula (Hb.l) to integrate search time until it reaches a threshold to give up. An alternative circuit in the striatum itself involves the indirect path (S.d2), the D2 dopamine receptor and adenosine, with a behaviorally relevant time scale.

When fast neurotransmitters are on the order of 10 milliseconds, creating a timeout on the order of a few minutes is a challenge. Two possible solutions in that timescale are long term potentiation (LTP) where “long” means about 20 minutes, and astrocyte calcium accumulation, which is also about 10 to 20 minutes.

Adenosine receptors (A2r) in the striatum indirect path (S.d2) measure broad neural activity from ATP byproducts that accumulate in the intercellular space. Over 10 minutes those A2r can enhance the indirect path, either through internal calcium ion (Ca) accumulation in astrocytes or through LTP. Enhancing the indirect path (exploration) eventually causes a switch from the direct path (seeking) to exploration, essentially giving up on the seeking.
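A slow timeout of this kind can be sketched as a leaky accumulator that crosses a threshold. The function name `ticks_until_give_up` and every parameter value below are assumptions chosen for illustration; ticks are arbitrary units, not calibrated to the 10 minute scale.

```python
def ticks_until_give_up(activity=1.0, rate=0.01, decay=0.002,
                        threshold=3.0, max_ticks=5000):
    """Return the tick when accumulated adenosine crosses threshold, or None."""
    level = 0.0  # hypothetical extracellular adenosine level
    for tick in range(1, max_ticks + 1):
        # Seek activity adds adenosine; clearance removes a fraction each tick.
        level += rate * activity - decay * level
        if level >= threshold:
            return tick  # the indirect path now wins: give up the seek
    return None  # activity too weak to ever trigger a give-up
```

With these numbers the level saturates at `rate * activity / decay`, so sustained seeking gives up after a few hundred ticks, while half the activity never reaches the threshold.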

Ventral striatum

Although the essay models the dorsal striatum (S.d), the ventral striatum (S.v aka nucleus accumbens) is more associated with exploration and food seeking. In particular, the olfactory path for food seeking goes through S.v, while midbrain motor actions use S.d. In salamanders, the striatum only processes midbrain (“collo-”) thalamic inputs, while olfactory and direct senses (“lemno-”) go to the cortex [Butler 2008]. Assuming the salamander path is more primitive, the essay’s use of S.d in the model is a likely mistake.

But S.v raises a new issue because S.v doesn’t use the subthalamus (H.stn) [Humphries and Prescott 2010], although that model only applies to the S.v shell (S.sh), not the S.v core (S.core).

Ventral striatum pathway. MLR midbrain locomotive region, P.v ventral pallidum, S.sh ventral striatum shell, Vta ventral tegmental area.

In the above diagram of a striatum shell circuit, an odor-seek path is possible through the ventral tegmental area (Vta) but there is no space for an alternate explore path.

Low dopamine and perseveration

[Rutledge et al. 2009] investigates dopamine in the context of Parkinson’s disease (PD), which exhibits perseveration as a symptom. In contrast to the essay, PD is a low dopamine condition, and adding dopamine resolves the perseveration. That result is the opposite of essay 22’s dopamine model, where low dopamine resolved perseveration.

Now, it’s possible that give-up perseveration and Parkinson’s perseveration are two different symptoms, or it’s possible that the complete absence of dopamine differs from low tonic dopamine, but in either case, the essay 22 model is too simple to explain the striatum’s dopamine use.

Dopamine burst vs tonic

Dopamine in the striatum has two modes: burst and tonic. Essay 22 uses a tonic dopamine, not phasic. The striatum uses phasic dopamine to switch attention to orient to a new salient stimulus. The phasic dopamine circuit is more complicated than the tonic system because it requires coordination with acetylcholine (ACh) from the midbrain laterodorsal tegmentum (V.ldt) and pedunculopontine (V.ppt) nuclei.

A question for the essays is whether that phasic burst is primitive to the striatum, or a later addition, possibly adding an interrupt for orientation to an earlier non-interruptible striatum.

Explore semantics

The word “explore” is used differently by behavioral ecology and in reinforcement learning, despite both using foraging-like tasks. These essays have been using explore in the behavioral ecology meaning, which may cause confusion with the reinforcement learning sense. The difference centers on a fixed strategy (policy) compared with changing strategies.

In behavioral ecology, foraging is literal foraging, animals browsing or hunting in a place and moving on (giving up) if the place doesn’t have food [Owen-Smith et al. 2010]. “Exploring” is moving on from an unproductive place, but the policy (strategy) remains constant because moving on is part of the strategy. The policy for when to stay and when to go [Headon et al. 1982] often follows the marginal value theorem [Charnov 1976], which specifies when the animal should move on.
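To make the marginal value theorem concrete, the sketch below finds the leave time that maximizes the long-run intake rate for a patch with diminishing returns. The saturating gain curve and the names `gain`, `marginal_gain`, and `best_leave_time` are assumptions for illustration; the theorem itself only requires that gains slow down over time in the patch.

```python
import math


def gain(t, g_max=10.0, tau=5.0):
    """Assumed cumulative food gained after t time units in a patch."""
    return g_max * (1.0 - math.exp(-t / tau))


def marginal_gain(t, g_max=10.0, tau=5.0):
    """Instantaneous intake rate at time t (derivative of gain)."""
    return (g_max / tau) * math.exp(-t / tau)


def best_leave_time(travel, dt=0.01, t_max=100.0):
    """Grid search for the stay time that maximizes gain / (travel + stay)."""
    best_t, best_rate = dt, 0.0
    for i in range(1, int(t_max / dt) + 1):
        t = i * dt
        rate = gain(t) / (travel + t)  # long-run average intake rate
        if rate > best_rate:
            best_t, best_rate = t, rate
    return best_t, best_rate
```

At the optimum the marginal gain equals the average rate, which is the theorem's leave condition, and a longer travel time between patches yields a longer stay. The policy itself never changes, which is the point of the behavioral ecology sense of “explore”.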

In contrast, reinforcement learning (RL) uses “explore” to mean changing the policy (strategy). For example, in a two-armed bandit situation (two slot machines), the RL policy is either using machine A or using machine B, or a fixed probabilistic ratio, not a timeout and give-up policy. In that context, exploring means changing the policy, not merely switching machines.

[Kacelnik et al. 2011] points out that the two-choice economic model doesn’t match vertebrate animal behavior, because vertebrates use an accept-reject decision [Cisek and Hayden 2022]. So, while the two-armed bandit may be useful in economics, it’s not a natural decision model for vertebrates.

Avoidance (nicotinic receptors in M.ip)

The simulation uncovered a foraging problem, where the animal remained around an odor patch it had given up on, because the give-up strategy reverts to random search. Instead, the animal should leave the current place and only resume search when it’s far away.

Path of simulated animal after giving up on a food odor.

In the diagram above, the animal remains near the abandoned food odor. The tight circles are the earlier seek before giving up, and the random path afterwards is the continued search. A better strategy would leave the green odor plume and explore other areas of the space.

As a possible circuit, the habenula (Hb.m) projection to the interpeduncular nucleus (M.ip) uses both glutamate and ACh as neurotransmitters, where ACh amplifies neural output. For low signals without ACh, the animal approaches the object, but high signals with ACh switch approach to avoidance. This avoidance switching is managed by the nicotinic receptor (nAChR), which is studied for nicotine addiction [Lee et al. 2019].

An interesting future essay might explore using nicotinic aversion to improve foraging by leaving an abandoned odor plume.

References

Bariselli S, Fobbs WC, Creed MC, Kravitz AV. A competitive model for striatal action selection. Brain Res. 2019 Jun 15;1713:70-79.

Butler, Ann. (2008). Evolution of the thalamus: A morphological and functional review. Thalamus & Related Systems. 4. 35 – 58.

Charnov, Eric L. Optimal foraging, the marginal value theorem. Theoretical population biology 9.2 (1976): 129-136.

Cisek P, Hayden BY. Neuroscience needs evolution. Philos Trans R Soc Lond B Biol Sci. 2022 Feb 14;377(1844):20200518.

Headon T, Jones M, Simonon P, Strummer J (1982) Should I Stay or Should I Go. On Combat Rock. CBS Epic.

Humphries MD, Prescott TJ. The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward. Prog Neurobiol. 2010 Apr;90(4):385-417.

Kacelnik A, Vasconcelos M, Monteiro T, Aw J. 2011. Darwin’s ‘tug-of-war’ vs. starlings’ ‘horse-racing’: how adaptations for sequential encounters drive simultaneous choice. Behav. Ecol. Sociobiol. 65, 547-558.

Lee HW, Yang SH, Kim JY, Kim H. The Role of the Medial Habenula Cholinergic System in Addiction and Emotion-Associated Behaviors. Front Psychiatry. 2019 Feb 28

Owen-Smith N, Fryxell JM, Merrill EH. Foraging theory upscaled: the behavioural ecology of herbivore movement. Philos Trans R Soc Lond B Biol Sci. 2010 Jul 27;365(1550):2267-78. 

Rutledge RB, Lazzaro SC, Lau B, Myers CE, Gluck MA, Glimcher PW. Dopaminergic drugs modulate learning rates and perseveration in Parkinson’s patients in a dynamic foraging task. J Neurosci. 2009 Dec 2

Essay 22: Subthalamic Nucleus

After essay 21 changed the animal’s default movement to a Lévy exploration, it’s natural to ask whether that random search is a full action, just like a seek turn or an avoid turn. And if exploration is a controlled action, then the model needs to treat exploration as a full action, like approach or avoid.

Exploration as a full locomotive system at the level of approach and avoid.

[Cisek 2020] identifies a vertebrate system for exploration, including the hippocampus (E.hc) and its associated nuclei such as the retromammillary hypothalamus (H.rm aka supramammillary). Essay 22 considers the idea of treating the subthalamic nucleus (H.stn) as part of the exploration circuit.

Subthalamic nucleus

H.stn is a hypothalamic nucleus from the same area as H.rm, which is part of the hippocampal theta circuit, which synchronizes exploration and spatial memory and learning. However, H.stn is part of the basal ganglia and not directly connected with the exploration system.

[Watson et al. 2021] finds a locomotive function of H.stn, where specific stimulation by the parafascicular thalamus (T.pf) to H.stn starts locomotion. If the stimulation is one-sided, the animal moves forward with a wide turn to the contralateral side. T.pf includes efference copies of motor actions from the MLR as well as from other midbrain actions.

Locomotion induced in the H.stn by T.pf stimulation. H.stn subthalamic nucleus, T.pf parafascicular nucleus, MLR midbrain locomotor region.

For essay 22, let’s consider the H.stn locomotion as exploration. Since H.stn is part of the basal ganglia, the bulk of essay 22 is considering how exploration might fit into the proto-striatum model of essay 18.

Striatal attention and persistence

Since the current essay simulation animal is an early Cambrian proto-vertebrate, it doesn’t have a full basal ganglia. Evolutionarily, the full basal ganglia architecture could not have sprung into being fully formed; it must have developed in smaller steps. Following a hypothetical evolutionary path, the essays are only implementing a simplified striatal model, adding features step-by-step. Unfortunately, because there’s no living species with a partial basal ganglia (all vertebrates have the full system), the essay’s steps are pure invention.

The initial striatum of essay 18 was a partial solution to a simulation problem: persistence. When the animal hit a wall head on, activating both touch sensors, it would choose randomly left or right, but because the simulation is real-time not turn-based, at the next tick both sensors remained active and the animal would choose randomly again, jittering at the wall until enough turns of the same direction escaped the barrier.

Proto-striatum for persistence by attention. Action feedback biases the choice to the last option: win-stay. B.rs reticulospinal motor command, Ob olfactory bulb, MLR midbrain locomotor region, Snc substantia nigra pars compacta (posterior tuberculum).

The main sense-to-action path is from the olfactory bulb (Ob) through the substantia nigra (Snc aka posterior tuberculum in zebrafish) to the midbrain locomotor region (MLR) and to the reticulospinal motor command neurons (B.rs), following the tracing and locomotive study of [Derjean et al. 2010] in zebrafish and Vta/Snc control of locomotion in [Ryczko et al. 2017]. The proto-striatum circuit is built around that olfactory-seeking circuit, acting as persistent attention.

The proto-striatal model uses an efference copy of the last action from the MLR to bias the choice of the next action via an MLR to T.pf to striatum path. The model biases the choice by removing inhibition of the odor to action path. If the last action was left, the left odor is disinhibited, making it more likely to win.
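The win-stay bias can be sketched in a few lines. This is a minimal sketch under assumptions: the function names, the `inhibition` level, and the random tie-break are hypothetical, and the disinhibition is reduced to a gain on each sense-to-action path.

```python
import random


def choose_turn(left, right, last_action, rng, inhibition=0.5):
    """Pick a turn from two sensor signals with a win-stay bias."""
    # Each sense-to-action path is tonically inhibited; the efference copy of
    # the last action removes the inhibition on the matching path.
    left_gain = 1.0 if last_action == "left" else 1.0 - inhibition
    right_gain = 1.0 if last_action == "right" else 1.0 - inhibition
    left_drive, right_drive = left * left_gain, right * right_gain
    if left_drive == right_drive:
        return rng.choice(["left", "right"])  # no bias yet: random choice
    return "left" if left_drive > right_drive else "right"


def run_at_wall(ticks, seed=0):
    """Both sensors stay equally active for several ticks, as at a wall."""
    rng = random.Random(seed)
    action, history = None, []
    for _ in range(ticks):
        action = choose_turn(1.0, 1.0, action, rng)
        history.append(action)
    return history
```

With equal input on both sides, only the first tick is random; every later tick repeats that choice, which removes the jitter at the wall.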

The striatal system uses disinhibition for noise reasons. [Cohen et al. 2009] studied attention in the visual system and found that attention removed coherent noise by removing inhibition. By removing inhibition, the attended circuit is less affected by the controlling circuit’s noise.

Note: essay 19 considered an alternative solution to the attention issue by following the nucleus isthmi system in zebrafish as studied in [Gruberg et al. 2006], where the attention to the win-stay odor used acetylcholine (ACh) amplification to bias the choice.

Striatal columns: approach and avoid

An immediate difficulty with the simple proto-striatal model is the lack of priority. Although left vs right have equal priority, avoiding a predator is more important than seeking a potential food source. Unfortunately, the proto-striatum treats all options equally. As a solution, essay 18 split the striatum into columns, where each column resolves an internal conflict without priority (“within-system”) and the columns are compared separately (“between-systems”), where “within-system” and “between-system” are from [Cisek 2019].

Dual striatum column for approach and avoid, where MLR resolves the final conflict. B.rs reticulospinal command neuron, B.ss somatosensory (touch), MLR midbrain locomotive region, M.pag periaqueductal gray, Ob olfactory bulb, S.ot olfactory tubercle, S.d dorsal striatum.

Subthalamic nucleus and exploration

If we now treat exploration as a distinct action system, then it needs its own control system and column in the proto-striatum. The within-system choice for exploration is the left and right turns for a random walk, and the between-system choices are between the exploration system and the odor-seeking system.

As a possible neural correlate of exploration, consider the subthalamic nucleus (H.stn). The subthalamic nucleus is derived from the hypothalamus, specifically from the same area as the retromammillary area (H.rm aka supramammillary), which is highly correlated with hippocampal theta, locomotion and exploration.

[Watson et al. 2021] finds a locomotive function of H.stn, where specific stimulation by the parafascicular thalamus (T.pf) produces locomotion via the midbrain locomotive region (MLR). T.pf includes efference copies of motor actions from the MLR as well as other midbrain action efference copies. In the proto-striatum model, the feedback from MLR to striatum uses T.pf.

Exploration locomotive path through H.stn. H.stn subthalamic nucleus, MLR midbrain locomotive region, T.pf parafascicular thalamus.

Seek and explore with dual striatal columns

Suppose the striatum manages both odor seeking (chemotaxis) and default exploration (Lévy walk). The two actions conflict, with a complex priority system. When a food odor first appears, the animal should seek toward it (priority to seek), but if no food exists the animal should resume exploration (priority to explore). To resolve the between-system conflict, the two strategies need two columns with lateral inhibition to ensure that only one is selected.

Dual striatum columns for seek and explore strategies. B.rs reticulospinal motor command, H.stn subthalamic nucleus, Ob olfactory bulb, P.ge globus pallidus external, S.d1 direct striatum projection, S.d2 indirect striatum projection, Snc substantia nigra pars compacta, Snr substantia nigra pars reticulata.

Selecting the seek column enables the odor sense to MLR path, seeking the potential food odor. Selecting the explore column enables the H.stn to MLR path, randomly searching for food.

Note: the double inversion in both paths is to reduce neuron noise [Cohen et al. 2009]. Removing inhibition reduces noise, where adding excitation would add noise. In the essay simulation, this double negation isn’t necessary.

Striatum with dopamine/habenula control

The previous dual column circuit isn’t sufficient for the problem, because it lacks a control signal to switch between exploit (seek) and explore. The striatum dopamine circuit might help this problem by bringing in the foraging implementation from essay 17.

A major problem in essay 17 was the tradeoff between persistence and perseveration in seeking an odor. Persistence ensures that seeking an odor will continue even when the odor is intermittent. Perseveration is a failure mode where the animal never gives up, like a moth to a flame. As a model, consider using dopamine in the striatum as persistence or effort [Salamone et al. 2012], and control of dopamine by the habenula as solving perseveration with a give-up circuit.

Explore and exploit (seek) columns controlled by dopamine. H.l lateral hypothalamus, Hb.l lateral habenula, H.stn subthalamic nucleus, MLR midbrain locomotive region, Ob olfactory bulb, P.em prethalamic eminence, P.ge globus pallidus external, S.d1 striatum direct projection, S.d2 striatum indirect projection, Snc substantia nigra pars compacta, Snr substantia nigra pars reticulata.

The striatum uses two opposing dopamine receptors named D1 and D2. D1 is a stimulating modulator through a G.s protein path, and D2 is an inhibiting modulator through a G.i protein path. In the above diagram, high dopamine will activate the seek column via D1 and inhibit the explore column via D2. Low dopamine inhibits the seek column and enables the explore column. So dopamine becomes an exploit vs explore controller.
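As a toy version of that controller, the sketch below scales the seek column up and the explore column down with a single dopamine level. The linear scaling, the 0 to 1 dopamine range, and the names `column_drive` and `select_column` are assumptions for illustration only.

```python
def column_drive(dopamine, seek_input=1.0, explore_input=1.0):
    """Return (seek, explore) drive for a dopamine level clamped to [0, 1]."""
    d = min(max(dopamine, 0.0), 1.0)
    seek = seek_input * d                # D1: dopamine stimulates the seek column
    explore = explore_input * (1.0 - d)  # D2: dopamine inhibits the explore column
    return seek, explore


def select_column(dopamine, seek_input=1.0, explore_input=1.0):
    """Winner-take-all between the columns; ties fall back to explore."""
    seek, explore = column_drive(dopamine, seek_input, explore_input)
    return "seek" if seek > explore else "explore"
```

For example, `select_column(0.9)` picks the seek column and `select_column(0.1)` picks explore, so a give-up circuit that lowers dopamine is enough to flip the animal from exploit to explore.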

In many primitive animals, dopamine is a food signal. In C. elegans the dopamine neuron is a food-detecting sensory neuron. In vertebrates, the hunger and food-seeking areas like the lateral hypothalamus (H.l) strongly influence midbrain dopamine neurons both directly and indirectly. Indirectly, H.l to lateral habenula (Hb.l) causes non-reward aversion [Lazaridis et al. 2019].

For the essay, I’m treating H.l as having multiple roles (H.l is a composite area with at least nine sub-areas [Diaz et al. 2023]), both calculating potential reward (odor) via the H.l to Vta/Snc connection, and cost (exhaustion of seek task without success) via the H.l to Hb.l to Vta/Snc connection.

References

Cisek P. Resynthesizing behavior through phylogenetic refinement. Atten Percept Psychophys. 2019 Oct

Cisek P. Evolution of behavioural control from chordates to primates. Philos Trans R Soc Lond B Biol Sci. 2022 Feb 14

Cohen MR, Maunsell JH. Attention improves performance primarily by reducing interneuronal correlations. Nat Neurosci. 2009 Dec;12(12):1594-600.

Derjean D, Moussaddy A, Atallah E, St-Pierre M, Auclair F, Chang S, Ren X, Zielinski B, Dubuc R. A novel neural substrate for the transformation of olfactory inputs into motor output. PLoS Biol. 2010 Dec 21

Diaz, C., de la Torre, M.M., Rubenstein, J.L.R. et al. Dorsoventral Arrangement of Lateral Hypothalamus Populations in the Mouse Hypothalamus: a Prosomeric Genoarchitectonic Analysis. Mol Neurobiol 60, 687–731 (2023).

Gruberg E., Dudkin E., Wang Y., Marín G., Salas C., Sentis E., Letelier J., Mpodozis J., Malpeli J., Cui H. Influencing and interpreting visual input: the role of a visual feedback system. J. Neurosci. 2006;26:10368–10371

Lazaridis I, Tzortzi O, Weglage M, Märtin A, Xuan Y, Parent M, Johansson Y, Fuzik J, Fürth D, Fenno LE, Ramakrishnan C, Silberberg G, Deisseroth K, Carlén M, Meletis K. A hypothalamus-habenula circuit controls aversion. Mol Psychiatry. 2019 Sep

Ryczko D, Grätsch S, Schläger L, Keuyalian A, Boukhatem Z, Garcia C, Auclair F, Büschges A, Dubuc R. Nigral Glutamatergic Neurons Control the Speed of Locomotion. J Neurosci. 2017 Oct 4

Salamone JD, Correa M, Nunes EJ, Randall PA, Pardo M. The behavioral pharmacology of effort-related choice behavior: dopamine, adenosine and beyond. J Exp Anal Behav. 2012 Jan

Watson GDR, Hughes RN, Petter EA, Fallon IP, Kim N, Severino FPU, Yin HH. Thalamic projections to the subthalamic nucleus contribute to movement initiation and rescue of parkinsonian symptoms. Sci Adv. 2021 Feb 5

17: Issues on vertebrate seek

While implementing the basic model, some issues came up, including issues already solved in earlier essays.

What controls “give-up”?

The foraging task needs to give up on a non-promising odor, ignore it, leave the current place, and explore for a new odor. In an earlier essay, odor habituation implemented give-up. If the seek didn’t find the food within the habituation time, the sense would disappear, disabling the seek action.

Animal circling food with no ability to break free.

The perseveration problem can be solved in many ways, including the goal give-up circuit in essay 17 and the odor habituation in an earlier essay. One approach cuts the sensor; the other disables the action. But two solutions raise the question of more possible solutions, any or all of which might affect the animal.

  • Sense habituation (cutting sensor)
  • Habenula give-up (inhibit action)
  • Motivational state – hypothalamus hunger/satiety
  • Circadian rhythm – foraging at twilight
  • Global periodic reset – rest / sleep

Give-up or leave?

The distinction between giving-up and leaving is between abandoning the current action and switching to a new, overriding action. Although the effect is similar, the implementing circuit differs. In a leave circuit, after the give-up time, the animal would actively leave the current area (place avoidance). Assuming the leave action has a higher priority than seeking, then lateral inhibition would disable the seek action. In foraging vocabulary, does failure inhibit exploitation or does it encourage exploration?

Distinct circuits for give-up and leave to curtail a failed odor approach.

As the diagram above shows, this distinction isn’t a semantic quibble, but represents different circuits. In the give-up circuit, the quit decision inhibits the olfactory seek input, the seek action, or both. With seek disabled, the default action moves the animal away from the failed odor. In the leave circuit, the quit decision activates a leave action, which moves the animal away from the failed place, inhibiting the seek action laterally.
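The difference between the two circuits can be sketched as two small action-selection functions. This is only a toy under stated assumptions: the boolean signals, the function names, and the action labels ("seek", "leave", "default") are hypothetical stand-ins for the circuit nodes.

```python
def give_up_circuit(odor: bool, quit_signal: bool) -> str:
    """Give-up: the quit decision inhibits seek, so the default action takes over."""
    seek = odor and not quit_signal   # quit gates the seek input or action
    return "seek" if seek else "default"


def leave_circuit(odor: bool, quit_signal: bool) -> str:
    """Leave: the quit decision excites leave, which inhibits seek laterally."""
    leave = quit_signal
    seek = odor and not leave         # lateral inhibition from leave onto seek
    if leave:
        return "leave"
    return "seek" if seek else "default"
```

With a failed odor (`odor=True, quit_signal=True`), the give-up circuit falls back to the default action, while the leave circuit selects an active leave action. The seek action is suppressed in both cases, but by different routes.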

Leave or avoid?

Leaving an area is a primitive action and is a requirement for foraging. However, neuroscience papers don’t generally study foraging; they study place avoidance from aversive stimuli, which raises a question. Since the physical actions of leaving and aversive place avoidance are identical, do the two actions share circuits or are they distinct?

Distinct leave and avoid actions compared to shared locomotion.

In the avoid circuit, danger avoidance is distinct from food-seeking, only sharing at the lowest motor layers. In the leave circuit, exploration leaving and place avoidance share the same mid-locomotor action.

Slow and fast twitch swimming

[Lacalli 2012] explores the evolution of chordate swimming, inspired by a discovery of mid-Cambrian fossils, which suggest that fast-twitch muscles are a later addition to a more basal chordate swimming, possibly to escape from new Cambrian predators. The paper explores the non-vertebrate Amphioxus motor circuitry in light of the fossil, suggesting two distinct motor circuits: normal swimming and escape.

Slow and fast paths for normal swimming and fast predator escape.

In this model, higher layers are independent paths that only resolve at the lowest motor command neuron level (such as B.rs). For the foraging tasks, this model implies that leaving an explored area would use a different system from leaving a noxious area (place aversion), despite being the same underlying motion.

Serotonin as muscle gain-control

In the zebrafish, [Wei et al. 2014] studied serotonin in V.dr (dorsal raphe) as gain-control for muscle output, amplifying the effect of glutamate signals. When they inhibited 5HT (serotonin), the muscle only produced 40% of its maximal strength. Serotonin acted as a gain-control, a multiplicative signal that amplified glutamate signals, allowing for a broader dynamic range.
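A multiplicative gain can be sketched in a few lines of Python. The function name, the normalized serotonin level, and the linear interpolation between gains are assumptions for illustration; only the 40% floor echoes the figure above.

```python
def muscle_output(glutamate_drive: float, serotonin: float,
                  base_gain: float = 0.4, max_gain: float = 1.0) -> float:
    """Scale the glutamate drive by a serotonin-dependent gain (toy model).

    serotonin is a normalized level, clamped to [0, 1]. base_gain=0.4 echoes
    the 40% figure in the text; the linear interpolation is an assumption.
    """
    serotonin = min(max(serotonin, 0.0), 1.0)
    gain = base_gain + (max_gain - base_gain) * serotonin
    return gain * glutamate_drive
```

Because the gain multiplies the drive rather than adding to it, serotonin alone produces no output in this sketch. It only stretches or compresses the range that the glutamate signal can reach.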

[Kawashima et al. 2016] investigated 5HT in the context of task-learning for muscle effort, where 5HT caches the real-time adjustment by the cerebellum and pretectal areas. When 5HT is disabled, the real-time system still adjusts the muscle effort, but it doesn’t remember the adjustment for future bouts. That study considers the 5HT neurons as leaky integrators of motor-gated visual feedback, where zebrafish gauge the success of swimming effort by visual motion. Notably, the neurons only store visual information when the fish is actively swimming, as an action-outcome integrator.
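The idea of a leaky integrator of motor-gated visual feedback can be sketched as follows. The update rule, the leak rate, and the function names are assumptions for illustration, not the model from the study.

```python
from typing import Iterable


def update_integrator(state: float, visual_feedback: float, swimming: bool,
                      leak: float = 0.1) -> float:
    """One step of a leaky integrator gated by motor activity (toy model).

    The state always decays by `leak`; visual feedback is added only while
    the fish is swimming. Both `leak` and the update form are assumptions.
    """
    gated_input = visual_feedback if swimming else 0.0
    return (1.0 - leak) * state + gated_input


def run_bouts(feedback: Iterable[float], swimming: Iterable[bool],
              leak: float = 0.1) -> float:
    """Fold a sequence of (feedback, swimming) samples into a final state."""
    state = 0.0
    for value, active in zip(feedback, swimming):
        state = update_integrator(state, value, active, leak)
    return state
```

For example, `run_bouts([1.0, 1.0], [True, False])` stores the first sample, ignores the second because the fish is not swimming, and lets the stored value decay by one leak step.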

The two studies focused on opposite muscle effects: increasing effort and decreasing effort. 5HT can either inhibit or excite depending on the receptor type, suggesting that 5HT shouldn’t be interpreted as representing a specific value, either positive or negative, but instead possibly carrying either value.

Taking these studies as analogies, it seems reasonable to consider V.dr as an action-outcome accumulator for future effort in the 10-30 second range, not specific to either positive or negative amplification. Of course, because serotonin has diverse effects in multiple circuits, reality is likely more complicated.

Serotonin zooplankton dispersal and learning

Many aquatic animals have a larval zooplankton stage, where the larva disperses from its spawn point for several days or weeks, then descends to the sea floor for its adult life. A small number of serotonin neurons signal the switch to descend. Essentially, this is a single explore/exploit pair.

Larva exploring in a dispersal stage, switching to descend to the sea floor for adult life.

Habenula function circuit

Essay 17 is running with the model of the habenula as central to the give-up/move-on circuit. The following is a straw man model of the habenula based on the above discussion of quitting, leaving and avoiding circuits. Because essay 17 has no learning or higher areas like the striatum, the diagram ignores any learning functionality. This diagram is for a hypothetical pre-striatal habenular function.

Odor-based locomotion using the habenula.

Note, this locomotion only includes odor-based navigation. The audio-visual-touch locomotion uses a different system based on the optic tectum. This dual-locomotive system may be the result of a bilaterian chimaera brain [Tosches and Arendt 2013].

The habenula connectivity and avoidance path are loosely based on [Stephenson-Jones et al. 2012] on the lamprey habenula connectivity. The seek path is loosely based on [Derjean et al. 2010] for the zebrafish.

In this model, Hb.m (medial habenula) is primarily a danger-avoidance circuit, and M.ipn (interpeduncular nucleus) is a place avoidance locomotive region. Hb.l (lateral habenula) is a give-up circuit that both inhibits the seek function (giving up) and excites the shared leave locomotor region, implementing the foraging exploit-to-explore decision. Here, place avoidance and exploratory leaving are treated as equivalent. As mentioned above, this diagram is meant to be a straw man or a thought experiment, because it’s easier to work with a concrete model.
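The wiring of this straw man can be written out as a small Python sketch. The boolean node activities, the function name, and the assumption that the shared leave region also suppresses seek when driven by danger are illustrative choices, not claims about the real circuit.

```python
from typing import Dict


def habenula_step(odor: bool, danger: bool, seek_failed: bool) -> Dict[str, object]:
    """Evaluate the straw man habenula wiring for one time step (toy model)."""
    hb_m = danger                   # Hb.m: danger avoidance
    hb_l = seek_failed              # Hb.l: give-up decision
    # Shared leave/avoid locomotor region (M.ipn in this model), driven by
    # either habenular nucleus because leaving and avoiding are treated alike.
    leave = hb_m or hb_l
    # Hb.l inhibits seek directly. Assumption: the leave region also inhibits
    # seek laterally, so danger overrides an attractive odor.
    seek = odor and not hb_l and not leave
    if leave:
        action = "leave"
    elif seek:
        action = "seek"
    else:
        action = "default"
    return {"Hb.m": hb_m, "Hb.l": hb_l, "leave": leave, "seek": seek,
            "action": action}
```

Returning every node's activity, rather than only the selected action, makes it easier to compare the sketch against the diagram node by node.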

References

Derjean D, Moussaddy A, Atallah E, St-Pierre M, Auclair F, Chang S, Ren X, Zielinski B, Dubuc R. A novel neural substrate for the transformation of olfactory inputs into motor output. PLoS Biol. 2010 Dec 21

Kawashima T, Zwart MF, Yang CT, Mensh BD, Ahrens MB. The Serotonergic System Tracks the Outcomes of Actions to Mediate Short-Term Motor Learning. Cell. 2016 Nov 3

Lacalli, T. (2012). The Middle Cambrian fossil Pikaia and the evolution of chordate swimming. EvoDevo, 3(1), 1-6.

Stephenson-Jones M, Floros O, Robertson B, Grillner S. Evolutionary conservation of the habenular nuclei and their circuitry controlling the dopamine and 5-hydroxytryptophan (5-HT) systems. Proc Natl Acad Sci U S A. 2012 Jan 17

Tosches, Maria Antonietta, and Detlev Arendt. The bilaterian forebrain: an evolutionary chimaera. Current opinion in neurobiology 23.6 (2013): 1080-1089.

Wei, K., Glaser, J.I., Deng, L., Thompson, C.K., Stevenson, I.H., Wang, Q., Hornby, T.G., Heckman, C.J., and Kording, K.P. (2014). Serotonin affects movement gain control in the spinal cord. J. Neurosci. 34
