Attempting a toy model of vertebrate understanding


Essay 31: Striatum as Timeout

Let’s return to the task of essay 16 on give-up time in foraging, which covered food search with a timeout. At first the animal uses a general roaming search and if it smells a food odor, it switches to a targeted seek following the odor with chemotaxis. If the animal finds food in the odor plume, it eats the food, but if it doesn’t find food, it will eventually give up and avoid the local area before returning to the roaming search.

Search state machine. Roam is the starting state, switching to seek when it detects odor, and switching to avoid after a timeout.
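As a minimal sketch of the state machine above, the transitions can be written as a single step function. The tick counts (`SEEK_TIMEOUT`, `AVOID_TICKS`) and the return to roaming after eating are assumptions for illustration; the essay does not specify them.

```python
from enum import Enum, auto


class State(Enum):
    ROAM = auto()
    SEEK = auto()
    AVOID = auto()


# Hypothetical tick counts; the essay does not give values.
SEEK_TIMEOUT = 5
AVOID_TICKS = 3


def step(state, ticks, odor, food):
    """Return (next_state, ticks_in_state) after one simulation tick."""
    if state is State.ROAM:
        # Roam until a food odor is detected.
        return (State.SEEK, 0) if odor else (State.ROAM, 0)
    if state is State.SEEK:
        if food:
            return State.ROAM, 0       # assumed: eat, then resume roaming
        if ticks + 1 >= SEEK_TIMEOUT:
            return State.AVOID, 0      # give up on this area
        return State.SEEK, ticks + 1
    # AVOID: leave the local area for a while, then roam again.
    if ticks + 1 >= AVOID_TICKS:
        return State.ROAM, 0
    return State.AVOID, ticks + 1


def run(odor_trace):
    """Run the machine over a list of odor readings with no food present."""
    state, ticks, history = State.ROAM, 0, []
    for odor in odor_trace:
        state, ticks = step(state, ticks, odor, food=False)
        history.append(state.name)
    return history
```

With a constant odor and no food, `run([True] * 10)` seeks for five ticks, avoids for three, roams for one, and then starts seeking again.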

For another attempt at the problem, let’s take the striatum (basal ganglia) as implementing the timeout portion of this task using the neurotransmitter adenosine as a timeout signal and incorporating the multiple action path discussion from essay 30 on RTPA. Adenosine is a byproduct of ATP breakdown and is a measure of cellular activity. With sufficiently high adenosine, the striatum switches from the active seek path to an avoidance path. These circuits are where caffeine works to suppress the adenosine timeout, allowing for longer concentration.

Mollusk navigation

As mentioned in essay 30, the mollusk sea slug has a food search circuit with a similar logic to what we need here. The animal seeks food odors when it’s hungry, but it avoids food odors when it’s not hungry [Gillette and Brown 2015].

Mollusk food search circuit, illustrating a hunger-modulated switchboard. When the animal is not hungry, the switchboard reverses the odor-to-motor links, turning the animal away from food.

This essay uses the same idea but replaces the hunger modulation with a timeout. When the timeout occurs, the circuit switches from a food seek action path to a food avoid action path.

Odor action paths

Two odor-following action paths exist in the lamprey, one using Hb.m (medial habenula) and one using V.pt (posterior tuberculum). The Hb.m path is a chemotaxis path following a temporal gradient, and the V.pt path projects to MLR (midbrain locomotor region). The lamprey Ob.m (medial olfactory bulb) projects to both Hb.m and V.pt, which each project to a different locomotor path [Derjean et al 2010]: Hb.m to R.ip (interpeduncular nucleus) and V.pt to MLR. The zebrafish also has Ob projections to Hb and V.pt [Imamura et al 2020], [Kermen et al 2013].

Dual odor-seeking action paths in the lamprey and zebrafish. Hb (habenula), Ob.m (medial olfactory bulb), V.pt (posterior tuberculum).

Further complicating the paths, the Hb.m itself contains both an odor seeking path and an odor avoiding path [Beretta et al 2012], [Chen et al 2019]. Similarly, Hb.m has dual action paths for social winning and losing [Okamoto et al 2021]. So, this essay could use the dual paths in Hb.m instead of contrasting Hb.m with V.pt, but the larger contrast should make the simulation easier to follow.

This essay’s simulation makes some important simplifications. The Hb to R.ip path is a temporal gradient path used for chemotaxis, phototaxis and thermotaxis. In a real-world marine environment, odor diffusion and water turbulence are much more complicated, producing more clumps and making a simple gradient ascent more difficult [Hengenius et al 2021]. Because this essay is only focused on the switchboard effect, this simplification should be fine.

Striatum action paths with adenosine timeout

The timeout circuit uses the striatum, which has two paths: one selecting the main action, and the second either stopping the action, or selecting an opposing action [Zhai et al 2023]. The two paths are distinguished by their responsiveness to dopamine with S.d1 (striatal projection with D1 G-s stimulating) or S.d2 (striatal projection with D2 G-i inhibiting) marking the active and alternate paths respectively. This model is a simplification of the mammalian striatum where the two paths interact in a more complicated fashion [Cui et al 2013].

Essay odor seek with timeout circuit. The seek path flows from Ob, through S.d1 to P.v to V.pt. The avoid path flows from Ob, through S.d2 to P.v to Hb. Ad (adenosine), Hb (habenula), Ob (olfactory bulb), P.v (ventral pallidum), S.d1 (striatum D1 projection neuron), S.d2 (striatum D2 projection neuron), V.pt (posterior tuberculum).

As mentioned, the two action paths are the seek path from Ob to V.pt and the avoid path from Ob to Hb. For the timeout and switchboard, the Ob has a secondary projection to the striatum. Although this circuit is meant as a proto-vertebrate simplification, Ob does project to S.ot (olfactory tubercle) and to the equivalent in zebrafish [Kermen et al 2013].

The timeout is managed by adenosine, which is a neurotransmitter derived from ATP and a measure of neural activity. The striatum has three sub-circuits for this kind of functionality, which I’ll cover in order of complexity.

S.d1 and adenosine inhibition

The first circuit only uses the direct S.d1 path and adenosine as a timeout mechanism. When the animal follows an odor, the Ob to S.d1 signal enables the seek action. As a timeout, ATP from neural activity degrades to adenosine, and the buildup of adenosine is a decent measure of activity over time. The longer the animal seeks, the more adenosine builds up. If the Ob projection axon contains an A1i (adenosine G-i inhibitory) receptor, the adenosine will inhibit the release of glutamate from Ob, which will eventually self-disable the seek action.

S.d1 action path inhibited by adenosine buildup as a timeout. A1i (adenosine G-i inhibitory receptor), Ad (adenosine), mGlu5q (metabotropic glutamate G-q receptor), Ob (olfactory bulb), S.d1 (D1-type striatal projection neuron)

In practice, the striatum uses astrocytes to manage the glutamate release. An astrocyte that envelops the synapse measures glutamate release with an mGlu5q (metabotropic glutamate with G-q/11 binding) receptor and accumulates internal calcium [Cavaccini et al 2020]. The astrocyte’s calcium triggers an adenosine release as a gliotransmitter, making the adenosine level a timeout measure of glutamate activity. The presynaptic A1i receptor then inhibits the Ob signal. The timeframe is on the order of 5 to 20 minutes with a recovery of about 60 minutes, although the precise timing is probably variable. Interestingly, the timeout is a log function instead of a linear measure of activity [Ma et al 2022].

This circuit doesn’t depend on the postsynaptic S.d1 firing [Cavaccini et al 2020], which contrasts with the next LTD (long term depression) circuit which only inhibits the axon if the S.d1 projection neuron fires.
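As a rough sketch of this first circuit, adenosine can be treated as a leaky integrator of glutamate release that feeds back to inhibit that release through A1i. The leaky-integrator form and every parameter value (`release_gain`, `decay`, `k_inhibit`) are assumptions for illustration, not measured rates, and the sketch ignores the log-like response noted above.

```python
def simulate_a1_timeout(ticks, release_gain=0.02, decay=0.005, k_inhibit=4.0):
    """Leaky adenosine integrator inhibiting presynaptic glutamate release.

    All parameter values are illustrative assumptions.
    Returns the glutamate drive from Ob to S.d1 at each tick.
    """
    adenosine = 0.0
    drive = []
    for _ in range(ticks):
        # A1i inhibition: more adenosine means less glutamate released.
        glutamate = 1.0 / (1.0 + k_inhibit * adenosine)
        drive.append(glutamate)
        # The astrocyte turns glutamate activity into adenosine, which decays.
        adenosine += release_gain * glutamate - decay * adenosine
    return drive
```

The seek action would count as timed out once the returned drive falls below whatever level S.d1 needs to stay active. Note that the update never checks whether S.d1 fired, matching the text above.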

S.d1 presynaptic LTD using eCB

S.d1 self-activating LTD uses retrotransmission to inhibit its own input using eCB (endocannabinoids) as a neurotransmitter. Like the astrocyte in the previous circuit, S.d1 uses an mGlu5q receptor to trigger eCB release, but it also requires that S.d1 fire, as triggered by an NMDA glutamate receptor. The axon receives the eCB retrotransmission with a CB1i (cannabinoid G-i inhibitory) receptor, which triggers presynaptic LTD [Shen et al 2008], [Wu et al 2015]. Like the previous circuit, the timeframe seems to be on the order of 10 minutes, lasting for 30 to 60 minutes.

S.d1 LTD circuit. A coincidence of glutamate detection with mGlu5q and S.d1 activation with NMDA triggers eCB release, which activates CB1i leading to presynaptic LTD. CB1i (cannabinoid G-i inhibitory receptor), mGlu5q (glutamate G-q receptor), Ob (olfactory bulb), S.d1 (striatum D1-type projection neuron).

This circuit inhibits itself over time without using adenosine or astrocytes. In the full striatum circuit, high dopamine levels suppress this LTD, meaning that dopamine inhibits the timeout [Shen et al 2008].
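The coincidence requirement can be sketched as a weight that only depresses on ticks where glutamate is detected and S.d1 also fires. The multiplicative update and the `ltd_rate` value are assumptions for illustration, and the sketch leaves out the dopamine gating.

```python
def ecb_ltd_weight(glutamate_trace, sd1_fires_trace, ltd_rate=0.1):
    """Presynaptic weight after coincidence-gated depression.

    Depression happens only on ticks where glutamate (mGlu5q) and
    postsynaptic firing (NMDA) coincide. ltd_rate is an assumed value.
    """
    weight = 1.0
    for glutamate, fires in zip(glutamate_trace, sd1_fires_trace):
        if glutamate and fires:           # coincidence releases eCB
            weight *= (1.0 - ltd_rate)    # CB1i-driven presynaptic LTD
    return weight
```

Glutamate alone leaves the weight unchanged, which is the difference from the astrocyte circuit, where glutamate release by itself drives the timeout.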

The next circuit adds the S.d2 path, which uses adenosine and self-activity to trigger postsynaptic LTP.

S.d2 postsynaptic LTP via A2a.s

Consider a third circuit that has the benefits of both previous circuits because it uses adenosine as a timer managed by astrocytes and is also specific to postsynaptic activity. In addition, it allows for a second action path, changing the circuit from a Go/NoGo system to a Go/Avoid action pair. This circuit uses LTP (long term potentiation) on the S.d2 striatum neurons.

Timeout circuit using postsynaptic LTP at the S.d2 neuron and adenosine as a timeout signal. As adenosine accumulates, it stimulates S.d2, which both disables S.d1 and drives the avoid path. A2a.s (adenosine G-s stimulatory receptor), Ad (adenosine), mGlu5q (glutamate G-q metabotropic receptor), Ob (olfactory bulb), S.d1 (striatum D1-type projection neuron), S.d2 (striatum D2-type projection neuron).

When the odor first arrives, Ob activates the S.d1 path, seeking toward the odor. S.d1 is activated instead of S.d2 because of dopamine. In this simple model, the Ob itself could provide the initial dopamine, like C. elegans odor-detecting neurons, the tunicate’s coronal cells, or the dual glutamate and dopamine neurons in Vta (ventral tegmental area).

As time goes on, adenosine from the astrocyte builds up, which activates the S.d2 A2a.s (adenosine G-s stimulatory receptor) until it overcomes dopamine suppression and increases the S.d2 activity with LTP [Shen et al 2008]. Once S.d2 activates, it suppresses S.d1 [Chen et al 2023] and drives the avoid path.
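A toy version of this switch treats S.d2 as driven by odor plus adenosine minus dopamine, and S.d1 as driven by odor plus dopamine minus S.d2. The linear sums and unit weights are assumptions chosen only to show the crossover, not a model of the real receptor dynamics.

```python
def select_path(adenosine, dopamine, odor=1.0):
    """Return 'seek' or 'avoid' from a toy S.d1 / S.d2 competition.

    The linear sums and unit weights are assumptions for illustration.
    """
    # A2a.s excites S.d2, while D2 (G-i) lets dopamine suppress it.
    sd2 = max(0.0, odor + adenosine - dopamine)
    # D1 (G-s) lets dopamine excite S.d1, while S.d2 suppresses it.
    sd1 = max(0.0, odor + dopamine - sd2)
    return "avoid" if sd2 > sd1 else "seek"
```

With `dopamine=1.0`, the path is seek at `adenosine=0.0` and flips to avoid by `adenosine=2.0`.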

The combination of these circuits looks like it’s precisely what the essay needs.

Simulation

In the simulation, when the animal is hunting food and finds a food odor plume, it directly seeks toward the center and eats if it finds food. In the screenshot below, the animal is eating.

Simulation showing the animal eating food after seeking the odor plume.

Satiation disables the food seek. This might sound obvious, but hunger gating of food seeking requires specific satiety circuits for any seek path that’s food-specific, which means the involvement of H.l (lateral hypothalamus) and related areas like H.arc (arcuate hypothalamus) and H.pv (periventricular hypothalamus). And, of course, the simulation requires code to enable food odor seek only when the animal is searching for food.

The next screenshot shows the central problem of the essay, when the animal seeks a food odor but there’s no food at the center.

Screenshot showing the animal stuck in the middle of the food odor plume before the timeout.

Without a timeout, the animal circles the center of the food odor plume endlessly. After a timeout, the animal actively leaves the plume and avoids that specific odor until the timeout decays.

Screenshot showing the animal escaping from the odor plume after the timeout.

This system is somewhat complex because of the need for hysteresis. A too-simple solution with a single threshold can oscillate, because as soon as the animal starts leaving, the timeout decays, which then re-enables the food seek, which then quickly times out, repeating. Instead, the system needs to make re-enabling of the food seek more difficult after a timeout.

But that adds a secondary issue: if re-enabling the food seek requires the timeout to decay to a low level, then an ongoing seek must tolerate a higher level while the seek occurs. So, sustaining a seek needs a more permissive threshold than starting one. This hysteresis and seek sustain presumably need to be handled by the actual striatum circuit.
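One common way to get this behavior is a two-threshold gate, like a Schmitt trigger. The sketch below is not the simulation's actual code: the threshold names and values (`stop_above`, `start_below`) are assumptions, and the timeout level is assumed to be a single number that rises during seek and decays afterward.

```python
def seek_enabled_trace(levels, stop_above=0.8, start_below=0.3):
    """Two-threshold (hysteresis) gate on a timeout level.

    Seeking stops when the level exceeds stop_above and is re-enabled
    only after it decays below start_below. Both values are assumptions.
    """
    seeking = True
    trace = []
    for level in levels:
        if seeking and level > stop_above:
            seeking = False     # timeout: switch to avoid
        elif not seeking and level < start_below:
            seeking = True      # fully recovered: seek is allowed again
        trace.append(seeking)
    return trace
```

For the levels `[0.5, 0.9, 0.7, 0.5, 0.2, 0.5]`, seek is disabled at 0.9 and stays disabled through 0.7 and 0.5, only returning at 0.2. A single threshold at 0.8 would have re-enabled seek as soon as the level dropped to 0.7.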

Discussion

I think this essay shows that using the striatum for an action timeout for food seek is a plausible application. The circuit is relatively simple and is effective, improving search by avoiding failed areas.

However, the simulation does raise some issues, particularly the hysteresis problem. If the striatum does provide a timeout along these lines, it must somehow solve the hysteresis problem. While the animal is seeking, the ongoing LTP/LTD inhibition should use a high threshold to stop seeking, but once avoidance starts, there needs to be a high threshold to return to seeking to avoid oscillations between the two action paths.

Because LTD/LTP is a relatively long chemical process (minutes) internal to the neurons, as opposed to an instant switch in the simulation, the delay itself might be sufficient to solve the oscillation problem. It’s also possible that some of the more complicated parts of the circuit, such as P.ge (globus pallidus) and its feedback to the striatum or H.stn (subthalamic nucleus) might affect the sustain of seek or breaking it and so control the hysteresis problem.

The simulation also reinforced the absolute requirement that action paths need to be modulated by internal state like hunger. For the seek paths, both Hb.m and V.pt are heavily modulated by H.l and other hypothalamic hunger and satiety signals.

As expected, the simulation also illustrated the need for context information separate from the target odor. While the food odor is timed out, the animal can’t search the other odor plume because this essay’s animal can’t distinguish between the odor plumes, and therefore avoids both odors. With a long timeout and many odor plumes, this delays the food search. A future enhancement is to add context to the timeout. If the animal can time out a specific odor plume, it can search alternatives even if the food odor itself is identical.

References

Beretta CA, Dross N, Gutierrez-Triana JA, Ryu S, Carl M. Habenula circuit development: past, present, and future. Front Neurosci. 2012 Apr 23;6:51.

Cavaccini A, Durkee C, Kofuji P, Tonini R, Araque A. Astrocyte Signaling Gates Long-Term Depression at Corticostriatal Synapses of the Direct Pathway. J Neurosci. 2020 Jul 22;40(30):5757-5768. 

Chen JF, Choi DS, Cunha RA. Striatopallidal adenosine A2A receptor modulation of goal-directed behavior: Homeostatic control with cognitive flexibility. Neuropharmacology. 2023 Mar 15;226:109421. 

Chen WY, Peng XL, Deng QS, Chen MJ, Du JL, Zhang BB. Role of Olfactorily Responsive Neurons in the Right Dorsal Habenula-Ventral Interpeduncular Nucleus Pathway in Food-Seeking Behaviors of Larval Zebrafish. Neuroscience. 2019 Apr 15;404:259-267. 

Cui G, Jun SB, Jin X, Pham MD, Vogel SS, Lovinger DM, Costa RM. Concurrent activation of striatal direct and indirect pathways during action initiation. Nature. 2013 Feb 14;494(7436):238-42.

Derjean D, Moussaddy A, Atallah E, St-Pierre M, Auclair F, Chang S, Ren X, Zielinski B, Dubuc R. A novel neural substrate for the transformation of olfactory inputs into motor output. PLoS Biol. 2010 Dec 21;8(12):e1000567. 

Gillette R, Brown JW. The Sea Slug, Pleurobranchaea californica: A Signpost Species in the Evolution of Complex Nervous Systems and Behavior. Integr Comp Biol. 2015 Dec;55(6):1058-69. 

Hengenius JB, Connor EG, Crimaldi JP, Urban NN, Ermentrout GB. Olfactory navigation in the real world: Simple local search strategies for turbulent environments. J Theor Biol. 2021 May 7;516:110607.

Imamura F, Ito A, LaFever BJ. Subpopulations of Projection Neurons in the Olfactory Bulb. Front Neural Circuits. 2020 Aug 28;14:561822. 

Kermen F, Franco LM, Wyatt C, Yaksi E. Neural circuits mediating olfactory-driven behavior in fish. Front Neural Circuits. 2013 Apr 11;7:62.

Ma L, Day-Cooney J, Benavides OJ, Muniak MA, Qin M, Ding JB, Mao T, Zhong H. Locomotion activates PKA through dopamine and adenosine in striatal neurons. Nature. 2022 Nov;611(7937):762-768.

Okamoto H, Cherng BW, Nakajo H, Chou MY, Kinoshita M. Habenula as the experience-dependent controlling switchboard of behavior and attention in social conflict and learning. Curr Opin Neurobiol. 2021 Jun;68:36-43. 

Shen W, Flajolet M, Greengard P, Surmeier DJ. Dichotomous dopaminergic control of striatal synaptic plasticity. Science. 2008 Aug 8;321(5890):848-51. 

Wu YW, Kim JI, Tawfik VL, Lalchandani RR, Scherrer G, Ding JB. Input- and cell-type-specific endocannabinoid-dependent LTD in the striatum. Cell Rep. 2015 Jan 6;10(1):75-87. 

Zhai S, Cui Q, Simmons DV, Surmeier DJ. Distributed dopaminergic signaling in the basal ganglia and its relationship to motor disability in Parkinson’s disease. Curr Opin Neurobiol. 2023 Dec;83:102798.

Essay 22 issues: subthalamic nucleus simulation

The essay 22 simulation explored a striatum model where the two decision paths competed: odor seeking vs random exploration, using dopamine to bias between exploration and seeking. This model resembled striatum theories like [Bariselli et al. 2019] that consider the striatum’s direct and indirect paths as competing between approach and avoidant actions.

Issues in essay 22 include both neuroscience divergence and simulation problems. Although the simulation is a loose functional model, that laxity isn’t infinite and it may have gone too far from the neuroscience.

Adenosine and perseveration

Seeking and foraging have a perseveration problem: the animal must eventually give up on a failed cue, or it will remain stuck forever. The give-up circuit in essay 22 uses the lateral habenula (Hb.l) to integrate search time until it reaches a threshold to give up. An alternative circuit in the striatum itself involves the indirect path (S.d2), the D2 dopamine receptor and adenosine, with a behaviorally relevant time scale.

When fast neurotransmitters are on the order of 10 milliseconds, creating a timeout on the order of a few minutes is a challenge. Two possible solutions in that timescale are long term potentiation (LTP) where “long” means about 20 minutes, and astrocyte calcium accumulation, which is also about 10 to 20 minutes.

Adenosine receptors (A2r) in the striatum indirect path (S.d2) measure broad neural activity from ATP byproducts that accumulate in the intercellular space. Over 10 minutes, those A2r can enhance the indirect path, either by producing internal calcium ions (Ca) in the astrocytes or via LTP. Enhancing the indirect path (exploration) eventually causes a switch from the direct path (seeking) to exploration, essentially giving up on the seeking.

Ventral striatum

Although the essay models the dorsal striatum (S.d), the ventral striatum (S.v aka nucleus accumbens) is more associated with exploration and food seeking. In particular, the olfactory path for food seeking goes through S.v, while midbrain motor actions use S.d. In salamanders, the striatum only processes midbrain (“collo-”) thalamic inputs, while olfactory and direct senses (“lemno-”) go to the cortex [Butler 2008]. Assuming the salamander path is more primitive, the essay’s use of S.d in the model is a likely mistake.

But S.v raises a new issue because S.v doesn’t use the subthalamus (H.stn) [Humphries and Prescott 2010]. However, that model applies only to the S.v shell (S.sh), not the S.v core (S.core).

Ventral striatum pathway. MLR midbrain locomotive region, P.v ventral pallidum, S.sh ventral striatum shell, Vta ventral tegmental area.

In the above diagram of a striatum shell circuit, an odor-seek path is possible through the ventral tegmental area (Vta) but there is no space for an alternate explore path.

Low dopamine and perseveration

[Rutledge et al. 2009] investigates dopamine in the context of Parkinson’s disease (PD), which exhibits perseveration as a symptom. In contrast to the essay, PD is a low dopamine condition, and adding dopamine resolves the perseveration. But that resolution is the opposite of essay 22’s dopamine model, where low dopamine resolved perseveration.

Now, it’s possible that give-up perseveration and Parkinson’s perseveration are two different symptoms, or it’s possible that the complete absence of dopamine differs from low tonic dopamine, but in either case, the essay 22 model is too simple to explain the striatum’s dopamine use.

Dopamine burst vs tonic

Dopamine in the striatum has two modes: burst and tonic. Essay 22 uses a tonic dopamine, not phasic. The striatum uses phasic dopamine to switch attention to orient to a new salient stimulus. The phasic dopamine circuit is more complicated than the tonic system because it requires coordination with acetylcholine (ACh) from the midbrain laterodorsal tegmentum (V.ldt) and pedunculopontine (V.ppt) nuclei.

A question for the essays is whether that phasic burst is primitive to the striatum, or a later addition, possibly adding an interrupt for orientation to an earlier non-interruptible striatum.

Explore semantics

The word “explore” is used differently in behavioral ecology and in reinforcement learning, despite both using foraging-like tasks. These essays have been using explore in the behavioral ecology meaning, which may cause confusion with the reinforcement learning sense. The difference centers on a fixed strategy (policy) compared with changing strategies.

In behavioral ecology, foraging is literal foraging, animals browsing or hunting in a place and moving on (giving up) if the place doesn’t have food [Owen-Smith et al. 2010]. “Exploring” is moving on from an unproductive place, but the policy (strategy) remains constant because moving on is part of the strategy. The policy for when to stay and when to go [Headon et al. 1982] often follows the marginal value theorem [Charnov 1976], which specifies when the animal should move on.
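To make the marginal value theorem concrete, the sketch below searches for the residence time that maximizes the long-run intake rate, gain divided by travel plus residence time. The diminishing-returns gain curve `1 - exp(-rate * t)` and all parameter values are assumptions chosen only for illustration.

```python
import math


def best_residence_time(travel_time, rate=0.5, t_max=50.0, steps=5000):
    """Grid-search the residence time maximizing gain / (travel + residence).

    Assumes a diminishing-returns gain 1 - exp(-rate * t), chosen only
    for illustration.
    """
    best_t, best_value = 0.0, 0.0
    for i in range(1, steps + 1):
        t = t_max * i / steps
        value = (1.0 - math.exp(-rate * t)) / (travel_time + t)
        if value > best_value:
            best_t, best_value = t, value
    return best_t
```

Under these assumptions, a longer travel time between patches gives a longer optimal stay, so the give-up time is part of one fixed policy rather than a change of policy.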

In contrast, reinforcement learning (RL) uses “explore” to mean changing the policy (strategy). For example, in a two-armed bandit situation (two slot machines), the RL policy is either using machine A or using machine B, or a fixed probabilistic ratio, not a timeout and give-up policy. In that context, exploring means changing the policy, not merely switching machines.

[Kacelnik et al. 2011] points out that the two-choice economic model doesn’t match vertebrate animal behavior, because vertebrates use an accept-reject decision [Cisek and Hayden 2022]. So, while the two-armed bandit may be useful in economics, it’s not a natural decision model for vertebrates.

Avoidance (nicotinic receptors in M.ip)

The simulation uncovered a foraging problem, where the animal remained around an odor patch it had given up on, because the give-up strategy reverts to random search. Instead, the animal should leave the current place and only resume search when it’s far away.

Path of simulated animal after giving up on a food odor.

In the diagram above, the animal remains near the abandoned food odor. The tight circles are the earlier seek before giving up, and the random path afterwards is the continued search. A better strategy would leave the green odor plume and explore other areas of the space.

As a possible circuit, the habenula (Hb.m) projection to the interpeduncular nucleus (M.ip) uses both glutamate and ACh as neurotransmitters, where ACh amplifies neural output. For low signals without ACh, the animal approaches the object, but high signals with ACh switch approach to avoidance. This avoidance switching is managed by the nicotinic receptor (nAChR), which is studied for nicotine addiction [Lee et al. 2019].

An interesting future essay might explore using nicotinic aversion to improve foraging by leaving an abandoned odor plume.

References

Bariselli S, Fobbs WC, Creed MC, Kravitz AV. A competitive model for striatal action selection. Brain Res. 2019 Jun 15;1713:70-79.

Butler, Ann. (2008). Evolution of the thalamus: A morphological and functional review. Thalamus & Related Systems. 4. 35 – 58.

Charnov, Eric L. Optimal foraging, the marginal value theorem. Theoretical population biology 9.2 (1976): 129-136.

Cisek P, Hayden BY. Neuroscience needs evolution. Philos Trans R Soc Lond B Biol Sci. 2022 Feb 14;377(1844):20200518.

Headon T, Jones M, Simonon P, Strummer J (1982) Should I Stay or Should I Go. On Combat Rock. CBS Epic.

Humphries MD, Prescott TJ. The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward. Prog Neurobiol. 2010 Apr;90(4):385-417.

Kacelnik A, Vasconcelos M, Monteiro T, Aw J. 2011. Darwin’s ‘tug-of-war’ vs. starlings’ ‘horse-racing’: how adaptations for sequential encounters drive simultaneous choice. Behav. Ecol. Sociobiol. 65, 547-558.

Lee HW, Yang SH, Kim JY, Kim H. The Role of the Medial Habenula Cholinergic System in Addiction and Emotion-Associated Behaviors. Front Psychiatry. 2019 Feb 28

Owen-Smith N, Fryxell JM, Merrill EH. Foraging theory upscaled: the behavioural ecology of herbivore movement. Philos Trans R Soc Lond B Biol Sci. 2010 Jul 27;365(1550):2267-78. 

Rutledge RB, Lazzaro SC, Lau B, Myers CE, Gluck MA, Glimcher PW. Dopaminergic drugs modulate learning rates and perseveration in Parkinson’s patients in a dynamic foraging task. J Neurosci. 2009 Dec 2

Essay 22: Subthalamic Nucleus

After essay 21 changed the animal’s default movement to a Lévy exploration, it’s natural to ask whether that random search is a full action, just like a seek turn or an avoid turn. And if exploration is a controlled action, then the model needs to treat exploration as a full action, like approach or avoid.

Exploration as a full locomotive system at the level of approach and avoid.

[Cisek 2020] identifies a vertebrate system for exploration, including the hippocampus (E.hc) and its associated nuclei such as the retromammillary hypothalamus (H.rm aka supramammillary). Essay 22 considers the idea of treating the subthalamic nucleus (H.stn) as part of the exploration circuit.

Subthalamic nucleus

H.stn is a hypothalamic nucleus from the same area as H.rm, which is part of the hippocampal theta circuit, which synchronizes exploration and spatial memory and learning. However, H.stn is part of the basal ganglia and not directly connected with the exploration system.

[Watson et al. 2021] finds a locomotive function of H.stn, where specific stimulation by the parafascicular thalamus (T.pf) to H.stn starts locomotion. If the stimulation is one-sided, the animal moves forward with a wide turn to the contralateral side. T.pf includes efference copies of motor actions from the MLR as well as from other midbrain actions.

Locomotion induced in the H.stn by T.pf stimulation. H.stn subthalamic nucleus, T.pf parafascicular nucleus, MLR midbrain locomotor region.

For essay 22, let’s consider the H.stn locomotion as exploration. Since H.stn is part of the basal ganglia, the bulk of essay 22 is considering how exploration might fit into the proto-striatum model of essay 18.

Striatal attention and persistence

Since the current essay simulation animal is an early Cambrian proto-vertebrate, it doesn’t have a full basal ganglia. Evolutionarily, the full basal ganglia architecture could not have sprung into being fully formed; it must have developed in smaller steps. Following a hypothetical evolutionary path, the essays are only implementing a simplified striatal model, adding features step-by-step. Unfortunately, because there’s no living species with a partial basal ganglia (all vertebrates have the full system), the essay’s steps are pure invention.

The initial striatum of essay 18 was a partial solution to a simulation problem: persistence. When the animal hit a wall head on, activating both touch sensors, it would choose randomly left or right, but because the simulation is real-time not turn-based, at the next tick both sensors remained active and the animal would choose randomly again, jittering at the wall until enough turns of the same direction escaped the barrier.

Proto-striatum for persistence by attention. Action feedback biases the choice to the last option: win-stay. B.rs reticulospinal motor command, Ob olfactory bulb, MLR midbrain locomotor region, Snc substantia nigra pars compacta (posterior tuberculum).

The main sense-to-action path is from the olfactory bulb (Ob) through the substantia nigra (Snc aka posterior tuberculum in zebrafish) to the midbrain locomotor region (MLR) and to the reticulospinal motor command neurons (B.rs), following the tracing and locomotive study of [Derjean et al. 2010] in the lamprey and Vta/Snc control of locomotion in [Ryczko et al. 2017]. The proto-striatum circuit is built around that olfactory-seeking circuit, acting as persistent attention.

The proto-striatal model uses an efference copy of the last action from the MLR to bias the choice of the next action via an MLR to T.pf to striatum path. The model biases the choice through removing inhibition of the odor to action path. If the last action was left, the left odor is disinhibited, making it more likely to win.
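The win-stay bias can be sketched as a small boost to whichever side matches the last action. The additive `bias` stands in for the disinhibition through the MLR to T.pf feedback; its value and the function name are assumptions for illustration.

```python
import random


def choose_turn(left_signal, right_signal, last_turn, bias=0.5, rng=random):
    """Pick 'left' or 'right'; the previous turn's side is disinhibited.

    bias is an assumed boost standing in for the MLR to T.pf feedback.
    """
    left_score = left_signal + (bias if last_turn == "left" else 0.0)
    right_score = right_signal + (bias if last_turn == "right" else 0.0)
    if left_score == right_score:
        return rng.choice(["left", "right"])   # true tie: pick randomly
    return "left" if left_score > right_score else "right"
```

With equal signals on both sides, the first choice is random, but every later tick repeats it, which removes the jitter at the wall.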

The striatal system uses disinhibition for noise reasons. [Cohen et al. 2009] studied attention in the visual system and found that attention removed coherent noise by removing inhibition. By removing inhibition, the attended circuit is less affected by the controlling circuit’s noise.

Note: essay 19 considered an alternative solution to the attention issue by following the nucleus isthmi system in zebrafish as studied in [Grubert et al. 2006], where the attention to the win-stay odor used acetylcholine (ACh) amplification to bias the choice.

Striatal columns: approach and avoid

An immediate difficulty with the simple proto-striatal model is the lack of priority. Although left vs right have equal priority, avoiding a predator is more important than seeking a potential food source. Unfortunately, the proto-striatum treats all options equally. As a solution, essay 18 split the striatum into columns, where each column resolves an internal conflict without priority (“within-system”) and the columns are compared separately (“between-systems”), where “within-system” and “between-system” are from [Cisek 2019].

Proto-striatum columns for maintaining attention.
Dual striatum column for approach and avoid, where MLR resolves the final conflict. B.rs reticulospinal command neuron, B.ss somatosensory (touch), MLR midbrain locomotive region, M.pag periaqueductal gray, Ob olfactory bulb, S.ot olfactory tubercle, S.d dorsal striatum.

Subthalamic nucleus and exploration

If we now treat exploration as a distinct action system, then it needs its own control system and column in the proto-striatum. The within-system choice for exploration is the left and right turns for a random walk, and the between-system choices are between the exploration system and the odor-seeking system.

As a possible neural correlate of exploration, consider the subthalamic nucleus (H.stn). The subthalamic nucleus is derived from the hypothalamus, specifically from the same area as the retromammillary area (H.rm aka supramammillary), which is highly correlated with hippocampal theta, locomotion and exploration.

[Watson et al. 2021] finds a locomotive function of H.stn, where specific stimulation by the parafascicular thalamus (T.pf) produces locomotion via the midbrain locomotive region (MLR). T.pf includes efference copies of motor actions from the MLR as well as other midbrain action efference copies. In the proto-striatum model, the feedback from MLR to striatum uses T.pf.

Exploration locomotive path through H.stn. H.stn subthalamic nucleus, MLR midbrain locomotive region, T.pf parafascicular thalamus.

Seek and explore with dual striatal columns

Suppose the striatum manages both odor seeking (chemotaxis) and default exploration (Lévy walk). The two actions conflict, with a complex priority system. When a food odor first appears, the animal should seek toward it (priority to seek), but if no food exists the animal should resume exploration (priority to explore). To resolve the between-system conflict, the two strategies need two columns with lateral inhibition to ensure that only one is selected.

Dual striatum columns for seek and explore strategies. B.rs reticulospinal motor command, H.stn subthalamic nucleus, Ob olfactory bulb, P.ge globus pallidus external, S.d1 direct striatum projection, S.d2 indirect striatum projection, Snc substantia nigra pars compacta, Snr substantia nigra pars reticulata.

Selecting the seek column enables the odor sense to MLR path, seeking the potential food odor. Selecting the explore column enables the H.stn to MLR path, randomly searching for food.
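A minimal sketch of the lateral inhibition between the two columns follows. Each column's activity is its input drive minus a fraction of the other column's activity. The update rule, the `inhibition` weight, and the step count are illustrative assumptions, and the sketch omits the double inversion in the real paths.

```python
def compete(seek_drive, explore_drive, inhibition=0.6, steps=50):
    """Iterate two mutually inhibiting columns and return their activities.

    The inhibition weight and update rule are illustrative assumptions.
    With similar drives, the weaker column may stay partly active.
    """
    seek, explore = seek_drive, explore_drive
    for _ in range(steps):
        # Both columns update together from the previous activities.
        seek, explore = (
            max(0.0, seek_drive - inhibition * explore),
            max(0.0, explore_drive - inhibition * seek),
        )
    return seek, explore
```

With a clear difference in drive, such as `compete(1.0, 0.5)`, the explore column is fully silenced and only the seek path to MLR remains enabled.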

Note: the double inversion in both paths is to reduce neuron noise [Cohen et al. 2009]. Removing inhibition reduces noise, where adding excitation would add noise. In the essay simulation, this double negation isn’t necessary.

Striatum with dopamine/habenula control

The previous dual column circuit isn’t sufficient for the problem, because it lacks a control signal to switch between exploit (seek) and explore. The striatum dopamine circuit might help this problem by bringing in the foraging implementation from essay 17.

A major problem in essay 17 was the tradeoff between persistence and perseverance in seeking an odor. Persistence ensures that seeking an odor will continue even when the odor is intermittent. Perseverance is a failure mode where the animal never gives up, like a moth to a flame. As a model, consider using dopamine in the striatum as persistence or effort [Salamone et al. 2012], and control of dopamine by the habenula as solving perseverance with a give-up circuit.

Explore and exploit (seek) columns controlled by dopamine. H.l lateral hypothalamus, Hb.l lateral habenula, H.stn subthalamic nucleus, MLR midbrain locomotive region, Ob olfactory bulb, P.em prethalamic eminence, P.ge globus pallidus external, S.d1 striatum direct projection, S.d2 striatum indirect projection, Snc substantia nigra pars compacta, Snr substantia nigra pars reticulata.

The striatum uses two opposing dopamine receptors named D1 and D2. D1 is a stimulating modulator through a G.s protein path, and D2 is an inhibiting modulator through a G.i protein path. In the above diagram, high dopamine will activate the seek column via D1 and inhibit the explore column via D2. Low dopamine inhibits the seek column and enables the explore column. So dopamine becomes an exploit vs explore controller.
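To make the exploit vs explore controller concrete, here is a small Python sketch where a single dopamine level scales the two columns in opposite directions. The linear gains and the names `column_gains` and `selected_strategy` are assumptions for illustration only. Real D1 and D2 modulation is certainly not this tidy.

```python
def column_gains(dopamine: float) -> dict:
    """Map a dopamine level in [0, 1] to gains for the two columns."""
    da = min(1.0, max(0.0, dopamine))
    return {
        "seek": da,           # D1-like: dopamine enables the seek column
        "explore": 1.0 - da,  # D2-like: dopamine suppresses the explore column
    }


def selected_strategy(dopamine: float, odor: float,
                      explore_drive: float = 0.5) -> str:
    """Pick the column with the larger dopamine-weighted drive."""
    gains = column_gains(dopamine)
    seek = gains["seek"] * odor
    explore = gains["explore"] * explore_drive
    return "seek" if seek > explore else "explore"


if __name__ == "__main__":
    print(selected_strategy(dopamine=0.9, odor=1.0))  # high dopamine
    print(selected_strategy(dopamine=0.1, odor=1.0))  # low dopamine
```

The same odor input produces seeking under high dopamine and exploration under low dopamine, which is the only point of the sketch.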

In many primitive animals, dopamine is a food signal. In C. elegans the dopamine neuron is a food-detecting sensory neuron. In vertebrates, the hunger and food-seeking areas like the lateral hypothalamus (H.l) strongly influence midbrain dopamine neurons both directly and indirectly. Indirectly, H.l to lateral habenula (Hb.l) causes non-reward aversion [Lazaridis et al. 2019].

For the essay, I’m taking H.l as playing multiple roles (H.l is a composite area with at least nine sub-areas [Diaz et al. 2023]). It calculates both potential reward (odor) via the H.l to Vta/Snc connection, and cost (exhaustion of the seek task without success) via the H.l to Hb.l to Vta/Snc connection.
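A toy version of this reward-minus-cost idea might look like the following Python sketch. It assumes the cost grows linearly with each unsuccessful tick of seeking and that dopamine is simply reward minus cost. The names `simulate_give_up`, `cost_rate`, and `threshold` are hypothetical, and the numbers are arbitrary.

```python
def simulate_give_up(odor: float = 1.0, cost_rate: float = 0.02,
                     threshold: float = 0.5, max_ticks: int = 1000) -> int:
    """Return the tick when dopamine drops below threshold, or -1 if never.

    odor      : reward estimate (stand-in for H.l to Vta/Snc)
    cost_rate : cost added per unsuccessful tick (stand-in for H.l to Hb.l)
    """
    cost = 0.0
    for tick in range(max_ticks):
        dopamine = max(0.0, odor - cost)  # reward path minus habenula cost
        if dopamine < threshold:
            return tick                   # give up: seek column loses support
        cost += cost_rate                 # seeking without success costs more
    return -1


if __name__ == "__main__":
    print("give-up tick:", simulate_give_up())
    print("no cost:", simulate_give_up(cost_rate=0.0))
```

With a nonzero cost rate the animal persists for a while and then gives up. With zero cost it never gives up, which is the perseverance failure mode.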

References

Cisek P. Resynthesizing behavior through phylogenetic refinement. Atten Percept Psychophys. 2019 Oct

Cisek P. Evolution of behavioural control from chordates to primates. Philos Trans R Soc Lond B Biol Sci. 2022 Feb 14

Cohen MR, Maunsell JH. Attention improves performance primarily by reducing interneuronal correlations. Nat Neurosci. 2009 Dec;12(12):1594-600.

Derjean D, Moussaddy A, Atallah E, St-Pierre M, Auclair F, Chang S, Ren X, Zielinski B, Dubuc R. A novel neural substrate for the transformation of olfactory inputs into motor output. PLoS Biol. 2010 Dec 21

Diaz, C., de la Torre, M.M., Rubenstein, J.L.R. et al. Dorsoventral Arrangement of Lateral Hypothalamus Populations in the Mouse Hypothalamus: a Prosomeric Genoarchitectonic Analysis. Mol Neurobiol 60, 687–731 (2023).

Gruberg E., Dudkin E., Wang Y., Marín G., Salas C., Sentis E., Letelier J., Mpodozis J., Malpeli J., Cui H. Influencing and interpreting visual input: the role of a visual feedback system. J. Neurosci. 2006;26:10368–10371

Lazaridis I, Tzortzi O, Weglage M, Märtin A, Xuan Y, Parent M, Johansson Y, Fuzik J, Fürth D, Fenno LE, Ramakrishnan C, Silberberg G, Deisseroth K, Carlén M, Meletis K. A hypothalamus-habenula circuit controls aversion. Mol Psychiatry. 2019 Sep

Ryczko D, Grätsch S, Schläger L, Keuyalian A, Boukhatem Z, Garcia C, Auclair F, Büschges A, Dubuc R. Nigral Glutamatergic Neurons Control the Speed of Locomotion. J Neurosci. 2017 Oct 4

Salamone JD, Correa M, Nunes EJ, Randall PA, Pardo M. The behavioral pharmacology of effort-related choice behavior: dopamine, adenosine and beyond. J Exp Anal Behav. 2012 Jan

Watson GDR, Hughes RN, Petter EA, Fallon IP, Kim N, Severino FPU, Yin HH. Thalamic projections to the subthalamic nucleus contribute to movement initiation and rescue of parkinsonian symptoms. Sci Adv. 2021 Feb 5

18: Neuroscience issues with proto-striatum

The previous proto-striatum model is flawed because it focused too much on sensory input and not enough on action efferent copies. To fix this focus, the model can use midbrain locomotive region (MLR) actions as a bias selector.

Recall that the simulation needed the striatum to solve an action jitter problem by introducing a win-stay bias. Once the animal turns left, it should bias toward continued left turns. Before the fix, the animal randomly chose a direction every 50ms, reversing itself, causing problems in avoiding corners and obstacles. The simulation problem was an action-selection problem not a sensor problem.

In the vertebrate striatum, action feedback comes from the MLR via the parafascicular thalamus (T.pf). The T.pf connection to the striatum is unique, both in its targeting of striatal interneurons (S.cin and S.pv) and in its connection to the medium spiny projection neurons (S.spn), the main striatal neurons [Raju et al. 2006]. T.pf connects directly to S.spn dendrites, not merely the spines as with other inputs. This direct connection potentially gives a stronger stimulus, and its uniqueness suggests it may be an older, more primitive connection.

Action-focused striatum model

So, I’m changing the striatum model to follow an action focus. After an action fires the motor command neurons (B.rs reticulospinal), the MLR sends an efferent copy of the motor command to the striatum via T.pf.

Action feedback model for proto-striatum. B.rs reticulospinal motor command, MLR midbrain locomotive region, Ob olfactory bulb, Snc substantia nigra pars compacta, S.pv striatal parvalbumin interneuron, S.spn spiny projection neuron.

In the above diagram, the main sensor path is still from the olfactory bulb (Ob) to the substantia nigra pars compacta (Snc / posterior tuberculum) and then to MLR, basically a stimulus-response path. A previous action biases the sensory path for the next action by activating a corresponding S.spn, which disinhibits Snc, making the next sensory input more powerful.
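The action-feedback bias can be sketched in a few lines of Python. This assumes the efferent copy simply adds a fixed gain to the sensory path on the same side as the previous action. The `bias` value and the function name `choose_turn` are hypothetical.

```python
import random


def choose_turn(left_sense: float, right_sense: float,
                previous: str = None, bias: float = 0.3,
                rng=random) -> str:
    """Choose "left" or "right", biased toward the previous action."""
    gain = {"left": 1.0, "right": 1.0}
    if previous in gain:
        # Efferent copy (MLR -> T.pf -> S.spn) disinhibits the matching
        # Snc path, modeled here as a larger sensory gain.
        gain[previous] += bias
    left = gain["left"] * left_sense
    right = gain["right"] * right_sense
    if left == right:
        return rng.choice(["left", "right"])  # no bias: random tie-break
    return "left" if left > right else "right"


if __name__ == "__main__":
    print(choose_turn(1.0, 1.0, previous="left"))   # tie resolved by bias
    print(choose_turn(1.0, 2.0, previous="left"))   # strong input overrides
```

Because the bias multiplies the sensory input instead of driving the action directly, a sufficiently stronger input on the other side can still override the previous choice.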

Comparison with the previous model

As a comparison, the following diagram shows the previous striatal model. Unlike the new model, the final selected action didn’t bias the next action because there was no feedback connection. (The reset signal to S.pv is a different circuit, and doesn’t bias the decision because it applies to all choices equally.)

sense-focused proto-striatum model.
Previous proto-striatum, where a prior selected sense biased the next sense. B.ss somatosensory touch.

In addition, the sensory input must coordinate striatal disinhibition via S.spn with its excitation of the Snc action. Although not impossible evolutionarily, the double coordination required makes it less likely. The new model not only incorporates the action but simplifies the sensor circuit.

Parafascicular thalamus

For personal reference, here’s a summary of the T.pf connections [Smith et al. 2022].

Connections of the parafascicular thalamus.

Essentially all the T.pf inputs are motor efference copies and all the T.pf outputs are to the basal ganglia. Inputs include the following areas: vision/optic motor (OT and pretectum), midbrain locomotive region (MLR, M.pag, V.ppt, V.ldt), diencephalon locomotive region (H.zi), consummatory action (B.bp), forebrain attention (P.bf) and cortical action (C.fef, C.moss, C.gu). The cingulate cortex might be unusual (C.cc), although it also has motor areas.

Striatum as attention

Attention is a difficult topic, in part because it’s used in so many diverse ways that the word is often more confusing than helpful [Hommel et al. 2019], [Krauzlis et al. 2014]. However, I think it’s interesting that the action-based striatum model looks like selective attention.

Simplification of proto-striatum showing resemblance to selective attention.

When a left action biases the next action to stay the same, its mechanism is to enhance the sensory path, as if it’s paying attention more to one side than another.

Engineering feedback: dopamine mistake

When implementing this idea, the simulation doesn’t need dopamine feedback. Instead of forcing dopamine into the model just because the basal ganglia has dopamine feedback, I’m taking it out. Since I’ve only implemented a prototype portion of the basal ganglia, this may be okay rather than a fatal flaw. When the full model arises, we’ll see if this is a mistake.

Actual simulation implementation, removing dopamine and reset feedback.

Notice that the only dopamine in this model is descending, with no ascending dopamine [Ryczko and Dubuc 2017].

References

Hommel B, Chapman CS, Cisek P, Neyedli HF, Song JH, Welsh TN. No one knows what attention is. Atten Percept Psychophys. 2019 Oct

Krauzlis RJ, Bollimunta A, Arcizet F, Wang L. Attention as an effect not a cause. Trends Cogn Sci. 2014 Sep;18(9):457-64

Raju DV, Shah DJ, Wright TM, Hall RA, Smith Y. Differential synaptology of vGluT2-containing thalamostriatal afferents between the patch and matrix compartments in rats. J Comp Neurol. 2006 Nov 10

Ryczko D, Dubuc R. Dopamine and the Brainstem Locomotor Networks: From Lamprey to Human. Front Neurosci. 2017 May 26

Smith JB, Smith Y, Venance L, Watson GDR. Thalamic Interactions With the Basal Ganglia: Thalamostriatal System and Beyond. Front Syst Neurosci. 2022 Mar 25

18: Engineering issues with proto-striatum

The planned striatum model of essay 17 quickly runs into simulation problems because it’s missing priority selection between avoiding obstacles and seeking food. Obstacle avoidance needs a higher priority than seeking an odor plume, but a naive striatum doesn’t support that priority.

Broken striatum model where toward and away have no priority. Ob olfactory bulb, B.ss somatosensory touch, B.rs reticulospinal motor command.

This model fails because this striatum gives no priority to away (avoid) actions over toward (approach) actions. An animal can’t simply follow an odor blindly, ignoring obstacles, but this model doesn’t support that priority.

Tectum

Adding the tectum seems like the right solution, although I was planning on putting it off until dealing with vision.

The tectum (optic tectum / superior colliculus) is better known for its vision support, but the deeper tectum layers are a general action-decision system. At its lower levels near periaqueductal gray (M.pag) it has a topographic direction-based map on its intermediate level and an action-based map in the deep level.

The tectum and M.pag are neighbors, almost layers of each other, and in animals like the frog, the M.pag is a deeper layer of the tectum.

Relation between M.pag and OT in mammals (left) and frog (right), where the ventricle shape determines the anatomical label for homologous areas.

The tectum is an action organizer, not just a vision organizer. For the simulation, the action matters since the simulated animal doesn’t have vision.

Amphioxus, a non-vertebrate chordate that’s a model for pre-vertebrate evolution, has a few motor-related cells with the same genetic markers as the tectum [Pergner et al. 2020]. It’s conceivable that the amphioxus tectum is more action focused, since the amphioxus frontal eye is only a dozen photoreceptors with no lens.

Action categories

The tectum has split circuits for turning and for approach and avoid [Wheatcroft et al. 2022]. The simulation can use something like the following circuit.

Split tectum and striatum circuit. B.rs reticulospinal motor command, B.ss somatosensory input, M.lr midbrain locomotor region, M.pag periaqueductal gray, Ob olfactory bulb, S.d dorsal striatum, S.ot olfactory tubercle.

Approach (toward) senses like food odors excite toward actions, and avoidant (away) senses like touch excite away actions. Because the priority areas are split, each striatum can choose between non-priority options (left vs right). The priority resolves only later in the midbrain locomotor region, using context input to decide which major direction to use. In this split model, the simplified striatum circuit can work because all of the striatum options are equal priority.
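As a sketch of the split, the following Python separates the equal-priority left/right choice in each category from the later priority switch. It assumes a fixed rule where any touch overrides odor, which is a simplification of the context-dependent switch described above. All function names are hypothetical.

```python
def pick_side(left: float, right: float) -> str:
    """Equal-priority choice inside one category (ties go left)."""
    return "left" if left >= right else "right"


def toward_action(odor_left: float, odor_right: float):
    """Approach category: turn toward the stronger odor, if any."""
    if odor_left <= 0 and odor_right <= 0:
        return None
    return pick_side(odor_left, odor_right)


def away_action(touch_left: float, touch_right: float):
    """Avoid category: turn away from the stronger touch, if any."""
    if touch_left <= 0 and touch_right <= 0:
        return None
    touched = pick_side(touch_left, touch_right)
    return "right" if touched == "left" else "left"


def locomotor_command(odor_left, odor_right, touch_left, touch_right) -> str:
    """Priority switch (stand-in for M.lr): away beats toward."""
    away = away_action(touch_left, touch_right)
    if away is not None:
        return away
    toward = toward_action(odor_left, odor_right)
    return toward if toward is not None else "forward"


if __name__ == "__main__":
    # Odor and obstacle both on the right: avoidance wins.
    print(locomotor_command(0.0, 1.0, 0.0, 1.0))
```

Each category only ever compares left against right, so neither selector needs to know about priority.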

As a note on accuracy, the diagram misrepresents the actual olfactory path, specifically the real olfactory tubercle. In reality, olfaction has a distant, complicated path to the tectum.

Short-cut escape signal

The previous diagram is also misleading because it’s too organized, as if each function has a dedicated, planned circuit. Although the tectum itself is highly-organized, the downstream and modulating circuits are more ad hoc. For example, the zebrafish has an escape mechanism that short-cuts the tectum and drives the B.rs command motor directly [Zwaka et al. 2022].

fast escape shortcut of tectal locomotion circuit.
Fast escape shortcut of tectum-mediated locomotion.

In the above diagram, the escape circuit short-circuits any decisions of the tectum and striatum. Relatedly, the “switch” area in M.lr isn’t as tidy as the diagram suggests. It’s more likely that M.lr contains multiple actions which laterally inhibit each other in a priority scheme, modulated by M.pag.

As an additional correction, many of the modulators like M.pag affect the tectum directly, instead of the diagram’s dedicated priority-resolution function.

References

Pergner J, Vavrova A, Kozmikova I, Kozmik Z. Molecular Fingerprint of Amphioxus Frontal Eye Illuminates the Evolution of Homologous Cell Types in the Chordate Retina. Front Cell Dev Biol. 2020 Aug 4

Wheatcroft T, Saleem AB, Solomon SG. Functional Organisation of the Mouse Superior Colliculus. Front Neural Circuits. 2022 Apr 29

Zwaka H, McGinnis OJ, Pflitsch P, Prabha S, Mansinghka V, Engert F, Bolton AD. Visual object detection biases escape trajectories following acoustic startle in larval zebrafish. Curr Biol. 2022 Dec 5

Essay 18: Proto-striatum

A problem with essay 17 was the lack of action stickiness, which became a problem for avoiding obstacles. When the animal hits an obstacle head-on, both touch sensors fire and the animal chooses a direction randomly. Because the decision repeats every tick (30ms) and chooses randomly to break ties, the animal flutters between both choices and remains stuck until enough random choices are in the same direction to escape the obstacle. What’s needed is a sticky choice system to keep a direction once it’s selected. In some decision studies, this is a “win-stay” capability.
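The flutter problem is easy to reproduce in a toy Python model. The sketch below assumes the animal escapes once its net turning in one direction reaches `needed` ticks. That escape rule and the function name `ticks_to_escape` are assumptions for illustration, not the simulation's actual geometry.

```python
import random


def ticks_to_escape(sticky: bool, needed: int = 5, seed: int = 0,
                    max_ticks: int = 10_000) -> int:
    """Count ticks until net turning in one direction reaches `needed`."""
    rng = random.Random(seed)
    net_turn = 0       # +1 per left turn, -1 per right turn
    previous = None
    for tick in range(1, max_ticks + 1):
        if sticky and previous is not None:
            turn = previous              # win-stay: keep the chosen direction
        else:
            turn = rng.choice([1, -1])   # random tie-break on every tick
        previous = turn
        net_turn += turn
        if abs(net_turn) >= needed:
            return tick
    return max_ticks


if __name__ == "__main__":
    random_ticks = [ticks_to_escape(False, seed=s) for s in range(200)]
    print("sticky:", ticks_to_escape(True))
    print("random mean:", sum(random_ticks) / len(random_ticks))
```

The sticky animal always escapes in exactly `needed` ticks, while the random animal performs a random walk and typically takes several times longer.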

A previous essay solved this issue with muscle-based timing or a dopamine-based system, but some of the theories of the striatum function suggest it might solve the problem. The core idea uses the dopamine as a feedback enhancer to sway choice to “stay.”

Simplified proto-striatum circuit for “win-stay.” B.ss somatosensory (touch), B.rs reticulospinal motor control, M.lr midbrain locomotive region, S.pv parvalbumin GABA inhibitory interneuron, Snc substantia nigra pars compacta, S.spn striatum spiny projection neuron (aka medium spiny neuron), ACh acetylcholine, DA dopamine.

The circuit is intended not as the full vertebrate basal ganglia, but a possible core function for a pre-vertebrate animal in the early Cambrian. The circuit here represents only the direct path and specifically only the striosome (patch) circuit, and only represents the downstream connections, and ignores the efferent copy and upstream enhancements. Despite being simplified, I think it’s still too complicated as a single evolutionary step.

Simplified proto-circuit

That simplified striatal circuit may be too complicated for an evolutionary step, but lateral inhibition is a reasonable circuit.

Simplified proto-circuit with lateral inhibition.

The above simplified circuit is a simple lateral inhibition circuit with an added reset function from the motor region.

The main path is through the somatosensory touch (B.ss), through the substantia nigra pars compacta (Snc – posterior tuberculum in zebrafish) to the midbrain locomotive region (M.lr). [Derjean et al. 2010] traced a similar path for olfactory information. I’m just replacing odor with touch.

The reset function might be a simple efferent copy from the central pattern generator for timing. In a swimming animal like an eel, the spinal cord controls the oscillation of body undulation, moving the animal forward. Because the cycle is periodic, when the motor system fires at a specific phase such as an initial-segment muscle twitch, it can send a copy of the motor signal upstream as an efferent copy. That signal is periodic, clock-like, something like the theta oscillation in vertebrates, and upper layers can use that clock.

Zebrafish larvae swim in discrete bouts, each on the order of 500ms to 2sec. Since the specific mechanism that organizes bouts isn’t known, any model is just a guess, but it might motivate some of the striatal circuitry, specifically the acetylcholine (ACh) path in the striatum. The motor swimming clock could break movement into bouts with a reset signal.

Since the sense to Snc to M.lr is a known circuit [Derjean et al. 2010], lateral inhibition is a common circuit, and motor efferent copy of central pattern oscillation is also common, this simplified circuit seems like a plausible evolutionary step.
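A minimal Python sketch of this simplified circuit treats lateral inhibition as a latch: the first winner suppresses its rival until a reset arrives from the motor clock. The class name `LatchedChoice` and the all-or-nothing latch are assumptions. A real circuit would be graded and noisy.

```python
class LatchedChoice:
    """Two-option lateral inhibition with a periodic reset."""

    def __init__(self):
        self.winner = None

    def step(self, left: float, right: float, reset: bool = False):
        """Advance one tick and return the current winner (or None)."""
        if reset:
            # Efferent copy from the swim clock ends the current bout.
            self.winner = None
        if self.winner is None and (left > 0 or right > 0):
            # The stronger input wins and then inhibits its competitor.
            self.winner = "left" if left >= right else "right"
        return self.winner


if __name__ == "__main__":
    circuit = LatchedChoice()
    print(circuit.step(1.0, 0.5))              # left wins
    print(circuit.step(0.2, 1.0))              # still left: rival inhibited
    print(circuit.step(0.2, 1.0, reset=True))  # new bout: right can win
```

The reset is what keeps the latch from becoming permanent, so each bout gets a fresh decision.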

Improved circuit

Some problems in the simplified circuit lead to improvements in the full circuit. The simplified circuit is susceptible to noise, leading to twitchy behavior, because sensors and nerves are noisy. Secondly, when two options compete, a weaker signal might win the competition if it arrives first. An accumulator system that averages the signals will give better comparisons.

To improve the decisions, the new circuit adds a single pair of inhibition neurons, specializes the existing neurons, and changes the connections.

Circuit improving noise and decision.

To improve decision making, the S.spn neurons are now accumulators, averaging inputs over 100ms or so, just long enough to reduce noise without harming response time too much. As an implementation detail, the S.spn neurons might either accumulate calcium (Ca) itself, or a partner astrocyte might accumulate Ca.
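The accumulator can be sketched in Python as a leaky integrator with a time constant near 100ms. The 10ms step, the exponential form, and the name `accumulate` are assumptions for illustration. The text above only says the neurons average over roughly 100ms.

```python
def accumulate(samples, dt_ms: float = 10.0, tau_ms: float = 100.0):
    """Leaky integration of input samples; returns the level per step."""
    alpha = dt_ms / tau_ms      # fraction of the gap closed each step
    level = 0.0
    trace = []
    for x in samples:
        level += alpha * (x - level)
        trace.append(level)
    return trace


if __name__ == "__main__":
    spike = accumulate([1.0] + [0.0] * 20)   # one stray noisy sample
    steady = accumulate([1.0] * 50)          # a sustained 500ms input
    print("peak after stray spike:", max(spike))
    print("level after sustained input:", steady[-1])
```

A single stray sample barely moves the accumulator, while a sustained input drives it close to its full value, which is the noise-versus-response tradeoff mentioned above.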

To improve noise behavior, the added Snc inhibition neurons tonically inhibit the Snc neurons, so a stray signal from B.ss to Snc won’t inadvertently trigger the action before the decision. The dual inhibition is a slightly complicated circuit which reduces noise because an active path (disinhibited) has only sense inputs; the modulatory signals are taken away.

The dopamine feedback has the benefit of being a modulator instead of a pure feedback signal. Because it’s a multiplicative modulator, dopamine doesn’t trigger the cycle itself. When the signal ends, the dopamine feedback doesn’t continue a ghost reverberation signal.

Choice decisions: drift diffusion

Psychologists, economists, and neuroscientists have several useful models for decision making, primarily deriving from the drift diffusion model [Ratcliff and McKoon 2008], which extends a random walk model to decision-making. While most of the research appears to be centered on visual choice in the cortical (C) visual system, such as the lateral intraparietal area (C.lip), the concepts are general and the circuits simple, which could apply to many neural circuits, even outside of the mammalian cortex.

Drift-diffusion is a variation of a random walk. Each new datum adds a vector to an accumulator, walking a step, until the result crosses a threshold.
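A one-dimensional version of that walk looks like this in Python. The parameter names and default values are hypothetical, and the Gaussian noise is a common modeling choice, not something this essay specifies.

```python
import random


def drift_diffusion(drift: float, noise: float = 1.0,
                    threshold: float = 10.0, seed: int = 0,
                    max_steps: int = 100_000):
    """Accumulate noisy evidence until a threshold is crossed.

    Returns (choice, steps), where choice is "upper", "lower", or
    "none" if neither threshold is reached within max_steps.
    """
    rng = random.Random(seed)
    x = 0.0
    for step in range(1, max_steps + 1):
        x += drift + rng.gauss(0.0, noise)  # one datum: signal plus noise
        if x >= threshold:
            return "upper", step
        if x <= -threshold:
            return "lower", step
    return "none", max_steps


if __name__ == "__main__":
    print(drift_diffusion(drift=1.0, noise=0.0))  # noiseless: 10 steps
    print(drift_diffusion(drift=0.2, noise=1.0))  # noisy: slower, less certain
```

The threshold sets the speed versus accuracy tradeoff: a higher threshold takes longer to reach but is less likely to be crossed by noise alone.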

Circuits for leaky competing accumulator (LCA) and feed-forward models of two-choice decision.

One simple model is the leaky competing accumulator (LCA) of [Usher and McClelland 2001], where each choice has an accumulator, and the accumulators inhibit each other laterally. Another model uses feedforward inhibition instead of lateral inhibition, where each sense inhibits its competitors. For this essay, these models seem like good, simple options for the simulation.
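Here is a deterministic Python sketch of a two-choice leaky competing accumulator. It leaves out the noise term that the published model includes, and the `leak`, `inhibition`, `threshold`, and `dt` values are arbitrary assumptions.

```python
def leaky_competing_accumulator(input_a: float, input_b: float,
                                leak: float = 0.1, inhibition: float = 0.2,
                                threshold: float = 1.0, dt: float = 0.1,
                                max_steps: int = 10_000):
    """Return (winner, steps) for two mutually inhibiting accumulators."""
    a = b = 0.0
    for step in range(1, max_steps + 1):
        # Each accumulator gains its input, leaks, and is inhibited
        # by the other. Both changes use the old values of a and b.
        da = input_a - leak * a - inhibition * b
        db = input_b - leak * b - inhibition * a
        a = max(0.0, a + dt * da)   # activity cannot go negative
        b = max(0.0, b + dt * db)
        if a >= threshold or b >= threshold:
            return ("a" if a >= b else "b"), step
    return None, max_steps


if __name__ == "__main__":
    print(leaky_competing_accumulator(1.0, 0.5))
    print(leaky_competing_accumulator(0.5, 1.0))
```

Because the inhibition here is stronger than the leak, a small early lead grows over time, so the circuit behaves like a winner-take-all choice.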

In the context of the striatum, [Bogacz and Gurney 2007] analyze the basal ganglia and cortex as a choice-based decision system. They interpret the direct path (S.d1) as the primary accumulator, and the indirect path (S.d2 / P.ge / H.stn) as feed-forward inhibition. They suggest that the basal ganglia could produce near-optimal decisions in the two-choice task.

References

Bogacz R, Gurney K. The basal ganglia and cortex implement optimal decision making between alternative actions. Neural Comput. 2007;19:442–477

Derjean D, Moussaddy A, Atallah E, St-Pierre M, Auclair F, Chang S, Ren X, Zielinski B, Dubuc R. A novel neural substrate for the transformation of olfactory inputs into motor output. PLoS Biol. 2010 Dec 21

Ratcliff, R., & Childers, R. (2015). Individual differences and fitting methods for the two-choice diffusion model of decision making. Decision, 2(4), 237.

Usher, M., & McClelland, J. L. (2001). On the time course of perceptual choice: The leaky competing accumulator model. Psychological Review, 108, 550–592.

Wang, X.-J. (2002). Probabilistic decision making by slow reverberation in cortical circuits. Neuron, 36, 1–20.
