Attempting a toy model of vertebrate understanding


Essay 22: Subthalamic Nucleus

After essay 21 changed the animal’s default movement to a Lévy exploration, the immediate question is whether that random search is a full action, just like a seek turn or an avoid turn. And if exploration is a controlled action, then the model needs to treat exploration as a full action, like approach or avoid.

Exploration as a full locomotive system at the level of approach and avoid.

[Cisek 2022] identifies a vertebrate system for exploration, including the hippocampus (E.hc) and its associated nuclei such as the retromammilary hypothalamus (H.rm aka supramammilary). Essay 22 considers the idea of treating the subthalamic nucleus (H.stn) as part of the exploration circuit.

Subthalamic nucleus

H.stn is a hypothalamic nucleus derived from the same area as H.rm, which is part of the hippocampal theta circuit that synchronizes exploration, spatial memory, and learning. However, H.stn is part of the basal ganglia and not directly connected with the exploration system.

[Watson et al. 2021] finds a locomotive function of H.stn, where specific stimulation of H.stn by the parafascicular thalamus (T.pf) starts locomotion. If the stimulation is one-sided, the animal moves forward with a wide turn to the contralateral side. T.pf includes efference copies of motor actions from the MLR as well as from other midbrain actions.

Locomotion induced in the H.stn by T.pf stimulation. H.stn subthalamic nucleus, T.pf parafascicular nucleus, MLR midbrain locomotor region.

For essay 22, let’s consider the H.stn locomotion as exploration. Since H.stn is part of the basal ganglia, the bulk of essay 22 is considering how exploration might fit into the proto-striatum model of essay 18.

Striatal attention and persistence

Since the current essay simulation animal is an early Cambrian proto-vertebrate, it doesn’t have a full basal ganglia. Evolutionarily, the full basal ganglia architecture could not have sprung into being fully formed; it must have developed in smaller steps. Following a hypothetical evolutionary path, the essays are only implementing a simplified striatal model, adding features step-by-step. Unfortunately, because there’s no living species with a partial basal ganglia — all vertebrates have the full system — the essay’s steps are pure invention.

The initial striatum of essay 18 was a partial solution to a simulation problem: persistence. When the animal hit a wall head on, activating both touch sensors, it would choose randomly left or right, but because the simulation is real-time not turn-based, at the next tick both sensors remained active and the animal would choose randomly again, jittering at the wall until enough same-direction turns escaped the barrier.

Proto-striatum for persistence by attention. Action feedback biases the choice to the last option: win-stay. B.rs reticulospinal motor command, Ob olfactory bulb, MLR midbrain locomotor region, Snc substantia nigra pars compacta (posterior tuberculum).

The main sense-to-action path is from the olfactory bulb (Ob) through the substantia nigra (Snc aka posterior tuberculum in zebrafish) to the midbrain locomotor region (MLR) and to the reticulospinal motor command neurons (B.rs), following the tracing and locomotive study of [Derjean et al. 2010] in zebrafish and Vta/Snc control of locomotion in [Ryczko et al. 2017]. The proto-striatum circuit is built around that olfactory-seeking circuit, acting as persistent attention.

The proto-striatal model uses an efference copy of the last action from the MLR to bias the choice of the next action via a MLR to T.pf to striatum path. The model biases the choice by removing inhibition of the odor-to-action path. If the last action was left, the left odor is disinhibited, making it more likely to win.
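The win-stay bias can be sketched in a few lines. This is a minimal toy, not the essay’s actual simulation code; the function name, drive values, and the inhibition/disinhibition constants are all assumptions chosen for illustration.

```python
import random

def choose_turn(left_drive, right_drive, last_action=None, disinhibition=0.5):
    """Proto-striatum win-stay sketch: the efference copy of the last action
    removes inhibition from the matching sensory channel, biasing the choice."""
    inhibition = 0.5  # baseline tonic inhibition on both channels (assumed)
    left = left_drive * (1 - inhibition)
    right = right_drive * (1 - inhibition)
    # Disinhibit the channel matching the last action (win-stay).
    if last_action == "left":
        left = left_drive * (1 - inhibition + disinhibition)
    elif last_action == "right":
        right = right_drive * (1 - inhibition + disinhibition)
    if left == right:
        return random.choice(["left", "right"])
    return "left" if left > right else "right"
```

With both touch sensors equally active (the head-on wall case), the last action breaks the tie instead of a fresh coin flip, so the animal keeps turning the same way until the wall is cleared.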

The striatal system uses disinhibition for noise reasons. [Cohen et al. 2009] studied attention in the visual system and found that attention removed coherent noise by removing inhibition. By removing inhibition, the attended circuit is less affected by the controlling circuit’s noise.

Note: essay 19 considered an alternative solution to the attention issue by following the nucleus isthmi system in zebrafish as studied in [Gruberg et al. 2006], where the attention to the win-stay odor used acetylcholine (ACh) amplification to bias the choice.

Striatal columns: approach and avoid

An immediate difficulty with the simple proto-striatal model is the lack of priority. Left vs right turns have equal priority, but avoiding a predator is more important than seeking a potential food source, while the proto-striatum treats all options equally. As a solution, essay 18 split the striatum into columns, where each column resolves an internal conflict without priority (“within-system”) and the columns are compared separately (“between-systems”), with the terms “within-system” and “between-system” from [Cisek 2019].

Proto-striatum columns for maintaining attention.
Dual striatum column for approach and avoid, where MLR resolves the final conflict. B.rs reticulospinal command neuron, B.ss somatosensory (touch), MLR midbrain locomotive region, M.pag periaqueductal gray, Ob olfactory bulb, S.ot olfactory tubercle, S.d dorsal striatum.

Subthalamic nucleus and exploration

If we now treat exploration as a distinct action system, then it needs its own control system and column in the proto-striatum. The within-system choice for exploration is the left and right turns for a random walk, and the between-system choices are between the exploration system and the odor-seeking system.

As a possible neural correlate of exploration, consider the subthalamic nucleus (H.stn). The subthalamic nucleus derives from the hypothalamus, specifically from the same area as the retromammilary area (H.rm aka supramammilary), which is highly correlated with hippocampal theta, locomotion, and exploration.

[Watson et al. 2021] finds a locomotive function of H.stn, where specific stimulation by the parafascicular thalamus (T.pf) produces locomotion via the midbrain locomotive region (MLR). T.pf includes efference copies of motor actions from the MLR as well as other midbrain action efference copies. In the proto-striatum model, the feedback from MLR to striatum uses T.pf.

Exploration locomotive path through H.stn. H.stn subthalamic nucleus, MLR midbrain locomotive region, T.pf parafascicular thalamus.

Seek and explore with dual striatal columns

Suppose the striatum manages both odor seeking (chemotaxis) and default exploration (Lévy walk). The two actions conflict, with a complex priority between them. When a food odor first appears, the animal should seek toward it (priority to seek), but if no food exists the animal should resume exploration (priority to explore). To resolve the between-system conflict, the two strategies need two columns with lateral inhibition to ensure that only one is selected.

Dual striatum columns for seek and explore strategies. B.rs reticulospinal motor command, H.stn subthalamic nucleus, Ob olfactory bulb, P.ge globus pallidus external, S.d1 direct striatum projection, S.d2 indirect striatum projection, Snc substantia nigra pars compacta, Snr substantia nigra pars reticulata.

Selecting the seek column enables the odor sense to MLR path, seeking the potential food odor. Selecting the explore column enables the H.stn to MLR path, randomly searching for food.
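The between-system choice can be sketched as two columns with mutual lateral inhibition. This is a toy abstraction of the diagram, not the simulation’s code; the function name, the inhibition weight, and the default-to-explore behavior are assumptions.

```python
def select_column(seek_drive, explore_drive, inhibition=0.8):
    """Between-system choice: two striatal columns with lateral inhibition,
    so only one strategy gates its downstream path to the MLR."""
    # Each column suppresses the other in proportion to the rival's drive.
    seek = max(0.0, seek_drive - inhibition * explore_drive)
    explore = max(0.0, explore_drive - inhibition * seek_drive)
    # Winner-take-all; with no odor drive at all, exploration wins by default,
    # matching the priority rule in the text.
    if seek > explore:
        return "seek"
    return "explore"
```

A strong odor drive selects the seek column and enables the odor-to-MLR path; with no odor, the explore column survives and enables the H.stn-to-MLR path.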

Note: the double inversion in both paths is to reduce neuron noise [Cohen et al. 2009]. Removing inhibition reduces noise, where adding excitation would add noise. In the essay simulation, this double negation isn’t necessary.

Striatum with dopamine/habenula control

The previous dual-column circuit isn’t sufficient for the problem, because it lacks a control signal to switch between exploit (seek) and explore. The striatal dopamine circuit might solve this problem by bringing in the foraging implementation from essay 17.

A major problem in essay 17 was the tradeoff between persistence and perseverance in seeking an odor. Persistence ensures that seeking an odor will continue even when the odor is intermittent. Perseverance is a failure mode where the animal never gives up, like a moth to a flame. As a model, consider using dopamine in the striatum as persistence or effort [Salamone et al. 2012], and control of dopamine by the habenula as solving perseverance with a give-up circuit.

Explore and exploit (seek) columns controlled by dopamine. H.l lateral hypothalamus, Hb.l lateral habenula, H.stn subthalamic nucleus, MLR midbrain locomotive region, Ob olfactory bulb, P.em prethalamic eminence, P.ge globus pallidus external, S.d1 striatum direct projection, S.d2 striatum indirect projection, Snc substantia nigra pars compacta, Snr substantia nigra pars reticulata.

The striatum uses two opposing dopamine receptors named D1 and D2. D1 is a stimulating modulator through a G.s protein path, and D2 is an inhibiting modulator through a G.i protein path. In the above diagram, high dopamine will activate the seek column via D1 and inhibit the explore column via D2. Low dopamine inhibits the seek column and enables the explore column. So dopamine becomes an exploit vs explore controller.
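The opposing receptors reduce to a simple gating rule. A minimal sketch, assuming a dopamine level normalized to [0, 1] and an arbitrary switching threshold; the function names are illustrative, not from the essay’s code.

```python
def column_gating(dopamine):
    """Dopamine as exploit-vs-explore controller: D1 (G.s, stimulating)
    gates the seek column, D2 (G.i, inhibiting) is modeled as the explore
    column surviving only when dopamine is low."""
    d1_gate = dopamine        # high dopamine opens the seek column
    d2_gate = 1.0 - dopamine  # low dopamine releases the explore column
    return {"seek": d1_gate, "explore": d2_gate}

def select_strategy(dopamine, threshold=0.5):
    """Pick the column whose gate exceeds the (assumed) threshold."""
    gates = column_gating(dopamine)
    return "seek" if gates["seek"] > threshold else "explore"
```

High dopamine (food signal) selects exploit; as dopamine falls, the same signal hands control back to exploration.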

In many primitive animals, dopamine is a food signal. In C. elegans the dopamine neuron is a food-detecting sensory neuron. In vertebrates, the hunger and food-seeking areas like the lateral hypothalamus (H.l) strongly influence midbrain dopamine neurons both directly and indirectly. Indirectly, H.l to lateral habenula (Hb.l) causes non-reward aversion [Lazaridis et al. 2019].

For the essay, I’m giving H.l multiple roles (H.l is a composite area with at least nine sub-areas [Diaz et al. 2023]): both calculating potential reward (odor) via the H.l to Vta/Snc connection, and cost (exhaustion of the seek task without success) via the H.l to Hb.l to Vta/Snc connection.

References

Cisek P. Resynthesizing behavior through phylogenetic refinement. Atten Percept Psychophys. 2019 Oct

Cisek P. Evolution of behavioural control from chordates to primates. Philos Trans R Soc Lond B Biol Sci. 2022 Feb 14

Cohen MR, Maunsell JH. Attention improves performance primarily by reducing interneuronal correlations. Nat Neurosci. 2009 Dec;12(12):1594-600.

Derjean D, Moussaddy A, Atallah E, St-Pierre M, Auclair F, Chang S, Ren X, Zielinski B, Dubuc R. A novel neural substrate for the transformation of olfactory inputs into motor output. PLoS Biol. 2010 Dec 21

Diaz, C., de la Torre, M.M., Rubenstein, J.L.R. et al. Dorsoventral Arrangement of Lateral Hypothalamus Populations in the Mouse Hypothalamus: a Prosomeric Genoarchitectonic Analysis. Mol Neurobiol 60, 687–731 (2023).

Gruberg E., Dudkin E., Wang Y., Marín G., Salas C., Sentis E., Letelier J., Mpodozis J., Malpeli J., Cui H. Influencing and interpreting visual input: the role of a visual feedback system. J. Neurosci. 2006;26:10368–10371

Lazaridis I, Tzortzi O, Weglage M, Märtin A, Xuan Y, Parent M, Johansson Y, Fuzik J, Fürth D, Fenno LE, Ramakrishnan C, Silberberg G, Deisseroth K, Carlén M, Meletis K. A hypothalamus-habenula circuit controls aversion. Mol Psychiatry. 2019 Sep

Ryczko D, Grätsch S, Schläger L, Keuyalian A, Boukhatem Z, Garcia C, Auclair F, Büschges A, Dubuc R. Nigral Glutamatergic Neurons Control the Speed of Locomotion. J Neurosci. 2017 Oct 4

Salamone JD, Correa M, Nunes EJ, Randall PA, Pardo M. The behavioral pharmacology of effort-related choice behavior: dopamine, adenosine and beyond. J Exp Anal Behav. 2012 Jan

Watson GDR, Hughes RN, Petter EA, Fallon IP, Kim N, Severino FPU, Yin HH. Thalamic projections to the subthalamic nucleus contribute to movement initiation and rescue of parkinsonian symptoms. Sci Adv. 2021 Feb 5

Essay 21: Syllables and Lévy walks

Previous essays used a simple default ballistic forward motion action: the animal moved forward until it hit an obstacle or encountered food or an odor plume. Furthermore, all actions in the essay were continuous: at every time-step direction or speed could change. The animal was slug-like. However, vertebrate locomotion is oscillatory (swimming or walking), powered by central pattern generators (CPGs), so vertebrate motor commands are not continuous but modifiable only at the correct phase of the oscillatory cycle.

Essay 21 adds more realism to the simulated animal by creating distinct action syllables, each of which runs to completion. In addition, the default movement is a Lévy walk to more efficiently search for food.

Zebrafish bouts and mouse modules

Zebrafish larvae move in discrete bouts [Johnson et al. 2020], punctuated by pauses. Each bout is on the order of 200ms to 1000ms and consists of a stereotyped movement such as a forward swimming stroke or one of several turn types, possibly followed by forward motion. The neural source of the bout timing is not known, although basal ganglia (striatal) defects can also produce jerky motion instead of smoothly linked motion.

Mouse movement is also composed of small modules [Wiltschko et al. 2015], on the order of 60 modules in open exploration, each module lasting 200ms to 500ms. Unlike the zebrafish larva, sequences of mouse modules are linked smoothly instead of in jerky bouts.

Simulation action syllables

To match the vertebrate action syllables, the essay simulation now has action syllables at the lowest layers. The action syllables are fixed programs that last for several simulation ticks. For example, a forward left turn might take 10 simulation ticks, depending on the tick resolution. In other words, these syllables are more like real-time actions with limited time resolution, not turn-based actions like a board game.

Until the current action completes, the system ignores new neural commands. In the future, specialized commands like freeze or panic escape might override the current action, like the fast escape of zebrafish that bypasses normal motor logic. In theory the system could also incorporate mid-action modulation, such as adding power to a swimming stroke without altering the basic action.
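The run-to-completion rule can be sketched as a small state machine. This is a toy, not the essay’s simulation code; the class name and the per-syllable tick durations are assumed values.

```python
class SyllableMotor:
    """Action syllables: each command starts a fixed program lasting several
    ticks; new commands are ignored until the current syllable completes."""
    DURATIONS = {"forward": 5, "left": 10, "right": 10}  # ticks (assumed)

    def __init__(self):
        self.current = None
        self.ticks_left = 0

    def command(self, action):
        # Ignore commands that arrive mid-syllable (run to completion).
        if self.ticks_left == 0:
            self.current = action
            self.ticks_left = self.DURATIONS[action]

    def tick(self):
        # Advance one simulation tick; return the action driving the body.
        if self.ticks_left > 0:
            self.ticks_left -= 1
            action = self.current
            if self.ticks_left == 0:
                self.current = None
            return action
        return None
```

An override path for freeze or escape would simply bypass the `ticks_left` check, matching the fast-escape exception described above.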

The new syllables should add realism and also introduce complications that vertebrates need to solve, such as timing for sequences of syllables. The syllables also introduce issues because prior-action memory, as modeled by the striatum and nucleus isthmi, needs to persist for a syllable, not just a simulation tick.

Lévy walks

Along with the syllable changes, essay 21 adds better default movement. Previously if the animal wasn’t approaching food or avoiding an obstacle, it would move forward ballistically. That simplification only worked because the simulation is a simple bounded box, but in nature animals have better default search strategies.

Brownian motion is a simple default: the animal turns in a random direction, then moves forward a random (Gaussian) distance. Brownian motion does well at searching a neighborhood and is better than the essay’s ballistic strategy, but it tends to get stuck in a small area. When food is scarce and patchy, the Brownian strategy won’t cover the long distances needed to effectively find a new patch. The Lévy walk improves on this strategy by moving long distances when the current neighborhood is already searched [Abe 2020].

Default random walk for the simulated animal.

A Lévy walk is fractal (“scale-free”): larger walks have the same search structure as neighborhood search. If described in probabilistic terms, the distance traveled looks like a power law:

P(len) ∝ 1 / len^2

Where the exponent 2 (alpha) can be generalized between 1 and 3. The bounds exist because an exponent of 1 or less is ballistic, and 3 or more is essentially Brownian. If alpha is closer to 1, the search is wider, moving further; if alpha is closer to 3, the search is closer and more local.
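The power law above can be sampled directly by inverse-transform sampling of a Pareto distribution. A minimal sketch, assuming a minimum step length of 1 and the default alpha of 2; the function names are illustrative.

```python
import math
import random

def levy_step(alpha=2.0, l_min=1.0):
    """Sample a step length from P(len) ∝ 1/len^alpha, len >= l_min,
    via inverse-transform sampling of the Pareto distribution."""
    u = max(random.random(), 1e-12)  # guard against u == 0 (infinite step)
    return l_min * u ** (-1.0 / (alpha - 1.0))

def levy_walk(n_steps, alpha=2.0):
    """A 2-D Lévy walk: uniform random heading, power-law step length."""
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        heading = random.uniform(0, 2 * math.pi)
        length = levy_step(alpha)
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        path.append((x, y))
    return path
```

Most steps stay near `l_min` (local search), but the heavy tail occasionally produces a long relocation to a new patch, which is exactly the scale-free behavior the text describes.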

Central pattern generators, criticality, and chaos

Animals do seem to use Lévy walks [Kölzsch et al. 2015], and the source appears to be internally generated [Berni et al. 2012] rather than a response to a fractal environment. The internal source of the randomness is not well known, although central pattern generators appear to be a possibility [Sims et al. 2019], [Reynolds 2018].

Near a critical point, fractal patterns appear and can be computationally efficient [Abe 2020], and a relatively simple chaotic system with only two random variables can produce Lévy walks. More broadly, other brain areas such as the cortex may also use criticality or near-criticality to improve computation and generate longer-lasting signals to solve the timing problem [Hidalgo et al. 2014]. The timing problem is that fast neurons operate at 10ms but behavior needs to be responsive on the order of seconds and minutes.

Vertebrate source of Lévy walks

Although there is some knowledge of the search circuitry in fruit flies [Berni et al. 2012], the vertebrate circuitry seems entirely unknown.

As a thought experiment, consider the midbrain theta circuitry as part of the exploration circuit [Cisek 2022], and therefore related to the Lévy walk. If the vertebrate search is generated by a combination of central pattern generators then the source should be in the hindbrain, near those CPGs.

Midbrain and hindbrain theta circuitry. B.ni nucleus incertus, B.pno nucleus pontis oralis, B.rs reticulospinal motor neurons, B.vtn ventral tegmental nucleus, H.rm retromammilary (supramammilary), Hb.m medial habenula, M.ip interpeduncular nucleus, P.ms medial septum, Poa preoptic area, V.mr median raphe.

Theta clock cycles (4–12 Hz) in the brain are strongly correlated with exploratory movement, thought, and learning, especially with the hippocampus (E.hc).

In the above, the hindbrain contains the chaotic search circuitry and generates theta, such as B.pno and B.vtn in conjunction with the reticulospinal motor command (B.rs). B.vtn is the forward movement analogue of the head direction nucleus B.dtn.

The medial septum (P.ms) is essentially the main theta clock for the hippocampus. Part of the theta cycle is generated internally to P.ms with spontaneous interaction of acetylcholine (ACh) and GABA inhibitory neurons, but exploration-related theta comes from the hypothalamic retromammilary (H.rm, aka supramammilary), which receives its theta from the hindbrain theta centers.

The movement restriction for theta depends on the median raphe (V.mr), which is one of the two main serotonin (5HT) centers. If V.mr is disabled, theta through H.rm always exists, not restricted to exploration. V.mr is in turn strongly influenced by the habenula (Hb.m) and interpeduncular (M.ip) complex.

Simulation

For now, I’m avoiding using criticality or chaotic variables in the simulation, although criticality is an interesting design to explore and very possibly how the vertebrate brain solves these problems.

The disadvantage is that the simulation would quickly become unclear and overcomplicated. While a few chaotic variables in the brainstem might be manageable, extending that idea to the striatum and cortex seems like it would become impenetrable. Since the purpose of the simulation is something like an executable thought experiment or executable diagram, an impenetrable simulation defeats the purpose. Alternatives like the fractal reorientation clocks of [Bartumeus and Levin 2008] might serve the purpose while remaining clearer.

References

Abe MS. Functional advantages of Lévy walks emerging near a critical point. Proc Natl Acad Sci U S A. 2020 Sep 29

Bartumeus F, Levin SA. Fractal reorientation clocks: Linking animal behavior to statistical patterns of search. PNAS. 2008

Berni J., Pulver S.R., Griffith L.C., Bate M. Autonomous circuitry for substrate exploration in freely moving Drosophila larvae. Curr. Biol. 2012

Berni J., Genetic dissection of a regionally differentiated network for exploratory behavior in Drosophila larvae. Curr. Biol. 25, 1319–1326 (2015).

Cisek P. Evolution of behavioural control from chordates to primates. Philos Trans R Soc Lond B Biol Sci. 2022 Feb 14

Hidalgo J, Grilli J, Suweis S, Muñoz MA, Banavar JR, Maritan A. Information-based fitness and the emergence of criticality in living systems. Proc Natl Acad Sci U S A. 2014 Jul 15

Kölzsch A., et al., Experimental evidence for inherent Lévy search behaviour in foraging animals. Proc. R. Soc. B Biol. Sci. 282, 20150424 (2015)

Maass W, Natschläger T, Markram H. Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput. 2002 Nov

Nurzaman SG, Matsumoto Y, Nakamura Y, Shirai K, Koizumi S, Ishiguro H. From Lévy to Brownian: a computational model based on biological fluctuation. PLoS One. 2011 Feb 3

Reynolds A. M., Current status and future directions of Lévy walk research. Biol. Open 7, bio030106 (2018)

Sims DW, Reynolds AM, Humphries NE, Southall EJ, Wearmouth VJ, Metcalfe B, Twitchett RJ. Hierarchical random walks in trace fossils and the origin of optimal search behavior. Proc Natl Acad Sci U S A. 2014 Jul 29

Sims D. W., Humphries N. E., Hu N., Medan V., Berni J., Optimal searching behaviour generated intrinsically by the central pattern generator for locomotion. eLife 8, e50316 (2019)

Wiltschko AB, Johnson MJ, Iurilli G, Peterson RE, Katon JM, Pashkovski SL, Abraira VE, Adams RP, Datta SR. Mapping Sub-Second Structure in Mouse Behavior. Neuron. 2015 Dec 16

Wolf S, Nicholls E, Reynolds AM, Wells P, Lim KS, Paxton RJ, Osborne JL. Optimal search patterns in honeybee orientation flights are robust against emerging infectious diseases. Sci Rep. 2016 Sep 12

Essay 20: Olfactory avoidance

Although the essays have implemented obstacle avoidance, they haven’t yet explored olfactory avoidance. Olfactory avoidance is distinct from obstacles, not just because obstacles have higher priority, but because the olfactory system is from an entirely different nervous system than the sensorimotor system. In the chimaeral brain theory [Tosches and Arendt 2013], bilaterian brains are composed of an apical nervous system (ANS) focused on chemo senses (olfactory external and hypothalamic internal), and a blastoporal nervous system (BNS) focused on sensorimotor control like obstacle avoidance.

Olfactory path

The paths for olfactory motion compared with obstacle motion shows the value of the chimaeral theory in making sense of the brain. Working backward from the midbrain locomotive region (MLR), the acetylcholine (ACh) MLR nuclei specialize: the pedunculopontine nucleus (M.ppt) supports the sensorimotor BNS, and the laterodorsal tegmental nucleus (M.ldt) supports the chemosensory ANS.

Sensor-locomotion paths: olfactory on top and somatosensory on bottom. B.ll lateral line, B.rs reticulospinal motor command, B.ss somatosensory, Hb.m medial habenula, M.ldt laterodorsal tegmental nucleus, M.ppt pedunculopontine nucleus, Ob.m medial olfactory bulb, OT tectum, R.vis visual input, Vta ventral tegmental area.

In the above diagram, food odors and warning odors use distinct paths to the MLR. Food odors from the olfactory bulb (Ob) pass through the ventral tegmental area (Vta – posterior tuberculum in zebrafish) to the MLR [Derjean et al. 2010]. Aversive odors like cadaverine pass through the medial habenula (Hb.m) to the M.ldt portion of the MLR [Stephenson-Jones et al. 2012]. The food and avoidance paths are distinct because hunger and satiety from the hypothalamus modulate the food path, while the avoidance path can pass through unmodulated. These olfactory locomotion paths correspond to the ANS.

Lamprey medial habenula path

All vertebrates share this basic architecture, including the lamprey, one of the most evolutionarily distant vertebrates. [Stephenson-Jones et al. 2012] traced the Hb.m circuit, showing that Hb.m inputs come from the olfactory path, the parapineal (light attraction), and an electro-sensory alarm, with output to the interpeduncular nucleus (M.ip).

Lamprey olfactory warning path through the habenula to the MLR. M.ip interpeduncular nucleus.

The above diagram fills out the olfactory warning path. The interpeduncular nucleus is a key node in the avoidance circuit, also key to locomotor-induced theta, and one of the two serotonin nodes. M.ip has major outputs to the serotonin areas, the dorsal raphe (V.dr) and medial raphe (V.mr), to the central grey (M.pag) [Quina et al. 2017], and to M.ldt, as well as structures associated with hippocampal (E.hc) theta [Lima et al. 2017].

Medial habenula behavior

In larval zebrafish, Hb.m supports olfactory avoidance [Choi et al. 2021], [Jeong et al. 2021] and light seeking [Zhang et al. 2017]. At least one study indicates that it may also affect food seeking [Chen et al. 2019]. The non-Ob input to Hb.m, the posterior septum (P.ps), produces locomotion when stimulated [Otsu et al. 2018], suggesting that later-evolved functionality maintains the original basal function.

In zebrafish, M.ip only projects to serotonin areas (V.dr and V.mr), not to dopamine or MLR areas. The lamprey connectivity suggests that the M.ip to M.ldt connection was lost in fish.

The Hb.m to M.ip connection is affected by nicotine. An interesting property is that low and high stimulation have opposite effects: low stimulation uses glutamate connections and is attractive, while high stimulation adds ACh and is aversive [Krishnan et al. 2014].

Developmental genetic notes

As an interesting aside, both Hb.m and the avoidant layers of the OT share a genetic marker, Brn3a (aka pou4f1) [Quina et al. 2009], [Fedtsova et al. 2008]. That marker also appears in the cerebellum’s inferior olive, trigeminal sensory areas, and the amphioxus motor LPN3 neuron [Bozzo et al. 2023].

M.ldt and M.ppt are sibling areas, deriving from the r1 rhombic lip [Machold et al. 2011].

Glutamate and GABA neurons in M.ip, Vta, and M.ldt all derive from r1 basal neurons [Lahti et al. 2016].

Locomotion switchboard

The addition of olfactory avoidance further complicates the switchboard combining the various locomotor streams, especially if the olfactory path uses serotonin as a modulator rather than a direct glutamate connection. I’ll probably use a fixed priority for essay 20, and as [Cisek 2022] notes, avoidance can be combined additively, but at some point the switchboard will need more control, especially when the essays add vision and consummatory actions.
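A fixed-priority switchboard is small enough to sketch. This is a toy of the plan described above, not the simulation’s code; the priority ordering (obstacle, then olfactory avoidance, then seeking, then default exploration) is an assumption consistent with the essays’ priority discussion.

```python
def switchboard(obstacle_turn=None, odor_avoid_turn=None, seek_turn=None):
    """Fixed-priority locomotion switchboard: the first active stream wins.
    Obstacle avoidance beats olfactory avoidance, which beats odor seeking;
    with no active stream, fall back to default exploration."""
    for turn in (obstacle_turn, odor_avoid_turn, seek_turn):
        if turn is not None:
            return turn
    return "explore"
```

A later version could combine the two avoidance streams additively, as [Cisek 2022] suggests, instead of letting one shadow the other.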

References

Bozzo M, Bellitto D, Amaroli A, Ferrando S, Schubert M, Candiani S. Retinoic Acid and POU Genes in Developing Amphioxus: A Focus on Neural Development. Cells. 2023 Feb 14

Chen W-Y, Peng X-L, Deng Q-S, Chen M-J, Du J-L, Zhang B-B. Role of Olfactorily Responsive Neurons in the Right Dorsal Habenula-Ventral Interpeduncular Nucleus Pathway in Food-Seeking Behaviors of Larval Zebrafish. Neuroscience. 2019

Choi JH, Duboue ER, Macurak M, Chanchu JM, Halpern ME. Specialized neurons in the right habenula mediate response to aversive olfactory cues. Elife. 2021 Dec 8

Cisek P. Evolution of behavioural control from chordates to primates. Philos Trans R Soc Lond B Biol Sci. 2022 Feb 14

Derjean D, Moussaddy A, Atallah E, St-Pierre M, Auclair F, Chang S, Ren X, Zielinski B, Dubuc R. A novel neural substrate for the transformation of olfactory inputs into motor output. PLoS Biol. 2010 Dec 21

Fedtsova N, Quina LA, Wang S, Turner EE. Regulation of the development of tectal neurons and their projections by transcription factors Brn3a and Pax7. Dev Biol. 2008 Apr 1

Jeong YM, Choi TI, Hwang KS, Lee JS, Gerlai R, Kim CH. Optogenetic Manipulation of Olfactory Responses in Transgenic Zebrafish: A Neurobiological and Behavioral Study. Int J Mol Sci. 2021 Jul 3

Krishnan S, Mathuru AS, Kibat C, Rahman M, Lupton CE, Stewart J, Claridge-Chang A, Yen SC, Jesuthasan S. The right dorsal habenula limits attraction to an odor in zebrafish. Current Biology. 2014

Lahti L, Haugas M, Tikker L, Airavaara M, Voutilainen MH, Anttila J, Kumar S, Inkinen C, Salminen M, Partanen J. Differentiation and molecular heterogeneity of inhibitory and excitatory neurons associated with midbrain dopaminergic nuclei. Development. 2016 Feb 1

Lima LB, Bueno D, Leite F, Souza S, Gonçalves L, Furigo IC, Donato J Jr, Metzger M. Afferent and efferent connections of the interpeduncular nucleus with special reference to circuits involving the habenula and raphe nuclei. J Comp Neurol. 2017 Jul 1

Machold R, Klein C, Fishell G. Genes expressed in Atoh1 neuronal lineages arising from the r1/isthmus rhombic lip. Gene Expr Patterns. 2011 Jun-Jul

Otsu Y, Lecca S, Pietrajtis K, Rousseau CV, Marcaggi P, Dugué GP, Mailhes-Hamon C, Mameli M, Diana MA. Functional Principles of Posterior Septal Inputs to the Medial Habenula. Cell Rep. 2018 Jan 16

Quina LA, Wang S, Ng L, Turner EE. Brn3a and Nurr1 mediate a gene regulatory pathway for habenula development. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience. 2009

Stephenson-Jones M, Floros O, Robertson B, Grillner S. Evolutionary conservation of the habenular nuclei and their circuitry controlling the dopamine and 5-hydroxytryptophan (5-HT) systems. Proc Natl Acad Sci U S A. 2012 Jan 17

Tosches, Maria Antonietta, and Detlev Arendt. “The bilaterian forebrain: an evolutionary chimaera.” Current opinion in neurobiology 23.6 (2013): 1080-1089.

Zhang BB, Yao YY, Zhang HF, Kawakami K, Du JL. Left Habenula Mediates Light-Preference Behavior in Zebrafish via an Asymmetrical Visual Pathway. Neuron. 2017 Feb 22

Essay 19: Nucleus Isthmi

Essay 18 was trying to solve the problem of maintaining behavioral state. When a fast neuron synapse takes only 5ms, behavior that lasts seconds or minutes needs some circuit to sustain attention on the task. Essay 18 explored the striatum as a possible model to maintain behavior. In zebrafish, this problem is partially solved by a paired system consisting of the optic tectum (OT) and the nucleus isthmi (NI) [Gruberg et al. 2006].

Optic tectum

The optic tectum (OT – superior colliculus in mammals) is a midbrain action and sensor system that organizes vision, touch, sound, and action into a retinotopic map, like an air controller’s radar screen that activates only for important triggers. So, it’s not like the movie screen of primate vision, but an action-oriented, sparse map that focuses on a few important items. In the larval zebrafish, the OT activates for hunting prey (paramecia) and avoiding obstacles and predators.

The OT itself has no persistence. When it detects potential prey and moves toward the prey, the OT doesn’t remember that it’s hunting or recall the previous location of the prey. Without enhancement, it forgets the prey and fails the hunt. The nucleus isthmi (NI – parabigeminal in mammals) provides that attention and persistence function [Henriques et al. 2019].

Nucleus isthmi circuit

The NI has a simple organization that is topologically, bidirectionally mapped to OT. The return signal from NI to OT is acetylcholine (ACh), which amplifies the sense input, biasing the next action to follow the previous action. Essentially this is a simple attention circuit that maintains consistent behavior.

Optic tectum and nucleus isthmi circuit as used in the essay 19 simulation.

In the diagram above, a left action sends an efference copy to the matching nucleus isthmi area, which can remember the activation for longer than the 5ms fast activation in the OT. In turn it sends an ACh modulator to amplify the left touch sensor, biasing the direction toward the same action.

For the essay simulation, the original problem was hitting an obstacle head-on, which triggered both left and right touch sensors, which then caused jitter as the animal randomly chose left and right without maintaining consistency. By adding an NI system, an initial left action would bias the left input sense to choose a next left action.
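The NI mechanism differs from the proto-striatum in using amplification rather than disinhibition. A minimal sketch of the bias, with an assumed ACh gain factor and illustrative function name.

```python
import random

def ni_biased_choice(left_touch, right_touch, last_action=None, ach_gain=1.5):
    """Nucleus-isthmi sketch: the efference copy of the last action drives an
    ACh return signal that amplifies the matching sense input, biasing the
    next action toward the previous one (win-stay by enhancement)."""
    left = left_touch * (ach_gain if last_action == "left" else 1.0)
    right = right_touch * (ach_gain if last_action == "right" else 1.0)
    if left == right:
        return random.choice(["left", "right"])
    return "left" if left > right else "right"
```

On a head-on collision (both touch sensors equal), the amplified channel wins, giving the same jitter-free behavior as the striatal disinhibition model, consistent with the engineering note below.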

Acetylcholine attention system

As a speculation, or perhaps a mnemonic, this NI system where ACh enhances senses based on action might be a model for some attention mechanisms elsewhere in the brain. NI is a sister nucleus to other ACh nuclei, specifically the parabrachial nucleus (B.pb) and the pedunculopontine nucleus (V.ppn), all developing from the same stem region near the isthmus. V.ppn is one of the major ACh attention nuclei and is part of the midbrain locomotive region (MLR). It seems plausible that V.ppn shares some organization with NI, where its upstream ACh might support sense attention as NI does for the OT.

Engineering note

After implementing the nucleus isthmi support, both the proto-striatum and NI solve the jittering problem equally. The algorithms are slightly different — NI is a straight enhancement, while proto-striatum is a disinhibition with selection — but for the current complexity of the animal and environment, there’s no behavioral difference. Both proto-striatum and NI can be enabled simultaneously without interference problems.

References

Cui H, Malpeli JG. Activity in the parabigeminal nucleus during eye movements directed at moving and stationary targets. J Neurophysiol. 2003 Jun;89(6):3128-42. doi: 10.1152/jn.01067.2002. Epub 2003 Feb 26

Gruberg E., Dudkin E., Wang Y., Marín G., Salas C., Sentis E., Letelier J., Mpodozis J., Malpeli J., Cui H. Influencing and interpreting visual input: the role of a visual feedback system. J. Neurosci. 2006

Henriques PM, Rahman N, Jackson SE, Bianco IH. Nucleus Isthmi Is Required to Sustain Target Pursuit during Visually Guided Prey-Catching. Curr Biol. 2019 Jun 3

Marín G, Salas C, Sentis E, Rojas X, Letelier JC, Mpodozis J. A cholinergic gating mechanism controlled by competitive interactions in the optic tectum of the pigeon. J Neurosci. 2007 Jul 25

Motts SD, Slusarczyk AS, Sowick CS, Schofield BR. Distribution of cholinergic cells in guinea pig brainstem. Neuroscience. 2008 Jun 12;154(1):186-95. doi: 10.1016/j.neuroscience.2007.12.017. Epub 2008 Jan 28

18: Neuroscience issues with proto-striatum

The previous proto-striatum model is flawed because it focused too much on sensory input and not enough on action efferent copies. To fix this focus, the model can use midbrain locomotive region (MLR) actions as a bias selector.

Recall that the simulation needed the striatum to solve an action jitter problem by introducing a win-stay bias. Once the animal turns left, it should bias toward continued left turns. Before the fix, the animal randomly chose a direction every 50ms, reversing itself, causing problems in avoiding corners and obstacles. The simulation problem was an action-selection problem not a sensor problem.

In the vertebrate striatum, action feedback comes from the MLR via the parafascicular thalamus (T.pf). The T.pf connection to the striatum is unique, both in its targeting of striatal interneurons (S.cin and S.pv) and in its connection to the medium spiny projection neurons (S.spn), the main striatal neurons [Raju et al. 2006]. T.pf connects directly to S.spn dendrites, not merely the spines as with other inputs. This direct connection potentially gives a stronger stimulus, and its uniqueness suggests it may be an older, more primitive connection.

Action-focused striatum model

So, I’m changing the striatum model to follow an action focus. After an action fires the motor command neurons (B.rs reticulospinal), the MLR sends an efference copy of the motor command to the striatum via T.pf.

Action feedback model for proto-striatum. B.rs reticulospinal motor command, MLR midbrain locomotive region, Ob olfactory bulb, Snc substantia nigra pars compacta, S.pv striatal parvalbumin interneuron, S.spn spiny projection neuron.

In the above diagram, the main sensor path is still from the olfactory bulb (Ob) to the substantia nigra pars compacta (Snc / posterior tuberculum) and then to MLR, basically a stimulus-response path. A previous action biases the sensory path for the next action by activating a corresponding S.spn, which disinhibits Snc, making the next sensory input more powerful.
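A minimal sketch of this action-feedback disinhibition follows. The tonic inhibition and disinhibition constants are assumptions for illustration, not measured values; the essay's circuit is analog, whereas this reduces each tick to a single pass:

```python
def striatum_step(senses, prev_action, tonic_inhibition=0.5, spn_disinhibit=0.4):
    """One tick of the action-focused proto-striatum sketch.

    senses: dict of side -> sensory input strength.
    Each Snc channel is tonically inhibited. The efference copy of the
    previous action (via T.pf) drives the matching S.spn, which inhibits
    the inhibitor, so the matched channel passes more of its sense input."""
    out = {}
    for side, strength in senses.items():
        inhibition = tonic_inhibition
        if side == prev_action:
            inhibition -= spn_disinhibit  # disinhibition via S.spn
        out[side] = max(0.0, strength - inhibition)
    return max(out, key=out.get)  # strongest disinhibited channel wins
```

Unlike the NI sketch, which multiplies the sense, this version selects by removing inhibition, matching the essay's contrast between straight enhancement and disinhibition with selection.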

Comparison with the previous model

As a comparison, the following diagram shows the previous striatal model. Unlike the new model, the final selected action didn’t bias the next action because there was no feedback connection. (The reset signal to S.pv is a different circuit, and doesn’t bias the decision because it applies to all choices equally.)

sense-focused proto-striatum model.
Previous proto-striatum, where a prior selected sense biased the next sense. B.ss somatosensory touch.

In addition, the sensory input must coordinate striatal disinhibition via S.spn with its excitation of the Snc action. Although not impossible evolutionarily, the double coordination required makes it less likely. The new model not only incorporates the action but simplifies the sensor circuit.

Parafascicular thalamus

For personal reference, here’s a summary of the T.pf connections [Smith et al. 2022].

Connections of the parafascicular thalamus.

Essentially all the T.pf inputs are motor efference copies and all the T.pf outputs are to the basal ganglia. Inputs include the following areas: vision/optic motor (OT and pretectum), midbrain locomotive region (MLR, M.pag, V.ppt, V.ldt), diencephalon locomotive region (H.zi), consummatory action (B.bp), forebrain attention (P.bf) and cortical action (C.fef, C.moss, C.gu). The cingulate cortex might be unusual (C.cc), although it also has motor areas.

Striatum as attention

Attention is a difficult topic, in part because it’s used in so many diverse ways that the word is often more confusing than helpful [Hommel et al. 2019], [Krauzlis et al. 2014]. However, I think it’s interesting that the action-based striatum model looks like selective attention.

Simplification of proto-striatum showing resemblance to selective attention.

When a left action biases the next action to stay the same, its mechanism is to enhance the sensory path, as if it’s paying attention more to one side than another.

Engineering feedback: dopamine mistake

When implementing this idea, the simulation doesn’t need dopamine feedback. Rather than forcing dopamine into the model just because the basal ganglia has dopamine feedback, I’m taking it out. Since I’ve only implemented a prototype portion of the basal ganglia, this may be a simplification rather than a fatal flaw. When the full model arrives, we’ll see if this is a mistake.

Actual simulation implementation, removing dopamine and reset feedback.

Notice that the only dopamine in this model is descending, with no ascending dopamine [Ryczko and Dubuc 2017].

References

Hommel B, Chapman CS, Cisek P, Neyedli HF, Song JH, Welsh TN. No one knows what attention is. Atten Percept Psychophys. 2019 Oct

Krauzlis RJ, Bollimunta A, Arcizet F, Wang L. Attention as an effect not a cause. Trends Cogn Sci. 2014 Sep;18(9):457-64

Raju DV, Shah DJ, Wright TM, Hall RA, Smith Y. Differential synaptology of vGluT2-containing thalamostriatal afferents between the patch and matrix compartments in rats. J Comp Neurol. 2006 Nov 10

Ryczko D, Dubuc R. Dopamine and the Brainstem Locomotor Networks: From Lamprey to Human. Front Neurosci. 2017 May 26

Smith JB, Smith Y, Venance L, Watson GDR. Thalamic Interactions With the Basal Ganglia: Thalamostriatal System and Beyond. Front Syst Neurosci. 2022 Mar 25

18: Engineering issues with proto-striatum

The planned striatum model of essay 17 quickly runs into simulation problems because it’s missing priority selection between avoiding obstacles and seeking food. Obstacle avoidance needs a higher priority than seeking an odor plume, but a naive striatum doesn’t support that priority.

Broken striatum model where toward and away have no priority. Ob olfactory bulb, B.ss somatosensory touch, B.rs reticulospinal motor command.

This model fails because the striatum gives away (avoid) actions no priority over toward (approach) actions. An animal can’t simply follow an odor blindly, ignoring obstacles, but this model doesn’t support that priority.

Tectum

Adding the tectum seems like the right solution, although I was planning on putting it off until dealing with vision.

The tectum (optic tectum / superior colliculus) is better known for its vision support, but the deeper tectum layers are a general action-decision system: a topographic, direction-based map in the intermediate layer and an action-based map in the deep layer, near the periaqueductal gray (M.pag).

The tectum and M.pag are neighbors, almost layers of each other, and in animals like the frog, the M.pag acts as a deeper layer of the tectum.

Relation between M.pag and OT in mammals (left) and frog (right), where the ventricle shape determines the anatomical label for homologous areas.

The tectum is an action organizer, not just a vision organizer. For the simulation, the action function is what matters, since the simulated animal doesn’t have vision.

Amphioxus, a non-vertebrate chordate that’s a model for pre-vertebrate evolution, has a few motor-related cells with the same genetic markers as the tectum [Pergner et al. 2020]. It’s conceivable that the amphioxus tectum is more action-focused, since the amphioxus frontal eye is only a dozen photoreceptors with no lens.

Action categories

The tectum has split circuits for turning and for approach and avoid [Wheatcroft et al. 2022]. The simulation can use something like the following circuit.

Split tectum and striatum circuit. B.rs reticulospinal motor command, B.ss somatosensory input, M.lr midbrain locomotor region, M.pag periaqueductal gray, Ob olfactory bulb, S.d dorsal striatum, S.ot olfactory tubercle.

Approach (toward) senses like food odors excite toward actions, and avoidant (away) senses like touch excite away actions. Because the priority areas are split, each striatum can choose between non-priority options (left vs right). The priority resolves only later in the midbrain locomotor region, which uses context input to decide which major direction to use. In this split model, the simplified striatum circuit can work because all of the striatum’s options are equal priority.
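The split-priority scheme might be sketched as follows. The threshold value and the tuple/dict encoding are illustrative assumptions; each striatum resolves only its own equal-priority left-vs-right choice, and the MLR switch applies the priority afterward:

```python
def mlr_switch(toward, away, away_threshold=0.2):
    """Resolve the split toward/away striata at the MLR switch.

    toward, away: dicts of direction -> evidence from the two striata.
    Any sufficiently strong away option takes absolute priority over
    the toward options; otherwise the best toward option proceeds."""
    best_away = max(away, key=away.get)      # away striatum's own winner
    best_toward = max(toward, key=toward.get)  # toward striatum's winner
    if away[best_away] > away_threshold:
        return ("away", best_away)
    return ("toward", best_toward)
```

A light touch on the right overrides even a strong food odor on the left, matching the obstacle-over-odor priority the essay needs.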

As a note on accuracy, the diagram misrepresents the actual olfactory path, specifically the real olfactory tubercle. In reality, olfaction has a distant, complicated path to the tectum.

Short-cut escape signal

The previous diagram is also misleading because it’s too organized, as if each function has a dedicated, planned circuit. Although the tectum itself is highly-organized, the downstream and modulating circuits are more ad hoc. For example, the zebrafish has an escape mechanism that short-cuts the tectum and drives the B.rs command motor directly [Zwaka et al. 2022].

fast escape shortcut of tectal locomotion circuit.
Fast escape shortcut of tectum-mediated locomotion.

In the above diagram, the escape circuit short-circuits any decisions of the tectum and striatum. Relatedly, the “switch” area in M.lr isn’t as tidy as the diagram suggests. It’s more likely that M.lr contains multiple actions which laterally inhibit each other in a priority scheme, modulated by M.pag.

As an additional correction, many of the modulators like M.pag affect the tectum directly, instead of going through the diagram’s dedicated priority-resolution function.

References

Pergner J, Vavrova A, Kozmikova I, Kozmik Z. Molecular Fingerprint of Amphioxus Frontal Eye Illuminates the Evolution of Homologous Cell Types in the Chordate Retina. Front Cell Dev Biol. 2020 Aug 4

Wheatcroft T, Saleem AB, Solomon SG. Functional Organisation of the Mouse Superior Colliculus. Front Neural Circuits. 2022 Apr 29

Zwaka H, McGinnis OJ, Pflitsch P, Prabha S, Mansinghka V, Engert F, Bolton AD. Visual object detection biases escape trajectories following acoustic startle in larval zebrafish. Curr Biol. 2022 Dec 5

Essay 18: Proto-striatum

A problem with essay 17 was the lack of action stickiness, which became a problem for avoiding obstacles. When the animal hits an obstacle head-on, both touch sensors fire and the animal chooses a direction randomly. Because the decision repeats every tick (30ms) and chooses randomly to break ties, the animal flutters between both choices and remains stuck until enough random choices land in the same direction to escape the obstacle. What’s needed is a sticky choice system that keeps a direction once it’s selected. In some decision studies, this is a “win-stay” capability.

A previous essay solved this issue with muscle-based timing or a dopamine-based system, but some theories of striatum function suggest it might solve the problem. The core idea uses dopamine as a feedback enhancer to sway the choice toward “stay.”

Simplified proto-striatum circuit for “win-stay.” B.ss somatosensory (touch), B.rs reticulospinal motor control, M.lr midbrain locomotive region, S.pv parvalbumin GABA inhibitory interneuron, Snc substantia nigra pars compacta, S.spn striatum spiny projection neuron (aka medium spiny neuron), ACh acetylcholine, DA dopamine.

The circuit is intended not as the full vertebrate basal ganglia, but as a possible core function for a pre-vertebrate animal in the early Cambrian. The circuit here represents only the direct path, and specifically only the striosome (patch) circuit; it represents only the downstream connections, ignoring the efference copy and upstream enhancements. Despite being simplified, I think it’s still too complicated to be a single evolutionary step.

Simplified proto-circuit

If that simplified striatal circuit is too complicated for a single evolutionary step, lateral inhibition is a reasonable alternative.

Simplified proto-circuit with lateral inhibition.

The above simplified circuit is a simple lateral inhibition circuit with an added reset function from the motor region.

The main path runs from the somatosensory touch (B.ss), through the substantia nigra pars compacta (Snc – posterior tuberculum in zebrafish), to the midbrain locomotive region (M.lr). [Derjean et al. 2010] traced a similar path for olfactory information; I’m just replacing odor with touch.

The reset function might be a simple efferent copy from the central pattern generator for timing. In a swimming animal like an eel, the spinal cord controls the oscillation of body undulation, moving the animal forward. Because the cycle is periodic, when the motor system fires at a specific phase such as an initial-segment muscle twitch, it can send a copy of the motor signal upstream as an efferent copy. That signal is periodic, clock-like, something like the theta oscillation in vertebrates, and upper layers can use that clock.

Zebrafish larvae swim in discrete bouts, each on the order of 500ms to 2sec. Since the specific mechanism that organizes bouts isn’t known, any model is just a guess, but it might motivate some of the striatal circuitry, specifically the acetylcholine (ACh) path in the striatum. The motor swimming clock could break movement into bouts with a reset signal.

Since the sense-to-Snc-to-M.lr path is a known circuit [Derjean et al. 2010], lateral inhibition is a common circuit, and a motor efference copy of central pattern oscillation is also common, this simplified circuit seems like a plausible evolutionary step.
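A minimal sketch of the lateral-inhibition layer with a motor reset, assuming simple rate-style units (the gain and inhibition constants are illustrative, not fitted to anything):

```python
def li_step(state, inputs, gain=1.0, inhibit=0.6, reset=False):
    """One tick of a lateral-inhibition layer with a motor reset.

    Each channel is excited by its sense input plus its own previous
    activity, and inhibited by the other channels' previous activity.
    The periodic reset (an efference copy from the motor pattern
    generator) clears the competition for the next bout."""
    if reset:
        return {k: 0.0 for k in state}
    total = sum(state.values())
    new = {}
    for k, sense in inputs.items():
        others = total - state[k]
        new[k] = max(0.0, gain * sense + state[k] - inhibit * others)
    return new
```

Run tick by tick, the stronger channel's lead grows while it suppresses the weaker one, and the reset signal ends the bout so a fresh decision can start.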

Improved circuit

Some problems in the simplified circuit lead to improvements in the full circuit. The simplified circuit is susceptible to noise, leading to twitchy behavior, because sensors and nerves are noisy. Secondly, when two options compete, a weaker signal might win the competition if it arrives first. An accumulator system that averages the signals will give better comparisons.

To improve the decisions, the new circuit adds a single pair of inhibition neurons, specializes the existing neurons, and changes the connections.

Circuit improving noise and decision.

To improve decision making, the S.spn neurons are now accumulators, averaging inputs over 100ms or so, just long enough to reduce noise without harming response time too much. As an implementation detail, the S.spn neurons might either accumulate calcium (Ca) themselves, or a partner astrocyte might accumulate the Ca.

To improve noise behavior, the added Snc inhibition neurons tonically inhibit the Snc neurons, so a stray signal from B.ss to Snc won’t inadvertently trigger the action before the decision. The dual inhibition is a slightly complicated circuit which reduces noise because an active path (disinhibited) has only sense inputs; the modulatory signals are taken away.

The dopamine feedback has the benefit of being a modulator instead of a pure feedback signal. Because it’s a multiplicative modulator, dopamine doesn’t trigger the cycle itself. When the signal ends, the dopamine feedback doesn’t continue a ghost reverberation signal.

Choice decisions: drift diffusion

Psychologists, economists, and neuroscientists have several useful models for decision making, primarily deriving from the drift diffusion model [Ratcliff and McKoon 2008], which extends a random walk model to decision-making. While most of the research appears to be centered on visual choice in the cortical (C) visual system, such as the lateral intraparietal area (C.lip), the concepts are general and the circuits simple, which could apply to many neural circuits, even outside of the mammalian cortex.

Drift-diffusion is a variation of a random walk. Each new datum adds a vector to an accumulator, walking a step, until the result crosses a threshold.

Circuits for leaky competing accumulator (LCA) and feed-forward models of two-choice decision.

One simple model is the leaky competing accumulator (LCA) of [Usher and McClelland 2001], where each choice has an accumulator, and the accumulators inhibit each other laterally. Another model uses feedforward inhibition instead of lateral inhibition, where each sense inhibits its competitors. For this essay, these models seem like good, simple options for the simulation.
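A sketch of an LCA race between choices, following the general form of the Usher and McClelland model (the leak, inhibition, noise, and threshold values are illustrative, not fitted parameters):

```python
import random

def lca_decide(drifts, leak=0.1, inhibit=0.2, noise=0.05,
               threshold=1.0, dt=0.01, max_steps=10000, rng=None):
    """Leaky competing accumulator race: first to threshold wins.

    Each choice accumulates its input (drift), leaks toward zero, and
    is laterally inhibited by the other accumulators. Returns the
    winning index and the number of steps taken."""
    rng = rng or random.Random(0)
    acc = [0.0] * len(drifts)
    for step in range(1, max_steps + 1):
        prev = list(acc)
        total = sum(prev)
        for i, drift in enumerate(drifts):
            dx = drift - leak * prev[i] - inhibit * (total - prev[i])
            acc[i] = max(0.0, prev[i] + dt * dx
                         + noise * rng.gauss(0.0, dt ** 0.5))
        for i, a in enumerate(acc):
            if a >= threshold:
                return i, step
    # no threshold crossing: fall back to the largest accumulator
    return max(range(len(acc)), key=acc.__getitem__), max_steps
```

The lateral inhibition term is what makes this "competing": the leading accumulator actively suppresses its rivals rather than merely outpacing them.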

In the context of the striatum, [Bogacz and Gurney 2007] analyze the basal ganglia and cortex as a choice-based decision system. They interpret the direct path (S.d1) as the primary accumulator, and the indirect path (S.d2 / P.ge / H.stn) as feed-forward inhibition. They suggest that the basal ganglia could produce near-optimal decisions in the two-choice task.

References

Bogacz R, Gurney K. The basal ganglia and cortex implement optimal decision making between alternative actions. Neural Comput. 2007;19:442–477

Derjean D, Moussaddy A, Atallah E, St-Pierre M, Auclair F, Chang S, Ren X, Zielinski B, Dubuc R. A novel neural substrate for the transformation of olfactory inputs into motor output. PLoS Biol. 2010 Dec 21

Ratcliff, R., & Childers, R. (2015). Individual differences and fitting methods for the two-choice diffusion model of decision making. Decision, 2(4), 237.

Usher, M., & McClelland, J. L. (2001). On the time course of perceptual choice: The leaky competing accumulator model. Psychological Review, 108, 550–592.

Wang, X.-J. (2002). Probabilistic decision making by slow reverberation in cortical circuits. Neuron, 36, 1–20.

17: Issues on vertebrate seek

While implementing the basic model, some issues came up, including issues already solved in earlier essays.

What controls “give-up”?

The foraging task needs to give up on a non-promising odor, ignore it, leave the current place, and explore for a new odor. In an earlier essay, odor habituation implemented give-up. If the seek didn’t find the food within the habituation time, the sense would disappear, disabling the seek action.

Animal circling food with no ability to break free.

The perseveration problem can be solved in many ways, including the goal give-up circuit in essay 17 and the odor habituation in an earlier essay. One approach cuts the sensor; the other disables the action. But two solutions raise the question of more possible solutions, any or all of which might affect the animal.

  • Sense habituation (cutting sensor)
  • Habenula give-up (inhibit action)
  • Motivational state – hypothalamus hunger/satiety
  • Circadian rhythm – foraging at twilight
  • Global periodic reset – rest / sleep

Give-up or leave?

The distinction between giving-up and leaving is between abandoning the current action and switching to a new, overriding action. Although the effect is similar, the implementing circuit differs. In a leave circuit, after the give-up time, the animal would actively leave the current area (place avoidance). Assuming the leave action has a higher priority than seeking, then lateral inhibition would disable the seek action. In foraging vocabulary, does failure inhibit exploitation or does it encourage exploration?

Distinct circuits for give-up and leave to curtail a failed odor approach.

As the diagram above shows, this distinction isn’t a semantic quibble, but represents different circuits. In the give-up circuit, the quit decision inhibits the olfactory seek input and/or inhibits the seek action. With seek disabled, the default action moves the animal away from the failed odor. In the leave circuit, the quit decision activates a leave action, which moves the animal away from the failed place, inhibiting the seek action laterally.
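The two circuits can be contrasted as tiny policies (the action names "seek", "explore", and "leave" are my own labels for the essay's behaviors):

```python
def give_up_policy(timer_expired, odor_present):
    """Give-up circuit: quitting inhibits the seek action, and the
    default exploratory behavior then carries the animal away."""
    if timer_expired:
        return "explore"  # seek disabled; default behavior takes over
    return "seek" if odor_present else "explore"

def leave_policy(timer_expired, odor_present):
    """Leave circuit: quitting activates a leave action, which wins
    over seek by lateral inhibition (higher priority)."""
    if timer_expired:
        return "leave"  # active place-avoidance action
    return "seek" if odor_present else "explore"
```

Both policies abandon the failed odor, but only the leave circuit produces a dedicated, directed departure action; the give-up circuit merely falls back to the default.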

Leave or avoid?

Leaving an area is a primitive action and a requirement for foraging. However, neuroscience papers don’t generally study foraging; they study place avoidance from aversive stimuli, which raises a question. Since the physical actions of leaving and aversive place avoidance are identical, do the two actions share circuits or are they distinct?

Distinct leave and avoid actions compared to shared locomotion.

In the avoid circuit, danger avoidance is distinct from food-seeking, only sharing at the lowest motor layers. In the leave circuit, exploration leaving and place avoidance share the same mid-locomotor action.

Slow and fast twitch swimming

[Lacalli 2012] explores the evolution of chordate swimming, inspired by the discovery of mid-Cambrian fossils, which suggest that fast-twitch muscles are a later addition to a more basal chordate swimming, possibly to escape from new Cambrian predators. The paper explores the non-vertebrate amphioxus motor circuitry in light of the fossils, suggesting two distinct motor circuits: normal swimming and escape.

Slow and fast paths for normal swimming and fast predator escape.

In this model, higher layers are independent paths that only resolve at the lowest motor command neuron level (such as B.rs). For the foraging tasks, this model suggests that leaving an explored area would use a different system from leaving a noxious area (place aversion), despite being the same underlying motion.

Serotonin as muscle gain-control

In the zebrafish, [Wei et al. 2014] studied serotonin in V.dr (dorsal raphe) as gain-control for muscle output, amplifying the effect of glutamate signals. When they inhibited 5HT (serotonin), the muscle only produced 40% of its maximal strength. Serotonin acted as a gain-control, a multiplicative signal that amplified glutamate signals, allowing for a broader dynamic range.

[Kawashima et al. 2016] investigated 5HT in the context of task-learning for muscle effort, where 5HT caches the real-time adjustment by the cerebellum and pretectal areas. When 5HT is disabled, the real-time system still adjusts the muscle effort, but it doesn’t remember the adjustment for future bouts. That study considers the 5HT neurons as leaky integrators of motor-gated visual feedback, where zebrafish gauge the success of swimming effort by visual motion. Notably, the neurons only store visual information when the fish is actively swimming, as an action-outcome integrator.

The two studies focused on opposite muscle effects: one increasing effort, the other decreasing it. 5HT can either inhibit or excite depending on the receptor type, suggesting that 5HT shouldn’t be interpreted as representing a specific value, either positive or negative, but instead as possibly carrying either value.

Taking these studies as analogies, it seems reasonable to consider V.dr as an action-outcome accumulator for future effort in the 10-30 second range, not specific to either positive or negative amplification. Of course, because serotonin has diverse effects in multiple circuits, reality is likely more complicated.

Serotonin zooplankton dispersal and learning

Many aquatic animals have a larval zooplankton stage, where the larva disperses from its spawn point for several days or weeks, then descends to the sea floor for its adult life. A small number of serotonin neurons signal the switch to descend. Essentially, this is a single explore/exploit pair.

Larva exploring in a dispersal stage, switching to descend to the sea floor for adult life.

Habenula function circuit

Essay 17 is running with the model of the habenula as central to the give-up/move-on circuit. The following is a straw man model of the habenula based on the above discussion of quitting, leaving, and avoiding circuits. Because essay 17 has no learning or higher areas like the striatum, the diagram ignores any learning functionality. This diagram is for a hypothetical pre-striatal habenular function.

Odor-based locomotion using the habenula.

Note, this locomotion only includes odor-based navigation. The audio-visual-touch locomotion uses a different system based on the optic tectum. This dual-locomotive system may be the result of a bilaterian chimaera brain [Tosches and Arendt 2013].

The habenula connectivity and avoidance path is loosely based on [Stephenson-Jones et al. 2012] on the lamprey habenula connectivity. The seek path is loosely based on [Derjean et al. 2010] for the zebrafish.

In this model, Hb.m (medial habenula) is primarily a danger-avoidance circuit, and M.ipn (interpeduncular nucleus) is a place-avoidance locomotive region. Hb.l (lateral habenula) is a give-up circuit that both inhibits the seek function (giving up) and excites the shared leave locomotor region, implementing the foraging exploit-to-explore decision. Here, place avoidance and exploratory leaving are treated as equivalent. As mentioned above, this diagram is meant to be a straw man or a thought experiment, because it’s easier to work with a concrete model.

References

Derjean D, Moussaddy A, Atallah E, St-Pierre M, Auclair F, Chang S, Ren X, Zielinski B, Dubuc R. A novel neural substrate for the transformation of olfactory inputs into motor output. PLoS Biol. 2010 Dec 21

Kawashima T, Zwart MF, Yang CT, Mensh BD, Ahrens MB. The Serotonergic System Tracks the Outcomes of Actions to Mediate Short-Term Motor Learning. Cell. 2016 Nov 3

Lacalli, T. (2012). The Middle Cambrian fossil Pikaia and the evolution of chordate swimming. EvoDevo, 3(1), 1-6.

Stephenson-Jones M, Floros O, Robertson B, Grillner S. Evolutionary conservation of the habenular nuclei and their circuitry controlling the dopamine and 5-hydroxytryptophan (5-HT) systems. Proc Natl Acad Sci U S A. 2012 Jan 17

Tosches, Maria Antonietta, and Detlev Arendt. The bilaterian forebrain: an evolutionary chimaera. Current opinion in neurobiology 23.6 (2013): 1080-1089.

Wei, K., Glaser, J.I., Deng, L., Thompson, C.K., Stevenson, I.H., Wang, Q., Hornby, T.G., Heckman, C.J., and Kording, K.P. (2014). Serotonin affects movement gain control in the spinal cord. J. Neurosci. 34

Essay 17: Proto-Vertebrate Locomotion

The locomotive models in essays 14 to 16 were non-vertebrate. Essay 17 takes the same problems, avoiding obstacles and seeking food, but with a model based on the vertebrate brain. Since these models are still Precambrian or early Cambrian, they don’t include the full vertebrate architecture, but try to find core components that might have been a basis for later vertebrate developments.

The animal is a slug-like creature with mucociliary forward movement, where propulsion is cilia or cilia-like and steering is muscular. This combination of slug-like motion and vertebrate brain is probably not evolutionarily accurate, but it allows touch-based obstacle avoidance without the complications of vision or lateral-line senses.

The animal seeks food by following odor plumes, and avoids obstacles by turning away when touching them. The locomotion model includes the following components:

  • [Braitenberg 1984] navigation (simple crossed vs uncrossed signals for approach and avoid).
  • Obstacle avoidance with a direct touch-to-muscle circuit.
  • Odor-seeking with distinct “what” and “where” paths.
  • Perseveration fix with an explicit give-up circuit.
  • Motivation-state (satiety) control of odor-seeking (“why” path [Verschure et al. 2014])

Proto-vertebrate model

A diagram of the proto-vertebrate model, including analogous brain regions follows:

Proto-vertebrate locomotive model. Key: B.sp spinal motor, B.rs reticulospinal motor command (medial and lateral), B.ss spinal somatosensory, H.l lateral hypothalamus, Hb.l lateral habenula, M.lr midbrain locomotive region (M.ppt), Ob olfactory bulb, Snc substantia nigra pars compacta. DA dopamine, ACh acetylcholine.

For the sake of readability, the model simplifies the actual vertebrate midline crossing patterns, leaving only a single cross between B.rs (reticulospinal) and B.sp (spinal), which represents Braitenberg navigation.

In this model, obstacle avoidance is reflexive between B.ss (somatosensory touch) and B.rs. Odor navigation (“where”) flows through Snc (substantia nigra pars compacta) to M.lr (midbrain locomotive region). In the zebrafish, the Snc area is the posterior tuberculum, and the M.lr likely represents M.ppn (pedunculopontine tegmental nucleus). The motivation state (hunger or satiety) and “what” (food odor vs non-food) flow through H.l (lateral hypothalamus). The give-up circuit flows through Hb.l (lateral habenula).

Olfactory navigation path

[Derjean et al. 2010] traced a path in zebrafish from Ob (olfactory bulb) to the posterior tuberculum (mammal Snc) to the midbrain locomotive region (likely M.ppn), to the reticulospinal motor command neurons.

Zebrafish olfactory to motor path in [Derjean 2010].

A similar olfactory to motor path has been traced in lamprey by [Suryanarayana et al. 2021] and [Beausejour et al. 2021].

I’ve labeled this path as a “where” path, based on simulation requirements, but as far as I know, that label has no scientific basis.

The Snc / posterior tuberculum area includes descending glutamate and dopamine (DA) neurons, although the Snc is better known for its ascending dopamine path. Since [Ryczko et al. 2016] reports a mammalian descending glutamate and DA path from Snc to M.ppn, portions of this descending path appear to be evolutionarily conserved. The DA appears to be an effort boost, increasing downstream activity, but most of the activity is glutamate.

Braitenberg navigation

[Braitenberg 1984] vehicles are a thought experiment for simple circuits to implement approach and avoid navigation. In the original, the vehicles have two light-detection sensors connected to drive wheels. Depending on the connection topology, signs, and thresholds, the simple circuits can implement multiple behaviors.

Braitenberg vehicles for approach and escape
Braitenberg circuits for approach and escape.

A circuit that combines the output of approach and avoid circuits with some lateral inhibition can implement both approach and avoidance with avoidance taking priority. In the essay simulation, if the animal touches a wall, it will turn away from the obstacle, temporarily ignoring any odor it might be following.

Circuit for obstacle avoidance and food approach for simulated slug.
Circuit for combined odor approach and touch obstacle avoidance.
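A sketch of the combined circuit, where touch laterally inhibits the odor drive so avoidance takes priority (the boolean touch encoding and turn labels are illustrative choices, not the simulation's actual representation):

```python
def steer(odor_left, odor_right, touch_left, touch_right):
    """Combined Braitenberg steering for the simulated slug.

    Touch (avoid) laterally inhibits the odor (approach) drive, so any
    touch wins. Avoidance turns away from the touched side (uncrossed
    wiring); approach turns toward the stronger odor (crossed wiring)."""
    if touch_left or touch_right:
        return "right" if touch_left else "left"  # turn away from touch
    if odor_left == odor_right:
        return "straight"
    return "left" if odor_left > odor_right else "right"
```

The key point from the essay survives in the structure: approach and avoid are separate sub-circuits, reconciled only at the final steering output.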

Mammalian locomotion appears to use a similar circuit between the superior colliculus (OT – optic tectum) and the motor-driving B.rs neurons [Isa et al. 2021]. This circuit pattern implies that approach and avoidance are separate behaviors, only reconciled at the end. For example, a punishing reinforcer that increases avoidance is not simply the mirror image of a non-reward that decreases approach. The two reinforcers modify different circuits.

“What” path vs “where” path

The mammalian visual system has separate “what” and “where” paths. One path detects what object is in focus, and the other keeps track of where the object is. This division between object decision and navigation has been useful in the simulation, because navigation details are quickly lost in the circuit when deciding what to do with an odor.

“What” and “where” paths as configuring a switchboard.

When an animal senses an odor, say a food odor, the animal needs to identify it as a food odor, decide if the animal is hungry or sated, and decide if there’s a higher-priority task. All that processing and decision can lose the fine timing, phase, and amplitude details needed for precise navigation. Gradient following, for example, needs fine differences in timing or amplitude to decide whether to turn left or right. By splitting the long, complicated “what” decision from the short, simple “where” location, the circuit can benefit from both.

[Cohn 2015] describes the fruit fly mushroom body as a switchboard, where dopamine neurons configure the path for olfactory senses to travel. In the context of “what” and “where”, the “what” path configures the switchboard and the “where” path follows the connected circuit.

Some odor-based navigation has a more extreme division between “what” and “where.” Following odor in water isn’t always gradient-based navigation, because odors form clumps instead of gradient plumes. Instead of following a gradient, the animal moves against the current toward the odor source. In that latter situation, the “where” path uses entirely different senses for navigation, using water flow mechanosensors, not olfactory sensors [Steele et al. 2023].

Navigation against current toward an odor plume.

The diagram above illustrates a food-searching strategy for some animals in a current, both water and air. In water, the current is more reliable for navigation than an odor gradient. When there’s no scent, the animal swims back and forth across the current. When it detects a food odor, it swims against the current. If it loses the odor, it will return to back and forth swimming. In this navigation type, entirely different senses drive the “what” and “where” paths.
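The cast-and-surge strategy above can be written as a tiny state machine. The state encoding and names here are illustrative assumptions, not the essay's actual simulation code:

```python
def navigate(odor_present, state):
    # One step of the cast/surge controller. state is ('cast', direction)
    # or ('surge',); returns (heading, new_state), with the heading
    # expressed relative to the current.
    if odor_present:
        return 'upstream', ('surge',)       # odor detected: surge upstream
    if state[0] == 'surge':
        state = ('cast', 'left')            # odor lost: resume casting
    _, direction = state
    new_dir = 'right' if direction == 'left' else 'left'
    return direction, ('cast', new_dir)     # sweep back and forth across the current
```

The "what" decision (is the odor present?) selects the mode; the "where" navigation uses flow direction, not the odor itself.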

Foraging and give-up time

Giving up is an essential part of goal-directed behavior. If an animal cannot ever give up, it will be stuck on the goal without escaping. In the context of foraging, the give-up time is optimized with the marginal value theorem [Charnov 1976], suggesting that an animal should move to another patch when its current reward-gaining rate drops below the average rate for the environment. Animal behavior researchers like [Kacelnik and Brunner 2002] have observed animals roughly following this theorem, although using simpler heuristics.
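As a toy illustration of the marginal value theorem, assume a diminishing-returns gain curve of 1 - exp(-t) for the patch (my assumption, purely for illustration). The give-up rule compares the patch's instantaneous rate against the environment's average rate:

```python
import math

def instantaneous_rate(t):
    # Derivative of the assumed cumulative gain 1 - exp(-t): the reward
    # rate in the patch decays as the patch depletes.
    return math.exp(-t)

def should_leave(t, environment_avg_rate):
    # MVT rule: leave when the patch's current rate drops below the
    # environment's average reward rate.
    return instantaneous_rate(t) < environment_avg_rate
```

With an environment average rate of 0.2, the forager stays early on and leaves once exp(-t) falls below 0.2, around t = 1.6.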

In more complex animals, the failure to give up can be pathological, such as psychological perseveration.

Odor-following state diagram including give-up timer.
Foraging state diagram illustrating the give-up timer

The give-up circuit needs some kind of internal timer or cost integrator, and a way to cancel the task. In this essay’s model, the lateral habenula (Hb.l) computes the give-up time or integrates the cost, and it cancels the task by suppressing the locomotive signal through Snc.

Habenula as a give-up circuit

Hb.l is positioned to act as a give-up circuit. It receives cost signals as non-rewarded bouts or as aversive events. [Stephenson-Jones et al. 2016] interprets the Hb.l input, P.hb (habenular-projecting pallidum), as evaluating action outcomes. Hb.l can suppress both the midbrain dopamine and midbrain serotonin areas. In learned helplessness situations or depression, Hb.l is hyperactive [Webster et al. 2020], causing reduced motor activity.

Habenula circuit as a give-up mode in a locomotive circuit.

[Hikosaka 2010] suggests the habenula’s role is suppressing motor activity under aversive conditions, a role evolved from its close relationship to the pineal gland’s circadian scheduling.

In a review article, [Hu 2020] discusses the suppressive effects of the habenula, also remarking on its role in reward-prediction error, noting in particular that the H.l (lateral hypothalamus) to Hb.l projection is aversive. The article also notes that Hb.l knock-out abolishes the error signal from reward omission, but not the error signal from aversive events (shock or obstacles).

Once the threshold is crossed, the Hb.l to Snc signal produces behavioral avoidance, reduced effort, and depressive-like behavior from learned helplessness. Hb.l is the only brain area consistently hyperactive in animal models of depression.

Note, since this essay’s simulation is a non-learning behavioral model, the only “prediction” possible is an evolutionary intrinsically-attractive odor, and the only role for an error is giving up the current behavior. Here, I’m interpreting the H.l to Hb.l signal as a cost signal, integrated by Hb.l, that gives up when it crosses a threshold.
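Here's a minimal sketch of that interpretation. The threshold, leak rate, and reset-on-reward are all modeling assumptions of mine: Hb.l as a leaky integrator of the H.l cost signal that suppresses Snc when the accumulated cost crosses a threshold.

```python
class GiveUpIntegrator:
    # Hb.l sketched as a leaky cost integrator (threshold and leak are
    # assumed constants, not values from the essay).
    def __init__(self, threshold=10.0, leak=0.95):
        self.threshold = threshold
        self.leak = leak          # cost decays slowly between cost events
        self.cost = 0.0

    def step(self, cost_signal, reward_signal):
        self.cost = self.leak * self.cost + cost_signal
        if reward_signal > 0:
            self.cost = 0.0       # assumed: reward resets the accumulated cost
        # True means suppress the locomotive signal through Snc (give up)
        return self.cost >= self.threshold
```

With a unit cost per unrewarded step, this integrator crosses the threshold after 14 steps; any intervening reward restarts the count.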

Vertebrate reference

For reference, here’s a functional model of the vertebrate brain.

Functional model of vertebrate brain.

The areas in this model cluster around the hindbrain isthmus divider. B.rs are hindbrain neurons near the isthmus. M.lr (M.ppn) are midbrain neurons that migrate from the hindbrain (r1) to the midbrain. Snc is the midbrain tegmental area (the V – value area), near the isthmus and contiguous with M.ppn. Similarly, the H.l area that projects to Snc is contiguous with it. The habenula is the most distant area, located above the thalamus near the pineal gland (omitted from the diagram for simplicity, but associated with the pallidum areas). So, the areas discussed here are a small part of the entire brain, but interestingly clustered around the isthmus divider near the cerebellum.

Minimal viable straw man

I think it’s important to remember that the essay simulations are an engineering project, not a scientific one. One difference is that the simulations necessarily require decisions beyond science. Another difference is that the project needs a simple core that may not correspond to any evolutionary animal. For example, even simple animals have some rudimentary vision, if only two or three pigment spots. They also have learning centers like the mushroom body, and they manage internal biological issues like coordinating breathing and blood pressure with motion.

This model in particular is more of a straw man or minimal viable product than an actual proposal for an ancestral proto-vertebrate mind: a target that might give a base framework to criticize or build on.

Alternative olfactory paths

Another potential “what” path for innate behavior goes through the medial habenula, which is responsive to odors and produces place avoidance [Amo et al. 2014], but [Chen et al. 2019] suggests it also supports attraction for food odors.

Olfactory innate path through habenula. Key: A.co cortical amygdala, H.l lateral hypothalamus, Hb habenula (medial and lateral), IPN interpeduncular nucleus, M.pag periaqueductal gray, Ob olfactory bulb.

In mammals, the olfactory path to H.l goes through the cortical amygdala (A.co) [Cádiz-Moretti et al. 2017]. While this essay deliberately omits the cortex, in the lamprey the olfactory path goes through the lateral pallium (LPa, corresponding to mammalian O.pir piriform cortex) to the posterior tuberculum (Snc in mammals).

For this essay, I’ve picked the Ob to Snc path instead of the alternatives for simplicity. The habenula path is very tempting, but it would require exploring the IPN and serotonin (5HT) paths to the MLR, which is more complicated than a “what” path through H.l.

Subthalamic nucleus as give-up circuit

The subthalamic nucleus (H.stn) is associated with a “stop” action, stopping downstream motor actions, either because of a new, surprising stimulus or from higher-level commands. Since a give-up signal stops the seek goal, the stop action from H.stn might play a part in the give-up control.

H.stn stop is in parallel to habenular give-up. Key: H.l lateral hypothalamus, H.stn subthalamic nucleus, Hb.l lateral habenula, M.lr midbrain locomotor region, Snc substantia nigra pars compacta, Snr substantia nigra pars reticulata.

H.stn is believed to have a role in patience in decision making [Frank 2006] and in encoding reward and cost [Zénon et al. 2016], which is very similar to the role of the habenula, and H.stn projects to Hb.l via P.hb, the habenula-projecting pallidum.

However, H.stn’s patience is more about holding off (stopping) an action before making a decision, related to impulsiveness, while the give-up circuit is more about persistence, continuing an action. So, while the two capabilities are related, they’re different functions. Since the current essay simulation does not have patience-related behavioral arrest but does need a give-up time, the habenula seems a better fit.

Serotonin inhibition path

In zebrafish, the habenula inhibits the dorsal raphe (V.dr, serotonin neurons) but not Snc or dopamine [Okamoto et al. 2021]. The inhibition works through V.dr to the Snc/posterior tuberculum to the locomotive regions.

As with the alternative olfactory paths, this serotonin inhibition path may be more evolutionarily primitive, but it would add complexity to the essay’s model, so it will be held off for later exploration.

Conclusions

As mentioned above, this model is intended as a basis for the current essay’s simulation, and as a straw man to focus alternatives, to see if there might be a better minimal model.

References

Amo, Ryunosuke, et al. “The habenulo-raphe serotonergic circuit encodes an aversive expectation value essential for adaptive active avoidance of danger.” Neuron 84.5 (2014): 1034-1048.

Beauséjour PA, Zielinski B, Dubuc R. Olfactory-induced locomotion in lampreys. Cell Tissue Res. 2022 Jan

Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. Cambridge, MA: MIT Press. “Vehicles – the MIT Press”

Cádiz-Moretti B, Abellán-Álvaro M, Pardo-Bellver C, Martínez-García F, Lanuza E. Afferent and efferent projections of the anterior cortical amygdaloid nucleus in the mouse. J Comp Neurol. 2017 

Charnov, Eric L. “Optimal foraging, the marginal value theorem.” Theoretical population biology 9.2 (1976): 129-136.

Chen, Wei-yu, et al. “Role of olfactorily responsive neurons in the right dorsal habenula–ventral interpeduncular nucleus pathway in food-seeking behaviors of larval zebrafish.” Neuroscience 404 (2019): 259-267.

Cohn R, Morantte I, Ruta V. Coordinated and Compartmentalized Neuromodulation Shapes Sensory Processing in Drosophila. Cell. 2015 Dec 17

Derjean D, Moussaddy A, Atallah E, St-Pierre M, Auclair F, Chang S, Ren X, Zielinski B, Dubuc R. A novel neural substrate for the transformation of olfactory inputs into motor output. PLoS Biol. 2010 Dec 21

Frank, Michael J. “Hold your horses: a dynamic computational role for the subthalamic nucleus in decision making.” Neural networks 19.8 (2006): 1120-1136.

Hikosaka, Okihide. The habenula: from stress evasion to value-based decision-making. Nature reviews neuroscience 11.7 (2010): 503-513.

Hu, Hailan, Yihui Cui, and Yan Yang. “Circuits and functions of the lateral habenula in health and in disease.” Nature Reviews Neuroscience 21.5 (2020): 277-295.

Isa, Tadashi, et al. “The tectum/superior colliculus as the vertebrate solution for spatial sensory integration and action.” Current Biology 31.11 (2021)

Kacelnik, Alex, and Dani Brunner. “Timing and foraging: Gibbon’s scalar expectancy theory and optimal patch exploitation.” Learning and Motivation 33.1 (2002): 177-195.

Okamoto H, Cherng BW, Nakajo H, Chou MY, Kinoshita M. Habenula as the experience-dependent controlling switchboard of behavior and attention in social conflict and learning. Curr Opin Neurobiol. 2021 Jun;68:36-43. doi: 10.1016/j.conb.2020.12.005. Epub 2021 Jan 6. PMID: 33421772.

Ryczko D, Cone JJ, Alpert MH, Goetz L, Auclair F, Dubé C, Parent M, Roitman MF, Alford S, Dubuc R. A descending dopamine pathway conserved from basal vertebrates to mammals. Proc Natl Acad Sci U S A. 2016 Apr 26

Steele TJ, Lanz AJ, Nagel KI. Olfactory navigation in arthropods. J Comp Physiol A Neuroethol Sens Neural Behav Physiol. 2023

Stephenson-Jones M, Floros O, Robertson B, Grillner S. Evolutionary conservation of the habenular nuclei and their circuitry controlling the dopamine and 5-hydroxytryptophan (5-HT) systems. Proc Natl Acad Sci U S A. 2012 Jan 17;109(3)

Stephenson-Jones M, Yu K, Ahrens S, Tucciarone JM, van Huijstee AN, Mejia LA, Penzo MA, Tai LH, Wilbrecht L, Li B. A basal ganglia circuit for evaluating action outcomes. Nature. 2016 Nov 10

Suryanarayana SM, Pérez-Fernández J, Robertson B, Grillner S. Olfaction in Lamprey Pallium Revisited-Dual Projections of Mitral and Tufted Cells. Cell Rep. 2021 Jan 5

Verschure PF, Pennartz CM, Pezzulo G. The why, what, where, when and how of goal-directed choice: neuronal and computational principles. Philos Trans R Soc Lond B Biol Sci. 2014 Nov 5

Webster JF, Vroman R, Balueva K, Wulff P, Sakata S, Wozny C. Disentangling neuronal inhibition and inhibitory pathways in the lateral habenula. Sci Rep. 2020 May 22

Zénon A, Duclos Y, Carron R, Witjas T, Baunez C, Régis J, Azulay JP, Brown P, Eusebio A. The human subthalamic nucleus encodes the subjective value of reward and the cost of effort during decision-making. Brain. 2016 Jun;139(Pt 6):1830-43.

16: Give-up Time in Foraging

The essay 16 simulation is a foraging slug that follows odors to food, and it must give up on an odor when the odor plume doesn’t lead to food. Foraging researchers treat the give-up time as a measurable value in optimal foraging, in the context of the marginal value theorem (MVT), which tells when an animal should give up [Charnov 1976]. This post is a somewhat disorganized collection of issues related to implementing the internal state needed for give-up time.

Giving up on an odor

The odor-following task finds food by following a promising odor. A naive implementation with a Braitenberg vehicle circuit [Braitenberg 1984], as early evolution might have tried, has the fatal flaw that the animal can’t give up on an odor. The circuit always approaches the odor.

Braitenberg vehicles for approach and escape
Braitenberg vehicles for approach and avoid.

Since early evolution requires simplicity, a simple solution is adding a timer, possibly habituation but possibly a non-habituation timer. For example, synaptic LTD (long-term depression) might cause the circuit to ignore the sensor after some time. Or an explicit timer might trigger an inhibition state.

Odor-following state diagram including give-up timer.
State diagram for the odor-following task with give-up timer. Blue is stateful; beige is stateless.

In the diagram, the beige nodes are stateless stimulus-response transitions. The blue area is internal state required to implement the timers. This post is loosely centered around exploring the state for give-up timing.
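A minimal sketch of that internal state (the tick count and state encoding are illustrative assumptions): the give-up timer is the only stateful element, and everything else is a stimulus-response transition.

```python
def odor_follow_step(odor_sensed, timer, give_up_ticks=100):
    # One transition of the odor-following task. The timer is the only
    # internal state; give_up_ticks is an assumed timeout.
    # Returns (action, new_timer).
    if not odor_sensed:
        return 'search', 0            # no odor: reset timer, random search
    if timer >= give_up_ticks:
        return 'give_up', timer       # timeout: ignore this odor
    return 'approach', timer + 1      # follow the odor, keep counting
```

The stateless beige transitions are the if-branches; the blue state is the single integer threaded through the calls.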

Fruit fly mushroom body neurons

Consider a sub-circuit of the mushroom body, focusing on the Kenyon cell (KC) to mushroom body output neuron (MBON) synapses and the dopamine neuron (DAN) that modulates them. For simplicity, I’m ignoring the KC fanout/fanin and the habituation layer between odor sensors and the KC, as if the animal were an ancestral Precambrian animal.

Give-up timing might be implemented either in the synapses in blue between the KC and MBON, or potentially in circuits feeding into the DAN. The blue synapses can depress over time (LTD) when receiving odor input [Berry et al. 2012], with a time interval on the order of 10-20 minutes. Alternatively, the timeout might occur in circuitry before the DAN and use dopamine to signal giving up.
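The first option can be sketched as a depressing synapse. The 15-minute time constant and the approach threshold are my assumptions, loosely matching the 10-20 minute LTD interval: the KC-to-MBON weight decays under sustained odor drive, so the MBON's approach output eventually fades below threshold.

```python
import math

def mbon_drive(t_minutes, tau=15.0, w0=1.0):
    # KC->MBON synaptic weight depressing exponentially under sustained
    # odor input; tau ~ 15 minutes is an assumed LTD time constant.
    return w0 * math.exp(-t_minutes / tau)

def still_approaching(t_minutes, threshold=0.25):
    # The animal keeps approaching while the MBON drive exceeds an
    # assumed behavioral threshold; below it, the animal gives up.
    return mbon_drive(t_minutes) > threshold
```

Under these assumed constants, the animal approaches for roughly 20 minutes before the depressed synapse lets it give up.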

In mammals, the second option involving a dopamine spike might signal a give-up time. Although the reward-prediction error (RPE) in the Vta (ventral tegmental area) is typically interpreted as a reinforcement-learning signal, it could also signal a give-up time.

Mammalian analogy

In mammals, a give-up signal might be a combination of some or all of several neurotransmitters: dopamine (DA), serotonin (5HT), acetylcholine (ACh), and possibly norepinephrine (NE).

Dopamine has a characteristic phasic dip when the animal decides no reward will come. Many researchers consider this no-reward dip to be a reward-prediction error (RPE) in the sense of reinforcement learning [Schultz 1997].

One of serotonin’s many functions appears patience-related [Lottem et al. 2018], [Miyazaki et al. 2014]. Serotonin ramps while the animal is persevering at the task and rapidly drops when the animal gives up. Serotonin is also required for reversal learning, although this may be unrelated.

Acetylcholine (ACh) is required for task switching. Since giving-up is a component of task switching, ACh likely plays some role in the circuit.

[Aston-Jones and Cohen 2005] suggest a related role for norepinephrine in patience, impatience, and decision making.

On the one hand, having essentially all the important modulatory neurotransmitters involved in this problem doesn’t give a simple answer. On the other hand, the involvement of all of them in give-up timing may be an indication of how much neural circuitry is devoted to this problem.

Mammalian RPE circuitry

The following is a partial(!) diagram of the mammalian patience/failure learning circuit, assuming the RPE signal detected in DA/Vta is related to give-up time. The skeleton of the circuit is highly conserved: almost all of it exists in all vertebrates, with the possible exception of the cortical areas F.vm (ventromedial prefrontal cortex) and C.sma (supplementary motor area). For simplicity, the diagram doesn’t include the ACh (V.ldt/V.ppt) and NE (V.lc) circuits. The circuit’s center is the lateral habenula, which is associated with a non-reward failure signal.

Partial reward-error circuit in the mammalian brain.

Key: T.pf (parafascicular thalamus), T.pv (paraventricular thalamus), C.sma (supplementary motor area cortex), F.vm (ventromedial prefrontal cortex), A.bl (basolateral amygdala), E.hc (hippocampus), Ob (olfactory bulb), O.pir (piriform cortex), S.v (ventral striatum/nucleus accumbens), S.a (central amygdala), S.ls (lateral septum), S.ot (olfactory tubercle), P.hb (habenula-projecting pallidum), P.a (bed nucleus of the stria terminalis), Hb.l (lateral habenula), H.l (lateral hypothalamus), H.stn (sub thalamic nucleus), Poa (preoptic area), Vta (ventral tegmental area), V.dr (dorsal raphe), DA (dopamine), 5HT (serotonin). Blue – limbic, Red – striatal/pallidal. Beige – cortical. Green – thalamus.

Some observations, without going into too much detail. First, the hypothalamus and preoptic area are heavily involved in the circuit, which suggests their centrality and possibly primitive origin. Second, in mammals the patience/give-up circuit has access to many sophisticated timing and accumulator circuits, including C.sma and F.ofc (orbital frontal cortex), as well as value estimators like A.bl and context from episodic memory in E.hc (hippocampus). Third, essentially all of the limbic system projects to Hb.l (lateral habenula), a key node in the circuit.

Although the olfactory path (Ob to O.pir to S.ot to P.hb to Hb.l) is the most directly comparable to the fruit fly mushroom body, it’s almost certainly convergent evolution instead of a direct relation.

The most important point of this diagram is to show that mammalian give-up timing and RPE are so much more complex than the fruit fly’s that results from mammalian studies don’t give much information about the fruit fly, although the reverse is certainly possible.

Reward prediction error (RPE)

Reward prediction error (RPE) itself is technically just an encoding of a reward result. A reward signal could represent the reward directly or as a difference from a reference reward, for example the average reward. Computational reinforcement learning (RL) calls this signal RPE because RL focuses on the prediction, not the signal. But an alternative perspective from the marginal value theorem (MVT) of foraging theory [Charnov 1976] suggests the animal uses the RPE signal to decide when to give up.

The MVT suggests that an animal should give up on a patch when the current reward rate is lower than the average reward rate in the environment. If the RPE’s comparison reward is the average reward, then a positive RPE suggests the animal should stay in the current patch, and a negative RPE says the animal should consider giving up.
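A small sketch of that reading (the smoothing factor is an assumption): if the reference reward is a running average of past reward, then the same signal is both an RPE and an MVT leave signal.

```python
class RpeGiveUp:
    # RPE computed against a running average of reward (alpha is an
    # assumed smoothing factor).
    def __init__(self, alpha=0.1):
        self.avg = 0.0
        self.alpha = alpha

    def step(self, reward):
        rpe = reward - self.avg        # difference from the reference reward
        self.avg += self.alpha * rpe   # update the running average
        return rpe                     # rpe < 0: consider giving up (MVT)
```

A positive return says the patch is beating the average (stay); a negative return says it's underperforming (consider leaving).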

In mammals, [Montague et al. 1996] propose that RPE is used like computational reinforcement learning, specifically temporal difference (TD) learning, partly because they argue that TD can handle interval timing, which is related to the give-up time that I need. However, TD’s timing representation requires a big increase in complexity.

Computational models

To see where the complexity of time comes from, let’s step back and consider the computational models used by both RL and the Turing machine. While the Turing machine might seem too formal here, I think it’s useful to explore using a formal model for practical designs.

Reinforcement learning machine and Turing machine abstract diagram.

Both models above abstract the program into stateless transition tables. RL uses an intermediate value function followed by a policy table [Sutton and Barto 2018]. The Turing machine’s state is in the current state variable (basically an integer) and the infinite tape. RL exports its entire state to the environment, making no distinction between internal state like a give-up timer and the external environment. Note the strong contrast with a neural model, where every synapse can hold short-term or long-term state.
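As a sketch of that tabular structure (the table contents are invented for illustration): the RL machine's "program" is just two lookup tables, and the agent itself holds nothing between steps.

```python
# Static tabular "program": a value function table and a policy table
# (entries invented for illustration).
VALUE = {'patch_rich': 1.0, 'patch_poor': 0.1}
POLICY = {'patch_rich': 'stay', 'patch_poor': 'leave'}

def agent_step(observation):
    # Stateless: the output depends only on the current observation,
    # so any timer or memory must live in the environment.
    return POLICY[observation], VALUE[observation]
```

Anything like a give-up timer has to be pushed into the observation itself, which is exactly where the state explosion below comes from.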

Unlike the Turing machine, the RL machine diagram is a simplification because researchers do explore beyond the static tabular model, such as using deep-learning representations for the functions. The TD algorithm itself doesn’t follow the model strictly because it updates the value and policy tables dynamically, which can create memory-like effects early in training.

The larger issue in this post’s topic is the representation of time. Both reinforcement learning and the Turing machine represent time as state transitions with a discrete ticking time clock. An interval timer or give-up timer is represented by states for each tick in the countdown.

State machine timeout

The give-up timeout illustrates the difference between neural circuits and state machines. In neural circuits, a single synapse can support a timeout using LTD (or STD), with biochemical processes decreasing synapse strength over time. In the fruit fly KC to MBON synapse, the timescale is on the order of minutes (“interval” timing), but neural timers can implement many timescales, from fractions of a second to hours and a full day (circadian).

State machines can implement timeouts as states and state transitions. Since state machines are clock based (tick-based), each transition occurs on a discrete, integral tick. For example, a timeout might look like the following diagram:

Portion of state machine for timeout.

This state isn’t a counter variable; it’s a tiny part of a state machine transition table. State machine complexity explodes with each added capability. If this timeout part of the state machine is 4 bits representing 9 states, and another mostly-independent part of the state machine has another 4 bits with 10 states, the total state machine would need 8 bits with 90-ish states, depending on the interactions between the two components, because a state machine is one big table. So, while a Turing machine can theoretically implement any computation, in practice only relatively small state machines are usable.
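The multiplication is easy to check, using the 9- and 10-state components from the text:

```python
from itertools import product

# A 9-state timeout component and a mostly-independent 10-state component.
timeout_states = [f"t{i}" for i in range(9)]
other_states = [f"s{i}" for i in range(10)]

# Composed as one flat transition table, the component states multiply,
# giving ~90 combined states before counting any interaction-specific ones.
combined = list(product(timeout_states, other_states))
```

A counter variable would only add bits; the flat table multiplies states, which is why each added capability explodes the table.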

Searle’s Chinese room

The tabular nature of state machines raises the philosophical thought experiment of Searle’s Chinese room, as an argument against computer understanding.

Searle’s Chinese room is a philosophical argument against any computational implementation of meaningful cognition. Searle imagines a person who doesn’t understand Chinese in a huge library with lookup books containing every response to every possible Chinese conversation. When the person receives a message, they find the corresponding phrase in one of the books and write the proper response. So, the person in the Chinese room holds a conversation in Chinese without understanding a single word.

For clarity, the room’s lookup function covers the entire conversation up to the last sentence, not just a sentence-to-sentence lookup. Essentially, it’s like the input to the attention/transformer deep learning architecture used in something like ChatGPT (with the difference that ChatGPT is non-tabular). Because the input includes the conversational context, it can handle contextual continuity in the conversation.

The intuition behind the Chinese room is interesting because it’s an intuition against tabular state-transition systems like state machines, the Turing machine, and the reinforcement learning machine above. Searle’s intuition is basically: since computer systems are all Turing computable, and Turing machines are tabular, but tabular lookup is an absurd notion of understanding Chinese (the table intuition), therefore computer systems can never understand conversation. “The same arguments [Chinese room] would apply to … any Turing machine simulation of human mental processes.” [Searle 1980].

Temporal difference learning

TD learning can represent timeouts, as used in [Montague et al. 1996] to argue for TD as a model of the striatum, but this model doesn’t work at all for the fruit fly, because each time step represents a new state and therefore needs a new parameter for the value function. Since the fruit fly mushroom body only has 24 neurons, it’s implausible for each neuron to represent a new time step. Since the mammalian striatum is much larger (millions of neurons), it can encode many more values, but the low information rate from RPE (less than 1 Hz) makes learning difficult.
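A sketch of why the parameter cost grows with time, in the style of the complete-serial-compound representation of [Montague et al. 1996] but with my own simplified parameters: each tick after the cue gets its own state and therefore its own learned weight, so a timeout of N ticks costs N parameters.

```python
def td_train(reward_time, n_steps, trials=200, alpha=0.1, gamma=1.0):
    # One weight (value estimate) per time step after the cue: this is
    # the per-tick parameter cost the text describes.
    w = [0.0] * n_steps
    for _ in range(trials):
        for t in range(n_steps):
            r = 1.0 if t == reward_time else 0.0
            v_next = w[t + 1] if t + 1 < n_steps else 0.0
            delta = r + gamma * v_next - w[t]   # TD error at step t
            w[t] += alpha * delta
    return w
```

After training, the weights ramp up toward the reward time, but representing a longer interval requires proportionally more weights, which is the scaling problem for a 24-neuron circuit.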

These difficulties don’t mean TD is entirely wrong, or that some ideas from TD don’t apply to the striatum, but it does mean that a naive model of TD of the striatum might have trouble working at any significant scale.

State in VLSI circuits

Although possibly a digression, I think it’s interesting to compare state in VLSI circuits (microprocessors) to state in neurons and in the reinforcement learning and Turing machines. In some ways, state in VLSI resembles neurons more than it does the formal computing models.

VLSI logic and state. Blue is state (latches) and beige is logic.

The φ1 and φ2 are clock signals needed together with the latch state to make the system work. The clocks and latches act like gates in an airlock or a water lock on a river. In a water lock, only one gate is open at a time to prevent the water from rushing through. In the VLSI circuit, only one latch phase is active at a time to keep the logic bits from mixing together. Some neuroscience proposals like [Hasselmo and Eichenbaum 2005] have a similar architecture for the hippocampus (E.hc) for similar reasons (keeping memory retrieval from mixing with memory encoding).

In a synapse, the slower signals like NMDA, plateau potentials, and modulating neurotransmitters and neuropeptides have latch-like properties, because their activation is slower, more integrative, and more stable compared to the fast neurotransmitters. In that sense, the slower transmission is a state element (or a memory element). If memory is a hierarchy of increasing durations, these slower signals are at the bottom, but they are nevertheless a form of memory.

The point of this digression is to illustrate that the formal machines’ state model is unusual and possibly unnatural, even for describing electronic circuits. That’s not to say those models are useless. In fact, they’re very helpful as mental models at smaller scales, but in larger implementations the complexity of the necessary state machines limits their value as an architectural model.

Conclusion

This post is mostly a collection of attempts to understand why the RPE model bothers me as unworkable, not a complete argument. As mentioned above, I have no issue with a reward signal relative to a predicted reward, or with using that differential signal for memory and learning. Both seem quite plausible. What doesn’t work for me is the jump to particular reinforcement learning models like temporal difference, adding external signals like SCS, without taking into account the complexities and difficulties of truly implementing reinforcement learning. This post tries to explain some of the reasons for that skepticism.

References

Aston-Jones, Gary, and Jonathan D. Cohen. An integrative theory of locus coeruleus-norepinephrine function: adaptive gain and optimal performance. Annu. Rev. Neurosci. 28 (2005): 403-450.

Berry, Jacob A., et al. Dopamine is required for learning and forgetting in Drosophila. Neuron 74.3 (2012): 530-542.

Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. Cambridge, MA: MIT Press. “Vehicles – the MIT Press”

Charnov, Eric L. “Optimal foraging, the marginal value theorem.” Theoretical population biology 9.2 (1976): 129-136.

Hasselmo ME, Eichenbaum H. Hippocampal mechanisms for the context-dependent retrieval of episodes. Neural Netw. 2005 Nov;18(9):1172-90.

Lottem, Eran, et al. Activation of serotonin neurons promotes active persistence in a probabilistic foraging task. Nature communications 9.1 (2018): 1000.

Miyazaki, Kayoko W., et al. Optogenetic activation of dorsal raphe serotonin neurons enhances patience for future rewards. Current Biology 24.17 (2014): 2033-2040.

Montague, P. Read, Peter Dayan, and Terrence J. Sejnowski. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of neuroscience 16.5 (1996): 1936-1947.

Schultz, Wolfram. Dopamine neurons and their role in reward mechanisms. Current opinion in neurobiology 7.2 (1997): 191-197.

Searle, John (1980), Minds, Brains and Programs, Behavioral and Brain Sciences, 3 (3): 417–457,

Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.
