Attempting a toy model of vertebrate understanding


Essay 33: Klinotaxis

When seeking an odor, vertebrate swimming undulates left and right, naturally moving the nose perpendicular to the body motion. This lateral motion can help navigation if odor sampling can be coordinated with the movement, enabling a spatiotemporal gradient calculation along the path of the nose movement. This lateral sampling over time is called klinotaxis (“leaning navigation”) or weathervaning.

Essay 24 and essay 25 explored head-direction navigation as inspired by the fruit fly Drosophila fan-shaped body and ellipsoid body. The idea was to use head direction to translate egocentric movement into an allocentric memory of past samples, independent of the current body direction. In contrast, klinotaxis uses an egocentric system, where the lateral motion is relative to the current direction, rather than an independent compass- or map-like system.

Klinotaxis in Drosophila larva and C. elegans

Klinotaxis has been studied largely in the fruit fly Drosophila larva and the roundworm C. elegans. Drosophila larvae have a distinct “cast” movement, where they pause and wave their heads side to side, either once (1-cast) or multiple times (n-cast) [Zhao et al 2017]. Larval movements break down into five major types [Gomez-Marin and Louis 2014]:

  • Forward
  • Backward
  • Stop
  • Turn
  • Cast

C. elegans has two major seek movements: pirouettes and weathervaning [Lockery 2011]. A pirouette is a U-turn made when the animal is moving away from the odor. Weathervaning is a side-to-side head movement that steers gradual turning.

Both strategies are temporal gradient systems, requiring measurements at different times and a memory of the older measurement [Chen X and Engert 2014]. Klinotaxis requires a basic form of memory [Karpenko et al 2020], but the comparison can be a simple ON or OFF result [Lockery 2011]. Pirouettes use a gradient parallel to body motion and reverse direction when the animal is moving away from the odor [Iino and Yoshida 2009]. Weathervaning uses a gradient perpendicular to body motion, measured with a lateral head movement [Lockery 2011].

Klinotaxis contrasts with bilateral spatial navigation, which compares two lateral sensors [Chen X and Engert 2014] such as paired eyes, ears, or nostrils. In the Drosophila larva, odor-driven turning tracks the lateral gradient more closely than the parallel gradient [Martinez 2014]. Larval odor navigation is not simply bilateral, because disabling the O.sn (olfactory sensory neurons) on one side only minimally impairs navigation [Gomez-Marin and Louis 2014].

As a slight digression, let’s return to adult Drosophila navigation, because its structure is a useful analogy for understanding vertebrate klinotaxis, even though the fly uses an allocentric system.

Adult Drosophila FSB

Below is a rough sketch of the Drosophila navigation circuit, focused on the fan-shaped body [Hulse et al 2021]. The ellipsoid body (EB) and protocerebral bridge (PB) calculate head direction and sort it into 18 columns. This head direction is allocentric, independent of the animal’s current direction, like a compass direction or a map. Input from odor areas such as the mushroom body (MB) and lateral horn (LH) is organized into 9 rows. The fan-shaped body combines these 18 head direction columns and 9 sense data rows into a memory table.

Drosophila navigation
Drosophila navigation, focusing on head direction from PB, odor data from MB and LH, and allocentric table of FB. EB ellipsoid body, FB fan-shaped body, LH lateral horn, MB mushroom body.

Motor navigation reads out from the fan-shaped body table. These motor commands include left and right turns, but also a separate U-turn command [Westeinde et al 2024]. Although this allocentric navigation system differs from egocentric klinotaxis, its motor output includes both the left-versus-right turning of weathervaning and the U-turn of the pirouette.

The previous attempts in essay 24 and essay 25 followed this model. As the animal moved through space, the model saved the forward odor gradient under the current head direction. By comparing the stored values for other head directions, the animal could improve its heading toward the direction with the strongest odor.

The fan-shaped body then becomes a record of samples from all the older directions that the animal had measured. Output is then calculated for left (PFL3L), right (PFL3R), and U-turn (PFL2) signals [Westeinde et al 2024]. The current head direction is represented as a sinusoidal neural pattern and combined with the stored values to produce an output.

This system was only partially successful in the essay. Although it was an improvement over having no memory, the table was always out of date because the animal was continually moving through space. Even when table entries timed out to represent their loss of accuracy as the animal moved, the rapid obsolescence made navigation difficult, particularly as the animal neared the target.
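To make the obsolescence problem concrete, here is a minimal Python sketch of a per-direction gradient table with a timeout. It assumes 8 allocentric bins (the essay 25 count), headings in radians with 0 pointing east, and a hypothetical timeout in seconds; `GradientTable`, `direction_bin`, and `best_heading` are illustrative names, not the essay’s simulation code.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

N_BINS = 8  # assumption: 8 allocentric direction bins, as in essay 25


def direction_bin(heading: float, n_bins: int = N_BINS) -> int:
    """Map an allocentric heading (radians) to the nearest of n_bins sectors."""
    sector = 2 * math.pi / n_bins
    return int(((heading % (2 * math.pi)) + sector / 2) // sector) % n_bins


@dataclass
class GradientTable:
    """Per-direction memory of the forward gradient, with entries that expire."""
    timeout: float  # seconds before a stored sample counts as obsolete
    values: dict = field(default_factory=dict)  # bin -> (gradient, time stored)

    def store(self, heading: float, gradient: float, now: float) -> None:
        self.values[direction_bin(heading)] = (gradient, now)

    def best_heading(self, now: float) -> Optional[float]:
        """Center of the bin with the strongest unexpired gradient, or None if all expired."""
        fresh = {b: g for b, (g, t) in self.values.items() if now - t <= self.timeout}
        if not fresh:
            return None
        best = max(fresh, key=fresh.get)
        return best * 2 * math.pi / N_BINS


table = GradientTable(timeout=5.0)
table.store(heading=0.0, gradient=0.2, now=0.0)          # east: weak gradient
table.store(heading=math.pi / 2, gradient=0.8, now=1.0)  # north: strong gradient
print(table.best_heading(now=2.0))  # heading of the north bin
print(table.best_heading(now=7.0))  # None: both samples have timed out
```

Each entry was sampled at a different place, so once the animal has moved, the "best" bin may no longer point toward the odor; the timeout limits how stale an entry can get but cannot make it current.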

So, this essay simplifies the circuit and lowers the ambition. Instead of trying to record every direction and keep a perfect allocentric compass direction, the animal could simply save its left and right oscillation as it swims naturally.

Vertebrate Hb.m and R.ip

The vertebrate Hb.m (medial habenula) to R.ip (interpeduncular nucleus) pathway is used for phototaxis [Chen X and Engert 2014], chemotaxis [Chen WY et al 2019] and thermotaxis [Palieri et al 2024]. In a clever experiment creating a virtual light circle, Chen and Engert show that zebrafish phototaxis is not simply a comparison of light between the eyes for a spatial gradient (tropotaxis) but a temporally based gradient (klinotaxis), relying on a short-term memory of the previous light level. This phototaxis uses the Hb.m to R.ip circuit [Chen X and Engert 2014].

Vertebrate olfactory klinotaxis circuit. Ob (olfactory bulb), Hb.m (medial habenula), P.ldt (laterodorsal tegmental nucleus), R.dtg (dorsal tegmental nucleus of Gudden), R.ip (interpeduncular nucleus), R.rs (reticulospinal), V.mr (median raphe)

Head direction from R.dtg (dorsal tegmental nucleus of Gudden) tiles R.ip vertically [Petrucco et al 2023], while olfactory and light input is organized horizontally [Chen WY et al 2019], [Zaupa et al 2021]. After combining the odor with the head direction and comparing it with the stored values, R.ip sends motor commands to R.rs (reticulospinal) via P.ldt (laterodorsal tegmental nucleus) and V.mr (median raphe). The vertebrate R.ip resembles the Drosophila fan-shaped body, but instead of the fan-shaped body’s 18 head direction columns, R.ip has only 6 from R.dtg, three to a side [Petrucco et al 2023].

Essay 25 explored a model that placed Drosophila-style fan-shaped body allocentric navigation in R.ip, with limited success. Instead, this essay tries a different interpretation, where R.ip stores only the side-to-side weathervaning of the head while swimming, instead of a full 360-degree table as in Drosophila.

Vertebrate klinotaxis

As a different approach, suppose the head direction input to R.ip is not an allocentric, map-making coordinator as in adult Drosophila, but a simpler egocentric weathervaning or casting coordinator. It would store only the lateral gradient from the head-direction changes of natural swimming, or possibly from deliberate larger turns, like casting, that gather wider lateral gradient information.

Klinotaxis reduces the need for precise head direction. Instead of the 18 Drosophila head direction columns calibrated to the outside world, the model uses only three, two lateral and one central, which only require motor efference copies of left and right muscle turns. Studies of the zebrafish R.ip report three columns to a side, with a head-direction input that isn’t connected to the vestibular system [Petrucco et al 2023]. To me, this suggests that the head direction might not be an allocentric signal that requires precise direction, but a simple egocentric lateral measurement that doesn’t need vestibular information.

Vertebrate klinotaxis circuit. Hb.m (medial habenula), Ob (olfactory bulb), R.dtg (dorsal tegmental nucleus), R.ip (interpeduncular nucleus).

The above diagram illustrates the system. Olfactory samples arrive through Hb.m and head direction arrives from R.dtg. Like the Drosophila fan-shaped body, R.ip combines odor samples with lateral head movement into a simple memory table and reads out left and right motor commands. A similar system can save odor measurements parallel to body movement, using velocity instead of head direction, to trigger a U-turn when the animal is moving away from the odor.
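A minimal Python sketch of the three-bin lateral table follows. It assumes the efference copy can be reduced to one signed head offset (negative left, positive right), that each bin keeps only its most recent odor sample, and that steering is the simple ON/OFF comparison mentioned above. `Side`, `lateral_bin`, `LateralMemory`, and the 0.1 threshold are hypothetical names and values.

```python
import math
from enum import Enum
from typing import Optional


class Side(Enum):
    LEFT = "left"
    CENTER = "center"
    RIGHT = "right"


def lateral_bin(head_offset: float, threshold: float = 0.1) -> Side:
    """Classify the egocentric head offset, as an efference copy might report it.

    Assumption: negative offsets are to the left, positive to the right.
    """
    if head_offset < -threshold:
        return Side.LEFT
    if head_offset > threshold:
        return Side.RIGHT
    return Side.CENTER


class LateralMemory:
    """Three-bin table holding the most recent odor sample for each head position."""

    def __init__(self) -> None:
        self.samples: dict = {side: None for side in Side}

    def record(self, head_offset: float, odor: float) -> None:
        self.samples[lateral_bin(head_offset)] = odor

    def turn_command(self) -> Optional[Side]:
        """Binary ON/OFF steering toward the stronger lateral sample, or None."""
        left, right = self.samples[Side.LEFT], self.samples[Side.RIGHT]
        if left is None or right is None or left == right:
            return None  # no usable lateral gradient yet: keep swimming
        return Side.LEFT if left > right else Side.RIGHT


# Usage: the odor source lies to the right, so odor rises with rightward head offset.
memory = LateralMemory()
for step in range(8):
    offset = math.sin(step * math.pi / 2)  # one full undulation every 4 steps
    memory.record(offset, odor=1.0 + 0.5 * offset)
print(memory.turn_command())  # turns toward the right
```

No compass or vestibular input appears anywhere: the bins are defined only relative to the body, which is the point of the egocentric interpretation.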

Discussion

Compared to the parallel-gradient-only allocentric system of essay 25, this lateral navigation is far simpler and more effective. Even with only three bins, compared to the 8 bins in essay 25, lateral weathervaning turned out to be more effective and less brittle. If R.ip does implement a lateral klinotaxis system like this essay’s, it’s plausible that the 6 directions reported by [Petrucco et al 2023] are sufficient for accurate seek navigation. In contrast, those 6 directions seem insufficient for allocentric navigation compared to the 18 directions of Drosophila.

Interestingly, the pirouette is also highly effective, even without lateral klinotaxis. In the simulation, when the animal moves away from the odor source, it makes a U-turn. This behavior ratchets the animal closer and closer to the target: even when most of the movement is random, the pirouette locks in any improvement. The pirouette itself is also simple, requiring only two averages: a short average that tracks the odor across a single swim cycle and a long average over two swim cycles. When the short average has a stronger odor value than the long average, the animal is moving toward the odor.
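The two-average pirouette test can be sketched as follows. This assumes odor is sampled a fixed number of times per swim cycle and uses plain sliding-window means; the essay doesn’t specify the averaging method, so exponential averages would be an equally reasonable choice. `PirouetteDetector` and `samples_per_cycle` are hypothetical names.

```python
from collections import deque


class PirouetteDetector:
    """Compare a one-swim-cycle odor average with a two-swim-cycle average.

    Assumption: odor is sampled a fixed number of times per swim cycle.
    """

    def __init__(self, samples_per_cycle: int) -> None:
        self.short_len = samples_per_cycle
        self.history: deque = deque(maxlen=2 * samples_per_cycle)

    def update(self, odor: float) -> bool:
        """Add one odor sample; return True when a U-turn should be triggered."""
        self.history.append(odor)
        if len(self.history) < self.history.maxlen:
            return False  # the long average needs two full cycles of samples
        recent = list(self.history)[-self.short_len:]
        short_avg = sum(recent) / len(recent)
        long_avg = sum(self.history) / len(self.history)
        # Recent odor weaker than the longer baseline: moving away, so turn around.
        return short_avg < long_avg


approaching = PirouetteDetector(samples_per_cycle=4)
print([approaching.update(x) for x in range(1, 9)])  # odor rising: never a U-turn
leaving = PirouetteDetector(samples_per_cycle=4)
print([leaving.update(x) for x in range(8, 0, -1)])  # odor falling: U-turn once both averages exist
```

Because the long window contains the short one, the comparison is effectively "is the latest cycle better than the cycle before it", which is the ratchet described above.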

In both cases, the simulation used a binary ON/OFF motor command instead of attempting finer precision from the gradient, and this simple ON/OFF strategy was sufficient for the simulation. A C. elegans study suggested that ON-OFF coding is energy efficient, and the worm rarely orients perfectly to the gradient [Lockery 2011].

References

Chen WY, Peng XL, Deng QS, Chen MJ, Du JL, Zhang BB. Role of Olfactorily Responsive Neurons in the Right Dorsal Habenula-Ventral Interpeduncular Nucleus Pathway in Food-Seeking Behaviors of Larval Zebrafish. Neuroscience. 2019 Apr 15;404:259-267. 

Chen X, Engert F. Navigational strategies underlying phototaxis in larval zebrafish. Front Syst Neurosci. 2014 Mar 25;8:39.

Gomez-Marin A, Louis M. Multilevel control of run orientation in Drosophila larval chemotaxis. Front Behav Neurosci. 2014;8:38. doi: 10.3389/fnbeh.2014.00038.

Hulse BK, Haberkern H, Franconville R, Turner-Evans D, Takemura SY, Wolff T, … Jayaraman V. A connectome of the Drosophila central complex reveals network motifs suitable for flexible navigation and context-dependent action selection. Elife. 2021;10.

Iino Y, Yoshida K. Parallel use of two behavioral mechanisms for chemotaxis in Caenorhabditis elegans. J Neurosci. 2009 Apr 29;29(17):5370-80. 

Karpenko S, Wolf S, Lafaye J, Le Goc G, Panier T, Bormuth V, Candelier R, Debrégeas G. From behavior to circuit modeling of light-seeking navigation in zebrafish larvae. Elife. 2020 Jan 2;9:e52882. 

Lockery SR. The computational worm: spatial orientation and its neuronal basis in C. elegans. Curr Opin Neurobiol. 2011 Oct;21(5):782-90. 

Martinez D. Klinotaxis as a basic form of navigation. Front Behav Neurosci. 2014 Aug 14;8:275. 

Palieri V, Paoli E, Wu YK, Haesemeyer M, Grunwald Kadow IC, Portugues R. The preoptic area and dorsal habenula jointly support homeostatic navigation in larval zebrafish. Curr Biol. 2024 Feb 5;34(3):489-504.e7.

Petrucco L, Lavian H, Wu YK, Svara F, Štih V, Portugues R. Neural dynamics and architecture of the heading direction circuit in zebrafish. Nat Neurosci. 2023 May;26(5):765-773. 

Westeinde EA, Kellogg E, Dawson PM, Lu J, Hamburg L, Midler B, Druckmann S, Wilson RI. Transforming a head direction signal into a goal-oriented steering command. Nature. 2024 Feb;626(8000):819-826. 

Zaupa M, Naini SMA, Younes MA, Bullier E, Duboué ER, Le Corronc H, Soula H, Wolf S, Candelier R, Legendre P, Halpern ME, Mangin JM, Hong E. Trans-inhibition of axon terminals underlies competition in the habenulo-interpeduncular pathway. Curr Biol. 2021 Nov 8;31(21):4762-4772.e5. 

Zhao W, Gong C, Ouyang Z, Wang P, Wang J, Zhou P, Zheng N, Gong Z. Turns with multiple and single head cast mediate Drosophila larval light avoidance. PLoS One. 2017 Jul 11;12(7):e0181193. 

Essay 25: head direction gradients

Essay 24, which investigated temporal gradient navigation, raised the question of head direction in navigation. The essay 24 model followed a zebrafish phototaxis experiment by [Chen and Engert 2014], which created a virtual light spot surrounded by darkness. The phototaxis behavior used the Hb.m (medial habenula) to B.ip (interpeduncular nucleus) pathway, with 5HT (serotonin) from V.mr (median raphe) as an averaging integrator [Cheng et al 2016], to generate the gradient without using head direction. Since B.ip receives head direction input [Petrucco et al 2023], essay 25 explores combining head direction with the phototaxis gradient.

In the fruit fly Drosophila, head direction and goal direction combine in the fan-shaped body to produce motor commands toward the goal [Matheson et al 2022]. Since the vertebrate B.ip connectivity with head direction resembles the fan-shaped body, this essay will use it as a model.

B.ip connectivity

Head direction from B.dtg (dorsal tegmental nucleus of Gudden) and the photo-gradient input from Hb.m would combine in tabular rows and columns in B.ip, if it resembles the fan-shaped body.

B.ip connectivity following a fan-shaped body model. B.dtg dorsal tegmental nucleus of Gudden, B.ip interpeduncular nucleus, B.rs reticulospinal motor command, Hb.m medial habenula.

Head direction encoding

Head direction is necessarily encoded by neurons. Each neuron in the head direction population has a specific direction, and fires when the animal is heading toward the neuron’s preferred direction.

Head direction encoding. Each neuron (colored box) corresponds to a direction. The neuron in the current direction is active, while other directions are silent.

In general, the heading is encoded by an ensemble of neurons, where several neurons around the actual direction fire at different rates (or possibly with delayed phases). In the diagram above, the central direction (blue) has the highest activity while neighboring neurons have smaller values [Petrucco et al 2023].

Drosophila uses a sinusoidal coding for its head direction, where the amplitude of the neuron for the actual direction is close to one and the neurons at orthogonal directions are zero [Westeinde et al 2022]. This sinusoidal encoding enables neuron-friendly transformations and combinations [Touretzky et al 1993], with advantages over neural rate encoding or phase encoding, particularly in response speed.
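A minimal sketch of a sinusoidal head-direction code and a population-vector readout is below. It assumes 8 evenly spaced columns (a hypothetical count) and allows negative values, which a real neuron would have to express as activity below baseline. `encode_heading` and `decode_heading` are illustrative names.

```python
import math


def encode_heading(heading: float, n_columns: int = 8) -> list[float]:
    """Sinusoidal population code: column i has activity cos(heading - preferred_i)."""
    return [math.cos(heading - 2 * math.pi * i / n_columns) for i in range(n_columns)]


def decode_heading(activity: list[float]) -> float:
    """Population-vector readout: activity-weighted sum of each column's preferred direction."""
    n = len(activity)
    x = sum(a * math.cos(2 * math.pi * i / n) for i, a in enumerate(activity))
    y = sum(a * math.sin(2 * math.pi * i / n) for i, a in enumerate(activity))
    return math.atan2(y, x) % (2 * math.pi)


bump = encode_heading(math.pi / 2)  # heading toward column 2's preferred direction
print([round(a, 2) for a in bump])  # peak at column 2, near zero at the orthogonal columns 0 and 4
print(decode_heading(bump))         # recovers pi/2
```

The readout is a weighted sum, exactly the kind of operation a layer of neurons can perform, which is one reason the sinusoidal format is considered neuron-friendly.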

Fan-shaped body: allocentric to egocentric

Fruit fly navigation uses the fan-shaped body to combine an allocentric goal direction with the head direction, creating motor commands to turn left or right. Egocentric is self-focused and allocentric is other-focused. Allocentric coordinates are independent of the animal, like north or the direction toward a distant landmark, while egocentric coordinates are relative to the animal, like forward, right or left.

The fan-shaped body has a tabular shape where each column is a head direction and each row is a goal input [Hulse et al 2021]. The fan-shaped body combines the goal vector and the head direction to create motor commands [Westeinde et al 2022].

The fan-shaped body combines head direction with goal vectors to produce motor commands.

Shifting the head direction and combining it with the sinusoidal encoding of the goal vector produces a motor output that turns left or right. In Drosophila, a third motor command produces a U-turn when the goal is behind the fly. Each motor command is carried by specific neurons: PFL3.L (left), PFL3.R (right), and PFL2 (U-turn).
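The left, right, and U-turn readout can be sketched with the same cosine bumps. This is a simplification under stated assumptions: the head bump is shifted by a hypothetical 45 degrees for the left and right outputs and by 180 degrees for the U-turn output, angles increase counterclockwise (so positive is to the left), and a column-by-column product summed over columns stands in for the real synaptic combination. For cosine bumps that sum equals (n/2) * cos of the angle between the shifted heading and the goal, so each output is largest when its shifted heading lines up with the goal.

```python
import math


def cosine_bump(angle: float, n: int) -> list[float]:
    """Sinusoidal population code over n evenly spaced columns."""
    return [math.cos(angle - 2 * math.pi * i / n) for i in range(n)]


def steering(heading: float, goal: float, shift: float = math.pi / 4, n: int = 8) -> dict[str, float]:
    """Compare shifted copies of the head-direction bump with the stored goal bump.

    Assumption: a column-wise product summed over columns stands in for the real
    synaptic combination. For cosine bumps it equals (n / 2) * cos(shifted_heading - goal).
    """
    goal_bump = cosine_bump(goal, n)

    def drive(offset: float) -> float:
        head_bump = cosine_bump(heading + offset, n)
        return sum(h * g for h, g in zip(head_bump, goal_bump))

    return {
        "left": drive(+shift),     # angles increase counterclockwise, so + is leftward
        "right": drive(-shift),
        "u_turn": drive(math.pi),  # largest when the goal is directly behind
    }


print(steering(heading=0.0, goal=math.radians(30)))  # goal ahead-left: left > right
print(steering(heading=0.0, goal=math.pi))           # goal behind: u_turn dominates
```

The design choice to note is that no explicit angle subtraction is needed: shifting the bump and taking a weighted sum does the comparison implicitly.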

In Drosophila, there are 18 distinct head direction columns and up to 9 goal rows. The fan-shaped body is also used for motivational calculations such as sleep, even though sleep doesn’t fit the strict tabular model shown above. To create this strict organization, the fan-shaped body has 400 distinct neuron types [Hulse et al 2021].

Constructing goal vectors

In the phototaxis situation, as in essay 24 or [Chen and Engert 2014], the goal vector is constructed from the gradient as the animal crosses from light into darkness and from the head direction at that moment.

Captured goal vector (red) when the animal crosses into darkness.

As the diagram above suggests, the stored vector isn’t the true direction from light to dark, but only the sample along the animal’s path. The gradient value is then stored in the goal direction cells.

Storing the goal vector requires gating based on head direction. In zebrafish, serotonin accumulators can be gated by actions and used as a short-term memory (5s – 20s) [Kawashima et al 2016]. In the essay, head direction gates serotonin accumulation in place of the action gating.

Storing the gradient into the goal vector based on the current head direction. The red direction (south-east) gates its associated serotonin accumulator.

Since V.mr (median raphe) neurons produce consistent tonic oscillations, they are ideal for reading the accumulated value. No additional circuitry for the read is necessary.
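A minimal sketch of head-direction-gated accumulation follows. It assumes 8 columns, angles in radians with 0 pointing east (so south-east is -π/4), and exponential decay with a hypothetical 10 s time constant standing in for the 5 to 20 s serotonin memory. Only the column nearest the current heading integrates the gradient, while every column decays; reading is simply inspecting the stored values, mirroring the point that no extra read circuitry is needed. `GatedAccumulators` is a hypothetical name.

```python
import math


class GatedAccumulators:
    """One leaky accumulator per head-direction column; only the current column integrates."""

    def __init__(self, n_columns: int = 8, tau: float = 10.0) -> None:
        self.n = n_columns
        self.tau = tau  # hypothetical decay time constant, seconds
        self.values = [0.0] * n_columns

    def active_column(self, heading: float) -> int:
        sector = 2 * math.pi / self.n
        return round((heading % (2 * math.pi)) / sector) % self.n

    def step(self, heading: float, gradient: float, dt: float) -> None:
        decay = math.exp(-dt / self.tau)
        self.values = [v * decay for v in self.values]             # every memory fades
        self.values[self.active_column(heading)] += gradient * dt  # gate: one column integrates


acc = GatedAccumulators()
acc.step(heading=-math.pi / 4, gradient=-1.0, dt=1.0)  # south-east, entering darkness
acc.step(heading=0.0, gradient=0.0, dt=10.0)           # heading east, flat light: only decay
print([round(v, 3) for v in acc.values])               # negative memory left in the south-east column
```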

Essay simulation

Because the essay’s model is at a functional level, not a circuit level, it can use a directional vector encoding: a pair of floating-point numbers, one for direction and one for gradient strength.

The simulation also calculated two averages: a short-term average for the goal vector gradient and a long-term average for phototaxis gradient motivation. The goal vector average needs to be shorter to avoid bleed-over from a previous direction.

Screenshot of animal crossing into darkness.

The above screenshot shows the animal’s state when it crosses into darkness. The long-timescale motivational gradient (“gr/grad”) is negative, driving the animal to avoid darkness. The short directional gradient (“sa”) is near zero, avoiding update of the stored goal vector. (Note: gradients are 0.5-centered for graphing consistency.)

The homunculus diamond in the upper right shows the current head direction (black semicircle pointing north-east) and the avoidance goal vector (orange semicircle pointing east). Since the animal is heading toward the avoidance direction, it has a U-turn motor command (orange triangle at top). In addition, since the goal vector and head direction are near a right angle, right turns are inhibited (red at lower right). Because locomotion remains exploratory and stochastic, inhibition reduces turn probability but doesn’t force turns.

Discussion

This essay’s model is even more speculative than the other essays, because I haven’t found any papers reporting B.ip head direction behavior other than the basic existence of head direction afferents [Petrucco et al 2023]. In particular, the Drosophila fan-shaped body is not homologous to B.ip, because amphioxus, a pre-vertebrate animal, has neither structure. Nevertheless, it’s interesting that a goal gradient vector circuit is at least possible and relatively simple.

Specifically, the goal vector provides an evolutionary step toward hippocampal (E.hc) object vector cells and grid cells, because those are relatively small enhancements over the goal vector. Without a B.ip goal vector system as an intermediate step, hippocampal navigation is too big an evolutionary step, with too many concurrent requirements, to be likely.

Note that the hippocampal system is strongly connected with the Hb, B.ip, V.mr, B.dtg system from this essay. E.hc (hippocampus), P.ms (medial septum), Hb (habenula), B.ip (interpeduncular nucleus), V.mr (median raphe), and B.dtg (head direction) form a strongly connected system together with H.sum (supramammillary/retromammillary nucleus).

References

Chen X, Engert F. Navigational strategies underlying phototaxis in larval zebrafish. Front Syst Neurosci. 2014 Mar 25;8:39.

Cheng RK, Krishnan S, Jesuthasan S. Activation and inhibition of tph2 serotonergic neurons operate in tandem to influence larval zebrafish preference for light over darkness. Sci Rep. 2016 Feb 12;6:20788.

Hulse, B. K., Haberkern, H., Franconville, R., Turner-Evans, D., Takemura, S. Y., Wolff, T., … & Jayaraman, V. (2021). A connectome of the Drosophila central complex reveals network motifs suitable for flexible navigation and context-dependent action selection. Elife, 10.

Kawashima T, Zwart MF, Yang CT, Mensh BD, Ahrens MB. The Serotonergic System Tracks the Outcomes of Actions to Mediate Short-Term Motor Learning. Cell. 2016 Nov 3;167(4):933-946.e20. 

Matheson, A. M., Lanz, A. J., Medina, A. M., Licata, A. M., Currier, T. A., Syed, M. H., & Nagel, K. I. (2022). A neural circuit for wind-guided olfactory navigation. Nature Communications, 13(1), 4613.

Petrucco L, Lavian H, Wu YK, Svara F, Štih V, Portugues R. Neural dynamics and architecture of the heading direction circuit in zebrafish. Nat Neurosci. 2023 May;26(5):765-773. 

Touretzky, D. S., Redish, A. D., & Wan, H. S. (1993). Neural representation of space using sinusoidal arrays. Neural Computation, 5(6), 869-884.

Westeinde EA, Kellogg E, Dawson PM, Lu J, Hamburg L, Midler B, Druckmann S, Wilson RI. Transforming a head direction signal into a goal-oriented steering command. bioRxiv. 2022. 2022.11.10.516039.

Essay 24: phototaxis problems

The simple phototaxis implementation exposes a few problems with the simulation, both from running it and from reviewing the neuroscience to critique it.

Interrupts

The essay doesn’t currently implement any interrupt mechanism. When running into darkness, the animal turns around and starts an area-restricted search (ARS), but if the animal is in the middle of a long Levy path, it crosses the border and the subsequent search fails to find the border.

Long initial paths break the phototaxis algorithm.

The problem here is that the ARS starts too late because the animal doesn’t interrupt the current behavior when encountering the border.

One solution is an interrupt (orientation) system, which exists in the vertebrate brain in V.ppt (pedunculopontine nucleus) and uses ACh (acetylcholine) to interrupt the current behavior. A natural arrangement is V.ppt for the interrupt signal and the striatum as the plan representation, interruptible via ACh input to the striatum.

Another solution is to avoid uninterruptible behavior entirely, since the problem lies in the essay’s Levy walk implementation. The essay pre-computes the length of a run instead of continuously extending it. In contrast, the zebrafish larva swims in bouts, and longer runs are made of multiple forward bouts.
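The contrast between a precomputed run and a run built from bouts can be sketched as below. Assumptions: the precomputed length is drawn from a Pareto distribution (a common Levy-walk stand-in), the bout version continues with a fixed hypothetical probability (which yields geometric rather than heavy-tailed run lengths), and `at_border` is a hypothetical callback standing in for the light/dark boundary check.

```python
import random
from typing import Callable, Tuple


def precomputed_run(rng: random.Random, alpha: float = 1.5) -> float:
    """Levy-style run: draw a heavy-tailed length once, then commit to all of it."""
    return rng.paretovariate(alpha)  # always >= 1.0


def bout_run(rng: random.Random, continue_prob: float,
             at_border: Callable[[int], bool], max_bouts: int = 1000) -> Tuple[int, bool]:
    """Run built from forward bouts, re-deciding after every bout.

    Returns (bouts swum, interrupted). Because each bout is a fresh decision,
    reaching the border can end the run at once and hand control to the search.
    """
    for bout in range(1, max_bouts + 1):
        if at_border(bout):
            return bout, True   # interrupt: switch to area-restricted search
        if rng.random() >= continue_prob:
            return bout, False  # run ends on its own
    return max_bouts, False


rng = random.Random(0)
print(precomputed_run(rng))
print(bout_run(rng, continue_prob=0.9, at_border=lambda bout: bout == 5))
```

With the bout structure, no separate interrupt system is required for this case: the border check simply becomes one more input to each per-bout decision.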

Zebrafish random walk (ARTR area)

The essay’s random walk does not match actual zebrafish search behavior. The essay uses a turn-and-run model where the turn and run length are computed randomly. The zebrafish has a hindbrain oscillator, the ARTR (anterior rhombencephalic turning region), which selects left and right turns [Karpenko et al 2020].

Speculative directed random walk with the zebrafish ARTR. The isthmus is the midbrain-hindbrain boundary (MHB). B.artr anterior rhombencephalic turning region, B.rs reticulospinal motor command, Hb.m medial habenula, M.ip interpeduncular nucleus.

In zebrafish, turns and runs are selected independently and can be chained in different ways. Instead of turn-run-turn-run as in the essay, the zebrafish can produce turn-turn or run-run patterns. Zebrafish turn direction is also correlated, unlike the independent turns of the essay’s random walk: a zebrafish left turn is more likely to follow a left turn.

When encountering darkness, same-direction turns increase. When encountering light, alternating turns increase. Together with the sharp turn (O-bend) followed by shallower turns, this behavior should create a spiral-like search for the light area.
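The correlated turn selection can be sketched as a two-state Markov chain over turn direction. The repeat probabilities (0.8 in darkness, 0.4 in light) are hypothetical values chosen only to show the direction of the effect; they are not measured ARTR parameters, and the sketch omits turn amplitude, so on its own it does not produce a spiral.

```python
import random


def next_turn(rng: random.Random, last_turn: str, in_dark: bool,
              p_repeat_dark: float = 0.8, p_repeat_light: float = 0.4) -> str:
    """Choose 'L' or 'R', biased toward repeating the previous turn direction."""
    p_repeat = p_repeat_dark if in_dark else p_repeat_light
    if rng.random() < p_repeat:
        return last_turn
    return "R" if last_turn == "L" else "L"


def simulate(in_dark: bool, n_turns: int = 5000, seed: int = 0) -> list[str]:
    """Generate a sequence of turn directions starting from a left turn."""
    rng = random.Random(seed)
    turns = ["L"]
    for _ in range(n_turns):
        turns.append(next_turn(rng, turns[-1], in_dark))
    return turns


def repeat_fraction(turns: list[str]) -> float:
    """Fraction of consecutive turn pairs that go the same way."""
    repeats = sum(a == b for a, b in zip(turns, turns[1:]))
    return repeats / (len(turns) - 1)


print(repeat_fraction(simulate(in_dark=True)))   # close to 0.8: long same-direction streaks
print(repeat_fraction(simulate(in_dark=False)))  # close to 0.4: more alternation
```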

Note: this specialized circuitry in the hindbrain suggests that random search is a primitive behavior. Although the essay put the Levy walk logic in the midbrain, it belongs in the more primitive hindbrain.

Head direction

[Petrucco et al 2023] report that M.ip is highly connected with head direction axons from B.dtg (dorsal tegmental nucleus of Gudden). This head direction signal does not receive vestibular input, but is likely derived from motor efference copies. Since both M.ip and B.dtg are r1-derived regions and possibly ante-vestibular, this head-direction and M.ip connection may be ancient.

Speculative M.ip circuit following the fruit fly fan-shaped body. B.dtg dorsal tegmental nucleus of Gudden, B.rs reticulospinal motor command, Hb.m medial habenula, M.ip interpeduncular nucleus.

This organization is strikingly similar to the fruit fly’s ellipsoid body (EB), protocerebral bridge (PB), and fan-shaped body (FB) in the central complex (CX) [Hulse et al 2021]. EB and PB calculate head direction. The fan-shaped body merges head direction with goal direction to produce motor commands. In this diagram, M.ip is represented as if it resembled the fan-shaped body.

If M.ip functionality is similar to the fan-shaped body, the similarity is highly likely to be convergent evolution, not homology, because amphioxus lacks any similar structure.

Dark search

When zebrafish are plunged into darkness, they initiate a search that continues for about five minutes. Darkness increases speed and straight swimming [Horstick et al 2017]. In other words, phototaxis is not just gradient behavior; it also has a steady-state darkness behavior. Because zebrafish require light to hunt, darkness in itself is an area to avoid.

The essay is purely gradient-based and has no speed changes. Photokinesis, moving faster in darkness and slower in light, would bias the time spent toward the light area.
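A toy simulation shows why speed alone biases residence time. Assumptions: a 1-D ring of 100 cells with the first 50 lit, a fair coin for direction on every step (no gradient-following at all), and hypothetical speeds of 1 cell per step in light and 3 in darkness. `light_fraction` is an illustrative name.

```python
import random


def light_fraction(steps: int = 200_000, ring: int = 100, lit_cells: int = 50,
                   v_light: int = 1, v_dark: int = 3, seed: int = 0) -> float:
    """Fraction of time an unbiased walker spends in the lit part of a 1-D ring.

    Cells 0..lit_cells-1 are lit. Direction is a fair coin on every step, so the
    walker never follows a gradient; only its speed depends on the light.
    """
    rng = random.Random(seed)
    x, steps_in_light = 0, 0
    for _ in range(steps):
        lit = x < lit_cells
        steps_in_light += lit
        speed = v_light if lit else v_dark
        x = (x + (speed if rng.random() < 0.5 else -speed)) % ring
    return steps_in_light / steps


print(light_fraction())          # photokinesis: slower in light
print(light_fraction(v_dark=1))  # control: equal speeds, no bias expected on average
```

With these settings the walker spends well over half its time in the lit half, purely because it leaves the dark half faster.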

References

Hulse, B. K., Haberkern, H., Franconville, R., Turner-Evans, D., Takemura, S. Y., Wolff, T., … & Jayaraman, V. (2021). A connectome of the Drosophila central complex reveals network motifs suitable for flexible navigation and context-dependent action selection. Elife, 10.

Horstick EJ, Mueller T, Burgess HA. Motivated state control in larval zebrafish: behavioral paradigms and anatomical substrates. J Neurogenet. 2016 Jun;30(2):122-32.

Karpenko S, Wolf S, Lafaye J, Le Goc G, Panier T, Bormuth V, Candelier R, Debrégeas G. From behavior to circuit modeling of light-seeking navigation in zebrafish larvae. Elife. 2020 Jan 2;9:e52882. 

Petrucco L, Lavian H, Wu YK, Svara F, Štih V, Portugues R. Neural dynamics and architecture of the heading direction circuit in zebrafish. Nat Neurosci. 2023 May;26(5):765-773. doi: 10.1038/s41593-023-01308-5. 

16: Classical conditioning and negative learning

While researching how the fruit fly might learn to ignore initially attractive odors, I ran into a difficulty: most papers aren’t interested in that attractive-to-ignore transition. For this simulation, the lack of information means I have had to guess at plausibility. One possible reason for the lack of data might be an over-focus on the specifics of classical associative learning.

Foraging and odor-following task

The task here is finding food by following a promising odor. As explored in the essay 15 and essay 16 simulations, this odor-following task is more complicated than it first appears, because a naive solution leads to perseveration: never giving up on the odor. Perseveration, an inability to accept failure, is potentially fatal if the animal can’t break away from a non-rewarding lead. To avoid this fatal flaw, there needs to be a specific circuit to handle that failure; otherwise the animal will follow the odor forever. At the same time, the animal must spend some time exploring the potential food area before giving up (patience). This dilemma is similar to the explore/exploit dilemma in foraging theory and reinforcement learning. A state diagram for the odor-foraging task might look something like the following:

Behavior state transitions for following an odor to food.

This odor task starts when the animal detects the odor. The animal approaches the odor repeatedly until it finds food, the odor disappears, or the animal gives up. Giving up is the interesting case here because it requires an internal state transition, while all other inputs cause stimulus-response transitions: the animal just reacts. Finding food triggers consummation (eating), and a disappearing odor voids the task; both are external stimuli. In contrast, giving up requires internal state circuitry to decide when to quit, a potentially difficult decision.
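The state diagram can be sketched as a small state machine. Assumptions: time advances in discrete ticks, `patience` is a hypothetical number of unrewarded approach ticks before giving up, and after giving up the animal keeps ignoring the odor until the odor disappears, which is what prevents perseveration. Every transition except giving up is driven directly by the `odor` and `food` inputs; `State` and `OdorForager` are illustrative names.

```python
from enum import Enum, auto


class State(Enum):
    EXPLORE = auto()
    APPROACH = auto()
    CONSUME = auto()
    GIVE_UP = auto()


class OdorForager:
    """Odor-following task: stimulus-driven transitions plus one internal 'give up' decision."""

    def __init__(self, patience: int) -> None:
        self.patience = patience  # hypothetical: unrewarded approach ticks allowed
        self.state = State.EXPLORE
        self.ticks_in_approach = 0

    def step(self, odor: bool, food: bool) -> State:
        if food and self.state is State.APPROACH:
            self.state = State.CONSUME      # external stimulus: food found
        elif not odor:
            self.state = State.EXPLORE      # external stimulus: odor gone, task void
            self.ticks_in_approach = 0
        elif self.state is State.EXPLORE:
            self.state = State.APPROACH     # external stimulus: odor detected
            self.ticks_in_approach = 0
        elif self.state is State.APPROACH:
            self.ticks_in_approach += 1     # internal clock while approaching
            if self.ticks_in_approach > self.patience:
                self.state = State.GIVE_UP  # internal decision: stop following
        return self.state                   # CONSUME and GIVE_UP persist while odor remains


forager = OdorForager(patience=3)
print([forager.step(odor=True, food=False).name for _ in range(6)])  # approach, then give up
print(forager.step(odor=False, food=False).name)                    # odor gone: back to exploring
```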

Learning to ignore

The animal can improve its odor-following performance if it can learn to ignore false leads, as explored in essay 15. The animal follows intrinsically-attractive odors and if there’s a reward it continues to approach the odor. But if the animal doesn’t find food, it remembers the odor and ignores the odor the next time. The simulations in essay 15 showed the effectiveness of this strategy, improving food discovery.

Learning transitions for an intrinsically attractive odor.

In the learning diagram above, when the animal finds food with an odor that predicts food, it maintains the intrinsic approach. When the animal doesn’t find food for that odor, it ignores the odor the next time. This learning is not simple habituation, because it depends on the reward outcome, but it’s also not classical associative learning.

Classical associative learning

Classical associative learning (Pavlovian) starts from an initial blank state (tabula rasa) and learns from reward to approach and from punishment to avoid. The null transition, no reward and no punishment, maintains the initial blank state, although this assumption is implicit and not discussed as an important part of the model. A number of points about the classical associative learning model:

Learning transitions for classical associative learning, adding the implicit non-reward transition.

First, the greyed-out transition is an assumption, often untested or, if tested, dismissed as unimportant. For example, [Tempel et al. 1983] notes that OCT (a test odor) becomes increasingly aversive even without punishment (a shock), which contradicts the greyed transition, but doesn’t incorporate that observation into the analysis. Similarly, [Berry et al. 2018] found that the test odors have increasing LTD (long-term depression) even without punishment or reward, but relegated that observation to unpublished data, presumably because it was irrelevant to the classical model in the study.

Second, classical association is a blank-slate (tabula rasa) model: the initial state is an ignore state. Although fruit flies have intrinsically attractive odors and intrinsically repelling odors, research seems to focus on neutral odors, possibly because classical association expects an initial ignore state. But starting with neutral odors means there’s little data about learning with intrinsically attractive odors. For example, attractive odors might be impervious to negative learning. In fruit flies, the lateral horn mediates non-learned behavior, such as the response to intrinsically attractive odors. The mushroom body (learning) might not suppress the lateral horn’s intrinsic behavior for those odors.

Learning transitions for a semi-classical situation where the non-reward transition learns aversion.

Third, in the fruit fly it’s unclear whether a no-reward transition uses the same MBONs as a punishment transition, because the only negative learning data comes from punishment studies. For example, both γ1-MBON and γ2-MBON are short-term-memory aversive-learning MBONs, as possibly is α’3-MBON. A study that tests the no-reward transition as well as the punishment transition can distinguish between the two. [Hancock et al. 2022] includes a non-reward test to narrow the punishment effect to γ1, and notes non-reward depression for γ2, γ3, and γ4, but still treats non-associative depression as outside the scope of interest.

Alternative learning model

An alternative is to treat all learning transitions as equally important, as opposed to classical association’s focus on one or a few transitions.

Alternative learning where all transitions are treated as interesting.

In the above diagram, seven of the nine transitions are interesting, and even the two trivial transitions, rewarded intrinsic-approach and punished intrinsic-repel, are interesting in the context of other transitions, because the implementing circuits might need to remember the reward.

Foraging learning revisited

Returning to the original foraging learning problem:

Learning transitions for an intrinsically-attractive odor.

How might this transition be implemented in the fruit fly mushroom body? A single-neuron implementation might be possible if a reward can reverse a habituation-like LTD for the “none” (no-reward) transition, for example if a dopamine spike leads to LTP (long-term potentiation). A dual-neuron implementation might use one neuron to store the reward vs. non-reward state and the other to store visited-recently data, such as the surprise neuron α’3 [Hattori et al. 2017].
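To make the single-neuron idea concrete, here is a minimal sketch, assuming one synapse per odor, a multiplicative habituation-like LTD on each unrewarded visit, and a reward that fully restores the intrinsic weight. The class name, the 0.5 LTD rate, and the 0.3 approach threshold are all hypothetical choices for illustration, not measured values or the essay’s own model.

```python
from dataclasses import dataclass


@dataclass
class OdorSynapse:
    """Hypothetical single KC-to-MBON synapse for one intrinsically attractive odor."""

    weight: float = 1.0      # 1.0 = naive, intrinsic approach strength (assumed scale)
    ltd_rate: float = 0.5    # fraction of weight lost per unrewarded visit (assumed)
    threshold: float = 0.3   # below this the odor no longer drives approach (assumed)

    def visit(self, rewarded: bool) -> None:
        if rewarded:
            # Dopamine spike: LTP reverses the accumulated depression.
            self.weight = 1.0
        else:
            # "None" (no-reward) transition: habituation-like LTD.
            self.weight *= 1.0 - self.ltd_rate

    def approaches(self) -> bool:
        return self.weight >= self.threshold


synapse = OdorSynapse()
synapse.visit(rewarded=False)   # weight 0.5, still approached
synapse.visit(rewarded=False)   # weight 0.25, now ignored
print(synapse.approaches())     # False
synapse.visit(rewarded=True)    # reward restores approach
print(synapse.approaches())     # True
```

The point of the sketch is that a single weight can carry both the “visited without reward” history and its reversal, so no second neuron is needed as long as the reward signal can undo the LTD.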

References

Berry JA, Phan A, Davis RL. Dopamine Neurons Mediate Learning and Forgetting through Bidirectional Modulation of a Memory Trace. Cell Rep. 2018 Oct 16

Hancock, C.E., Rostami, V., Rachad, E.Y. et al. Visualization of learning-induced synaptic plasticity in output neurons of the Drosophila mushroom body γ-lobe. Sci Rep 12, 10421 (2022). 

Hattori, Daisuke, et al. Representations of novelty and familiarity in a mushroom body compartment. Cell 169.5 (2017): 956-969.

Tempel, Bruce L., et al. Reward learning in normal and mutant Drosophila. Proceedings of the National Academy of Sciences 80.5 (1983): 1482-1486.

Essay 16: Learning to Ignore

Essay 16 extends the odor seeking (chemotaxis) of essay 15 by adding a single memory item. The memory caches a failed odor search, avoiding the cost of searching for false odors. The neuroscience source is the fruit fly Drosophila. The simulation is still based on a Braitenberg slug with distinct circuits for chemotaxis and for obstacle avoidance.

Mushroom body model

The fruit fly mushroom body (MB) is the learning center. MB is a modulating system: if it’s knocked out, the fruit fly still behaves normally, although with only intrinsic, unlearned behavior. Essay 16 focuses on a single short-term memory (STM) mushroom body output neuron (MBON), which may specifically be the γ2 neuron.

Architecture of fruit fly mushroom body.
Mushroom body architecture, adapted from [Aso et al. 2014]

For simplicity and focus, essay 16 doesn’t implement the Kenyon cells (KC) yet. Instead, the γ2 MBON receives its input directly from the small number of odor projection neurons (PN) implemented in essay 15. Essentially, the input is a small set of primitive odors, whereas the full KC layer represents a massive combinatorial odor spectrum.

Candidate odors from evolution

To motivate why evolution might develop learning, consider the food-seeking slug from essay 15. Since the food odor in essay 15 perfectly predicted food, there was no reason to learn anything about food. The simulation’s “evolution” has perfectly solved the artificially perfect world, selecting exactly those odors needed to find food.

Choosing the right set of candidate odors is a dilemma for evolution. Too many candidates means wasted search time. Too few candidates avoids wasted time but misses opportunities, which may be a smaller problem than too many candidates because the animal can fall back to random, Brownian-motion search. This bias against including semi-predictive odors might mean that early Precambrian evolution only favored the highest predictors and skipped semi-predictive odors.

Candidate odor surrounded by distractors

The preceding image represents odors potentially available to the slug from an evolutionary design perspective. If the beige odor is the only candidate for food, the slug will ignore the blue-ish odors because it never senses them. There’s no need for a circuit or behavior to distinguish the two. For the animal, those odors don’t exist.

Food odors don’t perfectly predict food, either because of lingering odors or simply because candidate odors aren’t always from nutritious food. For example, the fruit fly can taste sweetness and can also sense nutrition from a rise in blood sugar. That distinction between sweet and nutritious is reflected in the mushroom body, with specific neurons for each [Owald et al. 2015].

Classical association

For this essay, let’s explore what could be the simplest possible memory, in the context of the fruit fly MB. The MB output has only 24 neurons in 15 distinct compartments per hemisphere. Each compartment appears to have a specialized role, such as short-term memory (STM) vs. long-term memory (LTM) [Bouzaiane et al. 2015], or water seeking vs. sugar seeking [Owald et al. 2015].

Although learning studies typically use classical association (Pavlovian) terminology, where a conditioned stimulus (CS) like the food odor becomes associated with an unconditioned stimulus (US) like consuming food, I don’t think that framing is useful for the odor-seeking behavior of the simulated slug.

Naive animal missing food before classical training

In the diagram above, which follows the classical model, the animal (arrow) misses the food (brown square) despite being in the candidate odor’s plume, because it hasn’t learned to associate the odor (CS) with the food (US). It only learns the association if it finds the food through Brownian random search. Even then, if it randomly hits another food source with a different odor, it will forget the first, limiting the gain from this learning.

Even the non-learning algorithm of essay 14 performs better, because naively searching all candidate odors is relatively successful, even if somewhat time-inefficient. Behaviorally, the difference is between default-approach and default-ignore: default-approach needs to learn to ignore, and default-ignore needs to learn to approach.

Learning to ignore

Learning to ignore is an alternative to the classical associative way of looking at the problem. It’s not an argument against classical conditioning in total, but it is a different perspective that highlights different features of the problem.

Successful approach to food and failure

The diagram above shows a successful approach to food and an unsuccessful approach. Both candidate odors potentially signal food because evolution ignores useless odors, but in this neighborhood the reddish odor is a non-food signal. As in essay 14, habituation will rescue the animal from perseveration, spending infinite time exploring a useless odor, but once an odor is found useless, ignoring it from the start would improve search efficiency.

Since odors in a neighborhood are likely similar, the animal is likely to encounter the useless odor again soon. So remembering a single item, like a single-item cache, will improve the search by avoiding the cost of re-exploring that odor, until the animal reaches an area that does have nutritious food. The single-item cache lets the animal ignore patches of non-predictive odors.

Single item cache (short term memory)

A single mushroom body output neuron (MBON) and its associated dopamine neuron (DAN) can implement a single-item cache by changing the weights of the KC-to-MBON synapses with long-term depression (LTD). Following the previous discussion, since it’s more efficient to remember the last failure than the last success, the learning is LTD at the synapse between the odor and the MBON. In fruit flies, short-term memory (STM) lasts on the order of 2h. (For a fuller discussion of the distinction between “short” and “long” term, see [Sossin 2008].)

Reduced MB circuit for negative learning

In the above diagram, the O2 synapse with a ball represents the LTD cache item. If the animal senses odors for either O1 or O3, it approaches the odor. If it senses O2, it ignores the odor because of the LTD at the synapse. (The colors follow the mnemonic model. Purple represents primary sensor/odor, and blue represents apical/limbic/odor and motivation areas.)
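The reduced circuit can be sketched as a dictionary of synaptic weights feeding one MBON. This is a toy under stated assumptions: the weight values, the threshold, and the class name are hypothetical, and the explicit “restore the old item” step is a simplification standing in for whatever forgetting mechanism the real synapse uses.

```python
BASELINE = 1.0    # undepressed KC-to-MBON weight (assumed value)
DEPRESSED = 0.0   # weight after LTD (assumed value)
THRESHOLD = 0.5   # MBON drive needed to approach (assumed value)


class NegativeCacheMBON:
    """Toy MBON whose input synapses hold a single 'ignore' item via LTD."""

    def __init__(self, odors):
        self.weights = {odor: BASELINE for odor in odors}
        self.cached = None  # the one odor whose synapse is currently depressed

    def approaches(self, odor):
        # The MBON output is just the synaptic weight for the sensed odor.
        return self.weights[odor] >= THRESHOLD

    def record_failure(self, odor):
        # Single-item cache: the old item is forgotten (weight restored),
        # and the newest failure is depressed.
        if self.cached is not None:
            self.weights[self.cached] = BASELINE
        self.weights[odor] = DEPRESSED
        self.cached = odor


mbon = NegativeCacheMBON(["O1", "O2", "O3"])
mbon.record_failure("O2")
print([mbon.approaches(o) for o in ["O1", "O2", "O3"]])  # [True, False, True]
mbon.record_failure("O3")
print([mbon.approaches(o) for o in ["O1", "O2", "O3"]])  # [True, True, False]
```

The first line of output matches the diagram: O1 and O3 are approached and the depressed O2 is ignored. The second shows the single-item behavior, where a new failure evicts the old one.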

The DAN needs to produce a failure signal to trigger LTD, which is actually relatively complicated. Unlike success, which has an obvious direct stimulus when the animal finds food, failure is ambiguous. How long the animal should persist before giving up is a difficult problem, and even the simplest strategy requires at least a timer. Because habituation already implements a timeout, an easy solution is to copy that circuit or possibly use its output. So, if the animal exits the odor plume because of habituation, the DAN might signal failure.

Another possible strategy is for the DAN to continuously degrade the active signal as in habituation, and only rescue the synapse when the animal discovers food. Results from [Berry et al. 2018] show the needed degradation over time (ramping LTD) in their study of MBON-γ2, although that study didn’t explore the rescue of approach by finding food that we need.

So, the second strategy might require a second, opposing neuron, which I’ll probably explore later. For this essay, the DAN will produce a failure signal on timeout and a success signal on finding food, something like a reward prediction error signal from reinforcement learning [Sutton and Barto 2018], but without using a reinforcement learning architecture.
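The essay’s chosen DAN behavior (failure on timeout, success on food) can be sketched as a three-valued signal plus a clamped weight update. The timeout length, the +1/-1/0 encoding, the learning rate, and both function names are assumptions made for illustration; this is not a reinforcement-learning algorithm, only the two-event signal the text describes.

```python
HABITUATION_TIMEOUT = 20  # steps in a plume before giving up (assumed value)


def dan_signal(steps_in_plume: int, found_food: bool) -> int:
    """Hypothetical DAN output: +1 success, -1 failure, 0 no learning signal."""
    if found_food:
        return 1    # success: food found while following the odor
    if steps_in_plume >= HABITUATION_TIMEOUT:
        return -1   # failure: habituation timed out without food
    return 0        # still searching


def update_weight(weight: float, signal: int, rate: float = 1.0) -> float:
    """Apply LTP (+1) or LTD (-1) to a KC-to-MBON weight, clamped to [0, 1]."""
    return min(1.0, max(0.0, weight + rate * signal))


print(update_weight(1.0, dan_signal(20, found_food=False)))  # 0.0 (timeout -> LTD)
print(update_weight(0.0, dan_signal(5, found_food=True)))    # 1.0 (food -> LTP)
```

Reusing the habituation timeout as the failure trigger keeps the DAN logic trivial; the hard “when to give up” decision stays in the habituation circuit.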

Mammalian correlates

In mammals, the KC/MBON synapse with DAN modulation circuit functionally resembles the hippocampus CA3 to CA1 synapse (E.hc, E.ca1, E.ca3) with the locus coeruleus (V.lc). In mammals, V.lc is known as the primary source of noradrenaline, associated with surprise and orientation, but it also contains dopamine neurons and strongly innervates the hippocampus.

[Aston-Jones and Cohen 2005] discuss locus coeruleus involvement in decision-making, specifically in explore vs. exploit decisions. If time passes and exploitation keeps failing to find food, V.lc signaling encourages moving on and exploring different options, a behavior similar to the slug’s.

The E.ca3 to E.ca1 connection (and E.ec, entorhinal cortex) is believed to detect novelty, and V.lc is active during exploration. Like the fruit fly MBON, the hippocampus uses LTD to learn a new place, using the V.lc signal [Lemon et al. 2012] like the DAN.

Unlike in my simulated slug, the E.hc novelty output doesn’t directly drive food approach, because the mammalian brain is far more complex and abstract. So the comparison isn’t exact, but it is an interesting similarity.

Simulation: shared habituation

In essay 15, the simulated slug approached an intrinsically attractive odor to find food, but needed a habituation circuit to avoid perseveration. The fruit fly LN1 neurons between the ~50 main olfactory sensory neurons (ORN) and the ~150 olfactory projection neurons (PN) implement primary olfactory habituation. In this essay, I’m essentially adding a second odor to the system. Although the fruit fly has separate habituation circuits for each of the 50 primary odors, it’s interesting to see what a shared habituation circuit might look like in the simulation.

Shared habituation pre-learning circuit

The simulation heat map shows the animal spends much of its time between the odor plumes because the habituation timeout keeps refreshing, despite encountering different odors. While habituation is active, the animal doesn’t approach either odor plume, but mostly moves in the default semi-random pattern. Only when habituation times out will it approach a new odor.

Shared habituation. Warm colors are food-predicting odors, blue are distractors.

Split habituation

Unlike the previous simulation, the fruit fly has split habituation. Each primary odor has an independent habituation circuit, which is synapse specific.

Split habituation in pre-learning circuit

In the simulation of split habituation, the animal spends more time investigating the odors because each new odor has its own habituation timeout. It can move from a failed odor and immediately explore a new odor.
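The difference between the two habituation circuits comes down to one timer vs. one timer per odor. This toy sketch assumes a fixed recovery period after the animal gives up on an odor; the class names and the 10-step recovery are hypothetical, and it does not model the timer refreshing on each new encounter that the shared-habituation heat map shows.

```python
RECOVERY = 10  # steps an odor stays habituated after giving up (assumed value)


class SharedHabituation:
    """One timer for every odor, as in the first simulation."""

    def __init__(self):
        self.until = 0

    def give_up(self, odor, t):
        self.until = t + RECOVERY

    def can_approach(self, odor, t):
        return t >= self.until


class SplitHabituation:
    """One timer per primary odor, closer to the fruit fly."""

    def __init__(self):
        self.until = {}

    def give_up(self, odor, t):
        self.until[odor] = t + RECOVERY

    def can_approach(self, odor, t):
        return t >= self.until.get(odor, 0)


for circuit in (SharedHabituation(), SplitHabituation()):
    circuit.give_up("A", t=0)      # failed search on odor A at t=0
    print(type(circuit).__name__, circuit.can_approach("B", t=3))
# SharedHabituation False
# SplitHabituation True
```

With the shared timer, giving up on odor A also blocks a fresh odor B; with split timers, the animal can move straight from the failed odor to a new one, which is why it spends more time investigating odors.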

Simulation heat map for split habituation

Although the animal spends much of its time exploring the distractor candidate odors, it’s still a big improvement over random search, because it’s more likely to find food instead of a near miss.

Single distractor

Since a single distractor exactly fits the single item cache, it’s unsurprising that adding the cache immediately solves the distractor problem. In the following heat map, the animal only explores the successful candidate odor and ignores the distractor.

Simulation heat map for single-item negative cache

Multiple odors and distractors

Multiple distractor odors are more interesting for a single-item cache because they make the miss rate a prominent issue and allow comparison between negative caching and positive caching (classical association). The table below summarizes feeding time as a success metric for each strategy.

Algorithm           Feeding time
No odor approach    0.8%
No learning         7.4%
LTD (cache)         8.0%
LTP (classical)     5.7%
Success comparison for multiple algorithms, measured by feeding time
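Expressing the table as ratios against the non-learning baseline makes the relative sizes of the effects easier to compare. The numbers are copied from the table; the helper name is just for illustration.

```python
# Feeding-time percentages copied from the table above.
feeding_time = {
    "No odor approach": 0.8,
    "No learning": 7.4,
    "LTD (cache)": 8.0,
    "LTP (classical)": 5.7,
}


def relative_to(baseline_name, results):
    """Ratio of each algorithm's feeding time to a baseline algorithm's."""
    baseline = results[baseline_name]
    return {name: value / baseline for name, value in results.items()}


for name, ratio in relative_to("No learning", feeding_time).items():
    print(f"{name}: {ratio:.2f}x")
```

This prints 0.11x, 1.00x, 1.08x, and 0.77x. Odor approach itself is roughly a 9x gain over no approach (7.4 / 0.8 = 9.25), the negative cache adds about 8% on top of that, and classical association loses about 23%.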

No odor approach

As a baseline, the first simulation disables all odor approach. The animal only reaches food when it runs into it randomly. While it’s above the food, the animal will slow, improving its efficiency somewhat. This strategy was explored in essay 14, and resembles the feeding of Trichoplax in [Smith et al. 2015].

Simulation heat map for odor-ignoring animal.

As the heat map shows, this strategy is pretty terrible. Because the animal only finds food by randomly crossing it, its success rate is purely a matter of the area covered by food. Although this strategy may have been effective with Precambrian bacteria mats, where finding food isn’t an issue, it’s a problem when finding food is a necessary task.

No learning

Intrinsic chemotaxis is important as a baseline for the learning strategies. In the fruit fly, intrinsic odor approach behavior is mediated by the lateral horn. When the MB is disabled, the lateral horn continues to drive odor approach.

Simulation heat map for non-learning odor-seeking animal.

As the above heat map shows, intrinsic odor approach is a vast improvement over non-chemotaxis, improving food time from 0.8% to 7.4% in this environment.

Negative caching (LTD)

The single item caching that’s the focus of this post improves the food time from 7.4% to 8% by avoiding some of the time spent on non-food odors. The difference isn’t as dramatic as adding odor approach itself, but it’s an improvement.

Simulation heat map for single-item negative cache

In this strategy, the animal remembers the last failed odor and ignores that odor’s plume the next time it reaches it. The animal explores all other odors, including other failure odors. On a cache miss (a new failure), the animal remembers the new failure and forgets the old one.

Classical conditioning (LTP)

The next strategy tries to simulate what classical conditioning might look like if it were used for behavior. In this simulation, the animal only follows the odor after it’s associated with food, which means the animal needs to randomly discover the food first.

Simulation heat map for single-item classical association learning

This strategy is actually worse than the non-learning case, because it only finds one food source at a time. Although the heat map shows both being visited, the areas are actually alternating. One source is the only found food for a long time until the other is randomly discovered, when the roles switch and the first is now ignored.
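The two single-item strategies differ only in their approach rule, which connects back to the default-approach vs. default-ignore distinction. This is a sketch with hypothetical function and odor names, not the simulation’s actual code.

```python
def negative_cache_approaches(odor, last_failure):
    """LTD strategy: default-approach, except the one remembered failure."""
    return odor != last_failure


def positive_cache_approaches(odor, last_success):
    """Classical (LTP) strategy: default-ignore, except the one odor paired with food."""
    return odor == last_success


candidates = ["A", "B", "C"]
print([o for o in candidates if negative_cache_approaches(o, last_failure="C")])  # ['A', 'B']
print([o for o in candidates if positive_cache_approaches(o, last_success="A")])  # ['A']
print([o for o in candidates if positive_cache_approaches(o, last_success=None)])  # []
```

The last line is the naive classical animal: with nothing associated yet, it follows no odor at all, and with one association it follows only that odor, which is why its visits alternate between food sources.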

Simulation limitations

I think it’s important to point out some simulation limitations, particularly since I’ve added performance numbers for comparison. The simulation environment and timings can affect the numbers dramatically. For example, the odor plume size dramatically affects the classical conditioning algorithm. If finding food without following the odor is difficult, the classical conditioning animal will have great difficulty finding a new odor.

Specifically, if the gain from following the odor is large, then classical conditioning will always have a penalty, because it loses out on that gain until it makes its association. In contrast, an explore-first strategy will always gain the odor-exploring advantage. If the gain of explore-first outweighs its cost, then a non-learning explore-first will win against associative learning.

cost_cache = p_miss * cost_miss + (1 - p_miss) * cost_hit
cost = cost_cache + cost_noncache

Consider the rough cache cost model above to see some of the issues with the negative cache. If the non-cacheable cost greatly outweighs the cache miss cost, then it doesn’t matter if the animal learns to avoid irrelevant odors. Contrariwise, if the miss cost is very large, then the miss rate is critical.
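Wrapping the cost model in a function makes the two regimes easy to compare. The parameter values below are made up purely to illustrate the two cases in the text; they are not taken from the simulation.

```python
def expected_cost(p_miss, cost_miss, cost_hit, cost_noncache):
    """Rough cost model from the text: cached part plus non-cacheable part."""
    cost_cache = p_miss * cost_miss + (1 - p_miss) * cost_hit
    return cost_cache + cost_noncache


# Regime 1 (assumed numbers): non-cacheable cost dominates, miss rate barely matters.
print(expected_cost(0.1, 5, 1, 100), expected_cost(0.9, 5, 1, 100))
# Regime 2 (assumed numbers): misses are expensive, miss rate dominates.
print(expected_cost(0.1, 100, 1, 5), expected_cost(0.9, 100, 1, 5))
```

In the first regime, raising the miss rate from 0.1 to 0.9 only moves the total from about 101 to about 105. In the second, the same change moves it from about 16 to about 95.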

In addition, the miss rate is highly dependent on spatial and temporal locality. If similar odors are tightly grouped, even a small cache will have a low miss rate. But if there are many different distractor types spread randomly, the cache will miss most of the time.
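The effect of locality on a single-item cache can be measured directly. This sketch assumes every encounter is a distractor (so each miss replaces the cached item), eight distractor types, and either tightly grouped runs of ten or uniformly random order; all of these are illustrative assumptions.

```python
import random


def miss_rate(encounters):
    """Fraction of distractor encounters not matching the single cached odor."""
    cached = None
    misses = 0
    for odor in encounters:
        if odor != cached:
            misses += 1
            cached = odor   # remember the newest failure, forget the old one
    return misses / len(encounters)


types = [f"D{i}" for i in range(8)]
grouped = [t for t in types for _ in range(10)]   # 80 encounters in runs of 10
rng = random.Random(0)
scattered = [rng.choice(types) for _ in range(1000)]

print(miss_rate(grouped))    # 0.1
print(miss_rate(scattered))  # close to 7/8 for uniformly random order
```

With grouped odors, only the first encounter in each run misses. With eight types in random order, the next distractor differs from the cached one about 7/8 of the time.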


References

Aso Y, Hattori D, Yu Y, Johnston RM, Iyer NA, Ngo TT, Dionne H, Abbott LF, Axel R, Tanimoto H, Rubin GM. “The neuronal architecture of the mushroom body provides a logic for associative learning.” Elife. 2014

Aston-Jones, Gary, and Jonathan D. Cohen. “An integrative theory of locus coeruleus-norepinephrine function: adaptive gain and optimal performance.” Annu. Rev. Neurosci. 28 (2005): 403-450.

Berry JA, Phan A, Davis RL. “Dopamine Neurons Mediate Learning and Forgetting through Bidirectional Modulation of a Memory Trace.” Cell Rep. 2018

Bouzaiane E, Trannoy S, Scheunemann L, Plaçais PY, Preat T. “Two independent mushroom body output circuits retrieve the six discrete components of Drosophila aversive memory.” Cell Rep. 2015 May 26;11(8):1280-92.

Lemon N, Manahan-Vaughan D. “Dopamine D1/D5 Receptors Contribute to De Novo Hippocampal LTD Mediated by Novel Spatial Exploration or Locus Coeruleus Activity.” Cerebral Cortex, Volume 22, Issue 9, September 2012, Pages 2131–2138.

Owald D, Felsenberg J, Talbot CB, Das G, Perisse E, Huetteroth W, Waddell S. “Activity of defined mushroom body output neurons underlies learned olfactory behavior in Drosophila.” Neuron. 2015 Apr 22

Smith CL, Pivovarova N, Reese TS. “Coordinated Feeding Behavior in Trichoplax, an Animal without Synapses.” PLoS One. 2015 Sep 2

Sossin, Wayne S. “Defining memories by their distinct molecular traces.” Trends in neurosciences 31.4 (2008): 170-175.

Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.
