While researching how the fruit fly might learn to ignore an initially-attractive odor, I ran into a difficulty: most papers aren't interested in that attractive-to-ignore transition. For this simulation, the lack of information means I have had to guess at what's plausible. One possible reason for the missing data might be an over-focus on the specifics of classical associative learning.
Foraging and odor-following task
The task here is finding food by following a promising odor. As explored in the essay 15 and essay 16 simulations, this odor-following task is more complicated than it first appears, because a naive solution leads to perseveration: never giving up on the odor. Perseveration is potentially fatal if the animal can't break away from a non-rewarding lead, an inability to accept failure. To avoid this fatal flaw, there needs to be a specific circuit to handle that failure; otherwise the animal will follow the odor forever. At the same time, the animal must spend some time exploring a potential food area before giving up (patience). This dilemma is similar to the explore/exploit dilemma in foraging theory and reinforcement learning. A state diagram for the odor-foraging task might look something like the following:
Behavior state transitions for following an odor to food.
This odor task starts when the animal detects the odor. The animal approaches the odor repeatedly until it either finds food, the odor disappears, or it gives up. Giving up is the interesting case here because it requires an internal state transition, while all other inputs cause stimulus-response transitions: the animal just reacts. Finding food triggers consummation (eating), and a disappearing odor voids the task; both are external stimuli. In contrast, giving up requires internal state circuitry to decide when to quit, a potentially difficult decision.
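As a concrete sketch, the state machine below encodes these transitions in code. The state names and the give-up flag are my own labels, not anything from the fly literature; the point is that only the give-up transition depends on internal state.

```python
from enum import Enum, auto

class OdorTaskState(Enum):
    IDLE = auto()       # no odor detected, default wandering
    APPROACH = auto()   # following the odor plume
    CONSUME = auto()    # food found, eating
    ABANDONED = auto()  # gave up or the odor disappeared

def step(state, odor_present, food_found, give_up_timer_expired):
    """One transition of the odor-foraging task.

    Every transition except giving up is stimulus-response: it depends only
    on external inputs. Giving up depends on an internal timer.
    """
    if state == OdorTaskState.IDLE and odor_present:
        return OdorTaskState.APPROACH
    if state == OdorTaskState.APPROACH:
        if food_found:
            return OdorTaskState.CONSUME      # external stimulus: food
        if not odor_present:
            return OdorTaskState.ABANDONED    # external stimulus: odor lost
        if give_up_timer_expired:
            return OdorTaskState.ABANDONED    # internal state: giving up
    return state
```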
Learning to ignore
The animal can improve its odor-following performance if it can learn to ignore false leads, as explored in essay 15. The animal follows intrinsically-attractive odors and if there’s a reward it continues to approach the odor. But if the animal doesn’t find food, it remembers the odor and ignores the odor the next time. The simulations in essay 15 showed the effectiveness of this strategy, improving food discovery.
Learning transitions for an intrinsically attractive odor.
In the learning diagram above, when the animal finds food at an odor, it maintains the intrinsic approach. When the animal doesn't find food at that odor, it ignores the odor the next time. This learning is not simple habituation, because it depends on the reward outcome, but it's also not classical associative learning.
Classical associative learning
Classical associative learning (Pavlovian) starts from an initial blank state (tabula rasa) and learns from reward to approach and from punishment to avoid. The null transition, no reward and no punishment, maintains the initial blank state, although this assumption is implicit and not discussed as an important part of the model. A number of points about the classical associative learning model:
Learning transitions for classical associative learning, adding the implicit non-reward transition.
First, the greyed-out transition is an assumption, often untested, or, if it is tested, dismissed as unimportant. For example, [Tempel et al. 1983] notes that OCT (a test odor) becomes increasingly aversive even without punishment (a shock), which contradicts the greyed transition, but doesn't incorporate that observation into the analysis. Similarly, [Berry et al. 2018] found that test odors show increasing LTD (long-term depression) even without punishment or reward, but relegates that observation to unpublished data, presumably because it was irrelevant to the classical model in the study.
Second, classical association is a blank slate (tabula rasa) model: the initial state is an ignore state. Although fruit flies have intrinsically attractive odors and intrinsically repelling odors, research seems to focus on neutral odors, possibly because classical association expects an initial ignore state. But starting with neutral odors means there's little data about learning with intrinsically attractive odors. For example, attractive odors might be impervious to negative learning. In fruit flies, the lateral horn handles non-learned behavior, such as the response to intrinsically attractive odors. The mushroom body (learning) might not be able to suppress the lateral horn's intrinsic behavior for those odors.
Learning transitions for a semi-classical situation where the non-reward transition learns aversion.
Third, in the fruit fly it's unclear whether a no-reward transition uses the same MBONs as a punishment transition, when the only negative-learning data comes from punishment studies. For example, both γ1-MBON and γ2-MBON are short-term-memory aversive-learning MBONs, as is possibly α'3-MBON. A study that tests the no-reward transition as well as the punishment transition can distinguish between the two. [Hancock et al. 2022] includes a non-reward test to narrow the punishment effect to γ1, and notes non-reward depression for γ2, γ3, and γ4, but still treats that non-associative depression as outside the scope of interest.
Alternative learning model
An alternative is to treat all learning transitions as equally important, as opposed to classical association's focus on only a few transitions.
Alternative learning where all transitions are treated as interesting.
In the above diagram, seven of the nine transitions are interesting, and even the two trivial transitions, rewarded intrinsic-approach and punished intrinsic-repel, are interesting in the context of the other transitions, because the implementing circuits might need to remember the reward.
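To make the nine transitions concrete, here is a minimal transition table in code. Only the transitions discussed in this essay are filled in; the rest are deliberately left as open questions rather than assumed.

```python
# Valence transition table: (current valence, outcome) -> next valence.
# Entries marked None are open questions, not claims.
TRANSITIONS = {
    ("approach", "reward"): "approach",  # trivial, but the circuit may still record the reward
    ("approach", "none"):   "ignore",    # the learning-to-ignore transition
    ("approach", "punish"): None,        # open question
    ("ignore",   "reward"): "approach",  # classical appetitive association
    ("ignore",   "none"):   "ignore",    # the implicit (greyed-out) classical assumption
    ("ignore",   "punish"): "repel",     # classical aversive association
    ("repel",    "reward"): None,        # open question
    ("repel",    "none"):   None,        # open question
    ("repel",    "punish"): "repel",     # trivial
}

def next_valence(current, outcome):
    new = TRANSITIONS.get((current, outcome))
    return current if new is None else new
```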
Foraging learning revisited
Returning to the original foraging learning problem:
Learning transitions for an intrinsically-attractive odor.
How might this transition be implemented in the fruit fly mushroom body? A single-neuron implementation might be possible if a reward can reverse a habituation-like LTD for the none transition, such as a dopamine spike leading to LTP (long-term potentiation). A dual-neuron implementation might use one neuron to store the reward vs. non-reward state and another to store a visited-recently trace, such as the surprise neuron α'3 [Hattori et al. 2017].
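A toy sketch of the dual-neuron idea, assuming one decaying visited-recently trace (the α'3-like surprise role) and one flag recording whether the visit ended in reward. The class name, decay constant, and 0.5 cutoff are illustrative, not biological values.

```python
class DualNeuronMemory:
    """Two-variable sketch: a novelty/visited trace plus a reward flag."""

    def __init__(self, decay=0.99):
        self.visited = 0.0     # decaying visited-recently trace
        self.rewarded = False  # outcome of the most recent visit
        self.decay = decay

    def on_visit_end(self, found_food):
        self.visited = 1.0
        self.rewarded = found_food

    def tick(self):
        self.visited *= self.decay

    def should_approach(self):
        # Ignore the odor only if it was visited recently AND that visit failed.
        return not (self.visited > 0.5 and not self.rewarded)
```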
Essay 16 extends the odor seeking (chemotaxis) of essay 15 by adding a single memory item. The memory caches a failed odor search, avoiding the cost of searching for false odors. The neuroscience source is the fruit fly Drosophila. The simulation is still based on a Braitenberg slug with distinct circuits for chemotaxis and for obstacle avoidance.
The fruit fly mushroom body (MB) is the learning center. MB is a modulating system: if it’s knocked out, the fruit fly behaves normally, although with only intrinsic, unlearned behavior. Essay 16 focuses on a single MBON short term memory (STM) output neuron, which may specifically be the γ2 neuron.
For simplicity and focus, essay 16 doesn't implement the KC layer yet. Instead, the γ2 MBON receives its input directly from the small number of odor projection neurons (PN) implemented in essay 15. Essentially, the input is a small set of primitive odors, where the full KC layer would represent a massive combinatorial odor spectrum.
Candidate odors from evolution
To motivate why evolution might develop learning, consider the food-seeking slug from essay 15. Since the food odor in essay 15 perfectly predicted food, there was no reason to learn anything about food. The simulation’s “evolution” has perfectly solved the artificially perfect world, selecting exactly those odors needed to find food.
Choosing the right set of candidate odors is a dilemma for evolution. Too many candidates means wasted search time. Too few candidates avoids wasted time but misses opportunities, which may be the smaller problem because the animal can fall back to random, brownian-motion search. This bias against including semi-predictive odors might mean that early Precambrian evolution only favored the highest predictors and skipped semi-predictive odors.
Candidate odor surrounded by distractors
The preceding image represents odors potentially available to the slug from an evolutionary design perspective. If the beige color is the only candidate for food, the slug will ignore the blue-ish colors because it never senses the odor. There’s no need for a circuit or behavior to distinguish the two. For the animal, those odors don’t exist.
Food odors don't perfectly predict food, either because of lingering odors or simply because candidate odors aren't always from nutritious food. For example, the fruit fly can taste sweetness and it can also sense nutrition from a rise in blood sugar. That distinction between sweet and nutritious is reflected in the mushroom body with specific neurons for each [Owald et al. 2015].
Classical association
For this essay, let's explore what could be the most trivial memory, in the context of the fruit fly MB. The MB output has only 34 neurons in 15 distinct compartments per hemisphere. Each compartment appears to have specialized roles, such as short term memory (STM) vs long term memory (LTM) [Bouzaiane et al. 2015], or water seeking compared to sugar seeking [Owald et al. 2015].
Although learning studies typically use classical association (Pavlovian) terminology, where a conditioned stimulus (CS) like the food odor becomes associated with an unconditioned stimulus (US) like consuming food, I don’t think that framing is useful for the odor-seeking behavior of the simulated slug.
Naive animal missing food before classical training
In the diagram above, which follows the classical model, the animal (arrow) misses the food (brown square) despite being in the candidate odor's plume, because it hasn't learned to associate the odor (CS) with the food (US). It only learns the association if it finds the food through brownian random search. Even then, if it randomly hits another food source with a different odor, it will forget the first, limiting the gain from this learning.
Even the non-learning algorithm of essay 14 performs better, because naively searching all candidate odors is relatively successful, even if slightly time-inefficient. Behaviorally, the difference is between default-approach and default-ignore: default-approach needs to learn to ignore, and default-ignore needs to learn to approach.
Learning to ignore
Learning to ignore is an alternative to the classical associative way of looking at the problem. It’s not an argument against classical conditioning in total, but it is a different perspective that highlights different features of the problem.
Successful approach to food and failure
The diagram above shows a successful approach to food and an unsuccessful approach. Both candidate odors potentially signal food because evolution ignores useless odors, but in this neighborhood the reddish odor is a non-food signal. As in essay 14, habituation will rescue the animal from perseveration, spending infinite time exploring a useless odor, but once an odor is found useless, ignoring it from the start would improve search efficiency.
Since odors in a neighborhood are likely similar, the animal is likely to encounter the useless odor soon. So, remembering a single item, like a single item cache, will improve the search by avoiding cost, until the animal reaches an area that does have nutritious food. The single-item cache lets the animal ignore patches of non-predictive odors.
Single item cache (short term memory)
A single mushroom body output neuron (MBON) and its associated dopamine neuron (DAN) can implement a single item cache by changing the weights of the KC to MBON synapses with long-term depression (LTD). Following the previous discussion, since it’s more efficient to remember the last failure than the last success, the learning is LTD at the synapse between the odor and the MBON. In fruit flies, short term memory (STM) is on the order of 2h. (For a fuller discussion between “short” and “long” term see [Sossin 2008].)
Reduced MB circuit for negative learning
In the above diagram, the O2 synapse with a ball represents the LTD cache item. If the animal senses odors for either O1 or O3, it approaches the odor. If it senses O2, it ignores the odor because of the LTD at the synapse. (The colors follow the mnemonic model. Purple represents primary sensor/odor, and blue represents apical/limbic/odor and motivation areas.)
The DAN needs to implement a failure signal to drive LTD, which is actually relatively complicated. Unlike success, which has an obvious direct stimulus when finding food, failure is ambiguous. How long the animal should persist before giving up is a difficult problem, and at the very least requires a timer, even for the simplest strategy. Because habituation already implements a timeout, an easy solution is to copy that circuit or possibly use its output. So, if the animal exits the odor plume because of habituation, the DAN might signal failure.
Another possible strategy is for the DAN to continuously degrade the active signal, as in habituation, and only rescue the synapse when discovering food. Results from [Berry et al. 2018] show the needed degradation over time (ramping LTD) in their study of MBON-γ2, although that study didn't explore the rescue of approach by finding food that we need here.
So, the second strategy might require a second, opposing neuron, which I’ll probably explore later. For this essay, the DAN will produce a failure signal on timeout and a success signal on finding food, something like a reward prediction error signal from reinforcement learning [Sutton and Barto 2018], but without using a reinforcement learning architecture.
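Putting the pieces together, here is a minimal sketch of the single-item LTD cache at the odor-to-MBON synapses, with the DAN signaling failure on timeout and success (rescue) on finding food. The weights, the 0.5 approach threshold, and the reset-everything-else rule are assumptions of the sketch, not measured fly parameters.

```python
class MbonCache:
    """Single-item negative cache as LTD on odor -> MBON synapses."""

    def __init__(self, n_odors, ltd_weight=0.0, baseline=1.0):
        self.w = [baseline] * n_odors   # odor -> MBON synaptic weights
        self.baseline = baseline
        self.ltd_weight = ltd_weight

    def dan_failure(self, odor):
        """DAN failure signal: the animal searched this odor and timed out."""
        # Single-item cache: restore any previously depressed synapse first.
        self.w = [self.baseline] * len(self.w)
        self.w[odor] = self.ltd_weight      # LTD at the active synapse

    def dan_success(self, odor):
        """DAN success signal: food found; rescue the synapse back to baseline."""
        self.w[odor] = self.baseline

    def approach(self, odor):
        # MBON output gates intrinsic approach; a depressed synapse means ignore.
        return self.w[odor] > 0.5
```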
Mammalian correlates
In mammals, the KC/MBON synapse with DAN modulation functionally resembles the hippocampus CA3-to-CA1 synapse (E.hc, E.ca1, E.ca3) with locus coeruleus (V.lc) modulation. V.lc is known as the primary source of noradrenaline, associated with surprise and orientation, but it also contains dopamine neurons and strongly innervates the hippocampus.
[Aston-Jones and Cohen 2005] discuss the locus coeruleus involvement in decision-making, specifically in explore vs exploit decision. If time passes and exploitation continues to fail by not finding food, V.lc signaling encourages moving on and exploring different options, a behavior similar to ours.
The E.ca3 to E.ca1 connection (and E.ec, entorhinal cortex) is believed to detect novelty, and V.lc is active during exploration. Like the fruit fly MBON, the hippocampus uses LTD to learn a new place, using the V.lc signal [Lemon et al. 2012] much like the DAN.
The comparison isn't exact: unlike my simulated slug, the E.hc novelty output doesn't directly drive food approach, because the mammalian brain is far more complex and abstract. Still, it's an interesting similarity.
Simulation: shared habituation
In essay 15, the simulated slug approached an intrinsically attractive odor to find food, but needed a habituation circuit to avoid perseveration. The fruit fly LN1 neurons between the ~50 main olfactory sensory neurons (ORN) and the ~150 olfactory projection neurons (PN) implement primary olfactory habituation. In this essay, I’m essentially adding a second odor to the system. Although the fruit fly has separate habituation circuits for each of the 50 primary odors, it’s interesting to see what a shared habituation circuit might look like in the simulation.
Shared habituation pre-learning circuit
The simulation heat map shows the animal spends much of its time between the odor plumes because the habituation timeout keeps refreshing, despite encountering different odors. While habituation is active, the animal doesn’t approach either odor plume, but mostly moves in the default semi-random pattern. Only when habituation times out will it approach a new odor.
Shared habituation. Warm colors are food-predicting odors, blue are distractors.
Split habituation
Unlike the previous simulation, the fruit fly has split habituation: each primary odor has an independent, synapse-specific habituation circuit.
Split habituation in pre-learning circuit
In the simulation of split habituation, the animal spends more time investigating the odors because each new odor has its own habituation timeout. It can move from a failed odor and immediately explore a new odor.
Simulation heat map for split habituation
Although the animal spends much of its time exploring the distractor candidate odors, it’s still a big improvement over random search, because it’s more likely to find food instead of a near miss.
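A sketch of the split habituation used in this simulation: each odor keeps its own habituation level and timeout. (The shared variant from the previous section would collapse the per-odor dictionary to a single level refreshed by any odor.) The charge/decay constants are placeholders, not tuned simulation values.

```python
class SplitHabituation:
    """Per-odor habituation: a failed odor times out on its own schedule."""

    def __init__(self, odors, charge=0.01, decay=0.005, threshold=1.0):
        self.level = {o: 0.0 for o in odors}
        self.charge, self.decay, self.threshold = charge, decay, threshold

    def tick(self, sensed_odors):
        for o in self.level:
            if o in sensed_odors:
                self.level[o] = min(self.level[o] + self.charge, 2.0)
            else:
                self.level[o] = max(self.level[o] - self.decay, 0.0)

    def approach_enabled(self, odor):
        # Other odors stay immediately explorable while one odor is habituated.
        return self.level[odor] < self.threshold
```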
Single distractor
Since a single distractor exactly fits the single item cache, it’s unsurprising that adding the cache immediately solves the distractor problem. In the following heat map, the animal only explores the successful candidate odor and ignores the distractor.
Simulation heat map for single-item negative cache
Multiple odors and distractors
Multiple distractor odors are more interesting for a single-item cache because they introduce the miss rate as a prominent issue and allow comparison between negative caching and positive caching (classical association). The table below summarizes feeding time as a success metric for each strategy.
| Algorithm | Feeding time |
| --- | --- |
| No odor approach | 0.8% |
| No learning | 7.4% |
| LTD (cache) | 8.0% |
| LTP (classical) | 5.7% |
Success comparison for multiple algorithms, measured by feeding time
No odor approach
As a baseline, the first simulation disables all odor approach. The animal only reaches food when it runs into it randomly. While it’s above the food, the animal will slow, improving its efficiency somewhat. This strategy was explored in essay 14, and resembles the feeding of Trichoplax in [Smith et al. 2015].
Simulation heat map for odor-ignoring animal.
As the heat map shows, this strategy is pretty terrible. Because the animal only finds food by randomly crossing it, its success rate is purely a matter of the area covered by food. Although this strategy may have been effective with Precambrian bacteria mats, where finding food isn’t an issue, it’s a problem when finding food is a necessary task.
No learning
Intrinsic chemotaxis is important as a baseline for the learning strategies. In the fruit fly, intrinsic odor-approach behavior resides in the lateral horn. When the MB is disabled, the lateral horn continues to approach odors.
Simulation heat map for non-learning odor-seeking animal.
As the above heat map shows, intrinsic odor approach is a vast improvement over non-chemotaxis, improving food time from 0.8% to 7.4% in this environment.
Negative caching (LTD)
The single item caching that’s the focus of this post improves the food time from 7.4% to 8% by avoiding some of the time spent on non-food odors. The difference isn’t as dramatic as adding odor approach itself, but it’s an improvement.
Simulation heat map for single-item negative cache
In this strategy, the animal remembers the last failure odor, and ignores the odor plume the next time it reaches it. The animal explores all other odors, including failure odors. On a cache miss (failure), the animal remembers the new failure and forgets the old one.
Classical conditioning (LTP)
The next strategy tries to simulate what classical conditioning might look like if it was used for behavior. In this simulation, the animal only follows the odor after it’s associated with food, which means the animal needs to randomly discover the food first.
Simulation heat map for single-item classical association learning
This strategy is actually worse than the non-learning case, because it only finds one food source at a time. Although the heat map shows both being visited, the areas are actually alternating. One source is the only found food for a long time until the other is randomly discovered, when the roles switch and the first is now ignored.
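Reduced to their decision rules, the two learning strategies look like this. Here `last_failed` and `associated` stand in for the single-item memories; everything else about the simulation is omitted.

```python
def ltd_cache_approach(odor, last_failed):
    # Default-approach: follow every candidate odor except the most recent failure.
    return odor != last_failed

def classical_ltp_approach(odor, associated):
    # Default-ignore: follow an odor only after it has been associated with food,
    # which requires finding the food by random search at least once.
    return odor == associated
```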
Simulation limitations
I think it’s important to point out some simulation limitations, particularly since I’ve added performance numbers for comparison. The simulation environment and timings can affect the numbers dramatically. For example, the odor plume size dramatically affects the classical conditioning algorithm. If finding food without following the odor is difficult, the classical conditioning animal will have great difficulty finding a new odor.
Specifically, if the gain from following the odor is large, then classical conditioning will always pay a penalty, because it loses out on that gain until it makes its association. In contrast, an explore-first strategy always gains the odor-exploring advantage. If the gain of explore-first outweighs its cost, then non-learning explore-first will beat associative learning.
Consider the rough cache cost model above to see some of the issues with the negative cache. If the non-cacheable cost greatly outweighs the cache miss cost, then it doesn’t matter if the animal learns to avoid irrelevant odors. Contrariwise, if the miss cost is very large, then the miss rate is critical.
In addition, the miss rate is highly dependent on spatial and temporal locality. If similar odors are tightly grouped, even a small cache will have a low miss rate. But if there are many different distractor types spread randomly, the cache will miss most of the time.
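The cache cost model itself isn't reproduced here, but a rough guess at its shape captures the argument: total search cost splits into a non-cacheable part (travel, obstacles) and a cacheable part weighted by the miss rate. All terms and values below are illustrative.

```python
def expected_search_cost(non_cacheable_cost, miss_rate, miss_cost, hit_cost=0.0):
    """Rough cache cost model: hits are cheap, misses pay the full search cost."""
    cacheable = miss_rate * miss_cost + (1.0 - miss_rate) * hit_cost
    return non_cacheable_cost + cacheable

# If non_cacheable_cost dominates, learning to avoid irrelevant odors barely helps;
# if miss_cost is large, the miss rate (set by spatial/temporal locality) dominates.
```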
Essay 15 adds food-seeking to the simulated slug. Before this change, in essay 14, the slug didn't seek food from a distance, though it did slow when above food to improve feeding efficiency: it had consummatory behavior but no food-approach behavior. Because the slug doesn't seek food, it only finds food when it randomly crosses a food tile. Most of its movement is random, except for avoiding obstacles.
In the screenshot above, the slug is moving forward with no food senses and no food approach. Its turns are for obstacle avoidance. The food squares have higher visitation because of the slower movement over food. Notice that all areas are visited, although there is a statistical variation because of the obstacle.
Although the world is tiled for simulation simplicity, the slug's direction, location, and movement are floating-point based; the simulation isn't an integer world. This continuous model means that timing and turn radius matter.
The turning radius affects behavior in combination with timing, like the movement-persistence circuit in essay 14. The tuning affects the heat map: some turning choices result in the animal spending more time turning in corners when the dopamine runs out. This turn-radius dependence occurs in animals as well. The larval zebrafish has 13 stereotyped basic movements, and its turns have different stereotyped angles depending on the activity.
Food approach
Food seeking adds complexity to the neural circuit, but it still uses a Braitenberg vehicle architecture. (See the discussion on odor tracking for avoided complexity.) Odor sensors stimulate muscles through uncrossed connections to approach the odor. Touch sensors use crossed connections to avoid obstacles. For simultaneous odor and touch, an additional command-neuron layer resolves the conflict in favor of touch.
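A minimal sketch of that wiring, assuming a muscle model where a contracting muscle bends the animal toward its own side; the gains and the hard touch-override are arbitrary choices for the sketch.

```python
def muscle_drive(odor_left, odor_right, touch_left, touch_right):
    """Return (left_muscle, right_muscle) contraction in [0, 1].

    Assumed muscle model: a contracting muscle bends the animal toward that side.
    """
    # Command layer: touch overrides odor when both are present.
    if touch_left or touch_right:
        # Crossed connections: a left touch contracts the right muscle,
        # bending the animal away from the obstacle.
        return (1.0 if touch_right else 0.0, 1.0 if touch_left else 0.0)
    # Uncrossed connections: the stronger odor side contracts its own muscle,
    # bending the animal toward the odor.
    return (odor_left, odor_right)
```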
The command neurons correspond to vertebrate reticulospinal neurons (B.rs) in the hindbrain. Interestingly, the zebrafish circuit does seem to have direct connections from the touch sensors to B.rs neurons, exactly as pictured. In contrast, the path from odor receptors to B.rs neurons is a longer, more complicated path.
For the slug's evolutionary parallel, the odor's attractiveness is hardcoded, as if evolution had selected an odor that leads to food. Even single-celled organisms follow attractive chemicals and avoid repelling chemicals, and in mammals some odors are hardcoded as attractive or repelling. For now, no learning is occurring.
Perseveration (tunnel vision)
Unfortunately, this simple food circuit has an immediate, possibly fatal, problem. Once the slug detects an odor, it’s stuck moving toward it because our circuit can’t break the attraction. Although the slug never stops, it orbits the food scent, always turning toward it. The hoped-for improvement of following an odor to find food is a disaster.
In psychology, this inability to switch away is called perseveration, which is similar to tunnel vision but more pathological. Once a goal is started, the person can’t break away. In reinforcement learning terminology, the inability to switch is like an animal stuck on exploiting and incapable of breaking away to explore.
In the screenshot, the heat map shows the slug stuck on a single food tile. The graph shows the slug turning counter-clockwise, slowed down (sporadically arrested) over the food.
To solve the problem, one option is to re-enable satiation, as was added in essay 14, but satiation only solves the problem if the tile has enough food to satiate the animal. Unfortunately, the tile might have a lingering odor but no food, or possibly an evolutionarily-selected food odor that's unreliable, only signaling food 25% of the time. Instead, we'll introduce habituation: the slug will become bored of the odor and start ignoring it for a time.
In the fruit fly, the timescale for habituation is about 20 minutes to build up and 20-40 minutes to recover. The exact timing is likely adjustable because of the huge variability in biochemical receptors. So, 20 minutes probably isn't a hard constant across different animals or circuits for habituation, but more of a range between a few minutes and an hour or so.
Because the minutes-to-hours timescale for habituation is much wider than the 5 ms to 2 s range for neurotransmitters, the biochemical implementation is very different. Habituation seems to occur by adding receptors to increase receptor count and/or adding more neurotransmitter release vesicles to produce a bigger signal.
Fruit fly odor habituation
[Das 2011] studies the biochemical circuit for fruit fly odor habituation. The following diagram tries to capture the essence of the circuit. The main, unhabituated connection is from the olfactory sensory neuron (ORN) to the projection neuron (PN), which projects to the mushroom body. The main fast neurotransmitter for insects is acetylcholine (ACh), represented by beige.
The key player in the circuit is the NMDA receptor on the PN neuron’s dendrite, which works with the inhibitory LN1 GABA neuron to increase inhibition over time to habituate the odor.
The LN1 neuron drives habituation. Its GABA release inhibits PN, which reduces the olfactory signal. Because LN1 itself can be inhibited, this circuit allows a quick reversal of habituation. Habituation goes beyond simple inhibition by increasing the synapse strength (weight) over time when it's used and decreasing the weight when it's idle.
An NMDA receptor needs both a chemical stimulus (glutamate and glycine) and a voltage stimulus (post-synaptic activation, of the PN in this case). When activated, it triggers a long biochemical chain with many genetic variations to change the synapse weight. In this case, it triggers a retrograde neurotransmitter (such as nitric oxide, NO) to the pre-synaptic LN1 axon, directing it to add new GABA release vesicles. Because adding new vesicles takes time (20 minutes) and removing them also takes time (20-40 minutes), habituation adds the longer-lasting behavior that will help solve the odor perseveration problem.
Slug with habituation
The next slug simulation adds trivial habituation following the example of the fruit fly. The simulation increments a habituation value when the slug senses an odor and decrements the value when it doesn't. When habituation crosses a threshold, the odor sensor is cut off from the command neurons. The behavior and heat map look like the following:
As the heat map shows, habituation does solve the fatal problem of perseveration for food approach. The slug visits all the food nodes without having any explicit directive to visit multiple nodes. Adding a simple habituation circuit creates behavior that looks like an explore-vs-exploit pattern without the simulation being designed around reinforcement learning. Explore vs exploit emerges naturally from the problem itself.
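For reference, the habituation counter and its gate reduce to a few lines; the charge, decay, and threshold constants here are arbitrary stand-ins for the simulation's tuned values.

```python
def update_habituation(level, odor_sensed, charge=0.01, decay=0.005):
    """Charge while any food odor is sensed, decay otherwise (clamped)."""
    return min(level + charge, 2.0) if odor_sensed else max(level - decay, 0.0)

def odor_reaches_command_neurons(level, threshold=1.0):
    # Above threshold the odor sensor is cut off from the command neurons, so
    # the slug falls back to default wandering plus obstacle avoidance,
    # breaking the perseveration loop.
    return level < threshold
```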
In the screenshot, the bright tile has no intrinsic meaning. Because habituation increases near any food tile, it can only decay away from food. That bright tile is near a big gap that lets habituation drop and odor approach recharge.
The essays aren’t designed as solutions; they’re designed as thought experiments. So, their main value is generally the unexpected implementation details or issues that come up in the simulation. For example, how habituation thresholds and timing affect food approach. The bright tile in the last heat map occurs because that food source is isolated from other food sources.
If the slug passes near food sources, the odor will continually recharge habituation and it won’t decay enough to re-enable chemotaxis. The slug needs to be away from food for some time for odor tracking to re-enable. In theory, this behavior could be a problem for an animal. Suppose the odor range is very large and the animal is very slow, like a slug. If simple habituation occurs, the slug might habituate to the odor before it reaches the food, making it give up too soon.
As a possible solution, the LN1 inhibitory neuron that implements habituation could itself be disabled, although that brings back the issue of the animal getting stuck. Perhaps the signal could instead be diminished rather than cut off entirely, giving the animal persistence without devolving into perseveration.
Odor vs actual food
Another potential issue is the detection of food itself as opposed to just its odor. If the food tile has food, the animal should have more patience than if the tile is empty with just the food odor. That scenario might explain why other habituation circuits include serotonin or dopamine as modulators. If food is actually present, there should be less habituation.
The issue of precise values raises calibration as an issue, because evolution likely can’t precisely calibrate cutoff values or calibrate one neuron to another. Some of the habituation and related synapse adjustment may simply be calibration, adjusting neuron amplification to the system. In a sense, that calibration would be learning but perhaps atypical learning.
Das, Sudeshna, et al. “Plasticity of local GABAergic interneurons drives olfactory habituation.” Proceedings of the National Academy of Sciences 108.36 (2011): E646-E654.
Shen Y, Dasgupta S, Navlakha S. "Habituation as a neural algorithm for online odor discrimination." Proc Natl Acad Sci U S A. 2020 Jun 2;117(22):12402-12410. doi: 10.1073/pnas.1915252117. Epub 2020 May 19. PMID: 32430320; PMCID: PMC7275754.
In the fruit fly Drosophila, the mushroom body is the olfactory associative learning hub for insects, similar to the amygdala (or hippocampus or piriform, O.pir) of vertebrates, learning which odors to approach and which to avoid. Although the mushroom body (MB) is primarily focused on olfactory senses, it also receives temperature, visual, and some other senses. The genetic markers for MB embryonic development in insects match the markers for cortical pyramidal neurons and other learning neurons like the cerebellar granule cells.
While most studies seem to focus on the mushroom body as an associative learning structure, others like [Farris 2011] compare it to the cerebellum for action timing and adaptive filtering.
Essay 15 will likely expand the essay 14 slug that avoided obstacles, adding odor tracking. The mushroom body should show how to improve the slug behavior without adding too much complexity.
The mushroom body is a highly structured system, comprised of Kenyon cells (KC), mushroom body output neurons (MBONs), and dopamine (DAN) and octopamine (OAN) neurons (collectively MBINs, mushroom body input neurons). The outputs are organized into output compartments, each with only one or two output neurons (MBONs) [Aso et al. 2014].
Mushroom body architecture. Diagram adapted from [Aso et al. 2014].
The 50 primary olfactory neurons (ORN) and 50 projection neurons (PN) project to 2,000 KC neurons. Each KC neuron connects to 6-8 PNs with claw-like dendrites. The 2,000 KC neurons then converge to 34 MBONs. In vertebrates, this divergence-convergence pattern occurs in the cerebellum (granule cells to Purkinje cells), in the hippocampus (dentate gyrus to CA3), in the striatum (medium spiny neurons, MSNs, to S.nr/S.nc, the substantia nigra), and in cortical areas with large granular layers.
Habituation (ORN, PN, LN1)
Olfactory habituation is handled by the LN1 circuit from the olfactory receptor neuron (ORN) to projection neuron (PN) connection. The fly will ignore an odor after 30 minutes, and the odor remains ignored for 30 to 60 minutes. Since the LN1 GABA inhibition neuron implements the habituation, suppressing LN1 can reverse habituation nearly instantly.
Habituation is odor-specific because it's synapse-based, even though a single neuron inhibits multiple odors. (There are multiple LN1 inhibitors, but they're not odor-specific; each inhibits multiple odors.) LN1 doesn't inhibit the PN as a whole; it inhibits the synapse from a specific ORN onto the target PN. So only firing PNs trigger habituation, despite relying on a shared inhibitory neuron [Shen 2020].
Sparse KC firing (KC, APL)
The 50 projection neurons (PNs) diverge onto 2,000 Kenyon cells (KC). Each KC has 6-8 claw-like dendrites contacting 6-8 PNs. Each fly has random connectivity between its PNs and KCs. The KCs that fire for an odor form a "tag," which is essentially a hash of the odor inputs.
About 5% of the KCs fire for any odor, which is strictly enforced by the APL circuit. APL is a single inhibitory GABA neuron per side that reciprocally connects with all 2,000 KCs. The APL inhibition threshold ensures only the strongest 5% of KCs will fire.
A computer algorithm by [Dasgupta et al. 2017] uses this KC tag for a "fly hash" algorithm, which computes a hash using a random, sparse connection for locality-sensitive hashing (LSH). LSH preserves information from the original data so that the distance between hashes reflects the distance between odors. Interestingly, they use the fly hash with visual input, treating pixels as equivalent to odors. Despite the significant differences between distinct odors and arbitrary pixels, the fly hash performs better than other similar LSH systems.
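A sketch of the fly-hash idea under those numbers: a sparse random projection (each simulated KC samples a handful of PNs) followed by an APL-like winner-take-all step that keeps only the top ~5% of KCs. This is a simplified illustration, not the published algorithm's exact implementation.

```python
import numpy as np

def fly_hash(pn_activity, n_kc=2000, claws=6, top_fraction=0.05, seed=0):
    """Hash a 1D vector of PN activities into a sparse boolean KC 'tag'."""
    rng = np.random.default_rng(seed)   # fixed seed = fixed connectivity per "fly"
    n_pn = pn_activity.shape[0]
    # Random sparse connectivity: each KC sums `claws` randomly chosen PNs.
    kc_input = np.zeros(n_kc)
    for k in range(n_kc):
        kc_input[k] = pn_activity[rng.choice(n_pn, size=claws, replace=False)].sum()
    # APL-like winner-take-all: only the strongest ~5% of KCs fire (the tag).
    n_active = int(top_fraction * n_kc)
    tag = np.zeros(n_kc, dtype=bool)
    tag[np.argsort(kc_input)[-n_active:]] = True
    return tag

# Similar odors share many active PNs, so their tags overlap; dissimilar odors
# produce mostly disjoint tags, which is the locality-sensitive property.
```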
Dopamine and compartments (MBON, DAN)
The output compartments for the mushroom body are broadly organized into attractive and repellant groups, with seven repelling compartments and nine attracting compartments. Three of the compartments (γ1-γ3) feed forward into other compartments.
Each compartment is fed by one or two dopamine neuron types (DAN), for a total of 20 DAN types across 16 MBON compartments. The repellant section (PPL1) is even more restricted, with exactly one dopamine neuron per compartment. The attractive section (PAM) has about 20 DAN cells per compartment.
In a study of larval Drosophila, [Eichler et al. 2017] mapped the entire connectome of the mushroom body and found highly specific reciprocal MBON connections: inhibitory, directly excitatory, and axonal excitatory.
An [Eschbach et al. 2020] study looked at dopamine (DAN) connectivity. Each DAN type has distinct inputs from raw value stimuli (unconditioned stimuli, US), from internal state, and from feedback from the mushroom body. So, while the DANs do convey unconditioned stimuli for learning, they're not a simplistic reward-and-punishment signal.
Essay 15 simulation direction
Essay 15 will likely continue the slug-like animal from essay 14, and add odor approaching. Because I think the slug will need habituation immediately, I’ll probably explore habituation first.
The essay will probably next explore a single KC to a single MBON, simulating a single odor.
I'm tempted to explore multiple MBON compartments before exploring multiple KCs, partly out of general contrariness: pushing against the simple reward/punishment model to see if it goes anywhere, or whether multiple dopamine sources needlessly complicate learning.
And, actually, since I want to push against learning as the solution for everything, I'd like to see how much of the mushroom body's function can work without any learning at all, treating habituation as short term memory rather than learning.
References
Aso Y, Hattori D, Yu Y, Johnston RM, Iyer NA, Ngo TT, Dionne H, Abbott LF, Axel R, Tanimoto H, Rubin GM. “The neuronal architecture of the mushroom body provides a logic for associative learning.” Elife. 2014 Dec 23;3:e04577. doi: 10.7554/eLife.04577. PMID: 25535793; PMCID: PMC4273437.
Dasgupta S, Stevens CF, Navlakha S. “A neural algorithm for a fundamental computing problem.” Science. 2017 Nov 10;358(6364):793-796. doi: 10.1126/science.aam9868. PMID: 29123069.
Eichler, Katharina, et al. “The complete connectome of a learning and memory centre in an insect brain.” Nature 548.7666 (2017): 175-182.
Eschbach, Claire, et al. “Recurrent architecture for adaptive regulation of learning in the insect brain.” Nature Neuroscience 23.4 (2020): 544-555.
Farris, Sarah M. “Are mushroom bodies cerebellum-like structures?.” Arthropod structure & development 40.4 (2011): 368-379.
Shen Y, Dasgupta S, Navlakha S. “Habituation as a neural algorithm for online odor discrimination.” Proc Natl Acad Sci U S A. 2020 Jun 2;117(22):12402-12410. doi: 10.1073/pnas.1915252117. Epub 2020 May 19. PMID: 32430320; PMCID: PMC7275754.