Associative Learning - ACT-R


An Updated Associative Learning Mechanism
Robert Thomson & Christian Lebiere, Carnegie Mellon University

Overview
- What is associative learning (AL) and why do we need it?
- History of AL implementation in ACT-R: Bayesian log-likelihood transformations
- From a Bayesian to a Hebbian implementation
- Recent neural evidence: spike-timing-dependent plasticity
- A balanced associative learning mechanism: Hebbian and anti-Hebbian associations, interference-driven decay
- Early results: serial order / sequence learning

What is Associative Learning?
Associative learning is one of two major forms of learning; the other is reinforcement, although they are not necessarily distinct kinds. It is a generalized version of classical conditioning: you mentally pair two stimuli (or a behavior and a stimulus) together. In Hebbian terms, things that fire together, wire together. ACT-R 6 currently does not have a functional associative learning mechanism implemented.

Why have Associative Learning?
It instantiates many major phenomena, such as:

- Binding of episodic memories / context sensitivity
- Anticipation of important outcomes
- Non-symbolic spread of knowledge
- Top-down perceptual grouping effects
- Sequence learning
- Prediction error (the Rescorla-Wagner learning assumption)

It is flexible, stimulus-driven, and order-dependent. Without associative learning it is very hard to chain together non-symbolic information, e.g., chunks that have no overlapping slot values yet are found in similar contexts, such as when learning unfamiliar sequences.

History of Associative Learning in ACT-R
In ACT-R 4 (and 5), associative learning was driven by Bayesian log-odds. The association strength (Sji) estimated the log-likelihood of how much the presence of chunk j (the context) increases the probability that chunk i will be retrieved.

Issues with the Bayesian Approach
Because the estimate is based on the log-likelihood of recall, if two chunks i and j are not associated together, the odds of one being recalled in the context of the other default to 50%. In a robust model, these chunks may have been recalled many times without ever being in context together. However, once the items do become associated, because of the low empirical ratio, the odds of recalling i in the context of j end up much lower than if the two had never been associated at all.

[Figure: association strength Sji, bounded above by Smax.]

ACT-R 6 Spreading Activation
The maximum spread (Smax) is set with the :mas parameter. Due to the log-likelihood calculation, high-fan items have their Sji become inhibitory, so mature models cannot recall high-fan items because of interference. This can lead to catastrophic failure.

[Figure: Sji as a function of fanj, starting at Smax and dropping below zero for high-fan items.]
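For orientation, the two forms of association strength being contrasted can be written out explicitly. These are the commonly cited textbook forms; the exact parameterization in any particular ACT-R release may differ, so treat the equations as a summary rather than the implementation:

```latex
% Commonly cited forms of ACT-R association strength.
\begin{align}
  S_{ji} &= \ln\!\frac{P(N_i \mid C_j)}{P(N_i)}
    && \text{ACT-R 4: log-likelihood that context } j \text{ predicts retrieval of } i \\
  S_{ji} &\approx S_{\max} - \ln(\mathit{fan}_j)
    && \text{ACT-R 5/6: } S_{\max} \text{ is set by the :mas parameter}
\end{align}
```

The second line makes the catastrophic-failure point concrete: once fanj exceeds e^Smax, Sji goes negative and the context item actively inhibits the very retrieval it is supposed to support.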

From Bayesian to Hebbian
Really, the Bayesian approach of ACT-R 4 is not that different from more neurally-inspired Hebbian learning. In both cases, things that fire together, wire together.

[Diagram: a retrieval request (+retrieval> ISA action light green) against declarative memory, retrieving the associated chunks GREEN and GO.]

When looking to update associative learning in ACT-R 6, we turned to recent developments in neural Hebbian-style learning. Recent work on spike-timing-dependent plasticity inspired a re-imagining of INHIBITION in ACT-R associative learning.

Traditional (Neural) Hebbian Approaches

Before getting too deep into our approach, here is some necessary background.

Synchronous: neurons that fire together wire together. The change in wij is a rectangular time window: the synaptic association is increased if the pre- and post-synaptic neurons fire within a given temporal resolution of each other.

[Figure: rectangular learning window for the change in wij, equal to 1 when |tj - ti| falls within the window and 0 outside it.]

Traditional Hebbian Approaches (continued)
Asynchronous: the change in wij is a Gaussian window, which is very useful in sequence learning (Gerstner & van Hemmen, 1993). The synaptic association is increased if the pre-synaptic spike arrives just before the post-synaptic spike (partially causal firing).

[Figure: Gaussian learning window centered just after the pre-synaptic spike.]

Recent Neural Advances: Spike-Timing-Dependent Plasticity
A spike-based formulation of Hebbian learning. If the pre-synaptic firing occurs just before the post-synaptic firing, we get long-term potentiation. However, if the post-synaptic firing occurs just before the pre-synaptic firing, we get long-term depression (anti-Hebbian learning).
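As a concrete illustration of the shape of such a learning window, here is a generic textbook STDP rule in Python. It is not the mechanism proposed in these slides, and the amplitudes and time constants are made-up illustrative values:

```python
import math

def stdp_weight_change(dt_ms, a_plus=0.10, a_minus=0.12,
                       tau_plus=20.0, tau_minus=20.0):
    """Generic STDP window. dt_ms = t_post - t_pre: positive when the
    pre-synaptic spike precedes the post-synaptic spike (causal ordering)."""
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus)    # LTP: pre-before-post
    elif dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus)  # LTD: post-before-pre
    return 0.0

if __name__ == "__main__":
    for dt in (-40, -10, -1, 1, 10, 40):
        print(f"dt = {dt:+4d} ms -> dw = {stdp_weight_change(dt):+.4f}")
```

The asymmetry around zero is the point: the same pair of spikes strengthens or weakens the synapse depending purely on their temporal order.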

Neural Evidence
Post-synaptic NMDA receptors use a calcium-channel signal that is largest when the back-propagating action potential arrives shortly after the synapse was active (pre-post spiking). This triggers LTP, similar to asynchronous Hebbian learning. The same NMDA receptors trigger LTD when the back-propagating action potential arrives BEFORE the pre-synaptic activity (post-pre spiking), as seen in hippocampal CA1 neurons (Wittenberg & Wang, 2006). This is different from GABAergic inhibitory interneurons, which have also been studied extensively throughout cortical regions and which, I would argue, are more like partial matching / similarity.

From Bayesian to Hebbian, Revisited
We have just reviewed some interesting evidence for timing-dependent excitation AND inhibition.

Why is inhibition so important?
1. There needs to be a balance in activation.
2. It is neurally relevant (and necessary).
3. The alternatives are not neurally plausible.
But we have waited long enough, so let's proceed to the main event.

A Balanced Associative Learning Mechanism
Instead of pre-synaptic and post-synaptic firing, we look at:
1. The state of the system when a retrieval request is made
2. The state of the system after the chunk is placed in the buffer

[Diagram: a retrieval request (+retrieval> ISA action color green) against declarative memory containing the chunks GO-1 and GREEN-1.]

Hebbian learning occurs when a request is made; anti-Hebbian learning occurs after the retrieval. A minimal sketch of the two phases follows.
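The sketch below assumes a toy representation in which chunks are named strings, contexts are sets of slot values, and associations live in a flat table. The function names, the even per-slot division of the spread, and the size of the inhibition are illustrative assumptions, not the actual ACT-R implementation:

```python
from collections import defaultdict

# Toy association table: sji[(source_value, chunk_name)] -> strength.
sji = defaultdict(float)

SPREAD = 3.0      # total positive spread per request (assumed; akin to :mas)
INHIBITION = 3.0  # total self-inhibition after a retrieval (assumed)

def on_retrieval_request(context_values, requested_chunk):
    """Hebbian phase: when the request is made, the current context (the
    buffer contents) becomes associated with the chunk about to be
    retrieved; the spread is divided evenly over the source values."""
    share = SPREAD / max(len(context_values), 1)
    for value in context_values:
        sji[(value, requested_chunk)] += share

def on_retrieval_complete(retrieved_chunk, own_values):
    """Anti-Hebbian phase: after the chunk lands in the buffer, it inhibits
    its own future retrieval by weakening the associations from its own
    slot values back to itself."""
    share = INHIBITION / max(len(own_values), 1)
    for value in own_values:
        sji[(value, retrieved_chunk)] -= share

# Sequence-learning flavour: the request for chunk "513" is made while the
# elements of "613" are still in context, then "513" inhibits itself.
on_retrieval_request({"6", "1", "3"}, "513")
on_retrieval_complete("513", {"5", "1", "3"})
print(dict(sji))
```

Note how the overlapping elements ("1" and "3") receive both a positive and a negative update and cancel out; the slides return to exactly this issue under "What do we Inhibit?".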

Hebbian Learning Component
The positive spread is initially based on a set amount (similar to :mas) divided evenly by the number of slots in the source chunk. This is subject to change as we implement it in ACT-R.

Ideally I would like this to be driven by base-level / pre-existing associative strength (variants of the Rescorla-Wagner learning rule and the Hebbian delta rule); interference-driven decay is another possibility. The sources are the contents of the buffers. One change we made was to have the sources be only the difference in context, for reasons we will get into.

Anti-Hebbian Learning Component
It is intuitive to think that a retrieved chunk spreads activation to itself; that is how ACT-R currently does it. However, this tends to make the most recently retrieved chunk the most likely to be retrieved again (given a similar retrieval request). You can easily get into some pretty nasty loops where the chunk is so active that you cannot retrieve any other chunk; base-level inhibition and declarative FINSTs only somewhat counteract this.

Anti-Hebbian Learning Component (continued)
Instead, we turned this assumption on its head: a retrieved chunk inhibits itself while spreading activation to associated chunks. By self-inhibiting the chunk you just retrieved, you can see how this could be applied to sequence learning: retrieved chunks spread activation to the next item in the sequence while inhibiting their own retrieval. This is a nice sub-symbolic / mechanistic re-construing of base-level inhibition. It could also be seen as a neural explanation for the production system matching a production and then advancing to the next state.

Anti-Hebbian Learning Component (continued)
The main benefit of having an inhibitory association spread is that it provides balance with the positive spread. This helps keep the strength of associations in check (i.e., keeps them from growing exponentially) for commonly retrieved chunks. Still, we have not yet said exactly what we are going to inhibit!

What do we Inhibit?
You could just inhibit the entire contents of the retrieved chunk. In pilot models of sequence learning, if the chunk contents were not very unique, the model would tend to skip over chunks: the positive spread would be cancelled out by the negative spread.

In the example below, assume each line is +1 or -1 of spread.

[Figure: recalling the list (6 1 3) (5 1 3) (8 6 8) ..., showing the spreading activation (SA) and inhibition (IN) received by each element when the whole chunk is used as the source; the positive and negative contributions largely cancel.]

Context-Driven Effects
When lists have overlapping contexts (i.e., overlapping slot values), there

are some interesting effects:
1. If anti-Hebbian inhibition is spread to all slots, then recall tends to skip over list elements until a sufficiently unique context is reached.
2. If anti-Hebbian inhibition is spread only to the unique context, then there is a smaller fan, which facilitates sequence-based recall.
The amount of negative association spread is the same; the difference is just how diluted the spread is.

How else could we Inhibit?
Instead, we attempted to spread and inhibit only the unique context. This sharpened the associations and led to better sequence recall; as you can see, you get more distinct associations in sequence learning. Essentially, you (almost) always get full inhibition of the previously recalled chunk. A minimal code sketch of this variant follows the figure.

[Figure: the same (6 1 3) (5 1 3) (8 6 8) example with spread and inhibition restricted to the unique context, giving sharper per-element associations (e.g., +3 and -3 on the distinguishing elements).]
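The sketch below is my reading of the unique-context variant: the source of the positive spread is restricted to the slot values that distinguish the previous context, and the inhibition is routed only through the retrieved chunk's own distinguishing values. The set-difference rule, the function name, and the constants are assumptions for illustration, not the implemented mechanism:

```python
from collections import defaultdict

sji = defaultdict(float)  # sji[(source_value, chunk_name)] -> strength

def unique_context_update(prev_values, curr_values, retrieved_chunk,
                          spread=3.0, inhibition=3.0):
    """Spread activation to the retrieved chunk only from slot values unique
    to the previous context, and inhibit it only through its own unique slot
    values, so overlapping elements neither help nor hurt."""
    unique_sources = set(prev_values) - set(curr_values)
    unique_targets = set(curr_values) - set(prev_values)
    if unique_sources:
        share = spread / len(unique_sources)
        for value in unique_sources:
            sji[(value, retrieved_chunk)] += share
    if unique_targets:
        share = inhibition / len(unique_targets)
        for value in unique_targets:
            sji[(value, retrieved_chunk)] -= share

# Example: chunk "513" (5 1 3) retrieved after "613" (6 1 3); only the
# non-overlapping elements "6" (source) and "5" (target) carry any spread.
unique_context_update({"6", "1", "3"}, {"5", "1", "3"}, "513")
print(dict(sji))  # {('6', '513'): 3.0, ('5', '513'): -3.0}
```

Because the overlapping elements are excluded entirely, the full +3/-3 lands on the distinguishing elements, which is the sharpening effect the slide describes.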

Differences from the Bayesian Approach
By moving away from log-likelihood and into a pure Hebbian learning domain, we have eliminated the issue of high-fan items receiving negative spread. This move also lets us model inhibition in a neurally plausible manner: you cannot easily model negative likelihoods (inhibition) using a log-based notation, because negative activations quickly spiral out of control. I know someone still wants to ask: why do we NEED to model inhibition?

Issues with Traditional Approaches
Traditional Hebbian learning only posited a mechanism to strengthen associations, leaving modelers to deal with very high associative activations in mature models. You need to balance activations!

Three(ish) general balancing acts:
1) Squash: fit raw values to a logistic/sigmoid-type distribution
2) Decay: have activations decay over time
3) Do both

Squashing Association Strength
Most traditional Hebbian-style learning implementations are not very neurally plausible, in that our brains do not handle stronger and stronger signals as we learn; many cell assemblies require some form of lateral inhibition to specialize. Squashing association strength, generally to a [0, 1] or [-1, 1] range, also is not very neurally plausible. Let's look at an example.

Squashing Associations
[Animation slide: raw association strengths being squashed into a bounded range.]

Squashing Association Strength (continued)
It looks silly as an animation, but it is what a lot of implementations do. Instead of squashing to a non-linear distribution, we should be trying to find a balance where associative learning is more or less zero-sum. That is what our mechanism attempts to do, by balancing excitatory and inhibitory associations. The goal is to specialize chunk associations by serializing/sequencing recall. The degree of association gain will be based on the prior associative strength and/or the base-level of the involved chunks.
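For concreteness, the squashing option criticized above (option 1 in the list) usually amounts to something like the following generic logistic squash; the gain and output range are arbitrary illustrative values, not taken from these slides:

```python
import math

def squash(raw_strength, lo=-1.0, hi=1.0, gain=1.0):
    """Map an unbounded raw association strength onto the (lo, hi) interval
    with a logistic function, so ever-growing Hebbian sums stay bounded."""
    return lo + (hi - lo) / (1.0 + math.exp(-gain * raw_strength))

if __name__ == "__main__":
    for raw in (0.5, 2.0, 10.0, 100.0):
        print(f"raw = {raw:6.1f} -> squashed = {squash(raw):+.3f}")
```

The balanced Hebbian / anti-Hebbian scheme above is meant to make this kind of after-the-fact rescaling unnecessary, since the positive and negative updates already roughly cancel.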

Interference-Driven Decay
Another alternative to squashing is interference-driven decay: decay based on interference due to list length. As the number of items to recall in a similar context grows, the amount of activation spread is reduced. We also have a variant based on list length and recency. The results fit a power-law decay function (shown below). Further work will find the balance between interference-driven and temporal-based decay. I prefer an expectancy-driven associative system in which highly associated chunks do not get a big boost; this may be modeled similarly to how base-level activation is calculated.

Interference-Driven Decay (continued)
[Figure: P(recall) as a function of the number of lists (0-25), with the power-law fit f(x) = 0.89 x^-0.88, R² = 0.97.]
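The fitted curve from the plot can be evaluated directly; a tiny sketch, with the functional form and coefficients read off the slide's trendline:

```python
def p_recall(num_lists):
    """Power-law fit to the interference-driven decay data:
    P(recall) = 0.89 * x^-0.88 (fit reported with R^2 = 0.97)."""
    return 0.89 * num_lists ** -0.88

if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(f"{n:2d} lists -> P(recall) ~ {p_recall(n):.2f}")
```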

An Example: Serial Order Effects
Recall a list of 8 chunks of 3 elements each, in sequence. Assume a spread of 3, no full-chunk repetition, and no within-chunk confusion.

[Table: example chunks such as (8 0 6), (4 9 1), (6 7 5), (5 0 5), (3 2 4), each acquiring associations of about +1.0 to the elements of the following chunk and about -1.0 (down to -2.0 where elements repeat across chunks) to its own elements.]

Serial Order: Confusion Matrix
We get serial order for free from the context-driven asynchronous spread of activation; this was an emergent property of the model that was not expected. Rows are chunk order, columns are recall position:

        1      2      3      4      5      6      7      8
  1   0.865  0.000  0.025  0.025  0.015  0.025  0.020  0.020
  2   0.025  0.780  0.005  0.055  0.040  0.045  0.040  0.055
  3   0.015  0.030  0.710  0.010  0.080  0.080  0.045  0.050
  4   0.015  0.050  0.045  0.645  0.030  0.090  0.090  0.055
  5   0.005  0.045  0.050  0.045  0.585  0.045  0.120  0.085
  6   0.020  0.025  0.070  0.065  0.065  0.540  0.045  0.135
  7   0.030  0.035  0.050  0.085  0.090  0.070  0.545  0.075
  8   0.025  0.035  0.045  0.070  0.095  0.105  0.095  0.525

Positional Confusion
In the ACT-R 4 model of list memory, position was explicitly encoded and similarities were explicitly set between positions, e.g. (set-similarities pos-3 pos-4 .7). Interestingly, with our model of associative learning you get some positional confusion for free out of the asynchronous nature of the learning. You do not get a fully developed Gaussian drop-off, but things like rehearsal and base-level decay are not modeled yet.

[Figure: positional confusion for a 5-element list, P(Recall) by position, one curve per list element (First through Fifth).]

Future Plans / Open Questions

- How will we merge associations?
- Which buffers will be sources of association and which will use associative learning?
- Optimize processing costs?
- Use associative learning to replicate classical-conditioning experiments
- Extend to episodic-driven recall
- Use association to drive analogical reasoning
