Rescorla-Wagner model
Robert Rescorla (2008), Scholarpedia, 3(3):2237. doi:10.4249/scholarpedia.2237, revision #91712
The Rescorla-Wagner model is a formal model of the circumstances under which Pavlovian conditioning occurs. It attempts to describe the changes in associative strength (V) between a signal (conditioned stimulus, CS) and the subsequent stimulus (unconditioned stimulus, US) as a result of a conditioning trial. The model emerged in the early 1970s (Rescorla and Wagner 1972) as an attempt to deal with empirical results suggesting that the idea of simple co-occurrence of two events, important in historical philosophical, psychological, and biological thinking, was inadequate. Conditioning phenomena such as "blocking", "relative validity", "correlational effects" and "conditioned inhibition" all suggested that associative mechanisms do not simply count co-occurrences but rather evaluate those co-occurrences in a broader context of the stream of events. Casually put, associative learning occurs not because two events co-occur but because that co-occurrence is unanticipated on the basis of current associative strength. The Rescorla-Wagner model attempts to capture that casual idea in a more formal way.
The Model
The model can be viewed as a modification of prior "linear operator" models which describe changes in knowledge as a linear function of current knowledge. It modified such models in two basic ways:
- It described changes in a theoretical associative strength, rather than in overt probabilities directly, and, more importantly,
- it provided a learning rule that made associative changes in each stimulus dependent not only on its own state but also on the state of other stimuli concurrently present.
Consequently, on a learning trial in which a compound stimulus, \(AX\), is followed by US\(_1\), the rules for change in the associative strengths of \(A\) and \(X\) are: \[\Delta V_A = \alpha_A\beta_1(\lambda_1 - V_{AX})\] and \[\Delta V_X = \alpha_X\beta_1(\lambda_1 - V_{AX})\] where \[V_{AX} = V_A + V_X.\] In these expressions \(\lambda_1\) is the maximum conditioning that US\(_1\) can produce; it represents the limit of learning. The \(\alpha\) and \(\beta\) are rate parameters depending, respectively, on the CS and the US; they are viewed as having fixed values determined by the physical properties of the particular CS and US. On any given trial the current associative strength of the compound, \(V_{AX}\), is compared with \(\lambda\), and the difference is treated as an error to be corrected by a corresponding change in associative strength, \(\Delta V\). The model is thus an error-correction model.
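As a concrete illustration, the following is a minimal Python sketch of a single trial under these equations; the function name rw_update and all parameter values are assumptions chosen for illustration, not values from the original paper.

```python
# A minimal sketch of one Rescorla-Wagner trial (illustrative; rw_update and
# the parameter values below are assumptions, not from the original paper).

def rw_update(V, present, alpha, beta, lam):
    """Apply one trial: V maps each stimulus to its associative strength,
    present is the set of stimuli on this trial, alpha the CS saliences,
    beta the US rate parameter, and lam the asymptote the US supports."""
    V_compound = sum(V[s] for s in present)   # e.g. V_AX = V_A + V_X
    error = lam - V_compound                  # common error term for the trial
    for s in present:
        V[s] += alpha[s] * beta * error       # Delta V_s
    return V

V = {"A": 0.0, "X": 0.0}
alpha = {"A": 0.5, "X": 0.5}
for _ in range(20):                           # 20 reinforced AX trials
    rw_update(V, {"A", "X"}, alpha, beta=0.2, lam=1.0)
print(V)  # with equal saliences, each strength approaches lambda/2
```

Because the error term uses the summed strength of all cues present, each cue's change depends on its companions; that shared error term is what produces blocking and the other compound-stimulus effects described below.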
Examples
Blocking
The Kamin blocking effect, in which prior pairing of one stimulus \(A\) with the US renders ineffective the subsequent pairing of the compound \(AX\) with the US, is a good illustration. Prior conditioning of \(A\) results in \(V_A\) being close to \(\lambda\); then on an \(AX\) trial, because \(V_X\) is zero, \(V_{AX}\) is close to \(\lambda\), yielding an error term \((\lambda-V_{AX})\) close to zero; hence \(\Delta V_X\) is close to zero and there is little resulting change in \(V_X\).
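A short simulation makes this account concrete. The two-phase schedule below follows the blocking paradigm, but the trial function, trial counts, and parameter values are illustrative assumptions:

```python
# Blocking under the model: Phase 1 trains A alone; in Phase 2 the AX
# compound is reinforced, but the error term is already near zero.
# (alpha, beta, lambda, and trial counts are illustrative assumptions.)

def trial(V, present, lam, alpha=0.3, beta=1.0):
    error = lam - sum(V[s] for s in present)
    for s in present:
        V[s] += alpha * beta * error

V = {"A": 0.0, "X": 0.0}
for _ in range(50):                 # Phase 1: A -> US drives V_A toward lambda
    trial(V, {"A"}, lam=1.0)
for _ in range(50):                 # Phase 2: AX -> US, but lambda - V_AX is near 0
    trial(V, {"A", "X"}, lam=1.0)
print(round(V["A"], 3), round(V["X"], 3))   # about 1.0 and 0.0: X is blocked
```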
Correlational Experiments
Correlational experiments can be understood in the same way, with situational cues playing the role of \(A\): little conditioning of \(X\) occurs, despite USs being presented during \(X\), when the US also occurs in \(X\)'s absence.
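In a sketch of such an arrangement (the context cue \(C\), the alternating trial schedule, and the parameters below are illustrative assumptions), the context absorbs the associative strength and \(X\) gains almost none:

```python
# Zero-contingency arrangement: the context C is present on every trial and
# the US occurs whether or not X is present, so C plays the role of A.
# (The schedule and parameter values are illustrative assumptions.)

def trial(V, present, lam, alpha=0.2, beta=1.0):
    error = lam - sum(V[s] for s in present)
    for s in present:
        V[s] += alpha * beta * error

V = {"C": 0.0, "X": 0.0}
for _ in range(200):
    trial(V, {"C", "X"}, lam=1.0)   # US during X (context also present)
    trial(V, {"C"}, lam=1.0)        # US equally often in X's absence
print(round(V["C"], 3), round(V["X"], 3))   # about 1.0 and 0.0
```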
Conditioned Inhibition
Conditioned inhibition is described as a stimulus with a negative \(V\), which therefore reduces the total positive associative strength on a trial. It is seen as resulting from a paradigm in which \(A\)-US trials are intermixed with nonreinforced \(AX\) trials. The \(A\)-US trials bring \(V_A\) close to \(\lambda\); when the neutral (i.e. \(V = 0\)) stimulus \(X\) is then added to \(A\), the total \(V_{AX}\) is also close to \(\lambda\). But the asymptote that nonreinforcement can support is 0, so the error term on the \(AX\) trials is \((0-V_{AX})\), a negative number that decrements \(V_X\) below zero, making it negative.
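The same kind of sketch shows \(V_X\) driven below zero by the mixed schedule (trial counts and parameter values are, again, illustrative assumptions):

```python
# Conditioned inhibition: reinforced A trials are intermixed with
# nonreinforced AX trials, whose asymptote is lambda = 0.
# (Trial counts and parameter values are illustrative assumptions.)

def trial(V, present, lam, alpha=0.2, beta=1.0):
    error = lam - sum(V[s] for s in present)
    for s in present:
        V[s] += alpha * beta * error

V = {"A": 0.0, "X": 0.0}
for _ in range(300):
    trial(V, {"A"}, lam=1.0)        # A -> US keeps V_A near lambda
    trial(V, {"A", "X"}, lam=0.0)   # AX -> nothing: error is -(V_A + V_X)
print(round(V["A"], 3), round(V["X"], 3))   # about 1.0 and -1.0: X is an inhibitor
```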
This model predicted a number of previously unknown results, mainly arising from its central feature that it is not the power of the US per se but rather the discrepancy between the US and current associative strength that determines learning: e.g., overexpectation (a decrement in strength despite reinforcement), superconditioning (superior conditioning from reinforcement in the presence of an inhibitor), and protection from extinction (when nonreinforcement is carried out in the presence of an inhibitor).
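Overexpectation, for example, falls out directly. In the sketch below (schedule and parameters are illustrative assumptions), \(A\) and \(B\) are each trained to asymptote separately, and reinforcing the \(AB\) compound then decrements both:

```python
# Overexpectation: after A and B are each trained to asymptote, V_A + V_B
# exceeds lambda on AB trials, so the error term is negative and both
# strengths decline despite continued reinforcement.
# (Trial counts and parameter values are illustrative assumptions.)

def trial(V, present, lam, alpha=0.2, beta=1.0):
    error = lam - sum(V[s] for s in present)
    for s in present:
        V[s] += alpha * beta * error

V = {"A": 0.0, "B": 0.0}
for _ in range(100):
    trial(V, {"A"}, lam=1.0)        # A -> US
    trial(V, {"B"}, lam=1.0)        # B -> US
for _ in range(100):
    trial(V, {"A", "B"}, lam=1.0)   # AB -> US: V_AB starts near 2 * lambda
print(round(V["A"], 3), round(V["B"], 3))   # each falls toward lambda/2
```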
Shortcomings
This model has a number of known shortcomings, such as the failure to correctly predict the conditions under which inhibition is extinguished. It also ignores a number of important features of conditioning paradigms, such as detailed temporal relations. Nevertheless, it continues to be widely cited in textbooks as a good summary of many of the most important Pavlovian conditioning phenomena, and it has served as the core around which many subsequent conditioning models have been built. It is essentially identical to the learning algorithm of Widrow and Hoff (1960), which closely corresponds to the delta rule implemented in many connectionist networks.
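To illustrate the correspondence, here is a vector-form sketch of the Widrow-Hoff (LMS, or delta) rule; with a binary stimulus vector and a single learning rate it reproduces the Rescorla-Wagner updates above (the variable names and the blocking-style schedule are assumptions for illustration):

```python
# The Widrow-Hoff / LMS ("delta") rule in vector form. With x a 0/1 vector
# of which cues are present and target = lambda, this is the RW update with
# one shared learning rate. (Names and schedule are illustrative assumptions.)

import numpy as np

def lms_step(V, x, target, rate=0.1):
    prediction = V @ x                            # summed strength of present cues
    return V + rate * (target - prediction) * x   # delta rule update

V = np.zeros(2)                                   # weights for cues A and X
for _ in range(100):
    V = lms_step(V, np.array([1.0, 0.0]), target=1.0)   # Phase 1: A -> US
for _ in range(100):
    V = lms_step(V, np.array([1.0, 1.0]), target=1.0)   # Phase 2: AX -> US
print(V)   # roughly [1.0, 0.0]: X is blocked, as in the RW account
```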
References
- Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In: Classical Conditioning II: Current Research and Theory (Eds Black AH, Prokasy WF) New York: Appleton Century Crofts, pp. 64-99, 1972
- Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998
- Widrow B, Hoff ME. Adaptive switching circuits. In: 1960 WESCON Convention Record Part IV, New York: Institute of Radio Engineers, pp. 96-104, 1960 (Reprinted in Neurocomputing: Foundations of Research (Eds Anderson JA, Rosenfeld E) Cambridge, MA: MIT Press, pp. 126-134, 1988)
Internal references
- Peter Redgrave (2007) Basal ganglia. Scholarpedia, 2(6):1825.
- Nestor A. Schmajuk (2008) Classical conditioning. Scholarpedia, 3(3):2316.
- Peter Jonas and Gyorgy Buzsaki (2007) Neural inhibition. Scholarpedia, 2(9):3286.
- Florentin Woergoetter and Bernd Porr (2008) Reinforcement learning. Scholarpedia, 3(3):1448.
- Wolfram Schultz (2007) Reward. Scholarpedia, 2(3):1652.
- Wolfram Schultz (2007) Reward signals. Scholarpedia, 2(6):2184.
- Andrew G. Barto (2007) Temporal difference learning. Scholarpedia, 2(11):1604.
See also
Actor-Critic Method, Basal Ganglia, Conditioning, Neuroeconomics, Q-Learning, Reinforcement Learning, Reward, Reward Signals, Temporal Difference Learning