This is the final version of the term paper I wrote for my Psych 301 class, "The Psychology of Learning." It is shorter and more narrowly focused. It is 13 pages, not including the title page and references. It does have an extra section about the emergent nature of the power function. So, I shaved off about 4 pages and added one. I have not combined these two papers because I want these posts to accurately reflect what I actually turned in for a grade.
A Brief History of the Mathematical Definition of Forgetting Curves
by Grant S. Robertson
Written July, 2008
When attempting to investigate learning, it is unfortunate that scientists are as yet unable to take direct measurements, within the brain, of how much has actually been learned at a given point. Indirect measures must instead be used to determine what has been learned, or – more importantly – what has been retained. As is common knowledge, the amount retained appears to decrease over time (Ebbinghaus, 1885). For most of history, this fact has been either given little thought or discussed in purely qualitative terms. However, in about 1879 Hermann Ebbinghaus began a six-year research project in which he devised tests to quantitatively measure retention under various conditions. In 1885, Ebbinghaus published a book in which he described how he performed his tests and even explained his method of statistical analysis.
Ebbinghaus discovered that retention falls off quickly at first and then more slowly later, in a pattern that has since been referred to variously as a “forgetting curve,” “forgetting function,” “retention curve,” or “retention function.” In the 123 years since the publication of Ebbinghaus's work there has been much research done on the nature of these "forgetting curves" and what affects them. After firmly establishing that the amount of material retained in memory does, in fact, follow a curve which is steeper at first and shallower later (e.g. Wylie, 1926; Boreas, 1930; MacLeod, 1988), researchers turned their attention to determining an exact equation to express the nature of this curve (e.g. Wherry, 1932; Murdock, 1960; Wickelgren, 1974; Rubin & Wenzel, 1996) in a mathematical process called “curve fitting.” Different primary mathematical functions have been attempted, from Ebbinghaus’s first logarithmic function (1885) to exponentials (Murdock, 1960) and power functions (Wickelgren, 1974). After over a hundred years, one enterprising pair of researchers undertook the arduous task of performing a curve-fitting analysis for 210 different data sets, matching each of them to 105 different two-parameter functions (Rubin & Wenzel, 1996). They determined that three different basic functions performed equally well. A further review of what has become known as the “Wickelgren Power Law” (Wixted & Carpenter, 2007) reveals that it not only can be made to fit the existing data but can accurately predict later data points when given only the first few. As any scientist will tell you, the ability to predict future results is the hallmark of a good theory.
The Discovery of Forgetting Curves and Early Research Supporting the Concept
Hermann Ebbinghaus (1879 – 1885)
In 1879 Hermann Ebbinghaus began his landmark study attempting to quantify memory, which he defined “in its broadest sense, including Learning, Retention, Association and Reproduction” (Ebbinghaus, 1885, Preface). In 1885 he published a book of his research called “Memory: A Contribution to Experimental Psychology,” and what a contribution it was. In nine chapters, Ebbinghaus discussed the qualitative nature of the then-current thinking on the subject, then moved on to provide details on the experiments he performed. In his experiments, Ebbinghaus – using himself as his only subject – would memorize a list of nonsense syllables to the point where he could repeat them twice. Then, under varying conditions, he would attempt to relearn that same list to the point where he could again repeat it twice. This technique was devised to remove ambiguity from his research. Rather than simply asking (himself) how many items from a list he could remember – which he determined to be an unreliable means of measuring retention – he used the savings in time necessary to relearn the lists as a measure of how much had been retained. He used these techniques to test for retention as a function of a variety of different conditions, each described in its own chapter.
The chapter most important for this review is “Chapter 7. Retention and Obliviscence as a Function of the Time.” “Obliviscence” simply means “Forgetting” (A Dictionary of Psychology, 2001 cited in encyclopedia.com, 2008). In this experiment, Ebbinghaus attempted to determine exactly how much he could remember after a given period of time (§ 27). In order to avoid the confounding variable of practicing the same list over and over again, he learned 163 different lists with 13 syllables each and only attempted to learn each list twice: once as the original learning and once at a predefined time after that original learning. He divided the 163 lists into eight different groups and attempted the second learning after a different delay for each group (§ 27). So, after lots of exhausting tests and a bit of adjustment for different conditions, Ebbinghaus arrived at the following simple chart depicting how much he tended to retain after a specific period of time:
Though Ebbinghaus discussed the “curve” produced when these figures are plotted, he never produced an actual plot of the data. Following is a plot of this data created in Microsoft Excel. The smooth curve in the plot is a result of the smoothing feature of the software. Only the actual data points from the table above are of importance.
You may notice that the fourth data point is slightly higher than one might expect if memory behaved in a smoothly progressing fashion. Ebbinghaus discusses this in Section 29.2 and attributes it to possible errors in his methodology or the exercise thereof. All in all, this plot of the first “forgetting curve” in history is still quite telling. It reveals how retention falls off quickly at first and then shifts to falling off quite slowly. From the curve seen above it is difficult to ascertain whether the actual amount of material retained would drop all the way to zero or would asymptote on some positive, non-zero number.
Groundbreaking as it was, Ebbinghaus’s study was not without its critics. One hundred years after the publication of Ebbinghaus’s book, Henry L. Roediger summarized these criticisms thus: “First, he employed only one subject – himself. Second, and a more common criticism today, is that the artificiality of Ebbinghaus's experimental conditions guaranteed that nothing important or useful could be found from his research. His research and the tradition it spawned is alleged to lack external validity” (Roediger, 1985). Though the criticisms Roediger summarizes do have merit, this author believes it is a bit of an overstep to claim that the entire tradition spawned by Ebbinghaus lacks external validity. The work of Ebbinghaus was a starting point. In the more than 100 years since, a great deal of good research has been done to expand on what Ebbinghaus started and to bring it into the real world through practical application.
Follow-up Studies on Forgetting Curves
In 1917 Margaret Wylie began a study wherein she used Chinese symbols to test retention. Rather than learning to recite the symbols or recall them when questioned, the Wylie study merely tested whether participants could recognize the symbols they had seen in the past. Wylie found that the ability to recognize symbols followed the same type of curve that Ebbinghaus observed for relearning nonsense syllables (Wylie, 1926). Th. Boreas published a study in 1930 in which he replicated Ebbinghaus’s work, using verses in addition to nonsense syllables and testing over a much longer period of time. He found that retention for verses parallels that of nonsense syllables, although the slopes in both the short and long terms are shallower. In one case Boreas observed that absolutely none of the original learning for nonsense syllables could be detected after a 10-month period (Boreas, 1930). This may indicate that the forgetting curve does asymptote to zero after all.
A. R. Gilliland tested whether the Ebbinghaus forgetting curve applied to more real-world material. Gilliland used a series of picture cards depicting a complex office scene. Participants were given 30 seconds to study the image and then were immediately asked to recall as much as possible. The quantity of material recalled was used as a baseline for later trials. Gilliland found that initial retention did not fall off as quickly as Ebbinghaus had observed. Gilliland concluded that Ebbinghaus was far too pessimistic in his estimation of how fast retention initially falls off and attributed this to Ebbinghaus’s use of nonsense syllables rather than real-world material (Gilliland, 1948). In the end, though, the results Gilliland obtained did reflect a more rapid fall-off of retention nearer to the original learning than later, thus confirming the crux of the work Gilliland sought to refute.
The forgetting curve has also been explored in the context of eyewitness reliability (Deffenbacher, Bornstein, McGorty, & Penrod, 2008). In their meta-analysis of 53 other studies, the researchers determined that retention and recognition of human faces fell off at a rate that matched that described by the Wickelgren Power Law, a function which accurately describes the Ebbinghaus forgetting curve and will be discussed later in this review. Other contexts that have been studied are remembering pictures (MacLeod, 1988), visual memory decay (Gold, Murray, Sekuler, Bennett, & Sekuler, 2005), and even marijuana use (Lane, Cherek, Lieving, & Tcheremissine, 2005).
Finding a Mathematical Definition Through Curve Fitting
Once it had become firmly established that the curve of forgetting did, in fact, follow the same path first described by Herman Ebbinghaus in 1885, researchers began to look for an equation – with a theory to back it up – which accurately expressed the nature of this forgetting curve while allowing for all the variations due to context and material that had been observed over the years. Researchers devised various equations and used curve-fitting (a mathematical process by which parameters of an equation are adjusted until the equation most closely matches the available data) to test these equations against their data.
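To make the idea of curve fitting concrete, here is a minimal sketch in Python. It fits a simple power function, retention = a * t^(-b), by ordinary least squares on the logarithms of both variables. The data points are synthetic and made up for this illustration; they do not come from any study cited in this review. The function name fit_power and the parameter names a and b are likewise assumptions of the sketch. Fitting in log space minimizes error in the logarithms rather than in the raw percentages, which is only one of several ways a researcher might set up such a fit.

```python
import math


def fit_power(times, retention):
    """Fit retention = a * t**(-b) by least squares in log-log space."""
    xs = [math.log(t) for t in times]
    ys = [math.log(r) for r in retention]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the straight line in log-log space
    intercept = mean_y - slope * mean_x    # intercept of that line
    return math.exp(intercept), -slope     # convert back to a and b


# Synthetic observations (NOT real data): a known power curve with small
# multiplicative disturbances, so we can see whether the fit recovers it.
times = [1, 2, 4, 8, 16, 32]
true_a, true_b = 60.0, 0.3
noise = [1.02, 0.97, 1.01, 0.99, 1.03, 0.98]
retention = [true_a * t ** (-true_b) * e for t, e in zip(times, noise)]

a, b = fit_power(times, retention)
print(f"fitted a = {a:.2f}, b = {b:.3f}")
```

The adjusted parameters land close to the values used to generate the points, which is all "fitting" means here: choosing the parameter values that bring the equation nearest to the observations.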
Ebbinghaus made the first attempt at curve fitting, though he did not call it that, in his book. He devised a logarithmic equation that closely fit the data from his original experiment: $b = \frac{100k}{(\log t)^c + k}$, where b is percent retained, t is the time since original learning, and c and k are constants: k = 1.84 and c = 1.25 (Ebbinghaus, 1885, § 29.3). He then created the chart comparing the observed and calculated values shown below:
In order to make these numbers easier to understand, a plot of them has been created in Microsoft Excel and is presented below:
Notice how closely the calculated values follow the observed values, with the exception of the fourth data point discussed earlier.
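For readers who would like to compute values from the Ebbinghaus equation themselves, the following is a minimal Python sketch. The constants k = 1.84 and c = 1.25 are those given above. Two things are assumptions of this sketch rather than statements taken from this review: that t is measured in minutes and that "log" means the base-10 logarithm. The function name and the delays printed at the end are chosen only for illustration.

```python
import math

K = 1.84  # constant k from the equation above
C = 1.25  # constant c from the equation above


def ebbinghaus_retention(t_minutes):
    """Percent retained: b = 100k / ((log t)^c + k).

    Assumes t is in minutes and log is base 10 (assumptions of this sketch).
    """
    if t_minutes < 1:
        # log10(t) would be negative, and a negative number raised to 1.25
        # is not a real number, so the formula is not usable below t = 1.
        raise ValueError("t must be at least 1 minute")
    return 100 * K / (math.log10(t_minutes) ** C + K)


# Illustrative delays only
for label, minutes in [("20 minutes", 20), ("1 hour", 60),
                       ("1 day", 24 * 60), ("31 days", 31 * 24 * 60)]:
    print(f"{label:>10}: {ebbinghaus_retention(minutes):.1f}% retained")
```

Because the logarithm grows so slowly, the computed retention drops steeply over the first hour and then changes very little between one day and one month, which is the shape described in the text.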
Further attempts at finding a mathematical expression to define the forgetting curve were hit and miss for a while, as were the theories that accompanied them. Matthew N. Chappell of Columbia University presented an interesting theory: because learning involves a transfer of energy, and that energy begins to dissipate right from the start, the forgetting curve must necessarily follow a logarithmic function (1931). After extensive searching of the CSA Illumina aggregated database of research articles using search terms such as “AB=forgetting and AB=(curve* or function or functions) AND AB=(mathematical or (curve NEAR (fit or fitting)))” no further attempts to express the forgetting curve mathematically could be found up until 1960. At that time, Murdock and Cook published “On fitting the exponential,” where they endeavored to educate the psychology community about the mathematical methods of curve fitting using the exponential as an example (1960). In 1971 a Czech researcher reviewed attempts to mathematically model learning from 1962 to 1971 (Brichacek, 1971); however, the full text of the article was not available. This reviewer can only surmise that either not much thought was given to finding an accurate function up until the 1960s or that earlier researchers merely assumed the Ebbinghaus formula was correct.
The Wickelgren Power Law
Wayne Wickelgren, according to Wixted and Carpenter, “studied the time course of forgetting more assiduously and more effectively than anyone since Hermann Ebbinghaus” (Wixted & Carpenter, 2007, p. 133). In 1972, Wickelgren published a major study which included his Strength-Resistance Theory, a mathematical theory based on the logarithm which he claimed accounted for many different aspects of the forgetting curve and how it varies depending on subject matter and context. Wickelgren also presented a considerable body of research illustrating how well his theory matched the data (Wickelgren, 1972). However, just two years later he published his groundbreaking “Single-trace fragility theory of memory dynamics” (Wickelgren, 1974). In this theory, Wickelgren refuted the notion that short-term memory is separate from long-term memory and instead proposed a mathematical theory which encompassed them both. This theory claims that a memory “trace” (a term used by Ebbinghaus (1885, § 26)) can be described by a series of equations that include both an exponential and a power function. A mathematical explanation of these equations is outside the scope of this review (see Wickelgren, 1974, pp. 775-776). For those of us who are not experts in calculus and differential equations, a better – though still not entirely simple – explanation of the math behind this theory can be found in Wixted and Carpenter’s “The Wickelgren Power Law and the Ebbinghaus Savings Function” (2007).
One of the main tenets of this new theory is that the forgetting curve is affected by two primary parameters: the strength of the memory trace and its fragility. Strength refers to the degree of learning associated with a particular memory, whereas fragility refers to the difficulty the mind will have in retaining that memory over time. If one studied a long series of nonsense syllables until one could recite them easily multiple times, the memory trace would likely be characterized by a high degree of strength during the recitations. However, it would suffer from a high degree of fragility because there is nothing to which the memory can attach within long-term memory and the nonsense syllables have no value for the individual. On the other hand, an important conversation with a loved one might have a memory trace characterized by low strength and low fragility. One might not remember the exact words after just an hour, but one would remember the gist of the conversation for perhaps a lifetime. Beyond these two, the equation takes a third parameter, a constant which has yet to be determined. Its purpose is to account for the fact that we measure time in arbitrary units to which brain cells do not adhere. The final quantity is simply the time since the original or most recent learning period.
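The contrast between the two example memories can be sketched numerically. The Python below uses the three-parameter form m = λ(1 + βt)^(-ψ), which is how this reviewer understands the Wickelgren Power Law to be written by Wixted and Carpenter (2007), with λ as the initial strength, ψ as the rate of forgetting, and β as the time-scaling constant. Treating ψ as a stand-in for "fragility" is a simplification made for this sketch, and every parameter value below is hypothetical, chosen only to show the crossover described in the text.

```python
def wickelgren_strength(t, lam, beta, psi):
    """Memory strength m = lam * (1 + beta * t) ** (-psi).

    lam  : strength at t = 0
    beta : constant that rescales the (arbitrary) units of time
    psi  : rate of forgetting (used here as a rough proxy for fragility)
    """
    return lam * (1.0 + beta * t) ** (-psi)


BETA = 1.0  # hypothetical scaling constant

# Hypothetical parameter values, for illustration only
syllables = {"lam": 1.0, "psi": 0.8}      # strong at first, but fragile
conversation = {"lam": 0.6, "psi": 0.1}   # weaker at first, but durable

for t in [0, 1, 10, 100]:
    s = wickelgren_strength(t, syllables["lam"], BETA, syllables["psi"])
    c = wickelgren_strength(t, conversation["lam"], BETA, conversation["psi"])
    print(f"t = {t:>3}: syllables = {s:.3f}, conversation = {c:.3f}")
```

With these made-up values the syllable trace starts out stronger but falls below the conversation trace as time passes, mirroring the verbal example above.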
This reviewer finds Wickelgren’s theory quite compelling not only because it has a simple logical explanation but because it matches the data so very well. In “One hundred years of forgetting: A quantitative description of retention” (1996), Rubin and Wenzel analyzed 210 different data sets and attempted to fit each of them with 105 different two-parameter equations (the two parameters do not include the variable of time). They made sure to include equations that had been suggested by researchers but also included some relatively random equations just to see what would come up. In the end they presented three different equations as having the most promise: a simple power law, the hyperbola-in-the-square-root-of-t forgetting function (which had never been proposed before), and what they called the “Rubin-Wenzel-Wickelgren-Weibull-Williams-Watts exponential-power law” (p. 758). It is the review of Wickelgren’s work done by Wixted and Carpenter (2007), however, that is the most convincing. Not only do they clearly explain the logic behind Wickelgren’s theory, but they illustrate how the equation accurately predicts later data points when given only the first few. As the authors state, this is a much stronger indication of the accuracy of a formula than simply being able to be fit to a complete set of points (p. 133).
Below is reproduced Figure 1 from Wixted and Carpenter’s 2007 paper. In this graph, Wixted and Carpenter use the data from the original Ebbinghaus results and attempt a curve-fitting with both the Wickelgren Power Law and a simple exponential function. You may recognize these data points from the Excel plot presented earlier.
As the caption says, even when only the first five data points are used in the curve-fitting process, the Wickelgren Power Law accurately predicts the last data point even though Ebbinghaus himself admitted that the fourth data point must be in error (Ebbinghaus, 1885, § 29.2). In fact, when multiple different curve-fittings are overlain on one another, it is almost impossible to discern a difference between them. Pay close attention to the thickness of the plot at the top compared to the bottom. It is slightly thicker at the bottom, indicating that the plots don’t line up exactly but are very close. Compare this to the curve-fitting of a simple exponential to the same data using the same procedure: first with only the first five data points, then with one more for each iteration of the process. Notice how the curves produced (though they look like straight lines) do not even intersect the last two data points, as oversized as they are. Also notice how the line moves dramatically with each additional data point that is considered. It should be noted that the simple exponential function was not one of the top three functions selected by Rubin and Wenzel (1996) and that none of the other top candidates were analyzed in this manner by Wixted and Carpenter. It would be interesting to see just how well the other two functions stack up against the Wickelgren Power Law under the same graphical analysis.
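The procedure of fitting to the first few points and then checking the prediction against the remaining ones can be sketched in a few lines of Python. This is not a reproduction of Wixted and Carpenter's analysis: it uses a simple two-parameter power function rather than the Wickelgren Power Law, a least-squares fit on logarithms, and synthetic data that were generated from a power function in the first place. The power function therefore wins by construction; the sketch shows how the test is carried out, not evidence for either function. All names are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys): returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_power(ts, rs):
    """Fit r = a * t**slope (a straight line in log-log space)."""
    slope, intercept = linear_fit([math.log(t) for t in ts],
                                  [math.log(r) for r in rs])
    return lambda t: math.exp(intercept) * t ** slope


def fit_exponential(ts, rs):
    """Fit r = a * exp(slope * t) (a straight line in semi-log space)."""
    slope, intercept = linear_fit(list(ts), [math.log(r) for r in rs])
    return lambda t: math.exp(intercept + slope * t)


# Synthetic observations (NOT real data), generated from a power curve
times = [1, 2, 4, 8, 16, 32, 64]
noise = [1.02, 0.97, 1.01, 0.99, 1.03, 0.98, 1.01]
retention = [60.0 * t ** -0.3 * e for t, e in zip(times, noise)]

# Fit on the first five points only; hold back the last two
train_t, train_r = times[:5], retention[:5]
held_t, held_r = times[5:], retention[5:]

power_model = fit_power(train_t, train_r)
exponential_model = fit_exponential(train_t, train_r)


def mean_abs_error(model):
    """Average absolute miss on the held-back points."""
    return sum(abs(model(t) - r) for t, r in zip(held_t, held_r)) / len(held_t)


power_error = mean_abs_error(power_model)
exponential_error = mean_abs_error(exponential_model)
print(f"power: {power_error:.2f}   exponential: {exponential_error:.2f}")
```

A function that merely bends to fit whatever points it is given will tend to miss the held-back points badly, which is why prediction is a stricter test than fit.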
“The Power Law” as an Emergent Property
An “emergent” property is a property or characteristic that appears only when other more fundamental properties or aspects of a phenomenon are combined. For instance, each water molecule in a lake has a mass and velocity, along with the intermolecular forces that keep the molecules from getting too close to or too far away from each other. It is the combined movement of all those individual molecules in a lake or ocean that creates the waves and currents that we see. In the same way, it has been shown that the Wickelgren Power Law may be the result of a combination of many more fundamental phenomena acting in concert.
Drawing on the work of Rubin and Wenzel (1996), Sverker Sikström of the University of Toronto used advanced mathematical modeling and simulation to show that power functions appear when multiple different exponential curves are combined. He made what he claims are biologically sound assumptions: that learning rates will vary slightly over time, that individual parts of the system (e.g. brain cells) will have varying effects, or “weights” within the system, and that those weights will have “bounds” or limits on their total effect within the system. Sikström concluded that the power function is a natural result of the combined effect of all the brain cells participating in a network involved in learning a particular thing. These individual brain cells behave in an exponential manner but their combined activity produces the emergent property of the power function for the network as a whole (1999). Additionally, Sikström makes the flat statement in his abstract that “Empirical forgetting curve data have been shown to follow a power function” (1999, p. 460).
Soon after Sikström presented his paper, Richard Anderson of Bowling Green State University did additional work showing that the power function could appear emergently from the combination of many different more fundamental functions. Anderson also simulated the forgetting process using a computer. However, he used range-limited linear functions, range-limited logarithmic functions, basic power functions, and basic exponential functions as the assumed function for each individual simulated brain cell in the network. He found that, in almost all cases, the power function emerged most strongly when the variability between the different “cells” in the network was highest. The only exception was when the basic function was already a power function, in which case the simulation always resulted in a power function regardless of variability. Anderson concludes by speculating that the power law, rather than being ubiquitous because it is a fundamental biological property of brain cells, is ubiquitous simply because it is the mathematical result of combining many different response curves with a high degree of variability between them (2001).
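The general idea that a power-like curve can emerge from many exponentials can be illustrated with a very small Python sketch. This is not Sikström's network model or Anderson's simulation. It simply averages 1,000 hypothetical "units," each forgetting exponentially at its own rate, with the rates spread out the way an exponential distribution with mean 1 would spread them (an assumption made so that the result is known in advance: the average of such curves approaches 1/(1 + t)). The sketch then asks whether the averaged curve looks straighter on log-log axes (the signature of a power function) or on semi-log axes (the signature of an exponential).

```python
import math

N_UNITS = 1000  # hypothetical number of simulated units

# Deterministic decay rates placed at evenly spaced quantiles of an
# exponential distribution with mean 1 (an assumption of this sketch).
rates = [-math.log(1 - (i + 0.5) / N_UNITS) for i in range(N_UNITS)]


def aggregate(t):
    """Average retention across all units, each decaying as exp(-rate * t)."""
    return sum(math.exp(-r * t) for r in rates) / N_UNITS


def r_squared(xs, ys):
    """Squared correlation between xs and ys (how straight the line is)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


times = [1, 2, 5, 10, 20, 50, 100]
log_retention = [math.log(aggregate(t)) for t in times]

# Power function: log(retention) is linear in log(time)
r2_power = r_squared([math.log(t) for t in times], log_retention)
# Exponential function: log(retention) is linear in time itself
r2_exponential = r_squared(times, log_retention)

print(f"log-log straightness:  {r2_power:.3f}")
print(f"semi-log straightness: {r2_exponential:.3f}")
```

Even though every individual unit forgets exponentially, the average is much closer to a straight line on log-log axes than on semi-log axes, which is the sense in which the power function is "emergent" here.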
Conclusion / Discussion
In the hundred-odd years since Hermann Ebbinghaus did his first experiments, using savings in learning time as a measure of retention, and made his first attempts at formulating a mathematical function that could describe the forgetting curve he obtained, much research has been done both to verify Ebbinghaus’s findings and to perfect an equation to describe that curve. Many functions have been theorized, and one has been found to both fit the available data and predict future results accurately. Though no widespread, unanimous support was found for the “Wickelgren Power Law,” a preponderance of the articles that were discovered seemed to take “the power law” as a given and use it as the basis for further research. In addition, the power law may not be a fundamental property of the nature of learning. However, as a rather ubiquitous emergent property, it will likely be possible to use it as the basis around which to devise learning and education systems.
References
Anderson, R. B. (2001). The power law as an emergent property. Memory & Cognition, 29(7), 1061-1068. Retrieved from www.csa.com
Boreas, T. (1930). Experimental studies on memory. II. The rate of forgetting. Praktika De l'Académie d'Athènes, 5, 382 ff. Retrieved from www.csa.com
Brichacek, V. (1971). Mathematical models of learning: II. Ceskoslovenská Psychologie, 15(2), 144-157. Retrieved from www.csa.com
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132(3), 354-380. Retrieved from http://content.apa.org/journals/bul/132/3
Chappell, M. N. (1931). Chance and the curve of forgetting. Psychological Review, 38(1), 60-64.
Deffenbacher, K. A., Bornstein, B. H., McGorty, E. K., & Penrod, S. D. (2008). Forgetting the once-seen face: Estimating the strength of an eyewitness's memory representation. Journal of Experimental Psychology: Applied, 14(2), 139-150.
A Dictionary of Psychology. (2001). Retrieved July 2, 2008 from http://www.encyclopedia.com/doc/1O87-obliviscence.html
Ebbinghaus, H. (Ruger, H. A. & Bussenius, C. E., Trans. 1913). Memory: A contribution to experimental psychology. New York, NY, US: Teachers College Press. (Original work published 1885)
Gold, J. M., Murray, R. F., Sekuler, A. B., Bennett, P. J., & Sekuler, R. (2005). Visual memory decay is deterministic. Psychological Science, 16(10), 769-774.
Lane, S. D., Cherek, D. R., Lieving, L. M., & Tcheremissine, O. V. (2005). Marijuana effects on human forgetting functions. Journal of the Experimental Analysis of Behavior, 83(1), 67-83.
MacLeod, C. M. (1988). Forgotten but not gone: Savings for pictures and words in long-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(2), 195-212.
Murdock, B. B., Jr., & Cook, C. D. (1960). On fitting the exponential. Psychological Reports, 6, 63-69.
Roediger, H. L., III. (1985). Remembering Ebbinghaus. US: American Psychological Association.
Rubin, D. C., & Wenzel, A. E. (1996). One hundred years of forgetting: A quantitative description of retention. Psychological Review, 103(4), 734-760.
Sikström, S. (1999). Power function forgetting curves as an emergent property of biologically plausible neural network models. International Journal of Psychology. Special Issue: Short-term/working Memory, 34(5-6), 460-464.
Wickelgren, W. A. (1974). Single-trace fragility theory of memory dynamics. Memory & Cognition, 2(4), 775-780. Retrieved from www.csa.com
Wixted, J. T., & Carpenter, S. K. (2007). The Wickelgren power law and the Ebbinghaus savings function. Psychological Science, 18(2), 133-134.
Wylie, M. (1926). Recognition of Chinese symbols. American Journal of Psychology, 37, 224-232.
This post is Copyright © 2009 by Grant Sheridan Robertson.