Operant Conditioning
Relaunched: 3 May 2020
Unlike Classical Conditioning, which is based on association, Operant Conditioning is based on consequences. The basic principle is that behaviour which brings reward is likely to be repeated to gain the reward again, thus reinforcing the behaviour; on the other hand, behaviour which brings punishment is unlikely to be repeated, to avoid the punishment.
Operant Conditioning has its roots in the ‘instrumental learning’ work of Edward Thorndike (1905). His Law of Effect stated that the positive effects (rewards) of some behaviours ‘stamped in’ those behaviours while the negative effects (punishments) of other behaviours ‘stamped out’ those behaviours.
Thorndike had developed his ideas from ‘puzzle box’ experiments, usually with cats. Typically he placed a hungry, young and active cat in a box from which it could only escape by pulling on a loop attached to a string. (In later studies Thorndike used buttons and levers.) To motivate the hungry cat to escape, Thorndike hung fish outside the puzzle box door. The cat initially scratched, clawed and miaowed, exploring all corners and openings in the box and trying to squeeze out. Eventually it clawed at the loop, causing the door to open and allowing the cat to get the fish. On successive trials the cat, apparently going through a sequence of somewhat random behaviours, would work its way to the loop sooner and sooner until, upon entering the box, it would run straight to the loop. Overall the time it took the cat to escape fell from about 5 minutes to as little as 5 seconds over 10 trials. By recording the animals’ escape times on each trial, Thorndike was able to graph them, producing a learning curve: the animals got faster with each successive trial but their times eventually levelled off, giving the curve an S shape.
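The shape of such a learning curve can be sketched with a toy simulation. The numbers and the decay rule here are purely illustrative, not Thorndike’s data: each rewarded escape ‘stamps in’ the loop-pulling response, cutting the excess escape time above a floor by a fixed proportion.

```python
# Toy model of a puzzle-box learning curve (illustrative numbers only).
# Each trial, the escape time decays towards a 5-second floor as the
# loop-pulling response is 'stamped in' by the reward.

def escape_times(trials=10, start=300.0, floor=5.0, retention=0.5):
    """Return a list of escape times in seconds, one per trial."""
    times = []
    t = start
    for _ in range(trials):
        times.append(t)
        # The reward cuts the excess time above the floor in half.
        t = floor + (t - floor) * retention
    return times

print([round(t, 1) for t in escape_times()])
```

Plotting these times against trial number gives the levelling-off curve described above.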
In 1932 Thorndike modified his Law of Effect, saying: “Rewarding a connection always strengthened it substantially; punishing it weakened it little or not at all.” He had come to the view that behaviour which is stamped in through satisfying results is not greatly weakened by unsatisfying results.
Skinner’s rats
B F Skinner (1938) developed Thorndike’s Law of Effect into Operant Conditioning using what came to be called a ‘Skinner box’ in which a rat operated on the environment.
The Skinner Box contained a lever for a rat to press for food to be delivered. It also had a speaker and lights that could be used to trigger a behaviour. A shock generator was connected to the floor of the box to deliver an electric shock in response to a behaviour. This set-up enabled Skinner to vary the conditions. Eg: a food pellet was only released when the rat pressed the lever while the upper light was on and not when the lower light was on. (In such a case the rat would eventually learn to ignore the lower light and press the lever when the upper light was on.)
What Skinner derived from his experiments with rats in a Skinner box were 4 types of effect:-
- Positive Reinforcement: behaviour results in reward – eg: rat presses lever, gets food
- Negative Reinforcement: behaviour removes unpleasant stimulus – eg: rat presses lever, electric shock ceases
- Positive Punishment: behaviour results in unpleasant stimulus – eg: rat presses lever, receives electric shock
- Negative Punishment: behaviour results in desired entity being removed – eg: rat presses lever and food drops out of bottom of bowl
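The four effects above reduce to two yes/no questions: is a stimulus added or removed, and is that stimulus pleasant or unpleasant? A minimal sketch (the function name and flags are illustrative, not Skinner’s own terminology):

```python
# Skinner's four contingencies as a lookup. 'positive'/'negative'
# means a stimulus is added/removed; the second flag captures whether
# the stimulus is pleasant. Naming here is illustrative only.

def classify(stimulus_added: bool, stimulus_pleasant: bool) -> str:
    """Return the Operant Conditioning label for a consequence."""
    if stimulus_added and stimulus_pleasant:
        return "positive reinforcement"   # press lever, get food
    if not stimulus_added and not stimulus_pleasant:
        return "negative reinforcement"   # press lever, shock stops
    if stimulus_added and not stimulus_pleasant:
        return "positive punishment"      # press lever, get shocked
    return "negative punishment"          # press lever, food removed

print(classify(True, True))   # positive reinforcement
```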
From these studies, Skinner also concluded there were 2 types of reinforcer:-
- Primary – eg: food
- Secondary – eg: the click of the food dispenser
Secondary reinforcers are associated with a primary reinforcer. Eg: money is a secondary reinforcer associated with buying things someone really needs or wants, like food. Skinner found that rats experiencing positive reinforcement – press lever, get food pellet – continued to press the lever even after the food dispenser was empty, provided the animal could hear the food dispenser click. He reasoned the rat had learned an association between the food pellets and the click of the food dispenser, until in the end the sound of the click became a reward in itself. Thus Skinner termed the food pellets the primary reinforcer and the click a secondary reinforcer. Effectively the association of the secondary reinforcer with the primary reinforcer is Classical Conditioning.
Like Thorndike, Skinner came to the conclusion that reward has a much more significant effect on behaviour than punishment. He thought a key problem with punishment was that it did not strengthen alternative desirable behaviour and thus could only be temporarily effective. However, Richard Solomon (1964) provided evidence that truly severe punishment of a behaviour could result in its permanent disappearance from an animal’s repertoire of behaviours.
Skinner found that different reinforcement schedules or patterns produced different results. Reward for every performance of the behaviour – continuous reinforcement – tends to produce low but steady response rates and the behaviour can become extinct quite quickly if reinforcement is withheld. Partial reinforcement, which offers rewards for only some instances of the behaviour, is more effective in bringing about repetitions of the behaviour. Skinner investigated 2 dimensions of reinforcement:-
- Ratio schedules give the reinforcer (food) either at a fixed number of lever presses or at a random number of presses
- Interval schedules give the reinforcer either at a fixed interval of time or at random intervals of time – regardless of the number of lever presses
Thus, the two dimensions together yield 4 schedules: fixed ratio, variable ratio, fixed interval and variable interval.
He found that variable intervals produced high, steady response rates but variable ratios produced the fastest responses.
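These schedules can be sketched as small stateful functions that decide whether a given lever press, at a given time, delivers a pellet. The parameter values are made up for illustration (not Skinner’s), and a variable-interval schedule would follow the same pattern as the fixed-interval one with a randomised period.

```python
import random

# Toy sketch of reinforcement schedules (illustrative parameters).
# A 'schedule' answers: does this lever press, at time 'now', earn
# a food pellet?

def make_fixed_ratio(n):
    """Pellet on every nth press."""
    state = {"presses": 0}
    def schedule(now):
        state["presses"] += 1
        return state["presses"] % n == 0
    return schedule

def make_variable_ratio(mean_n):
    """Pellet after a random number of presses, averaging mean_n."""
    def schedule(now):
        return random.random() < 1.0 / mean_n
    return schedule

def make_fixed_interval(period):
    """Pellet for the first press once 'period' seconds have elapsed."""
    state = {"next_due": period}
    def schedule(now):
        if now >= state["next_due"]:
            state["next_due"] = now + period
            return True
        return False
    return schedule

# One press per second for 60 seconds on a fixed-ratio-5 schedule:
fr5 = make_fixed_ratio(5)
pellets = sum(fr5(second) for second in range(60))
print(pellets)  # 12 pellets: one per 5 presses
```

Running the same press pattern through each schedule and counting pellets gives a feel for why the schedules produce such different response rates.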
John Pearce (1997) asserts that conditioning is more effective if reinforcement is close to the response. However, the reinforcer can be delayed, according to Kennon Lattal & Suzanne Gleeson (1990). Over 20 sessions they found their rats’ lever pressing rate increased steadily even when the reinforcer came up to 30 seconds after the response. Such a gap between behaviour and reinforcement is known as delayed reinforcement. The variable in this may relate to secondary reinforcers, which have proved useful in overcoming the effects of delayed reward.
Skinner found he could reduce the amount of time for an animal to produce a desired behaviour through behaviour shaping. This involves breaking complex behaviours down into incremental steps and then rewarding successive approximations. Then:-
- reward first increment
- next reward 2 increments
- next only reward 3 increments
- next only reward 4 increments and so on until…
- finally only reward full behaviour
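The shaping procedure above can be sketched as a loop that raises the reward criterion one increment at a time. The attempt data and increment count here are hypothetical, purely to illustrate the successive-approximation rule.

```python
# Sketch of shaping by successive approximations (hypothetical data).
# The full behaviour has 5 increments; each phase only rewards
# attempts that reach the current criterion, then raises the bar.

def shape(attempts, total_increments=5):
    """Reward each attempt meeting the current criterion, raising the
    criterion one increment each time it is met."""
    criterion = 1
    rewards = []
    for increments_achieved in attempts:
        rewarded = increments_achieved >= criterion
        rewards.append(rewarded)
        if rewarded and criterion < total_increments:
            criterion += 1  # demand a closer approximation next time
    return rewards

# An animal gradually producing more of the target behaviour:
print(shape([1, 1, 2, 3, 2, 4, 5, 5]))
```

Note how early, crude attempts earn rewards at first but stop doing so once the criterion moves on, pushing behaviour towards the full sequence.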
Famously Skinner taught pigeons to play table tennis using behaviour shaping!
Radical Behaviourism
When John B Watson (1913) had inaugurated Behaviourism to redefine Psychology as the ‘Science of Behaviour’, he had characterised the mind as a ‘black box’ which could not be investigated objectively and scientifically. However, in 1974 Skinner wrote (p233): “The organism is not empty, of course, and it cannot adequately be treated like a ‘black box’.”
Skinner (1945) used his work to move beyond this and propose that everything humans do should be considered ‘behaviour’, including ‘private events’ such as thinking and feeling. Moreover, such private events should be seen as subject to the same principles of learning and modification as had been discovered to exist for overt behaviour. Although private events are not publicly observable behaviours, Radical Behaviourism accepts that we are each observers of our own private behaviour. In Radical Behaviourism, what people think or feel, or how they act, doesn’t occur in a vacuum but rather is the result of their experiences and environments. Eg: behaviours such as a person’s acting shy at a social gathering or a boss yelling at an employee seemingly without reason can be attributed to external forces. The shy person may be accustomed to an environment of isolation. In this way Radical Behaviourism is a form of Reciprocal Determinism.
However, in 1974 Skinner was still not acknowledging the existence of the mind. He writes (p18): “The position can be stated as follows: what is felt or introspectively observed is not some nonphysical world of consciousness, mind, or mental life but the observer’s own body.” He goes on: “An organism behaves as it does because of its current structure, but most of this is out of reach of introspection. At the moment we must content ourselves, as the methodological behaviourist insists, with a person’s genetic and environmental histories. What are introspectively observed are certain collateral products of those histories.” So Skinner certainly moved beyond environmental determinism…but only to acknowledge genetic determinism as an alternative cause of behaviour.
With Walden Two (1948), a novel depicting a utopian society achieved by conditioning, Skinner took on Sociology and developed Radical Behaviourism into a political philosophy. The novel describes a fictional ‘experimental community’ in 1940s United States. The productivity and happiness of citizens in this community is far greater than in the outside world because the residents practise scientific social planning and use Operant Conditioning in raising their children.
Skinner made proposals to the US government for programmes to condition the American public, using Operant Conditioning. He asserted that such programmes could eliminate crime and deviance, drastically reduce marital unhappiness and divorce and produce a nation of ‘good citizens’. Inevitably Skinner’s political views produced as many criticisms as his psychological experiments.
One often-repeated story claims that Skinner ventured into human experimentation by raising his daughter Debbie in a Skinner box which led to her life-long mental illness and a bitter resentment towards her father.
Debbie Skinner did indeed spend a large part of the first two years of her life in something that looked like a modified Skinner Box. According to the rumour mill, Debbie’s box included very similar controls to the cage used for rats and pigeons, and Skinner tried to condition her in the same way he did the rats and pigeons.
In fact, the ‘Heir Apparent’ was an attempt by Skinner and his wife to design an alternative to the restricting baby crib in common use then (and today!). It was heated, cooled, had filtered air, allowed plenty of space to walk around in and was much like a miniature version of a modern home. It was designed to make the baby more confident, more comfortable, less sick, less prone to cry, and so on. Reportedly it had some success in these goals.
Attempts were made to market the Heir Apparent commercially but it never really caught on.
Over the years the rumour mill went further and said that Debbie turned psychotic when her father finally let her out. In 2004, psychologist and author Lauren Slater published a book, Opening Skinner’s Box, which incorporated claims that Debbie Skinner unsuccessfully sued her father for abuse and later committed suicide by shooting herself in a Montana bowling alley. In response, Debbie emerged from a relatively low-profile life as a moderately-successful London-based artist to refute the rumours. She blasted Lauren Slater’s book as vicious and harmful for repeating this urban legend.
Evaluation
There is little doubt that people do learn by Operant Conditioning. People do tend to repeat behaviour they find rewarding and avoid behaviour that leads to punishment. So the concept has strong face validity.
Operant Conditioning has produced several therapeutic techniques. Extinction of undesirable behaviours by removing the positive reinforcer has proved particularly important – especially in parenting. Eg: Carl Williams (1959) has a case study of a 21-month-old male infant who threw tantrums that could last nearly an hour at bedtime. The parents were attempting to cope with this by sitting with the child and comforting him until he fell asleep – thus reinforcing the tantrum behaviour. On Williams’ advice, the parents refused to respond to the screams and crying of the infant when he was put to bed, and the tantrum behaviour was extinguished quickly over a number of nights. Spontaneous recovery took place but this was extinguished too.
‘Token economies’ work on the idea of giving a secondary reinforcer as a reward to enable the individual to get something more meaningful, the primary reinforcer. Token economies have been used in a variety of settings – eg: school sticker charts where the secondary reinforcer is the sticker awarded and the primary reinforcer is the praise of proud parents for the stickers achieved. Token economies have been used with great success in psychiatric units. For example, in a classic 1968 study Teodoro Ayllon & Nathan Azrin looked at how female clients, who had been hospitalised for an average of 16 years, were rewarded with plastic tokens for behaviours such as making their beds and combing their hair. The tokens were then exchanged for things like watching a movie or being allowed an extra visit to the canteen. The number of daily chores the clients carried out increased from around 5 to over 40.
The use of token economies in psychiatric contexts, with the distinction between primary and secondary reinforcers, goes some way towards resolving one of the conceptual weaknesses of Operant Conditioning per se – ie: Skinner’s refusal to accept mental processes. Operant Conditioning also fails to recognise individual differences. Different people will find different things rewarding or punishing – and sometimes these will change over time and from context to context. What we find rewarding or punishing may well depend on what beliefs and values – schemas – are dominant in our selfplex. Also what vMEMES are dominating our vMEME stacks – ie: producing our motivations – will influence how much we do or don’t find certain things rewarding or punishing.
Where Skinner’s Radical Behaviourism really came unstuck was his proposition that Operant Conditioning could be used to explain how humans learn to speak. Skinner (1957) argued that children learn language based on Behaviourist reinforcement principles – ie: by associating words with meanings. Correct utterances are positively reinforced when the child learns the communicative value of words and phrases. Eg: when the child says ‘milk’ and the mother smiles and gives her some as a consequence, the child will find this outcome rewarding, enhancing the child’s language development. However, Skinner’s theory was heavily criticised by Noam Chomsky (1959), the world’s most famous linguist to date. In the spirit of the ‘cognitive revolution’ of the 1950s, Chomsky argued that children would never acquire the tools needed for processing an infinite number of sentences if the language acquisition mechanism was dependent on language input alone. Consequently, Chomsky (1964) proposed the theory of Universal Grammar: an idea of innate, biological grammatical categories, such as a noun category and a verb category, that facilitate the entire language development in children and overall language processing in adults. Universal Grammar is considered to contain all the grammatical information needed to combine these categories – eg: noun and verb – into phrases. The child’s task is just to learn the words of their language. Eg: according to the Universal Grammar account, children instinctively know how to combine a noun – eg: a boy – and a verb – eg: to eat – into a meaningful, correct phrase – eg: a boy eats.
Chomsky’s forensic dismantling of Skinner’s language theory not only undermined Radical Behaviourism but Behaviourism as a whole. Nonetheless Skinner remained an influential figure, appearing on a television chat show to promote the application of Behaviourism to society’s ills as late as 1971.