| Learning |
| IV. | Operant Conditioning |
One of the most widespread and important types of learning is operant conditioning, which involves increasing a behavior by following it with a reward, or decreasing a behavior by following it with punishment. For example, if a mother starts giving a boy his favorite snack every day that he cleans up his room, before long the boy may spend some time each day cleaning his room in anticipation of the snack. In this example, the boy’s room-cleaning behavior increases because it is followed by a reward or reinforcer.
Unlike classical conditioning, in which the conditioned and unconditioned stimuli are presented regardless of what the learner does, operant conditioning requires action on the part of the learner. The boy in the above example will not get his snack unless he first cleans up his room. The term operant conditioning refers to the fact that the learner must operate, or perform a certain behavior, before receiving a reward or punishment.
| A. | Thorndike’s Law of Effect |
Some of the earliest scientific research on operant conditioning was conducted by American psychologist Edward L. Thorndike at the end of the 19th century. Thorndike’s research subjects included cats, dogs, and chickens. To see how animals learn new behaviors, Thorndike used a small chamber that he called a puzzle box. He would place an animal in the puzzle box, and if it performed the correct response (such as pulling a rope, pressing a lever, or stepping on a platform), the door would swing open and the animal would be rewarded with some food located just outside the cage. The first time an animal entered the puzzle box, it usually took a long time to make the response required to open the door. Eventually, however, it would make the appropriate response by accident and receive its reward: escape and food. As Thorndike placed the same animal in the puzzle box again and again, it would make the correct response more and more quickly. Soon it would take the animal just a few seconds to earn its reward.
Based on these experiments, Thorndike developed a principle he called the law of effect. This law states that behaviors followed by pleasant consequences will be strengthened and will be more likely to occur in the future. Conversely, behaviors followed by unpleasant consequences will be weakened and will be less likely to be repeated. Thorndike’s law of effect describes the same basic process that psychologists now call operant conditioning.
| B. | B. F. Skinner’s Research |
American psychologist B. F. Skinner became one of the most famous psychologists in history for his pioneering research on operant conditioning. In fact, he coined the term operant conditioning. Beginning in the 1930s, Skinner spent several decades studying the behavior of animals (usually rats or pigeons) in chambers that became known as Skinner boxes. Like Thorndike’s puzzle box, the Skinner box was a barren chamber in which an animal could earn food by making simple responses, such as pressing a lever (rats) or pecking a circular response key (pigeons). A device attached to the box recorded the animal’s responses. The Skinner box differed from the puzzle box in three main ways: (1) upon making the desired response, the animal received food but did not escape from the chamber; (2) the box delivered only a small amount of food for each response, so that many reinforcers could be delivered in a single test session; and (3) the operant response required very little effort, so an animal could make hundreds or thousands of responses per hour. Because of these changes, Skinner could collect much more data, and he could observe how changing the pattern of food delivery affected the speed and pattern of an animal’s behavior.
Skinner became famous not just for his research with animals, but also for his controversial claim that the principles of learning he discovered using the Skinner box also applied to the behavior of people in everyday life. Skinner acknowledged that many factors influence human behavior, including heredity, basic types of learning such as classical conditioning, and complex learned behaviors such as language. However, he maintained that rewards and punishments control the great majority of human behaviors, and that the principles of operant conditioning can explain these behaviors.
| C. | Principles of Operant Conditioning |
In a career spanning more than 60 years, Skinner identified a number of basic principles of operant conditioning that explain how people learn new behaviors or change existing behaviors. The main principles are reinforcement, punishment, shaping, extinction, discrimination, and generalization.
| C.1. | Reinforcement |
In operant conditioning, reinforcement refers to any process that strengthens a particular behavior—that is, increases the chances that the behavior will occur again. There are two general categories of reinforcement, positive and negative. The experiments of Thorndike and Skinner illustrate positive reinforcement, a method of strengthening behavior by following it with a pleasant stimulus. Positive reinforcement is a powerful method for controlling the behavior of both animals and people. For people, positive reinforcers include basic items such as food, drink, sex, and physical comfort. Other positive reinforcers include material possessions, money, friendship, love, praise, attention, and success in one’s career.
Depending on the circumstances, positive reinforcement can strengthen either desirable or undesirable behaviors. Children may work hard at home or at school because of the praise they receive from parents and teachers for good performance. However, they may also disrupt a class, try dangerous stunts, or start smoking because these behaviors lead to attention and approval from their peers. One of the most common reinforcers of human behavior is money. Most adults spend many hours each week working at their jobs because of the paychecks they receive in return. For certain individuals, money can also reinforce undesirable behaviors, such as burglary, selling illegal drugs, and cheating on one’s taxes.
Negative reinforcement is a method of strengthening a behavior by following it with the removal or omission of an unpleasant stimulus. There are two types of negative reinforcement: escape and avoidance. In escape, performing a particular behavior leads to the removal of an unpleasant stimulus. For example, if a person with a headache tries a new pain reliever and the headache quickly disappears, this person will probably use the medication again the next time a headache occurs. In avoidance, people perform a behavior to avoid unpleasant consequences. For example, drivers may take side streets to avoid congested intersections, citizens may pay their taxes to avoid fines and penalties, and students may do their homework to avoid detention.
| C.2. | Reinforcement Schedules |
A reinforcement schedule is a rule that specifies the timing and frequency of reinforcers. In his early experiments on operant conditioning, Skinner rewarded animals with food every time they made the desired response, a schedule known as continuous reinforcement. Skinner soon tried rewarding only some instances of the desired response and not others, a schedule known as partial reinforcement. To his surprise, he found that animals on partial reinforcement schedules responded in patterns quite different from those produced by continuous reinforcement.
Skinner and other psychologists found that partial reinforcement schedules are often more effective at strengthening behavior than continuous reinforcement schedules, for two reasons. First, they usually produce more responding, at a faster rate. Second, a behavior learned through a partial reinforcement schedule has greater resistance to extinction—if the rewards for the behavior are discontinued, the behavior will persist for a longer period of time before stopping. One reason extinction is slower after partial reinforcement is that the learner has become accustomed to making responses without receiving a reinforcer each time. There are four main types of partial reinforcement schedules: fixed-ratio, variable-ratio, fixed-interval, and variable-interval. Each produces a distinctly different pattern of behavior.
On a fixed-ratio schedule, individuals receive a reinforcer each time they make a fixed number of responses. For example, a factory worker may earn a certain amount of money for every 100 items assembled. This type of schedule usually produces a stop-and-go pattern of responding: The individual works steadily until receiving one reinforcer, then takes a break, then works steadily until receiving another reinforcer, and so on.
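The fixed-ratio rule can be stated as a simple counter. The Python sketch below is only an illustration under assumed names: `FixedRatio` and its `respond` method are hypothetical, and each call to `respond` stands for one response by the learner.

```python
class FixedRatio:
    """Hypothetical fixed-ratio (FR) schedule: every `ratio`-th response is reinforced."""

    def __init__(self, ratio: int) -> None:
        if ratio < 1:
            raise ValueError("ratio must be at least 1")
        self.ratio = ratio
        self.count = 0  # responses made since the last reinforcer

    def respond(self) -> bool:
        """Record one response and return True if it earns a reinforcer."""
        self.count += 1
        if self.count == self.ratio:
            self.count = 0  # the count starts over after each reinforcer
            return True
        return False


# FR-100, like the factory worker paid for every 100 items assembled.
schedule = FixedRatio(100)
payments = sum(schedule.respond() for _ in range(1000))
print(payments)  # 10
```

Setting the ratio to 1 reinforces every response, so continuous reinforcement can be viewed as the special case FR-1.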
On a variable-ratio schedule, individuals must also make a number of responses before receiving a reinforcer, but the number is variable and unpredictable. Slot machines, roulette wheels, and other forms of gambling are examples of variable-ratio schedules. Behaviors reinforced on these schedules tend to occur at a rapid, steady rate, with few pauses. Thus, many people will drop coins into a slot machine over and over again on the chance of winning the jackpot, which serves as the reinforcer.
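A variable-ratio schedule differs only in that the required count changes unpredictably after each reinforcer. In this hypothetical Python sketch, `VariableRatio` draws each requirement uniformly from 1 to `2 * mean_ratio - 1` so that it averages `mean_ratio`; that distribution is an assumption made for illustration. Real gambling devices need not work this way; many simply pay off with a fixed probability on each play, which also makes the number of plays per win unpredictable.

```python
import random
from typing import Optional


class VariableRatio:
    """Hypothetical variable-ratio (VR) schedule: the number of responses required
    for each reinforcer is drawn at random and averages `mean_ratio`."""

    def __init__(self, mean_ratio: int, rng: Optional[random.Random] = None) -> None:
        if mean_ratio < 1:
            raise ValueError("mean_ratio must be at least 1")
        self.mean_ratio = mean_ratio
        self.rng = rng if rng is not None else random.Random()
        self.count = 0  # responses made since the last reinforcer
        self.required = self._draw()

    def _draw(self) -> int:
        # Assumption: requirements are uniform on 1 .. 2*mean_ratio - 1,
        # so they average mean_ratio but any single one is unpredictable.
        return self.rng.randint(1, 2 * self.mean_ratio - 1)

    def respond(self) -> bool:
        """Record one response and return True if it earns a reinforcer."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()  # the next requirement is unknown to the learner
            return True
        return False


# VR-5: on average one reinforcer per five responses, but never on a fixed count.
schedule = VariableRatio(5, random.Random(42))
wins = sum(schedule.respond() for _ in range(10_000))
print(wins)  # close to 10_000 / 5 = 2000
```

Because the learner cannot tell which response will pay off, any response might be the winning one, which is one way to see why these schedules sustain rapid, steady responding.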
On a fixed-interval schedule, individuals are reinforced for the first response they make after a fixed amount of time has elapsed since the previous reinforcer. For example, in a laboratory experiment with a fixed-interval one-minute schedule, at least one minute must elapse between deliveries of the reinforcer. Any responses that occur before one minute has passed have no effect. On these schedules, animals usually do not respond at the beginning of the interval, but they respond faster and faster as the time for reinforcement approaches. Fixed-interval schedules rarely occur outside the laboratory, but one close approximation is the clock-watching behavior of students during a class. Students watch the clock only occasionally at the start of a class period, but they watch it more and more often as the end of the period nears.
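Interval schedules depend on the clock rather than on a count. In the hypothetical Python sketch below, `FixedInterval` receives each response’s time in seconds as an argument (an assumption that keeps the example deterministic), reinforces the first response made once the interval has elapsed, and restarts the interval at that moment.

```python
class FixedInterval:
    """Hypothetical fixed-interval (FI) schedule: the first response made at least
    `interval` seconds after the previous reinforcer is reinforced."""

    def __init__(self, interval: float, start: float = 0.0) -> None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        self.interval = interval
        self.available_at = start + interval  # earliest time a response can pay off

    def respond(self, t: float) -> bool:
        """Record a response at time t (seconds); return True if it is reinforced."""
        if t >= self.available_at:
            self.available_at = t + self.interval  # the next interval starts now
            return True
        return False  # too early: the response has no effect


# FI 60 s: early responses are ignored; the first one after each minute pays off.
schedule = FixedInterval(60)
times = [10, 30, 59, 60, 61, 100, 121]
print([schedule.respond(t) for t in times])
# [False, False, False, True, False, False, True]
```

The responses at 10, 30, and 59 seconds earn nothing; only the first response at or after the one-minute mark is reinforced, and the next minute is then timed from that reinforcer.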
Variable-interval schedules also require the passage of time before providing reinforcement, but the amount of time is variable and unpredictable. Behavior on these schedules tends to be steady, but slower than on ratio schedules. For example, a person trying to call someone whose phone line is busy may redial every few minutes until the call gets through.
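A variable-interval version only needs the waiting time to be redrawn after each reinforcer. In this hypothetical sketch the waits are drawn uniformly between 0 and twice `mean_interval`, an assumed distribution chosen for simplicity, and the usage lines model the busy-phone example as a line that becomes free after unpredictable waits averaging five minutes.

```python
import random
from typing import Optional


class VariableInterval:
    """Hypothetical variable-interval (VI) schedule: like fixed-interval, but each
    waiting time is drawn at random and averages `mean_interval` seconds."""

    def __init__(self, mean_interval: float, rng: Optional[random.Random] = None,
                 start: float = 0.0) -> None:
        if mean_interval <= 0:
            raise ValueError("mean_interval must be positive")
        self.mean_interval = mean_interval
        self.rng = rng if rng is not None else random.Random()
        self.available_at = start + self._draw()

    def _draw(self) -> float:
        # Assumption: waits are uniform on [0, 2 * mean_interval], averaging mean_interval.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, t: float) -> bool:
        """Record a response at time t (seconds); return True if it is reinforced."""
        if t >= self.available_at:
            self.available_at = t + self._draw()  # the next wait is unpredictable
            return True
        return False


# A caller facing a VI 5-minute "line is free" schedule for 10 hours (36,000 s).
for gap in (30, 5):  # redial every 30 s, then every 5 s
    line = VariableInterval(300, random.Random(7))
    connected = sum(line.respond(t) for t in range(0, 36_000, gap))
    print(f"redial every {gap:>2} s: {connected} calls connected")
```

Redialing six times as often yields only slightly more connections, because the clock, not the number of attempts, limits how often reinforcement becomes available. This is one way to see why responding on interval schedules tends to be slower than on ratio schedules.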
| C.3. | Punishment |
Whereas reinforcement strengthens behavior, punishment weakens it, reducing the chances that the behavior will occur again. As with reinforcement, there are two kinds of punishment, positive and negative. Positive punishment involves reducing a behavior by delivering an unpleasant stimulus if the behavior occurs. Parents use positive punishment when they spank, scold, or shout at children for bad behavior. Societies use positive punishment when they fine or imprison people who break the law. Negative punishment, also called omission, involves reducing a behavior by removing a pleasant stimulus if the behavior occurs. Parents’ tactics of grounding teenagers or taking away various privileges because of bad behavior are examples of negative punishment.
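Combined with the two kinds of reinforcement described earlier, these definitions form a two-by-two grid: whether a stimulus is presented or removed after the behavior (positive or negative), and whether that stimulus is pleasant or unpleasant. The small Python sketch below, using the hypothetical names `Change` and `classify`, maps each combination to its name and applies it to examples from this article.

```python
from enum import Enum


class Change(Enum):
    """What happens to a stimulus right after the behavior."""
    PRESENTED = "presented"
    REMOVED = "removed"  # also covers an expected stimulus that is omitted


def classify(change: Change, pleasant: bool) -> str:
    """Name the operant procedure for a consequence (hypothetical helper).

    "Positive"/"negative" says whether a stimulus is added or taken away;
    "reinforcement"/"punishment" says whether the behavior is strengthened or weakened.
    """
    if change is Change.PRESENTED:
        return "positive reinforcement" if pleasant else "positive punishment"
    return "negative punishment" if pleasant else "negative reinforcement"


examples = {
    "snack after cleaning room": classify(Change.PRESENTED, True),
    "scolding after bad behavior": classify(Change.PRESENTED, False),
    "headache goes away after pill": classify(Change.REMOVED, False),
    "privileges taken away": classify(Change.REMOVED, True),
}
for situation, procedure in examples.items():
    print(f"{situation}: {procedure}")
```

Avoidance, in which a behavior prevents an unpleasant stimulus that would otherwise occur, falls under `Change.REMOVED` with `pleasant=False`, matching its description as negative reinforcement.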
Considerable controversy exists about whether punishment is an effective way of reducing or eliminating unwanted behaviors. Careful laboratory experiments have shown that, when used properly, punishment can be a powerful and effective method for reducing behavior. Nevertheless, it has several disadvantages. When people are severely punished, they may become angry or aggressive, or have other negative emotional reactions. They may try to hide the evidence of their misbehavior or escape from the situation, as when a punished child runs away from home. In addition, punishment may eliminate desirable behaviors along with undesirable ones. For example, a child who is scolded for making an error in the classroom may not raise his or her hand again. For these and other reasons, many psychologists recommend that punishment be used to control behavior only when there is no realistic alternative.
| C.4. | Shaping |
Shaping is a reinforcement technique that is used to teach animals or people behaviors that they have never performed before. In this method, the teacher begins by reinforcing a response the learner can perform easily, and then gradually requires more and more difficult responses. For example, to teach a rat to press a lever that is over its head, the trainer can first reward any upward head movement, then an upward movement of at least one inch, then two inches, and so on, until the rat reaches the lever. Psychologists have used shaping to teach children with severe intellectual disabilities to speak by first rewarding any sounds they make, and then gradually requiring sounds that more and more closely resemble the words of the teacher. Animal trainers at circuses and theme parks use shaping to teach elephants to stand on one leg, tigers to balance on a ball, dogs to do backward flips, and killer whales and dolphins to jump through hoops.
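The rat-and-lever example can be written as a rule that raises the requirement each time it is met. In this hypothetical Python sketch, `shape` takes the height in inches of each upward head movement and reports which ones are reinforced. Raising the criterion by a fixed `step` after every reinforced response is a simplifying assumption; trainers in practice adjust the pace to the learner, often waiting for several successes before demanding more.

```python
from typing import List


def shape(heights: List[float], step: float = 1.0, target: float = 4.0) -> List[bool]:
    """Hypothetical shaping rule for the lever example (heights in inches).

    Stage 0 reinforces any upward head movement; each reinforced response raises
    the requirement by `step` until it reaches `target` (the lever's height).
    """
    criterion = 0.0
    reinforced = []
    for height in heights:
        ok = height > 0 and height >= criterion
        reinforced.append(ok)
        if ok:
            criterion = min(criterion + step, target)  # demand a bit more next time
    return reinforced


# One upward movement per trial, in inches; the lever sits 4 inches up.
trials = [0.5, 0.5, 1.2, 0.8, 2.1, 3.0, 4.5, 4.0]
print(shape(trials))
# [True, False, True, False, True, True, True, True]
```

The second 0.5-inch movement goes unrewarded even though the first one was reinforced: once the criterion has moved on, earlier approximations no longer pay off.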
| C.5. | Extinction |
As in classical conditioning, responses learned in operant conditioning are not always permanent. In operant conditioning, extinction is the elimination of a learned behavior by discontinuing the reinforcer of that behavior. If a rat has learned to press a lever because it receives food for doing so, its lever-pressing will decrease and eventually disappear if food is no longer delivered. With people, withholding the reinforcer may eliminate some unwanted behaviors. For instance, parents often reinforce temper tantrums in young children by giving them attention. If parents simply ignore the child’s tantrums rather than reward them with attention, the number of tantrums should gradually decrease.
| C.6. | Generalization and Discrimination |
Generalization and discrimination occur in operant conditioning in much the same way that they do in classical conditioning. In generalization, people perform a behavior learned in one situation in other, similar situations. For example, a man who is rewarded with laughter when he tells certain jokes at a bar may tell the same jokes at restaurants, parties, or wedding receptions. Discrimination is learning that a behavior will be reinforced in one situation but not in another. The man may learn that telling his jokes in church or at a serious business meeting will not make people laugh. Discriminative stimuli signal that a behavior is likely to be reinforced. The man may learn to tell jokes only when he is at a loud, festive occasion (the discriminative stimulus). Learning when a behavior will and will not be reinforced is an important part of operant conditioning.
| D. | Applications of Operant Conditioning |
Operant conditioning techniques have practical applications in many areas of human life. Parents who understand the basic principles of operant conditioning can reinforce their children’s appropriate behaviors and punish inappropriate ones, and they can use generalization and discrimination techniques to teach which behaviors are appropriate in particular situations. In the classroom, many teachers reinforce good academic performance with small rewards or privileges. Companies have used lotteries to improve attendance, productivity, and job safety among their employees.
Psychologists known as behavior therapists use the learning principles of operant conditioning to treat children or adults with behavior problems or psychological disorders. Behavior therapists use shaping techniques to teach basic job skills to adults with intellectual disabilities. Therapists use reinforcement techniques to teach self-care skills to people with severe mental illnesses, such as schizophrenia, and use punishment and extinction to reduce aggressive and antisocial behaviors in these individuals. Psychologists also use operant conditioning techniques to treat stuttering, sexual disorders, marital problems, drug addictions, impulsive spending, eating disorders, and many other behavioral problems. See Behavior Modification.