Milgram’s Obedience Experiments
Relaunched: 27 February 2018
“I observed a mature and initially poised businessman enter the laboratory smiling and confident. Within 20 minutes he was reduced to a twitching, stuttering wreck who was rapidly approaching nervous collapse. He constantly pulled on his earlobe and twisted his hands. At one point he pushed his fist into his forehead and muttered, ‘Oh, God, let’s stop it!’ And yet he continued to respond to every word of the experimenter and obeyed to the end.”
– Stanley Milgram, 1963
Stanley Milgram’s ‘electric shock’ experiments of the 1960s and 1970s – and the many replications and variations of them throughout the Western world and way beyond – are some of the most audacious, genuinely creative and thought-provoking sociopsychological studies ever undertaken. They provide truly disturbing insights into the human readiness to obey those in authority to the point of carrying out horrific acts of violence, secure in the knowledge that the person is ‘doing the right thing’ and that no unpleasant consequences will follow from carrying out those orders. Yet the experiments are as controversial for the validity of their methodologies as for their results. The theory the experiments gave birth to, Agentic Shift Theory (aka Agency Theory), despite the strength of its explanatory powers, is now seen as only a partial explanation; other elements often need to be factored in to explain blind obedience and the appalling harm it so often leads to.
The video below provides a basic overview of what might be termed the ‘Milgram phenomenon’.
Background to the experiments
While Milgram’s experiments and the theory they birthed are potentially relevant to all epochs and eras of human existence, they are located generally in the aftermath of World War II and specifically in the implications of the notorious 1961 trial of Adolf Eichmann, the so-called ‘architect of the Holocaust’, who had organised the mass deportation of Jews to the ghettos and death camps.
In the aftermath of the Holocaust and the Nuremberg Trials, many psychologists and sociologists were fascinated with explaining how such an advanced and civilised people as the Germans – including men who were faithful husbands, good fathers and otherwise law-abiding citizens – could have indulged systematically in such barbarism and cruelty. The defence of many Nazis and concentration camp guards that they were ‘just following orders’ reached its apogee in Eichmann’s trial. Many of a dispositional view argued that, so despicable were the atrocities, there must be something inherently evil in the German nature – a sort of national character defect.
This meme formed the basis of Theodor Adorno et al’s (1950) investigation and their concept of the authoritarian personality. (Briefly, an authoritarian personality looks to obey superiors, is rigid in moral views, intolerant of difference and looks to scapegoat those who are different for whatever problems are being experienced.)
However, political theorist Hannah Arendt (1963), covering the Eichmann trial in Jerusalem for The New Yorker, coined the phrase, ‘the banality of evil’, writing: “It would have been comforting indeed to believe that Eichmann was a monster… The trouble with Eichmann was precisely that so many were like him, and that the many were neither perverted nor sadistic, that they were, and still are, terribly and terrifyingly normal.”
Stanley Milgram was of a situationalist viewpoint. He thought it possible most people could do serious injury to others if ordered to by the ‘right’ authority in the ‘right’ context. He aimed to test the hypothesis, “Germans are different”, by investigating how the situational context could lead ordinary people to show obedience to authority and inflict harm on others. Arendt’s ‘banality of evil’ was to be put to the test.
The basic set-up
Part of Milgram’s genius was to create a unique and controlled standardised procedure and then change one (independent) variable at a time to see what the effect would be (dependent variable).
The studies all took place at Yale University. The basic premise had the participants deceived into thinking it was a test of learning – the effect of punishment on recall. In a rigged draw, the naive participant was always assigned the role of ‘teacher’ and a confederate, ‘Mr Wallace’, played the role of ‘learner’. Mr Wallace, a mild-mannered 47-year-old accountant, advised that he had had a heart complaint in the past but would participate nonetheless. The researcher in his lab coat was 31-year-old ‘Jack Williams’. (‘Wallace’ and ‘Williams’ rehearsed for a fortnight before the first trial.)
Participants were shown the equipment – a shock generator with 30 switches and lights going from 15v to 450v with various descriptions about the shock levels (ranging from “slight shock” to “danger: severe shock” – the final 2 switches were labelled “XXX”) and a chair in the next room, with straps on it wired to the generator. The teacher was shown the learner being strapped into this so he was immobile and electrode paste applied “to avoid blisters and burns”. The teacher was given a sample shock of 45v from a battery wired into the generator to convince him the shocks were indeed real. The teacher was assured that, although the higher voltage shocks would be painful, there would be no permanent tissue damage.
A word association test was the learning task. The teacher was instructed to read a list of 2 word pairs – such as “blue/girl”, “fat/neck” – and the learner was supposed to memorise them. The teacher next read the first word of each word pair again and asked the learner to choose the correct second word from a choice of 4. The learner indicated his choice by pressing one of 4 switches in front of him which in turn lit up one of 4 numbered quadrants located above the shock generator. If the learner got the answer correct, then they would move on to the next word. If the answer was incorrect, the teacher was instructed by the researcher to deliver an electric shock to the learner. Each incorrect answer incurred a 15v increase in the shock administered. Approximately 3/4 of the answers were scripted to be incorrect and the electric shocks were fake – the only genuine shock administered was the teacher’s sample shock.
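The shock scale described above is simple arithmetic: 30 switches rising in 15v steps. The following is a minimal Python sketch of that scale, assuming each wrong answer moves the teacher up exactly one switch. The function names are illustrative only, and capping the voltage at the top switch is an assumption of this sketch.

```python
STEP_VOLTS = 15      # increase per incorrect answer, as described above
NUM_SWITCHES = 30    # number of switches on the generator


def switch_voltages(step=STEP_VOLTS, count=NUM_SWITCHES):
    """Return the voltage label of each switch, lowest first."""
    return [step * n for n in range(1, count + 1)]


def voltage_after_errors(wrong_answers, step=STEP_VOLTS, count=NUM_SWITCHES):
    """Voltage delivered on the nth wrong answer, capped at the top switch."""
    if wrong_answers < 1:
        return 0
    return step * min(wrong_answers, count)


voltages = switch_voltages()
print(voltages[0], voltages[-1])   # 15 450
print(voltage_after_errors(20))    # 300
```

On this scale the 300v level is reached on the 20th wrong answer and the final switch on the 30th.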
In the very first set of trials, the teacher received no feedback from the ‘punished’ learner – visual or auditory – other than a brief pounding on the wall at 300v. In this condition, there was 100% obedience to the researcher’s commands. All the participants delivered the full (and fatal) 450v.
The classic study
Milgram’s next experiment – reported in detail in 1963 – is the one which appears in textbooks and caused outrage among behavioural scientists and, to some degree, the general public. (It is sometimes referred to as the ‘classic study’.)
40 adult male volunteers aged between 20 and 50 (a self-selected volunteer sample answering a newspaper advert or direct mail) were paid $4.00 (plus 50c carfare) to take part in a study of ‘memory and learning’. They were New Haven locals (not in high school or college) – a mixture of postal clerks, high-school teachers, salesmen, engineers and labourers, with education ranging from one who had not finished elementary school to doctorates and other professional degrees. The table below left shows the age and occupation types of the sample which Milgram intended to be representative.
The layout was exactly the same as in the first experiment, with the researcher in close proximity to the teacher and a wall separating them from the learner – see graphic below.
The teacher was required to announce the voltage level before administering the shock (to emphasise the severity of the shock). From 75v on the learner grunted; at 120v he started complaining of pain; at 150v he began complaining his heart was bothering him and demanded to be released – although he continued to answer questions; from 270v on he screamed with each shock; at 300v he pounded on the wall; at 315v he refused to answer any more questions and stopped responding; at 330v he screamed and, now hysterical, demanded to be let out; from 345v on he no longer responded. The teacher was then told to treat the absence of response as a wrong answer. When the max of 450v was reached, the teacher was instructed to carry on and to administer the full 450v as the punishment for each unanswered question.
Teachers who protested at giving increasingly dangerous shocks to someone who was clearly suffering were told by the researcher that they had to continue, using the standardised prompts:-
- “Please continue” or “Please go on”
- “The experiment requires that you continue”
- “It is absolutely essential that you continue”
- “You have no other choice – you must go on”
If the teachers asserted that the learner clearly did not want to continue, the researcher would respond: “Whether the learner likes it or not, you must go on until he has learned all the word pairs correctly. So please go on.” If the teacher still refused to go on, then that trial of the experiment was ended.
The participants did not know until the end of the experiment – when they were reintroduced to ‘Mr Wallace’ – that the learner responses were tape recorded and no shocks were actually administered. Milgram ensured that all participants were fully debriefed and given some assurance that their behaviour in the experiment was normal – whether they had refused to continue or gone on to the full 450v – and that other participants had behaved similarly.
As shown on the chart below left, all participants gave shocks up to the 300-volt level, and 26 of the 40 men (65% of participants) continued to the highest level, 450 volts. 14 teachers stopped between 300v and 375v.
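The headline figures can be checked with a line or two of arithmetic. This is a small Python sketch using only the counts quoted above; the variable names are illustrative.

```python
participants = 40    # size of the sample in the classic study
went_to_450v = 26    # teachers who continued to the highest level

stopped_early = participants - went_to_450v
obedience_rate = went_to_450v / participants * 100
defiance_rate = stopped_early / participants * 100

print(stopped_early)                 # 14
print(round(obedience_rate))         # 65
print(round(defiance_rate))          # 35
```

The 14 teachers who stopped therefore make up the remaining 35% of the sample.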
Explaining Milgram’s findings
These findings completely contradicted the predicted results that 3% or less would reach 450 volts. (Milgram had surveyed groups of people, including 40 professional psychologists and psychiatrists, 14 Psychology students and various middle-class adults – most of whom thought the teachers would stop at the point (150v) the learner asked to be released.) About 4%, it was speculated, might actually go up to 300v and only a pathological fringe of 1 in 1000 could be predicted to go to the full 450v.
The findings undermined substantially the “Germans are different” theory. (Milgram had originally intended the Yale experiments to be a pilot for actually conducting the study in Germany on Germans; but the results, which Milgram himself was astounded at, rendered the proposed German expedition unnecessary.) On the face of it, Milgram had supported Arendt’s concept of the ‘banality of evil’: the most ‘ordinary’ men could be influenced to do serious harm to – even kill – others if told to do so by a legitimate authority figure the person recognised as having the right to issue such orders. (Ie: the researcher in charge of the experiment at prestigious Yale University, his authority symbolised by his lab coat.)
Many participants asked who would take responsibility for any harmful effects resulting from shocking the learner at such a high level. Upon receiving the answer that the legitimate authority (the researcher) assumed full responsibility, the majority of the teachers seemed to accept this transfer of responsibility (agentic shift) and continue shocking, even though many were obviously extremely uncomfortable in doing so.
The moral strain most of the participants displayed suggested they were not cruel sadists enjoying the learner’s pain but were obeying the authority figure (the researcher) with the greatest reluctance. There were marked effects on the naive participants’ behaviour, with most showing signs of extreme tension. For example, they trembled, sweated, stuttered, groaned, swore, wept, dug their finger nails into their flesh, and 3 had full-blown uncontrollable seizures. (One, a 46-year-old encyclopaedia salesman, had such a violently convulsive seizure, the experiment had to be stopped!) 14 of the 40 showed nervous laughter – though, when debriefed, they made it clear that they weren’t sadists and hadn’t found the experience funny. Many participants heaved a sigh of relief when it was over. Despite the considerable distress most of them experienced, with many of them arguing repeatedly with the researcher, they felt they had no choice other than to obey orders.
This suggests that the BLUE vMEME was dominant in their vMEME stacks. Doing the ‘right thing’ they were ordered to by the legitimate authority was more important than the cost either to themselves or their ‘victims’.
Milgram himself initially identified 13 (mostly situational) factors he believed contributed to the high levels of obedience:-
- The location of the study at a prestigious university provided authority – as did Jack Williams’ lab coat
- Participants assumed the experimenter knew what he was doing, had a worthy purpose and so should be followed
- Participants assumed that the ‘learner’ had consented voluntarily to take part
- The participant didn’t wish to disrupt the experiment because he felt under obligation to the researcher due to his voluntary consent to take part
- The sense of obligation was reinforced because the participant was being paid – although he was told he could leave at any point and would still receive the payment
- Participants believed that the role of learner was determined by chance; therefore, the learner couldn’t really complain
- It was a novel situation for the participant who, therefore, didn’t know how to behave. If the teacher had been able to consult with others, he might have behaved differently
- Some participants assumed that the discomfort caused was minimal and temporary – and that the scientific gains were important. However, others were desperately aware that they almost certainly had seriously hurt the learner and may even have killed him
- As the learner ‘played the game’ up to 300v, some participants assumed the learner might be willing to continue with the experiment
- The participant was torn between the demands of the victim and those of the experimenter
- The 2 demands were not equally pressing and legitimate
- The participant had very little time to resolve the conflict at 300v and didn’t know the victim would remain silent for the rest of the experiment
- The conflict was between 2 deeply-ingrained tendencies – not to harm others and to obey those perceived to be legitimate authorities.
Milgram investigated a number of these and other factors in variations on the shock experiment over the following decade.
From these Milgram increasingly explained his findings through the concept of agency – that people are more willing to take orders – especially when experiencing moral strain (going against their own sense of right and wrong) – if the order-giver is understood both to be a legitimate authority for giving the orders and to take responsibility for the consequences of those orders.
Implications of Milgram’s Agentic Shift Theory apply not only to the real-life atrocities of the Second World War but also to subsequent atrocities such as the My Lai massacre in Vietnam (1968) and the Srebrenica massacre in Bosnia (1995).
John Darley (1992) offers a different, perhaps more in-depth explanation than Milgram for the findings: the possibility that ‘evil’ is latent in all of us and merely requires a conversion process to become active. On that basis, he speculates that Milgram turned innocent participants into evil people. To support his theory, Darley cites Robert Jay Lifton’s (1986) interviews with a number of physicians who had worked in the Nazi death camps. Initially banal, ordinary individuals, by performing their extraordinary evil acts under the auspices of a “demonic killing machine”, themselves changed to become ‘evil people’. This is an interactionist approach – situationalist factors triggering a dispositionalist potency, latent until ‘converted’. Theoretically, this ‘conversion’ could be a case of epigenetic modification.
Criticisms of the classic study
Martin Orne & Charles Holland (1968) claimed that the research lacked experimental realism, meaning that the experimental set-up was simply not believable. They thought the participants realised that the electric shocks were not real because powerful electric shocks were not a believable punishment for making a mistake on a word-pair test. Thus, the research lacked internal validity, as the obedience was not a genuine effect. Orne & Holland claimed the participants were just playing along to please the experimenter – demand characteristics. They based this on Holland’s (1967) replication of Milgram’s experiment, in which he found afterwards that 75% of the participants did not believe the deception.
However, Milgram argued the participants’ stress reactions contradict this, indicating they were so caught up in the situation it seemed real to them, meaning the study did have experimental realism. Additionally, in the post-experimental interview the participants were asked to rate how painful they thought the last few shocks they administered were to the learner on a scale of 1 (‘not at all painful’) to 14 (‘extremely painful’). The mode of the results was 14, with a mean of 13.42. Assuming the participants were answering honestly, they clearly believed they were seriously harming the learner. In a post-experiment questionnaire completed a year later, 56.1% of the participants stated that they “fully believed”, 24% “had some doubts” but believed, 11.4% had doubts but thought it unlikely they were being deceived, 6.1% “just weren’t sure” and only 2.4% were “certain” the shocks were not real. However, Gina Perry (2012) has produced evidence which, on the face of it, seems to contradict the questionnaire results. She found that Taketo Murata, one of Milgram’s research assistants, had divided participants into ‘doubters’ (who thought the shocks were fake) and ‘believers’ (who thought the shocks were real). When Murata looked at the data, he found the believers were more disobedient and gave only low-intensity shocks.
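As a quick check on the follow-up questionnaire figures quoted above, the five response categories can be tallied. This Python sketch stores each percentage in tenths of a percent to avoid floating-point rounding; the shortened category labels are paraphrases introduced here for illustration.

```python
# Questionnaire responses in tenths of a percent (56.1% -> 561).
responses = {
    "fully believed": 561,
    "had some doubts but believed": 240,
    "doubts, but deception thought unlikely": 114,
    "just weren't sure": 61,
    "certain the shocks were not real": 24,
}

total = sum(responses.values())
believed = (responses["fully believed"]
            + responses["had some doubts but believed"])

print(total / 10)      # 100.0
print(believed / 10)   # 80.1
```

On these figures the categories account for the whole sample, and roughly four in five respondents reported believing, fully or with some doubts, that the shocks were real.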
Nonetheless, Milgram’s case was strengthened by David Rosenhan’s (1969) replication, after which 70% of his participants claimed to have believed. (Interestingly, Rosenhan also achieved a compliance rate of 85% going all the way to 450v! His sample size, it should be noted, though, was only 20.)
Orne & Holland also claimed that the research lacked mundane realism. The research set-up is unlike real life as it was an artificial, controlled, environment. (How often in real life do people electrocute others for failing a word pair association test?) Consequently, they claimed the findings have low ecological validity as they lack generalisability to real-life settings. However, Milgram argued that, in this case, experimental realism compensated for a lack of mundane realism.
Alex Haslam & Steve Reicher (2012) also dispute the study’s internal validity on the grounds that only the fourth of the researcher’s standardised prompts is a true command – the first 3 are justifications – and they point out that it is on use of the fourth prompt that participants quit, meaning it wasn’t in fact a study about agency and obedience. Haslam & Reicher propose that the study is really about resolving 2 conflicting ‘voices’ – the pleas of the learner and the demands of the researcher – and they link the issue of whom the teacher identifies most with to Social Identity Theory. If they are correct, then the teacher’s responses would have been governed more by the PURPLE vMEME’s concern with who to belong to than BLUE’s compulsion to do the ‘right thing’. Interestingly, point 10 of Milgram’s original 13 factors explaining obedience (before he focused so heavily on Agentic Shift Theory) is in accord with Haslam & Reicher.
Milgram’s situationalist explanation of obedience in carrying out atrocities has been challenged by David Mandel (1998) whose research has uncovered much evidence of German soldiers willingly taking part in the maltreatment and extermination of Jews.
Most notoriously he quoted the example of the Józefów massacre of 13 July 1942 in Poland. Having notified his men that he had received orders to carry out a mass killing of Jews, Major Wilhelm Trapp of the Reserve Police Battalion 101 told his men that those who did not “feel up to the task of killing Jews” could be assigned to other duties. In spite of it being made clear by Trapp that no stigma would be attached to choosing not to participate, only a dozen of the approximately 500 men chose to extricate themselves from the killing.
Mandel notes instances where German soldiers and concentration camp guards did not require close supervision and the suffering of their victims seemed to cause no moral strain whatsoever. He asserts that opportunities for professional advancement and the lucrative personal gain from plundering Jews and their corpses almost certainly were motivating factors in some instances. To Mandel, Milgram offered little more than an ‘obedience alibi’ for the behaviour of Holocaust perpetrators.
On a technical level the experiment is open to criticisms of population validity – how representative of the general population the 40 local men really were – and gender bias as no females participated.
The fact that the experiment took place in prestigious Yale University, with ‘Jack Williams’ looking and sounding like a competent research assistant (rather than the high school Biology teacher he really was) may have helped some of the participants convince themselves they were in a genuine experiment. To test the importance of the setting, Milgram switched a future version of the experiment to an industrial setting away from the university.
A limitation with Milgram’s research is that he himself did not provide a clear, in-depth explanation for the high levels of obedience to authority that he obtained. Rather it is through the work of others – eg: Darley and the application of the Gravesian approach – that more in-depth explanations can be put forward. While he made a great deal out of the 65% who had obeyed all the way to 450v, Milgram never really explored the issue of why 35% refused to go all the way. Nor does Milgram focus overly on the differences between the first 2 experiments. By making the ‘trauma’ Mr Wallace was going through audible in the second (classic) experiment, Milgram reduced the 100% obedience of the first experiment by 35 percentage points, to 65%.
In fairness, a strength of the study was that it was well-controlled so that all participants experienced the same conditions, allowing cause-and-effect to be inferred. The high levels of control also meant that the study was replicable.