Criticisms of the classic study
Martin Orne & Charles Holland (1968) claimed that the research lacked experimental realism, meaning that the experimental set-up was simply not believable. They thought the participants realised that the electric shocks were not real because powerful electric shocks were not a believable punishment for making a mistake on a word-pair test. Thus, the research lacked internal validity, as the obedience was not a genuine effect. Orne & Holland claimed the participants were just playing along to please the experimenter – demand characteristics. They based this on Holland’s (1967) replication of Milgram’s experiment, in which he found afterwards that 75% of the participants did not believe the deception.

However, Milgram argued the participants’ stress reactions contradict this, indicating they were so caught up in the situation it seemed real to them, meaning the study did have experimental realism. Additionally, in the post-experimental interview the participants were asked to rate how painful they thought the last few shocks they administered were to the learner on a scale of 1 (‘not at all painful’) to 14 (‘extremely painful’). The mode of the results was 14, with a mean of 13.42. Assuming the participants were answering honestly, they clearly believed they were seriously harming the learner. In a post-experiment questionnaire completed a year later, 56.1% of the participants stated that they “fully believed”, 24% “had some doubts” but believed, 11.4% had doubts and thought the learner was probably not being shocked, 6.1% “just weren’t sure” and only 2.4% were “certain” the shocks were not real. However, Gina Perry (2012) has produced evidence which, on the face of it, seems to contradict the questionnaire results. She found that Taketo Murata, one of Milgram’s research assistants, had divided participants into ‘doubters’ (who thought the shocks were fake) and ‘believers’ (who thought the shocks were real). When Murata looked at the data, he found the believers were more disobedient and gave only low-intensity shocks.

Nonetheless, Milgram’s case was strengthened by David Rosenhan’s (1969) replication, after which 70% of his participants claimed to have believed the deception. (Interestingly, Rosenhan also achieved a compliance rate of 85% going all the way to 450v! It should be noted, though, that his sample size was only 20.)

Orne & Holland also claimed that the research lacked mundane realism. The research set-up is unlike real life as it was an artificial, controlled environment. (How often in real life do people give others electric shocks for failing a word-pair test?) Consequently, they claimed the findings have low ecological validity as they lack generalisability to real-life settings. However, Milgram argued that, in this case, experimental realism compensated for a lack of mundane realism.

Alex Haslam & Steve Reicher (2012) also dispute the study’s internal validity on the grounds that only the fourth of the researcher’s standardised prompts is a true command – the first 3 are justifications – and they point out that it is when the fourth prompt is used that participants quit, meaning it wasn’t in fact a study about agency and obedience. Haslam & Reicher propose that the study is really about resolving 2 conflicting ‘voices’ – the pleas of the learner and the demands of the researcher – and they link the issue of whom the teacher identifies most with to Social Identity Theory. If they are correct, then the teacher’s responses would have been governed more by the PURPLE vMEME’s concern with who to belong to than BLUE’s compulsion to do the ‘right thing’. Interestingly, point 10 of Milgram’s original 13 factors explaining obedience (before he focused so heavily on Agentic Shift Theory) is in accord with Haslam & Reicher.

Milgram’s situationalist explanation of obedience in carrying out atrocities has been challenged by David Mandel (1998) whose research has uncovered much evidence of German soldiers willingly taking part in the maltreatment and extermination of Jews.

Most notoriously he quoted the example of the Józefów massacre of 13 July 1942 in Poland. Having notified his men that he had received orders to carry out a mass killing of Jews, Major Wilhelm Trapp of the Reserve Police Battalion 101 told his men that those who did not “feel up to the task of killing Jews” could be assigned to other duties. In spite of it being made clear by Trapp that no stigma would be attached to choosing not to participate, only a dozen of the approximately 500 men chose to extricate themselves from the killing.

Mandel notes instances where German soldiers and concentration camp guards did not require close supervision and the suffering of their victims seemed to cause no moral strain whatsoever. He asserts that opportunities for professional advancement and the lucrative personal gain from plundering Jews and their corpses almost certainly were motivating factors in some instances. To Mandel, Milgram offered little more than an ‘obedience alibi’ for the behaviour of Holocaust perpetrators.

On a technical level the experiment is open to criticisms of population validity – how representative of the general population the 40 local men really were – and gender bias as no females participated.

The fact that the experiment took place at prestigious Yale University, with ‘Jack Williams’ looking and sounding like a competent research assistant (rather than the high school Biology teacher he really was), may have helped some of the participants convince themselves they were in a genuine experiment. To test the importance of the setting, Milgram switched a later version of the experiment to an industrial setting away from the university.

A limitation of Milgram’s research is that he himself did not provide a clear, in-depth explanation for the high levels of obedience to authority that he obtained. Rather it is through the work of others – eg, Darley and the application of the Gravesian approach – that more in-depth explanations can be put forward. While he made a great deal out of the 65% who had obeyed all the way to 450v, Milgram never really explored the issue of why 35% refused to go all the way. Nor does Milgram focus overly on the differences between the first 2 experiments. By making the ‘trauma’ Mr Wallace was going through audible in the second (classic) experiment, Milgram reduced the 100% obedience of the first experiment by 35 percentage points, to 65%.

In fairness, a strength of the study was that it was well-controlled so that all participants experienced the same conditions, allowing cause-and-effect to be inferred. The high levels of control also meant that the study was replicable.

The ‘Obedience’ movie
In 1965 Milgram boosted his burgeoning notoriety with the release of ‘Obedience’, a movie documentary of a repeat of the classic study. Below is an edited compilation of clips from the movie – copyright © 1991 Alexandra Milgram.


According to such commentators as Hugh Coolican (1996), most people who see the movie are convinced that the behaviour of the participants in ‘Obedience’ is authentic and that the stress caused by their moral strain is real.

However, ‘Obedience’ may not be quite what it appears to be, according to Kathryn Millard (2011).

In fact, the raw footage for ‘Obedience’ was shot over a weekend in May 1962, using what Milgram called ‘Condition 25’, a slight variation on the classic study. He used the same actors to play ‘Mr Wallace’ and ‘Jack Williams’ as always and the participants were genuinely naïve. The camera filmed through the same 2-way mirror Milgram used to observe proceedings.

However, it was 1965 before the completed film was made publicly available. Why did it take Milgram so long to make the movie available? Millard (p660) comments on the finished product: “‘Obedience’ is as much art as science, as much drama as experiment. It was carefully art-directed, scripted, shot and edited to accentuate dramatic tension within a seemingly neutral setting. These are compelling images constructed by an accomplished dramatist and filmmaker.”

Ethical issues with the classic study
As Diana Baumrind stated only too clearly in 1964, there are serious ethical issues with Milgram’s experiments. She expressed concern that such behaviours – especially deception – could damage trust in psychologists and their research.

When Milgram began his experiments, the concept of ethical guidelines was in relative infancy. Despite details of the inhumane medical and psychological experiments conducted in the Nazi concentration camps and Japanese facilities like the notorious Unit 731 becoming more widely known, ethical guidelines were developed and adopted relatively slowly. As Thomas Blass (2004, p71) put it: “There were no formal ethical guidelines for the protection of the human subjects. Researchers tended to use their own judgement about whether their research posed an ethical problem…ethical questions…took a back seat to scientific value.”

The first ethical dilemma with Milgram’s experiment is deception. The researcher deceived the participants, who were made to believe that they were truly inflicting pain on the learner and were purposely put in a position of high stress. According to James Nairne (2011), some teachers even believed they had badly hurt, or even killed, the learner, causing them a lot of distress.

Milgram also lied about the purpose of the experiment. While it was truly to measure obedience, he told his participants that he was studying the effects of punishment on learning. This meant the participants couldn’t give fully-informed consent. Although the participants were debriefed after the experiment was over, Nairne asserts many critics believe that it wasn’t enough because it didn’t prevent the subsequent psychological damage that could have affected the participants.

Then there is the question of harm. Clearly the moral strain most participants experienced caused them considerable stress and psychological harm. According to Blass (p115), though, Milgram claimed that “relatively few subjects experienced greater tension than a nail-biting patron at a good Hitchcock thriller”.

However, 2 participants gave accounts to Blass (p116) that contradict Milgram:

  • William Menold said, “It was hell in there… [I was] hysterically laughing, but it was not funny laughter…It was so bizarre. And I mean, I completely lost it, my reasoning power”. He said that he couldn’t believe “that somebody could get [him] to do that stuff”.
  • Herbert Winer said that his experience of the experiment was “very difficult to describe…the way [his] feelings changed [about it], and the conflict and tension that arose”, and that his “own heart condition went into an extremely tense and conflicted state”. Talking about the debrief at the end of the study, Winer said he “was angry at having been deceived… resented the whole situation [and] was a little embarrassed at not having stopped earlier”.

Responding to the post-experiment questionnaire a year after the classic study, 84% said they were either glad or very glad to have taken part, 15% were neither glad nor sorry to have taken part and 1.3% were either sorry or very sorry to have taken part. 74% stated that they had learned something of personal importance. The questionnaire seems to justify Milgram’s argument about tension and stress.

Of course, when physiological harm is considered, in the classic study, 3 men had full-blown seizures!

In terms of the ethical guidelines to come, Milgram breached the participants’ privacy by watching them through a 2-way mirror and filming one set of volunteers. Their right to withdraw was not denied outright but it was undermined by Jack Williams’ verbal prods. Milgram did respect the participants’ confidentiality. None was ever named unless they came forward themselves. In the ‘Obedience’ movie, the 14 volunteers who resisted the researcher’s pressure are well-represented; but only one man out of the 26 who went to 450v is shown. Presumably the other 25 refused to give their permission to be included. Clearly Milgram also respected their right to have their data withdrawn.

As for debrief, Milgram effectively created the modern debrief.

After the 1963 publication, such was the professional outrage at what Milgram had put his participants through – though he suggested it was the results people found unacceptable more than the methodology – that his membership of the American Psychological Association (APA) was suspended. Later, however, the APA gave Milgram a major award for his obedience experiments!

Obedience Experiment Variations
In 1974 Milgram published details of a whole series of variations on the original 1962 ‘Obedience Experiment’. He wanted to investigate reasons for the high levels of obedience – how much the concept of agency could be validated and how much moral strain participants could take. So, in a systematic way, he changed one variable at a time to see what the effect was. Apart from that one change, Milgram replicated the original experiment precisely. In all, Milgram studied over 1,000 participants.

Some of the variations are discussed in the clip below from the ‘Obedience’ movie…


Photo copyright © 1991 Alexandra Milgram

What Milgram effectively found was that increasing the proximity of the ‘learner’ to the ‘teacher’ – thus making the learner’s plight more obvious – decreased the % of participants who went to 450v. Decreasing the proximity of learner to teacher increased the % of participants going all the way. For example, with the learner just one metre from the teacher, seen and heard, obedience decreased to 40%. When the teacher was required to force the learner’s hand down onto the ‘electric plate’ once the learner started expressing reluctance at 150v, obedience fell to 30%. (However, the mean shock level was still nearly 270v.)

Decreasing the authority and/or proximity of the researcher also had the effect of decreasing obedience while increasing the researcher’s authority increased obedience. Percentages of participants going to 450v when…

  • Having only to pull a switch to signal an ‘assistant teacher’ (confederate of the researcher) to administer the shocks on the teacher’s behalf, with the assistant obeying without question all the way – 92.5%
  • Being told that the learner had only agreed to a limited contract – “you let me out when I say so” – 40%
  • The researcher phoned in commands or commands were pre-recorded on tape – 23%
    (Some of the teachers cheated in this condition and either pretended to administer shocks or gave lower ones than instructed)
  • Another participant (confederate of the researcher) took over the authority role (using the researcher’s brief) when the researcher had to leave the room – 20%
  • 2 fellow ‘teachers’ (confederates of the researcher) were in the room – one reading the list of word pairs, the other informing ‘Mr Wallace’ whether he was right or wrong – and the naive participant was required to administer the shocks – with one ‘upset’ confederate quitting at 150v and the other at 210v – 10%, with 50% refusing to carry on past 150v
    (Interestingly, Milgram carried out an alternative version of this experiment in which 2 confederate teachers gave shocks without protest and chided the real participant when he expressed discomfort with the procedure, resulting in 72.5% going all the way)
  • 2 researchers were arguing – one telling the participant to stop and the other to continue – 0%
  • The teachers were allowed to set their own voltage levels resulting in lower levels of shock being administered
    (Only one participant went to 450v and 95% refused to administer any more shocks after the learner protested for the first time)
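
For comparison, the figures quoted in the list above can be ranked against the 65% baseline of the classic study. The short Python sketch below is purely illustrative: the condition labels are shorthand invented here rather than Milgram's own names, the numbers are simply those given in this section, and the self-chosen voltage condition is left out because no percentage is quoted for it.

```python
# Percentages of participants going to 450v, as quoted in this section.
# The condition labels are hypothetical shorthand, not Milgram's own terms.
BASELINE = 65.0  # classic study

variations = {
    "assistant teacher administers shocks": 92.5,
    "two obedient confederate teachers": 72.5,
    "learner's limited contract": 40.0,
    "commands by phone or tape": 23.0,
    "ordinary participant gives orders": 20.0,
    "two confederate teachers quit": 10.0,
    "two researchers disagree": 0.0,
}


def rank_by_obedience(rates, baseline):
    """Return (condition, rate, change vs baseline) tuples, highest rate first."""
    ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return [(name, rate, rate - baseline) for name, rate in ranked]


for name, rate, change in rank_by_obedience(variations, BASELINE):
    # The change is in percentage points relative to the classic study.
    print(f"{name:<40} {rate:5.1f}%  ({change:+.1f} points)")
```

Laid out this way, only the two conditions in which others carried out or supported the shocking sit above the baseline; every condition that weakened the researcher's authority or strengthened the learner's position falls below it.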

Milgram also considered whether the location of the experiments had any bearing on the authority of the experimenter. He moved the experiment to some run-down offices in nearby industrial Bridgeport, ostensibly run by a private firm, Research Associates of Bridgeport, and had the researcher wear a warehouseman’s overall rather than the scientist’s lab coat. In that condition he found that only 47.5% would go to the full 450v.

Background was also investigated. Participants who had gone on to higher education were less obedient overall – possibly because they had learned to think independently and may also have gone on to higher positions in life where they were used to giving orders, rather than receiving them. Correspondingly, participants with a military background, used to receiving orders, tended to be more obedient. Roman Catholics tended to be the most obedient from amongst those who were members of Christian churches.

To counter the accusations of gender bias with regard to the classic study, Milgram replicated those conditions with 40 local women and found that gender made no difference – 65% went all the way!

A small number of participants in one of the experiments were rated on Lawrence Kohlberg’s Stages of Moral Development – 34 according to Milgram and 27 according to Kohlberg (1984). While the number of defiant participants (8) was small – too small to be anything other than ‘suggestive’ (Milgram) – they undoubtedly scored higher on Kohlberg’s scale – at a Post-Conventional level, indicating the ORANGE and/or GREEN vMEMES were dominating their selfplexes. Accordingly, the more obedient could be deemed to be at a Conventional level and concerned just with obeying the legitimate authority, indicating BLUE was dominant.


