Williams and Ceci just released National Hiring Experiments Reveal 2:1 Faculty Preference For Women On STEM Tenure Track, showing a strong bias in favor of women in STEM hiring. I’ve previously argued something like this was probably the case, so I should be feeling pretty vindicated.
But a while ago I wrote Beware The Man Of One Study, in which I argued that there is such a variety of studies finding such a variety of contradictory things that anybody can isolate one of them, hold it up as the answer, and then claim that their side is right and the other side are ‘science denialists’. The only way to be sure you’re getting anything close to the truth is to examine the literature of an entire field as a gestalt.
And here’s something no one ever said: “Man, I’m so glad I examined the literature of that entire field as a gestalt, things make much more sense now.”
Two years ago Moss-Racusin et al released Science Faculty’s Subtle Gender Biases Favor Male Students, showing a strong bias in favor of men in STEM hiring. The methodology was almost identical to this current study, but it returned the opposite result.
Now everyone gets to cite whichever study accords with their pre-existing beliefs. So Scientific American writes Study Shows Gender Bias In Science Is Real, and any doubt has been deemed unacceptable by blog posts like Breaking: Some Dudes On The Internet Refuse To Believe Sexism Is A Thing. But the new study, for its part, is already producing headlines like The Myth About Women In Science and blog posts saying that it is “enough for everyone who is reasonable to agree that the feminists are spectacular liars and/or unhinged cranks”.
So probably we’re going to have to do that @#$%ing gestalt thing.
Why did these two similar studies get such different results? Williams and Ceci do something wonderful that I’ve never seen anyone else do before – they include in their study a supplement admitting that past research has contradicted theirs and speculating about why that might be:
1. W&C investigate hiring tenure-track faculty; MR&a investigate hiring a “lab manager”. This is a big difference, but as far as I can tell, W&C don’t give a good explanation for why there should be a pro-male bias for lab managers but a pro-female bias for faculty. The best explanation I can think of is that there have been a lot of recent anti-discrimination campaigns focusing on the shortage of female faculty, so that particular decision might activate a cultural script where people think “Oh, this is one of those things that those feminists are always going on about, I should make sure to be nice to women here,” in a way that just hiring a lab manager doesn’t.
Likewise, hiring a professor is an important and symbolic step that…probably doesn’t matter super-much to other professors. Hiring a lab manager is a step without any symbolism at all, but professors often work with them on a daily basis and depend on their competency. That might make the first decision Far Mode and the second Near Mode. Think of the Obama Effect – mildly prejudiced people who might be wary at the thought of having a black roommate were very happy to elect a black President and bask in a symbolic display of tolerance that made no difference whatsoever to their everyday lives.
Or it could be something simpler. Maybe lab work, which is very dirty and hands-on, feels more “male” to people, and professorial work, which is about interacting with people and being well-educated, feels more “female”. In any case, W&C say their study is more relevant, because almost nobody in academic science gets their start as a lab manager (they polled 83 scientists and found only one who had).
2. Both W&C and MR&a ensured that the male and female resumes in their study were equally good. But W&C made them all excellent, and MR&a made them all so-so. Once again, it’s not really clear why this should change the direction of bias. But here’s a hare-brained theory: suppose you hire using the following algorithm: it’s very important that you hire someone at least marginally competent. And it’s somewhat important that you hire a woman so you look virtuous. But you secretly believe that men are more competent than women. So given two so-so resumes, you’ll hire the man to make sure you get someone competent enough to work with. But given two excellent resumes, you know neither candidate will accidentally program the cyclotron to explode, so you pick the woman and feel good about yourself.
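Here’s a minimal sketch of that algorithm in code – the competence bar and the “secret belief” bonus are invented illustrative numbers, not anything either study measured:

```python
# A sketch of the hare-brained hiring algorithm above. All numbers are
# made up for illustration; nothing here comes from either study.
def choose_hire(resume_quality: float, competence_bar: float = 0.5) -> str:
    """Pick between two equally-rated resumes, one man's and one woman's."""
    SECRET_MALE_BONUS = 0.2  # the hirer's unstated belief that men are more competent
    perceived_woman = resume_quality
    perceived_man = resume_quality + SECRET_MALE_BONUS

    # If even the (secretly) discounted candidate clears the bar, take the virtue points.
    if perceived_woman >= competence_bar:
        return "woman"
    # Otherwise play it "safe" with the candidate you secretly rate higher.
    return "man" if perceived_man >= competence_bar else "neither"

print(choose_hire(0.4))  # so-so resumes: "man"
print(choose_hire(0.9))  # excellent resumes: "woman"
```

The point of the sketch is just that a single fixed preference can flip direction depending on where the resumes sit relative to the bar.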
And here are some other possibilities that they didn’t include in their supplement, but which might also have made a difference.
3. W&C asked “which candidate would you hire?”. MR&a said “rate each candidate on the following metrics” (including hireability). Does this make a difference? I could sort of see someone who believed in affirmative action saying something like “the man is more hireable, but I would prefer to hire the woman”. Other contexts prove that even small differences in the phrasing of a question can lead to major incongruities. For example, as of 2010, only 34% of people polled strongly supported letting homosexuals serve in the military, but half again as many – a full 51% – expressed that level of support for letting “gays and lesbians” serve in the military. Ever since reading that I’ve worried about how many important decisions are being made by the 17% of people who support gays and lesbians but not homosexuals.
[Image caption: for all we know, maybe this is the guy in charge of hiring for STEM faculty positions.]
4. Williams and Ceci asked participants to choose between “Dr. X” (who was described using the pronouns “he” and “him”) and “Dr. Y” (who was described using the pronouns “she” and “her”). Moss-Racusin et al asked participants to choose between “John” and “Jennifer”. They said they checked to make sure that the names were rated equal for “likeability” (whatever that means), but what if there are other important characteristics that likeability doesn’t capture? We know that names have big effects on our preconceptions of people. For example, people with longer first names earn less money – an average of $3600 less per year for each additional letter. If we trust this study (which may not be wise), John already has a $14,400 advantage on Jennifer, which goes a lot of the way to explaining why the participants offered John higher pay without bringing gender into it at all! (See the back-of-the-envelope sketch below.)
Likewise, independently of a person’s gender, they are more likely to succeed in a traditionally male field if they have a male-sounding name. That means that one of the…call it a “prime” that activates sexism…might have been missed by comparing Dr. X to Dr. Y, but captured by pitting the masculine-sounding John against the feminine-sounding Jennifer. We can’t claim that W&C’s subjects were rendered gender-blind by the lack of gender-coded names – they noticed the female candidates enough to pick them twice as often as the men – but it might be that a bare description of gender activates the idea from a different direction than a gendered name would have.
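Here, as promised, is the back-of-the-envelope name-length arithmetic, taking the per-letter figure literally (a big if):

```python
# Hypothetical name-length pay gap, assuming the linked study's roughly
# $3600-per-extra-letter figure and a purely linear effect.
def name_gap(shorter: str, longer: str, dollars_per_letter: int = 3600) -> int:
    """Estimated annual pay advantage of the shorter first name."""
    return (len(longer) - len(shorter)) * dollars_per_letter

print(name_gap("John", "Jennifer"))  # 4 extra letters * $3600 = $14,400
```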
5. Commenter Lee points out that MR&a tried to make their hokey hypothetical hiring seem a little more real than W&C did. MR&a suggest that these are real candidates being hired…somewhere…and the respondents have to help decide whom to hire (although they still use the word “imagine”). W&C clearly say that this is a hypothetical situation and ask the respondents to imagine that it is true. Some people in the comments are arguing that this makes W&C a better signaling opportunity whereas MR&a stays in near mode. But why would people not signal on a hiring question being put to them by people they don’t know about a carefully-obscured situation in some far-off university? Are sexists, out of the goodness of their hearts, urging MR&a to hire the man out of some compassionate desire to ensure they get a qualified candidate, but when W&C send them a hypothetical situation, they switch back into signaling mode?
6. Commenter Will points out that MR&a send actual resumes to their reviewers, but W&C send only a narrative that sums up some aspects of the candidates’ achievements and personalities (this is also the concern of Feminist Philosophers). This is somewhat necessitated by the complexities of tenure-track hiring – it’s hard to make up an entire fake academic when you can find every published paper in Google Scholar – but it does take them a step away from realism. They claim that they validated this methodology against real resumes, but it was a comparatively small validation – only 35 people. On the other hand, even this small validation was highly significant for pro-female bias. Maybe for some reason getting summaries instead of resumes heavily biases people in favor of women?
Or maybe none of those things mattered at all. Maybe all of this is missing the forest for the trees.
I love stories about how scientists set out to prove some position they consider obvious, but unexpectedly end up changing their minds when the results come in. But this isn’t one of those stories. Williams and Ceci have been vocal proponents of the position that science isn’t sexist for years now – for example, their article in the New York Times last year, Academic Science Isn’t Sexist. In 2010 they wrote Understanding Current Causes Of Women’s Underrepresentation In Science, which states:
The ongoing focus on sex discrimination in reviewing, interviewing, and hiring represents costly, misplaced effort: Society is engaged in the present in solving problems of the past, rather than in addressing meaningful limitations deterring women’s participation in science, technology, engineering, and mathematics careers today. Addressing today’s causes of underrepresentation requires focusing on education and policy changes that will make institutions responsive to differing biological realities of the sexes.
So they can hardly claim to be going into this with perfect neutrality.
But the lead author of the study that did find strong evidence of sexism, Corinne Moss-Racusin (whose name is an anagram of “accuser on minor sins”) also has a long history of pushing the position she coincidentally later found to be the correct one. A look at her resume shows that she has a bunch of papers with titles like “Defending the gender hierarchy motivates prejudice against female leaders”, “‘But that doesn’t apply to me:’ teaching college students to think about gender”, and “Engaging white men in workplace diversity: can training be effective?”. Her symposia have titles like “Taking a stand: the predictors and importance of confronting discrimination”. This does not sound like the resume of a woman whose studies ever find that oh, cool, it looks like sexism isn’t a big problem here after all.
So what conclusion should we draw from the people who obviously wanted to find a lack of sexism finding a lack of sexism, but the people who obviously wanted to find lots of sexism finding lots of sexism?
This is a hard question. It doesn’t necessarily imply the sinister type of bias – it may be that Drs. Williams and Ceci are passionate believers in a scientific meritocracy simply because that’s what all their studies always show, and Dr. Moss-Racusin is a passionate believer in discrimination because that’s what her studies find. On the other hand, it’s still suspicious that two teams spend lots of time doing lots of experiments, and one always gets one result, and the other always gets the other. What are they doing differently?
Problem is, I don’t know. Neither study here has any egregious howlers. In my own field of psychiatry, when a drug company rigs a study to put their drug on top, usually before long someone figures out how they did it. In these two studies I’m not seeing anything.
And this casts doubt upon the possible sources of difference listed above. None of them look like the telltale sign of an experimenter effect. If MR&a were trying to fix their study to show lots of sexism, it would have taken exceptional brilliance to do it by using the names “John” versus “Jennifer”. If W&C were trying to fix their study to disguise sexism, it would have taken equal genius to realize they could do it by asking people “who would you hire?” rather than “who is most hireable?”.
(the only exception here is the lab manager. It’s just within the realm of probability that MR&a might have somehow realized they’d get a stronger signal asking about lab managers instead of faculty. The choice to ask about lab managers instead of faculty is surprising and does demand an explanation. And it’s probably the best candidate for the big difference between their results. But for them to realize that they needed to pull this deception suggests an impressive ability to avoid drinking their own Kool-Aid.)
Other than that, the differences I’ve been considering in these studies are the sort that would be very hard to purposefully bias. But the fact that both groups got the result they wanted suggests that the studies were purposefully biased somehow. This reinforces my belief that experimenter effects are best modeled as some sort of mystical curse incomprehensible to human understanding.
(now would be an excellent time to re-read the horror stories in Part IV of “The Control Group Is Out Of Control”)
Speaking of horror stories. Sexism in STEM is, to put it mildly, a hot topic right now. Huge fortunes in grant money are being doled out to investigate it (Dr. Moss-Racusin alone received nearly a million dollars in grants to study STEM gender bias) and thousands of pages are written about it every year. And yet somehow the entire assembled armies of Science, when directed toward the problem, can’t figure out whether college professors are more or less likely to hire women than men.
This is not like studying the atmosphere of Neptune, where we need to send hundred-million dollar spacecraft on a perilous mission before we can even begin to look into the problem. This is not like studying dangerous medications, where ethical problems prevent us from doing the experiments we really need. This is not like studying genetics, where you have to gather large samples of identical twins separated at birth, or like climatology, where you hang out at the North Pole and might get eaten by bears. This is a survey of college professors. You know who it is studying this? College professors. The people they want to study are in the same building as them. The climatologists are getting eaten by bears, and the social psychologists can’t even settle a question that requires them to walk down the hallway.
It’s not even like we’re trying to detect a subtle effect here. Both sides agree that the signal is very large. They just disagree what direction it’s very large in!
A recent theme of this blog has been that Pyramid Of Scientific Evidence be damned, our randomized controlled trials suck so hard that a lot of the time we’ll get more trustworthy information from just looking at the ecological picture. Williams and Ceci have done this (see Part V, Section b of their supplement, “Do These Results Differ From Actual Hiring Data”) and report that studies of real-world hiring data confirm women have an advantage over men in STEM faculty hiring (although far fewer of them apply). It also matches the anecdotal evidence I hear from people in the field. I’m not necessarily saying I’m ambivalent between the two studies’ conclusions. Just that it bothers me that we have to go to tiebreakers after doing two good randomized controlled trials.
At this point, I think the most responsible thing would be to have a joint study by both teams, where they all agree on a fair protocol beforehand and see what happens. Outside of parapsychology I’ve never heard of people taking such a drastic step – who would get to be first author?! – but at this point it’s hard to deny that it’s necessary.
In conclusion, I believe the Moss-Racusin et al study more, but I think the Williams and Ceci study is more believable. And the best way to fight sexism in science is to remind people that it would be hard for women to make things any more screwed up than they already are.
> Outside of parapsychology I’ve never heard of people taking such a drastic step – who would get to be first author?! – but at this point it’s hard to deny that it’s necessary.
Yet another reason other fields should just do like mathematics and go alphabetical! 😛
No WONDER Scott Aaronson is so famous!
That’s it, I’m changing my name to Aaaronson.
For a long time the Forest Service dispatched third-party contract firefighting companies in alphabetical order.
Once people caught on, the savvier owners started renaming their companies with ‘A’s in front. This was eventually fixed, but not until you started getting names like AAAThunderbolt and AAAzteck – which means it went through at least three cycles of name changes before anything was done.
The Soviet space program, after the first few successful launches with Soviet cosmonauts, began sending up cosmonauts from their fraternal socialist nations. The first guest cosmonaut was from Albania, then Bulgaria, then Hungary. The Kremlinologists were in a tizzy trying to figure out what this all meant, until someone realized that they were in Russian alphabetical order.
Similarly, currency from India has something important written in 15 or so different languages of India. The languages are arranged in *English* alphabetical order (Assamese, Bengali, Bihari …)
Similarly, the mission rosters for the Apollo moon landings are suspected to have originally been filled out in alphabetical order, then shuffled around as practical considerations required.
And the procession of the nations at the Olympics goes in alphabetical order according to how the host nation spells them.
There is a series of towns in Nebraska that are in alphabetical order. They started out as alphabetically named stops along the Burlington Railroad from Lincoln going west: Asylum, Berks, Crete, Dorchester, Exeter, Fairmont, Grafton, Harvard, Inland, Juniata, Kenesaw, and Lowell.
There’s a poster in the student center at the University of Pennsylvania that lists the names of all the fraternities and sororities: ΑΧΡ, ΑΔΦ, ΑΣΦ, etc. Though written in Greek letters, they’re listed in alphabetical order in English.
It’s not uncommon in the DC area for the streets to try to be in alphabetical order.
(If you look around in that area, you’ll see similar short runs of (almost) alphabetically-named streets: Romlon St, Samar St, Tonquil St, Tonquil Pl, Usange St; Cedar Ln, Emack Rd, Foreston Rd, Garove St, Hennessey Dr, Indigo Dr, Longhorn Dr, Montgomery Pl; Ash Rd, Battersea Dr, Chilcoate Ln, Emack Rd; Lexington Ave, Manheim Ave, Naples Ave, Olympia Ave, Lincoln Ave, Quimby Ave; Jamestown Rd, Kelliher Rd, Kimberly Rd, Lancer Dr, Lancer Pl, Lancer Pl again, Madison St, Manorwood Dr, Nicholson St, Oglethorpe St, Oneida Pl, Oliver St, Oliver St again, Powhatan Rd…)
I especially like this part — in addition to reusing street names up to three times (and notice that the two Gallatin Streets are one-way in opposite directions), it’s clear that they named the streets by looking at a map, and didn’t pay much attention to how the system would work for people going up all the intersecting roads.
The effect of it all, just like that of DC’s grid system, is that it looks like a bunch of committees of not too terribly bright people got together and tried to impose many different systems at once, resulting in an incoherent and ridiculous mash that is best ignored as much as possible.
This also happens to describe the entity that lives in the area.
And it’s a good job that Todd Zywicki is a professor of law, not mathematics.
He could still be first author if he collaborated with David Zywina! http://www.math.cornell.edu/~zywina/
The company Airfix was named to get noticed in alphabetical directories.
Similarly, Dr. Madsen Pirie of the Adam Smith Institute credits the libertarian think tank’s emergence in part to the fact that when journalists were looking for people to quote for stories, they just went down an alphabetized list of organizations.
Aaaaand now I know what I’ll call my company if I ever start one. Even better, I could base the business on the name: deal in aaaaantiques, or sell Aaaaandroid phones…
When I moved to Houston in 1976, the last name in the Houston phone book was Zukie Zzulch. In 1977, though, we noted that Zukie was second to last, displaced by Chocko Zzzych. One drunken evening, I called up Mr. Zzzych and left a message on his answering machine declaring that I’ve just moved to town, my name is Zyrcon Zzzzygurat, and your days in last place are numbered, baby!
An excellent demonstration why so many people refuse to be listed in phone books 🙂
On this type of collaboration, I’d say the researchers should arrange in advance to give top billing according to the outcome of the study. I can see arguments for either rule.
A rule for giving top billing to the side whose priors were confirmed would reward good priors with a greater share of the prestige from the study and would have desirable incentive effects along the lines of the “betting is a tax on bullshit” principle.
On the other hand, giving top billing to the “losing” side would act as a consolation prize, softening hurt feelings and reducing the likelihood of the collaboration falling apart over interpretation and analysis once the data has been collected, and it would mitigate risk-aversion as a stumbling block stopping people from entering this sort of collaboration.
Do we really want the people running the study to have such an obvious stake in its outcome?
On emotionally/politically contentious questions, there’s a large stake in the outcome already.
If we go with the option of giving top billing to the “losing” side, the stake in the outcome given by publication billing will cut in the opposite direction from the stake given by confirmation bias and political mind-killing. The former stake would have to be more than twice as big as the latter for the net effect to be to increase the bias of the people involved in the studies: a billing stake equal to the confirmation-bias stake would cancel it exactly, and anything up to twice as big would flip the direction of the net incentive while leaving it smaller than the original bias.
> For example, people with short first names earn more money – an average of $3600 less per letter.
I used to wonder if people with short names didn’t have an advantage on standardized tests — because they spend less time bubbling in their names and thus have more time for the actual test questions — and thus had a slight edge at the margin in getting into more prestigious schools. (Obviously this wouldn’t explain an effect of that magnitude, though.)
I bet this is being confounded by something. My guess is that part of this is that when they compare nicknames (eg Stephen vs. Steve) they’re capturing whatever personality trait (extraversion? easygoingness?) makes people want shorter names. But I haven’t looked through the study enough to see if that explains everything.
Anecdote: I’ve been told to start using the abbreviation of my name as a social strategy — by someone who said it’d work because it “sounds frattier”.
I mean, they’re not wrong. It does sound frattier.
Male names tend to be shorter, I think.
This is true.
My guess would be that name length is a proxy for ethnicity. Anecdotally, east asian names in particular tend to be short, and black names tend to be longer.
No, it’s a study of names vs nicknames.
Sort of. The emphasis in Scott’s link is about comparing nicknames to regular names, but it also mentions ethnic names specifically as not being viewed as positively, and also mentions a LinkedIn report that found that American CEOs had short names *or* nicknames, indicating that short names that are not nicknames still have an effect.
Actually, the dollar figure Scott quotes is derived from a linear regression, so it could be subject to confounding by race or sex.
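A quick synthetic-data sketch of that worry (all numbers invented, not the study’s data) shows how a confound like sex can masquerade as a per-letter effect in a naive regression:

```python
# Synthetic demonstration: salary depends only on sex, but sex also
# predicts name length, so a naive regression "finds" a per-letter effect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
male = rng.integers(0, 2, n)                     # 1 = male
name_len = 5 + rng.poisson(2, n) - male          # men get slightly shorter names
salary = 50_000 + 8_000 * male + rng.normal(0, 5_000, n)  # no true name effect

naive = sm.OLS(salary, sm.add_constant(name_len)).fit()
controlled = sm.OLS(salary, sm.add_constant(np.column_stack([name_len, male]))).fit()
print(naive.params[1])       # spurious negative dollars-per-letter estimate
print(controlled.params[1])  # roughly zero once sex is controlled for
```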
I have a four letter first name – I should be rolling in the dough, then? (Note: I am not by any means rolling in the dough).
So I could do better if I went to a three letter first name? Or initials? 🙂
Any idea of the earning potential of names below?
Men: Jay, Tim, Joe, Bob, Tom, Lou, Sal, Lee, Ed
Women: Ann, Sue, Jo, Mia, Tia, Kay, Bea, Dee, Vi
Reading the article, I think that it implies that if you moved to America you might start rolling in the dough – the effect doesn’t seem to work in Europe.
I don’t know, Peter; when Joanne Rowling became J.K. Rowling, she made a fortune, so it must be true – it’s science! 🙂
According to the study, 3 letter names are much, much worse than 4 letter names, even worse than 7 letter names; and 2 letter names don’t exist.
Al, Jo, Bo, and Lu were all people in my high school graduating class whose names I happen to remember, and those names contain 2 letters.
I wonder if I could convince my boss that my first name is the empty string, and that I should therefore be paid another $15000.
This must explain why mononymous people are overrepresented among celebrities!
I have a colleague who, apparently out of a sincere delight in frustrating bureaucratic systems (rather than, for example, being Indonesian), asserts that he has a single name; sometimes this comes out as a null-string first name, sometimes as a null-string surname.
He has not risen high in the company, but I think that’s likely to be a common consequence of a root cause rather than a consequence of the null string.
Maybe this is why the Emperor of Japan is so powerful and well-respected.
Nah, it’s his magical sword.
At least two episodes of TNG have screens referring to Data as “NFN NMI Data”, the acronyms standing for “no first name” and “no middle initial.”
Trying to get people to call you by a single name is a power play, the implications of which are “I’m so important you don’t need a lot of syllables to recognize me.” Madonna, for example. Like all power plays, however, the result of failure is worse than never having tried. If you’re middle class and try to ironically act low class in order to appear high class, you run the risk of being mistaken for low class.
Maybe same dynamic applies to shorter names, albeit to a lesser degree. I wonder also about very common names like “John.” Most people named “John” end up being called Johnny, Jack, John [First Letter of last name], etc. Having a common name means you’re either so forgettable you need extra identifying info or else so memorable you can be recognized even by a short, common name?
I know someone who has two middle names, and, rather than privileging one over the other, declines to use either, and instead always enters her middle name as X.
(Also, part of my family has a middle name that’s passed down like a last name, but it’s considered to be a middle name.)
Change one of your names to “Null”
It’s entirely possible it will take down multiple systems that people try to use to store your name.
http://stackoverflow.com/questions/4456438/how-do-i-correctly-pass-the-string-null-an-employees-proper-surname-to-a-so
And change your other name to “; drop table customers; --”
That’s part of why I liked the “null” story. It wouldn’t be caught by almost any input filtering.
This could potentially be tested by redoing the experiment in Denmark, since nicknames are less common here – if a friend has a long name, I’ll shorten it in daily speech, but in most cases it’d be weird to do the same with an acquaintance. I know exactly 1 person here who wouldn’t say their whole first name when introducing themself to new people, and his dad is American.
(not counting:
– People who go exclusively by their middle name (I know several)
– Trans people who have changed their name socially but not legally (I know one)
)
It also mentions a LinkedIn report which says that the effect is limited to the US.
It’s not completely clear to me if that specific finding controlled for gender. They also don’t seem to have controlled for anything else (field, SES, ethnicity) as TheLadder is a job-search site, not scientists.
But you bubble in your name and other info in a separate period before the test begins. Nobody’s allowed to look at the questions until everyone is finished with that part.
When I took a GRE, you were required to copy out a statement saying that you were yourself and promised not to cheat, and then sign the statement. The directions indicated that your signature, and the entire copied statement, had to be written in cursive. I did this… but I might have been the only one; the proctor went through several rounds of asking “OK, who’s still not finished?” — “Anyone still not done?” — “Is anyone still writing?”
For the second and subsequent checks, I was the only person still not done painstakingly copying out this promise that was already printed on the test. That may have been the only time in more than a decade that I had to use cursive for any reason other than to sign my name. I forgot how to write a cursive k and just snuck a print one into that word.
Summing up — I’m quite certain personal information is filled in separately from test answers. :p
Was no one else copying it out at all? Because I’d have thought that cursive was at least equally fast as printing.
My cursive is wayyyy slower than my printing, just because I haven’t done cursive since approximately grade 7. So I was in the same boat as Michael Watts.
Cursive is slow and painful for me, too. Not even my signature is in cursive.
> For example, people with short first names earn more money – an average of $3600 less per letter.
Isn’t that contradicting itself within the space of one sentence?
I think that’s meant to be read as $3600 less for each additional letter in the name.
I think the sentence should be read as: people with longer given names earn less money, about $3600 less per additional letter.
Obviously there are ethno-gender issues with names, but I think the negative effects of a longer name would be seen even among WASP* men.
That’s because a lot of traditional long male names get very negative ratings on masculinity and likeability.
Consider, e.g., Abraham, Alexander, Augustine, Benjamin, Clarence, Jeremiah, Nathaniel, Reginald, Sebastian, Sylvester, Terrance, Theodore. In 2015, most of those names come across as old-fashioned and pretentious.
(Two exceptions: Christopher and Jonathan are long names that get favorable ratings.)
Given identical credentials, would you choose Sylvester or Scott to manage your lab? You might not think you’re affected by something so superficial, but I bet a well-designed experiment would show strong bias for Scott.
Here’s one study on the general topic: http://epublications.marquette.edu/cgi/viewcontent.cgi?article=1002&context=mgmt_fac
* White Anglo-Saxon Protestant
This is a shot in the dark, but I wonder how much of the issue with many long names is specific associations that either make the person harder to take seriously or pigeonhole them in your mind to a particular role. You say Sylvester, and my first thought is ‘Thufferin Thuccotash!’ My second thought is ‘Rambo.’ The same thing goes for many of the names you listed, like Abraham, Sebastian or Theodore.
Someone should try a similar study to the one Scott mentioned, paying attention to names that aren’t necessarily long, but are rare and typically associated with particular fictional characters or famous historical figures. Ernie, Bert, Elmo, and Grover (a twofer!) would be at the top of my list.
Ape, if it is due to short names having fewer associations, that would really be a common vs rare effect. They didn’t try that comparison, but they did something similar, which is pairwise comparisons. They still got the effect, but they did not quantify it, so it might have been attenuated. Bill vs William – both very common, both with lots of associations. They also did variants, like Michele vs Michelle.
I love reading these kinds of things about the American educational system, because it is so different to how we do things over here.
The major national state examinations are all – brace yourselves! – handwritten answers by the candidates. We don’t have the kind of “shade in the correct dot” answer sheets you’re talking about. We need real actual people to read and mark the exams, which is a handy nixer for teachers during the summer holidays (we get approximately three months’ summer holidays) – see link to sample examination paper.
Why, in my day, we weren’t even allowed bring calculators into the maths exams! We were provided with log tables instead. That has changed nowadays, you will probably be glad to hear 🙂
Attempts to drag us forward into the American-style mechanised future don’t go very well – see the e-voting machine fiasco.
Electronic voting systems like the one tried in Ireland — with no audit capability or “paper trail” — have been scrapped in the U.S. as well.
Here in Michigan, we use optical scan paper ballots.
Note that elections in the U.S. are enormously more frequent and complicated than elections anywhere else.
I personally get to vote on almost a hundred different elected posts, including federal, state, county, city, school district, library district, and community college district offices.
A general election requires large ballots printed on both sides (additional sheets are sometimes required). Here’s an example:
http://www.twp.waterford.mi.us/Departments/Clerk/Ballots/2014/November/Precinct-7.aspx
My guess is that WASPs tend to have short names: “Bud, Chip, Phil.”
I think that in hiring faculty there is more of a concern of “how will it look to outsiders if our staff is not diverse?” leading to a conscious effort to seek out women and non-whites, while that may be less of a factor with hiring a lab manager.
Yes, this is the obvious candidate: faculty hiring committees in STEM are under heavy affirmative action pressure to hire women or at least interview, while I never heard that being a concern wrt lab managers (which I tend to think of as a stereotypically female position, although I couldn’t find data to back this up).
I would think most of this would just fall out from the fact that there are a lot of faculty members but only one lab manager. Moreover, the necessary qualifications of a faculty member are much looser than for a lab manager. So even without worrying about appearances, you might argue that there is value to a faculty with lots of variation, which would lead you to prefer candidates that are different from what you already have (i.e., in STEM, men). There’s no such dynamic in filling the lab manager position, and once it’s filled you’re not going to hire an additional one next year.
Daniel Kahneman is a supporter of adversarial collaboration and has written papers with theoretical “rivals”, e.g., Do Frequency Representations Eliminate Conjunction Effects? An Exercise in Adversarial Collaboration.
Also, one of my favorite papers: “Conditions for intuitive expertise: a failure to disagree“. Kahneman is a great man. (For serious.)
I notice that the short name effect is the only explanation for differences between the studies that Scott has put any numbers on.
The W&C study makes it clear that the hiring scenario is imaginary. This is the text sent to the participants:
“Imagine you are on your department’s personnel/search committee. Your department plans to hire one person at the entry assistant-professor level. Your committee has struggled to narrow the applicant pool to three short-listed candidates (below), each of whom works in a hot area with an eminent advisor. The search committee evaluated each candidate’s research record, and the entire faculty rated each candidate’s job talk and interview on a 1-to-10 scale; average ratings are reported below. Now you must rank the candidates in order of hiring preference.”
The MR&a article presents the scenario as though the participants are giving feedback to real candidates, pooled from a nationwide database. Here is their text:
“To study this question, we have compiled and summarized information from actual applications of students who have recently applied to be lab managers at universities across the country. These students have volunteered to share their information in exchange for mentoring opportunities as part of their participation in the study. … Today, we will be assigning you to read the applicant profile of one randomly-selected student from the nationwide database. Please imagine that you are actually evaluating this student’s application to work in your own lab. After reading the applicant profile, you will be asked to provide your opinions of the student and offer them feedback as they make decisions about moving forward with their career.”
I point out this difference because it seems like it could also be very relevant—if you don’t believe your actions /matter/ and this is all fake, and if you even subconsciously notice that fake names like Dr. X and Dr. Y are being used but the genders are not masked, then maybe under those conditions you’ll be more likely to favor the female candidates. (This is another case of potential Near vs. Far Mode bias. Except Far = Fake in this case.)
On the other hand, while W&C ask professors to imagine hiring the applicant, MR&a also ask them to imagine hiring the applicant, but at the same time tell them that the applicant will in reality be hired by someone else and that the point is to help with that decision.
I think those instructions would push me into a Far Mode direction on the second case, or at least interfere with my ability to imagine it in Near Mode.
I have a hard time seeing why the resumes weren’t sent out to lots of actual job openings, recording the calls for interviews each one received.
With actual applications, they would have to fabricate complete CVs, references, etc., and it would be very easy to see that they are fake (e.g., from Google Scholar profiles). And it would be difficult to make two applicants equivalent unless they had exactly the same CVs.
Perhaps because there are ethical concerns with submitting fake resumes, and making people who are supposed to be working on hiring do an awful lot of work reviewing CVs of candidates that don’t even exist? Never mind the effect on the careers of the people who get rejected for interviews for these actual jobs in favour of the fake candidates.
These ethical concerns don’t seem to bother people who test private industry.
Yes, this seems like a very strong reason for the difference. As someone who has done hiring for a charity, the fake scenario, names that are initials, and the ‘would you hire’ question all strongly signal that it is a test about doing the right thing. It may not be immediately apparent that it’s about gender, but if there’s no other major difference in the candidates, it seems likely that some people will notice that it is.
I think they only got one resume each. Not one male one and female one that were completely identical.
So it shouldn’t be “obvious” that it’s about gender. It could theoretically be about any trait included or omitted in the resume.
Of course, gender is a good guess as it is a hot topic.
In one of the five experiments, this was the case. In the four others, the participants received three “summaries” of potential hires. Then, they reversed all the genders and gave that set to different participants. This is how they set up comparisons.
This is an important point. Unlike most private-sector resume-bias experiments, it is virtually impossible to fake an academic search. It is not quite accurate to describe this as a ‘hiring experiment’ when everyone involved was aware that there were no hires to be made. Tenure-track hires are so rare in most academic departments, compared to hires in a department/team of comparable size in the private sector, that the mapping between stated and revealed preferences is difficult to discern.

The researchers made a strong effort to obfuscate their aims, but if you are an academic participating in an artificial candidate search, you are likely to suspect that one of the variables being tested is either sex, race, or both. Experiment 5 (rate a single candidate) addresses this somewhat, but still suffers from the main issue affecting the study: this hiring experiment bears much less resemblance to real academic hiring conditions than most recent private-sector bias experiments bear to private-sector hiring.

The scenario of rating a candidate without participating in the search activities (job talk, dinner to assess ‘collegiality’, cv review, search committee meetings, etc.) does not replicate the experience of an academic on a search committee, where the candidate recommendation is generated, but rather resembles the process of the Dean accepting or rejecting the search committee’s recommendation. The Dean does not have to work with the candidate and often has different priorities than the search committee. Given the volume of applications for most academic positions, it is not difficult to ensure that the longlist, and often the shortlist, are gender-balanced to please the Dean, but still recommend a male candidate preferentially. I think the study’s conclusion regarding the internalization of norms remains strong in spite of these concerns.
One of the references cited – National Research Council (2009) Gender Differences at Critical Transitions in the Careers of Science, Engineering and Mathematics Faculty (National Academies Press, Washington, DC) – can be viewed online; it provides more direct hiring data and supports the argument that hiring is equitable from the interview stage onward.
This suggests that it is better to just carefully examine the ecological data.
I agree that the lab manager thing does seem odd enough to warrant some kind of explanation. Maybe the hiring of lab managers is something that’s normally done by a single professor for their own lab, so the researchers chose it because it’s a situation the professors would be familiar with? Whereas ordinarily a professor wouldn’t have the sole responsibility of deciding which new professor to hire.
Otherwise, I can’t imagine why you’d choose to look at lab manager hirings when investigating gender bias in STEM. It seems a bizarre sort of thing to go out of your way to do in a study.
Right, a professor hires a lab manager for their own lab and is done with it. Whereas hiring faculty is not so much about judging the individual, but about trading favors with other faculty.
I think you’re on to something here. Maybe imagining a collective decision-making situation leads the subjects to be affected by some sort of desirability bias?
I’m a grad student in engineering, and in my own lab the “lab manager” is just another one of the grad students. She didn’t apply and interview for the position. She was appointed because of her competence at things like filling out the labels on the waste containers and taking inventory. She isn’t paid any extra, though she is able to put “lab manager” on her resume/fellowship applications.
Neither of the labs I worked for as an undergrad had a special “lab manager” job you specifically applied for either. In both cases it was just one of the grad students who was loosely in charge of nuts-and-bolts lab stuff. This was one of the things that always made me look at that MR&a paper askance.
Is it different in different fields, or at more prestigious universities where the labs are bigger and have more money? Anyone have personal experience? I’m genuinely curious.
Yeah, I had actually never heard of the position “lab manager” before reading this article, so I have no idea. I just assumed it was something that was common either to countries other than my own, or to fields other than my own (Canada and physics, respectively). And Kiya below seems to think it’s something completely different from what I assumed it was. Anyone want to clarify?
Kiya said nothing about lab managers, having not heard of them. Kiya talked about something completely different in place of lab managers.
I have never heard of lab managers either, but several hits for MIT lab managers appear to be careers (e.g., ten years’ duration). One does appear to be like the position in the study, a temporary post just out of undergrad.
When I was in grad school for some sort of engineering, there were a couple of facilities that weren’t part of any particular research group that were run by people with the title “_____ Lab Manager”. They were Ph.D.s, they trained people in the use of equipment, gave advice on experimental procedures, kept everything maintained and in stock, etc.
I’m not sure that’s what we’re talking about here, though.
The only Lab Managers I’ve had were in early undergraduate classes – Chemistry I and Physics I, II – open to all students requiring them (2- or 4-year).
And they were graduate students, none of whom were in my major: Electrical Engineering.
Once specific courses were passed, usually by 2nd-semester junior level, our labs (EE & CSE specific) were pin-code access and/or student ID swipe. As long as there was a department faculty member behind their desk somewhere, aka the doors were unlocked lol, it was open for use.
The university’s overall standing, or its IEEE ranking, determines a lot of what ‘happens’ within the halls of the engineering department. High IEEE ranking = filled positions for qualified lab managers in their respective areas (ex. CalTech, CalPoly, MIT). Notable universities for liberal arts with E programs = upper-level student lab volunteers, plus multiple department ‘Dr-Study-Everybody-Because-Equality-N-Stuff’ types, which makes for good class studies and school newspaper publishings. LOL
Side note: I’ve never had a female professor in any of my EE courses. I’ve also never WORKED with a female EE, or worked with female technicians (only 5 total in 12 years!) worth a crap and they all were transferred to different departments and/or left the job completely. Interesting, eh.
Bias in hiring for things like post-doc positions could influence who goes on to be faculty in a number of ways. This fits with the reason they give: “We focused on hiring for a laboratory manager position as the primary dependent variable of interest because it functions as a professional launching pad for subsequent opportunities.”
The new paper mocks that by surveying professors about whether they were ever a lab manager.
Anyhow, the observation we are trying to explain is that there is a “leaky pipeline,” not that any particular stage is special. W&C are wrong to say that the end is particularly worthy of study, just as much as MR&a are wrong to say that the beginning is particularly worthy of study. Nor is the station that they actually study unworthy, just because it isn’t in the pipeline.
Publication bias maybe?
Perhaps they did various experiments testing various job positions and only got a significant result in the direction they wanted with lab manager.
The position of undergrad research assistant (which I’m rounding “undergrad lab manager” off to because none of the CS labs I worked in as an undergrad had “managers”) is really hard to hire for on the basis of raw qualifications. Your applicants come pre-filtered as students who got into your university. You can look at their grades while there, but grades aren’t that good a predictor of whether the student will do good part-time research work (students with better grades might be smarter, but also might be more likely to drop everything outside of classwork during academic crunchtime). You don’t expect them to know details of your specific field, because they’re an undergrad. You care most about whether the student seems interested in your work (so that they’ll learn about it quickly and do it more reliably), responsible, and pleasant to work with; that’s not answerable based on their resume, which probably lists their grades and any previous research assistantships they’ve done. (You could call professors they’ve worked with previously and ask what they thought of the student? Wait, no, they’re fictitious.)
It also won’t show up on any publicly-visible diversity statistic how many of your female undergrads have part-time research jobs.
Who says it has to be conscious? Do we know how many experiments they’ve done?
Or how many other groups have done similar experiments? As Scott points out, these experiments consist of surveying bunch of other professors. They’re comparatively cheap and easy to do.
Maybe a dozen different teams tried to find gender bias, and got a variety of results because they had a variety of confounds, and the ones with the biggest result published and got the headlines.
Then a dozen different teams tried to find lack of gender bias, and got all sorts of results, and the ones who happened to stumble on an experimental set-up that yields a pro-female bias (for some or all of the reasons they suggest) got the publication and the headline.
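That selection story is cheap to simulate. In this toy version (an invented setup, not either study’s actual design), a dozen teams all measure a true effect of zero, and the extreme results in each direction are the ones that would make headlines:

```python
# Toy publication-bias simulation: twelve teams, zero true effect,
# headlines go to the most extreme result in each direction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
teams, n = 12, 60  # 12 teams, 60 evaluators per study

results = []
for _ in range(teams):
    diff = rng.normal(0, 1, n)  # male-minus-female preference; true mean is 0
    results.append((diff.mean(), stats.ttest_1samp(diff, 0).pvalue))

results.sort()
print(results[0])   # the headline "pro-female bias" study
print(results[-1])  # the headline "pro-male bias" study
```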
Re effect of names on career, does the effect come from how people in general tend to react to a type of name, or does it come from parents who give their child a particular type of name also tend to have a particular parenting style?
I think it’s a social class, ethnicity, or perhaps cohort effect. See here for a critique of research in this area.
“Remarriages to same divorced spouse are frequent.”
Huh. Frequent enough to throw off the study?
It’s just one explanation, but the effect size was small to begin with.
The original source quantifies the claim: 40% of marriages between people with matching surnames are remarriages.
www8.gsb.columbia.edu/rtfiles/marketing/seminar_papers/paper_simonsohn.pdf
Makes sense. Raises a new question though: when remarriages are excluded, are the odds of you marrying someone with the same surname at chance, or is it actively avoided?
Interesting question. My surname is sufficiently rare that I would feel ickily related to anybody else with it, even though I know all branches of the family back to great-grandpa, so could rule out any significant consanguinity.
On the other hand, I know a Brown who married a Brown. I expect that when you have a common surname like that, it doesn’t mean much to encounter someone else with it.
I have heard that the Chinese (used to?) have a taboo against marrying someone with the same family name.
My last name is so rare in the U.S. that I’m not sure there *are* any women with my last name other than my sister (and now my wife). In the country of origin of the name, it’s still rare, but not quite that rare.
When reading Spanish news sources, every so often there’s a person named “Uribe Uribe” or the like, which indicates that father and mother have the same last name. It’s not very common, but it does happen, especially (as in the case of Rafael Uribe Uribe) within small elite social circles.
One complication that always bothered me in the lab manager case—probably because I normally see it cited as saying “sexist hiring managers discount women’s GPA”—is that men tend to have lower GPAs than women, so assuming people are on average about equal quality employees, an employer should prefer a man to a woman if both have the same GPA and all else is equal.
Why should you assume two people with different GPAs are equally smart?
@loki: He doesn’t say that two people with two different GPAs are equally smart, he said that if a man and a woman have the same GPA, a hiring manager should hire the man because men have lower GPA on average.
@JM: I don’t see how a man having an equal GPA to the woman would make the man preferable, all else being equal. If GPA is an indicator of fitness for a position, wouldn’t identical GPA indicate identical fitness? In a gender-blind hiring scenario, you are still left with two equally fit candidates.
If you are going to take externalities under consideration in hiring, why consider overperformance against the average GPA of men vs. women as a deciding factor? Why not consider instead the value a female candidate (underrepresented in STEM) could add as a different lens for viewing problems? That seems a much stronger indicator of potential fitness than gender differences in GPA.
It isn’t, it’s a proxy for one or more other things. If what you’re after is intelligence, GPA is intelligence with noise, where one of the noise sources is gender and can be controlled for.
Personally? Anecdotally speaking, I find that female STEM students are drawn from the same pool of, like, two personalities as male STEM students, and therefore suspect that won’t work as a viewpoint-diversification scheme. (Whatever the difference between Male Typical Engineer-Mind and Female Typical Engineer-Mind is, it’s a lot smaller than the difference between Male Typical Engineer-Mind and Male Typical Business-Mind.) We’d want to test for and favor hiring qualified unusual personalities for the field directly if that’s one of our values.
@Irrelevant what are the two personalities?
>If GPA is an indicator of fitness for a position, wouldn’t identical GPA indicate identical fitness?
*If* men and women are equal quality employees, then men must on average be better employees than women with the same GPA, by simple arithmetic (a toy simulation below this comment makes that case concrete). Conversely, if GPA correlates with quality equally, independent of gender, then men must on average be worse employees than women. I have no idea which of these is true.
>Why not consider instead the value a female candidate (underrepresented in STEM) could add as a different lens for viewing problems?
Is this supported by evidence? If we’re just speculating then my speculation would be that teams with a clear gender majority would outperform balanced teams made of equal-ability individuals, because people of the same gender have more in common and so can communicate with each other more effectively.
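Here is the toy simulation promised above (all numbers invented): job quality is identically distributed for everyone, women get a GPA boost that is unrelated to quality, and conditioning on equal GPA then favors the man in expectation:

```python
# Synthetic illustration of the "equal GPA, unequal expected quality"
# arithmetic. Quality is identically distributed in both groups; women
# get a GPA boost (e.g. conscientiousness) uncorrelated with job quality.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
quality = rng.normal(0, 1, n)                        # same for everyone
female = rng.integers(0, 2, n).astype(bool)
gpa = quality + 0.3 * female + rng.normal(0, 1, n)   # women ~0.3 higher GPA

band = np.abs(gpa - 1.0) < 0.05                      # candidates with GPA ≈ 1.0
print(quality[band & ~female].mean())  # ~0.50: the man "overperformed" his group
print(quality[band & female].mean())   # ~0.35
```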
What is a better predictor of work performance, GPA or IQ?
Men’s average IQ is equal to or slightly higher than women’s average IQ, and certainly higher once you restrict to the right tail of the distribution, due to higher variance.
GPA and IQ are positively correlated, but if you are using GPA as a proxy for IQ, then it would be epistemically rational to control for gender.
Instrumentally, it would be probably better to measure IQ (or some other high-quality proxy of work performance) directly.
Women tend to be more conscientious about academics (although the gap diminishes with age), but to the best of my knowledge that’s not reflected in superior workplace performance.
How do we define “hirable”? What if the effect is the following:
When being asked who a person would hire, they are expressing their beliefs about the productivity of genders. When asking who is more hirable, people think they’re being asked who their neighbor is more likely to hire. If the majority of the population isn’t sexist but also thinks that the majority of the population is sexist, we could see the result we’re seeing.
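A minimal model of that story (all numbers invented): nobody surveyed is biased themselves, but everyone believes the average employer favors men by about half a point, and the two question phrasings tap the two different quantities:

```python
# "Whom would YOU hire?" taps your own preference; "how hireable is she?"
# taps your prediction of everyone else's. Numbers are invented.
import numpy as np

rng = np.random.default_rng(7)
n = 1_000
own_bias = np.zeros(n)                            # nobody is sexist themselves
believed_market_bias = rng.normal(-0.5, 0.2, n)   # "...but everyone else is"

would_hire = own_bias                             # the W&C-style question
hireability = own_bias + believed_market_bias     # the MR&a-style rating
print(would_hire.mean())   # ~0.0: no bias in direct choices
print(hireability.mean())  # ~-0.5: an apparent pro-male tilt in "hireability"
```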
I like this.
This part looks especially conducive to this line of thought, as the professor would be specifically thinking about how the average employer would see the candidate:
Not placing much confidence on anything at this point, though.
Yeah, there are a number of papers showing that reasoning by counterfactual is different.
If that’s the correct interpretation of the results then it would be a major case of groupthink: everybody thinks that everybody else is sexist against women, therefore everybody overcorrects and society ends up being sexist against men.
Relevant: http://en.wikipedia.org/wiki/Keynesian_beauty_contest
“John” and “Jennifer” have very different age demographics. In particular, John is likely to be much older (and therefore more experienced) than Jennifer.
Did the resumes have ages on them? Could the age-associations of the names have an effect, even if the actual ages are known?
The study asserted that John and Jennifer were “students” which suggests a pretty serious restriction on their ages. But, yes, it is plausible that age associations could have an effect.
I saw the title and thought this was going to be about the priming study claiming an effect on speed walking down a hallway, and the various replications attempting to double-blind it.
When I was in marketing research in the 20th Century, the basic rule of interview research is that respondents will normally give you the answers they think you want to hear. It’s hard work to devise studies that don’t prime respondents to repeat back your own biases.
Just seen this new post, and before even reading it I want to say: I adore your standards of “less blogging”! 😉
If it really is just a matter of walking down the hallway, couldn’t this be explained by differences in sample populations? Maybe feminist and antifeminist researchers tend to draw their research from within their respective bubbles.
Note that this requires the chain of causality to be bubble -> position on feminism, not position of feminism -> bubble. Not that that’s at all implausible, but I think it’s worth stating explicitly.
Are you referring to the fact that the authors are claiming that their subjects are biased in the opposite direction from the authors? But it could be that the results are due to the subjects wanting the results to come out that way, because they have the same opinion as the researcher.
At that point you’re alleging a novel form of data falsification in which everyone answers questions as if they were their own bogeyman in order to screw up studies. Which, this not being an internet poll, seems implausible.
Whatever is going on, I doubt it’s a novel form of fabrication. I’m pretty sure it’s common in the literature.
We see it with Stereotype Threat studies.
You don’t think people answer questionnaires and polls differently to how they would actually behave? Because several referenda and campaigns in my country have crashed when the proponents of one side or the other believed the poll data over the idea that “the people have spoken – the bastards” 🙂
If I were filling out something for a study, of course I’d be careful to be as diversity-aware, inclusive, gender-blind, non-racist, non-homo/trans/other-phobic as possible in my answers. What I would do in practice might be a different thing.
One explanation is that people who pay attention to gender issues are more likely to become feminists if locally surrounded by misogyny.
Another explanation is that the pervasive meme is something like “women are weak and helpless” — which makes people want to build institutions to protect them but doesn’t make people trust them with labs.
First seems implausible: they would move.
Only if they realize other places have different cultures. It’s very common to assume everywhere is basically like where you are.
Wouldn’t that only get the desired results if all the feminists are friends with tons of misogynists and the anti-feminists only know pure merit judges?
If we can’t draw a conclusion about what is happening, then can we at least conclude “if we can’t actually draw a conclusion about what is happening, you don’t get to yell at people about it”?
ObHanson: “Yelling is not about Disagreement.”
Ahahahahaha, that’s a fantastic joke. You’re a card, Doc B. Feminist-oriented researchers, bloggers and journalists ever not yelling at people about gender bias. Heeheehee. Well done. *wipes tear*
We can yell at people for drawing conclusions on what is happening when we can’t draw conclusions on what is happening.
L2MetaDebate.
One more hypothesis:
I notice that W&C give three applicants and ask for a direct comparison among them, with ranking in order of preference, while MR&A give a single profile for analysis.
So maybe people choose women over men when the choice is explicitly framed in their minds as “should I choose the woman over the man?”
ETA: this would also account for W&C’s observations that their results compare well to real world hiring data, as these decisions would usually involve directly choosing among a number of candidates.
This seems pretty plausible to me. (Of course it’s also quite possible that one or both studies are seriously flawed in some non-obvious way).
This is why they had 3 applicants instead of 2: one was definitely less qualified, which (in theory) disguised the purpose of the experiment.
>So maybe people choose women over men when the choice is explicitly framed in their minds as “should I choose the woman over the man?”
In experiment 5 of W&C they did an experiment very similar to MR&a, where they sent one resume and asked for a numerical rating from 1-10. W&C found a pro-female bias of 1 point on that scale.
I don’t see the relevance of the first point. The second point is good, I had not seen that.
The point is that it isn’t simply man vs woman. I don’t find it very convincing, but I do find it relevant.
Well, if the third option is obviously less qualified it would be immediately discarded, and the task does become “qualified man vs qualified woman” after a trivial step.
(I’m assuming that whatever bias there might be wouldn’t be strong enough to put the obviously inferior candidate ahead of one of the others)
The W&C paper actually looks pretty bad. The main studies use narratives instead of just handing out CVs like a hiring committee would normally get. The subjects are also explicitly told that this is an imaginary situation, and are given multiple narratives to rank. They do have a validation study with CVs, but they only have 35 data points there.
I don’t understand why they wouldn’t just send CVs out like any other (fake) hiring process, instead of turning it into a low consequence way to signal your willingness to hire women, unless they were intentionally courting bias.
Their attempts to control for selective response are also a bit strange. They offered one sample some money in order to increase the response rate, but it looks from their data like the entire control study was within psychology. This is also going to introduce bias.
And I saw a blogger mention that the paper was a direct submission to PNAS, and so had a pre-arranged editor. That seems like a potential red flag.
Academics don’t hire based on CVs, but based on letters of recommendation, which you would probably call “narrative.”
In my field, the short list is constructed entirely from CVs and published papers. Letters come into play after that (unless someone knows someone, in which case the short list might be constructed out of friends-of-friends).
BUT – look at their methods appendix: the narratives presented by Williams and Ceci are nothing like a letter of recommendation! Instead, they contain family information (married, divorced, children, how much parental leave they’ve taken in the past, etc.) and a bit of non-specific information about research (“excellent”). It’s very obviously a role-play where the only information you are given about the candidates is their gender, some family-related information, and a lot of non-specific platitudes about their work ethic and their research quality.
Put someone in a role play situation, and they think “of course I’d hire the woman,” “I would never punish someone for family leave.” Tell them it’s an actual situation and they might be thinking “oh no, if we hire this person with this difficult family situation they might not get as much research done and I might not get that umbrella grant.”
Also, in my experience, to be seriously considered for a faculty position, you need a reputation and an extensive paper trail outside of a CV. If the first minute of googling reveals that the publications listed in your CV don’t exist, and none of the faculty have ever heard of you or met you at a conference, it’s not surprising if you don’t get invited for an interview.
How much experience do you have with academic hiring?
When I was a grad student, I was the student representative on a few committees searching for a new professor. All of the candidates who were in close contention were known to the current faculty, personally or by reputation. This was in Germany, but I can’t imagine that science works much differently in the US.
As a post-doc at a small institute in the US, I was involved in the selection process for a new post-doc. I sorted out a ton of applications without double-checking, but for the more interesting ones, a quick web search was the obvious way to get an impression of the quality of their work, and fake CVs would have been detected at that stage.
This is pretty well outside my field but I found the 8th comment on this post very interesting.
https://feministphilosophers.wordpress.com/2015/04/14/new-study-shows-preference-for-women/
This comment suggests that the narrative approach rather than CVs is not a novel methodology from W&C but rather something that has been used in other, similar studies. He also notes that while 35 is a relatively small validation group, previous research hasn’t been characterized by very large sample sizes either: 60 per group for the Moss-Racusin study Scott is comparing this study against.
I think the overall point of the comment, and Scott’s posts on topics like this generally, is well taken. I know from reviewing papers in my field (microarchitecture) that you can find methodological issues with just about any paper if you are so inclined. In microarchitecture this isn’t generally a huge deal, because most people don’t have underlying biases in favor of one cache replacement algorithm over another. In cases like gender bias, however, it’s not just plausible but likely that the people critiquing the methodologies of each paper do have some underlying bias toward one result or another. In that sense it would be interesting to have people examine the methodologies blind to the results and see how that might change the critique.
In the Moss-Racusin study, they used 60 per group. That’s much larger than 35 total (W&C only did the CV validation with one group, engineering professors, and only did the control for response bias with one group, psychology).
I’m not familiar with the standards of their field, but the fact that they direct-submitted it to PNAS makes me think that they probably couldn’t get it through actual review elsewhere. Most of the other studies I’ve seen attempt to validate more than one condition. Also, if you are going to validate only a single condition, you could do much better than 35 responses.
I’ll also add that I don’t think that gender biased hiring is a huge problem in faculty positions. I just think the 2:1 in favor of women result is far too high to be plausible.
Despite being relatively sure you’re right about the issues and the effect being difficult to believe, I’m not convinced that the limiting factor on publication was not being able to pass review due to methodological issues. The current political tone regarding women in STEM would make it hard to get a countervailing study published regardless.
How does it imply that narrative methodology has been used before?
(Answer based on my experience in math hiring; other fields may differ.) Sending out fake CVs at the tenure-track level doesn’t make sense — people who are being considered for TT hires already have extensive reputations. At the postdoctoral level, I still don’t think it would work. Sure, I read lots of postdoc applications from people I haven’t heard of, but their letters of recommendation come from people I’ve heard of. So either you get a dozen senior faculty to agree to put their names on fake letters of recommendation (3-4 per candidate, and you presumably don’t want them to overlap) or you give everyone letters of recommendation from people no one has heard of, which puts the candidate at a huge disadvantage.
I don’t see a reason fake CVs at the lab manager level wouldn’t work. I know people with that job, and the process is much less elaborate. It’s a rude thing to do to the recipients, though.
I admit that I’ve just started reading this blog, but can someone explain to me why it matters if there’s a hiring bias? In my (admittedly naive) understanding, if there’s preferential hiring of women over men, then due to the reduced demand men’s salaries would drop. Then any company that simply hires men over women would outcompete the companies that do not, due to the reduced cost. It seems like a self-correcting problem, similar to the wage gap.
Again, sorry if what I’m asking is a little basic.
In an academic context, which is what both of these papers examine, it doesn’t quite work that way. The supply of candidates for academic positions, especially at top-tier institutions, FAR outstrips demand. Imagine for simplicity that the average research group, led by one professor, graduates on average 1 PhD per year. If the average academic career of a professor is 20 years, then you potentially have 20 candidates for each open position. These numbers are somewhat arbitrary and skip over a bunch of factors, but they should demonstrate why simple economics and competitive advantages don’t work in an academic setting (a toy version of this arithmetic is sketched below).
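To make the toy numbers concrete, a minimal Python sketch; both inputs are just the illustrative assumptions stated above, not real data:

```python
# Toy steady-state model: each professor graduates 1 PhD per year over a
# 20-year career, and each professorship opens up roughly once per career
# (the professor's own seat, vacated at retirement).
phds_per_professor_per_year = 1
career_length_years = 20

candidates_per_opening = phds_per_professor_per_year * career_length_years
print(candidates_per_opening)  # -> 20 PhDs competing for each faculty opening
```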
This is very sector-dependent. If you’re a PhD in a field that has a decent non-academic market (I’m a PhD in engineering, for example), then it just pushes the advantage around. Universities have a hiring bias toward women/minorities so they can write that on NSF grant applications? Private-sector companies will happily hire a bunch of undervalued men and make a bunch of money.
We see all kinds of interesting market conditions due to the specific demands of subsectors (whether on the labor side or the employment side). I’ve personally known many international students who got their PhD in America, decided they liked it here, and realized that their best chance of staying is with a faculty position. On the other hand, many gov’t/military research labs require US citizens. There are plenty of private companies ready to pounce on anyone undervalued (to the extent that they aren’t being pushed by defense contracts which require citizenship or public investment that prefers specific groups).
It’s a massive maze, and it’s very difficult to say society-wide things about it.
Two things leap out:
1) This is relevant overall because it’s a political Hot Topic that is having large quantities of money thrown at it, rightly or wrongly. This is touched on at the beginning.
2) I got the impression the article is more about the issues with the studies than about the subject in question.
There are a lot of people who believe both
A) women are discriminated against
B) the efficient wage level argument does not work (or they do not understand this argument).
and who accordingly support policies like
1) gender quotas
2) explicit bias in favor of women
3) criticizing majority-male institutions
4) lowering the status of stay-at-home moms
5) giving money, contracts, awards etc. preferentially to women
6) prosecuting firms for having unequal numbers
If you thought that 1, 2, 3, 4 and 5 were bad, you might think it was important to show that A) was false – though I agree that B) is also a logically adequate rejoinder, even if perhaps rhetorically lacking.
Is it possible that one or both parties ran similar studies that didn’t get the results they wanted and failed to publish them? If I was in this position, I can imagine thinking of lots of reasons why the study that got results I didn’t like had various flaws that meant that it should be redone. (After all, I know what the truth is, so clearly a study that doesn’t find the truth wasn’t designed very well.)
If my hypothesis is correct, I suspect that one party or the other will refuse to participate in a collaboration for whatever reason, and that this party will be the guilty one.
“Stereotype Threat” is notorious for existing mostly due to the File Drawer Effect.
From an interview with John List, Homer J. Livingston professor of economics at the U. of Chicago:
RF: Your paper with Roland Fryer and Steven Levitt came to a somewhat ambiguous conclusion about whether stereotype threat exists. But do you have a hunch regarding the answer to that question based on the results of your experiment?
List: I believe in priming. Psychologists have shown us the power of priming, and stereotype threat is an interesting type of priming. Claude Steele, a psychologist at Stanford, popularized the term stereotype threat. He had people taking a math exam, for example, jot down whether they were male or female on top of their exams, and he found that when you wrote down that you were female, you performed less well than if you did not write down that you were female. They call this the stereotype threat. My first instinct was that effect probably does happen, but you could use incentives to make it go away. And what I mean by that is, if the test is important enough or if you overlaid monetary incentives on that test, then the stereotype threat would largely disappear, or become economically irrelevant.
So we designed the experiment to test that, and we found that we could not even induce stereotype threat. We did everything we could to try to get it. We announced to them, “Women do not perform as well as men on this test and we want you now to put your gender on the top of the test.” And other social scientists would say, that’s crazy — if you do that, you will get stereotype threat every time. But we still didn’t get it. What that led me to believe is that, while I think that priming works, I think that stereotype threat has a lot of important boundaries that severely limit its generalizability. I think what has happened is, a few people found this result early on and now there’s publication bias. But when you talk behind the scenes to people in the profession, they have a hard time finding it. So what do they do in that case? A lot of people just shelve that experiment; they say it must be wrong because there are 10 papers in the literature that find it. Well, if there have been 200 studies that try to find it, 10 should find it, right?
This is a Type II error but people still believe in the theory of stereotype threat. I think that there are a lot of reasons why it does not occur. So while I believe in priming, I am not convinced that stereotype threat is important.
http://isteve.blogspot.com/2012/10/john-list-on-virtual-nonexistence-of.html
You blogged about this once and mentioned a paper presented at a conference that proved this and was pending publication. As far as I know it was never published. Do you know what happened to it? It was some Dutch name.
Stereotype threat and the cognitive test performance of African Americans, by Jelte M. Wicherts & Cor de Haan, University of Amsterdam
Here’s the abstract from a 2009 conference presentation:
“Numerous laboratory experiments have been conducted to show that African Americans’ cognitive test performance suffers under stereotype threat, i.e., the fear of confirming negative stereotypes concerning one’s group. A meta-analysis of 55 published and unpublished studies of this effect shows clear signs of publication bias. The effect varies widely across studies, and is generally small. Although elite university undergraduates may underperform on cognitive tests due to stereotype threat, this effect does not generalize to non-adapted standardized tests, high-stakes settings, and less academically gifted test-takers. Stereotype threat cannot explain the difference in mean cognitive test performance between African Americans and European Americans.”
https://menghublog.wordpress.com/2012/12/06/race-and-iq-stereotype-threat-r-i-p/
I don’t know what’s happened in the last half-decade.
Personally, I think it would be pretty easy to generate stereotype effect results just by hinting to black or female test-takers that we’d be happy if they didn’t work too hard on this zero-stakes test.
I don’t believe a Stereotype Threat experiment has ever been carried out involving a high stakes test: it would be too obviously unethical to try to lower the performance of blacks or women when it matters to the individual test-takers.
So, Stereotype Threat experiments are carried out on low-stakes tests where the test takers have little incentive to work hard. Sometimes the experiments produce the socially desired findings of Stereotype Threat and get acclaimed; sometimes they don’t and get forgotten. Very occasionally, a no-BS guy like John List, who has a chair at the U. of Chicago, explains what he thinks is really going on — publication bias — but mostly it gets hushed up for career reasons.
Wicherts has meanwhile published a meta-analysis of research on the effect of stereotype threat on girls’ math and science test performance. There was a small effect (-0.22 SDs, significantly different from 0). However, there was also clear evidence of publication bias, and when the missing studies were imputed by the trim and fill method, the effect was -0.07 (ns). I’d be surprised if the results were much different for racial stereotype threat.
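For readers unfamiliar with trim and fill: the real Duval–Tweedie procedure iteratively trims the most extreme studies on the overrepresented side of the funnel plot, estimates how many studies are “missing”, and imputes their mirror images before re-pooling. Here is only a toy sketch of the fill step, in Python, with entirely made-up effect sizes and an assumed missing-study count – not the actual algorithm and not Wicherts’s data:

```python
import numpy as np

def pooled(effects, variances):
    """Fixed-effect (inverse-variance weighted) pooled estimate."""
    w = 1.0 / np.asarray(variances)
    return float(np.sum(w * np.asarray(effects)) / np.sum(w))

# Made-up stereotype-threat effect sizes (SD units) from a literature that
# file-drawers its null results, so near-zero effects are underrepresented.
effects = np.array([-0.45, -0.38, -0.30, -0.25, -0.22, -0.10, -0.05, 0.02])
variances = np.full(effects.shape, 0.02)

mu = pooled(effects, variances)           # naive pooled effect

k = 3                                     # pretend the asymmetry estimator says 3 studies are missing
mirrored = 2 * mu - np.sort(effects)[:k]  # reflect the k most extreme effects across the mean
adjusted = pooled(np.concatenate([effects, mirrored]),
                  np.concatenate([variances, variances[:k]]))
print(round(mu, 3), round(adjusted, 3))   # the adjusted estimate shrinks toward zero
```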
I mean, did you read about the collaboration between a psi skeptic and believer described in “The Control Group Is Out Of Control”? There are many ways this could go wrong.
Kahneman and Gigerenzer did some adversarial collaborations.
Popperian Epistemology completely solves the problem these studies have.
http://blogs.discovermagazine.com/neuroskeptic/2015/03/15/is-science-broken-lets-ask-karl-popper/
Popper isn’t even obscure, it’s like I’m taking crazy pills.
Science doesn’t actually work the way Popper says it does (or should – he tends to equivocate between the two). But even if it did, how would this field constitute an example of working science? People are finding evidence against two mutually exclusive and presumably exhaustive hypotheses (there is hiring bias vs. there isn’t) and now nobody knows what to believe. That’s not even something Popper can make a lot of sense of, is it?
There is no descriptive/prescriptive problem: “science” doesn’t work if you aren’t trying to falsify a theory – and it’s an equivocation to conflate evidence-gathering with science.
You can manufacture infinite amounts of evidence “for” a theory, and it can still be false. Geocentricity has plenty of evidence to support it – and people using evidence to deny heliocentric theory were obviously not helping find truth.
Both sides of the debate will be able to find as much evidence for their position as they can find grant-money for. Positivist/post-modern “science” (like this) is essentially a machine that transforms money into sciencish.
If you want to find the worst “science”, look for the party who are least willing to criticize their assumptions:
I’m wary of Moss-Racusin, because they have a Big Truth that transcends reality. Big Truths inform what evidence you consider crucial (Big Truth supporting), and what evidence is incidental or erroneous (anything that doesn’t fit with the un-criticizable interpretive framework of the Big Truth). This ultimately informs their studies, and what their studies will support.
W&C don’t seem to have as much ideological baggage.
Popperian epistemology also asserts that the problem of induction is insoluble (which it may be) and that we can never have positive evidence for hypotheses, just negative evidence. And what happens when two contradictory hypotheses are both “falsified”?
Replace ‘contradictory’ with ‘exhaustive’, but yeah. Falsificationism not really helpful here.
Popper’s position is that you cannot derive new knowledge (or certainty) from experiment – the whole debunking inductivism thing. Hume slam-dunked this, and Popper performed the end-zone dance.
What you can do with science is (provisionally, fallibly) test a theory.
And that is all science is good for.
What to do if all your theories are trash (as in this case)? Cry? Say “At least my theory is supported by this cherry-picked evidence, let’s go with it?” No.
Think of a new, not shitty theory, and try to tear it apart.
That’s science, comrade.
Addendum: Also, no theory is exhaustive- let alone these theories.
Was that mixed metaphor intentional? I hope so, but suspect no.
Is it possible both studies are correct observations of their time and place?
It’s not like “Academia” is really a thing. Even if you assume an entire campus will discriminate the same way, it seems almost implausible that different campuses would discriminate exactly the same way. MR&a appear to be using 6 locations; I think W&C use 371 different locations (I could be wrong on either of those numbers).
Also it seems plausible that a 3 year old research study would have had an effect on research institution hiring practices so the intervening time could explain the different results.
Of course I think this adds weight to your joint study suggestion.
A mediocre resume (like MR&a’s) primes negative stereotypes about women in a way that an excellent resume (like W&C’s) does not. MR&a was looking for the sexism effect; W&C was looking at the over-correction for that effect. It seems like MR&a is right – sexist stereotyping is a major issue – and W&C is right – academics already care about sexist stereotyping and are trying to counteract it.
As other comments mention, it’s also critical that W&C makes it clear the scenario is imaginary, whereas MR&a presents the scenario as having real implications. This fits with the pattern that people poll less racist than they vote.
@Scott Alexander – I think your second point is misleading. You say that they ensured that the “resumes for each candidate were equally good” but looking at the methodology of W&C, they did not provide resumes at all.
In general, the methodology of W&C is quite different from MR&a’s, and the different methodologies could well lead to the different results (CVs vs. narratives about family life; “imagine you are ranking candidates…” vs. “these are real students who want CV feedback”; 60 validation samples per group vs. 35 validation samples total). It’s probably not the subtle wording changes that are responsible for the differences, but the large methodological differences.
Do you even know what validation means?
Moss-Racusin et al. did not use CVs at all. The materials read by the respondents were presented as summaries of the original applications.
Williams & Ceci ran a number of experiments with Ns ranging from 35 (the CV experiment) to 363 (the narrative summary experiment). Moss-Racusin et al. ran a single experiment with N=127.
If you think the difference between the two studies represents a methodological problem rather than an actual difference, the obvious place to look to me seems like file drawer bias – if at least one of the researchers has an ideological bias, they may be less eager to publish results that appear to conflict with it. They may come up with post-hoc justifications why their earlier ideologically unsuitable results were scientifically tainted and they need to tweak some of the parameters and run the test again. They could continue doing this, “honestly” in the sense that ordinary hypocrites use the word, until finally their results came out “right”.
This is a hard effect to detect by reading a study. Ideally one of the researchers would have preregistered their study and you just trust that one, but sadly that’s not the scientific norm. Alternatively you can try to guess which one is more file-drawered by looking at which one has weirder parameters – for example, you say that they would have to “avoid drinking their own Kool-Aid” to realize they would need to use lab manager positions instead of faculty positions, but they might have just realized it because they tried it out.
Stereotype Threat is notorious for the File Drawer Effect:
http://isteve.blogspot.com/2010/01/stereotype-threat-scientific-scandal.html
I don’t know if it was true in this case, but I thought the usual method of introducing unconscious bias when it looks impossible to predict the outcome was something like:
1. Is the result what you wanted? If so look slightly less hard for methodological errors. Else, look harder for any errors.
1a. Fix any errors you find, or compensate for them.
2. Is the conclusion what you wanted? If so, publish as prestigiously as you can. If not, publish more reticently or not at all.
Then it’s really hard to point to any specific bias, since the problems are things like “you didn’t spot this minor flaw”, which is true of ALL papers, and things like “but there are other papers in the field which say the opposite, even if they’re not as convincing”, which is ALSO true of all papers. But statistically, it can have an effect. (E.g. in drug trials, where it’s even harder to rig the result in advance.)
Speaking in my capacity as Representative of Token Idiots on discussion fora everywhere…
Could the drum-beating about lack of women in STEM fields be partly attributable to a basic difference in what’s being counted?
Scott mentions “Williams and Ceci have been vocal proponents of the position that science isn’t sexist for years now – for example, their article in the New York Times last year, Academic Science Isn’t Sexist.”
And yes, I’d be perfectly prepared to accept that academic science isn’t sexist (partly because Teaching Is (regarded as) A Female Profession). But, based on experience from Ireland, when government etc. starts campaigns for “Get more girls doing science in school”, they’re thinking of industrial STEM: getting kids to take subjects that they’ll go on to do degrees in and when they leave university go for jobs in business/industry with their technical skills.
And, as the “hiring male lab managers” scenario demonstrates, there may still be an attitude of “we need someone who can interact with the predominantly male workers on the factory floor and this needs to be one of the lads”. For clerical-type STEM positions (where there is more paperwork/less hands-on experience), maybe women have the edge, but for the ‘practical’ side, men do?
(Ireland is probably not a good example as we are heavily dependent on attracting foreign investment to create employment and our culture has been oriented around providing an educated workforce as one of the inducements for multinationals to set up here, rather than encouraging R&D for our own businesses and academic institutions and nurturing home-grown entrepreneurs – well, except in property development, as witness the boom, bubble and crash of the past decade).
Okay, I now stand back and prepare to be demolished for my errors here 🙂
Yes, I have heard the claim that, for example, sex ratios in academic vs industrial CS are diverging, that the proportion of female CS professors is going up simultaneously with the proportion of female programmers going down. But the two papers Scott compares are about academic hiring, so that does not explain the difference.
Then it might depend on the fields covered; if one study has respondents from a wider range of STEM fields than the other, or has a high proportion of the life sciences, then it might come out that there are more women than men hired there.
Still doesn’t quite explain why engineering* might be more male-populated (if it is) than psychology, but then we get into the messiness about “are men better at maths than women?” (which we’ve already discussed here before) and the rest of the hair-pulling.
*And having said that, in my workplace a lot of our technical staff – including engineers – are female.
I’m an engineer in the construction business.
>And, as the “hiring male lab managers” scenario demonstrates, there may still be an attitude of “we need someone who can interact with the predominantly male workers on the factory floor and this needs to be one of the lads”
This attitude may be present, and almost certainly is among older hiring managers. However, the actual results don’t bear this out. There are a few women doing engineering jobs which get them out onto jobsites, sometimes telling the construction crews that the work wasn’t completed correctly, or that the crew should do something differently. This results in about the same amount of grumbling from the crews as it does when it comes from male engineers and inspectors.
The biggest difference I’ve seen is that any inspector or engineer who seems in the least bit hesitant or tentative will get a lot more pushback from the construction crew, and women are more likely to behave that way. But the reaction from the construction crew isn’t sexist per se – they give the same hard time to men who act that way. The next biggest difference is that if an inspector or engineer loses the respect of the construction crew, the terms of abuse applied behind their back are generally worse and more sexist for women than for men.
Yeah, they give a hard time to anyone who fails to act like a manly man — that’s not sexist at all.
This is the point where he calls you sexist for thinking that having a firm grasp of the authority of your office is inherently masculine and then the argument bottoms out.
It’s not industrial vs academic, but theoretical/mechanical vs. life sciences. It seems that women choose life science STEM careers just fine, and the fact that women are in STEM in respectable numbers in fields like chemistry or microbiology gets passed over for their lack of representation in engineering and comp sci.
I have a different hypothesis.
It turns out that the most extreme version of post-modernism is correct, and that:
1 – It is true that there exists academic bias against women, but only if you belong to a culture that thinks bias against women exists in academia.
2 – It is also true that there exists academic bias favoring women, but only if you belong to a culture that thinks bias favoring women exists in academia.
Moss-Racusin et al belong to the first group, while Williams and Ceci belong to the second. Therefore, it is only expected that their experiments would confirm what they suspected. After all, science is about finding truth, and truth is culturally defined.
/s
Hypothesis: there are two biases, one against women and one in favor of women, both independent of your political opinions on the matter. The bias against women is strengthened by thinking about gender, simply because gender is on your mind when making the decision. This theory neither requires nor predicts that the bias in favor of women is affected by the circumstances.
This would imply that people who worry a lot about sexism are more sexist, which is so politically convenient that I’m pretty sure my brain generated this hypothesis in an extremely biased way.
Thinking about gender will strengthen negative stereotypes about gender, as findings on implicit association tests show. Stronger negative stereotype associations may or may not translate into actual discrimination, since implicit association tests neither capture all associations encoded, nor capture the behavior that results when the associations are primed.
You’d be less likely to be thinking about gender when asked to evaluate a single candidate than when asked to compare and rank three candidates, men and women, no? But the direct comparison study is the one that showed pro-women bias.
The name thing is interesting – I’d be interested to know how they defined masculine and feminine names, however.
It immediately occurred to me that these things are going to have to be culturally/linguistically linked, and that there is *probably* no such thing as an inherently masculine or feminine name.
I mean, to my ears, names like ‘Amelia’ sound prettier, somehow lighter, but also less serious than names like ‘David’. But I don’t know what’s happening here – does ‘Amelia’ sound like that because I’m subconsciously associating it with femininity, or is it given to women because it sounds like that and those things are associated with femininity?
When you get cross-cultural, it’s gotta be weird. In the rest of Europe, men frequently have names that sound very feminine to Anglophone ears, including some pronounced exactly like English women’s names (Michele springs to mind). ‘Jacques’, to me, sounds much more feminine than ‘Jack.’ Conversely, in more Germanic languages a lot of women’s names sound more macho to me – Helga, Olga – I can imagine both of those names with beards and a big axe.
In other languages, such as Chinese or Russian, I couldn’t reliably guess whether any given name was a man’s name or a woman’s.
I’m pretty sure Olga is actually a Russian name.
Yes: Russian form of HELGA. The Varangians brought it from Scandinavia to Russia. The 10th-century Saint Olga was the wife of Igor I, Grand Prince of Kievan Rus (a state based around the city of Kiev). Following his death she ruled as regent for her son for 18 years. After she was baptized in Constantinople she attempted to convert her subjects to Christianity.
source: http://www.behindthename.com/name/olga
I remember reading somewhere that there is a cross-linguistic tendency for female names to end in vowels as opposed to consonants, or long vowels as opposed to short ones. I don’t remember the source and I don’t know if the generalisation really holds up.
There maaaay be some bouba/kiki style effects here buried beneath lots and lots of language-specific phonology and morphology. But I’m sure their role is vanishingly small compared to the cultural/linguistic associations.
As a non-French person, I feel compelled to point out that the rest of Europe is not France. 😉
And yes, Olga is a Russian women’s name. The male version is Oleg.
I know a Swedish guy called Kim who went to high school in the US. He spent the entire first semester trying to get off the girls’ track team.
“David” has two plosives whereas “Amelia” doesn’t have any, so that might be why it seems to be heavier and more serious.
A little googling turns up this article which has some information about sound differences in the US, but what it says presumably only applies to English (some of the rules cited couldn’t apply to other languages, e.g. Japanese or languages with fixed stress). And in the US, the phenomenon of masculine names becoming feminine suggests that this is pretty fluid. I hear names like Ashley or Courtney as obviously feminine since they made the switch before my time, but the people who were using them as male names a hundred years ago clearly didn’t agree.
I don’t think that it is clear that people a century ago disagreed very much. People who adopted elite surnames as boys’ names may well have wanted names that were refined as opposed to macho.
Shirley used to be a boy’s name (see Charlotte Bronte’s “Shirley”, where the heroine has to explain to everyone that her parents expected their child to be a boy, picked out a name, and when she was born they wouldn’t change it).
Apparently, the influence of her novel was such that by the time Shirley Crabtree was establishing his career, although it was still a male name, the predominant use of it was for females.
I’ve seen the same in my lifetime with Robin (and its variant spellings such as Robyn): when I was a child, it was very definitely a boy’s name; nowadays, it’s so shifted to being a girl’s name that the only male Robins I can think of off-hand are the late Robin Williams or “Robin, the Boy Wonder” as in “Batman and Robin”.
I’ll add Robin Hood, Robin Thicke, and Robin the Avatar. That last case is amusing, since it’s the name of a character that defaults male, but can be male or female, and therefore the name was chosen to be deliberately androgynous for script purposes. They have a child in-plot with the same flexibility but opposite default sex who is, appropriately, named Morgan.
There is some evidence that the mapping between sound and meaning isn’t arbitrarily assigned by culture.
>(the only exception here is the lab manager. It’s just within the realm of probability that MR&a might have somehow realized they’d get a stronger signal asking about lab managers instead of faculty. The choice to ask about lab managers instead of faculty is surprising and does demand an explanation. And it’s probably the best candidate for the big difference between their results. But for them to realize that they needed to pull this deception suggests an impressive ability to avoid drinking their own Kool-Aid.)
I think the best explanation for focusing on lab managers is that a professor can usually hire a lab manager completely on their own, without the involvement of a committee, so you can ask what sorts of decisions an individual professor will make when left to their own devices. Faculty hiring, OTOH, is a decision made by a committee or even an entire department (or, most frequently, a two-step where a committee runs most of the process but the department makes the final decision based on the committee’s findings and presentations).
It seems plausible to me that people will act differently in a unilateral decision on a person that will work primarily with them vs. a group decision on a person who will work with the group.
Here’s a thought – people are well aware of the previous study, especially people in academia. I think that previous study has sort of become common knowledge – I don’t even mean the truth value of the study, but its existence. Then: since I don’t want to look like I’m biased against women, and since I know that in all likelihood, if someone asks me whether I would hire a woman called “Y”, it is part of a study about sex bias in academia, I will not want to look sexist in front of the researchers, so I say I will hire her. If it is about a guy, there is no such pressure on me, so I just consider his qualities. If that were true, a replication of the previous study, even with the lab manager, would today yield a similar result. Then again, I can hardly believe that people are this erratic and that so much could change in two years.
When you are hiring for tenure-track positions (especially at R1 universities), it’s easy to eliminate the bottom 80% of candidates, but the top 20% are usually all absolutely exceptional, with multiple top-tier publications and fantastic research plans. At that point, what makes the difference are very subjective criteria – strong recommendation letters from people that you personally know and trust, a chosen area of research that is a very good fit for the department, or even things like demographics.
My impression is that it’s harder for women to reach the top than it is for men, and that sexism and discrimination are very real things. However, when you’ve reached a point where the number of superbly qualified candidates far exceeds the supply of available positions, then being a discriminated-against minority can suddenly become quite a strong advantage. For example, when submitting grants for highly competitive EU projects (Horizon 2020), I’ve seen consortia actively seek out women/minorities to join the project because they thought that would significantly increase their odds of being accepted.
I add a huge caveat that this is just my own personal experience when helping select candidates, and I’ve never really seen systematic studies – but I think that among absolutely exceptional candidates, demographics makes a very big difference.
I suspect that in most areas (and assuming honest researchers) bias towards certain results and the experimenters’ opinion do not necessarily cause each other, but are both caused by the experimenters’ way of gathering information about the world. This may affect “formal” information gathering (i.e. study design) in the same direction as “informal” (i.e. making up your mind based on anecdotes and newspaper articles).
At least I usually find that people with similar personalities tend to come to very similar conclusions on most questions (once they exchange the relevant facts, at least), while dissimilar personalities often have such strong disagreements that they are unable even to comprehend each other’s arguments or to get to the stage where they can exchange facts in a civil manner; having read the same sources rarely helps either.
It would surprise me if scientists were immune to this. And it would also explain why a lot of STEM people think the accusations of sexism by social science people are completely unfounded and don’t even merit as much as an answer. The whole notion of making assumptions about someone’s skill based on their gender when you have way more accurate data (like, you know, grades, publications and their actual performance on previous tasks) seems so absurd and alien to them that any accusation of sexism just looks to them like someone hurling meaningless insults in their general direction.
Seems to me that if the field is dominated by men, then “John” is more familiar than “Jennifer.” Using “him” and “her” helps control for that.
“…studies of real-world hiring data confirm women have an advantage over men in STEM faculty hiring (although far fewer of them apply).”
Is it possible that there’s a Simpson’s Paradox going on with these numbers? If men are more likely to apply for very difficult STEM positions, then women would have better hiring rates overall, but men would have better hiring rates within some (or all) types of position.
This probably doesn’t explain the contradictory findings of the two articles Scott discusses, though.
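To make the Simpson’s paradox scenario concrete, here is a toy example with entirely invented application and hiring counts: men have the higher hiring rate within each tier of position, yet women have the higher rate overall because far fewer women apply to the brutally competitive tier.

```python
# Invented counts illustrating a possible Simpson's paradox in hiring rates.
#             (hired, applied) for women,  (hired, applied) for men
tiers = {
    "top-tier": ((2, 20),  (15, 100)),   # men 15% > women 10% within tier
    "teaching": ((45, 80), (15, 25)),    # men 60% > women 56% within tier
}

for label, idx in (("women", 0), ("men", 1)):
    hired = sum(counts[idx][0] for counts in tiers.values())
    applied = sum(counts[idx][1] for counts in tiers.values())
    print(label, f"{hired}/{applied} = {hired / applied:.0%}")
# women 47/100 = 47%, men 30/125 = 24%: women win overall, lose in each tier.
```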
Good thought.
From their earlier paper (linked in the NYT “Academic Science Isn’t Sexist” article), talking of this one:
A recent large-scale national tenure-track-hiring experiment was specifically designed to address the question of whether the dearth of women in math-intensive fields is the result of sex bias in the hiring of assistant professors in these fields. This study sampled faculty from 347 universities and colleges to examine bias in the hiring of tenure-track assistant professors in various STEM fields (W. M. Williams & Ceci, 2014).[19]
This finding is consistent with the other evidence on productivity presented below, which also fails to show female superiority in hiring outcomes as being due to objectively higher female quality.
I think the lab manager study did separate respondents out by field, so the 120-odd sample size breaks down into per-field groups even smaller than the 60-per-side figure bandied about above.
The Berkeley case is still on top, eh? Another data point for the third view I outlined in my other reply.
But when examining the individual departments, it appeared that no department was significantly biased against women. In fact, most departments had a “small but statistically significant bias in favor of women.”
Oh, and there would be the famous Espenshade study purporting to prove discrimination against Asians as well, for the “subtle” case.
Looking at the Bureau of Labor Statistics numbers (http://www.bls.gov/opub/reports/cps/womenlaborforce_2013.pdf, page 33), the closest matches I see to the titles used in the two studies are “postsecondary teachers” (“tenure-track faculty”) vs. “general and operations managers” (“lab manager”). These are 48.2% female and 29.1% female, respectively. So I think that’s where the real difference comes from: there’s a pro-male hiring bias for managers and a pro-female hiring bias for teachers, and this is reflected in the studies and reflected-ish in the BLS statistics.
Hey Scott,
There is another major difference between these studies that is worth keeping in mind. In W&C’s experiment, the candidate descriptions weren’t identical. Half of the narratives used female-gendered personae (focusing on teaching quality and sociability) while the other half used male-gendered personae (calling the candidate a powerhouse, for instance). The results represent the summation of half the time giving the male candidate the female adjectives, and half the time giving the female candidate the male adjectives.
I think it’s crucial to know – was the male described in typically female terms hugely penalized? If so, that would be consistent with a lot of previous literature on gender stereotypes, and might reconcile the differences between results.
See pages 2-3 here: http://www.human.cornell.edu/hd/ciws/upload/PNASAddt-lResources-Williams-Ceci.pdf
Or just good old publication bias?
My nasty and suspicious mind is telling me to look at incentives. In fact, you can treat this situation in the beloved-by-LW prisoner’s dilemma framework. Will our distinguished scientists cooperate and continue to receive funding for research into this burning and controversial issue keeping their names in the limelight — or will they defect and solve the problem once and for all forcing them all to find a different line of inquiry?
One possible explanation (a generous reading) for the whole “I support gay marriage but not homosexual marriage”: one conservative argument against equal marriage rights is that gay people can get married — to people of the opposite sex. In other words, it isn’t discriminatory to say that two men can’t get married because each man is equally free as any other man in the country to marry any woman of his choosing. (And similarly with two women, etc.) Thus gay and lesbian people have equal access to marriage as everyone else.
Ok, not persuasive to me. But some people find it so. (IMS, a similar argument was made about miscegenation back in the day.) Anyway, if one assumes that the sign-holder was trying to make that argument, then the sign makes sense: he supports gay marriage (i.e. he’s not discriminating, gay men can marry women too!) but not homosexual marriage (i.e. gay men marrying each other, similarly lesbian women).
I’m not saying it’s what was intended. But it’s the only reading I can think of that makes a lick of sense.
Just in case you didn’t notice this, the text on the sign is shopped. (This looks like the original.)
Ah, didn’t notice. Thanks.
Some people might reject “do you support homosexual marriage” on the grounds that (a) homosexual is an offensive term, and they prefer LGBT or queer or something else, and (b) it’s not “homosexual” marriage, it’s marriage; otherwise you’re still discriminating by making a difference between ‘normal’, ‘traditional’ marriage and ‘homosexual’ marriage.
[Disclaimer: I’m not from the US. Neither do I get the finer points of your language (so maybe I’m stating the obvious and you were joking) nor of your political debate (so maybe the reception of LGBT topics differs vastly from european “standards”). That being said:]
There is research to the effect that the word “homosexual” triggers the concept “gay” in people’s minds, as in “gay marriage, that’s just gross”, whereas the wording “gay and lesbian” triggers the concept “lesbian”, as in “lesbian marriage, I’m all for it. Get it going ladies, I’ll be watching you on you*ahem*tube.”
That explanation strikes me as plausible. I can’t be bothered to look up the references just now though, sorry.
Judging by the outcome of that parapsychology study, coming up with a joint protocol will not help. They should probably just swap their existing protocols.
>Two years ago Moss-Racusin et al released Science Faculty’s Subtle Gender Biases Favor Male Students, showing a strong bias in favor of men in STEM hiring
So, I’ve noticed a thing through this whole post (and it’s not about you, Scott – you’re pretty consistent about your usage): “STEM” here is really “STEM Academia”.
In the non-academic world, though, “STEM” mostly means “real-world jobs involving science and engineering and that kind of stuff”; in technical and engineering fields, outside of Pure Science Research, academia is a side-note.
This doesn’t affect the studies themselves, but it does affect how we should look at the importance of their results – is STEM Academia hiring differently than the private sector, in terms of proportion of people hired by gender, relative to the population?
If so, in which direction (and why, though that’s – as noted- tough)?
For the rest of us, STEM academia is … a place to train working engineers and doctors and scientists, not to churn out more professors; thus private-sector hiring patterns are super important if we care about gender bias.
The unintended side-effect of that could be:
(a) if there are more post-graduates and higher qualifications looking for academic work than there are openings in academia to absorb them and
(b) if academia is hiring more women than men then
(c) the men who were rejected for academic positions must be going into industry to look for work which
(d) explains why private-sector hiring patterns show more men than women 🙂
I know that was a snark, but:
That assumes that people in STEM fields prefer academia to industry. Sort of a reverse of ‘those who can’t do, teach’ which I somehow doubt.
That’s actually a very good question, and has anyone asked it?
Any surveys of graduates asking them do they want to go into academia or into industry/business?
Is there an assumption that Genius McSmartypants is of course going to be interested in winning a Nobel and so will get onto an academic track, but what if Genius decides they want to make tons of money and looks for a job in Big Pharma (or wherever) instead?
Wouldn’t attempting to study people who can exert peer-pressure influence over you, and care about what the results of your study imply about them, make a study worse? You’d at least have to study professors at a different university; probably also filtering out ones in the same field, to prevent citation-graph peer-pressure.
Yes, both studies used multiple universities and multiple departments.
I don’t know what Scott’s point was with that paragraph. The line in isolation seems reasonable: it is easier for academics to create realistic CVs to fool other academics than to fool other people.* But social psychology seems to me inherently more difficult (at least in the sense of it being hard to identify researcher influence) than astronomy, climatology, and even other parts of psychology, like the genetics of personality.
* On the other hand, researchers sending resumes to job openings and getting calls back is an actual test of fooling people, unlike these studies, which don’t try to hide their artificiality.
I say the correct response is epistemological despair. Not nihilism. The truth is there, and it can be known, but only by people smarter and better than us.
Timely and relevant Slate article about a new study that correlates higher anxiety levels with higher IQ.
Then they reference another study that correlates higher anxiety levels with lower IQ.
I think the scientific method may be broken… I actually mean that. Not in theory, mind you, but in how it is practiced. The incentives are wrong. Nobody is rewarded for confirming/not replicating the results of others. I would think that a “true” scientific method would reward this (and definitely not punish it, as is done now).
Huh. And now I see that up-thread someone linked to what apparently is already a thing, an ongoing debate about whether “Science is Broken”.
h/t @Avery
Slightly tangential, but there’s some supporting evidence (of whatever you want) via this cool tool:
http://benschmidt.org/profGender/#
For instance, in engineering, biology, chemistry, and science (?), females are more often described as “young”, which could indicate more new hires, or could just be pejorative use.
Interesting tool, but if you want to know about new hires, just get the data.
I tested the hypothesis that “young” is a pejorative by filtering by positive or negative reviews. It is 2-3x as popular in positive reviews as negative. Also, among positive reviews, it is more common for females in almost every field, for whatever reason. I don’t think that there is a trend for negative reviews. I tried to look at specific fields, but that was difficult because the tool rearranges the fields.
PS – for graphing ratios (including these proportions), log scales are almost always the correct way to go. For example, that would make the sex ratio visible across fields with different base rates. (Engineering vs Communications)
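For what it’s worth, a minimal matplotlib sketch of that suggestion, with invented male/female rate ratios (a value of 2 here would mean a word is used twice as often for men; on a log axis, 2x and 0.5x sit symmetrically around the parity line):

```python
import matplotlib.pyplot as plt

fields = ["Engineering", "Biology", "Chemistry", "Communications"]
ratios = [2.1, 0.8, 1.3, 0.4]  # hypothetical male/female rate ratios for "young"

fig, ax = plt.subplots()
ax.bar(fields, ratios)
ax.set_yscale("log")            # the key step: ratios are symmetric on a log axis
ax.axhline(1.0, color="gray")   # parity line
ax.set_ylabel("male/female rate ratio (log scale)")
plt.show()
```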
Youth is attractive in women, but suggestive of immaturity and inexperience in men. Explains why someone with a positive disposition towards a person would be more likely to call a woman “young”.
This critique claims, among other things, that the Ceci and Williams study is flawed because the participants could likely deduce that the study was about gender bias, and so were on their best behavior, as it were. Thoughts?
Well, I’m tempted to say that the article suffers from a high degree of motivated skepticism (would they have criticized the study nearly as much if it had obtained a result that reaffirmed their position?), but in any case they make a very good point. I’ve always felt that I would make a terrible subject for most psychological studies because I’m familiar with a lot of psychological literature, and I suspect that as a participant I would be constantly trying to guess the purpose of the study and respond in an “appropriate” fashion. Have there been studies (meta-studies?) on this kind of genre-savviness among study participants and how it affects study outcomes?
Yes, the article I linked has many problems. Still, I think that particular one is a potentially fatal flaw for this particular study. I was wondering if anyone knew if the researchers had attempted to answer this question. Guess I should probably just read the paper myself, huh?
The paper explicitly addresses this, though most of what it says is pretty stupid. It includes many (small N) variants, addressing all common complaints. In particular, it includes a variant where the subject scores a single applicant, rather than comparing three. This gets it pretty close to MR&a on this axis (unless MR&a actually fools people into believing that it is not a study).
Stupid things: very few subjects surveyed admitted that they figured out that it was about sex differences; very few subjects issued a tie between the serious candidates.
That critique is definitely wrong in almost all the claims it makes.
Ceci and Williams explain how they disguised the hypothesis they were testing in the section ‘Disguising the Research Hypotheses: Use of Adjectives to Create Gendered Personae.’ They report that the participants thought the experiment was about judging different kinds of personae, who were described very differently. The blog author doesn’t seem to understand that it was a between-subjects design.
The rest of their complaints are often clearly wrong or ill-motivated; e.g., they call out Williams and Ceci for a low response rate of 34% and then cite as counter-evidence Moss-Racusin (who have a 30% response rate, rounding up).
Indeed, as I said above, the critique has major problems. Perhaps linking to it was pointless.
I had a look at the study, and think it’s fair to say that they attempted to test whether experimental intent was exposed and did not find a problem. Of course, their “tests” arguably suffer from small sample size problems, but so do all studies.
Also, in regards to the response rates, the “sexism is everywhere” camp generally has a pretty strong asymmetric advantage here: a low response rate can be problematic for W&C, but less so for M-R. A low response rate in the former could indicate that only progressively-minded people responded, while a low-response rate for M-R is unlikely to indicate that only the really sexist people responded.
More generally, I think it’s safe to assume that sexism tries to “hide itself”, so that finding a lack of sexism may indicate that you failed to conceal your intent enough and people are just on best behavior. However, finding sexism is unlikely to indicate that people were acting extra sexist because they knew you were looking for it.
I have the most cynical explanation!
“Scientific” studies that are based on survey questions are in large part simply not scientific. Who you happened to survey, and whether they had breakfast that morning or are dealing with allergies in the spring, have a far greater impact on the answers to the survey questions than whatever (questionably extant) factor you are trying to study. The end result of this process is a bunch of “data” that might as well be randomly generated.
Well, if the results of a study are truly random, it would be odd for them to so often match the bias of the experimenter performing the study.
Of course a “noisy” scientific environment probably lends itself more to biased studies than a “clean” one does, simply through publication bias if nothing else – but I can’t see publication bias being the main culprit here. The two studies in question both get very large effect sizes; they’re just in opposite directions. I feel like you would have to have an unrealistic number of failed/unreported studies in order to “randomly” generate a result that is both (a) large in effect size, and (b) completely opposed to other similar studies that also have large effect sizes. I’d put way more money on there being some subtle methodological difference between the studies in this case, like the lab manager/professor discrepancy.
So an interesting question is, how much time and money does it take to gather data of this sort?
I know from work experience that pharmaceutical companies will start and then cancel clinical trials when they’re not giving the data required for safety or indication needs. But that’s a different breed of situation. Both the cost of the study and the potential profit are waaaaay out of proportion compared to a psychology paper.
Let me be clear, I think you’re making a good point. I just want to offer one more possible consideration. There are *a lot* of biased researchers. So one researcher or team of researchers doesn’t need to do 20 studies to have one randomly produce the strong results. It’s only required that lots of studies are going on, and the lucky researchers who happen to have their study be the one with strong results get to publish.
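One way to see how far that mechanism can go: simulate many teams each running a single study of a nonexistent effect and look at the strongest result among them. A rough sketch (the team count and sample sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n_teams, n_per_group = 100, 60   # hypothetical numbers

def one_null_study():
    # Two groups of ratings with NO true difference between them
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(0.0, 1.0, n_per_group)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd   # Cohen's d

effects = np.array([one_null_study() for _ in range(n_teams)])
print(f"largest |d| across {n_teams} null studies: {np.abs(effects).max():.2f}")
print(f"null studies with |d| > 0.3: {(np.abs(effects) > 0.3).sum()}")
```

Only the teams in the lucky tail need to write up their results for the published record to look dramatic.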
“Both W&C and MR&a ensured that the male and female resumes in their study were equally good. But W&C made them all excellent, and MR&a made them all so-so. Once again, it’s not really clear why this should change the direction of bias.”
I do remember a study about hiring decisions of college students based on resumes.
It was to investigate racism. There were two racial conditions (black and white), and three competence conditions (excellent qualifications, okay qualifications, and unqualified).
The results were something like:
95% of excellent blacks and 90% of excellent whites were “hired”.
30% of okay blacks and 70% of okay whites were hired.
10% of unqualified blacks and 5% of unqualified whites were hired.
My personal explanation for this is that there were three factors used in decisions:
1) qualifications (consciously)
2) anti-black racism (subconsciously)
3) desire to not appear racist (consciously)
When you are unsure, your gut plays a bigger role in making the decision, and so implicit racism plays a bigger role. The reason that blacks seemed to do better in non-ambiguous situations was that people were consciously trying to be non-racist.
Unfortunately, this anti-racism correction was too strong in non-ambiguous situations but completely insufficient in ambiguous ones.
Anyways the pattern in these male/female studies would seem to be similar.
We’re 46 years into the current feminist era, so what’s more likely: that secret covens of male chauvinist warlocks, having conceded in law school, B-School, and so forth, are making a last stand in the math and physics departments? Or that the warlock-sniffing has long ago hit diminishing returns, but society keeps doing it because if Larry Summers can lose his job in part because of expressing Doubts about feminist theory, you can too?
“And the best way to fight sexism in science is to remind people that it would be hard for women to make things any more screwed up than they already are.”
But what if women are the ones screwing it up? In MR&a, the men were only slightly biased against women, but the women were extremely biased against women. So if you want to talk about studies that potentially might have been carefully designed to show exactly what the authors want to show, think about this: the setup of MR&a is such that the entire study was able to achieve statistical significance in demonstrating bias against women in general, yet the sample size of each gender taken separately was not large enough to compel the authors to conclude that women are significantly more biased than men — although that pops out instantly to anybody who looks at the graphs. So if women are being held back by other women, perhaps the best way to achieve equality is to keep women out of hiring decisions. Either that or send them to anti-bias training.
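The power asymmetry here is easy to demonstrate: a pooled bias can be highly significant while the male-rater vs. female-rater difference is underpowered, because comparing two subgroup estimates roughly doubles the variance on half the data each. A rough simulation, with invented effect sizes and sample size:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_raters = 120                       # hypothetical total sample, split evenly by rater sex
bias_male, bias_female = 0.4, 0.7    # invented per-rater bias (rating points against women)

n_sims, main_hits, diff_hits = 2000, 0, 0
for _ in range(n_sims):
    m = rng.normal(bias_male, 1.0, n_raters // 2)
    f = rng.normal(bias_female, 1.0, n_raters // 2)
    # Main effect: is the average bias across all raters nonzero?
    if stats.ttest_1samp(np.concatenate([m, f]), 0.0).pvalue < 0.05:
        main_hits += 1
    # Subgroup contrast: do male and female raters differ from each other?
    if stats.ttest_ind(m, f).pvalue < 0.05:
        diff_hits += 1

print(f"power for pooled bias: {main_hits / n_sims:.2f}")           # near 1.0
print(f"power for rater-sex difference: {diff_hits / n_sims:.2f}")  # well below 0.5
```

With these made-up numbers the pooled bias is detected essentially every time, while the women-vs-men difference is missed most of the time, which is exactly the pattern described above.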
Or perhaps the women know something about women that the men don’t. What if MR&a, not W&C, is the study that demonstrates how individuals attempt to correct for widespread societal bias? If people believe women are the beneficiaries of significant affirmative action and other social forces, then the exact same resume honestly does convey different information to them about the candidate’s competence depending on whether there’s a male or female name at the top. Aren’t women in STEM more aware of the pressure to place more women in STEM fields than men are? Haven’t they been through all those banquets at the faculty club telling them how badly they’re needed, regardless of their level of ability?
What if we’re willing to help women along until they graduate from college, and after that our attitude is, okay, no more tilted playing field, time to get by on your own merits? So at that point, a female name becomes a liability due to assumed prior favoritism?
(I only believe some of these hypotheses.)
To see where power exists today in academia, consider the career of Doctor Faust.
When Larry Summers went all Steven Pinker in 2005, he quickly gave $50 million in reparations (in other people’s money, of course) to Harvard’s Head-Feminist-in-Charge, Drew Gilpin Faust, boss of the Radcliffe Institute. She used Larry’s $50 million to make lots of good friends, so when Larry got the boot a while later, Doctor Faust got Larry’s job.
Pretty great up until Valdes and Cornelius show up.
A Faustian bargain.
W&C are quite neutral, though this paper of theirs brings the third view into sharp relief: discrimination against men, rather than the usual vacillation between “discrimination against women” and “no discrimination against women.”
The problem is that holders of the third view don’t have much of a presence. Though sometimes they get their ammunition without lifting a finger.
But the data, she says, show that female professors in the study actually were more likely to be second through fourth authors than first. It knocked down her theory that male scientists had failed to ask her to collaborate on academic articles because she is a woman. Since she first visited Mr. Bergstrom’s lab, in fact, she has published three academic articles on which she is not the lead author. The article on gender and authorship will be her fourth.
“For me,” she says, “this really showed the beauty of science, that you can have this personal experience that isn’t reflected in big data.”
http://www.chronicle.com/article/The-Hard-Numbers-Behind/135236/
Or when researchers find that men’s advantage in the peer-review process was only exacerbated by blind review.
And the more subtle ‘looking for something else, so not remarked on’ studies.
http://andrewgelman.com/2012/12/29/sexism-in-science-as-elsewhere/#comment-123122
Stoet and Geary are more ‘offensive’ in this regard; however, their papers were at the school level (PISA and stereotype threat). As in, boys are underperforming, not that society is or is not discouraging girls.
And JM already made the other point for me. I’d include SAT in it as well. Of course, not to the same level as grades.
In both Study 1 and Study 2, girls ended the school year with GPAs that were more than half a standard deviation above those of their male classmates. Notably, girls outperformed boys in every course subject, including both basic and advanced math. In contrast, gender differences favoring girls on a standardized achievement test were more modest and not statistically significant. And, contrary to our expectation that girls and boys would do equally well on an IQ test, the mean IQ score for girls was about half a standard deviation lower than that for boys.
In the Simonsohn comment you linked to there’s this hilariously boneheaded quote from the Moss-Racusin et al. paper:
Simonsohn quotes that passage because the citation they give in support of this (il)logic is a paper by Simonsohn and colleagues who definitely don’t recommend anything of the sort.
For crying out loud, this really isn’t difficult. I knew the answer the instant I saw that Moss-Racusin et al were talking about a “lab manager” position. As Alexander notes, that’s an odd choice.
Well, yes. It is also the perfect choice to get that answer (gee, I wonder – nah…).
Here’s the thing: lab managers are very important people with limited careers. A good lab manager is central to the running of a good lab. But – and this is key – it’s a job where you are ideally employed long term, and I mean ideally from the boss’s perspective. You want someone who is always there, dependable, and who learns the ins and outs of both the lab and the research institution. The best lab managers I’ve known – and I’ve known some very good ones – have all been in the same lab, a dependable fixture, for fifteen, twenty, thirty years. Sometimes they’ll even accompany the Professor, the Big Man, from one institution to another.
Now why might someone not want to hire a woman for this long-term position? Because women have babies.
A woman falls pregnant, and in most places she is guaranteed mandatory leave (often paid leave). If a lab suddenly loses a researcher (who quits, or is hired away), the lab can adjust; others can pick up the slack and so on. If a lab is suddenly without a good manager for six to twelve months, it can foul up the whole thing. Or at least make things more difficult. Remember, by definition, hiring a _good_ lab manager for the short term is difficult.
And what if the woman decides – as she has every right to do – that, gee, it’s more important for her to bring up her kids? Then you are stuck doing the whole business again from scratch. Or what if a female lab manager marries a careerist scientist who suddenly has to move away? (The reverse case is possible, but it just doesn’t happen as much. I know plenty of women who have followed their husbands’ careers; I’ve known far fewer men who have followed their wives.)
This just isn’t the case for a man who becomes a father. A lab manager who becomes a father is likely to be more dependable, not less, because he is now _it_. The head of the house, the one bringing home the bacon. He is far less likely to take a chance on another job, and much more to guard this one closely.
So, a man is a better bet there.
I’ll go over this in more depth on my own blog, but I’ll let this stand for the moment.
“Lab manager” means a lot of different things. In some labs it is career, while in others, it is a temporary position. This experiment specifically said that it was a temporary position, between undergrad and grad school. That may not be enough to make people play along, but the information available to evaluate the candidate made it pretty obvious.
I’m not sure how relevant it is, but it appears to me that the job is pretty heavily female in practice.
If anything, that enhances the point. Not wishing to be blunt, but the competition is merciless out there. A middling student who becomes a lab manager is more likely to stay there, all other things being equal. That would also explain why the biggest difference found in that study was in salaries – a higher salary may well be there to entice someone to stay.
All of the data can be understood through the idea of people wanting a male lab manager (as the study says).
As regards the last sentence – I’ve had the opposite experience, but “the plural of experience isn’t data” or however it goes.
Lab manager positions are a lot less competitive than tenure track positions. A lab manager is basically a lab tech for a slightly bigger lab – they are in charge of ordering consumables, making sure instruments are running smoothly, etc…
Which results does the incentive structure in 21st Century America pay more for?
Google says “accuser on minor sins” originates in this post.
I am very, very impressed.
Thanks for pointing this out.
Another biasing factor: a meta-analysis of studies on how often men and women interrupt each other shows that studies with a female first author are more likely to find that men interrupt women, compared to studies with a male first author.
http://people.uncw.edu/hakanr/documents/genderandinterruption.pdf
Of course, which direction the bias is in, who can say…
Hang on, has anyone looked at how the participants were determined?
MR&a:
“We recruited faculty participants from Biology, Chemistry, and Physics departments at three public and three private large, geographically diverse research-intensive universities in the United States”
W&C:
“In experiments 1–3, 20 sets of materials were sent to 2,090 faculty members across the United States, half female and half male; 711 voluntarily ranked applicants (34.02%). (Cornell University’s institutional review board approved this study; faculty were free to ignore our emailed survey. [emphasis mine])”
Regardless of how “diverse” the W&C self-selected respondents look by various measures, the one thing you can’t control for is their actual gender bias (it doesn’t look like they tried to separately assess gender bias as MR&a did.)
I would love to see W&C’s prompt for participation in the survey, and how many individuals started the survey and abandoned it. I would also love to know how MR&a recruited their participants.
In what way was MR&a’s survey (which received a slightly lower response rate, 30% rounded up) not subject to selection bias to the same extent?
Interesting… If self-selection was the cause of the disparity in results, I would expect that the results should have been the opposite of what they actually were.
I would think that, if there was significant self-selection bias, people would be more likely to respond to the authors they favor — which would mean that people with a bias against women should have been more likely to respond to the W&C study, and people with a bias against men would have been more likely to respond to the MR&A study. Any leakage in the wording of the recruiting efforts, etc., should more likely than not have revealed the author’s own persuasion, and thus motivated cooperation from those most likely to agree with the author. But this would mean that W&C’s study should over-represent bias against women, and MR&A’s study should over-represent bias against men. (Unless, of course, the respondents consciously or subconsciously grasped the purpose of the study and systematically lied… to further their own cause.)
Well, I am unsure how MR&a conducted their recruitment, and I am also unclear how many people abandoned providing answers once the tasks became clear. That is why I am interested in whether MR&a put more effort into recruitment than W&C, whose effort was essentially none at all (one email, not followed up). Given similar response rates, but different recruitment effort, that would already point toward some confounding factor.
In addition, because W&C comes after MR&a, respondents may be primed with an incentive to avoid discrediting their particular STEM field. Given low recruitment effort, one might expect a differential response if the respondents could understand what W&C were attempting to study. One might also expect issues with initial response but then failure to complete the task if the respondent understood that their views might result in negative outcomes for their field. For example, given a scenario where they felt the male candidate was more qualified than female candidate, they might simply abandon the survey, rather than complete it.
” Given similar response rates, but different recruitment effort, that would already point toward some confounding factor.”
Why are you assuming that, and not that the two studies used basically the same recruitment method and so got roughly similar response rates?
“Given low recruitment effort, one might expect a differential response if the respondents could understand what W&C were attempting to study.”
But W&C checked to see whether participants understood the point of the study and they reported that they thought that it was about something totally different.
Something you pointed out but didn’t emphasize, and that is noteworthy, is that W&C chose to systematically over-represent women as respondents to their survey. Women are far less than 50% of the faculty in STEM fields. A study meant to determine the typical gender bias in STEM fields should have represented each gender in proportion to its actual representation in the field. The non-representative sample could easily explain the disparity between the two results. If male faculty are on average biased against hiring women, and female faculty are on average biased against hiring men, and the average bias of female faculty is stronger than the average bias of male faculty, a representative sample would still probably show net discrimination against women, because men vastly outnumber women to begin with in these fields (a back-of-envelope sketch of this follows below).
So MR&A cheated by first looking to find the position in academia where men were most over-represented (lab director) when plenty of other positions over-represent women (e.g. department secretary; note both of these positions are essentially just assistants to people on the tenure track). Whereas, W&C cheated by surveying women at a much higher rate than someone would actually find women in these fields in academia.
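To sanity-check that weighting argument, here’s the promised back-of-envelope sketch; every number in it is invented (positive bias = against women):

```python
share_male_faculty = 0.75    # hypothetical: men outnumber women roughly 3:1
bias_male_faculty = +0.2     # hypothetical mild bias against women
bias_female_faculty = -0.5   # hypothetical stronger bias against men

net = (share_male_faculty * bias_male_faculty
       + (1 - share_male_faculty) * bias_female_faculty)
print(f"population-weighted net bias: {net:+.3f}")  # +0.025: still net against women
```

Even with female faculty more than twice as biased per capita, the 3:1 headcount means the population-weighted bias still comes out against women.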
Excellent. I always get worried when I see a study that doesn’t seem like it was obviously rigged, especially if it deals with sensitive subjects.
MR&A had concluded that “faculty gender did not affect bias”. In their data female faculty were actually slightly more biased against female applicants: http://www.pnas.org/content/109/41/16474/T1.expansion.html.
If you think the sex of the faculty is driving the effect, the right thing to do is use a sample with an even sex ratio so that you can test that hypothesis.
Both studies found that faculty sex was not statistically significant. The sex difference was small compared to the effect, let alone the difference between the studies. The moderate sex difference found by MR&a, in the opposite direction from the one you suggest (though I think in the same direction as the rest of the literature), was not statistically significant because MR&a had so few women in their sample, a 3:1 ratio.
Where do you get that belief? (lab manager, not director)
You could say the same about grad students. Every detail of the prompt made it clear that this position was a proto-grad-student.
But both studies found both men and women to be biased and to similar degrees (women were a little bit more biased in both studies).
So the idea that each study slanted things by over-sampling men/women seems like a non-starter…
I’m reminded of Richard Mitchell on out-of-touch professors:
“So probably we’re going to have to do that @#$%ing gestalt thing.”
— it is this utterly brilliant attitude that makes me come back again and again to read what Scott has been up to. Please never stop writing this blog.
What is the prior probability of a published study fabricating data?
It depends on what you mean by “fabricating.” Most people would feel it was very unethical to make up data altogether, but most people are pretty comfortable with massaging their results. The net result is that most published studies are false.
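If “most published studies are false” sounds too strong, the Ioannidis-style arithmetic behind the claim is just Bayes’ rule; a sketch with illustrative numbers:

```python
prior = 0.05   # illustrative: fraction of tested hypotheses that are actually true
power = 0.5    # illustrative: chance a study detects a true effect
alpha = 0.05   # conventional false-positive rate

# P(hypothesis true | significant result), by Bayes' rule
ppv = (power * prior) / (power * prior + alpha * (1 - prior))
print(f"P(true | significant) = {ppv:.2f}")   # ~0.34 here, before any result-massaging
```

And that calculation assumes honest reporting; add massaging and the positive predictive value only drops further.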
How can we trust it, then?
Sorry if this has been suggested in the hundreds of comments already, but since there’s a two-year delay between the two studies, and since the original study was highly publicized in the press, maybe most faculty in the second study were aware of the first study and were actively selecting for women to counteract that trend. This would suggest that even if we repeated MR&a exactly at this point, now that people are aware of these studies, we’d get a pro-female result. In the end, this may have very little to do with actual hiring practices.
I like this explanation so much, it’s tooo kawaii!!
Rather than individual people changing their selection behavior, it could be that the prior survey led the most biased to be less likely to take or complete the survey. And the least biased (or even biased in favor of women) might have been more likely to take or complete the survey.
This assumes that the broad purpose of the experiment could not be/was not sufficiently obfuscated from the respondents.
Having been vaguely in STEM (and being female), I have a lot of difficulty imagining that STEM is a particularly bad field for females. Construction? Monster truck driving? Football? I can think of a lot of professions that seem a lot less likely to be female-friendly. Honestly, I can’t remember anyone whom I’d describe as even a vaguely sexist-jerkface from my STEM days.
However, as a woman now not-in-STEM, I find it difficult to find other women with an interest in discussing STEM-type-things in their spare time. 🙁
My litmus test is that I always hear non-STEM women explaining how sexist the STEM hiring process, and the STEM field in general, is.
But I have never heard any women in STEM, in industry or on a college faculty, claim so. The ones that came closest were several girls in my undergrad and graduate courses… who decided to go and get hired as managers right out of college and avoid any job with significant technical work (just their general predilection, not particularly due to a lack of competence).
The girls that went into actual, technical engineer jobs did not see it that way. Some were even fairly sour on how much focus and attention is given to them for being a girl. Worried that people won’t take them seriously precisely because they’re given obvious preference and handicaps for scholarship and hiring processes.
Most of said girls (both groups) were my friends, incidentally. And all with high-quality minds. But it’s been very clear in my experience that everybody not qualified to know first-hand says there is sexism, and everybody qualified to know first-hand does not. This leads me to believe it is an ideological and memetic need, and is driven by little or no systemic reality.
My own impression is that women working in STEM tend to be concerned about the low prevalence and retention of women in their field, but think the problem is not or is at least much more complicated than “because sexism.” Representative quote from a friend:
“Women that leave go and do something else. They don’t stop existing the day they leave tech. They are leaving a field with the best comp and perks a wage worker can get (with one of the smallest wage gaps of any industry, more like 96/97 cents to the dollar) to go and do something, but nobody seems to be asking what that something else is.
If the recruitment/hr industry could answer that question, I think they’d be on their way to fixing the ‘leaky pipeline’ as we call it.”
I’ve seen a similar result. But the only consistent answer I’ve gotten is that they just aren’t interested enough in the field compared with other work. The job and its prospects are as nice as they can be – they just aren’t interested in it. They have different priorities.
Which is perfectly fine. I just wish I wasn’t called sexist for women as a general group not finding my field interesting. Personally I find it much more sexist to presume that women’s general preferences are somehow ‘wrong’ compared with men’s, and therefore their lack of desire in the STEM fields is some sort of problem – some sort of sabotage inflicted on them by [male-dominated] society.
“It’s not even like we’re trying to detect a subtle effect here. Both sides agree that the signal is very large.”
Maybe that’s the error. Maybe the signal is actually very small. So when you do a study like this, trying to eliminate all influences but the signal, you end up mostly measuring a combination of biases inherent in your study and noise.
In particular, if people’s preferences are close to even, but unstable, it doesn’t take much publication bias to get the desired answer. If every study is likely to produce a strong effect in one direction or the other, only half have to be thrown out.
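A quick sketch of how cheaply that works (the spread and measurement noise are invented): if each study taps an unstable preference drawn from a wide distribution centered near zero, keeping only results that point the desired way discards about half the studies yet leaves a literature of large, same-signed effects.

```python
import numpy as np

rng = np.random.default_rng(2)
# Unstable preferences: each study's true effect is near zero on average, but spread out
true_effects = rng.normal(0.0, 0.5, 1000)
observed = true_effects + rng.normal(0.0, 0.1, 1000)  # each study measures fairly precisely

published = observed[observed > 0]                    # file-drawer the "wrong" direction
print(f"fraction thrown out: {1 - published.size / observed.size:.2f}")  # ~0.50
print(f"mean published effect: {published.mean():.2f}")                  # ~+0.41, large
```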
This is particularly problematic for head-to-head comparisons, which might just be measuring some kind of tie-breaker with little relevance to the real world. That seems plausible to me, though not the claim that preferences are unstable. Anyhow, MR&a don’t do head-to-head comparisons; they ask for numeric scores. And W&C also did a validation study with numeric scores.
I don’t think “needing to pull this deception” is required for a scientist trying to get the biggest effect size possible.
If someone believes that there’s no such thing as experimenter bias, and they conduct a study that finds that there is indeed no such bias, is that evidence against the hypothesis, or for it?
Loved this discussion, but I feel compelled to point out an additional point that goes beyond the arguments here: drawing conclusions about the sexism, or lack thereof, of an entire set of fields based on hiring decisions is majorly problematic. There are many points where sexism can influence career trajectories; hiring may be the most scrutinized at this point in history, and one of the easiest at which those afraid of being accused of sexism, and those promoting broader representation, can choose qualified women over qualified men. There may be mixed evidence about bias at this particular point, but there is a lot of evidence (certainly imperfect) about other points where bias can influence decision-making: e.g., MIT’s self-study showed that ALREADY HIRED female professors were systematically given significantly less lab space than already hired male professors. So even if we were 100% positive that there was a bias for women at the point of hiring, it would only give us a snapshot of one point in the process of educating, recruiting and retaining women and men in STEM fields.