Editor's Note: Another letter to the editor that discussed additional negative effects of SRTEs can be found here. It was written by Professor Gary King and was published on Nov. 29.

It’s end-of-semester time and PSU students are evaluating their professors on the SRTE (Student Rating of Teaching Effectiveness), but unfortunately it is a big waste of time for all involved. The SRTE is so inaccurate, so biased and so useless in helping instructors improve their teaching that it is little more than a bureaucratic ritual that risks unintended negative consequences.

Or so conclude novice education scientists in a beginning research design course, whose final group project was to “evaluate the evaluation”: to assess the quality of the SRTE. They summed up their evaluation by noting that if even a beginning student’s research proposal were as poorly designed as the SRTE, it would warrant a failing grade. In other words, the evaluative quality of the SRTE should receive a “1” (lowest score) on its own scale.

“If after just one methods course, we can spot so many deficiencies, it makes you wonder why the university would continue to use this ineffective evaluation for something as important as improving instruction at PSU,” says Seyma Dagistan, a first-year graduate student in Education Policy Studies in Sociology and Education and a member of my research methods course, where the project was recently done.

The SRTE was not the focus of the course. Instead, its assessment was assigned without any prior discussion, and the class was asked, given what they had learned so far, to judge the SRTE’s scientific quality and educational usefulness and to review past research on bias in the SRTE.

The online questionnaire, which students complete voluntarily, presents a series of items such as “rate the overall quality of the instructor” on a 1–7 scale, with 1 being lowest. SRTE scores are used as part of the evaluation for faculty promotions and teaching contracts, and supposedly to help instructors improve student learning.

The new educational scientists identified five major problems with the SRTE:

1. Students are not required to fill out SRTEs; the result is frequently a very low response rate and a non-scientific (unrepresentative) sample. Students who really like or dislike the course and instructor may be more likely to respond, increasing the risk of inaccurate, non-representative scores.

2. The SRTE does not measure teaching quality; rather, it measures students’ subjective perceptions of the class experience.

3. Students generally are not expert judges of good teaching or of their own learning, so the SRTE score has little to say about instruction and learning.

4. The SRTE asks students to first indicate their expected grade, creating an “anchoring bias” which can unduly influence their opinions of the course and instructor.

5. The SRTE does not adjust for known external factors that influence ratings, such as class size, course level, and teachers’ race-ethnicity, gender, and age. Therefore, SRTE scores are not comparable across faculty and courses.

“Each one is a fatal error for any assessment of instruction. While well-conducted evaluations of teacher quality match up with other sound scientific assessments of learning in K-12,” said Kevin Briggs, an experienced high school teacher and graduate student in Educational Leadership, “research on the SRTE does not find the same for college courses. If I move to teaching at a college, I won’t see the SRTE as helpful to my instruction, other than to indicate how popular I was with my students.”

Particularly troubling is the published research showing that students’ SRTE ratings are consistently lower for new, female, minority and physically disabled faculty, regardless of their teaching quality.

“It is not that students are trying to be biased, but given that the SRTE is so subjective, adverse stereotypes seep in,” notes Vanessa Miller, a third-year J.D./Ph.D. candidate in Higher Education. For instance, a recent study of bias in the SRTE concludes that students perceive their male professors as ‘brilliant, awesome, and knowledgeable,’ while the same teaching styles, when thought to come from a female professor, are perceived as ‘bossy and annoying.’ Miller adds, “Perhaps this is one reason why American higher education finds it difficult to recruit or promote a diverse faculty. As a Latina professional, I find this kind of institutional bias troubling.”

As part of the project, students also solicited comments from PSU faculty. One faculty member, who chose to remain anonymous, said, “The SRTE overburdens faculty teaching large courses and particularly courses with large concentrations of freshmen and sophomores. The day all members of the faculty senate are required to teach 100- and 200-level undergraduate survey courses with 100 or 200 students rather than small specialty graduate courses, the SRTE will be replaced with an evaluation aimed at learning and improvement.”

Professor of Psychology and PSU Faculty Senator Keith Nelson agrees: “It is high time for change. Penn State could do much better than SRTEs. Effective teaching increases students’ curiosity, self-confidence, persistence, critical thinking, cross-domain thinking and integration, and creativity, all qualities and skills of high value in students’ later career success and in their constructive life paths. But currently they are ignored in evaluations of individual faculty as well as evaluations of how well various programs at PSU are succeeding.”

Associate Professor of Education Leadership Edward Fuller, a national expert on educational assessment, notes, “It is a well-known finding in research on teaching and learning that if instructors are assessed primarily on subjective feelings of students, it can corrupt the whole process. If such assessments are high-stakes for teachers, there is great motivation to collude with the students: ‘I’ll go easy on you, if you then like me and give me good reviews.’ Learning and effective teaching suffer; thus research often reports an inverse relationship between high SRTEs and other good measures of learning.”

Not only is the SRTE a bad teaching evaluation, it also fails as an effective personnel evaluation according to the standards of the Joint Committee on Standards for Educational Evaluation (JCSEE), a coalition of major professional associations in the United States and Canada concerned with fair and useful evaluations of professionals in the workplace. “As employees, professors deserve to be evaluated professionally and the SRTE is not really doing that,” said Madeline Smith, a BA/MA student in Education and Public Policy.

Despite all the failures of the SRTE, such ratings are nevertheless used across many universities. “It is not as if PSU and other universities don’t know all about the weaknesses,” observed Branden Elmore, a graduate student in Higher Education, after he found online a recent critical PSU Faculty Senate report about the SRTE. One reason for the lack of change is that while there is much research on the fatal errors of the SRTE, there is not much on innovative, better ways to evaluate teaching and learning in universities.

After the experience, my students were puzzled about why such an ineffective system continues semester after semester at PSU.

I offered this: The SRTE is like the old fable of the “King’s New Clothes.” Almost everyone knows it is an ineffective and potentially damaging system, yet collectively we silently conspire to pretend it isn’t.

When will someone with enough authority, like the brave boy in the fable who shouts that the parading King is not wearing any clothes, end it? PSU is a great university. It has the resources and leadership to develop a system that could help faculty improve and evaluate them fairly. I hope it finds the collective courage to do so.

David P. Baker is a professor of Sociology, Education, and Demography at Penn State. 
