Assessment of Instruction Committee (AIC) Recommendations for the Use of SFF Data

In Fall 2022, the Temple University Assessment of Instruction Committee (AIC) created a guide addressing the proper use of student feedback data. The full document, which is available for download, provides a set of guidelines and suggestions for the use of Student Feedback Form (SFF) data by individual instructors, committees, school and college leadership, and university-level administrators at Temple. It also contains a brief history of the SFFs, a discussion of what these data represent, guidelines for their appropriate use, and suggestions for ways to get the most benefit from them. The document was created by the AIC as part of its charge to provide guidance on assessment to the University community. This is especially relevant in the current context where student ratings of instruction (commonly called Student Evaluations of Teaching or SETs in the literature) have come under increasing scrutiny and criticism.

A snapshot of the listed items, copied directly from the full document prepared by the AIC, is below.

What student ratings are and are not

1. Student ratings are student perception data.

SETs represent the collective views of only that group of people who have experienced the class and who have chosen to report. This statement may seem obvious, but the word “only” in the previous sentence is important. These students comprise the only group that has had the opportunity to observe the instructor and how they have impacted the learning environment of the course. These are perceptions, but they are unique—other faculty cannot have the same depth of knowledge on which to make evaluative judgements. At the same time, they represent only the views of students who choose to report, which may be a problem depending on response rate.

2. Student ratings are sometimes biased.

The evidence for this assertion is clear: student ratings can be biased by characteristics such as race, gender, language competency, and attractiveness, among other factors. For example, some research shows that minority instructors receive lower ratings, and African American faculty receive the lowest evaluations of any group. Likewise, studies have shown that women faculty often receive lower evaluations than male instructors. Additionally, instructors with pronounced accents or dialects tend to score lower on student evaluations than mainstream US English-speakers.

3. Student ratings do not measure student learning.

After years of research on this topic, there is almost complete consensus that SETs cannot and should not be used as a proxy for student learning. This may appear counterintuitive because it would seem logical that students who learn more will provide higher ratings for both the course and the instructor. While that may be correct in certain cases, the point made above must be kept in mind: SETs are student perceptions, and perceptions can be and are based on many factors. Although one of these factors may be how much the student has learned, it should not be assumed that courses or instructors that obtain high SET ratings are necessarily those where the student has learned more, nor is the opposite necessarily true. This will be discussed further in the section on grading practices. To reiterate, the consensus view at this time is that the association between SET ratings and objective measures of learning is essentially zero.

4. Student ratings provide useful information.

While the AIC is aware of the controversy over the use of SETs in evaluating teaching effectiveness, we believe that their potential benefits support their continued use. When properly used, they provide one source of data that cannot be obtained in any other way and can be valuable as a tool that helps faculty reflect on and improve their teaching. Moreover, the existence of the SFF signals to the students that Temple cares about teaching and that it takes their opinions seriously. It is equally vital for the University administration to let the faculty know that it values teaching and that it considers assessment of teacher quality and continuous improvement to be critical in all evaluations.

5. Student ratings are not the only way to evaluate faculty teaching.

While SETs are one source of data, it is never appropriate to evaluate an instructor’s teaching on these data alone. This has long been recognized but is now mandated for those instructors in colleges under the TAUP contract (please consult the current TU-TAUP contract for details on the use of SFFs). The Center for the Advancement of Teaching (CAT) has expertise on this topic and will both assist faculty with effectively using SFF data to improve teaching, and help deans and chairs to identify appropriate additional methods of teaching evaluation if asked.

Advice for instructors on how to make the best use of SFF data

1. Do I need to rely only on the University’s SFF for feedback?

All of the recommendations presented below focus on the use of the SFF report that an instructor receives at the end of the course. However, some of the most useful feedback can be obtained during the course, especially if this feedback is obtained no later than the midpoint when there is still time to make adjustments to your course. Soliciting this feedback can be quite simple and does not need to take much time. Basically, it is valuable to ask your students to anonymously tell you how they think the course is going. The Center for the Advancement of Teaching (CAT) recommends that you ask students these simple questions: What should I continue doing in this course because it helps you to learn? What should I stop doing because it doesn’t help you to learn? What should I start doing because you believe it will help you to learn? What can you do differently to help yourself to learn? These responses are for your use only and are not intended to be reviewed by anybody else. If possible, make clear to your students that you have read and heard their feedback (both positive and negative). You want your students to know that you valued the time they put into providing this feedback and that you will incorporate suggestions that you believe are appropriate to your course and will enhance student learning.

2. Why should I add my own questions to the SFF?

The core questions on the SFF are general as they are intended to be used in all courses in the University. While these questions provide useful feedback, they do not give you specific information about the way you designed and taught your course. The questions you can add to the SFF allow you to obtain your students’ perceptions about aspects of your course that are unique to your teaching or to your course. In the current system, the responses to these questions can be viewed only by you unless you decide to provide them to others. Some examples of items you may add include:

If you recently changed your course text, consider adding:
- The instructional materials for this course (books, handouts, etc.) were valuable in helping me learn.
If you used a new technology resource in your course, consider adding:
- The use of educational technology helped my learning.
If you changed the way you provided feedback on writing assignments, consider adding:
- The feedback I received in this class helped improve my learning.

3. How do I know if my ratings are “good” or “bad”?

SFF reports include some comparative data. Included with the ratings is a comparison to typical University and School ratings for the same question on your course. However, instructors should always look at the SFF form in total, not piecemeal. In general, a large majority of all instructors obtain means of 4.0 or higher on the questions. If your mean is considerably below this (say 3.0 or lower) and if a majority of students indicate that they “Strongly Disagree” or “Disagree” with a statement, then your students perceive that you were not doing as well as they hoped in that area. As an instructor, you should ask yourself why you think they have this perception. What are you doing, or not doing, that has caused them to give you this rating?

4. Use ratings carefully from courses where only a few students complete the SFFs.

The current Temple system allows students to rate courses if the enrollment is five or more. However, the number of students who complete the SFFs might be smaller than this. In small-response courses, even one or two low scores can shift the mean lower, even though those students’ views are not representative of the majority of students. In general, if fewer than 10 students complete the SFFs for the course, the mean rating is not very useful. This is not a hard cutoff – the smaller the absolute number, the less weight should be placed on the reported average. Data obtained from classes where there is a low response rate are also not very informative (see #8, below).
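The effect of one or two low scores on a small-response mean can be illustrated with a short calculation. The following is a minimal sketch in Python; the ratings are invented for illustration, and the function name `mean_with_low_scores` is hypothetical rather than part of any Temple system.

```python
from statistics import mean

def mean_with_low_scores(n_respondents, n_low, typical=4, low=1):
    """Mean on a 1-5 scale when n_low of n_respondents give `low`
    and everyone else gives `typical` (all values are hypothetical)."""
    ratings = [low] * n_low + [typical] * (n_respondents - n_low)
    return mean(ratings)

# Same two low ratings, different numbers of respondents.
for n in (6, 10, 40):
    shifted = mean_with_low_scores(n, n_low=2)
    print(f"{n:>2} respondents: mean drops from 4.00 to {shifted:.2f}")
```

Under these assumptions, two low ratings pull the mean down by a full point with six respondents (4.00 to 3.00) but by only 0.15 with forty (4.00 to 3.85).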

5. What should I do if my ratings are consistently low?

This is a follow-up to the point made above. Any instructor can obtain low ratings on occasion. If, however, your ratings are consistently low (3.5 or less on most questions over several semesters and in different courses), then your students do not perceive you as a good instructor and you should do something about it. It should be emphasized that high SFF ratings are not the goal of instruction: the goal is student learning. Still, students who perceive that their instructor is not meeting their expectations may lose interest and engagement in the course. A suggested course of action would be this: start with a peer that you feel comfortable with and ask that person to attend your course. That person might give you feedback on why your students are rating you poorly. Another course of action is to consult with the Center for the Advancement of Teaching (CAT). One of their specific goals is to assist instructors in their teaching role. The Center is staffed by professionals with the necessary expertise. This help will be private and individualized and is a resource you should use.

6. Should I worry that if I give low grades, I will get low SFFs?

This is one of the major controversies about Student Evaluations of Teaching and has been the subject of a substantial body of research. While it is true that there is a positive correlation between grades and student evaluations, the correlation is not as high as many instructors believe. The research seems to indicate that what most students evaluate is the fairness of the grade they received and the clarity of the criteria on which the evaluation was based. If your assignments and assessments are clearly linked to the course goals, and if you are clear about the way you give grades, then your students will be less likely to give you poor ratings if they receive a low grade.

7. How should I address negative comments that I think are unfair?

Almost all instructors receive negative comments from students on occasion. While negative comments should not be ignored, you should look for patterns rather than pay too much attention to a single negative comment, no matter how hurtful. However, if several students make the same or similar negative comments, then they are telling you that in their opinion there is something you are doing that they perceive as not facilitating their learning. This perspective should be addressed by at least acknowledging its existence and then attempting to understand on what basis the comment is made and whether there might be remedies for the perceived problem.

8. What should I do if a low percentage of my students complete the SFFs?

At present, the University average percentage of students who complete the SFFs is around 60% during most fall and spring semesters. If your average is consistently lower than this, there are a few things you can do. Some instructors provide time, typically at the end of the last class, for the students to complete the SFFs using their phones or other devices. This can be successful although it doesn’t always work and may not fit the way your course is taught. The literature is very clear on one thing in this area: instructors who tell their students that the SFFs are important and that they will take the responses seriously obtain higher percentage return rates. It is critical that you tell your students that completing the SFFs is important to you and, if possible, give examples of how you used feedback to enhance your teaching and/or the course.

Guidelines for the use of SFF data to evaluate instructors

1. The most important thing is to use the SFF forms holistically as much as possible.

Myriad factors affect SFF scores on a particular form, and there is even more variability and less reliability in a single question on the form. As much as possible, evaluators should look for patterns that are replicated across multiple SFFs. Evaluators should also attempt to figure out reasonable explanations, based in research, for why those patterns might exist. Deviations up and down are to be expected as part of the normal variation across classes, semesters, and years, particularly if the number of respondents on a given SFF form is low.

2. An instructor’s complete set of student ratings should be considered.

In general, ratings across the various items are similar, but the nuance gained by looking at each item can be valuable in certain circumstances. It may seem self-evident that the questions ask different things and that students will rate each question independently of the others. That is, there is no logical reason for a high rating on Q1 to imply a high rating on Q3. However, perhaps counterintuitively, extensive research and Temple experience find that students tend to rate instructors very similarly across questions (no matter what the questions are). Thus, exercise caution in interpreting any particular question or subset of questions in isolation. The exception is if a particular question seems to break the pattern of all the other ones by a significant amount (e.g., everything else is a 5, but one question is a 3, or everything is a 3, but one question averages a 4.5). That type of pattern bears further investigation. As a general rule, the totality of the questions should be used in evaluation.

3. It is not appropriate to use a subset of questions to assess teaching adequacy.

This is a follow-up to Guidelines 1 & 2. Avoid using scores from two or three SFF questions that are viewed as “more important” or creating a combined score from only those questions (occasionally in combination with an established cut-point) for decisions such as merit or contract renewal. This is not an appropriate use of SFF data. First, small differences in scores are not meaningful. Second, the SFF forms do not measure inherent teaching or teaching ability; rather, they aggregate student perception data unique to the instructor/class combination for the semester. Third, taking subsets of questions decreases the reliability of measurement. As mentioned above, responses should be viewed holistically.

4. It is not appropriate to use the data from a single semester or a single course.

When evaluating an instructor’s teaching, it is always better to use data across multiple semesters and courses. While this is not always possible, multiple data sources are always better than a single assessment.

5. Small differences in ratings are common and not necessarily meaningful.

At Temple, the average score on most items on the SFF is in the 4.0 to 4.3 (out of 5) range. For a variety of statistical reasons, small differences between scores are not likely to be meaningful, particularly at the top end of the distribution. These scores are not normally distributed and are highly skewed. As such, care must be taken not to over-interpret small differences. The difference between a mean of 4.3 and 4.5, for example, is not meaningful.
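To make the scale of such a difference concrete, the following Python sketch compares two invented sets of ten ratings. The numbers are illustrative assumptions, not Temple data. It shows that a gap of 0.2 between means can come from just two respondents choosing 4 instead of 5.

```python
from statistics import mean

# Hypothetical ratings from ten respondents on a 1-5 scale.
section_a = [5, 5, 5, 5, 5, 4, 4, 4, 4, 4]
section_b = [5, 5, 5, 4, 4, 4, 4, 4, 4, 4]

gap = mean(section_a) - mean(section_b)
# Count the respondents whose rating differs between the two sets.
changed = sum(a != b for a, b in zip(section_a, section_b))

print(f"mean A = {mean(section_a):.1f}, mean B = {mean(section_b):.1f}")
print(f"gap = {gap:.1f}, produced by {changed} respondents moving one point")
```

With so few responses behind each mean, a gap of this size sits well within the normal variation from one class to the next.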

6. Be cautious in using anomalous ratings.

An anomalous rating for an entire course likely had some identifiable cause behind it but is unlikely to be a good representation of what students in general would think of that course/professor combination. Small anomalous ratings within a given SFF form are rarely meaningful, but very large deviations on a single question of a form should spark an attempt to determine whether the result was a random anomaly or had a reason behind it. One way or another, it is always better to look for patterns in an instructor’s rating over time or across different course types. Every instructor receives an occasional low rating. While one unsatisfactory set of ratings should not be ignored, it should also not be over-interpreted. It is particularly important to keep in mind that an anomalous negative rating might be due to an instructor having been assigned, or having volunteered for, a particularly difficult or undesirable teaching assignment, a new teaching assignment, or a late assignment. Over-interpreting one unsatisfactory set of ratings may also discourage innovation in teaching as faculty might be rightly concerned about detrimental effects on their evaluations.

7. Use ratings carefully from courses where only a few students complete the SFFs.

The current Temple system allows students to rate courses if the enrollment is five or more. However, the number of students who complete the SFFs might be smaller than this. In small-response courses, even one or two low scores can shift the mean lower, even though those students’ views are not representative of the majority of students. In general, if fewer than 10 students complete the SFFs for the course, the mean rating is not very useful because one or two students’ responses can have a significant effect. This is not a hard cutoff – the smaller the absolute number, the less weight should be placed on the reported average. Data obtained from classes where there is a low response rate are not very informative and will exhibit greater variability.

8. Discuss low response rates with instructors as they might indicate a lack of commitment by the students.

This recommendation follows from the one above. Temple has an interest in having students provide feedback. One major reason for a low response rate is that the instructor did not provide time in class to complete the SFF. Of course, another possibility is that students were simply uninterested in providing feedback. Lack of interest in providing feedback might be an indication that students were not deeply engaged in the course but also might mean that they were mostly satisfied with the course and the instructor. There are other factors that could affect response rate; for example, research shows that online courses receive lower response rates than in-person courses, and students are unlikely to be as engaged in a first-year required course as in an upper-level elective. When discussing SFF data with instructors, a low response rate is something to mention. Encourage instructors to share with their students that they value their feedback. Giving time in class signals to students that the instructor values their feedback and is willing to give time for them to complete the SFF. Instructors should also share how they have used student feedback in the past to make adjustments to the course or their teaching. Other strategies to increase response rates can be found here and here.

9. Avoid comparing faculty to each other.

Student rating instruments are not designed to gather comparative data about instructors. The purpose of these instruments is to gain an overall sense of students’ perceptions of a single instructor teaching a particular course (or part of a course) to a specific group of students. As mentioned above, SFF forms do not directly measure the main outcomes upon which instructors are compared (for example, SFF forms measure neither teaching ability nor student learning). Comparisons should be sparing and limited to what can be validly defended. For example, in a multi-section course, one might use SFF scores as a general indicator of student satisfaction across sections. However, extreme care should be taken to ensure that the comparison being made actually applies. For example, even when teaching sections of the same course, sections may not be comparable (e.g., the MWF 8 am section may or may not be comparable to the TR 2 pm section) and instructors who do not fit the common stereotype in a specific field may not be comparable (e.g., female instructors in male-dominated fields).

10. Always read the student comments.

This is another recommendation that seems so obvious that it does not need to be stated, but it is important enough to include. The Temple SFF contains several open-ended questions that students are asked to complete. These student comments offer valuable information that cannot be provided by numerical ratings alone. There is a commonly held belief that only students with more extreme views, both positive and negative, respond to these open-ended comments. While the literature does not strongly support this belief, it is the case that the students who provide comments are the ones who are committed enough to take the time to do so. These comments often provide the most useful information for understanding the ratings.

11. Focus on the most common comments rather than emphasizing one or a few atypical ones.

This recommendation follows the one above and offers some cautions about the use of student comments to evaluate teaching. When evaluating an instructor’s teaching by reading the comments, common themes should be emphasized. A set of comments sometimes includes a few that differ from the majority. Strongly negative comments should not be ignored, but they also should not be given more weight than the views of most students. This is particularly crucial when evaluating the ratings of non-majority faculty where this problem is more common. It is also important to understand whether comments form a pattern across courses and over time or are just a result of a single course or class dynamic.

12. Contradictory written comments are not unusual.

This is an extension of the previous recommendation, but it is less focused on negative comments. As mentioned above, the best use of student comments is to search for themes. It is not uncommon, however, to find completely contradictory perceptions in these comments: some students think the textbook is great, others hate it; some students want more group work, others want less. The fact that these contradictions exist is not necessarily a sign of poor teaching. Remember that student feedback data are perceptions, and perceptions may vary.

13. Use an instructor’s grading practices as one context for reviewing SFF data.

This recommendation focuses on one of the most controversial issues in student evaluation of teaching: the relationship between grades and evaluations. Perhaps the most common criticism of these evaluations is that faculty can “buy” good evaluations by giving high grades. The literature is very clear that grades and evaluations are positively correlated, and that ratings are affected by a student’s expected grade in a course. While the correlation is lower than many believe, it is still one of the strongest effects in the research literature. The presence of this effect is problematic. With this in mind, one suggestion is to examine an instructor’s grades when examining the instructor’s SFFs. Keeping in mind that grades in a particular course can be higher or lower than normal for very good reasons (e.g., a particular group of students is unusually unprepared for the work in the course), a pattern of very high grades, across semesters and in different courses, is something that is worth discussing.

14. Always use multiple measures to assess instruction.

This is good practice and part of holistic assessment. SFF data simply do not provide information about many elements that are highly relevant to whether someone is a good instructor. In addition, where applicable, the TU-TAUP contract requires that SFF data not be the sole way to assess instruction. As mentioned, the Center for the Advancement of Teaching will provide assistance to any college or department to help develop additional assessment processes.

Advice to chairs and deans on how to speak to faculty about their SFFs

Planning for Teaching Discussions

Set aside uninterrupted time to discuss SFF results with all faculty members.
- It is important to talk to all faculty, not just those that you might consider problematic. It is just as important to spend time discussing what went right as it is to discuss what went awry.
Ask each faculty member to read their own SFFs before meeting with you.
- Always ask the instructor to come prepared to discuss the larger patterns of positive and negative comments that they see in the feedback.
Read the SFFs carefully.
- When evaluating an instructor’s teaching, always look for patterns of feedback and choose at most three areas of improvement and three areas of strength that you wish to focus on in the meeting.

Suggestions for Conducting the Meeting

Ask questions first.
- When you meet with a faculty member, start the discussion by asking the faculty member to share what resonated with them in the feedback. You might ask them what they think of the feedback, why students might have responded in that way to the teaching practice, and whether they are considering any changes based on the feedback. A series of questions instead of statements will lead to more reflection on the part of the faculty member and open the way for a productive discussion.

If necessary…

Discuss missed topics after the faculty member offers their viewpoint.
- After the faculty member has gone through their self-evaluation, bring up any areas of improvement or strength that you marked as areas of focus that have not been discussed.
Offer your own constructive ideas in the form of questions.
- “Do you think it would work if…?” If you are well known as a good teacher, then using your own experience can be powerful: “I have often found that if I do X, students respond well. Do you think that would work for you?” If not, phrasing more generically is best (e.g., “It has been found that when instructors do X, students respond well. Do you think that might work for you?”). Make sure to make it clear that this is a process of exploration and brainstorming ideas for improvement instead of a critique.
Develop an action plan.
- Ask the faculty member to decide on two or three concrete steps they will take to improve their teaching or the course.
Dealing with recalcitrant faculty.
- If the faculty member is resisting the idea of change and improvement, point out that it is important for every faculty member to contribute to an environment of positive engagement in order for the department (school) to continue to thrive. Remind them that there are resources (such as the Center for the Advancement of Teaching and Temple’s Institutional Diversity, Equity, Advocacy and Leadership—IDEAL) that can help them think through challenges. Note that if SFF feedback is indicative of faculty behavior that demonstrates a lack of sensitivity to the diversity of students in the class (e.g., race, ethnicity, national origin, gender, sexual identity, disability, or political viewpoint), make a more concrete plan in writing for the faculty member’s improvement, refer them to your dean’s office, and set up consultations with the CAT and/or IDEAL. You must insist that this behavior is not acceptable and must be remediated.

Institutional Research and Assessment