Reliability Concerns for Classroom Summative Assessment

As Jim Popham has so eloquently stated, “Validity and reliability are the meat and potatoes of the measurement game” (Popham, 2006, p. 100). They are what every psychometrician AND teacher needs to know and understand. When psychometricians build large-scale tests for state departments of education, there’s a list of validity and reliability concerns that they need to address. What about when teachers build tests for the classroom? Should they be concerned about the same validity and reliability issues? Or are the concerns different for classroom assessment?

Let’s address reliability concerns now (and return to validity in another post). First, let’s define reliability. Reliability is the degree to which students’ results remain consistent over time or over replications of an assessment procedure. An important point to remember is that reliability is a necessary, but not sufficient, condition for valid score-based inferences. That is, you cannot make valid inferences from a student’s test score unless the test is reliable, but a reliable test does not by itself guarantee that your inferences are valid.

So, what are the reliability concerns for classroom assessment? Nitko and Brookhart (2011) lay this out brilliantly in Chapter 4 of their book Educational Assessment of Students. (Short aside: This is a great reference book for both psychometricians and teachers to have on their bookshelves. I actually use this book in a class I’m teaching on Assessment Theory and Practice in the Educational Leadership program at Saint Mary’s University.) Here’s a summary of the reliability concerns and what you can do to address them from Nitko and Brookhart (2011).

1. For all assessments: consistency within student (not that they always do the same, but that they consistently try to show what they know).

To increase reliability, you should:

  • Encourage students to perform their best
  • Match the assessment difficulty to the students’ ability levels
  • Have scoring criteria that are available and well understood by students before they start the assignment

2. For objective assessments like multiple choice tests: consistent performance from item to item.

To increase reliability, you should:

  • Have enough items
  • Allow enough time for students to complete the test
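
If you want to put a number on item-to-item consistency, one common statistic is Cronbach's alpha, and the Spearman-Brown formula gives a rough projection of how reliability changes when a test is lengthened with comparable items. Neither comes from the summary above; the Python sketch below is only an illustration, and the function names and quiz data are made up. It assumes every student answered every item and that students' total scores are not all identical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Estimate Cronbach's alpha from a students-by-items score table.

    scores: list of rows, one row per student, one score per item.
    Assumes at least two students, at least two items, and totals that vary.
    """
    n_items = len(scores[0])
    # Variance of each item (column) across students.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the students' total scores.
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


def spearman_brown(reliability, length_factor):
    """Project reliability if the test were length_factor times as long."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical data: 5 students, 4 items scored 1 (right) or 0 (wrong).
quiz = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(quiz)
print("alpha for the 4-item quiz:", round(alpha, 2))
print("projected alpha with 8 items:", round(spearman_brown(alpha, 2), 2))
```

The projection only holds if the added items are similar in quality and content to the ones already on the test, so treat it as a planning aid rather than a guarantee.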

3. For papers, essays, and projects: accuracy of rater judgment and consistency across forms (prompts or assignments).

To increase reliability, you should:

  • Have clear enough directions for students that all are likely to produce work you can score
  • Have a systematic scoring procedure
  • Have multiple markers (scorers) when possible
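
When two markers score the same set of papers, a quick way to check consistency is to compare how often they agree. The Python sketch below computes simple percent agreement and Cohen's kappa, which adjusts agreement for what would be expected by chance. This is an illustration only, not a procedure from the book; the function names and rubric scores are hypothetical, and it assumes both markers scored the same papers in the same order.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of papers on which the two markers gave the same score."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two markers, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both markers pick the same category
    # if each scored independently at their own observed rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        # Both markers used a single identical category for every paper.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1 to 4) from two markers on the same 8 essays.
marker_1 = [4, 3, 3, 2, 4, 1, 2, 3]
marker_2 = [4, 3, 2, 2, 4, 1, 3, 3]

print("percent agreement:", percent_agreement(marker_1, marker_2))
print("Cohen's kappa:", round(cohens_kappa(marker_1, marker_2), 2))
```

Low agreement is usually a cue to revisit the scoring criteria together before rescoring, rather than a verdict on either marker.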

Remember, as educators, part of our job is to assess what our students know and don’t know (so that we can help them learn the things that they don’t know). We can’t really know our students if we don’t assess them through reliable procedures or instruments. So teachers, how reliable are the inferences you’re making about your students based on the scores from your classroom assessments?

In the next post, I’ll summarize reliability concerns for formative classroom assessment.


Nitko, A. J., & Brookhart, S. M. (2011). Educational Assessment of Students (6th ed.). Boston, MA: Pearson.

Popham, W. J. (2006). Assessment for Educational Leaders. Boston, MA: Pearson.
