In the course of their career, most educators will use a range of evaluation and testing methods.

They might also use more than one way to measure test results. One of the main ways to do this is via the criterion-referenced test, but how does it work exactly? And how is it different from other methods of measuring results, such as norm-referenced (bell curve) testing?

A criterion-referenced test assesses a person’s knowledge, ability, or skills against a predetermined standard. This means that in a classroom situation, each individual’s test results are measured against the set standard and not against the performance of other students in the class (or other students throughout a city, region, or entire country). In other words, the performance of the rest of the group taking the test will not affect the individual’s end mark or grade.

How do criterion-referenced tests work?

This assessment measures student performance against a clearly defined set of criteria or standards. These might include statements of what students should know at specific ages or learning stages, otherwise known as ‘learning outcomes’.

Criterion-referenced tests use what are known as ‘cut scores’ (the minimum score needed to reach a given level) to decide whether students have passed a test, in other words, whether they have achieved the desired learning outcomes. In some cases, this assessment method also places results into tiered categories of achievement, for instance, ‘A’ grades or scores of ‘Excellent’.

Criterion-referenced tests can also be used before a course begins to assess a student’s level and place them in a group appropriate to their abilities (for instance, ‘Basic’, ‘Intermediate’, or ‘Advanced’). Again, these group allocations are based on the cut scores, not on how the student compares with the rest of the class.

This means that, in theory, every student in a class could fail a test, or every student could get an A, because each result is measured against the cut score rather than against the rest of the group. In other words, it is not about relative bell curve measurements.

But how are these cut scores established? Depending on the importance of the test, they might be defined by a single educator or an entire committee of academic experts. Either way, they will decide how the test should be implemented (e.g., the style of assessment and specific questions) and what the cut scores are. For instance, what percentage is needed to pass? And what percentage is needed for an A, B, C, or D?

In other words, there is no universal standard for criterion-referenced tests, such as an agreement that anything above 60% is always a B. Instead, it is up to the individual educator or committee to decide, which means cut scores can vary widely.

Results can also be reported on different scales: letters (e.g., A to E), numbers (e.g., 1 to 5), or categories (e.g., ‘excellent/good/satisfactory/unsatisfactory’). Sometimes there is just a single cut score that separates a pass from a fail, without the scores being broken down into tiered achievement categories.
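The cut-score mechanism can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended standard: the cut scores (90% for an A, 80% for a B, 70% for a C, 60% for a D, and an E below that) and the function names `criterion_grade` and `grade_class` are hypothetical. Each score is compared only with the fixed cut scores, so the rest of the class never enters the calculation.

```python
# Hypothetical cut scores (percent), highest first. These values are
# illustrative only; in practice an educator or committee decides them.
CUT_SCORES = [
    (90, "A"),
    (80, "B"),
    (70, "C"),
    (60, "D"),
]
FALLBACK_GRADE = "E"  # any score below the lowest cut score


def criterion_grade(score: float) -> str:
    """Return the grade for one score, using only the fixed cut scores."""
    for cut, grade in CUT_SCORES:
        if score >= cut:
            return grade
    return FALLBACK_GRADE


def grade_class(scores: dict[str, float]) -> dict[str, str]:
    """Grade every student independently; classmates' scores are never consulted."""
    return {name: criterion_grade(score) for name, score in scores.items()}


if __name__ == "__main__":
    strong_class = {"Ana": 95, "Ben": 91, "Caro": 98}
    weak_class = {"Dev": 40, "Eli": 55, "Fay": 12}
    print(grade_class(strong_class))  # every student gets an A
    print(grade_class(weak_class))    # every student gets an E
```

Moving Ben into the weaker class would not change his A, which is the defining property of this approach. Swapping in different labels (1 to 5, or ‘excellent’ to ‘unsatisfactory’), or a single pass mark with ‘Fail’ as the fallback, would give the other scales described above.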

What is the format of a criterion-referenced test?

This type of assessment can be implemented in many different ways, including multiple-choice questions, practical exercises, and open-ended questions.

What can criterion-referenced tests assess?

This kind of assessment can have many different purposes, including:

  • To assess the skills or knowledge of students at the end of a unit, module, or course.
  • To assess the skills or knowledge of students at the beginning of a course to place them into appropriate groups or tiers.
  • To identify any learning challenges or gaps in the knowledge of individual students.
  • To evaluate the progress of students with unique or different needs to identify whether they need additional support.
  • To evaluate the performance of teachers, lecturers, or trainers.
  • To evaluate the performance of a particular school or institution.
  • To measure the knowledge or abilities of students in a particular area, for instance, a city, county, or country.
  • To evaluate the effectiveness of a unit, course, program, workshop, skills training, or any other kind of learning format.
  • To evaluate a course on an ongoing basis, so that educators can adjust their methodology if needed.

Criterion-referenced tests can also be used in both high- and low-stakes evaluations, from casual class quizzes to end-of-year exams.

What is a norm-referenced (bell curve) test?

Criterion-referenced tests differ from norm-referenced tests because the latter are based on how each student performs relative to their peers.

In other words, norm-referenced tests are designed to rank individuals on a bell curve: when their scores are plotted on a graph, they form a bell shape. This happens when a small percentage of students get low scores, a small percentage get high scores, and the majority get average scores.

So, for instance, if most students in a class get a low score, the criteria will be adjusted to bring some of those scores into the average range instead. The aim is a bell curve result; if it is not achieved, the assumption is that the test was not devised correctly in some way. For instance, it may have been too easy, too difficult, or otherwise unsuitable for that particular group.
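The adjustment described above can be sketched as grading on the curve by rank. This is one common way of doing it, not the only one (other schemes rescale scores using the class mean and standard deviation). The quotas (10% A, 20% B, 40% C, 20% D, 10% E) and the function name `curve_grades` are hypothetical, chosen only so that the resulting grade counts form a rough bell shape.

```python
# Hypothetical curve: cumulative share (in percent) of the class that can
# receive each grade or better. 10% A, 20% B, 40% C, 20% D, 10% E.
CUMULATIVE_QUOTAS = [("A", 10), ("B", 30), ("C", 70), ("D", 90), ("E", 100)]


def curve_grades(scores: dict[str, float]) -> dict[str, str]:
    """Grade each student by rank within the group, not by fixed cut scores."""
    n = len(scores)
    grades = {}
    for name, score in scores.items():
        # Count classmates with a strictly higher score; tied students share a rank.
        above = sum(1 for other in scores.values() if other > score)
        # Give the first grade whose cumulative quota still covers this rank.
        # Comparing above / n < upper / 100 with integers avoids rounding errors.
        for grade, upper in CUMULATIVE_QUOTAS:
            if above * 100 < upper * n:
                grades[name] = grade
                break
    return grades


if __name__ == "__main__":
    # Everyone scored poorly on a hard test, yet the curve still hands out an A.
    hard_test = {"Ana": 48, "Ben": 45, "Caro": 44, "Dev": 41, "Eli": 40,
                 "Fay": 38, "Gus": 35, "Hana": 33, "Ivo": 30, "Jo": 22}
    grades = curve_grades(hard_test)
    print(grades["Ana"], grades["Jo"])  # A E
```

Ana’s 48% earns an A here only because her classmates scored lower; in a class where the other nine students all scored above 48%, the same mark would land in the E band. A criterion-referenced test with fixed cut scores would give 48% the same grade in both classes.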

Pros of criterion-referenced tests

Here are some key arguments in support of this type of testing:

Criterion-referenced tests are arguably fairer than norm-referenced tests, as they are not relative to the particular class or group and are designed along a consistent set of standards. So there is no chance of an individual’s grades being unfairly distorted by, for example, a few wealthier students in the class whose parents can afford to give them private tuition.

Relatedly, this method measures the actual progress of individual learners more concretely, as the results aren’t ‘muddied’ by the performance of others in the group. Because it applies the same learning expectations to everyone, this type of test can encourage students from disadvantaged backgrounds to achieve more. Conversely, it is argued that if students in a disadvantaged group had to achieve less to get an ‘A’ (as could happen with a norm-referenced bell curve test), this would not push them to reach their full potential.

Cons of criterion-referenced tests

However, this type of assessment is not without its critics. Some of the key concerns include the following:

Criterion-referenced tests are only as good as the learning standards they are based on. For instance, if a committee devises a faulty set of cut scores that are either too strict or too lenient, then the test will not accurately measure knowledge or skills. In the end, there can be a subjective element to working out pass scores and markers of proficiency. After all, committees are made up of human beings, who are subject to error, bias, and misjudgment.

This testing system is open to ‘fudging’ or manipulation of results, or even outright corruption. For instance, schools or entire districts might tamper with the cut scores for assessments that aren’t nationally standardized, so that they can avoid a bad reputation, negative media coverage, or even a loss of funding. Similarly, when an individual’s job might be at stake due to poor test results, whether a teacher’s or a school principal’s, this could also encourage tampering or corruption.

Criterion-referenced testing can be time- and labor-intensive, as well as expensive. For instance, keeping the tests and their standards up to date might require the input of expert committees, which is no small undertaking.

Pros of norm-referenced tests

As we have seen, the main alternative to the criterion-referenced test is norm-referenced testing. Here are some advantages of the latter:

It is a valuable way of measuring an individual’s performance within a specific group. Educators sometimes need this, particularly if their group is ‘outside of the average’ in some way, for instance, from an underprivileged or highly privileged area.

This method is a good way of gathering normative data across larger groups, for instance, entire states or countries. This can be important for educational research, policy-making, and funding allocation.

Norm-referenced tests do not cause students from disadvantaged groups to feel discouraged, as they are not being measured by pre-existing criteria that might be unfair to them in some way. Instead, they are being measured against a group of peers in similar circumstances, which could create a more level playing field.

Cons of norm-referenced tests

As with criterion-referenced tests, norm-referenced assessments have also gathered criticisms. Here are a few of the main ones:

Who defines the ‘norms’ in a norm-referenced test? And what happens when these aren’t relevant to some groups? For instance, if test results are based on a national bell curve (rather than a local or class-based one), then an overly generalized, unfair, or biased set of norms might be applied.

A norm is not the same as a standard — in other words, this kind of assessment does not have set criteria to measure performance. It compares test takers to their particular group and bases scores on this. However, this does not measure whether the test taker has an adequate level of skill or knowledge in the subject area, nor does it measure who is truly excelling. The results are not concrete.

Norm-referenced testing can upset or anger students. This is because they might perceive bell curve grading as ‘unfair’ if the method makes their grade lower than what it might have been in a criterion-referenced assessment.

What other elements should you consider when devising assessments?

Choosing between criterion- and norm-referenced tests isn’t the only thing to reflect upon when devising a test. Here are a few other factors to consider:

1. What are the learning outcomes?

A test should be based on the key learning outcomes of a module, unit, or course. Usually, these outcomes describe the knowledge or skills that students can reasonably be expected (or are required) to have on completing the course.

2. What method of assessment is best?

Multiple choice? Practical exercises? Open-ended questions? Choosing a suitable assessment method can depend on the specific knowledge you are trying to cultivate in a class or group. For instance, do you want students to have a general understanding of a subject and be able to recall concrete facts? Or is the aim to develop abstract, critical, or imaginative thinking?

Or should they be able to perform a practical set of skills better evaluated via, say, roleplay scenarios rather than a written test?

3. What are your grading criteria?

How high are the stakes of this assessment? And will you assess for a simple pass or fail, or place results into categories of achievement? Also, what will the passing grade be? 50%? 60%? 70%? Again, this all depends on the purpose and importance of the assessment itself. For instance, if you are running a job training course where 50% recall of a subject wouldn’t be enough in the actual role, then you may want to set a higher pass mark.

Or, if you want to encourage students to strive for excellence, you might want to devise an A to D grading system instead of simple pass-fail criteria.

4. Is there a cultural bias within your assessment?

For instance, does your test paper use framing, concepts, or terminology that students with English as a second language might struggle to understand? Or are some of the questions based on the cultural contexts and norms of a particular social class or ethnic group?

As cultural biases are a form of blind spot, they can easily go unnoticed in testing. That is why it is crucial to watch out for them.

A summary of criterion-referenced testing

Here are some of the key takeaways about this method of assessment:

  • A criterion-referenced test assesses a person’s knowledge, ability, or skills against predetermined criteria or standards.
  • Depending on how high the stakes of the test are, these criteria or standards might be defined by a single educator or by an expert committee.
  • Criterion-referenced testing can be used for any assessment, from casual class quizzes to end-of-year exams.
  • Unlike a norm-referenced (bell curve) test, criterion-referenced testing does not measure an individual’s performance against other students in a class or group.
  • Criterion-referenced tests use cut scores to place student results into categories. These cut scores are not universal or uniform; they depend on the individual educator or testing body.
  • They can categorize results in different ways, for instance, as letter or number grades, as categories such as ‘excellent/good/satisfactory/unsatisfactory’, or as simple pass/fail results.
  • They can come in various formats, including multiple-choice, practical exercises, or open-ended questions.
  • They can evaluate the skills and knowledge of a student before a course begins to place them in a group appropriate to their level.
  • They can assess the abilities of individual educators. For instance, did they effectively deliver the key set of learning outcomes?
  • They can be used to identify learning difficulties or challenges in individual students.
  • They can be used to assess regional performance, for instance, across a city, state, or entire country.
  • They are arguably fairer than norm-referenced tests as they are based on set criteria and not a relativistic bell curve model.
  • They can measure the performance of individuals in a more concrete way than a norm-referenced test.
  • They can encourage students from disadvantaged backgrounds to achieve more in contrast to the norm-referenced model, which might place lower expectations on certain groups.
  • On the other hand, this kind of assessment could also be considered unfair, as it might cause classes in disadvantaged areas to get lower overall grades, which can demotivate students.

Hopefully, you now understand criterion-referenced tests better, including how they compare with norm-referenced tests and the pros and cons of each.

Of course, no testing method is perfect, but if you are devising your own, the critical thing to bear in mind is that context is key. Keeping it in view can help you decide on the best evaluation method to meet the needs of your class and course overall.