It’s the time of year when the national curriculum SATs results are released. This never fails to prompt discussion about how we test our children and the appropriateness of the tests themselves. Test marking always generates controversy, and this year is no exception.
Points raised in a BBC article highlight concerns with the marking of the English papers. One relates to a question in which children were asked to write a semi-colon correctly in a sentence; the other concerns the labelling of main and subordinate clauses. Whilst we can argue about whether knowing the intricacies of semi-colon use makes our children better communicators, it is the marking of these questions that raises serious concerns about the reliability and validity of the tests.
Reliability – the accuracy or consistency of the test – and validity – the extent to which the test measures what it is supposed to measure – are recognised as the cornerstones of any adequate form of measurement. In the case of large-scale assessment, where many different markers are needed, obtaining adequate consistency between markers is a major challenge. Good training of markers and detailed guidance on what to accept as correct or incorrect help, but only go so far. Inevitably, noise or ‘error’ creeps into the system. We can never remove error completely, but it should be minimised as far as possible. However, pressures of time and cost may mean that insufficient checks are put in place, with reliability suffering as a consequence.
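For readers curious how marker consistency can actually be quantified, one common approach is a chance-corrected agreement statistic such as Cohen’s kappa, which compares how often two markers agree against how often they would be expected to agree by chance alone. Below is a minimal sketch in Python; the marker data is invented purely for illustration and does not come from any real SATs marking exercise.

```python
from collections import Counter

def cohens_kappa(marker_a, marker_b):
    """Cohen's kappa: agreement between two markers, corrected for chance."""
    assert len(marker_a) == len(marker_b) and marker_a
    n = len(marker_a)
    # Observed agreement: proportion of responses both markers scored the same.
    p_o = sum(a == b for a, b in zip(marker_a, marker_b)) / n
    # Expected agreement by chance, from each marker's marginal frequencies.
    freq_a = Counter(marker_a)
    freq_b = Counter(marker_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    # Assumes p_e < 1, i.e. the markers do not both give one constant score.
    return (p_o - p_e) / (1 - p_e)

# Invented example: two markers scoring the same ten responses as
# correct (1) or incorrect (0). The disagreements are the 'noise'
# or 'error' in the system.
marker_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
marker_b = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print(f"kappa = {cohens_kappa(marker_a, marker_b):.2f}")  # kappa = 0.52
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so a middling value like the one above signals that the mark scheme or marker training is leaving too much room for inconsistency.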
More worryingly, the current SATs highlight concerns over validity. Take the question about correctly punctuating a sentence. Markers were given guidance on exactly where the top of the semi-colon should sit, which way the bottom comma should face, and other features a response needed in order to be acceptable. So a child could have made a mark that was clearly a semi-colon, and in the correct place, yet still not be considered to have answered the question correctly. Unfair? Almost certainly yes.
The key point to consider is the purpose of this question – its validity. If we want to check whether a child can correctly write a semi-colon, then ask them to do this. Give them a blank space and ask them to fill it with a semi-colon. However, if the task – as must be assumed – is to see whether a child knows where a semi-colon goes in a sentence, then remove these artificial constraints that penalise many.
Quite rightly, the sentence children had to punctuate did not include extra spaces where the punctuation was missing, as this would have been too much of a clue to the answer. But when we add punctuation to our own writing, that’s just what we do, as the punctuation mark takes up space on the page. As we know, when you try to squeeze in a mark or letter you’ve missed out, the result is often untidy. Children’s handwriting also naturally differs in size, but here it is constrained by the font used on the question paper. Again, we create an artificial situation which completely misses the point of the question.
These simple examples illustrate fundamental points. Test constructors need to be clear about what they want to measure and stick to it, removing all ‘construct-irrelevant variance’. If this is not done, tests are unfair and biased. They are also less useful to teachers who might use them to inform children’s learning needs, as the reasons for a child getting a question wrong are obscured.
High-stakes assessment will always be scrutinised, and through that scrutiny the limitations of measurement are highlighted. Public understanding of these limitations is a necessary part of an accountable test system, but sometimes you look at a test and think it simply should have been better.