NAPLAN
Filling a paper bag with dog poop, lighting it on fire, placing it on someone's doorstep, knocking and running away is a kinder act than expecting schools to conduct meaningful analysis of NAPLAN data
I hate NAPLAN. So that card’s on the table. It’s such a poor-quality assessment and its usefulness is next to nothing. It’s not that there’s no value in it at all. But, as I want to show, gleaning that valuable information is such hard work, and its value is questionable enough anyway, that it’s difficult to understand why we’re doing it at all.
When people who want to use NAPLAN to make value judgements talk about it, they say things like ‘I know NAPLAN has its problems, but…’ What they mean, in my experience, is that criticisms aren’t welcome; we’re just going along with whatever they have to say next.
I’m not on board.
NAPLAN has too many problems.
The Results
When schools first get their NAPLAN results, they come from ACARA. You get an awful lot of information and, helpfully, it’s all easy to download in CSV format, so it can be opened in Excel or imported into whatever you like to use for visualisation. I don’t know if ACARA actually gives schools students’ final summary score for each domain, but everything else is there.
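Getting that export into something analysable is the easy part. Here’s a minimal sketch using pandas; the file name and column names are placeholders I’ve made up, since the real export uses ACARA’s own headers.

```python
# A minimal sketch of loading the ACARA CSV export for a first look.
# The file name and the "Domain" column are assumptions, not ACARA's
# documented schema; substitute whatever headers your download actually has.
import pandas as pd

results = pd.read_csv("naplan_item_results.csv")  # hypothetical file name

# Quick orientation: what columns are there, and how many rows per domain?
print(results.columns.tolist())
print(results["Domain"].value_counts())  # assumed column name
```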
Adaptive Tests
There are 5 domains:
Reading
Writing
Spelling
Grammar and Punctuation
Numeracy
Writing is its own thing, so let’s put that aside for now.
The other tests are adaptive. That is, the test a student sits depends on how they perform early on in the test, so students aren’t just getting a bunch of questions that are too hard or too easy. It’s a Goldilocks test where everyone gets questions that are just right.
Here’s what ACARA says the branching process looks like for Reading and Numeracy:
And here’s what it looks like for Spelling and Grammar and Punctuation:
Source: https://www.nap.edu.au/naplan/understanding-online-assessment/tailored-tests
This is too few branching points to be useful. A genuinely adaptive test would use each question asked to inform the next best question to ask, based on a growing body of evidence. The number of questions wouldn’t be fixed; the test would stop once a threshold of confidence was reached that a student’s level of achievement had been accurately determined. As it stands, this kind of adaptive test is more discriminatory than helpful, I think.
Everyone participates in testlet A. Performance there determines whether the upper, middle or lower track is given, and then there is one more decision point where a student can stay on the same track or move one step up or down. Students who find themselves on the lowest track after the first decision point are stuck there.
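To make that concrete, here’s a rough model in Python of the two-decision-point structure described above. It is not ACARA’s actual algorithm; the track names, thresholds and scores are invented purely to illustrate how few pathways such a design allows.

```python
# An illustrative model of a two-decision-point, three-track design as
# described in the post. NOT ACARA's algorithm: thresholds, scores and
# track names are invented for illustration only.

def first_branch(score_a: int) -> str:
    """Decision point 1: performance on testlet A picks a track."""
    if score_a >= 8:   # invented threshold
        return "upper"
    if score_a >= 4:   # invented threshold
        return "middle"
    return "lower"

def second_branch(track: str, score_b: int) -> str:
    """Decision point 2: stay on the track or move one step up or down.
    On the post's reading of the diagrams, the lowest track has no way up."""
    if track == "lower":
        return "lower"
    order = ["lower", "middle", "upper"]
    i = order.index(track)
    if score_b >= 8:   # invented threshold
        i = min(i + 1, len(order) - 1)
    elif score_b < 4:  # invented threshold
        i = max(i - 1, 0)
    return order[i]

# Three tracks and two decision points give only a handful of distinct
# pathways, nothing like an adaptive test that chooses every question
# from the evidence gathered so far.
```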
Adaptive tests seem helpful. But there are issues with this.
One huge one for analysing results is that not every student sits the same test.
A Multitude of Assessments
When students in my class sit the Reading assessment for NAPLAN, they sit one of seven assessments. But that’s just the testlets. I’ve not yet seen a single question that an entire year group was asked. So within a single testlet, a student will be presented with a range of questions at varying levels of perceived difficulty, but they are unlikely to get the same series of questions as another student in their class sitting the same testlet.
This mightn’t be so bad, but NAPLAN questions can no longer be seen by schools. I assume this is because they want to use the same questions year after year. The upshot is that schools have no idea what questions their students were asked. There are exemplar questions, but that’s not the same thing.
In short:
Everyone sits a different test
No one knows which questions a student was asked
If I don’t know what an actual question was, how can I know what went wrong in that question? It’s just left to speculation. It might be informed speculation, because I know the students, but it’s speculation that is more blind than it needs to be.
Just give schools the bloody questions.
Multiple Choice - Favouring Uncertainty
Another difficulty NAPLAN creates for schools is its dependence on multiple choice. It seems that most multiple choice questions have four options. There are challenges with this:
Multiple choice takes away the requirement for a student to write an answer. Written answers are windows into processes and understanding. Multiple choice closes those windows.
With four options, a student who doesn’t know the answer can still be marked correct up to 25% of the time. That’s just too much. Even worse, when a student understands a question well enough to eliminate the most obviously incorrect options, there are often two viable options left. A student who gets to this stage and is unsure doesn’t actually know the answer, and a correct guess is not a demonstration of understanding. Yet in this instance each student has a 50% chance of getting the mark and being misidentified as knowing the answer.
Since you don’t have the actual questions, there is absolutely no way to understand whether even a correct response to multiple choice is demonstrating understanding or demonstrating luck.
If there were 10 questions on a single area of content this wouldn’t matter as much. You can start to account for that kind of error. But in NAPLAN, as I’ll explain below, students very rarely get more than one question in any assessment relating to a single skill or area of content knowledge.
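Here’s a quick back-of-the-envelope calculation, mine rather than anything ACARA publishes, of how likely a student who is purely guessing four-option questions is to come out looking competent with one question versus ten:

```python
# Probability that a pure guesser on four-option questions still gets
# marked as competent: one question versus ten on the same content.
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.25) -> float:
    """Probability of at least k correct answers from n pure guesses."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(prob_at_least(1, 1))   # one question: 0.25, so a guesser "knows it" a quarter of the time
print(prob_at_least(7, 10))  # ten questions, needing 7+: ~0.0035, well under 1%
```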
So a student gets one chance with one multiple choice question to demonstrate a level of understanding of a topic, concept or skill. To say this is inadequate is an understatement.
When it comes to multiple choice questions in the context of NAPLAN, the quality of the quantitative output is significantly diminished.
Too Few Questions
NAPLAN tests are far too short.
In Year 3 reading, a student will be asked 39 questions. Each question is linked to a descriptor. A descriptor may be something like “Identifies the main purpose of a persuasive text” or “Analyses the effect of figurative language in an imaginative text”. Sometimes they’re much more specific, such as “Locates directly stated information in an informative text”.
39 questions to assess ‘Reading’ is too few. Reading is complex. The skills involved are many and varied. Any student deserves multiple opportunities to demonstrate areas of skill and ability. I understand why the tests are short. These are children. No one is interested in torturing them with tests that are hundreds of questions long. But that doesn’t excuse a bad assessment. If the required length of the test is too short to adequately assess all of ‘Reading’, then don’t try to assess all of ‘Reading’. Be selective and deliberate about assessing just a few aspects of reading that are indicative of reading knowledge and skills. Then acknowledge that what we have is an indication, not a comprehensive or determinative assessment.
Too Many Results
This is huge and something I’m not sure many people appreciate. I’m certain that bureaucrats and politicians don’t understand this. And if they did, they wouldn’t care. Deep understanding of complex issues is not something they have a great history of seeking or caring about.
Let’s take a look at a Year 7 cohort of 250 students. This is a big school.
The descriptor-level results for a cohort that size are overwhelming, and they’re terribly inconsistent.
Some descriptors are so similar that it’s difficult to understand the difference between them. Here are three from Year 7 Reading:
Interprets a pronoun reference in a text
Interprets a pronoun reference in a persuasive text
Interprets a pronoun reference in an imaginative text
They’re asked of different students, different numbers of times. Some students got two of these, others didn’t. All three descriptors were identified as being at different levels of difficulty, and the persuasive text one was asked at two different levels of difficulty.
So if we ask, “How well do students at our school interpret a pronoun reference in a text?”, we have four incomplete data points to look at.
And it’s reported back to you in binary form: just correct or incorrect. With those questions of differing difficulty, there’s no way to know what that difference looks like or means, because you can’t see the actual questions that were asked.
Another issue is the same descriptor being assessed at different levels of difficulty. In this school, one descriptor was assessed 566 times.
That descriptor was assessed at six different levels of difficulty:
360, 370, 385, 406, 447, 466
So how does the cohort achieve on the descriptor? Well, it depends on which level of difficulty we’re talking about, and it also depends on which students were asked that question. Some students had this descriptor assessed three times, some twice, some once, and some not at all. There’s no easily discernible pattern to this. It seems to be luck of the draw.
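If you try to do this analysis yourself, the fragmentation shows up immediately. Here’s a sketch of what it looks like in pandas; the column names (“Descriptor”, “Difficulty”, “Correct”, “StudentID”) and file name are my assumptions about the export, not ACARA’s documented schema.

```python
# A sketch of descriptor-level analysis on the ACARA export. Column names
# and the file name are assumptions; adjust to whatever your export contains.
import pandas as pd

results = pd.read_csv("naplan_item_results.csv")  # hypothetical file name

one = results[results["Descriptor"] == "Interprets a pronoun reference in a text"]

# "How well does the cohort do on this descriptor?" immediately fragments
# into one small, unevenly sized group per difficulty level.
summary = one.groupby("Difficulty").agg(
    attempts=("StudentID", "count"),
    pct_correct=("Correct", "mean"),  # assumes Correct is coded 0/1
)
print(summary)

# Exposure is uneven too: counts of how many times each student who saw this
# descriptor was asked about it (students absent from `one` never saw it).
print(one.groupby("StudentID").size().value_counts())
```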
It just makes no sense to do this to schools. Why are the decisions around this so poor?
A Better Way
There is a better way. NAPLAN should seek to build a representative understanding of current levels of achievement in literacy and numeracy at the state and federal levels. Students should sit similarly short tests with a much narrower focus that changes each year. That focus would be representative of achievement; it wouldn’t try to encompass the entirety of it.
But that’s not on the table.
NAPLAN is terrible. Right now teachers are the targets of education systems and government ministers. All education’s woes are being placed on teachers and all the solutions lie in ‘fixing’ the teachers.
But all the while we have a system that can’t even build one effective assessment. If you want to know what’s wrong with education, look to NAPLAN. It’s indicative of a whole host of problems, not least of which is a lack of leadership and direction.
NAPLAN sux.
Completely agree. Complete waste of valuable time that the statisticians at the department don't believe has any merit either. NAPLAN data not only doesn't correlate with any other performance metric such as HSC performance, it doesn't even correlate with NAPLAN data according to my source in that department.
As a parent of a third grader, the NAPLAN results agree very well with our observations of their numeracy, reading and writing. Tick from me as a parent. But, as a teacher with a maths degree and the job of disseminating high school NAPLAN data to feed back to our leaders of learning, I am finding it impossible to draw specific conclusions from the data these choose your own adventure tests generate.