All things excellent are as difficult as they are rare.


14 September 2011

Value-Added Analysis and the Really Real World

The Wall Street Journal just claimed that value-added measures of teacher quality are utterly useless.

OK, the Wall Street Journal didn't quite say that. They didn't even really mean to say that, and would probably deny it if you asked them. But if what they say is true, then they pretty much delivered what I think is a death blow against value-added analysis.

If I were running a wrongful termination lawsuit on behalf of a teacher who was dismissed on the basis of value-added metrics, I'd send the WSJ a Godiva gift box for telling me what my strategy should be.

Here's what they say, in an article that I found through a link at Joanne's site:

For the first time this year, teachers in Rhode Island and Florida will see their evaluations linked to the complex metric. Louisiana and New Jersey will pilot the formulas this year and roll them out next school year. At least a dozen other states and school districts will spend the year finalizing their teacher-rating formulas.

* * * *

Janice Poda, strategic-initiatives director for the Council of Chief State School Officers, said education officials are trying to make sense of the complicated models. "States have to trust the vendor is designing a system that is fair and, right now, a lot of the state officials simply don't have the information they need," she said.

* * * *

For states and school districts, deciding which vendor to use is critical. The metrics differ in substantial ways and those distinctions can have a significant influence on whether a teacher is rated superior or subpar.

(Emphasis added.)

Let us assume that we live in the really real world and that, in addition to its being the case that there ain't no coming back, whether a teacher is good or bad is a matter of actual fact. Let us further assume that we have just such a teacher.
Let us also assume that we have just two metrics -- one of which tells us that our teacher is "superior" and the other of which tells us that the teacher is "subpar".

How are we to tell which metric is right? They can't both be right. ("You are also right!")

Well, obviously, we can look at the teacher's teaching with our own eyes and tell if she's doing a good job. We can then go back, look at our metrics, and know which one of them is getting it right in this case.

But people are using these (untested) metrics to attempt to determine who the good and bad teachers are in the first place. In other words, the way to prove the metric's efficacy is to... consult the metric.

You can't do that. You have to test these metrics first, scientifically, to determine their efficacy in determining teacher effectiveness. Because teacher effectiveness isn't what they measure: they measure student test scores and how they compare against statistical projections. That the test scores themselves are imperfect proxies for student learning (I've tanked more than one test in my life just out of spite) only compounds the problem.

I hate to sound like a stick in the mud, but if you want test scores to serve as a PROXY for (rather than merely as a definition of) good teaching, then you have to take a group of recognizably good teachers, and a group of recognizably bad teachers, and run the various metrics across the two groups to see if the results bear any resemblance to reality. So who wants to volunteer their kids to be in the recognizably bad classrooms for this experiment? (Would that even be legal?)
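If anyone ever did run that (ethically fraught) experiment, the bookkeeping would be the easy part. Here's a toy sketch of what the validation step would look like; the teachers, labels, and verdicts are all invented for illustration, and the real thing would of course need a much larger sample and a proper statistical test:

```python
# Hypothetical validation sketch: compare each metric's verdicts against a
# reference set of teachers whose quality is already established by some
# independent means (e.g., intensive classroom observation).
# All data below is made up for illustration.

# Reference labels: True = recognizably good teacher, False = recognizably bad.
reference = {"A": True, "B": True, "C": False, "D": False, "E": True}

# Each vendor metric's verdict for the same five teachers.
metric_1 = {"A": "superior", "B": "superior", "C": "subpar",
            "D": "superior", "E": "superior"}
metric_2 = {"A": "subpar", "B": "superior", "C": "subpar",
            "D": "subpar", "E": "subpar"}

def agreement(metric, reference):
    """Fraction of teachers where the metric's verdict matches the known label."""
    hits = sum(
        (verdict == "superior") == reference[teacher]
        for teacher, verdict in metric.items()
    )
    return hits / len(reference)

print(agreement(metric_1, reference))  # 0.8
print(agreement(metric_2, reference))  # 0.6
```

The point of the exercise: until you've computed something like this against labels that *don't* come from the metric itself, you have no basis for trusting either metric, and when the two disagree (as they do for teachers A, D, and E here), at least one of them must be wrong.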

Districts aren't doing this, as far as I can tell. And neither are the statisticians. They're just concerned with the data -- and the data aren't about teachers, but about student test scores. Instead, what we get is this paragon of scientific precision:

Principal Gregory Hodge of New York's Frederick Douglass Academy said the data for teachers generally aligns with his classroom observations. "It's confirming what an experienced principal knows," he said.

I'm less than impressed. If we're going to trust the metric because it "generally aligns" with principal observations, and principals are going to make those observations anyway, why not just use the principal observations in the first place and save districts a significant amount of money?

I'd feel much better about this if all the various metrics agreed with each other, and if they were all "just confirming what the principal knows". But I am informed by the WSJ, and on that basis believe, that they don't.

Two different metrics that say different things cannot both be confirming what every principal knows. This is the really real world.

Of course, it's entirely possible I'm just overreacting to a single sentence, and that the WSJ just has its facts wrong. Maybe all the metrics produce identical results.




Let's count that as my joke of the day.
