2012-02-15

tools for predicting cross-country team placings -- what works best?

Predicting cross-country results has depended on intensive quantitative and qualitative analysis -- looking at results and talking to folks on the ground. Rich Gonzalez of prepcaltrack.com and Watchout of dyetrack are masters of this. For those of us less socially inclined, I tested several metrics that are readily available to the casual user to see which was the most accurate (see below for a summary of results).

I analyzed the results of every team at the California state cross-country meet from 1987 through 2011, with the exception of the 1991 Division 4 boys and girls races (I have been unable to find those results). I set the order of teams by points, then calculated how often each variable (runner 1, 2, 3, 4, 5, 6, 7, team-time, range (1 - 5)) predicted a team's place exactly and, when it did not, how far off it was from the actual place. For example, if a team finished 11th by points but its number 6 runner was the 9th fastest number 6 runner in the race, that was given a score of +2 (11 - 9 = 2). I aggregated this data for each division, each year, and each gender. In the end, there were 224 races.
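For anyone who wants to try this on their own results, here is a minimal Python sketch of the scoring described above. The function names and the sample times are made up for illustration. It assumes every team has seven finishers, that team-time is the sum of the top five times, and that a smaller 1 - 5 range ranks better; the original analysis may have handled these details differently.

```python
def rank_positions(values):
    """Rank teams by a metric, 1 = lowest value; tied teams share the better rank."""
    ordered = sorted(values.values())
    return {team: ordered.index(v) + 1 for team, v in values.items()}


def metric_scores(times, place_by_points):
    """Score each metric for one race.

    times: team -> seven finishing times in seconds, fastest first (assumed).
    place_by_points: team -> actual place in the team standings.
    Returns metric name -> {team: actual place minus place by that metric}.
    """
    metrics = {}
    for n in range(7):
        metrics[f"runner {n + 1}"] = {t: ts[n] for t, ts in times.items()}
    # Assumption: team-time is the sum of the top five times.
    metrics["team-time"] = {t: sum(ts[:5]) for t, ts in times.items()}
    # Assumption: range is runner 5 minus runner 1, smaller is better.
    metrics["range (1 - 5)"] = {t: ts[4] - ts[0] for t, ts in times.items()}

    scores = {}
    for name, values in metrics.items():
        ranks = rank_positions(values)
        scores[name] = {t: place_by_points[t] - ranks[t] for t in times}
    return scores


# Made-up race with three teams (not real results).
sample_times = {
    "A": [900, 910, 920, 930, 940, 990, 1000],
    "B": [890, 915, 925, 935, 960, 970, 980],
    "C": [905, 912, 930, 940, 950, 960, 1010],
}
sample_places = {"A": 1, "B": 2, "C": 3}
sample_scores = metric_scores(sample_times, sample_places)
```

A score of 0 means the metric put the team in exactly the place it earned on points; any other value is the size and direction of the miss, following the 11 - 9 = +2 example.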

brief discussion:

By far the most accurate predictor of exact place is team-time, which is accurate nearly 61% of the time. It also has the tightest range when it is not exact -- +/- 1.3 places. This means that if a team is predicted to finish 7th (by team-time), there is roughly a 61% chance that it will finish 7th and, if it doesn't, it will typically finish between places 5.7 and 8.3 (or, to keep it real, slightly less often between 6th & 8th).
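The two figures quoted for each metric, an exact-hit rate and a +/- range, could be computed from a list of per-team scores along these lines. This is only a sketch: the post does not say how the +/- range was calculated, so the mean absolute miss over the non-exact cases is used here as an assumption, and the function names are hypothetical.

```python
def summarize(scores):
    """Return (exact-hit rate, typical miss) for a list of signed scores.

    Assumption: "typical miss" is the mean absolute score among the
    non-exact cases; the original analysis may have used another measure.
    """
    exact = sum(1 for s in scores if s == 0)
    misses = [abs(s) for s in scores if s != 0]
    hit_rate = exact / len(scores)
    spread = sum(misses) / len(misses) if misses else 0.0
    return hit_rate, spread


def window(predicted_place, spread):
    """Places a team typically lands between when the prediction is not exact."""
    return predicted_place - spread, predicted_place + spread


# Made-up scores for one metric across five teams.
hit_rate, spread = summarize([0, 0, 2, -1, 0])

# The team-time example from the text: predicted 7th, +/- 1.3 places.
low, high = window(7, 1.3)
```

With the team-time figures from the text, `window(7, 1.3)` gives the 5.7 to 8.3 band mentioned above.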

The next best metrics are the number 3 & 4 runners, both of which are accurate 32% of the time, with a range of +/- 2 places.

Surprisingly, at the bottom of the list is team range (the time difference between runners 1 & 5). This was exact only 8% of the time and had a huge range of +/- 5.6 places.

By using team-time and the times of the 3rd or 4th runners, one could be very accurate in predicting performance of a given team.

The complication, of course, is that there are variables beyond measure that affect performance. As I discovered in my first attempt to apply these results to reality (in 2011) and despite careful analysis of the ratios between state meet qualifying courses and the state meet itself, predicting team-time takes more than numbers, even in the aggregate. Nonetheless, looking at team-time and a team's number 3 or 4 runner is a rough and ready way to get a sense of a team's strength.

summary of results: 

number 1 runner: this predicts exact team placing less than 17% of the time. When it is off, it is off by a range of nearly +/- 4 places.

number 2 runner: this predicts exact team placing less than 25% of the time. When it is off, it is off by a range of over +/- 2.5 places.

number 3 runner: this predicts exact team placing just over 32% of the time. When it is off, it is off by a range of +/- 2 places.

number 4 runner: this predicts exact team placing over 32% of the time. When it is off, it is off by a range of +/- 2 places.

number 5 runner: this predicts exact team placing over 25% of the time. When it is off, it is off by a range of less than 6 places.

number 6 runner: this predicts exact team placing over 18% of the time. When it is off, it is off by a range of just over +/- 3 places.

number 7 runner: this predicts exact team placing less than 14% of the time. When it is off, it is off by a range of well over +/- 3.5 places.

team-time: this predicts exact team placing just under 61% of the time. When it is off, it is off by a range of +/- 1.3 places.

team-range (runners 1 - 5): this predicts exact team placing 8% of the time. When it is off, it is off by a range of over +/- 5.6 places.