School Rankings
- legendaryalchemist
- Exalted Member
- Posts: 26
- Joined: February 7th, 2020, 2:07 pm
- Division: C
- State: WI
- Pronouns: He/Him/His
- Has thanked: 19 times
- Been thanked: 10 times
- Contact:
Re: School Rankings
It seems to me that the largest tournaments (particularly MIT) are drowning out other competitions. Nearly every team that attended MIT had by far their best performance of the year there, according to the spreadsheet, which seems suspicious. Consequently, most teams that did not attend are underrated. The problem seems to be giving tournaments different "weights" and then making teams' scores for the tournament some fraction of that weight. This makes it impossible for a stellar performance at a medium-sized meet to make a big difference. Maybe something that standardizes the "weight" so that it corresponds to a given superscore?
A simple way (I'm sure you can think of a better system, this is just an example lol) of doing this would be to assign a score of 100*W/SS for a given tournament, where W is the weight of the tournament and SS is the team's superscore. A team's superscore will be higher at larger, more heavily-weighted tournaments, so it would give medium-sized tournaments a chance to make a difference while still reserving the highest scores for the largest tournaments (if team X sends equal-caliber teams to one tournament that has 60 teams and another that has 120, with similar quality of teams at each tournament, the weight of the latter would be about twice that of the former, but team X's superscore would not quite be twice as high at the larger tournament, thus making the larger tournament worth more).
This would also give an intuition for what differences in the rankings actually mean - a team with twice another's rating would be expected to score half as many points at a given tournament, assuming full stack. If New Trier has a score of 71 and Naperville North has a score of 31 (strange considering UChicago results), it is unclear how they would be expected to fare against one another in competition, other than New Trier doing better. Of course, different ranking systems are ideal for different data sets. I think the current system is generally adapted well to produce reasonable results for the top 5 teams in the nation, but beyond there it loses some of its accuracy.
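For concreteness, here is a minimal sketch of that example formula in Python. The weights and superscores below are made-up numbers purely for illustration, not real results.

```python
# Minimal sketch of the example scoring idea: Score = 100 * W / SS,
# where W is the tournament weight and SS is the team's superscore.
def tournament_score(weight: float, superscore: float) -> float:
    return 100 * weight / superscore

# Illustrative (invented) numbers: at a tournament twice the size, the weight
# roughly doubles but the superscore grows by less, so the larger tournament
# is still worth more for an equal-caliber team.
print(tournament_score(weight=60, superscore=400))    # 15.0
print(tournament_score(weight=120, superscore=650))   # ~18.5
```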
Last edited by legendaryalchemist on February 16th, 2021, 3:13 pm, edited 1 time in total.
- These users thanked the author legendaryalchemist for the post:
- pb5754 (February 16th, 2021, 2:55 pm)
Yale University, Class of 2026 | Marquette University High School, Class of 2022
Medal Count: 128 | Gold Count: 67
Userpage: https://scioly.org/wiki/index.php/User: ... yalchemist
- builderguy135
- Exalted Member
- Posts: 736
- Joined: September 8th, 2018, 12:24 pm
- Division: C
- State: NJ
- Pronouns: He/Him/His
- Has thanked: 191 times
- Been thanked: 143 times
- Contact:
Re: School Rankings
While you do bring up a very important flaw in our ranking system, your proposed system far overvalues smaller tournaments, benefiting teams that "spam" tournaments instead of doing only a few well. The superscore metric is also effectively useless, as the winning superscore varies extremely significantly between competitions. We've experimented with a winning-score bonus for teams such as Enloe at NC in-state invitationals, but ultimately there is no consistent way to balance how hard a team swept against the difficulty of a competition: 100 points at MIT is far, far more impressive than even a 30 at other competitions.
legendaryalchemist wrote: ↑February 16th, 2021, 2:23 pm It seems to me that the largest tournaments (particularly MIT) are drowning out other competitions. Nearly every team that attended MIT had by far their best performance of the year there, according to the spreadsheet, which seems suspicious. Consequently, most teams that did not attend are underrated. The problem seems to be giving tournaments different "weights" and then making teams' scores for the tournament some fraction of that weight. This makes it impossible for a stellar performance at a medium-sized meet to make a big difference. Maybe something that standardizes the "weight" so that it corresponds to a given superscore?
A simple way (I'm sure you can think of a better system, this is just an example lol) of doing this would be to assign a score of 100*W/SS for a given tournament, where W is the weight of the tournament and SS is the team's superscore. A team's superscore will be higher at larger, more heavily-weighted tournaments, so it would give medium-sized tournaments a chance to make a difference while still reserving the highest scores for the largest tournaments (if team X sends equal-caliber teams to one tournament that has 60 teams and another that has 120, with similar quality of teams at each tournament, the weight of the latter would be about twice that of the former, but team X's superscore would not quite be twice as high at the larger tournament, thus making the larger tournament worth more).
This would also give an intuition for what differences in the rankings actually mean - a team with twice another's rating would be expected to score half as many points at a given tournament, assuming full stack. If New Trier has a score of 71 and Naperville North has a score of 31 (strange considering UChicago results), it is unclear how they would be expected to fare against one another in competition, other than New Trier doing better. Of course, different ranking systems are ideal for different data sets. I think the current system is generally adapted well to produce reasonable results for the top 5 teams in the nation, but beyond there it loses some of its accuracy.
While this may not be an optimal solution, to us it is close to the best that we can get. The only reason that we are calculating scores from a school's top 4 competitions is that we do not want teams that can attend fewer competitions to be undervalued. Some teams will be overvalued, some undervalued; there's no perfect balance, unfortunately.
Thanks for the feedback.
Edit: so I've taken a look at your school's rankings on our spreadsheet. Unfortunately, it is indeed one of the teams underrated by our algorithm, and it seems like many of your complaints are somewhat specific to your own team – however, consider that it is not the small invites a team sweeps that tell others how good that team is relative to others, but rather how well it performs at larger invites that it doesn't sweep.
Last edited by builderguy135 on February 16th, 2021, 4:07 pm, edited 1 time in total.
- These users thanked the author builderguy135 for the post (total 2):
- Umaroth (February 16th, 2021, 5:50 pm) • sneepity (February 16th, 2021, 8:27 pm)
- legendaryalchemist
- Exalted Member
- Posts: 26
- Joined: February 7th, 2020, 2:07 pm
- Division: C
- State: WI
- Pronouns: He/Him/His
- Has thanked: 19 times
- Been thanked: 10 times
- Contact:
Re: School Rankings
My proposed system was just an example - you could certainly improve upon it. My main point is that when the rankings become a reflection of only one tournament (or a select few as other mega-invites come in), it kind of defeats the purpose of having rankings. Of course, the largest tournaments are the best indicators, but MIT is weighted so heavily that taking 25th there is worth more than winning SOLVI (a very competitive tournament in its own right).
builderguy135 wrote: ↑February 16th, 2021, 3:58 pm While you do bring up a very important flaw in our ranking system, your proposed system far overvalues smaller tournaments, benefiting teams that "spam" tournaments instead of doing only a few well. The superscore metric is also effectively useless, as the winning superscore varies extremely significantly between competitions. We've experimented with a winning-score bonus for teams such as Enloe at NC in-state invitationals, but ultimately there is no consistent way to balance how hard a team swept against the difficulty of a competition: 100 points at MIT is far, far more impressive than even a 30 at other competitions.
legendaryalchemist wrote: ↑February 16th, 2021, 2:23 pm It seems to me that the largest tournaments (particularly MIT) are drowning out other competitions. Nearly every team that attended MIT had by far their best performance of the year there, according to the spreadsheet, which seems suspicious. Consequently, most teams that did not attend are underrated. The problem seems to be giving tournaments different "weights" and then making teams' scores for the tournament some fraction of that weight. This makes it impossible for a stellar performance at a medium-sized meet to make a big difference. Maybe something that standardizes the "weight" so that it corresponds to a given superscore?
A simple way (I'm sure you can think of a better system, this is just an example lol) of doing this would be to assign a score of 100*W/SS for a given tournament, where W is the weight of the tournament and SS is the team's superscore. A team's superscore will be higher at larger, more heavily-weighted tournaments, so it would give medium-sized tournaments a chance to make a difference while still reserving the highest scores for the largest tournaments (if team X sends equal-caliber teams to one tournament that has 60 teams and another that has 120, with similar quality of teams at each tournament, the weight of the latter would be about twice that of the former, but team X's superscore would not quite be twice as high at the larger tournament, thus making the larger tournament worth more).
This would also give an intuition for what differences in the rankings actually mean - a team with twice another's rating would be expected to score half as many points at a given tournament, assuming full stack. If New Trier has a score of 71 and Naperville North has a score of 31 (strange considering UChicago results), it is unclear how they would be expected to fare against one another in competition, other than New Trier doing better. Of course, different ranking systems are ideal for different data sets. I think the current system is generally adapted well to produce reasonable results for the top 5 teams in the nation, but beyond there it loses some of its accuracy.
While this may not be an optimal solution, to us it is close to the best that we can get. The only reason that we are calculating scores from a school's top 4 competitions is that we do not want teams that can attend fewer competitions to be undervalued. Some teams will be overvalued, some undervalued; there's no perfect balance, unfortunately.
Thanks for the feedback.
Edit: so I've taken a look at your school's rankings on our spreadsheet. Unfortunately, it is indeed one of the teams underrated by our algorithm, and it seems like many of your complaints are somewhat specific to your own team – however, consider that it is not the small invites a team sweeps that tell others how good that team is relative to others, but rather how well it performs at larger invites that it doesn't sweep.
I would like to thank you guys for going through all the effort to put this together, though. I can imagine how much time it takes to assemble all of these results. I am not trying to belittle that - I'm just giving suggestions that I believe would enhance the accuracy of these rankings for the majority of schools.
Last edited by legendaryalchemist on September 21st, 2021, 1:08 pm, edited 2 times in total.
Yale University, Class of 2026 | Marquette University High School, Class of 2022
Medal Count: 128 | Gold Count: 67
Userpage: https://scioly.org/wiki/index.php/User: ... yalchemist
- EwwPhysics
- Exalted Member
- Posts: 158
- Joined: February 22nd, 2020, 12:38 pm
- Division: C
- State: PA
- Pronouns: She/Her/Hers
- Has thanked: 144 times
- Been thanked: 86 times
Re: School Rankings
I agree that the weighting seems to be a bit off. For example, I don't understand why our 41st and 46th at MIT were worth nearly twice as much as our 6th and 9th at SOAPS. Also, our single unstacked team at Duke got us more points than our two unstacked teams at SOAPS.
Regardless, nothing is perfect, and I like having an easy way to get a rough estimate of the strength of teams.
Lower Merion Captain '24
Cell bio, code, disease, forensics
Cell bio, codebusters, disease, envirochem (and widi, chem lab)
Protein Modeling - 1st @ nats Disease Detectives - 4th @ nats Designer Genes - 1st @ states Also fossils, widi, circuit
- jaymaron
- Member
- Posts: 6
- Joined: May 30th, 2022, 8:52 am
- Division: Grad
- State: WI
- Has thanked: 1 time
- Been thanked: 1 time
- Contact:
Re: School Rankings
Strong teams and strong states: (lists not reproduced here; see the expanded discussion link below.)
The number of points a team scores in a competition is a function of rank.
A natural ranking function is:
Points = -log_2(Rank/Teams)
Where "Teams" is the number of teams. If teams = 64, it corresponds to the number of games won in a single-elimination tournament.
Rank Points
1 6
2 5
4 4
8 3
16 2
32 1
64 0
Expanded discussion: https://jaymaron.com/scioly.html#score
The Science Olympiad scoring function is goofy. Ideally, the score function should be a straight line in the plot shown there.
Good scoring functions include those used in Formula 1, Indy racing, and World Cup skiing.
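As a quick sanity check, here is a short Python sketch of that function; with Teams = 64 it reproduces the table above.

```python
import math

# Points = -log_2(Rank/Teams), the rank-to-points function described above.
def points(rank: int, teams: int = 64) -> float:
    return -math.log2(rank / teams)

for rank in (1, 2, 4, 8, 16, 32, 64):
    print(rank, points(rank))   # 6, 5, 4, 3, 2, 1, 0
```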
Last edited by jaymaron on June 15th, 2022, 8:27 am, edited 2 times in total.
- jaymaron
- Member
- Posts: 6
- Joined: May 30th, 2022, 8:52 am
- Division: Grad
- State: WI
- Has thanked: 1 time
- Been thanked: 1 time
- Contact:
Re: School Rankings
A handful of states dominate the invitationals.
The strongest invitationals are the National Invitational, MIT, Mason, Golden Gate, and BirdSO. Data for BearSO and BadgerSO are not available.
Many highly-rated teams don't make nationals. Nationals berths could be decided by both state championships and ratings from invitationals: take one team from each state, then take N more teams, chosen by rating. States of death include California, New Jersey, Washington, Ohio, Illinois, Michigan, Massachusetts, and Wisconsin.
Ratings algorithm:
A team's rating is a sum of scores from invitationals. The score from an invitational is
Score = -log_2(Rank/Teams)
Where "Rank" is the rank of the team in the invitational, and "Teams" is the
number of teams at the invitational. We set Teams to 64 for all invitationals.
Rank=1 -> Score=6
Rank=2 -> Score=5
Rank=4 -> Score=4
Rank=64 -> Score=0
A proper ratings algorithm requires rating both teams and invitationals simultaneously.
The rating of an invitational is the sum of the ratings of the teams that participate.
We can extend the algorithm to include ratings of invitationals. Then a team's rating is
Team rating = Sum over invitationals [ Score + log_2(E/Emax) ]
Where E is the rating of the invitational and Emax is the rating of the top-rated invitational. Terms in the sum have a floor of zero: if a term is less than zero, it's set to zero.
We use a team's best 5 results from invitationals. Most heavyweight teams attend at least 5 invitationals.
The ratings can be calculated iteratively: initialize all invitational ratings to zero and use them to calculate team ratings, then use the team ratings to calculate a new set of invitational ratings, and repeat until convergence.
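A rough Python sketch of that iteration, under some assumptions not in the post: results come in as a hypothetical dict mapping each invitational to (team, rank) pairs, Teams is fixed at 64, and the log_2(E/Emax) adjustment is treated as zero while invitational ratings are still zero (the first pass).

```python
import math

def rate(results: dict[str, list[tuple[str, int]]], iterations: int = 20) -> dict[str, float]:
    """Iteratively rate teams and invitationals from placement data."""
    inv_rating = {inv: 0.0 for inv in results}   # start all invitationals at zero
    team_rating: dict[str, float] = {}
    for _ in range(iterations):
        emax = max(inv_rating.values())
        terms: dict[str, list[float]] = {}
        for inv, placings in results.items():
            # log_2(E/Emax) adjustment; treated as zero until invitational ratings exist
            adjust = math.log2(inv_rating[inv] / emax) if inv_rating[inv] > 0 and emax > 0 else 0.0
            for team, rank in placings:
                score = -math.log2(rank / 64)                                 # Score = -log_2(Rank/Teams)
                terms.setdefault(team, []).append(max(score + adjust, 0.0))   # floor of zero
        # A team's rating is the sum of its best 5 terms.
        team_rating = {t: sum(sorted(v, reverse=True)[:5]) for t, v in terms.items()}
        # An invitational's rating is the sum of the ratings of the teams that participate.
        inv_rating = {inv: sum(team_rating[t] for t, _ in placings)
                      for inv, placings in results.items()}
    return team_rating
```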
Expanded discussion of the algorithm: https://jaymaron.com/scioly.html#score
There are natural reasons why the score function is Score = -log_2(Rank/Teams): in the plot on that page, this function is a straight line, and it weights results logarithmically.
The algorithm is expanded on in https://jaymaron.com/scioly.html
Last edited by jaymaron on June 22nd, 2022, 10:04 am, edited 1 time in total.