OPINION: Social Media Outrage About Scoring Often Misses the Point

It’s Time to Recognize That Leotard Bias Isn’t So Simple

With the advent of every college gymnastics season comes a new wave of debates and outrage about scoring. It’s ubiquitous, it’s often rooted in fact and most of the time, it’s even sort of fun to fight over. But over the last two years or so, I’ve become concerned that when we’re thinking about scores, we fall into tropes and rehearsed patterns of thought too easily rather than taking time to consider the facts of each situation and routine.

A few weeks ago, some fans became upset about a bars routine at a high-ranked team that scored a 9.675 despite a stuck landing. The routine had persistent body shape issues, as well as form and amplitude problems on both of its major skills, and after it started making the rounds on Twitter, many fans insisted that it would score higher at an even more prominent team. I can, as I did at the time, argue that there’s no great injustice here for various reasons, including that she’s typically the lowest score in her own lineup (which she makes based on consistency rather than technical superiority) and that the deductions are predictable, based on the construction of the routine, which would likely be changed immediately at a top bars team.

But the argument ended quickly once several recent college gymnastics alumni chimed in, swiftly followed by the athlete’s coach. The consensus seemed to be that those people couldn’t all be wrong, so of course the situation was exactly as everyone had always thought: a straightforward case of leotard bias. 

Now, I could absolutely be wrong. I like to think I’m quite technically literate, but we all have our blind spots and biases and I’m certainly no judge. Maybe that athlete could transfer to Oklahoma this second and score a 9.950 in the anchor position. What I raised an eyebrow at more than the debate itself was how it ended. It seemed like I was stuck making an unwinnable argument: Everyone knows leotard bias exists, so was I delusional in saying it didn’t? If it made so many famous people happy to think this routine was worth a 9.850-plus, it must be!

Around the same time, a UCLA gymnast had an error catching her Pak salto, avoided falling and continued her routine successfully, and ultimately scored an 8.950. Within the first month of this season alone, multiple other top-tier bars gymnasts have made exactly the same error and scored above 9.600.

Of course, there was no conspiracy against the UCLA athlete. No one would argue that she was treated more harshly because of her team or because judges had low expectations for her routine. And yet, I was totally bewildered by the lack of response to or even curiosity about the discrepancy, even though plenty of fans were watching that meet and plenty of others were aware of at least one comparable missed routine. Was it because the UCLA athlete’s score was dropped and didn’t affect the team total? Was it simply drowned out by the bigger issues to come on beam? Or did no one want to flag it and risk sounding like a shallow big-team superfan because it doesn’t fit our prevailing narrative for how judges get it wrong?

This flares up constantly. In one early week meet in which a top team was in danger of being upset at home by an outperforming opponent, a beam score from the road team was quietly changed. Plenty of people were quick to assume that something untoward was going on to secure the victory for the home team. In fact, a straightforward penalty had been taken for incorrect use of equipment, with no foul play involved.

While the “leotard bias” narrative undoubtedly is based on fact and massively affects the careers of athletes on lower-ranked teams, it’s time for us as fans and gymnastics writers to admit it’s become a crutch in the way we understand and think about scores. In many cases, it’s a truism that stands in the way of a more complex, but more complete, understanding of both the technique of our sport and how scores are really produced.

When someone says, “Show me the deduction in this routine!” they so rarely really want to be shown. When someone throws around platitudes like “Slap an Oklahoma leotard on her and she’d score…”, how can you disagree without coming across as though you’re insulting the athlete or discrediting the struggles of lower-ranked teams?

In fact, this mental exercise of transplanting a routine in its entirety into another leotard is as ubiquitous as it is useless. Context matters, and the fact is that if that gymnast whose 9.700 you’re outraged over had attended, say, Florida, she would have been coached differently, her skill selection and routine construction would have been different, her attention would have been focused in different places and at the end of the day there is a very good chance she wouldn’t have made the lineup. And even if you jump through all the ridiculous hoops it takes to imagine such a thing, it’s certainly not the case that a new or backup routine at a top team is guaranteed to score 9.900, as many people are all too happy to pretend for the sake of outrage.

Any judged sport has a general consensus about which athletes are favored by the institutions, and fans of any judged sport can bicker endlessly about who does and doesn’t benefit. But I feel that the tenor of these discussions in college gymnastics has changed over the last few years, and it’s becoming increasingly difficult to have a balanced conversation about the issue.

For one, the increased presence of recent alumni sharing their opinions on scoring—most of which amount to advocating top-team scoring for teams not far behind in rankings—has put an odd slant on the issue. Of course, it’s their right to have opinions on their own sport, and one can understand that after four years of following stringent social media rules, they’re excited to share their side. But responsible fans should keep in mind that they bring their own biases to the table. Fans tend to flock to support these kinds of statements for various reasons, but there’s no reason to assume they are more correct than any technically competent fan—and thinking otherwise can lead to arguments over nothing and even the spread of misinformation.

And ultimately, what do we want? 

We say we want fairness, but we can’t agree on what that means. (Everyone knows a superfan of a team just outside of the top tier who constantly claims underscoring even when the team is scoring very well and thinks that routines with obviously visible errors should score 10s, right?) 

We say we want accountability, but when we hear from a judge who gets a bit defensive and reminds us that the job is difficult, we pretty much go, “Aww, all right then.” We assume that any system that would allow backup and recourse for judges during meets would be too complicated and resource-intensive for a sport where many teams can’t even post scores online in real time, so we don’t really bother to think about it. Even though other judged sports have done it successfully, we’re not even ready to talk about open scoresheets and public access to which deductions were taken.

We say we want moderation and more critical judging across the board, but it’s telling to me that both of the greatest scoring outrages of the year have been regarding athletes from top-tier teams who received 9.975s for routines that many felt deserved a 10. As angry as we’ll get when obvious mistakes go high, a tiny mistake incurring a tiny deduction is something we struggle to handle too—and it means that if there is a grand-scale increase in the application of execution deductions, we more than likely won’t enjoy it at all.

At the end of the day, as endemic as judging issues are in this sport, it’s reassuring to think that everything boils down to something so one-dimensional as judges liking top teams better, because it’s simple and easily packaged and it allows us to feel that our frustration is righteous. The truth is that nothing about this is simple.

The truth is that some deductions, by common consensus, are virtually never taken. The truth is that when judges see an unusual skill, there’s a good chance they won’t deduct on it at all, and we’re mostly fine with that. (Ruby Harrold’s 0.2 body position break every time she catches her Zuchold says hello.) The truth is that the way that judges are assigned to meets and what they think about what they see is much more complicated than we often think. The truth is that virtually everyone in the sport has gotten undeservedly high scores as well as undeservedly low ones compared to the rest of the meet. The truth is that unless changes are made to judging that run much deeper than some extra education hours and passive aggressive emails, all these complications might change shape but they’ll never go away.

And the truth is, deep down, we all love talking about the messiness of judging. So let’s lean in, ditch the cliches, be honest about our partisanship and have arguments that are both more interesting and more realistic.

Article by Rebecca Scally



  1. Agree 100%. It’s too easy to reduce it down to a leotard and we need to maintain a higher level of thinking about it.

  2. I really enjoyed this article. I actually just commented on the scoring at the Best of Utah meet at work and I found myself going down the tried-and-true “well, it’s early in the meet/lineup, of course the scores are low” cliche. Then on the Road to Nationals website I looked at the judges and lo and behold there was a Beam judge who is a Level 10 and one who is a National judge. Spencer’s article on WTF is Beam scoring sounded in my head about judges in NCAA not actually following the L10 COP. I have a feeling that the majority of the back and forth on that event was due to that. It could also be that it’s early in the season, literally their first meet, and perhaps that email/course we recently heard about actually resonated with the judges and they started taking the deductions they should be taking.

    All of that said, as a HUGE Utah fan and seeing their preview live, watching this event from a bird’s eye view in the upper bowl (beam/vault side), I was disappointed in the scoring. I could see from a unique angle sure, and I had a toddler distracting me, but when your first three were literally just competing in the OLYMPICS or Olympic trials, and throw up stunning routines with uniqueness and added difficulty I’m muttering or maybe screaming WTF at the judges. I know Grace had a bobble, so I’m not upset over her score. Kara and Amelie seemed flawless to me though, and I re-watched most of the meet at home later to see what exactly I missed. I didn’t see anything obvious in either of their routines that would knock the score down that much. They all had 9.875s and somehow Cristal was the highest score at 9.9 with both Abby and Maile to come?! I know Abby fell and actually landed her aerial combo with one foot completely off the beam. I’m not upset over her score. I just don’t know what I missed in that many routines.

    I know a lot of others were confused over the scores as well, as evidenced in many threads in gym groups online. I know I’m not perfect, and I know judges don’t see from our angle in the stands, so please keep doing these types of articles and educating the ignorance out of me. This helps me be all fired up in the meet and then understand later. 🙂
