Generally Unfavorable

How the over-reliance on Metacritic is slowly ruining game development.

Hey, did you know that everything that’s written on this blog is my own personal opinion, and it doesn’t necessarily reflect the opinions of my employers? And that this is a personal website, that shouldn’t be taken to represent any company? Because that’s always a good thing to remember.

I don’t have any idea how deeply Metacritic has ingrained itself into the movie, TV, or music worlds, but it’s definitely become the dominant site of its kind for videogames. Fans routinely mention Metacritic scores for games, as shorthand for both “is it worth buying?” and “ur stupid because ur favorite game only got a 74”. It’s become so influential that it’s worked its way into publishers and developers as well, as shorthand for both “is this game worth spending development and marketing money on?” and “you don’t get a bonus because your last game only got a 74”.

Whether or not the site has had a big impact on other media, it’s not difficult to understand why it’s built up so much influence on games. Metacritic was bought by Gamespot’s owner company not long after its inception, and the two sites have always been thematically similar since it became part of that “family” of sites — I’ve lost track of which company that is, exactly; last I checked it’s still CNet.

But more significantly, the idea of a review aggregator is important for videogames because games are such a big investment, of both time and money. I’ll spend ten bucks and two hours to watch a movie I’m interested in, even if it gets panned by the critics — what do those guys know, anyway? But when I’m being asked to spend fifty to sixty dollars and fifteen-to-fifty hours of my time, I’m going to make more effort to get a second (and third, and twenty-third) opinion. That’s the same reason it’s gotten more emphasis from publishers and developers: when games on average take two years or longer to produce, employ teams of hundreds of people, and have ever-increasing budgets, it’s appealing to have one convenient number you can treat as a measure of success or failure.

So now, everyone in the chain — from developer to publisher to consumer — has one number they can all look to. Gamers can use it to make purchasing decisions, developers can use it to know what works and what doesn’t, and publishers can use it to assess teams and individual employees and green-light future projects. EA is the most public and vocal about using Metacritic scores to drive development decisions, but they’re definitely not the only ones. It’s become ubiquitous.

It seems like a good idea, but the problems should be obvious to anyone who thinks about the situation for more than a minute. These problems haven’t gone unnoticed, but none of the criticisms (or defenses) of Metacritic I’ve read have really described how deeply flawed the whole situation is.

Lies

At a panel at the recent GDC, Adam Sessler of G4TV delivered a rant about Metacritic [that link is a YouTube video]. It would sound like a valid complaint that manages to hit on everybody’s biggest criticisms about Metacritic itself and its growing influence. Except, as the anonymous poster of “Ranting Back at the GDC09 Game Critics Rant” on game-ism.com sums up perfectly (seriously, read that post!): Sessler seems oblivious to how big the problem is.

Sessler claims that it’s a problem that Metacritic translates review sites’ scores to their own 100-point scale, meaning that the subtle nuance of G4’s “2 stars out of 5” gets translated to the unfairly negative 40% on Metacritic. Well you know, that’s pretty much how math works, Adam. He asks, “who knows the difference between a 73 and a 74?” The answer is “everyone who’s completed kindergarten.” It’s one more. Numbers go up and down and relate to each other in pretty predictable and commonly-agreed-upon ways; that’s why people like them so much.

What’s the difference between 2 stars and 1? Or 4 stars and 5? In practice, anything less than a 3 means “don’t buy” and anything over means “buy.” Shame on you for claiming that one method of assigning numbers to quality is any more objective, equitable, or understandable than another. Suggesting that 2/5 is somehow not equal to 40% is insultingly disingenuous. If you’ve got a problem with that, then simply do as the game-ism writer suggests: stop using points to score games.
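
For what it’s worth, the conversion being complained about is a one-liner. Here’s a minimal sketch in Python. The function name is mine, and I’m assuming plain linear scaling, which is not necessarily how Metacritic actually converts every outlet’s scale.

```python
def to_hundred_point(score: float, scale_max: float) -> int:
    """Linearly rescale a score out of scale_max onto a 0-100 scale.

    Assumption: simple proportional scaling. This is an illustration,
    not Metacritic's own conversion method.
    """
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    return round(100 * score / scale_max)


# Two stars out of five lands exactly where you'd expect.
two_stars = to_hundred_point(2, 5)
```

However you label the stars, two out of five comes out as 40 under this kind of scaling, whether an aggregator does the arithmetic or a publisher’s production team does.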

Sessler claims up front that he “doesn’t like” assigning numbers to games, but “we have to.” Period, end of story. But why do you have to? Because you wouldn’t feel right about it otherwise? Because the publishers insist on it? Because some faceless executive at G4 demands it? Or G4’s parent company, whatever that is at the moment? If it’s a horrible blight on the industry when Metacritic does it, why is it a necessary evil when G4 does? Are we supposed to think that if an aggregator site like Metacritic didn’t exist, that publishers would start carefully reading and watching reviews in detail, picking up on all the subtleties of each source’s review scores? Hell no! They’d get their production team to add up the numbers for them. Either way, your 2/5 still turns into a 40%. Shame on you for pointing fingers instead of acknowledging just how much you’re a part of the problem.

Damn Lies

The other part of Sessler’s rant is getting closer to validity: publishers put way too much emphasis on Metacritic scores, from greenlighting games or continuing franchises, all the way to determining bonuses and making hiring/firing decisions. This is indeed an ominously messed-up situation, but again, Sessler’s response is an insultingly disingenuous “Who, me?” He’s just part of a lowly group of game critics, he claims; his opinions shouldn’t be affecting people’s jobs and salaries!

I don’t understand how anyone could get up every day and go on television in front of millions of people broadcasting his own buy/don’t buy opinions of a videogame, and not realize that he’s affecting people’s jobs and salaries. How does a person resolve “My opinion is important enough to televise (or publish) and get paid for” with “Don’t pay us no mind, we’re just goofing off”? All you have to do is read forum posts — much less blogs or nationally-syndicated series — where someone posts a negative review of a game and it’s followed with five comments of “That’s disappointing. Guess I won’t buy it” to make the connection that you’ve just cost the game five sales.

And I’m definitely not singling out Sessler or G4, either. Because videogames “grew up” along with the internet and they’re so closely tied, what’s “broadcast to millions of people” isn’t limited to nationally-syndicated television. If anything, videogame fans pay more attention to websites and forums than to television, if only because TV has traditionally done such a shitty job of game coverage. What people don’t seem to grasp is that everything on the internet is inherently broadcast to millions of people. I’m just some schmuck with a very low-traffic website who likes to ramble about “Lost,” but in the age of Google, the potential audience for my ramblings and the latest episode review in Variety is exactly the same: The Entire Planet Earth.

A while ago, back when I was still more naive, and before I learned to let a work speak for itself, I wrote a comment on what I thought was an unfairly negative review of a game I worked on. The responses from the writers were completely baffling. One said that it was just a blog post, and he wouldn’t have been as harsh if it’d been a “real” review. Another wrote a disclaimer that what he was writing was “difficult” because he was writing to me directly instead of writing a blog post or a review. Now, I’m pretty sure these guys had heard of Google, because they had ads on their website. But they somehow failed to appreciate that Google’s being The Great Equalizer means that everything you post is aggregated with everything else, whether or not you think of it as a “blog post” or a “review” or “first impressions” and mark it as such. And everything you write about a game, outside of private chat, is written directly to the people who worked on it, and their bosses, their bosses’ bosses, and their customers.

That’s neither good nor bad; that’s just The Way Things Work. But whenever anyone complains about the sorry state of writing about games, it tends to be about poor grammar or punctuation or over-reliance on cliches, instead of the gross lack of professionalism. It’s the “enthusiast press,” you hear, it’s not something that publishers should be taking seriously. Meanwhile, a camera crew and its enormous entourage takes up your entire booth and thirty minutes of your time to tape an interview that will never air. Or a reporter just flakes on a scheduled interview, wasting the hour you spent heading to the meeting spot and waiting. Or a reviewer doesn’t play a game to its completion, but doesn’t mention that in the review because readers will get the wrong impression. Or a reviewer doesn’t bother to review the game at all but posts something to the effect of “We haven’t gotten around to it yet, aren’t we naughty?”

So you end up with people who are broadcasting their work to millions, directly influencing other people’s careers and incomes, and treating the job like a big goof. I’m not saying that reviewers need to treat everything super-seriously: these are still videogames, after all, and they’re supposed to be fun, and I have to read or watch the reviews to decide what I want to buy. And I’m not saying that people writing about games need to watch what they say for fear of hurting someone’s feelings or of driving down sales; if you’re pussy-footing around your own real opinions, then you’re not doing anybody any good. What I am saying is that you need to understand what you’re doing, take responsibility for it, and accept culpability for the results of what you broadcast.

Or in other words: take your fucking job seriously. And don’t blame Metacritic for making reviews so important to game publishers; that’s going to happen no matter what, since most games are a big enough investment that reviews drive sales. All Metacritic is doing is gathering your work in one place and slapping a number on it.

Statistics

Now, it’d be nice to say that everything is the fault of a bunch of slack game critics who won’t do their jobs and no I’m not bitter but game X totally should’ve gotten an 82 instead of a 79, dammit. But that’d be an over-simplification. Yes, reviews do drive sales, which does affect creators, but that’s true of any medium, not just games. Every medium has a similar relationship between creators, reviewers, customers, and publishers. It’s not just inevitable, it works.

What’s unique to games is how the role of reviewers has been moved to a different part of the chain than it is for movies and television. In other media (at least as far as I understand it) reviews influence customers, who drive sales, and the studios/networks/publishers pay attention to sales figures. Movie and TV execs don’t seem to care about reviews until around the time of Oscar nominations or sweeps weeks, and that’s only so they can drive more sales. It’s the equivalent of advertising. But in games, reviews influence customers and publishers; Metacritic has become a de facto extension not just of the marketing department, but H.R. (affecting hires and bonuses) and creative direction (influencing which studios, teams, and franchises get greenlit).

I’d hope it’s obvious at this point, but I think this is horrible. Maybe not “fucking evil” as the game-ism.com article describes it, but just short of “evil.”

Soren Johnson would disagree. He’s a game designer who’s worked on Spore and the Civilization series, and on his blog he wrote “The Case for Metacritic” (all bolding his):

Metacritic has been an incredible boon for consumers and the games industry in general. The core reason is simple – publishers need a metric for quality.

On the surface, it sounds like a great idea. We’ve all seen cynical movie and TV execs crassly cite box office receipts or advertising revenue as if they were synonymous with “quality.” We’ve all seen cases of the game or movie or series that’s a “critical darling” but tanks financially. In games, we survived the days of shovelware and just plain lazy products that outsold anything else on the market because they got shelf space at Wal-Mart. And the best-selling games are indeed better overall than they were a decade ago — I had no interest in Halo 3, and I actively hated Grand Theft Auto 4, but I’d never claim that they were lazy or sub-par games. So isn’t it better to have quality driving business decisions?

Of course it is. But Metacritic is not an objective metric for quality. And it actually does more harm than good, because we want so bad to believe that it is.

All the remaining quotes are from Johnson’s blog post:

What should executives do if they want to objectively raise the quality bar at their companies? They certainly don’t have enough time to play and judge their games for themselves. Even if they did, they would invariably overvalue their own tastes and opinions.

As well they should, because that’s part of their job. Saying that executives don’t have time to play and judge their games is absurd and inexcusable. Unless you’re a male executive at a tampon company, you are obligated to try out the product you’re selling to people. Obviously, if you’re running EA or Activision or Take Two, it’d be simply impossible to exhaustively play every single title your company publishes. That’s why you hire creative directors who do have time to play the games you’re making. You listen to them when you’re making executive decisions.

You don’t listen to some dude who played the game over a weekend, wrote a few paragraphs about it on a blog, and slapped 2/5 stars on it. And that in no way is an insult against the aforementioned dude, it just means that at least he understands his intended audience. He was writing his review for an audience of consumers, not executives. A good executive isn’t going to ignore the reviews, obviously, but he’s not going to give them over-inflated importance, either.

I’ve been in the industry for ten years now, and when I started, the only objective measuring stick we had for “quality” was sales. Is that really what we want to return to?

Simply put: yes, I would like publishers and developers to go back to sales as their only objective measuring stick. Because sales are the only objective measuring stick. Because “quality” is inherently subjective. Aggregating a pool of game reviewers’ opinions is neither objective nor is it directly related to “quality.”

When I’m soliciting subjective opinions, I’m going to value those of my target audience the most. They’re the ones who bought the game, instead of getting a free review copy. They’re the ones who cared enough about the game to want to play it, instead of getting assigned to it. They’re the ones who invested their own money and time into it, instead of getting paid to write about it. Review sites are still casual enough that a reviewer is probably a fan of the game, but I know that the guy who bought it is a fan of the game. And he spent time with it, if only to get his money’s worth, and he probably didn’t lump it in with the 3 RTS games and 2 RPGs he had to play and write about that week.

And more importantly: sales and quality may not be directly related, but they’re by no means mutually exclusive, either. Unless you’re a clerk at an indie record store, or you still haven’t outgrown that insufferable phase most people get through during their sophomore year of college, you can’t justify the opinion that popularity is the enemy of quality.

Have we gotten so jaded that we have lost sight of what a wonderous thing this is? Metacritic puts an army of critics at our fingertips. Further, consumers are not morons who can’t judge a score within a larger context.

If consumers aren’t morons, then why would we trust an army of critics instead of them? Why do we need to insert some kind of electoral college into the mix? Why do we spend so much time focused on the swing states of one or two beardy dudes in some tiny office in San Francisco, instead of our real constituents? (Who are likely hundreds of thousands of beardy dudes spread all over the globe).

Ultimately, the argument against Metacritic seems to revolve around whether publishers should take these numbers seriously. Some contracts are even beginning to include clauses tying bonuses to Metacritic scores. Others are concerned that publishers are too obsessed with raising their Metacritic averages. […] However, when I am in an EA meeting in which we talk about the need to raise our Metacritic scores – and the concrete steps or extra development time thus required – I’ll tell you what I feel like doing. I feel like jumping for joy.

Hooray! When I’m in a meeting and someone talks about what we can change in order to raise our Metacritic scores — and the extra items on my to-do list required — I’ll tell you what it sounds like to me. It sounds like a very loud and clear “Fuck You.”

We’ll just ignore the fact that you went through our hiring process and have been working at our company for some time, and that you’ve demonstrated you know what you’re talking about. We’ll assume that you’re incapable of doing the basics of your job, which includes the ability to assess everything involved in a decision and the ability to explain exactly why you came to the conclusion that you did. We’ll also ignore the hundreds of people on the internet who are giving us direct feedback on the games that we’ve made in the past. Instead, we’re going to listen to the dude who’s had a bug up his ass about our games (or our genre of games) since day one and wrote about what he wanted to see.

As for the renumeration issue, isn’t it a good thing that there is a second avenue for rewarding developers who have made a great game? Certainly, contracts are not going to stop favoring high game sales, so – hopefully – Metacritic clauses can ensure that a few developers with overlooked but highly-rated games will still be compensated.

Except when you realize that Metacritic is no longer just a resource for consumers, and is now being used as a resource for publishers. Which means that you’re taking a system that’s traditionally had multiple inputs and reducing it to just one. Saying that it will result in a world where “overlooked but highly-rated games” get rewarded is optimistic at best. It’s much more likely that you’re eliminating the possibility of these games, because both sales and development are based so heavily on reviews. To use the movie analogy again: it’ll set up a situation where studios only make Oscar-bait.

Further, developers also need to stop complaining that a few specific reviews are dragging down their Metacritic scores. Besides the fact that both good and bad reviews are earned, in a world without Metacritic, one low score from GameSpot, GameSpy, 1Up, or IGN becomes a disaster. Score aggregation, by definition, protects developers from too much power being in the hands of one critic.

And this is the biggest problem of all, what tips Metacritic from “necessary evil” to just plain “evil.” Because we really, really want to believe that it works like this. Everything evens out, numbers don’t lie, and we’ve finally achieved an objective measure of quality. But the math that makes us slap our foreheads whenever Adam Sessler tries to suggest that 2/5 is not equal to 40%, is the same math that should teach us Metacritic does not work like this.

Let’s say I’m conducting a survey. Here’s my methodology:

  • It’s neither a representative sampling nor a random sampling.
  • All participants are opt-in.
  • I’m choosing from the same pool of participants as I choose for every other survey.
  • Some responses are rejected. I don’t need to list my reasons for rejecting them.
  • Some participants are paid to participate, others are not. I don’t give any indication which is which.
  • Some participants are very familiar with the topic, others are not.
  • Each participant is given a different scale for his response. Some use a scale of 1-5, others a letter grade, others from 1-100.
  • The scale is labeled differently for each participant; what is “neither agree nor disagree” on one form is “strongly disagree” on another.
  • Some participants do not use a scale. For these, I use my best judgement to assign a 1-100 score.
  • Some responses are more heavily weighted than others. I don’t list which ones are given more weight, how much more weight they’re given, my basis for weighting them, or whether the weighting is uniform.
  • The sample size for each survey is completely different. Results based on 50 respondents are listed alongside results based on 15.
  • All of my results are reduced to a single number.

Now, let’s say that I’m not immediately fired for suggesting a system with such obvious room for error. Let’s say instead that I’m promoted to creative director of several multi-billion dollar companies, simultaneously. Is the problem more apparent now?

No, consumers are not morons, and neither are reviewers, or for that matter, most videogame company executives. But you don’t have to be a moron to see a number score and assume — even subconsciously — that some bona-fide math went into the calculation of that number. Metacritic implies a statistical validity and rigor that just doesn’t exist, which is exactly why it’s dangerous. It invites people to reduce everything to one of a hundred numbers and one of three colors. When “yellow” tells one guy “don’t buy this,” that’s not a disaster.

But it is a disaster when the difference between 73 and 74 has a real impact on your production schedule or your salary or, hell, even your team’s morale. One low score from GameSpot, GameSpy, 1Up, or IGN can make that difference, especially when there’s no indication of how heavily those sites are weighted, or when Metacritic translates scores to a scale the original reviewer didn’t intend. As soon as you reduce everything to a number, you treat it as absolute and throw out everything that went into calculating that number. You simply can’t have it both ways, claiming that people are smart enough to read the reviews in context and also that everything works out statistically.
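
To make that concrete, here’s a rough sketch in Python of how one undisclosed weight can move the final number. Everything in it is made up for illustration: the five scores, the 1.5 weight, and the function name. Metacritic doesn’t publish its weights, so this is not its actual formula, just ordinary weighted-average arithmetic.

```python
def weighted_mean(scores, weights):
    """Return the weighted average of scores.

    Assumption: a plain weighted mean. The real aggregation method
    and weights are not public, so this is only a stand-in.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must be the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Hypothetical review scores for one game, already on a 100-point scale.
scores = [80, 78, 76, 74, 62]

# Every outlet counted equally.
equal = round(weighted_mean(scores, [1, 1, 1, 1, 1]))

# Same reviews, but the outlet with the lowest score quietly counts 1.5x.
skewed = round(weighted_mean(scores, [1, 1, 1, 1, 1.5]))
```

With these invented numbers, the equal-weight average rounds to 74 and the skewed one rounds to 73. Nobody changed their opinion of the game; only the hidden weighting changed.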

Undo

I’ve come across as really harsh on reviewers here, so I should make it clear: I’m not against game critics, at all. What they do is crucial, for the reasons I mentioned at the beginning: games are just too long and too expensive to dive into without weighing opinions. And they should be harsh on games that deserve it, as long as the reviewer’s being professional about it.

I consult game reviews all the time, including Metacritic but more often the review aggregates on Joystiq. But I consult them when I’m buying games, not so much when I’m working on them. Some critical feedback is great: you start to figure out which reviewers you can trust and which aren’t giving you any useful information. But you don’t put too much weight into any one review for the same reasons you don’t put all your weight onto one forum post or any one piece of feedback: you’re going to forever be chasing other people’s opinions, instead of making the game that you want to make. The most obvious problem with Metacritic to a publisher or a developer is that it throws out any sense of nuance.

A better solution — and I’m certainly not the first to suggest it — is that of Rotten Tomatoes. You abandon the pretense of objectivity and make it clear what you’re doing: aggregating reviews. Every review is either “fresh” or “rotten,” and you’re only presenting the percentage of reviewers who called it “fresh.” You don’t weight some reviews more heavily than others; instead, you present several unweighted versions to the user and make it absolutely explicit which reviews are included in each one. And when you list reviews as “good” or “bad”, you’re inviting the reader to look for more detail, unlike the Metacritic approach, which implies that they’ve done all the analysis for you. Rotten Tomatoes is always clear that it’s a percentage; Metacritic implies that it’s a Master Score. (One Review to Rule Them All).
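
As a sketch of how little machinery that approach needs, here’s the same idea in Python. The outlet names, the verdicts, and the “featured” subset are all hypothetical, and I’m assuming each reviewer supplies their own fresh/rotten verdict; this is not Rotten Tomatoes’ actual code.

```python
def fresh_percentage(verdicts):
    """Percentage of reviews marked fresh (True), with no weighting."""
    verdicts = list(verdicts)
    if not verdicts:
        raise ValueError("need at least one review")
    return 100 * sum(verdicts) / len(verdicts)


# Hypothetical outlets and their fresh (True) / rotten (False) verdicts.
reviews = {
    "Outlet A": True,
    "Outlet B": True,
    "Outlet C": False,
    "Outlet D": True,
}

# One unweighted figure over every review.
all_pct = fresh_percentage(reviews.values())

# A second unweighted figure over an explicitly listed subset.
featured = ["Outlet A", "Outlet C"]
featured_pct = fresh_percentage(reviews[name] for name in featured)
```

Here the full list comes out at 75 percent fresh and the named subset at 50 percent. Both figures are just counts, and anyone can see exactly which reviews went into each.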

I’m not sure whether something like this would ever work with games. I’d definitely like to see one. And I’d want the publishers to ignore it completely.