Generally Unfavorable

How the over-reliance on Metacritic is slowly ruining game development.

Hey, did you know that everything that’s written on this blog is my own personal opinion, and it doesn’t necessarily reflect the opinions of my employers? And that this is a personal website that shouldn’t be taken to represent any company? Because that’s always a good thing to remember.

I don’t have any idea how deeply Metacritic has ingrained itself into the movie, TV, or music worlds, but it’s definitely become the dominant site of its kind for videogames. Fans routinely mention Metacritic scores for games, as shorthand for both “is it worth buying?” and “ur stupid because ur favorite game only got a 74”. It’s become so influential that it’s worked its way into publishers and developers as well, as shorthand for both “is this game worth spending development and marketing money on?” and “you don’t get a bonus because your last game only got a 74”.

Whether or not the site has had a big impact on other media, it’s not difficult to understand why it’s built up so much influence on games. Metacritic was bought by GameSpot’s owner company not long after its inception, and the two sites have been thematically similar ever since Metacritic became part of that “family” of sites — I’ve lost track of which company that is, exactly; last I checked it’s still CNET.

But more significantly, the idea of a review aggregator is important for videogames because games are such a big investment, of both time and money. I’ll spend ten bucks and two hours to watch a movie I’m interested in, even if it gets panned by the critics — what do those guys know, anyway? But when I’m being asked to spend fifty to sixty dollars and fifteen-to-fifty hours of my time, I’m going to make more effort to get a second (and third, and twenty-third) opinion. That’s the same reason it’s gotten more emphasis from publishers and developers: when games on average take two years or longer to produce, employ teams of hundreds of people, and have ever-increasing budgets, it’s appealing to have one convenient number you can treat as a measure of success or failure.

So now, everyone in the chain — from developer to publisher to consumer — has one number they can all look to. Gamers can use it to make purchasing decisions, developers can use it to know what works and what doesn’t, and publishers can use it to assess teams and individual employees and green-light future projects. EA is the most public and vocal about using Metacritic scores to drive development decisions, but they’re definitely not the only ones. It’s become ubiquitous.

It seems like a good idea, but the problems should be obvious to anyone who thinks about the situation for more than a minute. These problems haven’t gone unnoticed, but none of the criticisms (or defenses) of Metacritic I’ve read have really described how deeply flawed the whole situation is.

Lies

At a panel at the recent GDC, Adam Sessler of G4TV delivered a rant about Metacritic [that link is a YouTube video]. It sounds like a valid complaint that manages to hit on everybody’s biggest criticisms of Metacritic itself and its growing influence. Except, as the anonymous poster of “Ranting Back at the GDC09 Game Critics Rant” on game-ism.com sums up perfectly (seriously, read that post!), Sessler seems oblivious to how big the problem is.

Sessler claims that it’s a problem that Metacritic translates review sites’ scores to its own 100-point scale, meaning that the subtle nuance of G4’s “2 stars out of 5” gets translated to the unfairly negative 40% on Metacritic. Well, you know, that’s pretty much how math works, Adam. He asks, “who knows the difference between a 73 and a 74?” The answer is “everyone who’s completed kindergarten.” It’s one more. Numbers go up and down and relate to each other in pretty predictable and commonly-agreed-upon ways; that’s why people like them so much.

What’s the difference between 2 stars and 1? Or 4 stars and 5? In practice, anything less than a 3 means “don’t buy” and anything over means “buy.” Shame on you for claiming that one method of assigning numbers to quality is any more objective, equitable, or understandable than another. Suggesting that 2/5 is somehow not equal to 40% is insultingly disingenuous. If you’ve got a problem with that, then simply do as the game-ism writer suggests: stop using points to score games.
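The arithmetic here really is just proportional rescaling. As a minimal sketch, assuming an aggregator does nothing fancier than divide by the top of the scale and multiply by 100 (the `rescale` helper is hypothetical, not anything Metacritic publishes), every star rating keeps its place in line on the 100-point scale, and a “3 stars means buy” line simply becomes a “60 means buy” line:

```python
def rescale(score, max_score, new_max=100):
    """Proportionally rescale a score from a 0..max_score scale to 0..new_max."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return score / max_score * new_max

# Every star rating keeps its relative position on the 100-point scale.
for stars in range(1, 6):
    print(f"{stars}/5 -> {rescale(stars, 5):.0f}")
# prints 1/5 -> 20, 2/5 -> 40, 3/5 -> 60, 4/5 -> 80, 5/5 -> 100 (one per line)
```

Nothing about the conversion adds or removes information; 2/5 and 40/100 are the same claim written two ways.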

Sessler claims up front that he “doesn’t like” assigning numbers to games, but “we have to.” Period, end of story. But why do you have to? Because you wouldn’t feel right about it otherwise? Because the publishers insist on it? Because some faceless executive at G4 demands it? Or G4’s parent company, whatever that is at the moment? If it’s a horrible blight on the industry when Metacritic does it, why is it a necessary evil when G4 does? Are we supposed to think that if an aggregator site like Metacritic didn’t exist, that publishers would start carefully reading and watching reviews in detail, picking up on all the subtleties of each source’s review scores? Hell no! They’d get their production team to add up the numbers for them. Either way, your 2/5 still turns into a 40%. Shame on you for pointing fingers instead of acknowledging just how much you’re a part of the problem.

Damn Lies

The other part of Sessler’s rant is getting closer to validity: publishers put way too much emphasis on Metacritic scores, from greenlighting games or continuing franchises, all the way to determining bonuses and making hiring/firing decisions. This is indeed an ominously messed-up situation, but again, Sessler’s response is an insultingly disingenuous “Who, me?” He’s just part of a lowly group of game critics, he claims; his opinions shouldn’t be affecting people’s jobs and salaries!

I don’t understand how anyone could get up every day and go on television in front of millions of people broadcasting his own buy/don’t buy opinions of a videogame, and not realize that he’s affecting people’s jobs and salaries. How does a person resolve “My opinion is important enough to televise (or publish) and get paid for” with “Don’t pay us no mind, we’re just goofing off”? All you have to do is read forum posts — never mind blogs or nationally-syndicated series — where someone posts a negative review of a game and it’s followed by five comments of “That’s disappointing. Guess I won’t buy it” to make the connection that you’ve just cost the game five sales.

And I’m definitely not singling out Sessler or G4, either. Because videogames “grew up” along with the internet and they’re so closely tied, what’s “broadcast to millions of people” isn’t limited to nationally-syndicated television. If anything, videogame fans pay more attention to websites and forums than to television, if only because TV has traditionally done such a shitty job of game coverage. What people don’t seem to grasp is that everything on the internet is inherently broadcast to millions of people. I’m just some schmuck with a very low-traffic website who likes to ramble about “Lost,” but in the age of Google, the potential audience for my ramblings and the latest episode review in Variety is exactly the same: The Entire Planet Earth.

A while ago, back when I was still more naive, and before I learned to let a work speak for itself, I wrote a comment on what I thought was an unfairly negative review of a game I worked on. The responses from the writers were completely baffling. One said that it was just a blog post, and he wouldn’t have been as harsh if it’d been a “real” review. Another wrote a disclaimer that what he was writing was “difficult” because he was writing to me directly instead of writing a blog post or a review. Now, I’m pretty sure these guys had heard of Google, because they had ads on their website. But they somehow failed to appreciate that Google’s being The Great Equalizer means that everything you post is aggregated with everything else, whether or not you think of it as a “blog post” or a “review” or “first impressions” and mark it as such. And everything you write about a game, outside of private chat, is written directly to the people who worked on it, and their bosses, their bosses, and their customers.

That’s neither good nor bad; that’s just The Way Things Work. But whenever anyone complains about the sorry state of writing about games, it tends to be about poor grammar or punctuation or over-reliance on clichés, instead of the gross lack of professionalism. It’s the “enthusiast press,” you hear; it’s not something that publishers should be taking seriously. Meanwhile, a camera crew and its enormous entourage takes up your entire booth and thirty minutes of your time to tape an interview that will never air. Or a reporter just flakes on a scheduled interview, wasting the hour you spent heading to the meeting spot and waiting. Or a reviewer doesn’t play a game to its completion, but doesn’t mention that in the review because readers will get the wrong impression. Or a reviewer doesn’t bother to review the game at all but posts something to the effect of “We haven’t gotten around to it yet, aren’t we naughty?”

So you end up with people who are broadcasting their work to millions, directly influencing other people’s careers and incomes, and treating the job like a big goof. I’m not saying that reviewers need to treat everything super-seriously: these are still videogames, after all, and they’re supposed to be fun, and I have to read or watch the reviews to decide what I want to buy. And I’m not saying that people writing about games need to watch what they say for fear of hurting someone’s feelings or of driving down sales; if you’re pussy-footing around your own real opinions, then you’re not doing anybody any good. What I am saying is that you need to understand what you’re doing, take responsibility for it, and accept culpability for the results of what you broadcast.

Or in other words: take your fucking job seriously. And don’t blame Metacritic for making reviews so important to game publishers; that’s going to happen no matter what, since most games are a big enough investment that reviews drive sales. All Metacritic is doing is gathering your work in one place and slapping a number on it.

Statistics

Now, it’d be nice to say that everything is the fault of a bunch of slack game critics who won’t do their jobs and no I’m not bitter but game X totally should’ve gotten an 82 instead of a 79, dammit. But that’d be an over-simplification. Yes, reviews do drive sales, which does affect creators, but that’s true of any medium, not just games. Every medium has a similar relationship between creators, reviewers, customers, and publishers. It’s not just inevitable, it works.

What’s unique to games is how the role of reviewers has been moved to a different part of the chain than it is for movies and television. In other media (at least as far as I understand it) reviews influence customers, who drive sales, and the studios/networks/publishers pay attention to sales figures. Movie and TV execs don’t seem to care about reviews until around the time of Oscar nominations or sweeps weeks, and that’s only so they can drive more sales. It’s the equivalent of advertising. But in games, reviews influence customers and publishers; Metacritic has become a de facto extension not just of the marketing department, but H.R. (affecting hires and bonuses) and creative direction (influencing which studios, teams, and franchises get greenlit).

I’d hope it’s obvious at this point, but I think this is horrible. Maybe not “fucking evil” as the game-ism.com article describes it, but just short of “evil.”

Soren Johnson would disagree. He’s a game designer who’s worked on Spore and the Civilization series, and on his blog he wrote “The Case for Metacritic” (all bolding his):

Metacritic has been an incredible boon for consumers and the games industry in general. The core reason is simple – publishers need a metric for quality.

On the surface, it sounds like a great idea. We’ve all seen cynical movie and TV execs crassly cite box office receipts or advertising revenue as if they were synonymous with “quality.” We’ve all seen cases of the game or movie or series that’s a “critical darling” but tanks financially. In games, we survived the days of shovelware and just plain lazy products that outsold anything else on the market because they got shelf space at Wal-Mart. And the best-selling games are indeed better overall than they were a decade ago — I had no interest in Halo 3, and I actively hated Grand Theft Auto 4, but I’d never claim that they were lazy or sub-par games. So isn’t it better to have quality driving business decisions?

Of course it is. But Metacritic is not an objective metric for quality. And it actually does more harm than good, because we want so bad to believe that it is.

All the remaining quotes are from Johnson’s blog post:

What should executives do if they want to objectively raise the quality bar at their companies? They certainly don’t have enough time to play and judge their games for themselves. Even if they did, they would invariably overvalue their own tastes and opinions.

As well they should, because that’s part of their job. Saying that executives don’t have time to play and judge their games is absurd and inexcusable. Unless you’re a male executive at a tampon company, you are obligated to try out the product you’re selling to people. Obviously, if you’re running EA or Activision or Take Two, it’d be simply impossible to exhaustively play every single title your company publishes. That’s why you hire creative directors who do have time to play the games you’re making. You listen to them when you’re making executive decisions.

You don’t listen to some dude who played the game over a weekend, wrote a few paragraphs about it on a blog, and slapped 2/5 stars on it. And that in no way is an insult against the aforementioned dude, it just means that at least he understands his intended audience. He was writing his review for an audience of consumers, not executives. A good executive isn’t going to ignore the reviews, obviously, but he’s not going to give them over-inflated importance, either.

I’ve been in the industry for ten years now, and when I started, the only objective measuring stick we had for “quality” was sales. Is that really what we want to return to?

Simply put: yes, I would like publishers and developers to go back to sales as their only objective measuring stick. Because sales are the only objective measuring stick. Because “quality” is inherently subjective. Aggregating a pool of game reviewers’ opinions is neither objective nor is it directly related to “quality.”

When I’m soliciting subjective opinions, I’m going to value those of my target audience the most. They’re the ones who bought the game, instead of getting a free review copy. They’re the ones who cared enough about the game to want to play it, instead of getting assigned to it. They’re the ones who invested their own money and time into it, instead of getting paid to write about it. Review sites are still casual enough that a reviewer is probably a fan of the game, but I know that the guy who bought it is a fan of the game. And he spent time with it, if only to get his money’s worth, and he probably didn’t lump it in with the 3 RTS games and 2 RPGs he had to play and write about that week.

And more importantly: sales and quality may not be directly related, but they’re by no means mutually exclusive, either. Unless you’re a clerk at an indie record store, or you still haven’t outgrown that insufferable phase most people get through during their sophomore year of college, you can’t justify the opinion that popularity is the enemy of quality.

Have we gotten so jaded that we have lost sight of what a wondrous thing this is? Metacritic puts an army of critics at our fingertips. Further, consumers are not morons who can’t judge a score within a larger context.

If consumers aren’t morons, then why would we trust an army of critics instead of them? Why do we need to insert some kind of electoral college into the mix? Why do we spend so much time focused on the swing states of one or two beardy dudes in some tiny office in San Francisco, instead of our real constituents? (Who are likely hundreds of thousands of beardy dudes spread all over the globe.)

Ultimately, the argument against Metacritic seems to revolve around whether publishers should take these numbers seriously. Some contracts are even beginning to include clauses tying bonuses to Metacritic scores. Others are concerned that publishers are too obsessed with raising their Metacritic averages. […] However, when I am in an EA meeting in which we talk about the need to raise our Metacritic scores – and the concrete steps or extra development time thus required – I’ll tell you what I feel like doing. I feel like jumping for joy.

Hooray! When I’m in a meeting and someone talks about what we can change in order to raise our Metacritic scores — and the extra items on my to-do list required — I’ll tell you what it sounds like to me. It sounds like a very loud and clear “Fuck You.”

We’ll just ignore the fact that you went through our hiring process and have been working at our company for some time, and that you’ve demonstrated you know what you’re talking about. We’ll assume that you’re incapable of doing the basics of your job, which includes the ability to assess everything involved in a decision and the ability to explain exactly why you came to the conclusion that you did. We’ll also ignore the hundreds of people on the internet who are giving us direct feedback on the games that we’ve made in the past. Instead, we’re going to listen to the dude who’s had a bug up his ass about our games (or our genre of games) since day one and wrote about what he wanted to see.

As for the remuneration issue, isn’t it a good thing that there is a second avenue for rewarding developers who have made a great game? Certainly, contracts are not going to stop favoring high game sales, so – hopefully – Metacritic clauses can ensure that a few developers with overlooked but highly-rated games will still be compensated.

Except when you realize that Metacritic is no longer just a resource for consumers, and is now being used as a resource for publishers. Which means that you’re taking a system that’s traditionally had multiple inputs and reducing it to just one. Saying that it will result in a world where “overlooked but highly-rated games” get rewarded is optimistic at best. It’s much more likely that you’re eliminating the possibility of these games, because both sales and development are based so heavily on reviews. To use the movie analogy again: it’ll set up a situation where studios only make Oscar-bait.

Further, developers also need to stop complaining that a few specific reviews are dragging down their Metacritic scores. Besides the fact that both good and bad reviews are earned, in a world without Metacritic, one low score from GameSpot, GameSpy, 1Up, or IGN becomes a disaster. Score aggregation, by definition, protects developers from too much power being in the hands of one critic.

And this is the biggest problem of all, what tips Metacritic from “necessary evil” to just plain “evil.” Because we really, really want to believe that it works like this. Everything evens out, numbers don’t lie, and we’ve finally achieved an objective measure of quality. But the math that makes us slap our foreheads whenever Adam Sessler tries to suggest that 2/5 is not equal to 40%, is the same math that should teach us Metacritic does not work like this.

Let’s say I’m conducting a survey. Here’s my methodology:

  • It’s neither a representative sampling nor a random sampling.
  • All participants are opt-in.
  • I’m choosing from the same pool of participants as I choose for every other survey.
  • Some responses are rejected. I don’t need to list my reasons for rejecting them.
  • Some participants are paid to participate, others are not. I don’t give any indication which is which.
  • Some participants are very familiar with the topic, others are not.
  • Each participant is given a different scale for his response. Some use a scale of 1-5, others a letter grade, others from 1-100.
  • The scale is labeled differently for each participant; what is “neither agree nor disagree” on one form is “strongly disagree” on another.
  • Some participants do not use a scale. For these, I use my best judgement to assign a 1-100 score.
  • Some responses are more heavily weighted than others. I don’t list which ones are given more weight, how much more weight they’re given, my basis for weighting them, or whether the weighting is uniform.
  • The sample size for each survey is completely different. Results based on 50 respondents are listed alongside results based on 15.
  • All of my results are reduced to a single number.
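Reduced to code, that methodology looks something like this sketch. Everything in it is hypothetical: the outlets, the scores, and above all the letter-grade table, since nothing here tells you how letter grades actually get converted (which is part of the problem). Weighting is left out to keep it short; even without it, four reviews on four different scales come out as one confident-looking integer.

```python
# Hypothetical conversion table: the real mapping isn't published here,
# so these values are made up for illustration.
LETTER_TO_HUNDRED = {"A": 95, "B": 85, "C": 75, "D": 65, "F": 40}

def normalize(scale, score):
    """Convert one review's score, on whatever scale it used, to 0-100."""
    if scale == "letter":
        return LETTER_TO_HUNDRED[score]
    if scale == "stars":      # out of 5
        return score / 5 * 100
    if scale == "ten":        # out of 10
        return score * 10
    if scale == "hundred":
        return score
    raise ValueError(f"unknown scale: {scale}")

def single_number(reviews):
    """Collapse every review into one unweighted, rounded mean."""
    normalized = [normalize(scale, score) for _, scale, score in reviews]
    return round(sum(normalized) / len(normalized))

reviews = [
    ("Site A", "stars", 2),        # becomes 40
    ("Site B", "letter", "B"),     # becomes 85 under the made-up table
    ("Site C", "ten", 8),          # becomes 80
    ("Site D", "hundred", 79),     # stays 79
]
print(single_number(reviews))  # 71
```

Change one entry in the letter table and the final number moves, with no trace of why in the output.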

Now, let’s say that I’m not immediately fired for suggesting a system with such obvious room for error. Let’s say instead that I’m promoted to creative director of several multi-billion dollar companies, simultaneously. Is the problem more apparent now?

No, consumers are not morons, and neither are reviewers, or for that matter, most videogame company executives. But you don’t have to be a moron to see a number score and assume — even subconsciously — that some bona-fide math went into the calculation of that number. Metacritic implies a statistical validity and rigor that just doesn’t exist, which is exactly why it’s dangerous. It invites people to reduce everything to one of a hundred numbers and one of three colors. When “yellow” tells one guy “don’t buy this,” that’s not a disaster.

But it is a disaster when the difference between 73 and 74 has a real impact on your production schedule or your salary or, hell, even your team’s morale. One low score from GameSpot, GameSpy, 1Up, or IGN can make that difference, especially when there’s no indication of how heavily those sites are weighted, or when Metacritic translates scores to a scale the original reviewer didn’t intend. As soon as you reduce everything to a number, you treat it as absolute and throw out everything that went into calculating that number. You simply can’t have it both ways, claiming both that people are smart enough to read the reviews and that everything works out statistically.
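To see how little it takes, here is a sketch with made-up, already-normalized scores and a hypothetical `weighted_score` helper. One low review from a “Big Site” drags the result from 78 to 74, and quietly giving that outlet 1.5 times the weight of everyone else turns the 74 into a 73. Neither the weights nor the outlets come from Metacritic; the point is only that an undisclosed weight can move the number by exactly the kind of margin that ends up in a bonus clause.

```python
def weighted_score(scores, weights=None):
    """Weighted mean of already-normalized 0-100 scores, rounded to an integer.

    Outlets missing from `weights` default to a weight of 1.0.
    """
    weights = weights or {}
    total = sum(weights.get(outlet, 1.0) * s for outlet, s in scores.items())
    weight_sum = sum(weights.get(outlet, 1.0) for outlet in scores)
    return round(total / weight_sum)

# Hypothetical reviews, already converted to a 100-point scale.
others = {"Site B": 78, "Site C": 76, "Site D": 80, "Site E": 77}
with_big = {**others, "Big Site": 60}

print(weighted_score(others))                       # 78
print(weighted_score(with_big))                     # 74
print(weighted_score(with_big, {"Big Site": 1.5}))  # 73
```

Same reviews, same game; the only thing that changed between the last two lines is a number nobody outside the aggregator gets to see.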

Undo

I’ve come across as really harsh on reviewers here, so I should make it clear: I’m not against game critics, at all. What they do is crucial, for the reasons I mentioned at the beginning: games are just too long and too expensive to dive into without weighing opinions. And they should be harsh on games that deserve it, as long as the reviewer’s being professional about it.

I consult game reviews all the time, including Metacritic but more often the review aggregates on Joystiq. But I consult them when I’m buying games, not so much when I’m working on them. Some critical feedback is great: you start to figure out which reviewers you can trust and which aren’t giving you any useful information. But you don’t put too much weight into any one review for the same reasons you don’t put all your weight onto one forum post or any one piece of feedback: you’re going to forever be chasing other people’s opinions, instead of making the game that you want to make. The most obvious problem with Metacritic to a publisher or a developer is that it throws out any sense of nuance.

A better solution — and I’m certainly not the first to suggest it — is that of Rotten Tomatoes. You abandon the pretense of objectivity and make it clear what you’re doing: aggregating reviews. Every review is either “fresh” or “rotten,” and you’re only presenting the percentage of reviewers who called it “fresh.” You don’t weight some reviews more heavily than others; instead, you present several unweighted versions to the user and make it absolutely explicit which reviews are included in each one. And when you list reviews as “good” or “bad,” you’re inviting the reader to look for more detail, unlike the Metacritic approach, which implies that they’ve done all the analysis for you. Rotten Tomatoes is always clear that it’s a percentage; Metacritic implies that it’s a Master Score. (One Review to Rule Them All.)
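For contrast, a sketch of that Rotten Tomatoes-style approach: each review arrives already marked fresh or rotten by its own reviewer, every review counts exactly once, and the output says what it is, a share of reviewers, along with exactly who was counted. The `Review` type, the `percent_fresh` function, and the data are hypothetical illustrations, not Rotten Tomatoes’ actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Review:
    outlet: str
    fresh: bool  # the reviewer's own verdict: recommend it or not

def percent_fresh(reviews):
    """Return (percent of reviews marked fresh, outlets included); no weighting."""
    if not reviews:
        raise ValueError("no reviews to aggregate")
    fresh_count = sum(1 for r in reviews if r.fresh)
    percent = round(100 * fresh_count / len(reviews))
    return percent, [r.outlet for r in reviews]

reviews = [Review("Site A", False), Review("Site B", True),
           Review("Site C", True), Review("Site D", True)]
percent, included = percent_fresh(reviews)
print(f"{percent}% of {len(included)} reviewers recommend it: {', '.join(included)}")
# 75% of 4 reviewers recommend it: Site A, Site B, Site C, Site D
```

Offering several unweighted versions just means calling the same function on a different, published list of reviews.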

I’m not sure whether something like this would ever work with games. I’d definitely like to see one. And I’d want the publishers to ignore it completely.

10 thoughts on “Generally Unfavorable”

  1. You missed the main reason for scores: readers like them. The kind who posts on dedicated forums often denies this, but pretty much every attempt to do without them has failed. You can see it in the responses online – when a review comes out, it’s not the individual lines and criticisms that get passed around, unless it’s a fan-forum doing a line-by-line number, it’s the number at the end.

    That said, I absolutely agree with you on Metacritic scores. It’s a great aggregator for finding reviews, but it’s always baffled me that people use the aggregate score for real decisions. If the whole page is bright red or something, sure, that tells you something, although games in that camp are usually dead-men-walking anyway. Other than that, you don’t get much of a feel for anything except the prevailing mood and what each source considers pithy.

    (I’m not a fan of scores in general, TBH – especially percentages, which try to make it look like there’s some great scientific process behind the whole thing. My favourite system was probably the old Daily Radar one, with games rated as Direct Hit, Hit, Miss or Dud – effectively thumbs up or thumbs down, with an extra modifier on either side. Specific enough to end the review, vague enough to mean you had to read the text to find out the details, and open enough that the highest score didn’t have to translate to something like “Everyone, no matter what they play, needs to play this game.” Sadly, it didn’t take off.)

  2. The old Idle Thumbs article “GameRankings is not God” adds a couple of valid points.

    This was a very thoughtful post and I’ll have to mull it over before I say anything, but I have a couple of kneejerk reactions:

    – I have to disagree with you on sales being a good way for measuring quality. I know you don’t mean first weekend sales but eventual ones too (long tail, etc.), but everything considered there’s a lot of great quality stuff out there that no one’s ever heard of. Jack Ketchum’s work, for example, which up until two years ago was totally out-of-print, even though he is probably the world’s greatest horror writer. The works of Anthony Burgess. Films like Heart of the Earth. Yeah, these things do happen where people pick up on quality later on – Psychonauts, Ico, Beyond Good and Evil – but it’s struck me as being… Basically, if Kurt Cobain had never talked about The Vaselines as often as he had, no-one would ever have known about them. As it is, he did and they have a cult following now, but they would have just faded into obscurity otherwise.

    – I’ve always thought that Metacritic should divide its scores up. The letters go into one place, the percentages into another, the 10s into a third and the 5s into a last place. At least that way you can cut down on the number of errors, and have numbers that may represent different schools of thought. I know for a fact that assigning a game a score out of a 100 is totally different than out of 5.

    Anyways I have a bit more to say but I need to churn it around a little. Good post.

  3. People will stay in a 2 star hotel, or eat in a 3 star restaurant. Getting a 40% or a 60% on a test, however, is either failing or hanging on by a thread. Saying “it’s math, idiot” is missing the way that stars, percents, and x/10 scoring are traditionally used in different contexts (which affects how they are perceived by people as rating systems), but in games reviewing have been all thrown together in the same pot.

    Your restaurant or hotel isn’t deemed a failure if you are rated 2.5/5 stars. It might mean you’re not the fanciest one around, but you’re probably still serviceable and will still do just fine if the actual text of your review and word of mouth isn’t unkind.

    Metacritic lumping those together ignores the fact that, while they are mathematically equal, they are being read by human beings who use things beyond raw numbers when they determine what your score means.

  4. @Kroms:

    I have to disagree with you on sales being a good way for measuring quality

    My first thought was “I never said that,” but after re-reading, I can see that it sounds like I did. So I rewrote it.

    My point was that yes, I would like to see publishers go back to treating sales as their only objective measuring stick, because sales are the only objective measuring stick. You’re trying to sell a game, you count how many people bought your game. Done.

    Quality is inherently subjective. Aggregating scores from a group of reviewers is not objective, but it’s disguised to make it look like it is. People get nervous at the idea that sales are directly equal to quality, and it’s good that they do. But people regularly treat Metacritic scores as both 1) directly related to quality, and 2) objective. But it’s neither.

    I will say this, though: the thing that makes Metacritic so highly-regarded in videogames — that deeply incestuous relationship between publishers and review sites — is the same thing that makes “critical darlings” more rare in games than other media. Reviewers drive sales of games so much more than movies or TV, because people rarely buy games without consulting a review site or a forum first. I don’t have any numbers to back it up, but I’d bet you anything that the highest-reviewed games of the past 5 years closely if not directly correspond with the best-selling ones. In movies, it’s usually the opposite.

    I’ve always thought that Metacritic should divide its scores up. The letters go into one place, the percentages into another, the 10s into a third and the 5s into a last place.

    That could potentially be more “fair,” but it would obviate Metacritic altogether. The appeal of the Rotten Tomatoes method, to me, is that you can still get at the separated data, even though it does boil everything down to one number. The difference is that it doesn’t present that number as a master score; it always depicts it as an aggregation.

  5. @Jake

    People will stay in a 2 star hotel, or eat in a 3 star restaurant. Getting a 40% or a 60% on a test, however, is either failing or hanging on by a thread. Saying “it’s math, idiot” is missing the way that stars, percents, and x/10 scoring are traditionally used in different contexts.

    I understand perfectly well what Sessler was trying to say. I’m still saying that it’s bullshit. He — and anyone else who says that the big problem with Metacritic is that it takes scores out of context — is claiming that it’s perfectly fine when a reviewer takes his thoughts and impressions of a game and reduces it to a number, but it’s horrible when Metacritic does the exact same thing. If you want all the subtle nuance of your review to be preserved, then it’s simple: don’t slap a number on the end of it. If you’re deeply offended that someone would respond “it’s math, idiot,” then it’s simple: don’t use math, idiot.

    If you want your 2/5 stars to mean the same as the letter grade “D,” then use the letter grade “D.” If your gifts at critiquing a game are so finely-honed that you feel the need to start using D- or B+ or 2.5 stars or a 1-10 scale, then don’t go to San Francisco and bitch that Metacritic expects people to understand the difference between a 73 and a 74. And if you have to put a written description of what the score means every time you put up a 7.5 or an 84% or “Awesome!” or “Editor’s Choice!” then get a fucking clue and realize your score numbers aren’t working.

    Comparing it to a hotel or restaurant is flawed, because price doesn’t enter into the equation: I’ll stay at a 2-star hotel because I know that it’s much cheaper than a 3-star, but still livable. But I have to pay the same $60 for F.E.A.R. 2 as for Skate 2, so 84 vs. 77 doesn’t mean that much to me. And comparing it to a letter grade is flawed, too, because the person giving the grade has to establish a somewhat objective standard that is the same for every student being graded, and list what each student got right and wrong.

    Reviewers have incorrectly conflated their role as consumer advisor with that of critic for so long that everybody just accepts it. And when you point out that it’s bogus, people give the knee-jerk “you just don’t appreciate the context, man!” But as a consumer advisor, you’re just giving one piece of advice: buy or don’t buy. (Or, if you insist: buy, read the review first, or don’t buy.) If you’re freaked out by the notion that 2/5 is a “buy” on your scale but a 40% “don’t buy” on somebody else’s, then how about this: say “buy” or “don’t buy.”

    When you’re critiquing the game, that’s where the nuance and context come in. And that’s what your writing is for, not some number. Why is this a 74 and not a 73? Read the review. How come this is 2 stars and not 3? Read the review. Why is Fallout 3 an A- but GTA4 an A? First: what does that even mean? Second: read the review.

  6. @Richard:

    You missed the main reason for scores: readers like them. The kind who posts on dedicated forums often denies this, but pretty much every attempt to do without them has failed. You can see it in the responses online: when a review comes out, it’s not the individual lines and criticisms that get passed around (unless it’s a fan forum doing a line-by-line number); it’s the number at the end.
    […]
    (I’m not a fan of scores in general, TBH […] My favourite system was probably the old Daily Radar one, with games rated as Direct Hit, Hit, Miss or Dud […] Sadly, it didn’t take off.)

    You know, I keep hearing this, and I’ve been hearing it for years, and I’m still no closer to getting an idea of why we’re supposed to accept it, exactly. I’ve never come across anyone (in person, online, or in print) who really liked review scores, but people still insist that life doesn’t work without them. This is probably going to make it sound like I’m skeptical, but: I’m extremely skeptical. When I hear everyone describe something as a necessary evil, my first question is: why is it necessary?

    Basically, my question is the same as it is to Sessler: who’s making the call, and what is it based on? “Readers like them.” How do you know? What, exactly, made it not work for Daily Radar? Do sales or subscriptions or site hits dramatically drop when you don’t use them, and can you attribute that directly to review scores and not to any of the other 1000 capricious things that game fans will latch onto or reject?

    Are there dozens of e-mails or forum posts complaining about the lack of a score? If so, why do they get more weight than the hundreds of e-mails and forum posts complaining “OMG INTERNET BIAS!” because Fallout 3 got a 98 and GTA4 got a 99? Sure, people on forums will list the score of a review instead of choosing quotes from it: that’s the effect, though, not the cause. If you put a number on a review, people (including me) are going to use it. If you don’t, people are actually going to say “positive” or “negative,” or start pulling quotes from it.

    It sounds to me like there’s this assumption of a highly lucrative drooling-fanboy demographic that nobody particularly likes having to cater to, but “Hey, what are you gonna do?” And again, I’m skeptical. Joystiq and Kotaku have started doing reviews without scores, and although they’re not known for their reviews, they’re about as mass-market as you can get in videogames. Hell, movies have a much, much larger and more varied audience than videogames, and Siskel and Ebert’s popularity didn’t suffer from their thumbs-up/thumbs-down scale.

    That said, I absolutely agree with you on Metacritic scores. It’s a great aggregator for finding reviews, but it’s always baffled me that people use the aggregate score for real decisions. […] you don’t get much of a feel for anything except the prevailing mood and what each source considers pithy.

    I’d say you don’t even get that, because “each source” doesn’t enter into it. There is absolutely no room for nuance with Metacritic. You get everybody jumping onto the Latest Big Thing, with dozens of sites giving GTA4 or Fallout 3 a perfect score, so the heavily hyped games get even more hype. And everything else just settles into a yellow-green morass, just like any other bell curve.

    Plus (and I’m mentioning it again just because it really bugs me, both as a game developer and as someone who has a basic understanding of math): they weight sites like Gamespot, 1Up, IGN, etc. more heavily than the others, without indicating which ones are weighted, how much weight they’re given, or their criteria for choosing.
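
    To make the weighting complaint concrete, here is a minimal sketch showing that the same four reviews can produce noticeably different “master scores” depending on weights the reader never sees. Metacritic doesn’t publish its weights, so the outlet names, scores, and weights below are all invented for illustration, and `plain_mean` and `weighted_mean` are hypothetical helpers:

```python
# Hypothetical illustration only: the real weights are not published,
# so these outlets, scores, and weights are made up.
def plain_mean(scores):
    """Unweighted average of 0-100 scores."""
    return sum(scores.values()) / len(scores)

def weighted_mean(scores, weights, default_weight=1.0):
    """Weighted average; outlets missing from `weights` get `default_weight`."""
    total = 0.0
    weight_sum = 0.0
    for outlet, score in scores.items():
        w = weights.get(outlet, default_weight)
        total += w * score
        weight_sum += w
    return total / weight_sum

reviews = {"Big Site A": 90, "Big Site B": 88, "Small Site C": 70, "Small Site D": 64}
hidden_weights = {"Big Site A": 2.0, "Big Site B": 2.0}  # assumed, not real

print(round(plain_mean(reviews), 1))                     # 78.0
print(round(weighted_mean(reviews, hidden_weights), 1))  # 81.7
```

    Doubling the weight of two outlets moves the aggregate by almost four points, which is the gap between “you get a bonus” and “you don’t,” and nobody outside can check the arithmetic without knowing the weights.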

  7. I wish more reviews went the way of “buy/don’t buy.” Personally, I have never been a really big fan of scores of any kind, as all I really care about is how the game played, whether the story felt tired, and what the reviewer thought. A score is an extremely subjective thing, based on what the reviewer gave weight to when playing the game, and it is different for each person. What one person thought was a clever and engaging story, another could find trite and clichéd. This applies to pretty much all the reviews I read, though, not just games. I want to know what the reviewer’s thoughts were in words, not in some number. Plus, if everyone were really interested in a number, you could aggregate all the buys and don’t-buys and assign a percentage to how many of each there were. Then you would have a scale of how many people enjoyed it enough to suggest paying for it and how many didn’t.
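
    The aggregation suggested above is essentially the Rotten Tomatoes approach mentioned earlier in the thread: each review contributes only a verdict, and the headline number is the share of positive verdicts. A minimal sketch, with a made-up list of verdicts and a hypothetical `buy_percentage` helper:

```python
# Sketch of the "count the buys" idea: each review contributes only a
# buy / don't-buy verdict. The verdicts below are invented for illustration.
from collections import Counter

def buy_percentage(verdicts):
    """Return the percentage (0-100) of verdicts that are "buy"."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    return 100 * counts["buy"] / len(verdicts)

verdicts = ["buy", "buy", "don't buy", "buy", "don't buy"]
print(buy_percentage(verdicts))  # 60.0
```

    Note that the 60 here means “three of five reviewers said buy,” not “this game is 60% good,” so it reads as an aggregation rather than a master score.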

  8. Do sales or subscriptions or site hits dramatically drop when you don’t use them, and can you attribute that directly to review scores and not any of the other 1000 capricious things that game fans will latch onto or reject?

    Are there dozens of e-mails or forum posts complaining about the lack of a score?

    From what I understand, this is exactly what happens whenever anyone tries it – they get inundated with letters from people who don’t want to read the review and just want the score.

    Look at 1UP: as soon as it went to letter grades, NeoGAF insisted on converting the letter grades to numbers so it could compare them to the other sites that still use numbers, despite pretty much every reviewer on that site answering the question of what an A- is equivalent to with “it’s better than a B+ but not good enough to get a straight A.”

    And I think NeoGAF is a good illustration that the drooling fanboy demographic is alive and well.

    I have no idea how the blogs are getting away with it – maybe because they’re blogs, and so whenever anyone complains they can respond with a hearty “fuck you and the horse you rode in on” which would be the first recorded instance of the blogs doing the enthusiast press a favour.

  9. On “no scores.” It’s been tried, in print (CGW in the US, I think) and online (somewhere) and in both cases readership dropped. When people were used to scores, they moaned when they were removed. Mathematically proven. Sorry.

    Joystiq and Kotaku are red herrings. The cynic in me suggests they don’t do scores because they don’t want to have the inevitable rows/threats/blacklisting that come when a publisher is unhappy with a score. More charitably, they didn’t build their traffic on review scores (unlike a great many sites and print magazines), and a big chunk of their readers are the “no scores!” brigade in any case.

    Good for them. But they aren’t mainstream. Mainstream punters have probably never heard of those sites because they don’t care about how games are made, they just want to buy a good one. That means they want a score, and they’ll turn to whoever gives them one. And they won’t document this process on NeoGAF because they don’t know it exists.

  10. On “no scores.” It’s been tried, in print (CGW in the US, I think) and online (somewhere) and in both cases readership dropped. When people were used to scores, they moaned when they were removed. Mathematically proven. Sorry.

    Nope, as convincing a mathematical proof as “somewhere, I think” is, I’m still not buying it. “Sorry.”

    You say that CGW tried dropping scores, fans complained, and sales dropped. First, what don’t videogame fans complain about? That’s what we do. Second, was there ever a period over the last decade of CGW’s existence when sales didn’t drop? The far more likely scenario is that it was much easier for the publishers to say “bringing back scores will bring back sales” than to say “our business model is doomed, because print media about digital entertainment is becoming increasingly irrelevant.”

    But they aren’t mainstream. Mainstream punters have probably never heard of those sites because they don’t care about how games are made, they just want to buy a good one. That means they want a score, and they’ll turn to whoever gives them one.

    Claiming that Joystiq and Kotaku aren’t “mainstream” is ludicrous. In the US, at least. Especially if you’re claiming that NeoGAF is a better representation — I work in the industry, and I hadn’t even heard of that forum until a year ago. People are much more likely to have seen Something Awful and even Penny Arcade’s forums than NeoGAF.

    It is true that Joystiq & Kotaku are only tangentially “review” sites, and they definitely didn’t build their audiences that way. And the big sites are still 1Up, Gamespot, and IGN. But I say you’ve got the cause and effect reversed: those aren’t popular because of their scores, they’re popular because they’ve always been there. The scores are something that people are just used to.

    Publishers and reviewers are much more attached to scores than fans are. If they want to keep using them, that’s their prerogative. I just wish they’d be up-front about it, instead of claiming that fans insist on them. If you’re a reviewer and you really don’t like scores as much as you claim, then stop using them. Fans will adjust.
