Jim Felli - On Ratings
What does rating games really mean?
I have a problem with game ratings. Not the concept: as long as people choose to partition games into various levels of "goodness" or "badness" there will be a demand for scoring systems. As a player, I appreciate it; as a designer, I accept it; as a scientist, I wish we were better at it. And we can be. It's just hard. I'll grind my scientific axe first, then propose a few rating systems that I think would prove more useful than those we typically find today. Slapping down a rating scale open to all comers without controls and absent support leaves the entire system vulnerable to a variety of ill influences. Beyond conditionality and motivational biases, which are always present, numerical rating systems typically fall prey to issues arising from value nonlinearity and inconsistent criteria reduction.
Let's start with conditionality bias. Every assessment we make is conditioned on the public and private information we have available at the time. Suppose you're asked about the best place to park downtown. You'll base your assessment on public knowledge (e.g., local lots, garages, side streets) and private information (e.g., garage X has a broken gate, lot Y fills up by 10 AM, side street Z has alternate parking). Some of your private information will include your personal preferences (e.g., to park above ground rather than underground). Unless we take the time to specify and share our private information, it is unreasonable to think that our assessment of the best place to park will be the same.
Even given common information, the rating scales themselves can present problems. Two that almost always plague numerical scales are a lack of common reference points and a presumption of value linearity. With regard to reference points, raters with different "starting points" are unlikely to express the same rating with the same number. If you and I both feel the same way about a game and use the same well-defined rubric to assign a numerical score, we stand a good chance of picking the same number. However, if I make my ratings starting at 5 and adjust away from 5 based on things I like or dislike, and you start at 10 and deduct points for things you dislike, it's highly unlikely that we will agree on what a 7 means. My 7 will be driven by accolades; yours, by indictments. The assumption of value linearity compounds this problem. When presented with a numerical scale, people typically presume that the incremental value derived from a rating increase from X to X+1 is the same regardless of the value of X. That is, the added value expressed in going from a rating of 1 to 2 is the same as that expressed in going from a rating of 6 to 7. This rarely holds true. On a ten-point scale, there is seldom any real difference between ratings of 1 and 2 or between ratings of 9 and 10; however, going from a rating of 7 to 8 can present a challenging hurdle.
The greatest problem facing numerical scales is the distillation of multiple assessments into a single number. Rating a game ultimately comes down to assessing the game's merit across multiple criteria. We can measure a game's table footprint using square centimeters, its physical weight in kilograms, and its play duration in hours. But how should we characterize its difficulty, complexity, or quality? How do we take into account the milling of wooden components, the number of decisions per turn, the number of alternatives per decision, the manner in which uncertainties are resolved, the durability of the cards, and so forth? Integrating several important attributes into a single score requires careful weighting and scaling which, by nature, yields wide variability across assessors.
For example, suppose you and I agree to assess a game based only on its "thematic value" and "playability," and that both of these attributes will be scored on a mutually accepted linear scale from 1 to 10. Suppose further that we agree that the game scores a 4 on theme and an 8 on playability. On a straight average, the game would score a 6. This presumes a 50/50 weighting on theme and playability. Suppose, however, that I value theme much more than playability and you value playability much more than theme. In fact, let's just say that our weights are 80/20 and 30/70, respectively. My rating would be 4.8 and yours would be 6.8. That's a huge difference. Even if we had a well-designed numerical rating system, common information, common reference points, common marginal value functions, and common weights across a set of common criteria, we'd still have problems. We would also have to share common preferences.
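The arithmetic in this example can be checked with a minimal Python sketch. The function name `weighted_score` and the dictionary layout are illustrative assumptions, not part of any existing rating tool; the scores and weights are the ones given above. Weights are written as whole percentages so the results come out exactly.

```python
def weighted_score(scores, weights_pct):
    """Combine per-criterion scores using percentage weights that sum to 100.

    Hypothetical helper for illustration only: `scores` and `weights_pct`
    are dicts keyed by criterion name.
    """
    if set(scores) != set(weights_pct):
        raise ValueError("scores and weights must cover the same criteria")
    if sum(weights_pct.values()) != 100:
        raise ValueError("weights must sum to 100")
    return sum(scores[c] * weights_pct[c] for c in scores) / 100


# The agreed attribute scores from the example: 4 on theme, 8 on playability.
scores = {"theme": 4, "playability": 8}

straight_average = weighted_score(scores, {"theme": 50, "playability": 50})
my_rating = weighted_score(scores, {"theme": 80, "playability": 20})
your_rating = weighted_score(scores, {"theme": 30, "playability": 70})

print(straight_average, my_rating, your_rating)  # 6.0 4.8 6.8
```

The same two attribute scores produce three different overall ratings, and the only thing that changed was the weighting each of us brought to the table.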
Regardless of our desire for objectivity, we cannot discount the influence of our personal preferences and the private motivations they drive. If I hate war games, it is unlikely that I will give a war game as high a rating as someone else who loves them. This may or may not be the consequence of conscious choices. In some cases, my distaste may be expressed in a low score for a specific attribute (e.g., game complexity); in other cases, my dislike may be manifest in an explicit penalty simply because "I don't like these kinds of games." Unless people take time to leave detailed comments when they drop a numerical rating, we really don't know what underlies their assessment. It could be a great game that they didn't like because they are sick and tired of zombies, or a terrible game that they loved because they adore all things Cthulhu.
So, what's the solution? How are we to talk about average scores and variances and skewness and modes and medians and all sorts of cool statistical measures without a numerical scale? Simple: we don't. Without clear and appropriate context, ratings are at best worthless. Even so, the desire behind their use is valid: people want to know how a game stacks up before they commit to playing or buying it. In descending order, my preferences for satisfying this need are: written reviews, Likert-type scales, categorical scoring, and augmented stars.
For depth and completeness with solid context, it's hard to beat a well-crafted written review. I find them to be more thoughtful and articulate than audio or video reviews, especially when it's never clear to me whether the person I'm listening to or watching has actually played the game in question or is just recycling words lifted from other reviewers or the back of the box. Whether or not I agree with their assessment, a reviewer's familiarity with and mastery of a game is readily apparent in a written review, and that alone makes it more valuable to me. Unfortunately, the time, talent, and dedication required to produce high-quality written reviews preclude their timely availability in the current market environment.
In the absence of a written review, I recommend a set of Likert-type scales for differentiating games. Examples of these types of scales include pain scales where you are asked to choose the face that best describes your level of pain, or statements for which you are asked to put a check mark on a line to represent your degree of agreement between endpoints like "strongly agree" and "strongly disagree." For appraising a game, a set of scales might include attributes such as duration of play, depth of theme, richness of decision space, etc. For each attribute, the reviewer would make a mark on the scale to denote where they believe the game falls between the worst and best rating for that attribute. Under this approach, the final assessment of a game would be a set of marks across the set of evaluation attributes rather than a single number representing the reviewer's overall impression. This focuses the assessment on a game's characteristics and limits the influence of the reviewer's preference for some characteristics over others.
A categorical scoring system is easier to present and manage than a set of Likert-type scales, but its credibility and usefulness depend entirely on the underlying rubric that defines category assignment. Such a rubric must be well-written, easily understood, and clear about the criteria for inclusion in each category. Although it may be deployed on its own, a categorical rating system is well-served by supplementary text expressing the reviewer's reasons for placing a game in a specific category. It's one thing to assign a game to the Very Good category; it's quite another thing to assign it to the Very Good category with the caveat: "Although this game meets most of the criteria for Excellent, I find that I can only give it a Very Good due to its cumbersome combat resolution and overly complicated post-combat updating requirements. If these factors do not bother you, then you may find this game to be Excellent."
The simplest evaluation approach is to give a game some number of stars out of a maximum number to represent its relative merit. In and of itself, this type of assessment method is fraught with bias, obscurity, and ambiguity. However, five easy modifications can transform a simple star system into a robust rating method. First, use 5 stars. This provides a sufficient level of discrimination on either side of a 3-star middle score. Second, provide a readily accessible, clear, and well-defined rubric to differentiate between the stars. Third, make the 3-star middle score the reference point for all assessments: the reviewer may add or deduct stars as appropriate given the rubric, but their assessment will always begin at 3 stars. Fourth, use two colors of stars, say blue and red (or gray and black if color is unavailable), to represent the reviewer's preferences: if the game is of a type they typically like and play, use blue stars; if the game is of a type that they typically dislike or don't play, use red stars. Last, use the star's interior to illustrate the reviewer's dedication to their assessment: if they haven't played the game at least three times, use hollow stars; three to five times, use a light fill; more than five times, use a dark fill. Taken together, these five modifications to a star rating should enable the reader to better interpret and appreciate the rating provided. A rating of 3 solid red stars would mean a lot more to me than a rating of 4 hollow blue stars.
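For readers who think in data structures, the augmented star scheme can be sketched as a small Python record. This is only an illustration under stated assumptions: the class name `AugmentedStars`, its field names, and the 1-star floor are my own choices and are not specified above; the rubric that justifies each adjustment is left out entirely.

```python
from dataclasses import dataclass

MAX_STARS = 5        # "First, use 5 stars."
MIN_STARS = 1        # Assumption: the text does not say whether 0 stars is allowed.
REFERENCE_STARS = 3  # Every assessment begins at the 3-star middle score.


@dataclass(frozen=True)
class AugmentedStars:
    """Hypothetical record for one reviewer's augmented star rating."""
    adjustment: int    # stars added (+) or deducted (-) per the rubric
    likes_genre: bool  # True: a type of game the reviewer typically likes and plays
    plays: int         # number of times the reviewer has played the game

    @property
    def stars(self) -> int:
        # Start at the reference point and clamp to the allowed range.
        return max(MIN_STARS, min(MAX_STARS, REFERENCE_STARS + self.adjustment))

    @property
    def color(self) -> str:
        # Blue for liked game types, red for disliked or unplayed types.
        return "blue" if self.likes_genre else "red"

    @property
    def fill(self) -> str:
        # Fewer than three plays: hollow; three to five: light; more than five: dark.
        if self.plays < 3:
            return "hollow"
        if self.plays <= 5:
            return "light"
        return "dark"

    def describe(self) -> str:
        noun = "star" if self.stars == 1 else "stars"
        return f"{self.stars} {self.fill} {self.color} {noun}"


print(AugmentedStars(adjustment=0, likes_genre=False, plays=6).describe())
print(AugmentedStars(adjustment=1, likes_genre=True, plays=1).describe())
```

The first rating prints as "3 dark red stars" and the second as "4 hollow blue stars", which are the two cases compared at the end of the paragraph above.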
To paraphrase Tom Lehrer, a rating system is like a sewer: what you get out of it depends on what you put into it. For some, it's all about numbers. Numbers in and numbers out. For me, I'd like something a little more meaningful.
I generally glance at the rating number when looking at anything online. If it is extremely low, I pass over it without digging in. If it is extremely high, I generally read the few negative reviews (but not too many of the positive ones). If it is mixed, I like to read a selection of reviews, but still mostly the negative ones. I find that negative reviews often offer a more direct look at what I am interested in learning about the product. They generally are shorter and get directly to their point. Many times, the negatives for the reviewer are actually positives for me.
As an English teacher, I have to create rubrics for class, and the goal is to make them objective, so that anyone using the rubric and looking at the essay will arrive at the same score. This is not easy, but it is a good exercise for thinking clearly about the essential parts of what you want to evaluate. I like to design games, and writing rubrics has helped me with writing rulebooks.
On occasion, a written review isn't worth very much.
The biggest issue I have with any game review is that I have significant doubts regarding whether the reviewer has actually played the game. I'm not a big components guy, and as often as not a reviewer's primary observation is how it looks on the table. The minute I see this the shields go up, and the writer has to work hard to convince me they know what they're talking about. I can review a game's components without actually having a copy. So can everyone else, so it's more or less worthless content.
The long form reviews here on ThereWillBe.Games are getting the job done. There's enough content regarding the inner workings, the interesting conflictions and the shortcomings, to keep me invested. But that's not something you'll find on other sites where anyone can punch out a review in half an hour and typically do, because quantity is more valuable than quality.
S.
Regardless of our desire for objectivity
Who's "our"? You got a mouse in your pocket, Jim?
This recent shrieking about the need for objectivity in reviews is ridiculous. Reviews are not objective. They cannot be objective. They should not be objective.
I don't want a dry scientific analysis of a game. I can read rules and look at photos myself. I want to know if it's fun to play. Is that subjective? Hell yeah it is. So the reviewer's job is to get that across.
And that's hard. It's hard to adequately describe the feeling of playing a game, of encountering, through play, what works, what doesn't, what should but doesn't, and -- most fun -- what shouldn't but does. Not a lot of people can do that -- I certainly can't -- but because of the current age, everyone is free to give it a whack. And instead of saying, "yeah, most people aren't good at that and we should really sift through and find those that are," we have instead embraced this dumb idea that the problem is that reviews aren't "objective" enough, that the actual review portion should be a dispassionate evaluation of measurable and differentiable(?) categories.
The punchline to this is that it doesn't matter, because not only are we reviewing absolutely unnecessary nonsense, we're doing it for an audience who, 99 times out of 100, simply wants a "review" to justify a decision they've already made. "I want to buy this game because it's pretty. Oh, BoardGameHobo said it's good! That settles it, in the basket it goes!" Or "I want to buy this game because it's pretty. Oh, BoardGameHobo said it's bad! Well, he doesn't know what I like, so in the basket it goes!"
A single review almost always has little value on its own. You'll need either more reviews from that person so you can establish how your tastes line up with theirs or you'll need an aggregate of reviews on the item to compare across the board.
As always, my personal view on the issue is that fewer people need to be writing "reviews" in the first place. Don't add to the noise by throwing in your fifteen paragraphs of rules with an "I liked it and it's worthy of space on your shelf" summary at the bottom to muddy up the waters. Nobody cares. Just give it your strong 7 and be done with it. Leave the talk to people who know what they're doing.
Sagrilarus wrote: The long form reviews here on ThereWillBe.Games are getting the job done. There's enough content regarding the inner workings, the interesting conflictions and the shortcomings, to keep me invested. But that's not something you'll find on other sites where anyone can punch out a review in half an hour and typically do, because quantity is more valuable than quality.
I agree. Wholeheartedly.
Overall, though, I agree with Legomancer on a philosophical level. Concern with a non-existent "objectivity" is often a Gamergate thing.
It's now ubiquitous and inescapable. If you have reviews and want them indexed correctly by Google, you have to include a rating. If you don't want to custom code scale metadata, you use a 5 star rating.
I’ve got a guy rating STF as a 2 because he doesn’t like dexterity games, and he uses his ratings as a reminder to himself.
Until you can get everyone in the world on the same page (read: never), all of this is interesting banter, but banter nonetheless.
From there, most games are a "geekbuddy analysis" away from my own evaluation of whether I'm really interested in playing or not.
It's essentially the same as aggregating your favorite reviewers but it also distills the written material down to what's immediately important or relevant to the reviewer.
As for aggregate ratings... those provide nearly zero information for me.
I like people being able to tell, at a glance, what I think of a game. Some critics dislike this because it sets an anchor in the minds of some readers. You know, the kind that will think "Oh, he's wrong" before they have read a single word. Well, I don't think these people would like me very much anyways, and I think the people that would, would benefit greatly from knowing, in no uncertain terms, what my opinion on a game is.
Take, for example, my review of This War of Mine. If I simply listed it on my site, most people would ignore it, since there's no shortage of info on the topic and it's a fairly old game. Most would assume my views on the topic fit the game and would not investigate further. But if I add the score to it, people will know my opinion is not the same as everyone else's and may read it.
And so far, nobody, absolutely nobody has taken issue with it after reading it. And those who would have, well, I don't think they are the kind of audience I want to have.
This is important to me because I tend to write about obscure games and feel very passionate about them. I want people to wonder, hey, how come this game I've never heard about is rated so highly? What's so special about this "Netrunner" thing, to deserve such a high rating? What are all those games that start with "18" and that get such high ratings? That kind of stuff. It's just useful.
Because, well. Most people don't know. Most people only know about the games they see in stores, and they look at games in the only way they have been told is possible. And I want to change that, even if it's just a little bit.
SuperflyTNT wrote: superflycircus.com/2012/07/petes-persona...phy-on-game-ratings/
I forgot I wrote this.
Well I'm glad I didn't get any games based just on your scores as if I saw a 7.5 - 8 rating (from someone who has similar gaming tastes) that would be a definite buy if I saw it for a reasonable price
As with most sensible people, I listen to advice from people with similar interests, thoughts, or outlooks. So for a feeling about a game I have seen or become interested in, it is the ratings and comments from my 'geekbuddies' list (that many of you here are on) that will put me, or not, into 'I may buy this' mode. There will probably be a bit of reading of just the comments of people on that game database site to see if any red flags come up, and maybe a skim of the rules to get the gist of the structure and mechanics, but this will usually only be important if few of my geekbuddies have played it.
And I do get a good laugh out of people/reviewers who get a bit snotty when someone disagrees with their 'opinion', either insecurity or they just haven't got it that everyone is different in this world - and that's what makes it so interesting.
For me, a review's value is generally based on the reviewer. I need to read / see several reviews by that person, hopefully juxtaposing my own experiences with those games. Only then can I reasonably assume that if reviewer X likes a game, I will also like that game.
Southernman wrote:
SuperflyTNT wrote: superflycircus.com/2012/07/petes-persona...phy-on-game-ratings/
I forgot I wrote this.
Well I'm glad I didn't get any games based just on your scores as if I saw a 7.5 - 8 rating (from someone who has similar gaming tastes) that would be a definite buy if I saw it for a reasonable price
To be fair, the Circus always had clear metrics:
superflycircus.com/index-of-articles-and-attractions/
0.00 – 1.00: Games That Should Come With Suicide Kits
1.00 – 2.75: Failures, In A Great Many Ways
2.75 – 4.00: Sailing The Sea Of Mediocrity
4.00 – 4.50: Games That Everyone Should Play, At Least Once
4.50 – 5.00: The Most Fun You Can Have Without Sex Or Drugs
The first is the quick rating. When I'm too lazy to read an article, I just look at the rating. This is especially useful when I do research across a whole lot of reviewers. For this one, fewer items is better. One number is excellent.
The second is when I need to see what exactly a game is, without reading the review. I've never seen this done, because I don't think it is viable. Most of the time a paragraph does a better job at this. But if you like to play with numbers, a paragraph won't do. Sometimes I just like using numbers. The thing is, the more I think about it, the worse it gets. Once, when I thought about a rating system (for fun) to value my collection, I ended up with an arcane system that was very abstract and imbalanced. For example, the last time I thought about it (again, for fun), I ended up with something like: freedom (is the player railroaded?), social (does the game rely more on the players?), interaction (can you screw/help people?), interface (are you playing the game, or the interface?) and other useless nonsense, while I was trying to make sense out of it.
And I do think that games are art, and that even the bad ones have something interesting to say about people and the way we think. But the joy and beauty (and irritation and frustration) of art is that it is subjective at its core. Every time one of us comes into contact with any kind of art, we create a unique conversation, engaging with what the art is presenting and bringing into dialogue with it all our experiences, ideas, influences, and biases. So while we might be talking about the same game/painting/novel/film, we are probably having very different experiences. This is great! Art helps us bridge the gaps between each other and gives us common ground from which we can explore ideas and experiences. It's also tough, because it means we'll never be on entirely the same page about the thing in question. In any case, the nature of this conversation makes any kind of "objective" critique not just irrelevant, but impossible, especially in a hobby that is focused not simply on one person's experience, but the unique alchemy that happens when you play a game with a bunch of different human beings at the same time.
I think the ideal kind of criticism is based on conveying personal experience and exploring the ideas and mechanics that drive a game (as well as the ideas and feelings these mechanics produce). Every time we play a game we briefly step into a parallel world, one in which the normal processes of our perception and problem-solving are altered and directed in new and interesting ways. How does this feel? How does this change how we think? What kind of pleasure, excitement, boredom, or frustration does this system create? There's a wealth of fascinating shit to explore in each game we play, and most reviews barely scratch the surface in their rush to tell you whether this box is worth 60 bucks or not.
I don't wanna talk too much shit about "buyer's guide" type reviews, though, because even if they're mostly uninteresting pieces of writing, they're still kind of necessary in the game world, I think? Because unlike movies or books or video games, it's not like there's a wealth of information out there, and the high cost of a game means that having limited resources is a very real problem. A person with a decent income can probably see most of the "best movies" that populate year-end lists without bankrupting themselves. Even a dedicated gamer can only play a fraction of what's released, and folks are rightfully interested in making sure that fraction is as fun or interesting as possible.
So what to do? I tend to compromise when I write reviews, balancing my own associative and experiential thinking with more "hard" details about a game that might prove useful to folks wondering if it's a thing worthy of their time. Star reviews are another compromise--they seem to be essential to the review's discoverability, so I'll put a rating on there (usually a torturous process) in hopes that it will be the first word of a conversation and not the last one. I think games criticism is still in its infancy, and I'm interested in the conventions, style, and function it develops as it matures.
*Note: I'm an insufferable English teacher who writes about games for this site (and therefore, thinks that Writing About Games is Important), so this screed is written through the lens of a navel-gazing humanitarian who could probably use the rigor of the hard sciences. Also, it's worth noting that my first widely published piece of writing a decade-plus ago was explicitly about how games are not art so, you know, I reserve the right to change my ideas.