
Jim Felli - On Ratings

xthexlo Updated November 29, 2019

There Will Be Games

What does rating games really mean?

I have a problem with game ratings. Not the concept: as long as people choose to partition games into various levels of "goodness" or "badness" there will be a demand for scoring systems. As a player, I appreciate it; as a designer, I accept it; as a scientist, I wish we were better at it. And we can be. It's just hard. I'll grind my scientific axe first, then propose a few rating systems that I think would prove more useful than those we typically find today. Slapping down a rating scale open to all comers without controls and absent support leaves the entire system vulnerable to a variety of ill influences. Beyond conditionality and motivational biases, which are always present, numerical rating systems typically fall prey to issues arising from value nonlinearity and inconsistent criteria reduction.

Let's start with conditionality bias. Every assessment we make is conditioned on the public and private information we have available at the time. Suppose you're asked about the best place to park downtown. You'll base your assessment on public knowledge (e.g., local lots, garages, side streets) and private information (e.g., garage X has a broken gate, lot Y fills up by 10 AM, side street Z has alternate parking). Some of your private information will include your personal preferences (e.g., to park above ground rather than underground). Unless we take the time to specify and share our private information, it is unreasonable to think that our assessment of the best place to park will be the same.

Even given common information, the rating scales themselves can present problems. Two that almost always plague numerical scales are a lack of common reference points and a presumption of value linearity. With regard to reference points, raters with different "starting points" are unlikely to express the same rating with the same number. If you and I both feel the same way about a game and use the same well-defined rubric to assign a numerical score, we stand a good chance of picking the same number. However, if I make my ratings starting at 5 and adjust away from 5 based on things I like or dislike, and you start at 10 and deduct points for things you dislike, it's highly unlikely that we will agree on what a 7 means. My 7 will be driven by accolades; yours, by indictments. The assumption of value linearity compounds this problem. When presented with a numerical scale, people typically presume that the incremental value derived from a rating increase from X to X+1 is the same regardless of the value of X. That is, the added value expressed in going from a rating of 1 to 2 is the same as that expressed in going from a rating of 6 to 7. This rarely holds true. On a ten-point scale, there is seldom any real difference between ratings of 1 and 2 or between ratings of 9 and 10; however, going from a rating of 7 to 8 can present a challenging hurdle.
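To make the reference-point problem concrete, here is a small hypothetical sketch in Python. The two scoring rules are illustrative assumptions, not rules from the article: one rater starts at 5 and moves a point for each liked or disliked feature, while the other starts at 10 and only deducts. Both are clamped to a 1 to 10 scale.

```python
# Hypothetical raters on a 1-10 scale; the scoring rules are illustrative only.

def clamp(score: int, low: int = 1, high: int = 10) -> int:
    """Keep a score inside the 1-10 scale."""
    return max(low, min(high, score))

def rate_from_middle(likes: int, dislikes: int) -> int:
    """Start at 5; add a point per liked feature, deduct one per disliked feature."""
    return clamp(5 + likes - dislikes)

def rate_from_top(likes: int, dislikes: int) -> int:
    """Start at 10; deduct a point per disliked feature and ignore liked ones."""
    return clamp(10 - dislikes)

# Same impressions (2 likes, 3 dislikes), different numbers.
print(rate_from_middle(2, 3), rate_from_top(2, 3))  # 4 7

# Same number, different reasons.
print(rate_from_middle(2, 0))  # 7, built from accolades
print(rate_from_top(0, 3))     # 7, built from indictments
```

Under these assumed rules, identical impressions produce a 4 from one rater and a 7 from the other, and each rater can land on a 7 for opposite reasons.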

The greatest problem facing numerical scales is the distillation of multiple assessments into a single number. Rating a game ultimately comes down to assessing the game's merit across multiple criteria. We can measure a game's table footprint using square centimeters, its physical weight in kilograms, and its play duration in hours. But how should we characterize its difficulty, complexity, or quality? How do we take into account the milling of wooden components, the number of decisions per turn, the number of alternatives per decision, the manner in which uncertainties are resolved, the durability of the cards, and so forth? Integrating several important attributes into a single score requires careful weighting and scaling which, by nature, yields wide variability across assessors.

For example, suppose you and I agree to assess a game based only on its "thematic value" and "playability," and that both of these attributes will be scored on a mutually accepted linear scale from 1 to 10. Suppose further that we agree that the game scores a 4 on theme and an 8 on playability. On a straight average, the game would score a 6. This presumes a 50/50 weighting on theme and playability. Suppose, however, that I value theme much more than playability and you value playability much more than theme. In fact, let's just say that our weights are 80/20 and 30/70, respectively. My rating would be 4.8 and yours would be 6.8. That's a huge difference. Even if we had a well-designed numerical rating system, common information, common reference points, common marginal value functions, and common weights across a set of common criteria, we'd still have problems. We would also have to share common preferences.
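The arithmetic in this example can be checked with a small weighted-average sketch. The function name `weighted_score` and the dictionary layout are illustrative assumptions; the weights are normalized by their total, so they can be written as percentages.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of attribute scores; weights are normalized by their total."""
    total = sum(weights.values())
    return sum(scores[attr] * weight / total for attr, weight in weights.items())

# The game from the example: 4 on theme, 8 on playability (both on a 1-10 scale).
game = {"theme": 4, "playability": 8}

even = weighted_score(game, {"theme": 50, "playability": 50})
mine = weighted_score(game, {"theme": 80, "playability": 20})
yours = weighted_score(game, {"theme": 30, "playability": 70})

print(f"{even:.1f} {mine:.1f} {yours:.1f}")  # 6.0 4.8 6.8
```

The only input that changes between the three results is the weighting, which is exactly the source of the two-point gap.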

Regardless of our desire for objectivity, we cannot discount the influence of our personal preferences and the private motivations they drive. If I hate war games, it is unlikely that I will give a war game as high a rating as someone else who loves them. This may or may not be the consequence of conscious choices. In some cases, my distaste may be expressed in a low score for a specific attribute (e.g., game complexity); in other cases, my dislike may be manifest in an explicit penalty simply because "I don't like these kinds of games." Unless people take time to leave detailed comments when they drop a numerical rating, we really don't know what underlies their assessment. It could be a great game that they didn't like because they are sick and tired of zombies, or a terrible game that they loved because they adore all things Cthulhu.

So, what's the solution? How are we to talk about average scores and variances and skewness and modes and medians and all sorts of cool statistical measures without a numerical scale? Simple: we don't. Without clear and appropriate context, ratings are at best worthless. Even so, the desire behind their use is valid: people want to know how a game stacks up before they commit to playing or buying it. In descending order, my preferences for satisfying this need are: written reviews, Likert-type scales, categorical scoring, and augmented stars.

For depth and completeness with solid context, it's hard to beat a well-crafted written review. I find them to be more thoughtful and articulate than audio or video reviews, especially since it's never clear to me whether the person I'm listening to or watching has actually played the game in question or is just recycling words lifted from other reviewers or the back of the box. Whether or not I agree with their assessment, a reviewer's familiarity with and mastery of a game is readily apparent in a written review, and that alone makes it more valuable to me. Unfortunately, the time, talent, and dedication required to produce high-quality written reviews preclude their timely availability in the current market environment.

In the absence of a written review, I recommend a set of Likert-type scales for differentiating games. Examples of these types of scales include pain scales where you are asked to choose the face that best describes your level of pain, or statements for which you are asked to put a check mark on a line to represent your degree of agreement between endpoints like "strongly agree" and "strongly disagree." For appraising a game, a set of scales might include attributes such as duration of play, depth of theme, richness of decision space, etc. For each attribute, the reviewer would make a mark on the scale to denote where they believe the game falls between the worst and best rating for that attribute. Under this approach, the final assessment of a game would be a set of marks across the set of evaluation attributes rather than a single number representing the reviewer's overall impression. This focuses the assessment on a game's characteristics and limits the influence of the reviewer's preference for some characteristics over others.
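A minimal sketch of what a Likert-type assessment might look like as data, assuming a 7-point scale (the article does not specify one) and the example attributes named above. The class and method names are hypothetical. The key design point is that nothing gets averaged: the result is a profile of marks, one per attribute.

```python
from dataclasses import dataclass, field

# Assumed 7-point scale; the article does not fix the number of points.
SCALE_MIN, SCALE_MAX = 1, 7

@dataclass
class LikertAssessment:
    """Hypothetical record of one reviewer's marks for one game."""
    game: str
    marks: dict[str, int] = field(default_factory=dict)

    def mark(self, attribute: str, value: int) -> None:
        """Record where the game falls between worst and best for one attribute."""
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"{attribute}: {value} is outside {SCALE_MIN}-{SCALE_MAX}")
        self.marks[attribute] = value

    def profile(self) -> str:
        """Render each attribute as a text scale with the reviewer's mark as X."""
        lines = []
        for attribute, value in self.marks.items():
            bar = "".join("X" if i == value else "-" for i in range(SCALE_MIN, SCALE_MAX + 1))
            lines.append(f"{attribute:<26} worst [{bar}] best")
        return "\n".join(lines)

review = LikertAssessment("Example Game")  # hypothetical game
review.mark("duration of play", 5)
review.mark("depth of theme", 2)
review.mark("richness of decision space", 6)
print(review.profile())
```

A reader comparing two games would compare these profiles attribute by attribute rather than comparing two single numbers.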

A categorical scoring system is easier to present and manage than a set of Likert-type scales, but its credibility and usefulness depend entirely on the underlying rubric that defines category assignment. Such a rubric must be well-written, easily understood, and clearly explain the criteria for inclusion in each category. Although it may be deployed on its own, a categorical rating system is well-served by supplementary text expressing the reviewer's reasons for placing a game in a specific category. It's one thing to assign a game to the Very Good category; it's quite another thing to assign it to the Very Good category with the caveat: "Although this game meets most of the criteria for Excellent, I find that I can only give it a Very Good due to its cumbersome combat resolution and overly complicated post-combat updating requirements. If these factors do not bother you, then you may find this game to be Excellent."

The simplest evaluation approach is to give a game some number of stars out of a maximum number to represent its relative merit. In and of itself, this type of assessment method is fraught with bias, obscurity, and ambiguity. However, five easy modifications can transform a simple star system into a robust rating method. First, use 5 stars. This provides a sufficient level of discrimination on either side of a 3-star middle score. Second, provide a readily accessible, clear, and well-defined rubric to differentiate between the stars. Third, make the 3-star middle score the reference point for all assessments: the reviewer may add or deduct stars as appropriate given the rubric, but their assessment will always begin at 3 stars. Fourth, use two colors of stars, say blue and red (or gray and black if color is unavailable), to represent the reviewer's preferences: if the game is of a type they typically like and play, use blue stars; if the game is of a type that they typically dislike or don't play, use red stars. Last, use the star's interior to illustrate the reviewer's dedication to their assessment: if they haven't played the game at least three times, use hollow stars; three to five times, use a light fill; more than five times, use a dark fill. Taken together, these five modifications to a star rating should enable the reader to better interpret and appreciate the rating provided. A rating of 3 solid red stars would mean a lot more to me than a rating of 4 hollow blue stars.
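The five modifications are concrete enough to encode. The sketch below is one hypothetical reading of them: it assumes a 1-to-5 star range, leaves the rubric itself to the reviewer (passed in as a net `adjustment` from the 3-star starting point), and uses the play-count thresholds from the text. All names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum

class Color(Enum):
    BLUE = "blue"  # game is of a type the reviewer typically likes and plays
    RED = "red"    # game is of a type the reviewer typically dislikes or doesn't play

class Fill(Enum):
    HOLLOW = "hollow"  # fewer than three plays
    LIGHT = "light"    # three to five plays
    DARK = "dark"      # more than five plays

MIDDLE, MIN_STARS, MAX_STARS = 3, 1, 5  # assumed range; every rating starts at 3

@dataclass(frozen=True)
class StarRating:
    stars: int
    color: Color
    fill: Fill

    def __str__(self) -> str:
        return f"{self.stars} {self.fill.value} {self.color.value} star(s)"

def fill_for(plays: int) -> Fill:
    """Map the reviewer's play count to the star interior."""
    if plays < 3:
        return Fill.HOLLOW
    if plays <= 5:
        return Fill.LIGHT
    return Fill.DARK

def rate(adjustment: int, likes_genre: bool, plays: int) -> StarRating:
    """Start from the 3-star middle, apply the rubric's net adjustment, clamp to 1..5."""
    stars = max(MIN_STARS, min(MAX_STARS, MIDDLE + adjustment))
    color = Color.BLUE if likes_genre else Color.RED
    return StarRating(stars, color, fill_for(plays))

# The two ratings compared at the end of the paragraph:
print(rate(0, likes_genre=False, plays=8))  # 3 dark red star(s)
print(rate(1, likes_genre=True, plays=1))   # 4 hollow blue star(s)
```

Keeping color and fill as separate fields, rather than folding them into the star count, is what lets a reader discount a rating from an enthusiast who has played once.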

To paraphrase Tom Lehrer, a rating system is like a sewer: what you get out of it depends on what you put into it. For some, it's all about numbers. Numbers in and numbers out. For me, I'd like something a little more meaningful.

Posts in discussion: Jim Felli - On Ratings

Whoshim replied the topic: #278401 26 Jul 2018 07:21

This article is a solid 4.

I generally glance at the rating number when looking at anything online. If it is extremely low, I pass over it without digging in. If it is extremely high, I generally read the few negative reviews (but not too many of the positive ones). If it is mixed, I like to read a selection of reviews, but still mostly the negative ones. I find that negative reviews often offer a more direct look at what I am interested in learning about the product. They generally are shorter and get directly to their point. Many times, the negatives for the reviewer are actually positives for me.

As an English teacher, I have to create rubrics for class, and the goal is to make them objective, so that anyone using the rubric and looking at the essay will arrive at the same score. This is not easy, but it is a good exercise for thinking clearly about the essential parts of what you want to evaluate. I like to design games, and writing rubrics has helped me with writing rulebooks.

Sagrilarus replied the topic: #278408 26 Jul 2018 08:53

I like this article. It's really juicy, not an ounce of formality, good for everyday reading. Bittersweet components, cherries and plums, a touch of tobacco, touch of chocolate, but all very juicy and mild tannins. Hints of lemon peel and green apple, and a sweet and sour streak going through it. Lavender notes to the nose. A lot more complexity than I was expecting.

On occasion, a written review isn't worth very much.

The biggest issue I have with any game review is that I have significant doubts regarding whether the reviewer has actually played the game. I'm not a big components guy, and as often as not a reviewer's primary observation is how it looks on the table. The minute I see this the shields go up, and the writer has to work hard to convince me they know what they're talking about. I can review a game's components without actually having a copy. So can everyone else, so it's more or less worthless content.

The long form reviews here on ThereWillBe.Games are getting the job done. There's enough content regarding the inner workings, the interesting conflictions and the shortcomings, to keep me invested. But that's not something you'll find on other sites where anyone can punch out a review in half an hour and typically do, because quantity is more valuable than quality.

S.

Legomancer replied the topic: #278410 26 Jul 2018 09:06

Regardless of our desire for objectivity

Who's "our"? You got a mouse in your pocket, Jim?

This recent shrieking about the need for objectivity in reviews is ridiculous. Reviews are not objective. They cannot be objective. They should not be objective.

I don't want a dry scientific analysis of a game. I can read rules and look at photos myself. I want to know if it's fun to play. Is that subjective? Hell yeah it is. So the reviewer's job is to get that across.

And that's hard. It's hard to adequately describe the feeling of playing a game, of encountering, through play, what works, what doesn't, what should but doesn't, and -- most fun -- what shouldn't but does. Not a lot of people can do that -- I certainly can't -- but because of the current age, everyone is free to give it a whack. And instead of saying, "yeah, most people aren't good at that and we should really sift through and find those that are," we have instead embraced this dumb idea that the problem is that reviews aren't "objective" enough, that the actual review portion should be a dispassionate evaluation of measurable and differentiable(?) categories.

The punchline to this is that it doesn't matter, because not only are we reviewing absolutely unnecessary nonsense, we're doing it for an audience who, 99 times out of 100, simply wants a "review" to justify a decision they've already made. "I want to buy this game because it's pretty. Oh, BoardGameHobo said it's good! That settles it, in the basket it goes!" Or "I want to buy this game because it's pretty. Oh, BoardGameHobo said it's bad! Well, he doesn't know what I like, so in the basket it goes!"

A single review almost always has little value on its own. You'll need either more reviews from that person so you can establish how your tastes line up with theirs or you'll need an aggregate of reviews on the item to compare across the board.

As always, my personal view on the issue is that fewer people need to be writing "reviews" in the first place. Don't add to the noise by throwing your fifteen paragraphs of rules with an "I liked it and it's worthy of space on your shelf" summary at the bottom in to muddy up the waters. Nobody cares. Just give it your strong 7 and be done with it. Leave the talk to people who know what they're doing.

xthexlo replied the topic: #278411 26 Jul 2018 09:07

Sagrilarus wrote: The long form reviews here on ThereWillBe.Games are getting the job done. There's enough content regarding the inner workings, the interesting conflictions and the shortcomings, to keep me invested. But that's not something you'll find on other sites where anyone can punch out a review in half an hour and typically do, because quantity is more valuable than quality.

I agree. Wholeheartedly.

Shellhead replied the topic: #278414 26 Jul 2018 09:23

For all practical purposes, we are talking about BGG here. Nobody else has a comparable database of reviews and ratings of games. I don't care about their overall ratings or rankings anymore, because of the past extreme bias in favor of euros that continues to corrupt newcomers. However, I do look at individual reviews and ratings that have comments, because those can provide useful insight. I usually ignore ratings of 10, even though I have given out some 10s, because those comments will probably be enthusiastic ravings instead of useful descriptions. Instead, I like to look at ratings in the 5 to 8 range, where they will likely acknowledge both good and bad points of the game. As Whoshim mentioned above, the negatives for another player might be positives for me. Sometimes I will read the comment threads for the reviews, in case the reviewer got an important rule wrong in a way that undermines the review. Another lingering bias at BGG is the hostility towards negative reviews, so I try to factor that in when I am reading the more neutral reviews. A neutral review might contain some useful criticism of the game, though muted to avoid offending BGGers.

Gary Sax replied the topic: #278416 26 Jul 2018 09:47

A lot of these issues can be handled with statistical models of expert and crowd rating run on large datasets, fwiw.

Overall, though, I agree with Legomancer on a philosophical level. Concern with a non-existent "objectivity" is often a Gamergate thing.

ubarose replied the topic: #278419 26 Jul 2018 10:31

The need to rate everything is driven by internet and search engine technology. The big online retailers (e.g., Amazon) and review sites use star ratings, so Google distinguishes a review of something from a product listing/blog/article/news item by whether the item being indexed has rating data on it. If you don't include scale metadata, Google assumes a 5-star scale. Therefore most blogging platforms and CMSs include 5-star rating tools.

It's now ubiquitous and inescapable. If you have reviews and want them indexed correctly by Google, you have to include a rating. If you don't want to custom-code scale metadata, you use a 5-star rating.

Shellhead replied the topic: #278422 26 Jul 2018 11:26

BGG has so many users now that individual quirks in rating methods should be smoothed out by the sheer volume of ratings. And they do have a useful metric for selecting a rating, based on willingness to play a given game again. But the site was gripped by euro bias around the time this hobby picked up momentum. Ratings, reviews, and regulars in the forums all conspired to instruct newcomers in the superiority of euros. There are some other distorting effects, like kickstarters getting 10s a year before they are finished, or people punishing Games Workshop games with 1s, just because GW aggressively cracks down on fan content that potentially infringes on their intellectual properties.

fightcitymayor replied the topic: #278424 26 Jul 2018 11:41

I just need to add: Those GamePro ratings faces were straight fire back in the day.

Cranberries replied the topic: #278425 26 Jul 2018 11:52

If a game gets good ratings from a wide swathe of my Geekbuddies, then it's usually a safe pick, assuming I have a group to play it (which I never, ever do).

SuperflyPete replied the topic: #278428 26 Jul 2018 13:02

Well, some people rate things they’ve never played, or rate things based on funky criteria.

I’ve got a guy rating STF as a 2 because he doesn’t like dexterity games, and he uses his ratings as a reminder to himself.

Until you can get everyone in the world on the same page (read: never), all of this is interesting banter, but banter nonetheless.

SuperflyPete replied the topic: #278429 26 Jul 2018 13:04

www.boardgamegeek.com/user/JasonSaastad

Check this out.

Frohike replied the topic: #278432 26 Jul 2018 13:14

My process: I sift through conversations on BGG or here and find/stumble across people who are well spoken, independent thinkers. Then I browse their profile and check their ratings and comments on BGG. If they compose informative and frank comments with their ratings, they get a Geekbuddy add. Comments are gold for me because most users feel like they have an audience there but don't need to put up with the 90% bullshit groupthink responses that "reviews" on BGG often elicit (especially if the review takes a darling game down a few pegs into the pretty-good-to-mediocre range). I like the candor of these comments, at least those composed by the people I follow. They're uncut by deference or some pretense of objectivity or even a systematic approach to evaluating a game.

From there, most games are a "geekbuddy analysis" away from my own evaluation of whether I'm really interested in playing or not.

It's essentially the same as aggregating your favorite reviewers but it also distills the written material down to what's immediately important or relevant to the reviewer.

As for aggregate ratings... those provide nearly zero information for me.

Gary Sax replied the topic: #278433 26 Jul 2018 13:21

100% the main honest portion of BGG is comments. Good point. Not getting trolled endlessly with "prove this!" from fans, the way a review does, means those comments are genuine.

WadeMonnig replied the topic: #278449 26 Jul 2018 16:16

One thing not addressed in this article: the tendency of someone reading any rating to immediately convert it to a school grading system in their head. Anything that rates less than a "70" when converted is an F.

SuperflyPete replied the topic: #278456 26 Jul 2018 17:04

superflycircus.com/2012/07/petes-persona...phy-on-game-ratings/

I forgot I wrote this.

Erik Twice replied the topic: #278467 26 Jul 2018 17:58

I actually like reviews with a score attached. And I actually like them for some of the reasons most critics hate them.

I like people being able to tell, at a glance, what I think of a game. Some critics dislike this because it sets an anchor in the minds of some readers. You know, the kind that will think "Oh, he's wrong" before they have read a single word. Well, I don't think these people would like me very much anyway, and I think the people who would would benefit greatly from knowing, in unequivocal terms, what my opinion on a game is.

Take, for example, my review of This War of Mine. If I simply listed it on my site, most people would ignore it, since there's no shortage of info on the topic and it's a fairly old game. Most would assume my views on it matched everyone else's and would not investigate further. But if I add the score to it, people will know my opinion is not the same as everyone else's and may read the review.

And so far, nobody, absolutely nobody has taken issue with it after reading it. And those who would have, well, I don't think they are the kind of audience I want to have.

This is important to me because I tend to write about obscure games and feel very passionate about them. I want people to wonder, hey, how come this game I've never heard about is rated so highly? What's so special about this "Netrunner" thing, to deserve such a high rating? What are all those games that start with "18" and get such high ratings? That kind of stuff. It's just useful.

Because, well. Most people don't know. Most people only know about the games they see in stores, and they look at games in the only way they have been told is possible. And I want to change that, even if it's just a little bit.

southernman replied the topic: #278487 27 Jul 2018 07:13

SuperflyTNT wrote: superflycircus.com/2012/07/petes-persona...phy-on-game-ratings/

I forgot I wrote this.

Well I'm glad I didn't get any games based just on your scores as if I saw a 7.5 - 8 rating (from someone who has similar gaming tastes) that would be a definite buy if I saw it for a reasonable price

southernman replied the topic: #278488 27 Jul 2018 07:32

If anyone treats people's game ratings as anything other than purely subjective (just like music and film and art and ...), then they are in for a lot of disappointment and hurt when they start spending their disposable income. In one of my gaming groups most of them are dedicated euro-gamers, and I would not even bother looking at their ratings for games I was interested in (and I have the empirical evidence of not being able to get probably 90% of my games played with them, and the few I do get played get a mixed reception).

As with most sensible people, I listen to advice from people with similar interests, thoughts or outlooks, so to get a feeling about a game I have seen or become interested in, it is the ratings and comments from my 'geekbuddies' list (which many of you here are on) that will put me, or not, into 'I may buy this' mode. There will probably be a bit of reading of just the comments from people on that game database site, to see if any red flags come up, and maybe a skim of the rules to get a gist of the structure and mechanics, but this will usually only be important if few of my geekbuddies have played it.

And I do get a good laugh out of people/reviewers who get a bit snotty when someone disagrees with their 'opinion', either insecurity or they just haven't got it that everyone is different in this world - and that's what makes it so interesting.

hotseatgames replied the topic: #278501 27 Jul 2018 09:45

Great article, as always!

For me, a review's value is generally based on the reviewer. I need to read / see several reviews by that person, hopefully juxtaposing my own experiences with those games. Only then can I reasonably assume that if reviewer X likes a game, I will also like that game.

SuperflyPete replied the topic: #278503 27 Jul 2018 09:52

Southernman wrote:
SuperflyTNT wrote: superflycircus.com/2012/07/petes-persona...phy-on-game-ratings/

I forgot I wrote this.

Well I'm glad I didn't get any games based just on your scores as if I saw a 7.5 - 8 rating (from someone who has similar gaming tastes) that would be a definite buy if I saw it for a reasonable price

To be fair, the Circus always had clear metrics:
superflycircus.com/index-of-articles-and-attractions/

0.00 – 1.00: Games That Should Come With Suicide Kits

1.00 – 2.75: Failures, In A Great Many Ways

2.75 – 4.00: Sailing The Sea Of Mediocrity

4.00 – 4.50: Games That Everyone Should Play, At Least Once

4.50 – 5.00: The Most Fun You Can Have Without Sex Or Drugs

Sevej replied the topic: #278589 27 Jul 2018 19:13

For me there are two kind of ratings.

The first is the quick rating. When I'm too lazy to read an article, I just look at the rating. This is especially useful when I do research across a whole lot of reviewers. For this one, fewer items is better. One number is excellent.

The second is when I need to see what exactly a game is without reading the review. I've never seen this done, because I don't think it's viable. Most of the time a paragraph does a better job at this. But if you like to play with numbers, a paragraph won't do. Sometimes I just like using numbers. The thing is, the more I think about it, the worse it gets. Once, when I thought about a rating system (for fun) to value my collection, I ended up with an archaic system that was very abstract and imbalanced. For example, the last time I thought about it (again, for fun), I ended up with something like: freedom (is the player railroaded?), social (does the game rely more on the players?), interaction (can you screw/help people?), interface (are you playing the game, or the interface?) and other useless nonsense, while I was trying to make sense out of it.

BaronDonut replied the topic: #278706 29 Jul 2018 23:34

This was a fascinating article, and I've really dug the conversation happening here. I wanted to jump in and offer my two cents.* Before we can really evaluate the efficacy of a particular rating system, I think it's important to establish what exactly reviews are for. The least interesting reviews, to me, are reviews that are primarily concerned with whether or not a game is worth your money. I also think these are like 90% of the game reviews out there, and the rating systems that are developed in pursuit of this goal tend to exist mostly to rank, prioritize, and evaluate games in some sort of objective hierarchy as a necessary means of making "smart purchases." As other folks have mentioned, this can only lead to disappointment, because it's fundamentally just not how human beings interact with art.

And I do think that games are art, and that even the bad ones have something interesting to say about people and the way we think. But the joy and beauty (and irritation and frustration) of art is that it is subjective at its core. Every time one of us comes into contact with any kind of art, we create a unique conversation, engaging with what the art is presenting and bringing into dialogue with it all our experiences, ideas, influences, and biases. So while we might be talking about the same game/painting/novel/film, we are probably having very different experiences. This is great! Art helps us bridge the gaps between each other and gives us common ground from which we can explore ideas and experiences. It's also tough, because it means we'll never be on entirely the same page about the thing in question. In any case, the nature of this conversation makes any kind of "objective" critique not just irrelevant, but impossible, especially in a hobby that is focused not simply on one person's experience, but the unique alchemy that happens when you play a game with a bunch of different human beings at the same time.

I think the ideal kind of criticism is based on conveying personal experience and exploring the ideas and mechanics that drive a game (as well as the ideas and feelings these mechanics produce). Every time we play a game we briefly step into a parallel world, one in which the normal processes of our perception and problem-solving are altered and directed in new and interesting ways. How does this feel? How does this change how we think? What kind of pleasure, excitement, boredom, or frustration does this system create? There's a wealth of fascinating shit to explore in each game we play, and most reviews barely scratch the surface in their rush to tell you whether this box is worth 60 bucks or not.

I don't wanna talk too much shit about "buyer's guide" type reviews, though, because even if they're mostly uninteresting pieces of writing, they're still kind of necessary in the game world, I think? Because unlike movies or books or video games, it's not like there's a wealth of information out there, and the high cost of a game means that having limited resources is a very real problem. A person with a decent income can probably see most of the "best movies" that populate year-end lists without bankrupting themselves. Even a dedicated gamer can only play a fraction of what's released, and folks are rightfully interested in making sure that fraction is as fun or interesting as possible.

So what to do? I tend to compromise when I write reviews, balancing my own associative and experiential thinking with more "hard" details about a game that might prove useful to folks wondering if it's a thing worthy of their time. Star reviews are another compromise--they seem to be essential to the review's discoverability, so I'll put a rating on there (usually a torturous process) in hopes that it will be the first word of a conversation and not the last one. I think games criticism is still in its infancy, and I'm interested in the conventions, style, and function it develops as it matures.

*Note: I'm an insufferable English teacher who writes about games for this site (and therefore thinks that Writing About Games is Important), so this screed is written through the lens of a navel-gazing humanist who could probably use the rigor of the hard sciences. Also, it's worth noting that my first widely published piece of writing a decade-plus ago was explicitly about how games are not art so, you know, I reserve the right to change my ideas.