That's Not How Playtesting Works

blaisdell105
Aug 15, 2019

I feel like I've had to break out the phrase "That's not how playtesting works" a little too frequently, so let's take a brief (ha) look at how it actually does work, or at the very least, at one of my experiences with it. Personally, I love being in playtests. It’s like a free game design class, and like most classes, it works better if you pay attention. For this little dive, I'm going to look at a wave 2 playtest for Malifaux second edition for a few reasons:

1: It was an open beta, so no NDAs were involved

2: I remember parts of it pretty vividly

3: It actually covers a lot of bases

4: Third edition has been released, so I guess semi-relevant?

So, in Wave 2 of the Malifaux open beta for second edition, they had one of my personal favorite models from first edition, Nekima. For our purposes, all you really need to know about Nekima is that the model was fairly large and impressive by Malifaux standards, she had the highest printed point cost in the game at 13pts, and she was predominantly a melee beater. She also had the Nephilim subtype, which will come up a little later. The first version of Nekima, like the first version of a lot of the models in this wave, was comically overpowered. I’m not a huge fan of design teams using this practice, because it basically means you lose the first week of the playtest learning things you already know, i.e. everything is overpowered as hell, and there isn't even enough context from all the other overpowered stuff to start figuring out what’s wrong. It’s not uncommon for devs to start high and work their way down, but this was definitely on the extreme end of the scale.

The second iteration definitely looked tamer, but she had a really odd feature in that her damage track was 3/4/8. I won’t waste time on how attacks work in Malifaux, but unless a model has a buff, it’s going to see the first 2 numbers significantly more often than the third (easily over 90% of the time). The thing is, that 8 damage was one of the highest damage numbers in the game, so, in what will surprise zero people who know me and how I play games, my first reflex was “Is there a way to reliably hit the 8?” It turns out there actually was: a combo of 2 upgrade cards, Obsidian Talons and Pact. Calling these 2 upgrades throwaway cards would be pretty charitable, as few players even remembered they existed long enough to throw them away. However, if you spent a Soulstone (a consumable resource in the game), this combo basically guaranteed that you could always drop a card from your hand for damage, netting you max damage. Combined with some ways to stack your hand with cards, that meant doing 8 damage with basically every swing for at least the first 2 turns, maybe 3. 8 damage will kill roughly half of the models in the game in a single attack and almost all of the others in 2, and she had 3 attacks. Now add one of the masters, Lilith, who could essentially teleport Nekima into a position where she could attack anything on the table, including models still in their deployment zone, on turn 1. Obsidian Talons was specifically keyed to the Nephilim subtype and Lilith was the only Nephilim master, so the likelihood of this combination going unnoticed was pretty low.

Naturally, this was identified and immediately corrected…well, no, because that’s not actually how this works. My first step was to raise concerns about this combo. Unfortunately, basically zero of the other playtesters were sold on it being a problem, with counter-arguments ranging from it being too resource-intensive to it being too hard to pull off, but the developers were at least interested enough to want to see some real data. So I put my money where my mouth was and played it against some locals who were also doing beta things. Not only was it pretty dumb, but it was very repeatable, and an opponent’s knowledge of the trick didn’t stop it from working. The “resource-intensive” argument also got crushed, because if you’re tabling your opponent in 3 turns, it doesn’t matter how many resources you spent on turns 1-2. The devs took this data to heart, especially as I was definitely putting more table time in with the model than anybody else, and they removed the element of her weapon that allowed it to work with Obsidian Talons…and nobody has ever seen or heard from that upgrade again, but I digress. Pact was basically just an insurance policy, and without Obsidian Talons it had nothing to insure. Problem solved! Another good day in…of course we’re not done yet.

A second problem rapidly became apparent: she was now an absolute dumpster fire of a model. She just didn’t hit hard enough when she couldn’t land that 8 damage reliably, and she wasn’t particularly durable either. After some thought, I came up with a solution that I was absolutely convinced was the right way to go: change her damage track to 4/5/6. The big selling point here was that hitting a 4 most of the time, with semi-common 5s, brought her damage into the range it needed to be for her cost, and pulling the max damage down to 6 prevented possible future issues with Obsidian Talons. There was also precedent for the damage track, not just in the game, but in the faction and with the Nephilim keyword: the Mature Nephilim, which cost 11pts. The Mature Nephilim could also potentially get the same 3 attacks Nekima could; it just required more setup. So naturally everybody saw the light…yeah, no. My favorite quote from this entire episode came here, when someone responded to this suggestion with, “If they do this, it won't be the Neverborn faction, it will be the Nekima faction with Neverborn allies.” What a reasoned and non-hyperbolic response. Anyway, the devs sided with me, her damage track changed in the next update, and she felt mostly done. My only comment was that some very minor defensive tech was probably in order, but the model seemed to be doing what a 13pt model should do to be worth its cost without being overpowered.
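As a rough illustration of why the flatter track helps, here is a back-of-the-envelope comparison of expected damage per damaging hit. This is a simplified sketch, not anything from the playtest: p_w, p_m and p_s are placeholder symbols for the chance of landing the weak, moderate and severe number (assumed to sum to 1), and it ignores misses, buffs, cheating fate and every other Malifaux rule.

```latex
% Hypothetical expected damage per hit; p_w + p_m + p_s = 1 is assumed
\begin{aligned}
E_{3/4/8} &= 3p_w + 4p_m + 8p_s \\
E_{4/5/6} &= 4p_w + 5p_m + 6p_s \\
E_{4/5/6} - E_{3/4/8} &= p_w + p_m - 2p_s = (1 - p_s) - 2p_s = 1 - 3p_s
\end{aligned}
```

The 4/5/6 track comes out ahead whenever p_s < 1/3. Taking the "easily over 90%" figure at face value (p_s ≤ 0.1), the gain is at least 0.7 damage per hit, or roughly 2 extra damage across three hits. The flip side is the Obsidian Talons + Pact case: with a guaranteed severe (p_s = 1), the old track hits for 8 against 6, which is exactly the ceiling the change lowers.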

Then the final rules came out and I got to open the card pack…wait, what? During the entire playtest, she was Defense 4 (a little low) and Willpower 6 (high average), but the printed card with the released model had her at Defense 5 (average) and Willpower 7 (high), and it wasn’t a typo. Now, it’s not uncommon for small things to change between the last version the playtesters see and the release version, but normally these are minor point changes, wording changes for clarity, or maybe rolling a model back to a previous state. I can’t recall another instance where a model was released with stats it never had at any point in the playtest. I’m not sure if there was internal playtesting involved or if this was just a WAG (wild *** guess), but knowing Wyrd Miniatures, probably the latter.

A lot of this is atypical, i.e. you likely will not have as much developer interaction in an open playtest as I did in this instance, but there are a lot of things to unpack that apply across other playtests. For example, would anyone have caught the Obsidian Talons + Pact Nekima missile? I was in a unique position in that I had played Nekima and Lilith a lot in first edition, where you'd just launch Nekima at the enemy, so it was a playstyle I was comfortable with, but most players were barely using Lilith, and when they were, they weren't flexing her entire kit. Another interesting point is that a simple correction making Obsidian Talons read “Non-Master, non-Henchman” would’ve fixed the whole problem and future-proofed the interaction, but Obsidian Talons was a wave 1 card that had already been released, and the devs were very reluctant to errata or rerelease it. They could also have corrected Lilith’s teleport trick, which didn’t require line of sight, but again, she was a wave 1 model that had already been released. That ship had sailed, put into port at various tourist traps and returned with tasteless knickknacks and diarrhea. All of the other parts of the combo were, for the purposes of this playtest, set in stone, so the change had to come on Nekima’s end. There will always be factors in any playtest that you can’t change, and sometimes even ones that the people above you can’t change, especially when you’re talking about models that have already been released or game elements that already have finalized physical models.

Nekima also had a weird interaction with her "Birthright" ability that could technically lead to Nekima being in charge of the enemy crew (and your crew having no leader). This is a fun example because it was the kind of interaction that nobody involved in the playtest could come up with a compelling reason to actually use. The devs essentially responded with, "We’re not sure if it’s broken, but it could create really odd rules interactions and we see no reason to keep it in the game." So some text was added to prevent this interaction.

It’s also fascinating that, despite all of the playtest hours I had on this one, I had 0 hours playing the model with its released stats. This is definitely atypical, but it’s not unusual at all for the version that ends up being released to spend only a short amount of time with the playtesters (typically the last week or 2 are when things are pretty close to done). In that respect, both the playtesters and the devs are relying on data from previous versions for their overall conclusions, so results can get skewed, especially if something spends a lot of time in a state of flux. This is the most common mistake behind the question “How could playtesting have missed that?” because it assumes that the element in question was not only present for the entire playtest cycle, but present in its current form the whole time. That assumption could easily be wrong (and in fact likely is). For example, if you asked me how I missed Willpower 7 on Nekima being broken (it wasn’t, but as an example), I’d just throw up my hands, because I never played her with that stat in the playtest.

I also found out quickly that there’s a bit of an art to giving feedback. Even if a player can identify a problem, if they aren’t able to clearly convey both the problem and its root causes to the developers, it could still end up getting missed, or the wrong element could get “fixed”. Contrary to what many will say, you don’t need to have a solution ready to identify a problem, but you really do have to be able to articulate the problem, and preferably identify its root causes, so that someone who may be able to solve it has the information to do so. I didn’t propose the 4/5/6 damage track on a hunch; I looked at models from wave 1 that were at least in the same cost ballpark and identified a feature that could solve an existing problem. To use another example, from the Garryth2 CID for Warmachine: Garryth had an issue where his control range was too small for his spells and abilities. The normal way to increase a model's control range is to increase its focus stat, but focus also reflects spellcasting ability, so the devs were pretty adamant it wasn’t going to change. Instead, I proposed modifying one of his rules that existed nowhere else in the game to increase his control area when he aimed. This maintained the sniper feel of the character and solved a lot of problems while staying true to the general intent and theme of the designers.

We also see another fun problem in playtesting: getting people to agree is hard. I had to get multiple games in to establish “no really, this combo is broken” before I could get any movement on getting it changed. Later, the biggest obstacle to making Nekima playable was other playtesters being hyperbolic about the power level of a change to her damage track. I’m pretty sure the only reason I won that fight was that I was able to identify the original problem and articulate it quickly and effectively. It's important to remember that there's a large gradient of power levels, so you’ll get a lot of disagreement on exactly where something sits on the scale, and that minor disagreement can make the difference between “it’s fine” and “buff/nerf required”.

The playtest process is not a straight line from initial design to finished model, and lots of the bumps along the way can lead to substantial problems. I hope this has been helpful in at least giving players a glimpse into a process that is poorly understood by most gamers.
