Ray Tapio is da Man!

Tater · Apr 23, 2008

fwheel73 said:
I think you are right.... it is "currently" a small return.... but so what? In the design/playtest system it takes too much time to come up with definitely balanced scenarios. The designer makes a choice in the process to stop playtesting and if he is correct..... then 30 or 40 or 50 scenarios into the playing/ROAR accounting of a single scenarios we will know if it is approaching what the designer thought-- balanced.

If it is not balanced.... what are you going to do, call Scenarios Busters?:laugh::laugh: I am just going to go on to the next scenario that I purchased and see how it goes. I'll try not to get too emotionally involved in the winning and loosing and just have some fun. ASL is good for you if you play it.... just like Guinness is good for you... if you drink it.

You are missing the point it seems to me.

I think there may be a lot more players who would try their hand at designing scenarios but they don't for two big reasons.
1) The amount of historical accuracy demanded by publishers.
2) The amount of playtesting demanded by publishers.

As I see it those two criteria are very steep but don't seem to produce any more balanced or fun scenarios than otherwise. Personally I have been working on compiling data for some scenario ideas...they languish because I have neither the time nor energy to invest to the level currently required by any publishers relative to historicity or playtest. I bet I am not alone.

James Taylor · Apr 23, 2008

Bret Hildebran said:
Bottom line - I'm strongly in the camp of balancing for the top players and encouraging the rest of the ASL world to improve their play towards that ideal. Not to mention I don't really think balancing for bad play & ideal play is possible and balancing solely for bad play is just a terrible idea, plus is it really possible to predict all the forms of bad play and balance for that? But really, who do you want telling you your scenario is a 3-Legged-Unbalanced-Barker - Steve Pleva or some newbie playing his 3rd game?

Agreed. Also feel that input from "Pleva-Bendis" playtest may provide designer with 99% of info they need to complete design.

JT

Stardragon99 · Apr 23, 2008

Pitman said:
That is not an assumption. Most designers try to take skill levels into account when evaluating playtests, I think.

It actually is an assumption because you have no way of accurately validating skill levels. I think that you are correct that a designer would intuitively take into account skill levels in their assessement but even a 10-15% difference in skill levels can be overcome by a slight variation in tactics or dice results.

With respect to the required number of playtests, I work in the casino industry and i can tell you that there is no way that statistical dice abnormalities can be accounted for over a small number of playtests. Small sample numbers and their accompanying volatility are the enemy of casinos. Only over very significant sample sizes is the volatility overcome and the sample approach the population statistic - volume is a necessity.

2 Bit Bill · Apr 23, 2008

Stardragon99 said:
It actually is an assumption because you have no way of accurately validating skill levels. I think that you are correct that a designer would intuitively take into account skill levels in their assessement but even a 10-15% difference in skill levels can be overcome by a slight variation in tactics or dice results.

With respect to the required number of playtests, I work in the casino industry and i can tell you that there is no way that statistical dice abnormalities can be accounted for over a small number of playtests. Small sample numbers and their accompanying volatility are the enemy of casinos. Only over very significant sample sizes is the volatility overcome and the sample approach the population statistic - volume is a necessity.

Sometimes playtesters will return to the statistical anomoly and replay it.

Pitman · Apr 23, 2008

Stardragon99 said:
It actually is an assumption because you have no way of accurately validating skill levels. I think that you are correct that a designer would intuitively take into account skill levels in their assessement but even a 10-15% difference in skill levels can be overcome by a slight variation in tactics or dice results.

It usually is possible to validate skill levels to a useful degree. If you are involved in the playtest yourself, you can learn first hand the skill level of your opponent. Similarly, if you witness a playtest between two other people, you can judge the relative skill levels. And if it is a completely blind playtest between people you have played, you have an idea as well. And since even in blind playtests between people you haven't played, it is common for players themselves to describe their relative skill levels in playtest reports, you usually have an idea there, too.

Buck K · Apr 23, 2008

Great posts.

Some thoughts on the last 80 or so posts.

Playtesting is a time commitment, which can make it feel like work, especially playing rules/nationalities/arms that aren't your favorite. Where playing a scenario you pick is more fun. Jack's earlier post regarding playtesters lack of feedback, is a huge demotivator for continuing making that time commitment. I enjoy playtesting, but as a way of giving back to the hobby and being a part of the project, hopefully having a little fun along the way.

Number of playtests needed definitely vary for MANY reasons. I've found interesting/unusual/experimental scenarios needing a lot of playings, not because they are poor designs but complicated designs, where each little tweak really creates a whole new scenario. Some simple scenarios, can be playtested in a few sittings, with a variety of players, if the designer got it right before submitting. The scenario that I worked on last has probably been played 8-10 times and still needs a couple more. It's a fun scenario, with a challenging VC, OOB, setup and map layout. The designer should be complimented for a clever albeit hard to balance scenario.

ROAR numbers especially for new scenarios really aren't that useful for condemning a design. I post on ROAR, and play new scenarios, but me getting beat on a regular basis by some very good players isn't really the scenario designer or playtesters fault!??! I see some stats quoted where my games are 25% of the total numbers!?!? Talk about a skewed sample!!

Balancing for competent play is essential, I've actually argued at one point that balancing for SK should be for incompetent play, but that is really not practical. Difficult scenario for a non top player will also skew ROAR significantly. You don't have to look further than SK scenarios, Retaking Vierville and Ambitious Assault to see that effect. Balanced Scenarios?? Sure, probably. Unbalanced ROAR should be expected due to the tough-for-novice effect? Mark's comment regarding attacking with armor is definitely true. Tough for a newbie for sure!! So are alot of other things. ROAR is useful, to see if a scenario is a classic (135-133), or if it is unwinnable (15-0). If ROAR is 30-1, sure a problem, but 18-7, and especially if 7 is the attacker or has some nuance that a non top player won't use effectively, then it could very well be perfectly balanced.

Sure there could be more scenarios if playtesting and history isn't required. But publishers seem to think that history and reasonable balance is essential to a product, and don't feel there is a need to lower the bar to get more scenarios published. I for one will never play more a few percentages points of the available scenarios, so I don't have any interest in a non-historical or non-balanced scenario. Seems like we have enough high quality designers who are satisfying the player community needs for scenarios!!

I can't see how more playtesting can't help a scenario. Maybe there will be diminishing returns with playtest results especially if from the same players, but no improvement doesn't seem quite plausible.

I've enjoyed reading the comments from some of the top playtesters, and scenario designers, much more qualified to comment than I probably am!!

Buck.

RobZagnut · Apr 23, 2008

>I agree that Few Returned is a fine scenario pack with acceptable rankings. I helped playtest it. I bought it. Mark designed some great, fun scenarios that I've enjoyed playing very much. I don't mean to disparage it in any way, or pick on Mark, but I must use it as an example because it totally rebuts Marks very own assertions about playtesting!

>He claims that more playtesting = more balanced scenarios. Yet his own pack belies this. If I may tease him a bit, one can only guess as to how awful it would have been if it hadn't been playtested a billion times. Perhaps Mark isn't a good enough designer to produce quality scenarios unless they've been playtested 60 times each. Work on it Mark.

I don't think his pack belies aything, which is why I made the 'acceptable' comment. The sampling size is currently too small, so the numbers that ROAR his listed for Few Returned is in the acceptable range. You can't say that any of those scenarios are unbalanced yet.

>I've seen David Lamb come up with a balanced scenario after only two or three playtests.

I don't buy this one bit. Two or three playtests for a single scenario is WAY TOO SMALL. There needs to be blind playtesting from different types of groups before you can declare a scenario balanced. Your group might play a certain way and you play a certain way against different players, because you are used to playing them. Different players and groups pick up nuances that some groups would never see.

Case in point - The Schwerpunkt balls to the walls attack style. It took players maybe one or two Schwerpunkt scenario packs to figure out that when playing Schwerpunkt scenarios that you needed to drop everything and rush like hell as the attacker. There is abolutely no extra time in a Schwepunkt scenario.

That scenario that is playtested twice might be balanced for Dave Lamb's group or him and his common opponent, but not for everyone.

I think everyone who has been to a tournament where the same scenarios are played in the same round are always amused to see that NONE of the matches look the same. Some are shocked to see that the setups are so completely different that it doesn't look like they're playing the same scenario.

I can't see how you can model this with only two playtests.

Brien Martin · Apr 23, 2008

Much as it pains my fingers to type this, and my brain to actually think it, I'm with Mark on this one.

I've never designed an ASL scenario for publication, but I have designed games, and have been involved in playtesting for games.

As a designer, I beg for my playtesters to "break the system" ... I want them to do goofy things (I design sports games) like playing a little-used, but high-scoring (on a per-minute-played basis) player as a starter ... to see what kinds of system abuses we can prevent.

I want people to playtest the game until they can't stand it. I want people to say, "Hey, I played 50 games of your indoor soccer design, and the scores are within the averages and ranges you'd expect." I don't want, "Played five games, scores seem reasonable", only to have my more dedicated playtesters tell me that, after 100 playings, goals per game are about 2 goals per game too low.

I understand that at some point, the designer does have to call the playtest to an end. But I do that only when I believe I have as much anecdotal and statistical evidence from beta testing to validate my own findings from alpha testing. In ASL terms, that means something vastly different from what it does for sports games ... but it doesn't change the fact that a designer armed with a notebook full of data has more to hang his final call on than someone with a couple of letters and an e-mail from playtesters.

Do the guys with the notebook get it right 100% of the time? No. But neither do the guys with three sheets of paper. Game design, scenario design ... they all need the same things: good data going into the design, the hard work of the designer to mold the data into a workable design, the hard work of the playtesters, and ... despite Glenn's protestations to the contrary ... some luck ... we do sometimes stumble upon a great solution to a puzzle during design.

Brien

Gunner Scott · Apr 23, 2008

Obviously we all have different ways of playtesting a products, for some, 5 or 6 playings of a scenarios seems fine, for other 40 or 60 playings seems fine, I guess it comes down to how dedicated you are as both a playtester and a designer, more power to ya. I think what is all comes down too is weather the scenario is a crappy design or good design. Usually, really crappy design will take more playtesting, poorly thought VC's, SSR's, Set-up's and so on where as a well thought out design will snap into place due to well thought out SSR's, VC's ect ect.

The Toughest scenarios to balance out are the SP style scenarios, small OB's leave little room for mistakes, so usually the designer and playtesters will have to fiddle with both turns and VC's. Also scenarios with really quirkry SSR's and VC's can be a real pain in the ars to balance, those type of scenarios usually require alot of playtesting to pan out, this was usually the case with pitmans Few Returned pack.

Anyway, play some ASL and enjoy the fruits of your fellow ASL'ers blood, sweat and tears when working on such Frankenstiens.

Scott

Will Fleming · Apr 24, 2008

Bret Hildebran said:
Once we get to large enough sample sizes you hope that effect goes away, but it's merely a hope and not a certainty. Cool would be being able to pull out only those matchups between 2 top players to check on balance. I doubt we have enough samples to make that practical, if it were possible to mine the data that tightly anyway - WeASL likely could make it happen eventually I'd guess given enough data points...

I am actually kicking around some ideas and might float them here if I get some time to actually work on things. One would be an optional 'skill' level that would put more weight on games played by players with a higher rating. A different thought is some kind of 'balance' based upon the player's relative skills.

#1 would just weight games where top notch players participated more. ie. both players were 1700+, so it counts 1.5. Both above 1800, so it counts 1.7 (or whatever)

#2 might be easier with an example. Pleva beats me in scenario A, but the WeASL recognizes this and doesn't over rate it as pro Pleva side. It is different in that there is a difference in our relative ratings. It just says well, Pleva should have won since Will sucks

I think #1 helps address my issue directly while #2 would be good for the frequent opponents where one is more experienced/better.

Glennbo · Apr 24, 2008

Pitman said:
I don't know why this is becoming a Mark vs Glenn thing, except that Glenn seems to want to make it so.

HA! I was goading you. You have been goaded! :bite:

'Ol Fezziwig · Apr 24, 2008

Glennbo said:
HA! I was goating you. You have been goated! :bite:

Why would you goat Mark?

Will Fleming · Apr 24, 2008

Is that something like TB'ing? Do I want to know?

MLaPanzer · Apr 24, 2008

'Ol Fezziwig said:
Why would you goat Mark?

Because he ran out of sheep!:laugh:

daveramsey · Apr 24, 2008

Great discussion.

Ultimately for all its wonderful value, Roar and other result tracking tools may harm the majority of scenarios giving the focus on win/loss records whilst over promoting the most balanced ones.

IMO, the deinition of "balanced" comes down to a simple question:

Do both players feel like they would have a chance of winning the game, if they were to play it again?

In the current model I agree that balance should assume a good level of play (higher than the norm) - but in doing so, you may as well come straight out and discount any playings from the average player. Newbies need not apply, unless to challenge poor grammar, or questionable SSRs.

I will eventually get round to publishing a scenario pack. The focus will be on fun without strict historical requirements. The next topic of discussion should be how we'll define Fun!

Jazz · Apr 24, 2008

daveramsey said:
The focus will be on fun without strict historical requirements. The next topic of discussion should be how we'll define Fun!

Which brings us back around to the "goat" vs "sheep" discussion of the previous postings.....

'Ol Fezziwig · Apr 24, 2008

Free editorial work...

Jazz said:
Which brings us back around the the "goat" vs "sheep" discussion of the previous postings.....

Errata to post: replace "back" with "baaack".

fwheel73 · Apr 24, 2008

daveramsey said:
Great discussion.

Ultimately for all its wonderful value, Roar and other result tracking tools may harm the majority of scenarios giving the focus on win/loss records whilst over promoting the most balanced ones.

IMO, the deinition of "balanced" comes down to a simple question:

Do both players feel like they would have a chance of winning the game, if they were to play it again?

In the current model I agree that balance should assume a good level of play (higher than the norm) - but in doing so, you may as well come straight out and discount any playings from the average player. Newbies need not apply, unless to challenge poor grammar, or questionable SSRs.

I will eventually get round to publishing a scenario pack. The focus will be on fun without strict historical requirements. The next topic of discussion should be how we'll define Fun!

Dave,
I agree... good discussion.

I disagree a bit about your view of the value of ROAR. I think ROAR does indicate Balance with W/L records being in a particular range-- 50 to XX% (choose 60 to 70%)--after reaching some total played level (now 600 to 800 balanced ones w/10+ playings). There is another scoring with most scenerios at ROAR that uses a "recommendation scale" of 9=Must play to 1=Candyland instead!. This scoring seems to answer to question of whether to play it again or not--most of the scenarios are voted to play again.

The idea of discounting players based on some assumed level of competence in the game maybe taking ourselves a little too seriously. One problem with ROAR is that all players do not take the time to input their games. Another problem, or maybe a the answer to your competence comment, is that the less experienced players may "self select" themselves in their early years as not wanting to have a boat load of losses on ROAR so they delay entering themselves.

There are probably 4,000+ ASL scenarios available today and any new system to reevaluate or "call balanced" something else or allow balance to be determined by the "best players" is looking down a long pipe-- pipe dream. If you are talking about the playtesting of scenerios, then limiting to skilled/more skilled players is certainly a correct view for designers.

I trust I have understood your comments.

Best regards,:salute:
John

spwhites · Apr 25, 2008

Poor Ray; this thread has gone a far cry from singing his praises. This is not going to help, but here goes...
There is a fundamental issue with recording the results of any test as pass/fail. It makes the required sample sizes grow dramatically when trying to establish statistical power of any hypothesis testing. Yes, I stayed at a Holiday Inn Express last night...
If I were a scenario designer and looking for more information from scenario playtests, I would set up a scale of results and not rely on win/loss results. Anyways, here's a past post that might add something to discussions around balance.
___________________________________________
This is a great stats discussion...
Having used statistics in a practical sense (many apologies to the academic crowd) as a quality engineer (QE) for several years, here are a few things to keep in mind BEFORE you establish how many samples are required (10, 50, 100, 300...).

For instance, start with this one:
What difference are you trying to see? In other words, if you are measuring individual's weights to see if they belong to a population of jockeys or to a population of NFL lineman, then you will need very few samples. If you are trying to determine if they belong to a population with a median of 190# or 210#, then that will increase the number of samples required. So, what does this mean to ASL? Well, if you are a designer and you need to tell if your scenario is extremely out of balance, 10 samples may be more than enough. If you want a perfectly balanced scenario that is going to determine the tourney champion, that's another story.

You must also ask yourself how right you want to be. What statistical power do you require of the answer? This is the chance that you might get an answer that a scenario is balanced, when in fact it really is not. If you can't afford much risk (making medical devices, for example), then you must set your limits accordingly.

One final thought; probably the most important thing to note is how results are measured. This causes the number of samples required to go up astronomically! The first thing they teach new QEs is to use continuous, not discrete, data. If you can measure the results, for example, on a scale of 1 to 100, then the number of measurements required would be much less. That would make it much easier to see the difference between a balanced and unbalanced scenario. Getting results in discrete fashion (German won/loss) doesn't give you much information about the outcome. Now, off my soapbox and back to playing...
______________________________
:salute:

fwheel73 · Apr 25, 2008

spwhites said:
Poor Ray; this thread has gone a far cry from singing his praises. This is not going to help, but here goes...
[big snip]

Now, off my soapbox and back to playing...
______________________________
:salute:

Steve,
Completely off topic

..... but I noticed that my previous message was my #269 and your message was also your #269..... hmm... does that help in any way in adding balance to this discussion? :laugh::laugh:

Best regards,:salute:
John

Ray Tapio is da Man!

Elder Member

I love women with brains

Member

комиссар рыба

Forum Guru

Member

Elder Member

Panthera oncia

Forum Guru

Senior Member

Elder Member

Repressed Dissident

Senior Member

Senior Member

Elder Member

Inactive

Repressed Dissident

Member

Member

Member