So, I guess tl;dr: yes, the save-rolling mechanism is fair.
Well, let's go with "The save-rolling mechanism is assumed to be fair, unless you have some solid statistical indication that it isn't (emphasis SOLID)."
Technically all RNG is simulated, as there are no truly random phenomena.
I think most (not all) physicists would disagree.
Physicists would probably phrase it more carefully. "Randomness" is a very tricky thing, actually. Many things we casually call "random" are simply outcomes of processes we cannot, for various reasons, accurately predict (yet?). But that doesn't mean they happen on a whim, or based on no factors at all. At least according to some models.
But after a certain point it becomes mostly a philosophical discussion.
@joluv: Hard to say. In physics I've come across lots of views both for and against. Some people think that on the quantum level (more often) and at the level of epiphenomena (less often), there is such a thing as randomness, entropy, and uncertainty--that the universe itself is in a semi-uncertain state. Others think that these are just illusions created by our own inability to fully understand how the world really works.
We do have an intelligence gap here. Our brains only evolved to handle Newtonian physics; quantum stuff is by nature counterintuitive.
It doesn't help that conclusions in quantum physics are expressed through complicated mathematical proofs that the laity can't understand--and which physicists themselves have difficulty translating into speech. Asking a physicist to explain the evidence for randomness using words is a bit like asking a writer to explain the appeal of Allen Ginsberg's Howl using a mathematical proof.
Can anyone tell me why 5000 is an indication and 50 or, indeed, 2 is not at all? I know that a bigger number gives more accuracy, but what makes you think a low number can be disregarded completely?
When the frequency of each possible value goes down (because there are more possible values), the total number of events you have to measure before every value is fairly represented grows rapidly.
Rolling a d20 gives 20 possible values, and to get statistical significance you need to roll far more times than the number of possible outcomes (which is 20, not 2: the success or failure of the save is based on the roll, but the roll won't be measured accurately unless the number of rolls is high enough to determine whether it is random).
Rolling a d20 50 times, you might roll a 6 five times and an 18 never. Rolling it 5000 times, you will probably get nearly equal counts for each face. Doing it with a d4 would require far fewer rolls, and a d100 a lot more.
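To make that concrete, here is a quick Python sketch (purely illustrative - it uses Python's stdlib RNG, not the game's):

```python
import random
from collections import Counter

rng = random.Random(42)  # seeded so the run is reproducible

def face_counts(n_rolls, sides=20):
    """Roll a fair die n_rolls times; return how often each face appeared."""
    counts = Counter(rng.randint(1, sides) for _ in range(n_rolls))
    return [counts.get(face, 0) for face in range(1, sides + 1)]

for n in (50, 5000):
    counts = face_counts(n)
    print(f"{n:>4} rolls: expected {n / 20:.0f} per face, "
          f"observed min {min(counts)}, max {max(counts)}")
```

With 50 rolls, one or more faces will often not appear at all; with 5000, every face typically lands within 10-15% of its expected count of 250.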
I don't doubt that higher sample sizes yield better accuracy; I only want to point out that even with small samples it is sometimes possible to draw accurate conclusions - "sometimes" being more than 50/50. But then, we go on more than the data in the test itself: e.g. I only need to see one crow to conclude that crows are encountered in my area, but seeing one tiger in my area probably just means it ran away from a zoo. This question isn't "philosophical" in the sense of being empty nonsense, though. If we allow ourselves to use our background knowledge and intuition when we approach a system such as, say, the AD&D rules in BG, we may do better than we do now, even without a very big data bank.
Sometimes it seems like low rolls are more common than high ones, and later, vice versa - like the RNG gets stuck in a 'low' or 'high' range. I remember needing to roll a single 12 to hit an enemy and finish it off, and for more than ten rounds I kept rolling 3s, 4s, 6s, etc., cursing all the way. It may still be just bad luck.
Similarly, I've hit four criticals in a row quite a few times during my playing 'career'. Streaks happen all the time -- though obviously less often than non-streaks -- and that's why we notice them and don't remember the huge number of "ordinary sequences".
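Both anecdotes are about what you'd expect from a fair d20. A rough Python sketch (illustrative only; the 100,000-rolls-per-'career' figure is made up):

```python
import random

rng = random.Random(1)

# Chance a fair d20 misses a 12-or-better ten rounds running:
p_miss = 11 / 20                                       # faces 1..11 miss
print(f"P(ten straight misses) = {p_miss ** 10:.3%}")  # about 0.25%

# How many simulated 'careers' of attack rolls contain
# four natural 20s in a row somewhere?
def has_crit_streak(n_rolls, streak=4):
    run = 0
    for _ in range(n_rolls):
        run = run + 1 if rng.randint(1, 20) == 20 else 0
        if run == streak:
            return True
    return False

careers = 100
hits = sum(has_crit_streak(100_000) for _ in range(careers))
print(f"{hits}/{careers} careers included a four-crit streak")
```

Any particular streak is wildly improbable, but across a long enough career the chance of seeing one somewhere is substantial (roughly half of the simulated careers above contain one), which is exactly why streaks stick in memory.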
Why is it obvious that streaks are less common than non-streaks, if we don't remember the latter? Maybe there never were any non-streaks. Maybe there are only streaks interspersed with other streaks, like woven threads. The woof of designers' generosity run through the warp of their fear, for instance, and expressed in various sides of this RNG's functionality.
I don't doubt that higher sample sizes yield better accuracy; I only want to point out that even with small samples it is sometimes possible to draw accurate conclusions - "sometimes" being more than 50/50.
Assuming you don't mean cases where those smaller sample sizes are already statistically sufficient, if you make an "accurate conclusion" about something using a sample size that is too small, then it's not a conclusion; it's an educated guess that happened to turn out true. No one is saying you can't guess a result from incredibly small sample sizes - but that doesn't make it a conclusion.
Sometimes it seems like low rolls are more common than high ones, and later, vice versa - like the RNG gets stuck in a 'low' or 'high' range. I remember needing to roll a single 12 to hit an enemy and finish it off, and for more than ten rounds I kept rolling 3s, 4s, 6s, etc., cursing all the way. It may still be just bad luck.
Similarly, I've hit four criticals in a row quite a few times during my playing 'career'. Streaks happen all the time -- though obviously less often than non-streaks -- and that's why we notice them and don't remember the huge number of "ordinary sequences".
Are you saying that the RNG is fair? Or not? I'm not sure I understood this sentence.
I mean it tries to be fair, and most of the time it is fair, thus it is fair. On rare occasions it may feel like it is being unfair, but that may just be the randomness and luck involved.
It either is fair or isn't, "most of the time" doesn't really make sense unless you're positing that the behavior changes dynamically (in which case it would in fact be: unfair).
As a default, one would assume that the system is "fair" in the sense that it is an acceptably random approximation to "true" randomness, and that it shows no systemic bias, nor a dynamic bias influenced by previous results, etc.
Variance doesn't really matter for whether it is fair or not. It only matters as to whether it is PERCEIVED as fair or not by users looking at small sample sizes - and that is subject to variance itself, so for every user on one end of the curve there is someone on the other.
Perceived fairness doesn't come into play when testing the actual numbers. It's one thing to play BG for a couple hours and say, "You know, I really had a rough time saving vs. spell." But that doesn't explain a controlled test.

I performed several tests against the chance-to-learn-spells percentage. My results were right on the money (percent) with CHARNAME at two different INT scores. However, with a dual-classed Imoen, they were way off. She failed to learn spells more often than she should have.

I guess the question is, how big a sample size is needed to determine whether the RNG is, in fact, NOT as random as it could be. Is twenty enough...? How about one hundred, or maybe five thousand...? If the RNG is truly random, the more you test against it, the closer the average of all your testing should be to the percentage chance you are testing against.
My take, and how to put this to bed:

There's definitely something 'fishy' going on. Of course, my tests aren't proof. Unless someone wants to dedicate a day (Beamdog?) to testing against the RNG, I don't think this will be put to bed any time soon. Maybe someone at Beamdog can get approval for a day of clicking themselves into carpal tunnel to test the chance-to-learn percentage of a pure CHARNAME level-one mage vs. Immy as a dualed level-one mage? I know a script to do this would be easier, but that may miss some weird in-game mechanic that is actually responsible for skewing the numbers. I think a sample size of two thousand plus would be enough... thoughts?
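For a rough sense of what different sample sizes buy, here is a small Python sketch using the normal approximation to the binomial (the 25% is just an example expected failure rate, not a number taken from the game):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for an observed proportion."""
    return z * math.sqrt(p * (1 - p) / n)

expected = 0.25  # e.g. an expected 25% chance to fail learning a spell
for n in (20, 100, 2000, 5000):
    moe = margin_of_error(expected, n)
    print(f"n={n:5d}: a fair process lands within "
          f"{expected - moe:.1%} .. {expected + moe:.1%} about 95% of the time")
```

Twenty trials can't distinguish a 25% failure rate from a 40% one (the 95% band runs from about 6% to 44%), but two thousand trials easily can: the band narrows to roughly 23-27%, so an observed ~40% would sit far outside anything a fair 25% process plausibly produces.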
@alceryes - That doesn't sound like a problem with the random number generator, but rather with how the game determines the chance to learn a spell. There was a previous thread that indicated kitted bards got the -15% penalty that specialist wizards receive when trying to learn spells outside their school. Perhaps this also applies to dual-class X/mages. In your comment on another thread, you found that Imoen's learn spell failure chance was ~40% when the expected value would be 25%, i.e., consistent with the 15% penalty being applied. So, not a problem with the RNG.
I mean it tries to be fair, and most of the time it is fair, thus it is fair. On rare occasions it may feel like it is being unfair, but that may just be the randomness and luck involved.
What do you mean by "tries"? Do you ascribe some sort of intention to the RNG?
It's (very probably[1]) statistically fair, but that doesn't mean you won't have absurd streaks (etc.). It just means that in the long run it'll all work out to not being biased towards player or game.
[1] Don't get me wrong, there have been cases of unintentionally bad RNGs causing all sorts of shenanigans, but usually game designers won't try to gimmick the RNG when they can just as easily change the game balance by adjusting saving throws (or whatever).
Actually, there is a way to test the RNG of the game easily - modding the UI to tell you the results of an autoroller. I *very* quickly had a go...
Here is a quick test I did showing the frequencies of the top rolls after just over 100,000 and just over 1,000,000 rolls - I selected a chaotic neutral, human, unkitted fighter:
ROLL: Count / Total
[frequency table for ~100,000 rolls not preserved]
And just after 1,000,000:
[frequency table not preserved]
Even for 100,000 rolls, 93 still came up less often than 94 - but that's randomness for you. I haven't done any kind of analysis, but the results 'look' about right. The lowest roll was 75, BTW.
Anyway, my point was that with a full analysis of the numbers output by such an autoroller, you could quickly see how fair the rolling mechanism - and (hopefully), by extension, the RNG in general - is...
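The standard "full analysis" here would be a chi-squared goodness-of-fit test on the per-value counts. A Python sketch (illustrative: it tallies simulated d20 rolls where you would instead tally the dumped autoroller output):

```python
import random
from scipy.stats import chisquare

# Stand-in for dumped roll data: tally 100,000 simulated d20 rolls.
rng = random.Random(0)
counts = [0] * 20
for _ in range(100_000):
    counts[rng.randint(1, 20) - 1] += 1

# Null hypothesis: all 20 faces are equally likely (chisquare's default).
stat, p_value = chisquare(counts)
print(f"chi^2 = {stat:.1f}, p = {p_value:.3f}")
# A tiny p-value (say < 0.01) would be evidence the rolls are NOT uniform;
# a middling one just means the histogram is consistent with fairness.
```

One caveat: the in-game autoroller reports stat totals, which are sums of several dice and therefore not uniform; for those you'd compute the theoretical distribution of the sum and pass it to chisquare as the expected frequencies.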
This is pretty much confirmation that any strange behavior is the result of some other systemic flaw, like the specialist factor mentioned earlier. The rolls themselves are fair; it's just that they are also modified by a whole lot of stuff, some of which can be difficult to discern and identify, since we mostly see only the single end result with no idea how we got there, exactly.
I left it running in the background whilst I did some modding and it rolled over 10,000,000 times... Still nothing too unusual to report...
Much appreciated, but technically this is still not enough. You'd need proper statistical tests. (And even those may not be quite enough.)
For an arbitrary (and probably un-disprovable) example, there's no proof in your output that the RNG isn't overly generous on the 25th of December. The general problem with proving the fairness of an RNG is that you're actually trying to prove a "negative", in the sense that it isn't unfair in some particular circumstance. For that you need a) source code (that provably corresponds to the shipped binary), and b) a barrage of statistical tests that go beyond histograms.
For this reason I'm actually an advocate of using CPRNGs for games, even though it's very unlikely to matter. I say CPRNG, but I mean "CPRNG or equivalent" -- it needn't be unpredictable from previous output in the same way, but it should have almost all of the same statistical properties. AFAIK the only family of RNGs in that category (other than actual CPRNGs) is the PCG family.
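For reference, the core of PCG32 is tiny. Here is a Python transcription of O'Neill's C reference generator (a sketch for illustration - the rejection-sampling roll() helper is my own addition, not part of PCG):

```python
MASK64 = (1 << 64) - 1

class PCG32:
    """Minimal PCG32 (XSH-RR variant), transcribed from the C reference."""
    MULT = 6364136223846793005  # 64-bit LCG multiplier used by PCG

    def __init__(self, seed: int, seq: int = 0):
        self.state = 0
        self.inc = ((seq << 1) | 1) & MASK64  # stream selector; must be odd
        self.next_u32()
        self.state = (self.state + seed) & MASK64
        self.next_u32()

    def next_u32(self) -> int:
        old = self.state
        self.state = (old * self.MULT + self.inc) & MASK64  # LCG step
        # XSH-RR output permutation: xorshift-high, then a random rotate.
        xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((32 - rot) % 32))) & 0xFFFFFFFF

    def roll(self, sides: int = 20) -> int:
        """Unbiased die roll via rejection sampling (avoids modulo bias)."""
        limit = (1 << 32) - ((1 << 32) % sides)
        while True:
            x = self.next_u32()
            if x < limit:
                return x % sides + 1

rng = PCG32(seed=42)
print([rng.roll() for _ in range(10)])
```

The rejection loop in roll() matters in principle: a plain next_u32() % 20 would be very slightly biased toward low faces, which is exactly the kind of subtle unfairness this whole thread is about.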
That said, I'm just arguing for the sake of arguing. I find the limitations of knowledge/empiricism theoretically interesting. There's no actual evidence that the RNG in BG1/BG2 (the originals, at least) is unfairly biased. There may be bugs (off-by-one errors) around saving throws, but that's pretty marginal unless you're playing exclusively no-reload.
https://forums.beamdog.com/discussion/comment/770811#Comment_770811
Looks like it's been reported in Redmine: http://redmine.beamdog.com/issues/23442
I didn't realize that the specialist penalty is believed to be the cause; must've missed it.
@AstroBryGuy Thanks for that info.
I'm happy just saying "It's fair enough for me" ...