So, I guess tl;dr: yes, the save-rolling mechanism is fair.
Well, let's go with "The save-rolling mechanism is assumed to be fair, unless you have some solid statistical indication that it isn't (emphasis SOLID)."
Technically all RNG is simulated, as there are no truly random phenomena.
I think most (not all) physicists would disagree.
Physicists would probably phrase it more carefully. "Randomness" is a very tricky thing, actually. Many things we casually call "random" are simply outcomes of processes we cannot, for various reasons, accurately predict (yet?). But that doesn't mean they happen on a whim, or based on no factors at all. At least according to some models.
But after a certain point it becomes mostly a philosophical discussion.
@joluv: Hard to say. In physics I've come across lots of views both for and against. Some people think that on the quantum level (more often) and at the level of epiphenomena (less often), there is such a thing as randomness, entropy, and uncertainty--that the universe itself is in a semi-uncertain state. Others think that these are just illusions created by our own inability to fully understand how the world really works.
We do have an intelligence gap here. Our brains only evolved to handle Newtonian physics; quantum stuff is by nature counterintuitive.
It doesn't help that conclusions in quantum physics are expressed through complicated mathematical proofs that the laity can't understand--and which physicists themselves have difficulty translating into speech. Asking a physicist to explain the evidence for randomness using words is a bit like asking a writer to explain the appeal of Allen Ginsberg's Howl using a mathematical proof.
Can anyone tell me why 5000 is an indication and 50 or, indeed, 2 is not at all? I know that a bigger number gives more accuracy, but what makes you think a low number can be disregarded completely?
When the frequency of each possible value goes down (because there are more possible values), the total number of events you have to measure before every value is fairly represented grows rapidly.
Rolling a d20 gives 20 possible values, and to get statistical significance you need to roll far more times than the number of possible outcomes (which is 20, not 2: the success or failure of the save is based on the roll, but the roll won't be measured accurately unless the number of rolls is high enough to determine whether it is random).
Rolling a d20 50 times, you might roll a 6 five times and an 18 never. Rolling it 5000 times, you will probably get nearly equal counts for each face. Doing it with a d4 would require far fewer rolls, and a d100 a lot more.
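To make that concrete, here is a quick Python sketch (purely illustrative - it uses Python's stdlib RNG, not the game's):

```python
import random
from collections import Counter

rng = random.Random(42)  # seeded so the run is reproducible

def face_counts(n_rolls, sides=20):
    """Roll a fair die n_rolls times; return how often each face appeared."""
    counts = Counter(rng.randint(1, sides) for _ in range(n_rolls))
    return [counts.get(face, 0) for face in range(1, sides + 1)]

for n in (50, 5000):
    counts = face_counts(n)
    print(f"{n:>4} rolls: expected {n / 20:.0f} per face, "
          f"observed min {min(counts)}, max {max(counts)}")
```

With 50 rolls, one or more faces will often not appear at all; with 5000, every face typically lands within 10-15% of its expected count of 250.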
I don't doubt that higher sample sizes yield better accuracy; I only want to point out that even with small samples it is sometimes possible to draw accurate conclusions - "sometimes" being more than 50/50. But then, we go on more than the data in the test itself: e.g. I only need to see one crow to conclude that crows are encountered in my area, but seeing one tiger in my area probably just means it ran away from a zoo. This question isn't "philosophical" in the sense of being empty nonsense, though. If we allow ourselves to use our background knowledge and intuition when we approach a system such as, say, the AD&D rules in BG, we may do better than we do now, even without a very big data bank.
Sometimes it seems like low rolls are more common than high ones, and later, vice versa - like the RNG gets stuck in a 'low' or 'high' range. I remember needing to roll a single 12 to hit an enemy and finish it off, and for more than ten rounds I kept rolling 3s, 4s, 6s, etc., cursing all the way. It may still be just bad luck.
Similarly, I've hit four criticals in a row quite a few times during my playing 'career'. Streaks happen all the time -- though obviously less often than non-streaks -- and that's why we notice them and don't remember the huge number of "ordinary sequences".
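Both anecdotes are about what you'd expect from a fair d20. A rough Python sketch (illustrative only; the 100,000-rolls-per-'career' figure is made up):

```python
import random

rng = random.Random(1)

# Chance a fair d20 misses a 12-or-better ten rounds running:
p_miss = 11 / 20                                       # faces 1..11 miss
print(f"P(ten straight misses) = {p_miss ** 10:.3%}")  # about 0.25%

# How many simulated 'careers' of attack rolls contain
# four natural 20s in a row somewhere?
def has_crit_streak(n_rolls, streak=4):
    run = 0
    for _ in range(n_rolls):
        run = run + 1 if rng.randint(1, 20) == 20 else 0
        if run == streak:
            return True
    return False

careers = 100
hits = sum(has_crit_streak(100_000) for _ in range(careers))
print(f"{hits}/{careers} careers included a four-crit streak")
```

Any particular streak is wildly improbable, but across a long enough career the chance of seeing one somewhere is substantial (roughly half of the simulated careers above contain one), which is exactly why streaks stick in memory.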
Why is it obvious that streaks are less common than non-streaks, if we don't remember the latter? Maybe there never were any non-streaks. Maybe there are only streaks interspersed with other streaks, like woven threads. The woof of designers' generosity run through the warp of their fear, for instance, and expressed in various sides of this RNG's functionality.
I don't doubt that higher sample sizes yield better accuracy; I only want to point out that even with small samples it is sometimes possible to draw accurate conclusions - "sometimes" being more than 50/50.
Assuming you don't mean cases where those smaller sample sizes are already statistically sufficient, if you make an "accurate conclusion" about something using a sample size that is too small, then it's not a conclusion; it's an educated guess that happened to turn out true. No one is saying you can't guess a result from incredibly small sample sizes - but that doesn't make it a conclusion.
Sometimes it seems like low rolls are more common than high ones, and later, vice versa - like the RNG gets stuck in a 'low' or 'high' range. I remember needing to roll a single 12 to hit an enemy and finish it off, and for more than ten rounds I kept rolling 3s, 4s, 6s, etc., cursing all the way. It may still be just bad luck.
Similarly, I've hit four criticals in a row quite a few times during my playing 'career'. Streaks happen all the time -- though obviously less often than non-streaks -- and that's why we notice them and don't remember the huge number of "ordinary sequences".
Are you saying that the RNG is fair? Or not? I'm not sure I understood this sentence.
I mean it tries to be fair, and most of the time it is fair, thus it is fair. On rare occasions it may feel like it is being unfair, but that may just be the randomness and luck involved.
It either is fair or isn't, "most of the time" doesn't really make sense unless you're positing that the behavior changes dynamically (in which case it would in fact be: unfair).
As a default, one would assume that the system is "fair" in the sense that it is an acceptably random approximation to "true" randomness, and that it shows no systemic bias, nor a dynamic bias influenced by previous results, etc.
Variance doesn't really matter for whether it is fair or not. It only matters as to whether it is PERCEIVED as fair or not by users looking at small sample sizes - and that is subject to variance itself, so for every user on one end of the curve there is someone on the other.
Perceived fairness doesn't come into play when testing the actual numbers. It's one thing to play BG for a couple hours and say, "You know, I really had a rough time saving vs. spell." But that doesn't explain a controlled test.

I performed several tests against the chance-to-learn-spells percentage. My results were right on the money (percent) with CHARNAME at two different INT scores. However, with a dual-classed Imoen, they were way off. She failed to learn spells more often than she should have.

I guess the question is, how big a sample size is needed to determine whether the RNG is, in fact, NOT as random as it could be. Is twenty enough...? How about one hundred, or maybe five thousand...? If the RNG is truly random, the more you test against it, the closer the average of all your testing should be to the percentage chance you are testing against.
My take, and how to put this to bed:

There's definitely something 'fishy' going on. Of course, my tests aren't proof. Unless someone wants to dedicate a day (Beamdog?) to testing against the RNG, I don't think this will be put to bed any time soon. Maybe someone at Beamdog can get approval for a day of clicking themselves into carpal tunnel to test the chance-to-learn percentage of a pure CHARNAME level-one mage vs. Immy as a dualed level-one mage? I know a script to do this would be easier, but that may miss some weird in-game mechanic that is actually responsible for skewing the numbers. I think a sample size of two thousand plus would be enough... thoughts?
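For a rough sense of what different sample sizes buy, here is a small Python sketch using the normal approximation to the binomial (the 25% is just an example expected failure rate, not a number taken from the game):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for an observed proportion."""
    return z * math.sqrt(p * (1 - p) / n)

expected = 0.25  # e.g. an expected 25% chance to fail learning a spell
for n in (20, 100, 2000, 5000):
    moe = margin_of_error(expected, n)
    print(f"n={n:5d}: a fair process lands within "
          f"{expected - moe:.1%} .. {expected + moe:.1%} about 95% of the time")
```

Twenty trials can't distinguish a 25% failure rate from a 40% one (the 95% band runs from about 6% to 44%), but two thousand trials easily can: the band narrows to roughly 23-27%, so an observed ~40% would sit far outside anything a fair 25% process plausibly produces.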
@alceryes - That doesn't sound like a problem with the random number generator, but rather with how the game determines the chance to learn a spell. There was a previous thread that indicated kitted bards got the -15% penalty that specialist wizards receive when trying to learn spells outside their school. Perhaps this also applies to dual-class X/mages. In your comment on another thread, you found that Imoen's learn spell failure chance was ~40% when the expected value would be 25%, i.e., consistent with the 15% penalty being applied. So, not a problem with the RNG.
I mean it tries to be fair, and most of the time it is fair, thus it is fair. On rare occasions it may feel like it is being unfair, but that may just be the randomness and luck involved.
What do you mean by "tries"? Do you ascribe some sort of intention to the RNG?
It's (very probably[1]) statistically fair, but that doesn't mean you won't have absurd streaks (etc.). It just means that in the long run it'll all work out to not being biased towards player or game.
[1] Don't get me wrong, there have been cases of unintentionally bad RNGs causing all sorts of shenanigans, but usually game designers won't try to gimmick the RNG when they can just as easily change the game balance by adjusting saving throws (or whatever).
Actually, there is a way to test the RNG of the game easily - modding the UI to tell you the results of an autoroller. I *very* quickly had a go...
Here is a quick test I did showing the frequencies of the top rolls after just over 100,000 and just over 1,000,000 rolls - I selected a chaotic neutral, human, unkitted fighter:
ROLL: Count / Total
[frequency table for ~100,000 rolls not preserved]
And just after 1,000,000:
[frequency table not preserved]
Even for 100,000 rolls, 93 still came up less often than 94 - but that's randomness for you. I haven't done any kind of analysis, but the results 'look' about right. The lowest roll was 75, BTW.
Anyway, my point was that with a full analysis of the numbers output by such an autoroller, you could quickly see how fair the rolling mechanism - and (hopefully), by extension, the RNG in general - is...
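The standard "full analysis" here would be a chi-squared goodness-of-fit test on the per-value counts. A Python sketch (illustrative: it tallies simulated d20 rolls where you would instead tally the dumped autoroller output):

```python
import random
from scipy.stats import chisquare

# Stand-in for dumped roll data: tally 100,000 simulated d20 rolls.
rng = random.Random(0)
counts = [0] * 20
for _ in range(100_000):
    counts[rng.randint(1, 20) - 1] += 1

# Null hypothesis: all 20 faces are equally likely (chisquare's default).
stat, p_value = chisquare(counts)
print(f"chi^2 = {stat:.1f}, p = {p_value:.3f}")
# A tiny p-value (say < 0.01) would be evidence the rolls are NOT uniform;
# a middling one just means the histogram is consistent with fairness.
```

One caveat: the in-game autoroller reports stat totals, which are sums of several dice and therefore not uniform; for those you'd compute the theoretical distribution of the sum and pass it to chisquare as the expected frequencies.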
This is pretty much confirmation that any strange behavior is the result of some other systemic flaw, like the specialist factor mentioned earlier. The rolls themselves are fair; it's just that they are also modified by a whole lot of stuff, some of which can be difficult to discern and identify, since we mostly see only the single end result with no idea how we got there, exactly.
I left it running in the background whilst I did some modding and it rolled over 10,000,000 times... Still nothing too unusual to report...
Much appreciated, but technically this is still not enough. You'd need proper statistical tests. (And even those may not be quite enough.)
For an arbitrary (and probably un-disprovable) example, there's no proof in your output that the RNG isn't overly generous on the 25th of December. The general problem with proving the fairness of an RNG is that you're actually trying to prove a "negative", in the sense that it isn't unfair in some particular circumstance. For that you need a) source code (that provably corresponds to the shipped binary), and b) a barrage of statistical tests that go beyond histograms.
For this reason I'm actually an advocate of using CPRNGs for games, even though it's very unlikely to matter. I say CPRNG, but I mean "CPRNG or equivalent" -- it needn't be unpredictable from previous output in the same way, but it should have almost all of the same statistical properties. AFAIK the only family of RNGs in that category (other than actual CPRNGs) is the PCG family.
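For reference, the core of PCG32 is tiny. Here is a Python transcription of O'Neill's C reference generator (a sketch for illustration - the rejection-sampling roll() helper is my own addition, not part of PCG):

```python
MASK64 = (1 << 64) - 1

class PCG32:
    """Minimal PCG32 (XSH-RR variant), transcribed from the C reference."""
    MULT = 6364136223846793005  # 64-bit LCG multiplier used by PCG

    def __init__(self, seed: int, seq: int = 0):
        self.state = 0
        self.inc = ((seq << 1) | 1) & MASK64  # stream selector; must be odd
        self.next_u32()
        self.state = (self.state + seed) & MASK64
        self.next_u32()

    def next_u32(self) -> int:
        old = self.state
        self.state = (old * self.MULT + self.inc) & MASK64  # LCG step
        # XSH-RR output permutation: xorshift-high, then a random rotate.
        xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((32 - rot) % 32))) & 0xFFFFFFFF

    def roll(self, sides: int = 20) -> int:
        """Unbiased die roll via rejection sampling (avoids modulo bias)."""
        limit = (1 << 32) - ((1 << 32) % sides)
        while True:
            x = self.next_u32()
            if x < limit:
                return x % sides + 1

rng = PCG32(seed=42)
print([rng.roll() for _ in range(10)])
```

The rejection loop in roll() matters in principle: a plain next_u32() % 20 would be very slightly biased toward low faces, which is exactly the kind of subtle unfairness this whole thread is about.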
That said, I'm just arguing for the sake of arguing. I find the limitations of knowledge/empiricism theoretically interesting. There's no actual evidence that the RNG in BG1/BG2 (the originals, at least) is unfairly biased. There may be bugs (off-by-one errors) around saving throws, but that's pretty marginal unless you're playing exclusively no-reload.
https://forums.beamdog.com/discussion/comment/770811#Comment_770811
Looks like it's been reported in Redmine: http://redmine.beamdog.com/issues/23442
I didn't realize that the specialist penalty is believed to be the cause; must've missed it.
@AstroBryGuy Thanks for that info.
I'm happy just saying "It's fair enough for me" ...