The document discusses an algorithm for modeling unknown distributions through iterative sampling. It assumes initially equal distributions and large confidence bounds. It then begins sampling randomly from the distributions and assigning values, slowly converging on the actual average distribution over multiple iterations. New samples cause the perceived distributions to change and confidence bounds to narrow with each round, better approximating the true underlying distributions over many samples.
**The actual distributions are what we would like to find out, but don’t currently know
**We assume equal starting values
**First assume a large confidence bound
**Pick one at random because they are all the same at this point
**Win/click/good gives value 1; lose/no click/bad gives value 0
**Slowly begins to converge to actual average
**Next pick any of the 4 highest at random because they are the same at this point
**Win/click/good gives value 1; lose/no click/bad gives value 0
**Slowly begins to converge to actual average
**Win/click/good gives value 1; lose/no click/bad gives value 0
**Slowly begins to converge to actual average
**Choose next highest confidence bound at random (of three highest)
**In the short term, chance outcomes may sometimes shift the confidence bounds in the opposite direction of the actual average; in the long run, however, they will converge to the actual average
**Creating distribution of where actual value may lie (auxiliary mechanism)
**Algorithm will call and pull value from distribution
**Can be far from mean at first, but will eventually converge in the long run
**Pick the machine whose drawn value shows the highest return
**Spits out value based on actual distribution of machine
**New value changes perception of distribution
**Perceived distribution also becomes narrower
**Next round generate more values from distribution
**Pick the machine whose drawn value shows the highest return
**Spits out value based on actual distribution of machine
**New value changes perception of distribution
**Pick the machine whose drawn value shows the highest return
**Spits out value based on actual distribution of machine
**New value changes perception of distribution
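
The two walkthroughs in these notes can be sketched in code. The block below is a minimal illustration, not the presenter's implementation. It makes several assumptions that the notes do not state: the four win rates in `TRUE_RATES` are hypothetical, the confidence bound uses the common UCB1 form (observed average plus `sqrt(2 * ln(t) / n)`), and the "distribution of where the actual value may lie" is modeled as a Beta distribution over each machine's win rate. The function names are also illustrative.

```python
import math
import random

# Hypothetical win/click rates per machine; the algorithms never read these directly.
TRUE_RATES = [0.1, 0.2, 0.7, 0.3]


def pull(machine, rng):
    """Return 1 for win/click/good, 0 for lose/no click/bad."""
    return 1 if rng.random() < TRUE_RATES[machine] else 0


def run_ucb(rounds, rng):
    """Confidence-bound walkthrough: pick the highest upper bound each round."""
    n = len(TRUE_RATES)
    counts = [0] * n  # how often each machine was picked
    wins = [0] * n    # total reward collected per machine
    for t in range(1, rounds + 1):
        # Unpicked machines share an infinite bound (the "large confidence bound").
        # Assumed UCB1 form: observed average + sqrt(2 * ln(t) / picks).
        bounds = [
            wins[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
            if counts[i] > 0 else math.inf
            for i in range(n)
        ]
        best = max(bounds)
        # Ties are broken at random, as in the notes.
        choice = rng.choice([i for i in range(n) if bounds[i] == best])
        wins[choice] += pull(choice, rng)
        counts[choice] += 1
    return counts


def run_thompson(rounds, rng):
    """Sampling walkthrough: draw a value per machine, pick the highest draw."""
    n = len(TRUE_RATES)
    wins = [0] * n
    losses = [0] * n
    for _ in range(rounds):
        # Assumed Beta(wins + 1, losses + 1) belief about each machine's win rate.
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n)]
        choice = draws.index(max(draws))
        # The machine "spits out" a value from its actual distribution,
        # which updates (and narrows) the perceived distribution.
        if pull(choice, rng):
            wins[choice] += 1
        else:
            losses[choice] += 1
    return [wins[i] + losses[i] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    print("UCB picks per machine:     ", run_ucb(3000, rng))
    print("Thompson picks per machine:", run_thompson(3000, rng))
```

Run over a few thousand rounds, both functions should concentrate most picks on the machine with the highest hypothetical rate, which mirrors the long-run convergence the notes describe. Early rounds can still favor a weaker machine by chance.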