Imagine that what we now commonly think of as the subject of probability is nothing other than three distinct areas of human intellectual effort, bound more or less uncomfortably together.
What's the best order to tell their story? Historically, I suppose.
At some point in human pre-history, men became aware of degrees of certainty within their own heads concerning matters of the world. This may or may not have happened before the invention of counting; if it came first, these degrees would more likely have been rank-based than numerical. I assume also that some people would have been substantially better at this kind of comparative ranking of possibilities than others. I'm agnostic on whether they were on average any good at it (one of the findings of modern behavioural economics, for example, is just how bad we are at this kind of reasoning). The exact dating of this period is most certainly lost to us.
Later, mankind invented technologies which allowed them to create randomisation machines with more or less stable relative frequencies. These machines are clearly out there in the world, as opposed to in people's heads. As such, they make it possible for us humans to observe their behaviour when executed, to note the stability of their outcomes, and for those outcomes to be inter-subjectively affirmed. Credit for the first randomisers goes to part of the heel bones of deer and sheep - the astragali. These were approximately dice-shaped and were sturdy enough to be rolled on many primitive floors. Because of their shape, when they landed, they did so uncontroversially (as opposed, for example, to a walnut, about which there could be much dispute, I'd imagine, over exactly which way up it landed, and as opposed to, say, a toe bone, which would mostly land on one side or the other and could probably be gamed easily). Two of the six sides of this approximately die-shaped object were rounded, so a throw won't land on those sides. Of the remaining four sides, two each turned up roughly 2/5 of the time, and the other two each about 1/10 of the time.
When you're playing games which contain an element of randomness, if you choose a coin-like object you have to execute its randomisation operation (toss it) more times than you would toss an astragalus to extract the same amount of randomness. In object-scarce primitive times, it was probably practically more useful, not to say more fun, to use a randomiser that reveals about 1.7 bits of information per toss than one with just 1 bit. 1.7 bits is the information content (Shannon entropy) of a discrete probability distribution with values {2/5, 2/5, 1/10, 1/10}.
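As a quick check of that 1.7-bit figure, here is a minimal Python sketch; the helper name `entropy_bits` is just for illustration, and the astragalus probabilities are the approximate ones quoted above.

```python
from math import log2

def entropy_bits(dist):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * log2(p) for p in dist if p > 0)

astragalus = [2/5, 2/5, 1/10, 1/10]  # the four landing sides of the astragalus
coin = [1/2, 1/2]                    # a fair coin, for comparison

print(entropy_bits(astragalus))  # ~1.72 bits revealed per throw
print(entropy_bits(coin))        # exactly 1.0 bit per toss
```

So each astragalus throw yields roughly 1.72 bits against the coin's 1 bit, which is where the "1.7 bits" above comes from.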
Such activities as rolling the astragalus presented a stable real-world phenomenon (in the sense of delivering stable relative frequencies) against which one could measure one's own subjective estimations of certainty. Notice that it is easy to imagine a world in which individuals make subjective comparisons of likelihood between possible futures in the absence of any randomisation machine, but not vice versa.
The beginnings of an analysis of the properties of these randomisation machines heralded the birth of the frequentist approach. We've moved on from the astragalus to Chaitin's algorithmic complexity.
Finally we have Kolmogorov recasting the essential rules of the frequentist approach as the axioms of a brand new branch of logic or set theory, using measure theory to achieve that goal. The frequentist analysis thus seems to be the crucial link between the uncertainty of subjective approaches and the formality of set- and measure-theoretic definitions of probability. But while the set-theoretic approach renders the familiar rules of the frequentist approach well, it doesn't rely on frequentism to derive any of its conclusions.
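For reference, the axioms in question can be stated in a few lines; this is just the standard measure-theoretic formulation, not anything peculiar to this essay. A probability space is a triple (Ω, F, P), where F is a σ-algebra of subsets of Ω and P satisfies:

```latex
% Kolmogorov's axioms for a probability space (\Omega, \mathcal{F}, P)
P(A) \ge 0 \quad \text{for every } A \in \mathcal{F}   % non-negativity
P(\Omega) = 1                                          % the whole space has measure one
P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)
  \quad \text{for pairwise disjoint } A_1, A_2, \ldots \in \mathcal{F}   % countable additivity
```

Nothing in these three lines mentions long-run frequencies, which is exactly the point: they encode the behaviour the frequentist observed in randomisation machines, but stand on their own as pure mathematics.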