THE EARLY HISTORY OF BAYESIAN STATISTICS
TOM LEONARD

REFERENCES:
A personal history of Bayesian Statistics (2014). Wiley Interdisciplinary Reviews: Computational Statistics, 6:80-115, with link to remaining chapters (from 1972) on my website www.thomashoskynsleonard.co.uk
Refers to technical material in my book Bayesian Methods: An Analysis for Statisticians and Interdisciplinary Researchers (1999, with John S.J. Hsu), Cambridge University Press.
See also my academic life story The Life of a Bayesian Boy, self-published on my website.
Slides prepared by Thomas Tallis.

OCCAM'S RAZOR (William of Ockham, c.1287-1347)
Among competing (plausible) hypotheses, the hypothesis with the fewest assumptions should be selected. (WILLIAM OF OCKHAM)
In other words: keep things simple, and cut out extraneous information.
FOR EXAMPLE: use parameter-parsimonious sampling models which depend upon low numbers of unknown parameters (e.g. which minimise AIC or DIC).
Contrasts with: A model should be as big as an elephant (Leonard Jimmie Savage, 1954; Lindley, 1983).
Agrees with:

The greater the amount of information, the less you actually know (Toby Mitchell, c.1980).
Related to: E.T. Jaynes's extremely valuable idea (1957 and 1968) of choosing the maximum entropy prior distribution when only p summaries of the prior information are specified.

Blaise Pascal (1623-1662) formulated Pascal's Wager by reference to the notion of subjective probability. Pascal corresponded with Pierre de Fermat (1601 or 1607-1665) about the potential development of probability theory. In 1654, Pascal and de Fermat together solved the problem of points, or division of stakes.
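The problem of points asks how to divide the stakes fairly when a game is interrupted before either player has won. A minimal Python sketch, assuming each round is won independently by player A with the same probability p (1/2 in a fair game) and following Fermat's device of imagining that all remaining rounds are played out; the function name `share_of_stakes` is hypothetical. A's fair share of the stakes is the probability that A would have gone on to win.

```python
from fractions import Fraction
from math import comb


def share_of_stakes(r, s, p=Fraction(1, 2)):
    """Probability that player A wins an interrupted match.

    A still needs r round wins and B still needs s. Imagine that all
    r + s - 1 remaining rounds are played (Fermat's device): A wins the
    match exactly when A takes at least r of them.
    """
    n = r + s - 1
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r, n + 1))


print(share_of_stakes(1, 2))  # prints 3/4
print(share_of_stakes(2, 3))  # prints 11/16
```

For example, if A needs one more win and B needs two, A should receive three quarters of the stakes. Exact `Fraction` arithmetic keeps the answer in the same rational form the seventeenth-century correspondents worked with.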

In 1657, Christiaan Huygens discussed the Pascal-de Fermat debate in De ratiociniis in ludo aleae.

Daniel Bernoulli (1700-1782)
Swiss physician and mathematician. Formalised a subjective view of probability, decision making and risk. Introduced the concept of EXPECTED UTILITY in 1738, in a historic paper published in St Petersburg. Used the St PETERSBURG PARADOX (where the expected reward from the specified betting scheme is infinite, but most punters would only want to place a small bet on the outcome because of the high probability of a low return) to justify maximising expected utility.

David Hume F.R.S.E. (1711-1776)
Educated (from age 12) at the University of Edinburgh between 1723 and 1725. Sceptical views about causality in his 1739-41 trilogy. Questionable cause fallacy: the false assumption that correlation proves causality. Subjective probability discussed in Ch. 6 of his 1748 book. Author of the is-ought problem, or Hume's guillotine: there is a significant difference between descriptive statements (about what is) and prescriptive statements (about what ought to be), and it is not obvious how to get from descriptive statements to prescriptive ones. Hume's Law: you can't derive an ought from an is.

A midget on the shoulders of giants like Hume and Huygens (Tom Leonard, 2014)

Rev. Thomas Bayes (1701-1763)
Studied for the Presbyterian ministry at the University of Edinburgh between 1719 and about 1722. Probably derived the continuous version of Bayes' Theorem during the 1740s while a wealthy, well-connected minister in Tunbridge Wells, with a serious demeanour and happy disposition. The Notebook of Thomas Bayes (1747-1760) contains a section on probabilities. In his tract In Defence of Isaac Newton (1736, printed by John Noon), sold for a shilling, Bayes writes:

"To suspect Isaac Newton of the mean design of seeking reputation among the ignorant by venting unintelligible notions, and defending them by artful cunning and cunning artistry, is what no man is capable of doing."

Rev. Richard Price F.R.S. (1723-1791)
Moral philosopher, inductive thinker, and political activist in support of the American Revolution. In 1763, Richard Price published Bayes' paper An Essay towards solving a Problem in the Doctrine of Chances posthumously, in the Philosophical Transactions of the Royal Society of London. Bayes solved a complicated ball-tossing problem involving n non-independent trials, with applications in life assurance. His mathematical solution was brilliant, but counterintuitive.

He posed this as a special case of:
Obscurely Worded General Problem: Given the number of times (n) an unknown event has happened and failed: REQUIRED the chance that the probability (ψ) of its happening in a single trial lies somewhere between any two degrees of probability that can be named.
A further special case (n = 50 independent Bernoulli trials; see Bayes' Appendix): if you fail to win a lottery on n = 50 occasions, with equal chance ψ of winning on each occasion, then what is the chance that your probability ψ of winning it on the 51st attempt lies between 0.001 and 0.01?

A young Bayesette
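Under Bayes' uniform prior the lottery question has a closed-form answer: after n failures and no successes the posterior density is proportional to (1 - ψ)^n, a Beta(1, n + 1) distribution with cumulative distribution function 1 - (1 - x)^(n + 1). A short Python sketch; `prob_psi_between` is a hypothetical helper name, and treating the attempts as independent with a common ψ follows the problem statement.

```python
def prob_psi_between(a, b, n_failures):
    """Posterior P(a < psi < b) after n_failures failures and no successes.

    Assumes, as Bayes did, a uniform prior for psi on (0, 1). The posterior
    is then Beta(1, n + 1), whose CDF is F(x) = 1 - (1 - x)**(n + 1), so
    F(b) - F(a) = (1 - a)**(n + 1) - (1 - b)**(n + 1).
    """
    m = n_failures + 1
    return (1 - a) ** m - (1 - b) ** m


# Lottery question: 50 failures, psi between 0.001 and 0.01.
print(round(prob_psi_between(0.001, 0.01, 50), 3))  # 0.351
# Very special case n = 1 (one girl, psi = P(boy)): psi between 0.5 and 1.
print(prob_psi_between(0.5, 1.0, 1))  # 0.25
```

So, on these assumptions, there is roughly a 35% chance that the unknown winning probability lies between 0.001 and 0.01.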

VERY SPECIAL CASE (n = 1)
If a mother's first baby is a girl, then what is the chance that the probability ψ that her second baby is a boy lies between 0.5 and 1?
Note that P(girl on first birth, given ψ) = 1 - ψ. Therefore the LIKELIHOOD FUNCTION OF ψ is
L(ψ, given girl on first birth) = 1 - ψ, for 0 < ψ < 1.
In general, the likelihood of the unknown parameters is the assumed sampling density or probability mass function of the observations, but expressed as a function of the unknown parameters, given the observations actually observed.

THE BAYESIAN PARADIGM
Posterior Information = Prior Information + Sampling Information. ($$$)
A Bayesian is somebody who tries to represent his prior information about ψ by a probability distribution on ψ.

LEONARD JIMMIE SAVAGE (1917-1971): initiated the Savageous philosophy of Bayesian Statistics.

BAYES' THEOREM (continuous case):
POSTERIOR DENSITY = K x PRIOR DENSITY x LIKELIHOOD
where K can be calculated by noting that the posterior density integrates to unity across the parameter space.
However, in his 1763 paper, Bayes assumed a uniform prior distribution on (0,1) for ψ, in which case
POSTERIOR DENSITY = K x LIKELIHOOD.
In the preceding very special case, the posterior density of ψ, given girl on first birth, is
2(1 - ψ), for 0 < ψ < 1. (*)
Figure: posterior density of psi (vertical axis: density; horizontal axis: psi).
Posterior mean of ψ = predictive probability that the next baby is a boy = 1/3, and P(0.5 < ψ < 1, given girl on first birth) = 1/4.
If the first n babies are girls, then the posterior density is (n + 1)(1 - ψ)^n, and the predictive probability that the next baby is a boy is 1/(n + 2).
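Bayes' theorem in the continuous case can be checked numerically by computing K directly. A minimal Python sketch, assuming a simple midpoint-rule grid on (0, 1) (any quadrature rule would do); the helper names are hypothetical. With the uniform prior and the likelihood 1 - ψ it recovers K = 2, i.e. the posterior density 2(1 - ψ), together with the posterior mean 1/3, P(0.5 < ψ < 1) = 1/4 and the 1/(n + 2) rule.

```python
def posterior_on_grid(prior, likelihood, m=100_000):
    """Approximate POSTERIOR DENSITY = K x PRIOR DENSITY x LIKELIHOOD on (0, 1).

    Uses the midpoint rule with m cells of width h; K is chosen so that
    the approximate posterior density integrates to one.
    """
    h = 1.0 / m
    grid = [(i + 0.5) * h for i in range(m)]
    unnormalised = [prior(psi) * likelihood(psi) for psi in grid]
    K = 1.0 / (sum(unnormalised) * h)
    density = [K * u for u in unnormalised]
    return grid, density, h, K


def uniform_prior(psi):
    return 1.0  # Bayes's uniform prior on (0, 1)


def girls_likelihood(n):
    """Likelihood of psi = P(boy) when the first n babies are all girls."""
    return lambda psi: (1.0 - psi) ** n


grid, density, h, K = posterior_on_grid(uniform_prior, girls_likelihood(1))
post_mean = sum(psi * d for psi, d in zip(grid, density)) * h
upper_prob = sum(d for psi, d in zip(grid, density) if psi > 0.5) * h
print(round(K, 6), round(post_mean, 6), round(upper_prob, 6))  # 2.0 0.333333 0.25

# Predictive probability of a boy after n girls, next to the exact 1/(n + 2).
for n in (2, 5):
    g, d, hh, _ = posterior_on_grid(uniform_prior, girls_likelihood(n))
    print(n, round(sum(p * w for p, w in zip(g, d)) * hh, 6), round(1 / (n + 2), 6))
```

Swapping `uniform_prior` for another prior density shows how the same recipe handles informative priors, which is why K is computed rather than written down.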

Le Marquis Pierre Simon de Laplace (1749-1827)
French Astronomer, Mathematician, and Politician. Minister in Napoleon's Government.
FOUNDING FATHER OF BAYESIAN STATISTICS AND DATA ANALYSIS.
In 1774, his Memoir on the Probability of the Causes of Events included a Bayesian analysis of the causes of events.
In 1812, his Analytic Theory of Probabilities contained a number of detailed statistical analyses. He introduced a general version of Bayes' theorem that includes the discrete and multiparameter cases, and applied it to ANALYZE DATA in celestial mechanics, MEDICAL STATISTICS, reliability and jurisprudence.
Developed LAPLACE'S APPROXIMATION to multidimensional integrals, and LAPLACE TRANSFORMATIONS (moment generating functions).

Adam Smith (1723-1790)
Scottish moral philosopher and leading political economist. The Wealth of Nations, 1776.
Rejected the idea that demand must be related to utility, i.e. that the more useful a thing is, and the more satisfaction it gives, the more people would be willing to pay for it.
THE PARADOX OF DIAMONDS AND WATER: water is necessary for life, and yet very cheap; diamonds have little utility, and yet are very costly. Smith thereby concluded that willingness to pay is not related to utility.
Adam Smith proposed using interval bounds for probabilities, rather than precisely specified subjective probabilities.

Jeremy Bentham (1748-1832)
British philosopher, jurist and social reformer. Regarded by some as the father of modern utilitarianism, and by others, in the context of banking, insurance, and speculation, as the founder of the subjectivist, Bayesian approach to decision making. (Bentham's approach to subjective probability is an earlier version of the exact, linear approach recommended as being rational by Tversky and Kahneman.)
Introduction to the Principles of Morals and Legislation, 1780.

GREATEST HAPPINESS PRINCIPLE: it is the greatest happiness of the greatest number which is the principle of right or wrong.
Classification of 12 pains and 14 pleasures, by which we may test the happiness factor of any action.
Formalised a set of criteria for measuring the extent of pain or pleasure that any decision will create.
Reviewed the concept of punishment, and whether a particular punishment will create more pain or pleasure for society.
Bentham applied similar ideas to monetary economics.

Augustus De Morgan (1806-71)
Anglo-Indian mathematician, statistician and spiritualist. Appointed to the Chair of Mathematics at the University of London (later UCL) in 1828. See his Essay on Probabilities (1838).
De Morgan further developed Bayes' and Laplace's approach to INVERSE PROBABILITY, i.e. posterior probabilities when the prior distribution is uniform. This is somewhat arbitrary: e.g. a uniform prior for a non-linear transformation of the parameter will give a different posterior.
Uniform priors on a continuous unbounded parameter space are improper, but can, though not always, yield meaningful proper posteriors.
De Morgan sought to justify the uniform prior by Laplace's Principle of Insufficient Reason.

Florence Nightingale (1820-1910): nurse and statistician.

For the remainder of the 19th century:
(A) Many statistical scientists (e.g. Gauss, Edgeworth, Galton) thought Bayesian.
(B) Inverse probabilities remained the main methodology for statistical inference. Fisher dabbled with them in the early 20th century and discarded them because of the arbitrariness in the choice of uniform prior.
(C) Emphasis seemed to shift somewhat to numerical and graphical summaries of data, e.g. the London cholera epidemic map (1832) and the Crimean War (Florence Nightingale, e.g. pie charts).

Francis Galton: English geneticist, statistician and polymath, a truly great man of science. In 1877 built a machine called the GALTON QUINCUNX. Used simulations while attempting to calculate a posterior distribution. Galton encouraged the use of Bayes' Theorem.
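De Morgan's point that a uniform prior is not invariant under non-linear transformations, and the idea of calculating a posterior by simulation, can be combined in one small experiment on the girl-first-birth example. A sketch in Python, assuming plain rejection sampling: simulate ψ from the prior, simulate a first birth, and keep ψ only when the simulated baby is a girl. This is an illustration in the spirit of simulation-based posterior calculation, not a reconstruction of Galton's quincunx, and the two priors (uniform on ψ, and uniform on ψ squared) are chosen purely for illustration.

```python
import math
import random


def rejection_posterior(draw_psi, n_draws=200_000, seed=1):
    """Simulate psi from the prior and keep it only if a simulated first
    birth is a girl, i.e. if the simulated data match the observed data.

    Each prior draw is kept with probability equal to the likelihood
    1 - psi, so the kept draws are a sample from the posterior of psi.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        psi = draw_psi(rng)
        is_boy = rng.random() < psi  # one simulated birth, P(boy) = psi
        if not is_boy:
            kept.append(psi)
    return kept


def uniform_on_psi(rng):
    """Bayes's prior: psi uniform on (0, 1)."""
    return rng.random()


def uniform_on_psi_squared(rng):
    """Alternative 'ignorance' prior: phi = psi**2 uniform on (0, 1)."""
    return math.sqrt(rng.random())


post_a = rejection_posterior(uniform_on_psi)
post_b = rejection_posterior(uniform_on_psi_squared)

mean_a = sum(post_a) / len(post_a)
upper_a = sum(p > 0.5 for p in post_a) / len(post_a)
mean_b = sum(post_b) / len(post_b)
print(round(mean_a, 3), round(upper_a, 3), round(mean_b, 3))
```

The printed values should be close to 1/3, 1/4 and 1/2. The uniform prior on ψ squared implies a prior density 2ψ for ψ itself, so its posterior is proportional to ψ(1 - ψ), with mean 1/2 rather than 1/3: two equally "uniform" priors give different posteriors from the same data.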

Sir Francis Galton (1822-1911)
Informative conjugate analysis for the normal distribution was developed around that time.

Charles Sanders Peirce (1839-1914)
American philosopher, logician, mathematician and scientist. The father of pragmatism. Emphasised that objective statistical conclusions can only be hoped for if the data result from a randomised experiment. Was the first scientist to elicit subjective probabilities in experimental psychology.

1894 TRIAL OF THE MILLENNIUM
French military officer:

Alfred Dreyfus (9 October 1859 - 12 July 1935)
Tried for treason, and falsely convicted of transmitting military secrets to Germany. A subjective probability of forgery was bizarrely justified, by reference to possible coincidences concerning the frequencies of symbols in the code.
SIMILAR PROBLEMS OCCUR TODAY WHENEVER STATISTICAL EVIDENCE AND SUBJECTIVE PROBABILITIES ARE INTRODUCED INTO EVIDENCE.
David H. Kaye, Minnesota Law Review (2007)

O.J. Simpson murder case, Adams rape case, Sally Clark cot death case.
See also D.H. Kaye (2010), DNA identification and the threat to civil liberties, Yale University Press.

Frank Ramsey (1903-1930)
British mathematician, philosopher and economist. His 1926 papers on subjective probability and utility were encouraged by the economist John Maynard Keynes. His work on subjective probability and its elicitation satisfied Charles Peirce's empirical test; it was used by experimental psychologists and recognised in 1944 by Von Neumann and Morgenstern in their book Theory of Games and Economic Behavior.
Famously used utility theory to judge how much of its wealth a nation should spend.
Close friend of the philosopher Ludwig Wittgenstein, whose works he translated.

Never stay up on the barren heights of cleverness, but come down into the green valleys of silliness. (Ludwig Wittgenstein)

Sir Ronald Fisher (1890-1962)
Highly eccentric English statistician, evolutionary biologist, geneticist and eugenicist. One of the chief architects of the neo-Darwinian synthesis. Galton Professor of Eugenics at UCL (1933-43). Argued with Karl Pearson, e.g. about who should teach which course.
Dabbled with Bayesian inference and inverse probability, then argued vehemently against it because of its dependence on the prior, e.g. the choice of a vague, so-called ignorance prior.
Introduced FIDUCIAL INFERENCE in a paper in Annals of Eugenics (1935). Disputed by Neyman, and shown by Lindley in 1958 to violate Kolmogorov's addition laws of probability.

John Maynard Keynes (1883-1946)
Baron Keynes of Tilton, Cambridge economist. Employed expected utility in 1936 in Chapter 12 of The General Theory of Employment, Interest and Money. Keynesian economics has fundamentally affected the theory and practice of modern macroeconomics, and influenced the policies of governments until about 1979, when the ideas of Milton Friedman, who also used expected utility, took over.
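Daniel Bernoulli's resolution of the St Petersburg paradox, and the expected-utility criterion later employed by Keynes and Friedman, can be illustrated numerically. A sketch in Python, assuming one common form of the game (a fair coin is tossed until the first head, and the payout is 2^k if that happens on toss k) and, as a simplification, logarithmic utility of the payout alone rather than of total wealth; the function name is hypothetical.

```python
import math


def st_petersburg(max_tosses):
    """Truncated expectations for an assumed form of the St Petersburg game.

    A fair coin is tossed until the first head; if that occurs on toss k
    (probability 2**-k) the payout is 2**k. Returns the expected payout and
    the expected log utility of the payout, summed over k = 1..max_tosses.
    """
    expected_payout = 0.0
    expected_log_utility = 0.0
    for k in range(1, max_tosses + 1):
        prob = 0.5 ** k
        expected_payout += prob * 2 ** k  # every term adds exactly 1
        expected_log_utility += prob * k * math.log(2)  # log(2**k)
    return expected_payout, expected_log_utility


for n in (10, 20, 40):
    ev, eu = st_petersburg(n)
    # exp(expected log utility) is the certainty equivalent of the game.
    print(n, ev, round(math.exp(eu), 4))
```

The truncated expected payout equals the number of tosses allowed, so it grows without bound, whereas the expected log utility converges to 2 log 2, a certainty equivalent of only 4. That contrast is the sense in which maximising expected utility explains why punters would only place a small bet.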

Sir Harold Jeffreys F.R.S. (1891-1989)
Cambridge-based mathematician, statistician, geologist and astronomer. His Theory of Probability (1939) anticipated the Anglo-American Bayesian revival of the 1960s, led by Rudolf Kalman, Raiffa and Schlaifer, Mosteller and Wallace, Box and Tiao, John Aitchison F.R.S.E. and Dennis Lindley.
INCLUDED:
Invariance priors: vague priors, proportional to the square root of the determinant of Fisher's information, which yield posterior distributions that are invariant under non-linear transformations of the parameters.
Approximate Bayes intervals (also approximate confidence intervals) centred on the maximum likelihood estimate, which also refer to the likelihood dispersion.

Pre-eminent Russian mathematician and probabilist:

Andrey Kolmogorov (1903-1987)
Introduced the concept of Bayesian sufficiency in his 1942 paper on the statistical estimation of the law of Gauss, in the Bulletin of the Academy of Sciences of the URSS. Kolmogorov's Extension Theorem constrains us to defining our probability distributions only on measurable subsets of the parameter space or sample space (i.e. those which are elements of an appropriate sigma-field, such as a Borel field).

Alan Turing (1912-1954) and Irving John Good (1916-2009)
Alan Turing: gay icon and martyr, father of machine intelligence, modern computer science and artificial intelligence. Also the father of modern Bayesian applied statistics.

Jack Good: cryptanalyst, mathematician, statistician and philosopher.
While solving the Nazi codes at Bletchley Park, Turing and Good used various pioneering, effectively Bayesian procedures, including:
(A) empirical alternatives to Bayes factors as measures of evidence;
(B) effectively Bayesian sequential analysis and decision-tree analysis;
(C) shrinkage estimators for multinomial cell probabilities, which smooth the relative frequencies of the letters in the German code towards a common value.

Thomas Tallis (1988-NotDeadYet)
Adam Empirius Logan

"If Bayesians live to be a hundred they think they've got it made. Very few people die past that age."

"If we deduce that knowledge comes from irrationality and out of rationality comes rationality, then we must also deduce that most of our conventional knowledge derives from the senses and that every rational saying is a pragmatic lie." (Adam Logan, Farewell Halcyon Days, 2013)
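Two of the Bletchley Park ideas listed for Turing and Good can be sketched in a few lines of Python. Both functions are hypothetical illustrations under stated assumptions, not reconstructions of Turing's or Good's actual procedures: the shrinkage estimator is taken to be the posterior mean under a symmetric Dirichlet prior (add α to every cell count), and the sequential rule simply adds up weights of evidence, measured in decibans (ten times the base-10 logarithm of the likelihood ratio, the unit Turing used), until a threshold is crossed. The letter counts, the match probabilities 1/17 and 1/26 and the 20-deciban threshold are illustrative assumptions.

```python
import math


def shrink_frequencies(counts, alpha=1.0):
    """Shrink relative frequencies n_i / N towards the common value 1 / m.

    Assumption: the estimator is the posterior mean under a symmetric
    Dirichlet(alpha) prior, (n_i + alpha) / (N + m * alpha). This equals
    w * (n_i / N) + (1 - w) * (1 / m) with w = N / (N + m * alpha).
    """
    m = len(counts)
    total = sum(counts.values())
    return {cell: (n + alpha) / (total + m * alpha) for cell, n in counts.items()}


def decibans(p_h1, p_h0):
    """Weight of evidence for H1 against H0 from one observation, in decibans."""
    return 10.0 * math.log10(p_h1 / p_h0)


def sequential_decision(observations, p1, p0, threshold=20.0):
    """Add up decibans observation by observation and stop once the total
    reaches +threshold (accept H1) or -threshold (accept H0).

    Each observation is True (a 'match') or False; under H1 a match has
    probability p1, under H0 probability p0.
    """
    total = 0.0
    for i, match in enumerate(observations, start=1):
        total += decibans(p1, p0) if match else decibans(1 - p1, 1 - p0)
        if total >= threshold:
            return "H1", i, total
        if total <= -threshold:
            return "H0", i, total
    return "undecided", len(observations), total


# Hypothetical letter counts, for illustration only.
print(shrink_frequencies({"E": 5, "Q": 1, "X": 0}))
# Hypothetical match probabilities 1/17 (H1) versus 1/26 (H0).
print(sequential_decision([True] * 20, p1=1 / 17, p0=1 / 26))
```

Here each match is worth about 1.85 decibans, so eleven consecutive matches push the total past 20 decibans (a likelihood ratio of 100) and the rule stops in favour of H1; the unseen letter "X" still receives a non-zero estimated probability of 1/9.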