Earliest Known Uses of Some of the Words of Mathematics (M)

Last revision: July 11, 2007

M-ESTIMATOR. In his "Robust Estimation of a Location Parameter," Annals of Mathematical Statistics, 35, (1964), 73-111, Peter J. Huber considers a class of estimators analogous to least squares but in which another function of the errors is minimised. Huber called such estimators "(M)-estimators." The brackets were later discarded and the abbreviation "M-estimator" has become standard.

MACLAURIN'S SERIES is named for Colin Maclaurin (1698-1746).

Maclaurin's theorem appears in 1820 in A Collection of Examples of the Applications of the Differential and Integral Calculus by George Peacock [Google print search].

In 1849, An Introduction to the Differential and Integral Calculus, 2nd ed., by James Thomson has: "A particular case of this formula is commonly called Maclaurin's theorem, because it was first made generally known by that writer. It had been given previously, however, by Stirling, another Scotch mathematician; and therefore, if a particular case of Taylor's general theorem should be named after any other mathematician, this ought to be called Stirling's theorem." Thomson subsequently uses the term Stirling's theorem throughout the book.

McLaurin's formula is found in English in 1855 in Elements of the differential and integral calculus by Albert Ensign Church [University of Michigan Digital Library].

Les séries de Taylor et de Maclaurin is found in 1870 in J. Bourget, "Note sur les séries de Taylor et de Maclaurin," Nouv. Ann.

Maclaurin's series is found in English in 1831 in the second edition of Elements of the Differential Calculus (1836) by John Radford Young: "All that is meant is, that the function in particular states may fail to be developable according to Taylor's series, and under particular forms it may fail to be developable according to Maclaurin's series; so that, in fact, these theorems fail to give the true development only when that development is impossible" [James A. Landau].

C. B. Boyer, in A History of Mathematics (1968, p. 469), comments: "In view of the striking results of Maclaurin in geometry, it is ironic that today his name is recalled almost exclusively in connection with a portion of analysis in which he had been anticipated by some half dozen earlier workers."

MAGIC SQUARE is found in the title Des quarrez ou tables magiques by Frenicle de Bessy (1605-1675).

The first citation in the OED2 is in 1704 in Lexicon technicum, or an universal English dictionary of arts and sciences by John Harris.

Benjamin Franklin used the term in his autobiography:

"This latter station was the more agreeable to me, as I was at length tired with sitting there to hear debates, in which, as clerk, I could take no part, and which were often so unentertaining that I was induc'd to amuse myself with making magic squares or circles, or any thing to avoid weariness; and I conceiv'd my becoming a member would enlarge my power of doing good."

Franklin also used the term in a letter in which he wrote, "I make no question, but you will readily allow the square of 16 to be the most magically magical of any magic square ever made by any magician" (Cajori 1919, page 170).

The term MANDELBROT SET was coined by Adrien Douady, according to an Internet web page.

MANIFOLD was introduced as Mannigfaltigkeit by Bernhard Riemann (1826-1866) in Grundlagen für eine Allgemeine Theorie der Functionen, published (posthumously) in 1867, Werke p. 3 [Mark Dunn].

MANTISSA is a late Latin term of Etruscan origin, originally meaning an addition, a makeweight, or something of minor value, and was written mantisa. In the 16th century it came to be written mantissa and to mean appendix (Smith vol. 2, page 514).

Numerous sources, including Smith (vol. 2, page 524), Boyer (page 345), the Century Dictionary (1889-97), and Webster's New International Dictionary (1909), claim that mantissa was introduced by Henry Briggs (1561-1631) in 1624 in Arithmetica logarithmica. However, this information apparently is incorrect. Johannes Tropfke, in his Geschichte der Elementar-Mathematik, vol. 2, 3rd edition (1933), says "Das Fachwort Mantisse hatte Briggs noch nicht" [Briggs did not yet have the technical term mantissa] (p. 252) [Christoph J. Scriba].

According to Cajori (1919, page 152), the word mantissa was first used by John Wallis in 1693:

"Ejusque partes decimales abscissas, appendicem voco, sive mantissam."

The citation above is from "Opera mathematica," vol. 2, Oxoniae, 1693 (De Algebra tractatus), page 41. This is in the Latin edition, and not in the original edition of 1685, in which Wallis uses the English word "appendage." According to Julio González Cabillón, this is the first use of the term to mean "the decimal part of any number."

Mantissa was also used by Leonhard Euler in 1748:

"Constat ergo logarithmus quisque ex numero integro et fractione decimali et ille numerus integer vocari solet characteristica, fractio decimalis autem mantissa." (The logarithm consists of an integral part, called the characteristic, and a decimal fraction, called the mantissa.)

The citation above is from Euler's Introductio in analysin infinitorum, vol. 1, page 83 (Lausannae 1748). According to Julio González Cabillón, this is the first use of the term to mean "the decimal part of a logarithm." According to Smith (vol. 2, page 514), the word was not commonly used until its adoption by Euler.

Gauss suggested using the word for the fractional part of all decimals: "Si fractio communis in decimalem convertitur, seriem figurarum decimalium ... fractionis mantissam vocamus ..." (Smith vol. 2, page 514).

MANY-VALUED is found in 1893 in J. Harkness and F. Morley, Treatise on the Theory of Functions 36 (OED Online).

MAPPING is found in "On the Metric Geometry of the Plane N-Line," F. Morley, Transactions of the American Mathematical Society, Vol. 1, No. 2. (Apr., 1900).

MARGIN OF ERROR. This assessment of the accuracy of opinion polls was formulated by the statistician Leslie Kish (1910-2000), although the phrase margin of error was in common use in the 19th century.

MARKOV CHAIN. A. A. Markov (1856-1922) introduced chains in 1906 in a paper extending the law of large numbers to sums of dependent variables. (E. Seneta "Markov, Andrei Andreyevich" in Encyclopedia of Statistical Science, 5, 246-249. New York: Wiley.)

The phrase les chaînes de Markoff is found in V. Romanovsky, “Sur les chaînes de Markoff,” C. R. de l'Académie de l'U. R. S. S., 1929, A, n°. 9, p. 203-208. [Thomas Weber]

The term is found in English in 1938 in American Mathematical Monthly, 45, p. 410 [Mark Dunn, JSTOR].


MARKOV CHAIN MONTE CARLO. This method was proposed for solving the state equations of statistical mechanics by N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller in "Equation of State Calculations by Fast Computing Machines," Journal of Chemical Physics, 21, 1953, 1087-1092. It was later adopted by statisticians: see W. K. Hastings "Monte Carlo Sampling Methods Using Markov Chains and Their Applications," Biometrika, 57, (1970), 97-109. The name "Markov chain Monte Carlo" seems to have taken off around 1990 when the method first attracted wide attention: see e.g. Charles J. Geyer "Practical Markov Chain Monte Carlo" Statistical Science, 7, (1992), 473-483. See the entry MONTE CARLO.

MARKOV PROCESS. The term comes from the analogy with Markov chain; Markov did not study Markov processes. The name appears in A. Khintchine "Korrelationstheorie der Stationären Stochastischen Prozesse", Math. Ann. 109 (1934), 604-615, although the process had already been investigated by A. N. Kolmogorov "Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung," Math. Ann. 104, (1931), 415-458. See E. B. Dynkin "Kolmogorov and the Theory of Markov Processes," Annals of Probability, 17, (1989), 822-833.

The English term appears in 1938 in J. L. Doob "Stochastic Processes With an Integral Valued Parameter," Transactions of the American Mathematical Society, 44, p. 102 [Mark Dunn, JSTOR].


MARKOV’S INEQUALITY. According to Oscar Sheynin Theory of Probability: A Historical Essay (p. 166) Markov published the result in 1900. It is referred to as Markov’s inequality in L. Bortkiewicz’s Die Iterationen, Ein Beitrag zur Wahrscheinlichkeitstheorie (1917).

The term MARRIAGE THEOREM was introduced by Hermann Weyl (1885-1955) in "Almost periodic invariant vector sets in a metric vector space", Amer. J. Math. 71 (1949), 178-205, according to Konrad Jacobs in Measure and Integral, Academic Press, 1978. The theorem is also called "Hall's theorem" or "Hall's marriage theorem" since it was first proved by Philip Hall in 1935 [Carlos César de Araújo].

MARTINGALE. The original sense is given in the OED: "a system in gambling which consists in doubling the stake when losing in the hope of eventually recouping oneself." The oldest quotation is from 1815 but the nicest is from 1854: Thackeray in The Newcomes I. 266 "You have not played as yet? Do not do so; above all avoid a martingale if you do."

J. Venn in his Logic of Chance (1888) wrote that the possibility that "by mere persistency [the martingale player] may accumulate any sum of money he pleases, in apparent defiance of all that is meant by luck" has been "a source of perplexity to persons of considerable acuteness."
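The doubling system in the OED definition can be made concrete with a short sketch. The function name martingale_net, the even-money payout, and the rule that the stake returns to its base value after a win are assumptions made for illustration; they are not taken from the OED or from Venn.

```python
def martingale_net(outcomes, base_stake=1):
    """Net winnings from the doubling system on even-money bets.

    outcomes: iterable of booleans (True = win, False = loss).
    Assumption: the stake doubles after every loss and returns to
    base_stake after every win.
    """
    net = 0
    stake = base_stake
    for won in outcomes:
        if won:
            net += stake        # an even-money win pays the stake
            stake = base_stake  # start a fresh run
        else:
            net -= stake        # the stake is lost
            stake *= 2          # double the next bet
    return net


# Three losses followed by a win: -1 - 2 - 4 + 8
print(martingale_net([False, False, False, True]))
```

Under these assumptions a run that ends in a win always nets one base stake, while a run cut short before a win leaves the player down by the sum of all stakes so far. This is the "persistency" Venn refers to, and it presupposes unlimited capital.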

There was an early discussion by C. Babbage ("An Examination of Some Questions Connected with Games of Chance" Trans. Royal Soc. Edinburgh, 9 (1821) 153-177).

The martingale of modern probability theory is a mathematical model of a fair game and so is different from the martingale as a gambling system. The connection is a theorem that the martingale system will not change a fair game into an unfair game--an old martingale is a new martingale. J. Ville's Étude Critique de la Notion de Collectif (1939) begins by discussing the old martingale in the context of von Mises's requirement that with a random sequence a successful gambling system is impossible and goes on to define a (new) martingale as "un jeu équitable."

J. L. Doob’s Stochastic Processes (1954) made the martingale an important chapter of probability theory. In 1940 Doob wrote about “chance variables with the property E” (“Regularity Properties of Certain Families of Chance Variables,” Transactions of the American Mathematical Society, 47, 455-486). See "A Conversation with Joe Doob," Statistical Science, 1997, and R. Mansuy, "Histoire de martingales," Mathématiques et sciences humaines, 2005. [This entry was contributed by John Aldrich.]

MATH and MATHS. The phrase "Math: books" is found in the writings of Isaac Newton. Apparently the colon indicates this is an abbreviation [James A. Landau, Axel Harvey].

The first citation for maths in the OED2 is 1911: "The Answers to Maths. Ques. were given us all this morning." This citation is from the collected letters of Wilfred Edward Salter Owen, published in 1967.

Maths is found in Wireless World in 1917: "Extremely 'rusty' in 'maths'" (OED2). It is unclear whether a period is used to indicate an abbreviation or the end of the sentence or both.

The earliest use of math in the OED2 in which it is clear that no period is intended is in 1924 in P. Marks, Plastic Age: "I'm talking about the copying of math problems and the using of trots." However, there are a number of earlier uses in which the word ends a sentence, so that it is unclear whether the writer would have used a period to indicate an abbreviation.

The earliest use of maths in the OED2 in which a period is clearly absent is in the Times of Sept. 8, 1959: "Royal Australian Air Force. Education Officers required with Majors in Maths or Physics."


The term MATHEMATICAL INDUCTION was introduced by Augustus de Morgan (1806-1871) in 1838 in the article Induction (Mathematics) which he wrote for the Penny Cyclopedia. De Morgan had suggested the name successive induction in the same article and only used the term mathematical induction incidentally. The expression complete induction attained popularity in Germany after Dedekind used it in a paper of 1887 (Burton, page 440; Boyer, page 404).


MATHEMATICAL LOGIC became an official term in the 1890s but before then the phrase could be found in various contexts. In 1850 in Grammar of arithmetic; or, An analysis of the language of figures and science of numbers Charles Davies wrote: "In explaining the science of Arithmetic, great care should be taken that the analysis of every question and the reasoning by which the principles are proved, be made according to the strictest rules of mathematical logic." [University of Michigan Digital Library].

From the time of Boole's Mathematical Analysis of Logic (1847) there was a body of work to which the phrase "mathematical logic" might be applied. The OED has a nice quotation from John Venn touching on the improbability of such a study: "What with the logicians who hate mathematics, and the mathematicians who despise logic, a theory of so-called mathematical logic does not find many friends." (Princeton Review, (1880), p. 248.)

Mathematical logic arrived for good in the 1890s. Grattan-Guinness (2000, p. 234) writes that in 1891 Peano launched the Rivista di matematica "with two papers on the subject to which he gave the name that it still carries." The papers were "Principii di logica matematica" and "Formole di logica matematica."

The first course on mathematical logic in Britain was given by Bertrand Russell in Cambridge in the winter of 1901-2 (Grattan-Guinness 2000, p. 331).

This entry was contributed by John Aldrich. See LOGIC.

MATHEMATICAL RIGOR. Leonhard Euler used a term in 1755 in Institutiones calculi differentialis which is rendered "mathematical rigor" in an English translation.

Rigor is found in English in 1831 in On the study and difficulties of mathematics by Augustus De Morgan: "But the rigour of this science is carried one step further; for no property, however evident it may be, is allowed to pass without demonstration, if that can be given" [University of Michigan Historical Math Collection].

MATHEMATICAL STATISTICS. Mathematische Statistik is found in 1867 in the title Mathematische Statistik und deren Anwendung auf National-Oekonomie und Versicherungs-Wissenschaft by T. Wittstein (David, 1998).

Mathematical statistics is found in English in 1918 in the title Introduction to Mathematical Statistics by C. J. West (David, 1998).

MATHEMATICIAN. St. Augustine (354-430) used the Latin word mathematicus in Book 2 of De Genesi ad litteram: "Quapropter bono christiano, sive mathematici, sive quilibet impie divinantium, maxime dicentes vera, cavendi sunt, ne consortio daemoniorum irretiant." A widely quoted English translation has: "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." However, mathematicus is more properly translated "astrologer" and a 1982 translation by J. H. Taylor, S. J., in the series Ancient Christian Writers has: "Hence, a devout Christian must avoid astrologers and all impious soothsayers, especially when they tell the truth, for fear of leading his soul into error by consorting with demons and entangling himself with the bonds of such association" [Barry Cipra].

Mathematician is found in English in Higden's Polychronicon, translated 1432-50. (The word is spelled "mathematicions.") (OED2).

MATHEMATICS. Pythagoras is said to have coined the words philosophy for "love of wisdom" and mathematics for "that which is learned."

Mathematics is found in English in 1581 in Positions, wherein those primitive circumstances be examined, which are necessarie for the training up of children by Richard Mulcaster. (The word is spelled "mathematikes.") (OED2)

The term MATRIX was introduced into mathematics by James Joseph Sylvester (1814-1897) in 1850. Matrix was a long-established word with the meaning of “the place from which something else originates.” For Sylvester the “something else” was a determinant of some description:

[...] For this purpose we must commence, not with a square, but with an oblong arrangement of terms consisting, suppose, of m lines and n columns. This will not in itself represent a determinant, but is, as it were, a Matrix out of which we may form various systems of determinants by fixing upon a number p, and selecting at will p lines and p columns, the squares corresponding of pth order.

“Additions to the Articles On a new class of theorems, and On Pascal's theorem,” Philosophical Magazine, pp. 363-370, 1850. Reprinted in Sylvester's Collected Mathematical Papers, vol. 1, pp. 145-151, Cambridge (At the University Press), 1904, page 150.

Sylvester used the term on more than one occasion but it was his friend Cayley who treated the “oblong arrangement” as an object in its own right and developed an algebra of matrices in papers of 1855 [“Recherches sur les Matrices ...” Coll Math Papers, I, 216-20] and 1858 [“A Memoir on the Theory of Matrices” Coll Math Papers, I, 475-96]. See Katz (1993) and Kline p. 804.

Charles L. Dodgson (Lewis Carroll) considered Cayley’s use of the word a misuse. In his Elementary Treatise on Determinants (1867) Dodgson preferred the term block to matrix: “I am aware that the word 'Matrix' is already in use to express the very meaning for which I use the word 'Block'; but surely the former word means rather the mould, or form, into which algebraical quantities may be introduced, than an actual assemblage of such quantities...”

There are useful historical notes and references in Appendix I of J. H. M. Wedderburn Lectures on Matrices (1934). Wedderburn (p. 169) points out that the algebra of matrices was re-discovered by Laguerre in 1867 and by Frobenius in 1878. The paper by Frobenius is a very impressive contribution to matrix theory. However the term matrix does not appear in “Ueber lineare Substitutionen und bilineare Formen,” J. reine angew. Math. Vol. 84 (1878) pp.1-63 or in other papers by Frobenius before 1894. It was then that he learnt of Cayley’s work and adopted Cayley’s term.

This entry was contributed by Randy K. Schwartz, Julio González Cabillón, and John Aldrich. A list of matrix and linear algebra terms having entries on this web site is here.

MATRIX MECHANICS was developed in 1925 by Heisenberg, Born and Jordan. The English phrase appeared almost immediately. The OED quotes Dirac from 1926: “In Heisenberg's matrix mechanics it is assumed that the elements of the matrices that represent the dynamical variables determine the frequencies and intensities of the components of the radiation emitted.” From “On the Theory of Quantum Mechanics,” Proc. Royal Soc. A. 112, p. 666.

MATROID. In an effort to axiomatize the notion of "independence" that arises in graph theory and in vector space theory, Hassler Whitney coined the term "matroid" and introduced it in his fundamental paper On the abstract properties of linear independence, Amer. J. Math. 57 (1935) 509-533. The choice of the name arose because he took as an initial model the finite sets of linearly independent column vectors of a matrix over a field. In his paper Whitney gave several equivalent characterizations of a matroid, but the general idea is that of a finite set endowed with an "independence structure" (just as a topological space is a set endowed with a "closeness structure"). Extensions to infinite sets and additional contributions were made by Saunders Mac Lane (1936), R. Rado (1942), W. T. Tutte (1961) and many others. [Carlos César de Araújo]

MAXIMAL (of an element in an ordered or partially ordered set) is found in 1896 in Annals of Math. vol. 11, p. 169 [Mark Dunn, JSTOR].

MAXIMUM and MINIMUM (of a real-valued function) is found in 1743 in W. Emerson, Doctrine of Fluxions [Mark Dunn].

MAXIMUM LIKELIHOOD. The method has been traced back to Daniel Bernoulli’s "Diiudicatio maxime probabilis plurium observationum discrepantium atque verisimillima inductio inde formanda." Acta Acad. Sci. Imp. Petrop., 1777 (1778), 1, 3-23. This has been translated into English by C. G. Allen as "The most probable choice between several discrepant observations and the formation therefrom of the most likely induction" and appears, with a note by M. G. Kendall "Daniel Bernoulli on Maximum Likelihood," in Biometrika, 1961, 48, 1-18. However, the modern use of the method dates from the work of R. A. Fisher. Fisher introduced the term maximum likelihood in his "On the Mathematical Foundations of Theoretical Statistics" (Phil. Trans. Royal Soc. Ser. A. 222, (1922), p. 323). Previously he had used two terms. "The optimum" of "On the "Probable Error" of a Coefficient of Correlation Deduced from a Small Sample" (Metron, 1, (1921), 3-32) is the value that maximizes the likelihood. However, Fisher's use of the method pre-dated the elaboration of his ideas about likelihood: the "absolute criterion" of 1912 is maximum likelihood: "On an Absolute Criterion for Fitting Frequency Curves" Messenger of Mathematics, 1912, 41: 155-160.

This entry was contributed by John Aldrich. See LIKELIHOOD.

MAXWELL DISTRIBUTION. J. C. Maxwell gave this distribution as the solution of the problem on the distribution of velocities of molecules in an ideal gas in his "Illustrations of the Dynamical Theory of Gases," Philosophical Magazine, 19, (1860), 19-32.

MEAN. Sir Thomas Heath in his History of Greek Mathematics, volume 1 (1921, p. 85) writes that Pythagoras "discovered the dependence of musical intervals on numerical ratios, and the theory of means was developed very early in his school with reference to the theory of music and arithmetic. ... [There] were three means, the arithmetic, the geometric and the subcontrary." The last was later renamed the 'harmonic.' For more on music and means, see the entry HARMONIC MEAN.

Mean occurs in English in the sense of a geometric mean in a Middle English manuscript of circa 1450 known as The Art of Numbering: "Lede the rote of o quadrat into the rote of the oþer quadrat, and þan wolle þe meene shew" [Mark Dunn].

In 1571, A geometrical practise named Pantometria by Thomas Digges (1546?-1595) has: "When foure magnitudes are...in continual proportion, the first and the fourth are the extremes, and the second and thirde the meanes" (OED2).

Mean is often used as an abbreviation for arithmetic mean. This is not a new practice: see e.g. Thomas Simpson's "On the Advantage of Taking the Mean of a Number of Observations," Philosophical Transactions of the Royal Society of London, 1755.

In statistical mechanics, probability and statistics mean has often meant expectation; e.g. the "mean velocity" of molecules in J. Clerk Maxwell's "On the Dynamical Theory of Gases" (Philosophical Transactions of the Royal Society, 157, (1867) p. 64).

Mean is one of the most common terms in Mathematics. As a noun it appears in such constructions as Hölder mean and Cesàro mean and as an adjective in such constructions as mean square error.

See ARITHMETIC MEAN, AVERAGE, CESÀRO MEAN, EXPECTATION, GEOMETRIC MEAN, HARMONIC MEAN, HÖLDER MEAN and WEIGHT, for the weighted mean. See also Symbols in Statistics on the Symbols in Probability and Statistics page.

MEAN CURVATURE appears in 1840 in J. R. Young, Mathematical Dissertations (1841). (The preface is dated Nov. 25, 1840.) According to James A. Landau, who provided this citation, Young specialized in introducing recent French developments in geometry (particularly those of Monge) to English-speaking readers, so that it is possible that this is the first appearance of "mean curvature" in English.

MEAN ERROR was a standard term in the 19th century theory of errors. Gauss introduced it in Theoria combinationis observationum erroribus minimis obnoxiae (Theory of the combination of observations least subject to error) (1821, p. 7), in connection with the integral

$$\int_{-\infty}^{+\infty} x^2 \varphi(x)\,dx = m^2,$$

where x is an error and φ its density function: "quantitatem m vocabimus errorem medium metuendum, sive simpliciter errorem medium ..." [We will call m the mean error to be feared, or simply the mean error ...]. Gauss adopted a decision theory approach, arguing that an error (of an observation, or quantity derived from observations) generates a loss ("iactura") and of the many possible loss functions the quadratic loss function is simplest. The expected loss is m^2. See the entry on DECISION THEORY.

The German term was "der mittlere Fehler": see e.g. F. R. Helmert Die Ausgleichsrechnung nach der Methode der kleinsten Quadrate (1872, p. 12). It was used with the same flexibility--or ambiguity--as the later term standard deviation, which replaced it in some uses.

In Higher Mathematics for Students of Chemistry and Physics (1912), J. W. Mellor writes:

In Germany, the favourite method is to employ the mean error, which is defined as the error whose square is the mean of the squares of all the errors, or the "error which, if it alone were assumed in all the observations indifferently, would give the same sum of the squares of the errors as that which actually exists." ...

The mean error must not be confused with the "mean of the errors," or, as it is sometimes called, the average error, another standard of comparison defined as the mean of all the errors regardless of sign.

Mellor’s footnote testifies to the confusion in terminology: "Some writers call our "average error" the "mean error," and our "mean error" the "error of mean square"." The latter usage can be found in G. B. Airy's 1861 book, On the Algebraical and Numerical Theory of Errors of Observation and the Combination of Observations. [James A. Landau]
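Mellor's two standards of comparison can be set side by side in a short sketch. The function names and the sample errors below are hypothetical, chosen only to illustrate the verbal definitions quoted above.

```python
import math


def mean_error(errors):
    """Mellor's "mean error": the error whose square is the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def average_error(errors):
    """Mellor's "average error": the mean of the errors regardless of sign."""
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical errors of observation, for illustration only.
errors = [3.0, -4.0, 0.0, 5.0, -2.0]

m = mean_error(errors)
a = average_error(errors)

# Assuming the mean error m in every observation reproduces the actual
# sum of squared errors, as in the passage Mellor quotes.
assert math.isclose(len(errors) * m * m, sum(e * e for e in errors))

print(m, a)
```

For any set of errors the mean error is at least as large as the average error, which is one reason the two terms should not be interchanged.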

This entry was contributed by John Aldrich. See STANDARD DEVIATION.

MEANS. According to Smith (vol. 2, page 483), "The terms 'means,' 'antecedent,' and 'consequent' are due to the Latin translators of Euclid."

MEAN SQUARE is found in 1838 in An Essay on Probabilities, and Their Application to Life Contingencies and Insurance Offices by Augustus De Morgan. [Google print search]

The term MEAN SQUARE DEVIATION (apparently meaning variance) appears in a paper published by Sir Ronald Aylmer Fisher in 1920, "A Mathematical Examination of the Methods of Determining the Accuracy of an Observation by the Mean Error, and by the Mean Square Error" [James A. Landau].

MEAN VALUE THEOREM. Theorem of mean value is found in 1891 in "An Introduction to the Study of the Elements of the Differential and Integral Calculus" by Axel Harnack [Google print search].

Mean value theorem is found in 1899 in "Note on the Convergence of Definite Integrals" by J. K. Whittemore in The Annals of Mathematics 2nd Ser., Vol. 1, No. 1/4: "Since 1/x does not change sign between x = a1 and x = a2 we may apply the mean value theorem for integrals." [JSTOR search]

The term MEASURABLE FUNCTION was used by Arnaud Denjoy (1884-1974) (Kramer, p. 648).

An early use of the term is N. Lusin, "Sur les propriétés des fonctions mesurables," Comptes Rendus Acad. Sci. Paris, 154 (1912).

MEASURE. The following articles feature some uses of the term measure.

Giuseppe Vitali, Sul problema della misura dei gruppi di punti di una retta Bologna: Tip. Gamberini e Parmeggiani (1905).

"On Non-Measurable Sets of Points, with an Example," Edward B. Van Vleck, Transactions of the American Mathematical Society, Vol. 9, No. 2 (Apr., 1908): "Lebesgue's theory of integration is based on the notion of the measure of a set of points, a notion introduced by BOREL and subsequently refined by LEBESGUE himself."

Nikolai Luzin, "Sur les propriétés des fonctions mesurables," Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences 154 (1912).

Waclaw Sierpinski, "Sur quelques problèmes qui impliquent des fonctions non-mesurables," Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences 164 (1917).

Henri Lebesgue, "Remarques sur les théories de la mesure et de l'intégration," Annales Scientifiques de l'Ecole Normale Supérieure (3) 35, pp 191-250 (1918) [James A. Landau].

Émile Borel (1871-1956), who created the theory of the measure of sets of points, wrote: "La définition de la mesure des ensembles linéaires bien définis m'est entièrement due" (The definition of the measure of well-defined linear sets is entirely due to me) [Udai Venedem].

MECHANICAL QUADRATURE is found in F. G. Mehler, "Bemerkungen zur Theorie der mechanischen Quadraturen," J. Reine angew. Math 63 (1864) [James A. Landau].

MEDIAN (in statistics). Valeur médiane was used by Antoine A. Cournot in 1843 in Exposition de la Théorie des Chances et des Probabilités (pp. 119-20) (David, 1998).

Median was used in English by Francis Galton in Report of the British Association for the Advancement of Science [Tables and discussion of range in height, weight and strength] in 1881: "The Median, in height, weight, or any other attribute, is the value which is exceeded by one-half of an infinitely large group, and which the other half fall short of." (OED2).

See also MEAN and MODE.

MEDIAN (of a triangle) is found in 1876 in Lessons in elementary mechanics. Introductory to the study of physical science by Sir Philip Magnus, with emendations and introduction by Prof. DeVolson Wood: "In the same way it may be shown that the centre of gravity of the triangle is in the median CE (fig. 109). Hence the centre of gravity of the triangle is at G, where the two medians intersect" [University of Michigan Digital Library].

MEDIATE is found in Dorothy Wrinch, "On Mediate Cardinals," American Journal of Mathematics 45 (1923) [James A. Landau].

MENTAL ARITHMETIC is found in 1766 in H. Brooke, Fool of Quality, vol. I., p. 260: "I cast up, in a pleasing kind of mental arithmetic, how much my weekly twenty guineas would amount to at the year's end" [Mark Dunn].


MERSENNE NUMBER is found in É. Lucas, Récréations Mathématiques, tome II, Note II, "Sur les nombres de Fermat et de Mersenne" (1883).

Mersenne's number is found in English in the title "Mersenne's numbers" by W. W. Rouse Ball in Messenger of Mathematics in 1891.

Mersenne number is found in English in the 1911 Encyclopaedia Britannica: "Similar difficulties are encountered when we examine Mersenne's numbers, which are those of the form 2^p - 1, with p a prime; the known cases for which a Mersenne number is prime correspond to p = 2, 3, 5, 7, 13, 17, 19, 31, 61" (OED2).
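The exponents listed in the Britannica quotation can be checked with a short sketch. It uses the Lucas-Lehmer test, a standard primality test for numbers of the form 2^p - 1; the test is not mentioned in the quotation, and the function name is an assumption made here.

```python
def is_mersenne_prime(p):
    """Lucas-Lehmer test for 2**p - 1, assuming p itself is prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3; the recurrence below applies to odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0  # 2**p - 1 is prime exactly when the final residue is 0


# Exponents given in the 1911 Encyclopaedia Britannica quotation.
listed = [2, 3, 5, 7, 13, 17, 19, 31, 61]
print(all(is_mersenne_prime(p) for p in listed))

# A prime exponent that does not give a prime: 2**11 - 1 = 2047 = 23 * 89.
print(is_mersenne_prime(11))
```

The check confirms only that each listed exponent yields a prime; it says nothing about which larger exponents were known in 1911.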

Mersenne prime is found in English in 1943 in American Math. Monthly, vol. 50, p. 29 [Mark Dunn, JSTOR].

MESSENGER PROBLEM. In 1930, Karl Menger (1902-1985) mentioned the messenger problem, referring to the problem of finding the shortest Hamiltonian path, according to an Internet web page.

META-ANALYSIS. The term was introduced by Gene V. Glass (1976) "Primary, Secondary, and Meta-analysis of Research," Educational Researcher, 5, 3-8: "I use [the term] to refer to the statistical analysis of a large collection of results from individual studies for the purpose of integrating the findings."

Meta-analysis has become a very active area of statistical research. Naturally, pioneers have been identified, including Karl Pearson, "Report on Certain Enteric Fever Inoculation Statistics," British Medical Journal, 3, (1904) 1243-1246, R. A. Fisher "The Combination of Probabilities from Tests of Significance," §21.1 of Statistical Methods for Research Workers (4th edition 1932) and F. Yates & W. G. Cochran "The Analysis of Groups of Experiments," Journal of Agricultural Science, 28, (1938), 556-580.

METABELIAN GROUP appears in William Benjamin Fite, "On Metabelian Groups," Transactions of the American Mathematical Society 3 (July, 1902): "We define a Metabelian Group as a group whose group of cogredient isomorphisms is abelian."

The term METAMATHEMATICS goes back to the 1870s where it was used as a pejorative (intending to put it in the same light as metaphysics) in discussions of non-Euclidean geometries.

In the 1890 Funk & Wagnalls Dictionary the word is defined as "The philosophy or metaphysics of mathematics."

The word was first used in its modern sense by David Hilbert (1862-1943) in a 1922 lecture and it appears, as metamathematik, in 1923 in "Die logischen Grundlagen der Mathematik" Math. Ann. 88. p. 153. [Michael Detlefsen, Carlos César de Araújo]

METHOD OF EXHAUSTION. The Flemish Jesuit mathematician Gregorius a Sancto Vincentio (or Gregory St. Vincent) (1584-1667) was "probably the first to use the word exhaurire in a geometrical sense" (Cajori 1919). The term method of exhaustion arose from this word.

Vincentio used the term in 1647, according to A Concise History of Mathematics by Dirk J. Struik, third edition.

Method of exhaustions appears in English in 1685 in Treat. Algebra by John Wallis: "It will be necessary to premise somewhat concerning (what is wont to be called) the Method of Exhaustions" (OED2).

The term METHOD OF LEAST SQUARES was coined by Adrien Marie Legendre (1752-1833), appearing in Sur la Méthode des moindres quarrés [On the method of least squares], the title of an appendix to Nouvelles méthodes pour la détermination des orbites des comètes (1805). The appendix is dated March 6, 1805. A much more sophisticated treatment appeared soon after: Gauss’s Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium (The Theory of the Motion of Heavenly Bodies moving around the Sun in Conic Sections) of 1809. There was a dispute about priority, for Gauss claimed he had been using the method since 1795.

"Minimum" and "small" were the early English translations of moindres but Method of least squares occurs in English in 1825 in the title "On the Method of Least Squares" by J. Ivory in Philosophical Magazine, 65, 3-10.

This entry was contributed by James A. Landau, based on David (1995). See the entries ERROR, GAUSSIAN, GAUSS-MARKOV THEOREM.

The term METRIC SPACE is due to Felix Hausdorff (1869-1942) who gave axioms for the metrischer Raum in his Grundzüge der Mengenlehre (1914, pp. 211-2). Hausdorff’s axioms governing "die Entfernung" were based on Fréchet’s treatment of "l’écart" in "Sur quelques points du calcul fonctionnel," Rendiconti del Circolo matematico di Palermo, 22, (1906) pp. 1-67.

Metric space is found in English in E. W. Chittenden; A. D. Pitcher "On the Theory of Developments of an Abstract Class in Relation to the Calcul Fonctionnel," Transactions of the American Mathematical Society, 20, (1919), 213-233. (JSTOR)

METRIC SYSTEM. Noah Webster's 1806 dictionary has the heading "New French Weights and Measures."

In 1821 John Quincy Adams used the terms French system and French metrology.

Webster's dictionary of 1828 refers to French measure.

Metric system apparently is found in 1829 in The London encyclopaedia, or, Universal dictionary of science, art, literature, and practical mechanics [Google print search without document view].

French metric system appears in 1831 in An historical inquiry into the production and consumption of the precious metals by William Jacob. [Google print search]

Metric system is found in 1833 in The Military and Naval Magazine of the United States. [Google print search]

Decimal system appears in January 1844 in The Southern quarterly review: "These units, multiplied or divided by ten, ad infinitum, formed the beautiful decimal system of the French, which surpasses all others."

In May 1854, Debow's review, Agricultural, commercial, industrial progress and resources uses the terms the decimal system of measures, French metrical system, metrical-decimal system, and decimal-metrical system of France.

The term French decimal system is used in 1857 in Mathematical Dictionary and Cyclopedia of Mathematical Science.

Gram is found in English in Aug. 1797 in Nicholson's Journal where it is spelled "gramme." Kilogram and liter are found in English in Aug. 1797 in Journal of Natural Philosophy. Kilometer, milliliter, millimeter, and milligram are found in English in Noah Webster's 1806 A Compendious Dictionary of the English Language, although kilometer is spelled "chiliometer."

Metric ton is found in 1871 in Chemistry, general, medical, and pharmaceutical, including the chemistry of the U.S. pharmacopoeia by John Attfield: "The Metric Ton of 1000 Kilo-grammes = 19 cwt. 2 qrs. 20 lbs. 10 ozs" [University of Michigan Digital Library Project].

Micron (one millionth of a meter) was coined by Johann Benedict Listing (1808-1882), according to Breitenberger (1999). The OED2 shows a use of the word in French in 1880 in Procès-Verbaux des Séances du Comité Internat. des Poids et Mesures 1879.

MILLER-RABIN TEST is found in H. W. Lenstra, Jr. "Primality testing," Number theory and computers, Studyweek, Math. Cent. Amsterdam 1980, and in Louis Monier, "Evaluation and comparison of two efficient probabilistic primality testing algorithms," Theor. Comput. Sci., 12 (1980).

Related terms are found in H. W. Lenstra, Jr., "Miller's primality test," Inf. Process. Lett. 8 (1979) and Tore Herlestam, "A note on Rabin's probabilistic primality test," BIT, Nord. Tidskr. Informationsbehandling 20 (1980).

MILLIARD. Gulielmus Budaeus (1467-1540) used the term in his De Asse et Partibus eius Libri V. In the Paris edition of 1532, the following appears: "hoc est denas myriadu myriadas, quod vno verbo nostrates abaci studiosi Milliartu appellat, quasi millionu millione" (Smith vol. 2, page 85).

MILLION, BILLION, etc. The following is taken from Smith (vol. 2, pages 80-86):

One of the most striking features of ancient arithmetic is the rarity of large numbers. There are exceptions, as in some of the Hindu traditions of Buddha's skill with numbers, in the records on some of the Babylonian tablets, and in the Sand Reckoner of Archimedes with its number system extending to 10^63, but these are all cases in which the élite of the mathematical world were concerned; the people, and indeed the substantial mathematicians in most cases, had little need for or interest in numbers of any considerable size.

The word "million," for example, is not found before the 13th century, and seems to have come into use in England even later. William Langland (c. 1334-c. 1400), in Piers Plowman, says,

Coueyte not his goodes
For millions of moneye,
but Maximus Planudes (c. 1340) seems to have been among the first of the mathematicians to use the word. By the 15th century it was known to the Italian arithmeticians, for Ghaligai (1521; 1552 ed., fol. 3) relates that "Maestro Paulo da Pisa" read the seventh order as millions. It first appeared in a printed work in the Treviso arithmetic of 1478. Thereafter it found place in the works of most of the important popular Italian writers, such as Borghi (1484), Pellos (1492), and Pacioli (1494), but outside of Italy and France it was for a long time used only sparingly. Thus, Gemma Frisius (1540) used "thousand thousand" in his Latin editions, which were published in the North, while in the Italian translation (1567) the word millioni appears. Similarly, Clavius carried his German ideas along with him when he went to Rome, and when (1583) he wished to speak of a thousand thousand he almost apologized for using "million," referring to it as an Italian form which needed some explanation.

In Spain the word cuento was early used for 10^6, the word million being reserved for 10^12. When the latter word was adopted by mathematicians, it was slow in coming into general use.

France early took the word "million" from Italy, as when Chuquet (1484) used it, being followed by De la Roche (1520), after which it became fairly common.

The conservative Latin writers of the 16th century were very slow in adopting the word. Even Tonstall (1522), who followed such eminent Italian writers as Pacioli, did not commonly use it. He seems to have been influenced by the fact that the Romans had no use for large numbers; or by the fact that, for common purposes, it sufficed to say "thousand thousand" as had been done for many generations. He simply mentions the word as a piece of foreign slang to be avoided. Other Latin writers were content to say "thousand thousand."

The German writers were equally slow in abandoning "thousand thousand" for "million," most of the writers of the 16th century preferring the older form. The Dutch were even more conservative, continuing the old form later than the writers in the neighboring countries. Indeed, for the ordinary needs of business in the 16th century, the word "million" was a luxury rather than a necessity.

England adopted the Italian word more readily than the other countries, probably owing to the influence of Recorde (c. 1542). It is interesting to see that Poland was also among the first to recognize its value, the word appearing in the arithmetic of Klos in 1538.

Until the World War of 1914-1918 taught the world to think in billions there was not much need for number names beyond millions. Numbers could be expressed in figures, and an astronomer could write a number like 9.15 · 10^7, or 2.5 · 10^20, without caring anything about the name. Because of this fact there was no uniformity in the use of the word "billion." It meant a thousand million (10^9) in the United States and a million million (10^12) in England, while France commonly used milliard for 10^9, with billion as an alternative term.

Historically the billion first appears as 10^12, as the English use the term. It is found in this sense in Chuquet's number scheme (1484), and this scheme was used by De la Roche (1520), who simply copied parts of Chuquet's unpublished manuscript, but it was not common in France at this time, and it was not until the latter part of the 17th century that it found place in Germany. Although Italy had been the first country to make use of the word "million," it was slow in adopting the word "billion." Even in the 1592 edition of Tartaglia's arithmetic the word does not appear. Cataldi (1602) was the first Italian writer of any prominence to use the term, but he suggested it as a curiosity rather than a word of practical value. About the same time the term appeared in Holland, but it was not often recognized by writers there or elsewhere until the 18th century, and even then it was not used outside the schools. Even as good an arithmetician as Guido Grandi (1671-1742) preferred to speak of a million million rather than use the shorter term.

The French use of milliard, for 10^9, with billion as an alternative, is relatively late. The word appears at least as early as the beginning of the 16th century as the equivalent both of 10^9 and of 10^12, the latter being the billion of England today. By the 17th century, however, it was used in Holland to mean 10^9, and no doubt it was about this time that the usage began to change in France.

As to the American usage, taking a billion to mean a thousand million and running the subsequent names by thousands, it should be said that this is due in part to French influence after the Revolutionary War, although our earliest native American arithmetic, the Greenwood book of 1729, gave the billion as 10^9, the trillion as 10^12, and so on. Names for large numbers were the fashion in early days, Pike's well-known arithmetic (1788), for example, proceeding to duodecillions before taking up addition.

Million appears in the King James Bible: "And they blessed Rebekah, and said unto her, Thou art our sister, be thou the mother of thousands of millions, and let thy seed possess the gate of those which hate them" (Gen. 24: 60). (This is translated "many millions" in the Living Bible.)

Million was also used by Shakespeare a number of times.

The number 200,000,000 appears in the Living Bible in Rev. 9:16. It is translated as "two hundred thousand thousand" in the King James version (1611), "twice ten thousand times ten thousand" in Darby (1890) and RSV (1946), "two myriads of myriads" in Young's Literal Translation (1898), and "two hundred million" in the New International Version (1973).

Billion first occurs, with the meaning 10^12, in French in 1484 in Le Triparty en la Science des Nombres by Nicolas Chuquet (1445?-1500?). He used the words byllion, tryllion, quadrillion, quyllion, sixlion, septyllion, ottyllion, and nonyllion. A translation has: "The first dot indicates million, the second dot billion, the third dot trillion, the fourth dot quadrillion...and so on as far as one may wish to go."

The OED2 has:

The name [billion] appears not to have been adopted in Eng. before the end of the 17th c. .... Subsequently the application of the word was changed by French arithmeticians, figures being divided in numeration into groups of threes, instead of sixes, so that F. billion, trillion, denoted not the second and third powers of a million, but a thousand millions and a thousand thousand millions. In the 19th century, the U.S. adopted the French convention, but Britain retained the original and etymological use (to which France reverted in 1948). Since 1951 the U.S. value, a thousand millions, has been increasingly used in Britain, especially in technical writing and, more recently, in journalism; but the older sense "a million millions" is still common.
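
The two conventions described in the quotation can be stated as formulas. When figures are grouped in sixes, the n-th "-illion" is the n-th power of a million, 10^(6n); when they are grouped in threes, it is 10^(3n+3). The Python sketch below simply restates this arithmetic; the function names are illustrative and do not come from any of the sources cited.

```python
def long_scale(n: int) -> int:
    """Value of the n-th '-illion' when figures are grouped in sixes."""
    return 10 ** (6 * n)


def short_scale(n: int) -> int:
    """Value of the n-th '-illion' when figures are grouped in threes."""
    return 10 ** (3 * n + 3)


# n = 1 is the million in both systems; the two diverge from n = 2 (billion).
billion_long = long_scale(2)    # a million millions
billion_short = short_scale(2)  # a thousand millions
```

On this reckoning the trillion of the later system, short_scale(3), equals the billion of the older one, long_scale(2).
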
Decillion occurs in English in 1847.

Centillionth, with an imprecise meaning, appears in English in 1852 in Tait's Magazine: "There existed not a centillionth of the blessing."

Centillion is found in English in 1863 in The Normal: or, Methods of Teaching the Common Branches, Orthoepy, Orthography, Grammar, Geography, Arithmetic and Elocution by Alfred Holbrook, which has the following:

Names of the periods. - 1st, Units. 2d, Thousands. 3d, Millions. 4th, Billions. 5th, Trillions. 6th, Quadrillions. 7th, Quintillions. 8th, Sextillions. 9th, Septillions. 10th, Octillions. 11th, Nonillions. 12th, Decillions. 13th, Undecillions. 14th, Duodecillions. 15th, Tridecillions. 16th, Quadrodecillions. 17th, Quindecillions. 18th, Sexdecillions. 19th, Septodecillions. 20th, Octodecillions. 21st, Nonodecillions. 22d, Vigintillions. 23d, Unvingintillions. 24th, Duo-vingintillions, etc. 32d, Trigintillions. 42d, Quadrogintillions. 52d, Quingintillions. 62d, Sexagintillions. 72d, Septuagintillions. 82d, Octogintillions. 92d, Ninogintillions. 102d, Centillions. 103d, Uncentillions. 104th, Duocentillions, etc. 202d, Duocentillions, etc. 1002d, Millillions, etc.
The term MINIMAL BASIS is due to Felix Klein, according to Harkness and Morley in A Treatise on the Theory of Functions.

MINIMAX (in geometry). In the sense of a saddle point of a surface or similar concept in higher dimensions, Poincaré wrote in 1899 in Méthodes Nouvelles de la Mécanique Céleste III. 246: "J'appelle minimax, à l'exemple des Anglais, un point pour lequel..."

Alan M. Hughes, Associate Editor of the OED, reports that, despite Poincaré's comment, no earlier English usage has been traced.

Mark Dunn writes that the earliest English use appears to be in 1917 in Trans. American Math. Soc., vol. 18, p. 240. Most later examples of this meaning in English refer to this 1917 article as though it is the first use.

MINIMAX (in game theory). In 1928 J. von Neumann wrote in "Zur Theorie der Gesellschaftsspiele," Mathematische Annalen, 100, (p. 307) the heading "Beweis des Satzes Max Min = Min Max" (OED2).

Min-max is found in English in 1944 in J. Von Neumann & Morgenstern, Theory of Games: "A slightly more general form of this Min-Max problem arises in another question of mathematical economics" (OED2).

Minimax solution to a statistical decision problem appears in 1947 in Wald’s "Foundations of a General Theory of Sequential Decision Functions," Econometrica, 15, 279-313 but the concept had appeared in his 1939 paper under the guise of the "best estimate."

Minimax estimate appears in Hodges & Lehmann’s "Some Problems in Minimax Point Estimation", Annals of Mathematical Statistics, 21, (1950), 182-197 [John Aldrich, based on David (2001)].

Maximin is dated 1951 in MWCD10.


MINIMUM CHI-SQUARED. After Karl Pearson introduced the χ^2 goodness of fit test in 1900 several authors tried basing estimation on χ^2. E. Slutsky’s (1913) "On the Criterion of Goodness of Fit of the Regression Lines and on the Best Method of Fitting them to the Data," Journal of the Royal Statistical Society, 77, 78-84 and F. L. Engledow & G. U. Yule’s (1914) "The Determination of the Best Value of the Coupling-ratio from a Given Set of Data," Proceedings of the Cambridge Philosophical Society, 17, 436-440 seem to have been the first. However, these papers were less noticed than Kirstine Smith's "On the 'Best' Values of the Constants in Frequency Distributions," Biometrika, 11, (1916), 262-276. Smith used the phrase "minimum χ^2" but only in tables where brevity was necessary. R. A. Fisher read Smith and he was the writer who did most to keep minimum χ^2 in view, for he often compared it with his own maximum likelihood: see e.g. "On the Mathematical Foundations of Theoretical Statistics", Phil. Trans. Royal Soc. Ser. A. 222, (1922) p. 357.

(Based on A. W. F. Edwards "Three Early Papers on Efficient Parametric Estimation," Statistical Science, 12, (1997), 35-38.)


MINKOWSKI’S INEQUALITY was given in Hermann Minkowski’s Geometrie der Zahlen (1896, pp. 115-7). It is discussed in Inequalities by G. H. Hardy, J. E. Littlewood and G. Polya (1934).

The term MINOR was apparently coined by James Joseph Sylvester, who wrote in Philos. Mag. Nov. 1850:

Now conceive any one line and any one column to be struck out, we get ... a square, one term less in breadth and depth than the original square; and by varying in every possible manner the selection of the line and column excluded, we obtain, supposing the original square to consist of n lines and n columns, n^2 such minor squares, each of which will represent what I term a First Minor Determinant relative to the principal or complete determinant. Now suppose two lines and two columns struck out from the original square ... These constitute what I term a system of Second Minor Determinants; and ... we can form a system of rth minor determinants by the exclusion of r lines and r columns.
Sylvester also used minor as a noun in the same article: "The whole of a system of rth minors being zero" and "We shall have only to deal with a system of first minors" (OED).
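
Sylvester's construction of first minors can be sketched in a few lines of Python: strike out one line (row) and one column, and take the determinant of the square that remains. This is only an illustration of the passage above, not Sylvester's own procedure; the function names are illustrative, and the determinant is computed by cofactor expansion, which is chosen here for brevity.

```python
def minor_matrix(a, i, j):
    """Square array left after striking out row i and column j of a."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def determinant(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * determinant(minor_matrix(a, 0, j))
               for j in range(len(a)))


def first_minors(a):
    """All n*n first minor determinants of an n-by-n array (n >= 2)."""
    n = len(a)
    return [[determinant(minor_matrix(a, i, j)) for j in range(n)]
            for i in range(n)]


example = [[2, 0, 1],
           [1, 3, 2],
           [1, 1, 1]]
minors = first_minors(example)  # a 3-by-3 table, i.e. n^2 = 9 first minors
```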

MINUEND is an abbreviation of the Latin numerus minuendus (number to be diminished), which was used by Johannes Hispalensis (c. 1140) (Smith vol. 2, page 96).

In English, minuend was used in 1706 by William Jones in Synopsis palmariorum matheseos, or a new introduction to the mathematics (OED2).


MINUS SIGN. Negative sign appears in 1668 in T. Brancker, Introd. Algebra: "The Sign for Subtraction is - i.e. Minus, or the Negative Sign."

Minus sign is found in 1825 in History of the Political and Military Transactions in India during the Administration of the Marquess of Hastings 1813-1823 by Henry T. Prinsep.

MIXED NUMBER appears in English in 1542 in The Ground of Artes by Robert Recorde: "mixt numbers (that is whole numbers with fractions)" (OED2).

MÖBIUS STRIP appears in 1904 in E. R. Hedrick, translation of Goursat's Course in Mathematical Analysis (as "Möbius' strip") (OED2).

August Möbius described the object in "Ueber die Bestimmung des Inhaltes eines Polyëders" (1865). See Gesammelte Werke II, p. 484. According to Grattan-Guinness (1997, p. 404), Johann Benedict Listing also found the construction in 1858; Listing published it in 1861.

MODE was coined by Karl Pearson (1857-1936). He used the term in 1895 in "Contributions to the Mathematical Theory of Evolution. II. Skew Variation in Homogeneous Material," Philosophical Transactions of the Royal Society of London, Ser. A, 186, 343-414: "I have found it convenient to use the term mode for the abscissa corresponding to the ordinate of maximum frequency. Thus the "mean," the "mode," and the "median" have all distinct characters." (p. 345)

See also MEAN and MEDIAN.

MODULAR ARITHMETIC. The subject of modular arithmetic originated in Gauss' Disquisitiones arithmeticae of 1801.

A JSTOR search finds the term modular arithmetic in a 1942 review of Fundamental Mathematics (1940) by Duncan Harkin.


MODULAR CURVE appears in 1878 in J. J. S. Smith, "On the modular curves," Rep. Brit. Ass.

The term MODULAR EQUATION was introduced by Jacobi [Encyclopaedia Britannica (1902), article "Infinitesimal Calculus"; Smith (1906)].

The term équations modulaires appears on January 12, 1828, in a letter written by Jacobi to Legendre [Emili Bifet].

Modular equation is found in 1844 in "Investigation of the Transformation of Certain Elliptic Functions," by Arthur Cayley in Philosophical Magazine, vol. XXV [University of Michigan Digital Library].

MODULAR FORM occurs in the heading "Definite Modular Forms" in "Definite Forms in a Finite Field," Leonard Eugene Dickson, Transactions of the American Mathematical Society, Vol. 10, No. 1. (Jan., 1909).

MODULAR FUNCTION. Christoph Gudermann (1798-1852) called elliptical functions "Modularfunctionen" (DSB).

Joseph Alfred Serret (1819-1885) defined modular functions in 1866 in "Mémoire sur la théorie des congruences suivant un module premier et suivant une fonction modulaire irréductible," Mémoires de l'Acad.: "La fonction irréductible qui intervient ici, joue le rôle de module, et je lui donne en conséquence le nom de fonction modulaire" [Udai Venedem].

Richard Dedekind (1831-1916) used the term elliptic modular function in "Schreiben an Herrn Borchardt ueber die Theorie der elliptischen Modulfunktionen," J. reine angew. Math. 83 (1877), 265-292. According to Klein, this was the origin of the general name modular functions for functions with this or similar invariance [William C. Waterhouse].

MODULUS, MODULO and MOD (in number theory). Gauss introduced these terms in his Disquisitiones arithmeticae (1801, p. 9):

Si numerus a numerorum b, c differentiam metitur, b et c secundum a congrui dicuntur, sin minus, incongrui; ipsum a modulum appellamus. Uterque numerorum b, c priori in casu alterius residuum, in posteriori vero nonresiduum vocatur. [If a number a measure the difference between two numbers b and c, b and c are said to be congruent with respect to a, if not, incongruent; a is called the modulus, and each of the numbers b and c the residue of the other in the first case, the non-residue in the latter case.]

On the next page Gauss introduced the abbreviation mod. for modulo:

Numerorum congruentiam hoc signo, ≡, in posterum denotabimus, modulum ubi opus erit in clausulis adiungentes, -16 ≡ 9 (mod. 5), -7 ≡ 15 (mod. 11).
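
Gauss's definition translates directly into a one-line check: b and c are congruent with respect to a when a measures (divides) their difference. The following Python sketch, with an illustrative function name, tests the two examples Gauss gives.

```python
def congruent(b: int, c: int, a: int) -> bool:
    """True when a divides b - c, i.e. b ≡ c (mod a)."""
    return (b - c) % a == 0


gauss_example_1 = congruent(-16, 9, 5)   # -16 - 9 = -25, a multiple of 5
gauss_example_2 = congruent(-7, 15, 11)  # -7 - 15 = -22, a multiple of 11
```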

Modulus is found in English in 1811 in An Elementary Investigation of the Theory of Numbers by Peter Barlow [James A. Landau].

The OED2 shows a use of mod. in English in 1854 in Cambr. & Dublin Math. Jrnl. IX. 85 and a use of mod in 1860 in Rep. Brit. Assoc. Adv. Sci. 1859.

Modulo appears in English in 1887 in American Journal of Math. vol. 10, p. 62 [Mark Dunn, JSTOR].

Modulo (non-technical sense). Modulo is being widely used by mathematicians in a related sense of "(a) taking into account (a particular consideration, aspect, etc.) (b) with respect to an equivalence defined by (some feature)." [This is the definition which will be given by the OED, according to Mark Dunn.]

In the spring of 1953, in a letter to Paul Halmos, Warren Ambrose of Princeton wrote: "[Nash] proceeded to announce that he had solved it, modulo details, and told Mackey he would like to talk about it at the Harvard colloquium." In this citation, modulo means "except for" or "without." This letter, which was critical of John Nash's attempt (later successful) to prove the Riemann Imbedding Theorem, is quoted in A Beautiful Mind by Sylvia Nasar [James A. Landau]

Carlos César de Araújo provides these examples:

MODULUS (in logarithms) was used by Roger Cotes (1682-1716) in 1722 in Harmonia Mensurarum: Pro diversa magnitudine quantitatis assumptae M, quae adeo vocetur systematis Modulus. Cotes also coined the term ratio modularis (modular ratio) in this work.

Modulus is found in English in A Treatise on Plane and Spherical Trigonometry: With Their Most Useful Practical Applications by John Bonnycastle: "Where M = 1 for hyperbolic logarithms, or = 2.302585093 for the common tabular logarithms; which number is the hyperbolic logarithm of 10, what is usually called the modulus of the system." [Google print search]

MODULUS (a coefficient that expresses the degree to which a body possesses a particular property) appears in the 1738 edition of The Doctrine of Chances: or, a Method of Calculating the Probability of Events in Play by Abraham De Moivre (1667-1754) [James A. Landau].

MODULUS (in the Theory of Errors). In his first theory of least squares, based on the normal distribution and presented in Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium (1809), Gauss used a measure of precision ("mensura praecisionis observationum," p. 245), which he denoted by h: the reciprocal of h is σ√2, where σ is the standard deviation. Both h and its reciprocal have been called the modulus: the reciprocal in G. B. Airy's On the Algebraical and Numerical Theory of Errors of Observation and the Combination of Observations (1861, p. 15) and h in E. T. Whittaker & G. Robinson's Calculus of Observations (1924, p. 175). See METHOD OF LEAST SQUARES and also Symbols Associated with the Normal Distribution on the Symbols in Probability and Statistics page.
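
The relation between h and σ can be checked numerically. The sketch below assumes Gauss's error law in the form (h/√π)·exp(-h²x²), which is not spelled out in this entry, and compares it with the normal density written in terms of σ; with h = 1/(σ√2) the two coincide. The function names are illustrative.

```python
import math


def precision_from_sd(sigma: float) -> float:
    """Measure of precision h whose reciprocal is sigma * sqrt(2)."""
    return 1.0 / (math.sqrt(2.0) * sigma)


def error_law(x: float, h: float) -> float:
    """Error law in terms of h (assumed form, see the lead-in)."""
    return h / math.sqrt(math.pi) * math.exp(-(h * x) ** 2)


def normal_density(x: float, sigma: float) -> float:
    """Normal density with mean 0 and standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


sigma = 1.5
h = precision_from_sd(sigma)
agree = all(math.isclose(error_law(x, h), normal_density(x, sigma))
            for x in (-2.0, -0.5, 0.0, 1.0, 3.0))
```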

At the end of the 19th century the standard deviation began to replace the modulus in the biometric/statistical literature but writers in the error theory tradition continued to use the modulus, see e.g. Harold Jeffreys’s "An Alternative to the Rejection of Observations," Proceedings of the Royal Society of London. Series A, 137, (1932), pp. 78-87. The term now seems to have dropped out of use completely. See STANDARD DEVIATION.

MODULUS. The term modulus ("le module") for the length of the vector a + bi is due to Jean Robert Argand (1768-1822) (Cajori 1919, page 265). According to William F. White in A Scrap-Book of Elementary Mathematics (1908), the term was first used by him in his 1814 Reflexions. The passage is on p. 122 of the edition of Essai sur une manière de représenter les quantités imaginaires dans les constructions géométriques.

The term was adopted by Cauchy and chapter VII of his Cours d'Analyse (1821, p. 173ff.) has the title Des expressions imaginaires et de leurs modules. The OED’s earliest English quotation is from 1866 W. T. Brande & G. W. Cox A dictionary of science, literature and art II. 551/2 "The positive square root of a^2 + b^2 is often termed the modulus of the imaginary expression ." Because modulus had other meanings German writers preferred the term Der absolute Betrag (= ABSOLUTE VALUE). [John Aldrich]

The term MODULUS OF TRANSFORMATION was used in 1882 by George M. Minchin in Uniplanar Kinematics of Solids and Fluids: "It will be convenient to speak of this quantity K as a modulus of transformation" (OED2).

MOMENT was used in the obsolete sense of "an infinitesimal increment or decrement of a varying quantity" by Isaac Newton in 1704 in De Quadratura Curvarum: "Momenta id est incrementa momentanea synchrona" (OED2).

Moment appears in English in the obsolete sense of "momentum" in 1706 in Synopsis Palmariorum Matheseos by William Jones: "Moment..is compounded of Velocity..and..Weight" (OED2).

Moment of a force appears in 1830 in A Treatise on Mechanics by Henry Kater and Dionysius Lardner (OED2).

Moment was taken into Statistics from Mechanics by Karl Pearson when he treated the frequency-curve (or observation curve) as the sheet enclosed by the curve and the horizontal axis. See his "Asymmetrical Frequency Curves," Nature October 26th 1893: "Now the centre of gravity of the observation curve is found at once, also its area and its first four moments by easy calculation." (OED2).

The phrase method of moments was used in a statistics sense in the first of Karl Pearson's "Contributions to the Mathematical Theory of Evolution," (Philosophical Transactions of the Royal Society A, 185, (1894), p. 75.). Pearson used the method to estimate the parameters of a mixture of normal distributions. For several years Pearson used the method on different problems but the name only gained general currency with the publication of his 1902 Biometrika paper "On the systematic fitting of curves to observations and measurements" (David 1995). In "On the Mathematical Foundations of Theoretical Statistics" (Phil. Trans. R. Soc. 1922), Fisher criticized the method for being inefficient compared to his own maximum likelihood method (Hald pp. 650 and 719).

Moment generating function. R. A. Fisher seems to have brought this term into English in his "Moments and Product Moments of Sampling Distributions," Proceedings of the London Mathematical Society, Series 2, 30, (1929), p. 238. He probably took the term from V. Romanovsky, "Sur Certaines Espérances Mathématiques et sur l'Erreur Moyenne du Coefficient de Corrélation," Comptes Rendus, 180, (1925), 1897-1899. Romanovsky refers to "la fonction génératrice des moments" (p. 1898).

Some English publications of the 1930s, including M. S. Bartlett’s "On the Theory of Statistical Regression," Proceedings of the Royal Society of Edinburgh, 53, (1933), 260-283, used the term for what is now called the characteristic function. The modern division of labour between the two terms seems to have been fixed from around 1940.

This entry was contributed by John Aldrich. See CHARACTERISTIC FUNCTION (1).

The term MONOGENIC (for a function having a single derivative at a point) was introduced by Augustin-Louis Cauchy (1789-1857).

MONOMIAL appears in English in 1702 in A Mathematical Dictionary: Or; A Compendious Explication of All Mathematical Terms by Joseph Raphson and Jacques Ozanam: "Monomial, is a Magnitude of one Name, or one only Term, as ab, aab, aaab, &c." [Google print search]

MONOTONIC is found in 1901 in Ann. Math. II: "It follows that f(s) is a monotonic function that actually decreases in parts of the interval..." (OED2).

It is also found in W. F. Osgood, "On the Existence of a Minimum of the Integral...," Transactions of the American Mathematical Society, 2 (Apr., 1901). The term is probably considerably older.

MONTE CARLO with reference to the use of (pseudo) RANDOM NUMBERS for solving numerical problems. In his autobiography Adventures of a Mathematician Stanislaw M. Ulam (1976, pp. 196-200) wrote that such a method came to him while playing solitaire during an illness in 1946. Ulam described the method to John von Neumann and they “developed the mathematics together.” In an unpublished manuscript, “The Origin of the Monte Carlo Method,” dated Apr. 12, 1983, Ulam adds that what seems to be the first written account of the method was given by von Neumann in a letter to Robert Richtmyer of Los Alamos in early 1947.

The first publication to describe the method was “The Monte Carlo Method” by Ulam and Metropolis in the Journal of the American Statistical Association, 44, (1949), 335-341. A news item in Math. Tables & Other Aids to Computation III, (1949), p. 546 reports a Symposium on Probability Methods in Numerical Analysis at which both Ulam and von Neumann spoke. The Monte Carlo method and its history are explained as follows: “This method of solution of problems in mathematical physics by sampling techniques based on random walk models constitutes what is known as the ‘Monte Carlo’ method. The method as well as the name for it were apparently first suggested by John von Neumann and S. M. Ulam.”

Ulam and von Neumann exploited the random number generation possibilities of the new electronic COMPUTER to solve differential equations and their Monte Carlo method would now be classified as a form of MARKOV CHAIN MONTE CARLO. Computer-based sampling techniques were soon applied to other problems, particularly those arising in statistical distribution theory, and the term Monte Carlo was used for these applications as well. These exercises resembled the “experimental sampling” of the pre-electronic computer age, examples of which can be found in the famous 1908 paper by Student (see STUDENT’S t-DISTRIBUTION) and the 1926 paper “Why Do We Sometimes Get Nonsense Correlations between Time-series? A Study in Sampling and the Nature of Time-series” by Yule (see SPURIOUS CORRELATION). It was for applications like these that the first tables of RANDOM NUMBERS were produced in 1927.

[This entry was contributed by John Aldrich.]


MOORE SPACE. This name was introduced by F. Burton Jones in Concerning normal and completely normal spaces (Bull. Amer. Math. Soc. 43 (1937) 671-677, p.675) for a topological space satisfying "Axiom 0 and parts 1, 2, and 3 of Axiom 1 of R. L. Moore’s Foundations of Point Set Theory" (Amer. Math. Soc. Coll. Publ. 13, NY, 1932). It was in that paper (p. 676) that Jones stated for the first time the famous normal Moore space conjecture: "Is every normal Moore space M metric [metrizable]?" Despite considerable effort spent in seeking a solution, the question was "settled" only in 1970, when Tall and Silver (by using a Cohen model) showed its undecidability from traditional set theory. [Carlos César de Araújo]

MORAL EXPECTATION was once the standard term for what is now called expected utility. "L'espérance morale" appeared in a letter dated 21st May 1728 written by Gabriel Cramer; see letter 8 in Correspondence of Nicholas Bernoulli concerning the St Petersburg game with Montmort, Daniel Bernoulli and Cramer (translation by Richard J. Pulskamp). Daniel Bernoulli published an extract from this letter in his "Specimen Theoriae Novae de Mensura Sortis," Commentarii Academiae Scientiarum Imperialis Petropolitanae, 5, 175-192 (1738). This was the first publication on expected utility and it has been translated as "Exposition of a New Theory on the Measurement of Risk," Econometrica, 22, (1954), 23-36. Laplace gave Bernoulli's theory plenty of attention in the Théorie Analytique des Probabilités, livre II, chapitre X, p. 441, but he used "l'espérance morale" rather than Bernoulli's "emolumentum medium," and the literature followed.

[John Aldrich, based on Jacques Dutka, "On the St. Petersburg paradox," Arch. Hist. Exact Sci. 39, No.1, 1988]


The phrase MORALLY CERTAIN was introduced by Jacob (James/Jacques) Bernoulli (Ars Conjectandi (1713), Part IV, Chapters I and II) for a case in which the probability is .99 or perhaps .999:

That is morally certain whose probability nearly equals the whole certainty, so that a morally certain event cannot be perceived not to happen: on the other hand, that is morally impossible which has merely as much probability as renders the certainty of failure moral certainty. Thus, if one thing is considered morally certain which has 999/1000 certainty, another thing will be morally impossible which has only 1/1000 certainty.
(Walker, 1929, p. 10).

MOVING AVERAGE. This technique for smoothing data points was used for decades before this, or any general term, came into use. In 1909 G. U. Yule (Journal of the Royal Statistical Society, 72, 721-730) described the "instantaneous averages" R. H. Hooker calculated in 1901 as "moving-averages." Yule did not adopt the term in his textbook, but it entered circulation through W. I. King's Elements of Statistical Method (1912).

"Moving average" referring to a type of stochastic process is an abbreviation of H. Wold's "process of moving average" (A Study in the Analysis of Stationary Time Series (1938)). Wold described how special cases of the process had been studied in the 1920s by Yule (in connection with the properties of the variate difference correlation method) and Slutsky [John Aldrich].


MULTICOLLINEARITY (in Econometrics and Statistics). The term, due to Ragnar Frisch, is a contraction of his phrase multiple collinearity, which refers to a situation in which several linear relationships hold between variables. The OED gives the quotation, "There exist two or more independent linear relations between the systematic parts of these variates, but..we are not aware of this multicollinearity." Statistical Confluence Analysis (1934) p. 75.

In the 1930s Frisch investigated multicollinearity from the point of view of the multi-equation errors in variables model; Statistical Confluence Analysis was his principal work on the subject. When interest in this model waned, the term multicollinearity survived with an altered meaning. It now meant that the DESIGN MATRIX in the regression model has deficient rank. The change can be seen in the discussion of multicollinearity in Richard Stone’s The Measurement of Consumers' Expenditure and Behaviour in the United Kingdom, 1920-1938, vol. 1 (1954) p. 302. In this new sense the term is hardly ideal, for it implies that there is more than one relationship between the columns of X. Several writers have suggested dropping the term or replacing it by collinearity. However, it survives.

[This entry was contributed by John Aldrich. See the entry ERROR: ERRORS IN VARIABLES.]

MULTINOMIAL DISTRIBUTION appears in R. A. Fisher’s "Theory of Statistical Estimation," Proc. Cambr. Philos. Soc. 22, (1925) p. 719. The "multinomial expansion" was already an established term and this distribution bears the same relationship to that expansion as the binomial distribution bears to the binomial expansion [David (2001)].

MULTIPLY was used in English as a verb ("multiply by two") about 1391 by Chaucer in A Treatise on the Astrolabe (OED2).

MULTIPLICATION was used by Chaucer in a non-mathematical sense about 1384 and in a mathematical sense in 1390 by John Gower in Confessio amantis III 89 (OED2).

MULTIPLICATION TABLE. Table of multiplication appears in 1594 in Exercises (1636) by Blundevil: "Before I teach you the true order of multiplying, I thinke it good to set you downe a Table of Multiplication" (OED2).

Multiplication table appears in 1674 in Arithmetic by Samuel Jeake: "To learn by heart the Table commonly called Multiplication Table" (OED2).

The first edition of the Encyclopaedia Britannica (1768-1771) has: "This elementary step may be learned from the following table, commonly called Pythagoras's table of multiplication: which is consulted thus; seek one of the digits or numbers on the head, and the other on the left side, and in the angle of meeting you have their product."


MULTIPLICATIVE IDENTITY and MULTIPLICATIVE INVERSE are found in 1953 in First Course in Abstract Algebra by Richard E. Johnson [James A. Landau].

MULTIVARIATE is found in Karl Pearson, “Notes on the History of Correlation,” Biometrika 13 (Oct., 1920), pp. 25-45.


MULTIVARIATE ANALYSIS (in Statistics) appears in the title of M. S. Bartlett’s "A Note on Tests of Significance in Multivariate Analysis," Proc. Cambr. Philos. Soc. 35, (1939), 180-185 [David (2001)].
