An interesting result that demonstrates, among other things, the ubiquity of in mathematics is that the probability that two random positive integers are relatively prime is . A more revealing way to write this number is , where
is the Riemann zeta function. A few weeks ago this result came up on math.SE in the following form: if you are standing at the origin in and there is an infinitely thin tree placed at every integer lattice point, then is the proportion of the lattice points that you can see. In this post I’d like to explain why this “should” be true. This will give me a chance to blog about some material from another math.SE answer of mine which I’ve been meaning to get to, and along the way we’ll reach several other interesting destinations.
Densities and measures
First of all, what do we mean by a “random” integer? It’s not hard to see that there is no uniform probability distribution on the positive integers. The strongest statement that is meant when somebody says that a positive integer has a probability of of being in some subset is that the natural density
is equal to . For example, by this definition the probability that a positive integer is divisible by is , as one would expect. Natural density, however, has drawbacks: it is not defined for every subset of , and it does not form a probability measure in the measure-theoretic sense. In particular, it’s not countably additive, since each positive integer has measure but their union has measure .
If we want an honest probability measure on , we need to choose a sequence of non-negative real numbers such that . Since we’re trying to answer a number-theoretic question about relatively prime integers, we should pick the to have nice number-theoretic properties. One property that’s natural to ask for, as above, is that the probability that an integer is divisible by should be for any positive integer . In other words, we want
for all . This implies, among other things, that the events that an integer is divisible by one of two relatively prime positive integers are independent.
Unfortunately, this is impossible. For any fixed positive integer , it’s clear that is at most the probability that a positive integer is not divisible by for any prime . The probability that a positive integer is not divisible by is , and since these events are independent, the probability that a positive integer is not divisible by the primes not dividing is
which diverges to zero, so ; contradiction.
Here’s one explanation for why this condition can’t work which I think is revealing. One way to justify requiring that the probability that a positive integer is divisible by is is that this is true for all finite quotients of ; that is, in it’s true that for all positive integers , exactly of the residues are congruent to . In other words, when we imposed the above requirement, we were secretly looking for a probability distribution compatible with the counting measure on the finite quotients of . Whenever you want to talk about the finite quotients of a group , there’s a universal object that lets you talk about all of them at once: the profinite completion , defined as the limit of the diagram of the finite quotients of equipped with the obvious morphisms (that is, if are two subgroups of finite index such that , then there is an obvious morphism ). The profinite completion comes equipped with a natural map which is not always injective and with natural projections for every finite quotient of . It also comes with the initial topology making these projections continuous, and with this topology it is Hausdorff, compact, and totally disconnected. In addition, the image of is dense.
Example. The profinite completion of is a gadget called the profinite integers. By the Chinese Remainder Theorem, it is isomorphic to the direct product . Profinite groups often appear as Galois groups, and in this case naturally occurs as the absolute Galois group of a finite field.
Example. The profinite completion of is the trivial group because has no finite quotients! This shows that the natural map from a group into its profinite completion need not be injective.
Since is compact Hausdorff, it comes equipped with a canonical (say, left) Haar measure (the one which assigns the entire group measure ). This is important because it describes a probability measure on . For profinite groups the Haar measure is particularly easy to describe. Any Borel measure on a topological space is determined by what it does to a basis of open sets, and for profinite groups such a basis is given by the partitions determined by every projection . The kernel of such a morphism is a subgroup of finite index, and by invariance all of its cosets must have the same measure as it, so they must all have measure . In particular, the (normalized) Haar measure on is the unique one which assigns measure to the kernel of the projection .
This is precisely the property we wanted from a probability distribution on . But now the bitter truth hits us: in , the subgroup has measure zero! One way to see this is that the identity is in the kernel of every projection, so it has measure less than for every , hence measure zero. And by invariance, every element of has measure zero (in other words, there are no atoms), hence also has measure zero. One can also repeat the first proof given above in this context, but the point is that we are free to use invariance here because it was already guaranteed by the requirement that the measure be compatible with projections to the finite quotients. So we are totally defeated by this requirement, which is too strong.
(On the other hand, if we are ever in a situation where we want to pick random integers, we should perhaps think about picking random profinite integers instead because we now know there is a perfectly reasonable way to do this.)
The zeta distributions
What now? Well, the Chinese Remainder Theorem suggests that it is still natural to require that the events of being divisible by two relatively prime integers should be independent. This is equivalent to choosing the different exponents in the prime factorization independently. So how do we choose the exponent of a particular prime ?
To answer this restricted question, the relevant quotients of this time are the finite quotients . In such a finite quotient the probability of not being divisible by is , the probability of being divisible by (but not by ) is , the probability of being divisible by (but not by ) is , and so forth. This geometric behavior extends even to the limit of these quotients, which is the -adic integers . The -adic integers also have a canonical normalized Haar measure, and again because of compatibility with the finite quotients the measure of the kernel (the elements divisible by ) is , which duplicates the above behavior. So it is tempting to make the exponent of be equal to with probability .
But since the profinite integers are just the direct product of the , we are just repeating the same argument as above, but locally, so we know this can’t work. (To see this explicitly, once again we can compute the measure of .) However, it is still natural to use a geometric distribution, for example because it maximizes entropy given a mean: see Keith Conrad’s excellent exposition on the subject. So let’s make the exponent of equal to with probability for some constant .
With this choice, the measure of a positive integer is now equal to
so we see that the product needs to converge in order for this measure to make sense. Other than that requirement,mhow do we pin down a nice choice of the constants ?
One additional requirement we can impose is that the measure of a positive integer is monotonically decreasing. It turns out (and I will leave this as a nice exercise) that this is possible if and only if for some positive real constant . Since also needs to converge, it turns out (again, exercise) that this is possible if and only if is greater than . This then gives
where is the zeta function! (This is one way to flesh out the tantalizing statement made by Rota which I quoted in this old blog post, although I don’t know if it’s the precise result that Rota had in mind.) This distribution is the zeta distribution on with parameter .
One way to think about the zeta distribution is as the probability distribution coming from viewing as the partition function of a certain statistical-mechanical system, the Riemann gas, whose states are the positive integers and whose energies are their logarithms . As with many other cool connections in mathematics, this was discussed by John Baez, who has links to more information, such as this wonderful page. One can think of a positive integer as describing a multiset of fictitious particles called “primons,” where each primon occurs with multiplicity the corresponding exponent in the prime factorization of and energy . So is the sum of the energies of all the primons in the state corresponding to .
The parameter corresponds to inverse temperature, and as the partition function diverges, which corresponds to the Hagedorn temperature of the system. In this limit each state becomes closer to being equally likely; in other words, we are getting closer and closer to a uniform probability distribution on , if such a thing existed.
This suggests the following idea. Given a subset , we look at its measure under the zeta distribution
and then we take the high-temperature limit . The resulting number, when it exists, turns out to equal the logarithmic density
of . The logarithmic density is still not a measure on , but it has the benefit of existing for more sets than the natural density. Even better, if a set has a natural density, then it also has a logarithmic density, and the two agree. So studying zeta distributions can tell us what the natural density of a subset has to be, if it exists.
The proof
One reason I was finally inspired to write this post is that the zeta distribution occurred on my measure theory homework. We were asked to prove that the events of being divisible by distinct primes were independent and that this implies the celebrated Euler product formula; note that this is already implicit in the above derivation, where the assumption of independence leads inevitably to the product formula. Then we were asked to prove that the probability that a positive integer is squarefree is equal to . But since a squarefree positive integer is just a product of distinct primes, this is just
as desired. Finally, we were asked to prove that the probability that two positive integers are relatively prime is also equal to . This can be done as follows: the probability that a positive integer is squarefree is the same as the probability that one positive integer is squarefree and another positive integer is arbitrary. We want to show that this is the same as the probability that two positive integers are relatively prime. One way to show this is to exhibit a probability-preserving bijection between the two sets, which can be done as follows: given squarefree and arbitrary, send
.
This map is injective because can be recovered as the radical of . This map sends a squarefree positive integer and an arbitrary positive integer to two relatively prime positive integers, and it preserves probabilities. In the other direction, given relatively prime, let denote the product of the primes dividing and send
This gives a squarefree positive integer and another positive integer, and it preserves probabilities. In addition, it is not hard to verify that the two maps given above are inverses of each other.
This argument is equivalent to the statement that the number of ways to write a positive integer as the product of two relatively prime positive integers is the same as the number of ways to write a positive integer as the product of a squarefree integer and an arbitrary integer, which is in turn equivalent to a Dirichlet series argument which computes the probabilities above directly. This number is a multiplicative function in , so it’s enough to verify that the two are equal for prime powers, which is clear. But I thought the above argument was cute.
In any case, it now follows that the “logarithmic probability” that two positive integers are relatively prime is . Hence the “natural probability” that two positive integers are relatively prime, if it exists, is also . So the mysterious appearance of the zeta function in this result is now hopefully not so mysterious.
Generalizations
The above argument also works on more general zeta functions. For example, let be a Dedekind domain such that, for every ideal , the quotient is finite; this is what Pete Clark calls a Dedekind abstract number ring, and it includes the ring of integers in any number field as well as the ring of integers in any finite extension of . Define the norm of an ideal to be . Then we can define the Dedekind zeta function
where the Euler product is taken over prime ideals. When , this is just the Riemann zeta function. When , every ideal is principal and generated by some polynomial , and the norm of the corresponding ideal is where . Since there are exactly monic polynomials of degree , it follows that
where . The corresponding Euler product is described in this previous post.
The Dedekind zeta function of describes, for fixed such that the sum converges, a probability distribution on the set of ideals of , where the ideal occurs with probability . Since ideals uniquely factor into products of prime ideals, everything that was said above carries over verbatim to any . (In statistical-mechanical language, the prime ideals are the “primons” out of which states are built, and a prime ideal has energy .) This implies, for example, that in the limit as , the probability that a polynomial in is squarefree, which is also equal to the probability that two polynomials in are relatively prime, equals
.
There is a beautiful blog post by Jordan Ellenberg which explains how to get the analogous version of this statement when is replaced by .
What we have now is pretty funny: it is a probabilistic algorithm to compute for any for which it is possible to explicitly construct random(ish) elements. In fact, if “squarefree” is replaced by “-power free” for some then we have a probabilistic algorithm to compute for any positive integer . One might imagine an alien civilization discovering zeta functions this way.
In this generality, the construction of the profinite completion of still makes sense, although it should be taken as a ring instead of just a group, where we take the limit over all finite quotient rings . It is closely related to the adele ring of when is the ring of integers in a finite extension of a global field, and its Haar measure is very important in the modern approach to algebraic number theory, beginning with Tate’s thesis. I’m not qualified to describe how this works in any more detail than that, unfortunately.
This whole discussion generalizes even further to dynamical zeta functions, although I’m not entirely sure what to make of it. It gives a probability distribution on the space of multisets of orbits in a dynamical system whose primons are the orbits, and where the energy of an orbit is the number of points it contains.
Energy
Recall that if is the partition function of a statistical-mechanical system, then the negative logarithmic derivative
describes the average or expected energy of the system. For the Riemann zeta function, this gives
.
which is the Dirichlet series of the von Mangoldt function, and which figures in some proofs of the prime number theorem. Intuitively, this sum describes the contribution to the average energy of the system from each type of “primon” in the Riemann gas; the expected contribution to the energy from primon is times the expected number of -primons. One way to see why this expected number should be equal to is to use the identity
which is valid for any non-negative integer-valued random variable . (Note that one can run this argument in reverse to deduce the form of the Riemann zeta function from the average energy function; in particular, the Euler product is now a consequence of linearity of expectation!)
Now let be a dynamical system; that is, let be a set and a function. We require, as before, that for every the number of fixed points of is finite. Then the energy corresponding to the dynamical zeta function
,
where the product runs over all orbits of and is the size of an orbit, is equal to
.
The RHS follows from our computation of the average energy by summing over the average contribution from each primon, and the LHS is equal to the RHS because any element of corresponds to some number of copies of the same orbit (that is, primon). This is, to me, an appealing way to think about the equivalence between the exponential and product forms of the dynamical zeta function. Previously I only had a purely combinatorial way of thinking about it, but the energy computation is a nice alternate perspective, especially as it applies to the case where is a variety over and is the Frobenius map.