can make it easier to provide a more useful formula whose value is easy to calculate. This deviation from the precise value of the function is usually called error, and provided that we don't introduce too much, we can get a good idea of the dominant behaviour. For instance, say we want to get an idea of the size of the function f(x) = ⌊x²⌋ for large x. It makes sense to resign ourselves to an error of at most 1, keeping just the larger and larger x² term. To keep track of the neglected error we write f(x) = ⌊x²⌋ = x² + O(1). The symbol O(1) means that we are allowing an error that is at most constant with respect to x.

If our function were more complicated, we might wish to keep track of an error that varies with x, instead of remaining constant. Big-O notation gives us this flexibility: we write r(x) = O(g(x)), for some positive function g(x), to mean that there is a constant K and a positive number x₀ such that whenever x > x₀ we have |r(x)| ≤ K g(x).

Let us lend flavour to this slightly knotty definition. For the longevity of this article and for our younger readership, we define an analogue radio to be an obsolete cuboidal object with a knob, which ingests radio waves and, subject to correct knob position and alignment of the clouds, excretes a melody of questionable taste. One day, not too cloudy, you twizzle the knob and your favourite tune blares out. At least, you think it is your favourite tune… There is some distractingly apparent white noise, rendering it a challenge to make out the precise melody. You try turning up the volume, but this only helps up to a certain point. From some volume onwards, the white noise remains proportionally loud. We can return to big-O notation by thinking of the x variable in the definition above as the volume of the radio's excretion, and the white noise as the error r(x) from your favourite tune.
Then r(x) = O(g(x)) is to say that: whenever the volume x is greater than some x₀, the radio's deviation r(x) from your favourite tune is at most proportional to some function g(x).

Now for a more mathematical example: f(x) = x⁷ + 5x + sin x. If x is large then x⁷ is very large. If you were to plot a graph of f(x), the 5x and sin x terms would seem less and less significant. We could make use of big-O notation to indicate the dominant behaviour: f(x) = x⁷ + O(x). For Dirichlet's sum we can now write

    D(x) = ∑_{n≤x} (x/n + O(1)) = x ∑_{n≤x} 1/n + O(x).    (3)

The sum ∑_{n≤x} 1/n is (up to a constant) the same as the integral of 1/t over the interval [1, x]. This integral is log x. The constant in question approaches the Euler–Mascheroni constant, γ ≈ 0.57, at a rate of O(1/x). As x gets large this error vanishes, so that the sum ∑_{n≤x} 1/n is close in value to log x + γ. Substituting into (3) gives

    D(x) = x(log x + γ) + O(x) = x log x + O(x).
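If you have a computer to hand, both steps of this derivation can be checked numerically. The short Python sketch below is purely illustrative (the helper names and the hard-coded value of γ are ours): it confirms that the harmonic sum approaches log x + γ at a rate like O(1/x), and that (D(x) − x log x)/x stays bounded, which is exactly what D(x) = x log x + O(x) asserts.

```python
import math

GAMMA = 0.5772156649  # Euler-Mascheroni constant, to 10 decimal places

def harmonic(x):
    """The sum of 1/n over all whole numbers n <= x."""
    return sum(1 / n for n in range(1, x + 1))

def D(x):
    """Dirichlet's sum D(x): the sum of floor(x/n) over n <= x."""
    return sum(x // n for n in range(1, x + 1))

for x in [100, 1000, 10000]:
    # harmonic(x) - log x - gamma should shrink at rate O(1/x):
    h_err = harmonic(x) - math.log(x) - GAMMA
    # (D(x) - x log x)/x should stay bounded as x grows:
    d_err = (D(x) - x * math.log(x)) / x
    print(f"x={x}: harmonic error={h_err:.2e}, (D(x) - x log x)/x = {d_err:.4f}")
```

Running this, the harmonic error is comfortably below 1/x each time, while the scaled error in D(x) settles near a fixed constant rather than growing, just as the big-O bookkeeping promised.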
Phrased differently, the mean value of the divisor function does behave well for large x: it grows like log x, up to a bounded error:

    (1/x) ∑_{n≤x} d(n) = log x + O(1).    (5)
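Equation (5) can also be seen directly, without going through D(x). The following Python sketch (again our own illustration) computes d(n) for every n ≤ x with a simple sieve, so each divisor m is credited to all of its multiples, and then checks that the mean value minus log x stays bounded.

```python
import math

def divisor_counts(x):
    """Return [d(1), d(2), ..., d(x)] by a sieve: every m <= x is a
    divisor of exactly its multiples m, 2m, 3m, ... up to x."""
    d = [0] * (x + 1)
    for m in range(1, x + 1):
        for n in range(m, x + 1, m):
            d[n] += 1
    return d[1:]

for x in [100, 1000, 10000]:
    mean = sum(divisor_counts(x)) / x
    # Equation (5) says this difference is O(1), ie it stays bounded:
    print(f"x={x}: mean of d(n) minus log x = {mean - math.log(x):.4f}")
```

The printed differences hover around a small constant rather than drifting off to infinity, which is all that the O(1) in (5) claims.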