
More joys of mathematics

Here’s another story of a maths problem that’s exactly solvable. Like last time, there’s a great way of reformulating the problem to end up with a different perspective. Also like last time, there are quite a few branches of mathematics that come together to solve the problem. So, let’s dive right in!

The problem

In algebraic iterative methods, my current field of research, there’s an important function that describes the regularising effect. It’s given by

f_k(x) = \frac{1-(1-x^2)^k}{x},

where x \in [0,1) and k \geq 1. Here’s a plot of the function f_{10}.

filtering_function_basic_plot
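Here’s a minimal Python sketch (assuming numpy and matplotlib are available) that reproduces such a plot:

import numpy as np
import matplotlib.pyplot as plt

def f(x, k):
    # The filter function f_k(x) = (1 - (1 - x^2)^k) / x.
    return (1.0 - (1.0 - x**2) ** k) / x

x = np.linspace(1e-3, 1.0, 500, endpoint=False)  # avoid dividing by zero at x = 0
plt.plot(x, f(x, 10))
plt.xlabel("x")
plt.ylabel("f_10(x)")
plt.show()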

Broadly speaking, this family of functions acts as filters. Usually it’s important to filter out high-frequency noise, which is represented here by the smaller values of x. At the same time, it’s important to invert the lower frequencies. These are represented by the larger values of x, where f_k(x) roughly behaves as \frac{1}{x}, especially for large k.

The problem is to find the maximum of f_k, or at least a sharp bound on the maximum. We’re especially interested in large k.

A rough bound

First off, it’s important to realise that f_k >0 always holds; that is, for all 0<x<1 and all k \geq 1. This will allow us to use squares and square roots with impunity, since we’re only looking at the positive branch.

Next, it’s all about expansions. We’d like to know how the function f_k behaves, roughly anyway, around x=0. To figure this out, we apply the Binomial Theorem to see that

(1-x^2)^k = 1 - k x^2 + \tfrac{1}{2}k(k-1)x^4 + \mathcal{O}(x^6).

The Binomial Theorem actually gives us all terms, but we’re only interested in small x here. Plugging this expansion in, we see that

f_k(x) = k x - \tfrac{1}{2}k(k-1)x^3 + \mathcal{O}(x^5).

From this expression, we can also see that f_k(x) \leq k x, since the next term is negative and dominates the terms after it. (Equivalently, this follows from Bernoulli’s inequality, (1-x^2)^k \geq 1 - k x^2.) For small x, the function f_k thus increases linearly, and quite steeply when k is large.
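Here’s a quick numerical sanity check of both claims, as a sketch assuming numpy:

import numpy as np

k = 10
x = np.linspace(1e-3, 1.0, 10000, endpoint=False)
fk = (1 - (1 - x**2) ** k) / x

# The linear upper bound f_k(x) <= k x holds on the whole interval.
assert np.all(fk <= k * x + 1e-9)

# Near x = 0, the cubic truncation tracks f_k to O(x^5).
small = x < 0.05
series = k * x[small] - 0.5 * k * (k - 1) * x[small] ** 3
print(np.abs(fk[small] - series).max())  # of order 1e-5 for these values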

Next, we can also easily confirm that f_k(x) \leq \frac{1}{x}. Indeed, we can write

f_k(x) = \frac{1}{x} - \frac{(1-x^2)^k}{x},

where we should recognise that \frac{(1-x^2)^k}{x}>0 for our domain. We’re subtracting a positive number from \frac{1}{x}, which confirms our assertion that f_k(x) \leq \frac{1}{x}. One interesting thing is that this positive number becomes very small for x around 1 and large k. As I mentioned earlier, the function f_k approaches \frac{1}{x} as k \to \infty.

filtering_function_simple_bound

Above I’ve plotted the function f_{10} together with the two bounds we’ve found. The bounds are tight at opposite ends of the domain; together they describe the behaviour of f_k. If we now check where the two bounds intersect, this will provide a bound on the maximum of f_k. So we look for the solution of kx = \frac{1}{x}, which is x = \frac{1}{\sqrt{k}}. Plugging this into either side, we find

f_k \leq \sqrt{k}.
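Numerically, the bound indeed holds; a minimal sketch assuming numpy:

import numpy as np

for k in (5, 10, 50, 100):
    x = np.linspace(1e-3, 1.0, 200000, endpoint=False)
    max_fk = ((1 - (1 - x**2) ** k) / x).max()
    print(k, max_fk, np.sqrt(k))  # the maximum stays well below sqrt(k)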

The bound can be improved in a very simple manner by increasing the number of terms in the expansion around x = 0. Care must be taken that the truncated expansion is an upper bound for the function as a whole. This means we should only use truncations whose highest-order term has a positive coefficient.

A (more or less) exact solution

We’re now in a position to provide an approximately exact solution. It’s gonna be approximate in the sense that it gets better for large k. We are looking for the unique point where f_k^\prime = 0. Taking the derivative, we see that

f_k^\prime (x) = \frac{1}{x^2} \left( 2k x^2 (1-x^2)^{k-1} + (1-x^2)^k - 1 \right).
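If you don’t trust the algebra, a quick central-difference check will do; the values of k and x below are arbitrary:

def fprime(x, k):
    # The claimed derivative of f_k.
    return (2 * k * x**2 * (1 - x**2) ** (k - 1) + (1 - x**2) ** k - 1) / x**2

k, x0, h = 10, 0.3, 1e-6
f = lambda x: (1 - (1 - x**2) ** k) / x
finite_diff = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(finite_diff, fprime(x0, k))  # should agree to many digits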

Since this derivative should equal zero at the maximum, we can ignore the factor \frac{1}{x^2}. Next, we use the fact that x will be small for large k. We know this from the previous discussion, where we estimated that the maximum occurs roughly at x = \frac{1}{\sqrt{k}}. This means we can neglect high-order powers of x. As such, we again use the Binomial Theorem to expand the powers and keep only the lowest-order terms. If we kept only the single lowest-order term, proportional to x^2, we wouldn’t end up with anything useful: that would amount to approximating the critical point by x=0. This is correct in the limit, but we also want to know how the critical point moves towards that limit. So, we need the two lowest-order terms, which will also involve x^4. The first term is expanded as

2k x^2 (1-x^2)^{k-1} = 2k x^2 - 2k (k-1) x^4 + \mathcal{O}(x^6).

The second term works out as

(1-x^2)^k = 1 - k x^2 + \tfrac{1}{2} k(k-1) x^4 + \mathcal{O}(x^6).

Adding them up, we find that the derivative of f_k will be zero where

k x^2 - \tfrac{3}{2} k(k-1) x^4 + \mathcal{O}(x^6) = 0.

Dividing out the k x^2 and ignoring the big-Oh term, we find that

x = \sqrt{ \frac{2}{3(k-1)}}.

The only thing that’s left is to plug this into f_k, which yields

\max_{x \in [0,1]} f_k(x) \approx \sqrt{\frac{3}{2}} \left( 1 - \left(1-\frac{2}{3(k-1)} \right)^k \right) \sqrt{k-1}.

We can again simplify this expression for large k, recognising the definition of the exponential function. We then find that the constant is

\alpha = \sqrt{\frac{3}{2}} \left( 1 - e^{-\frac{2}{3}} \right).

The numerical value of \alpha is roughly 0.596. I say roughly because it’s of course a transcendental number, thanks to the exponential, which means it has an unending and aperiodic decimal expansion. The approximate maximum value of f_k will then be

\max_{x \in [0,1]} f_k(x) \approx \alpha \sqrt{k}.
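We can compare this against the numerically computed maximum; here’s a minimal sketch assuming numpy:

import numpy as np

alpha = np.sqrt(3 / 2) * (1 - np.exp(-2 / 3))  # roughly 0.596

for k in (10, 100, 1000):
    x = np.linspace(1e-3, 1.0, 400000, endpoint=False)
    max_fk = ((1 - (1 - x**2) ** k) / x).max()
    # The true prefactor hovers around 0.64, noticeably above alpha.
    print(k, max_fk / np.sqrt(k), alpha)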

For smaller k, the approximation will not be very good. But for large k, say 100 or so and upwards, it gets better. Below I’ve plotted the function f_{100} together with the location \frac{\beta}{\sqrt{k}} with \beta = \sqrt{\tfrac{2}{3}}. It’s in the right neighbourhood, but it’s not quite there yet. The culprit is the truncation: at x of order \frac{1}{\sqrt{k}}, every term in the binomial expansion contributes at the same order, so keeping only two of them can only give a rough estimate.

filtering_function_approximate

We can make one final refinement to our approximation of the maximum, which we will do next.

A smarter approximation

There’s something interesting about the approximate solution we just saw. Both it and the rough bound estimate the maximum to occur at some x = \frac{\beta}{\sqrt{k}}, with the maximum value itself behaving as \alpha \sqrt{k}. For the rough bound we had \alpha = \beta = 1, while the approximate solution gave more complicated expressions.

Another question we might ask is: what is the best possible approximation of this form? So what we’re gonna do next is assume x has this form and find the best possible constant. This will look a little bit like a similarity transform, but we’re only going to investigate the case of large k.

Let’s substitute x = \frac{\beta}{\sqrt{k}} into f_k and let’s assume k is large. We see that

f_k \left( \frac{\beta}{\sqrt{k}} \right) = \frac{\sqrt{k}}{\beta} \left( 1 - \left( 1- \frac{\beta^2}{k} \right)^k \right).

We should recognise once again the standard limit definition of the exponential function, so that we find

f_k \left( \frac{\beta}{\sqrt{k}} \right) = \frac{\sqrt{k}}{\beta} \left( 1 - e^{-\beta^2} + \mathcal{O}\left( \frac{1}{k} \right) \right).

From this we learn that the best possible approximation of the specified form x = \frac{\beta}{\sqrt{k}} happens when \beta maximises the function

h(\beta) = \frac{1-e^{-\beta^2}}{\beta}.

beta_function

Above I’ve plotted the function h(\beta), for which we need to find the \beta that maximises it. For positive values of \beta, the function has a unique maximum. Perhaps surprisingly, it can be found analytically, but not in terms of elementary functions. We’ll need to use something that’s known as Lambert’s W-function.

There’s a class of functions in mathematics that’s known as “special functions”. These are functions that have a more or less established notation and well-known properties, but cannot be expressed in terms of elementary functions. The distinction is a little vague, but elementary functions are things like polynomials, exponentials and trigonometric functions. Some notable special functions are the Bessel functions or the hypergeometric functions. Since they generally cannot be expressed in terms of elementary functions, special functions are usually defined implicitly or by a differential equation. Lambert’s W-function is another one of those special functions, and its defining feature is that it’s the inverse of z e^z, so that

W(z e^z) = z.

What’ll be important to remember is that Lambert’s W-function has two real branches, denoted W_0 and W_{-1}. The second branch is defined by W_{-1} \leq -1.
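Both branches are readily available in Python as scipy.special.lambertw, where the branch is selected with the keyword k (not to be confused with our exponent k); a small demonstration:

import numpy as np
from scipy.special import lambertw

z = -0.2  # any z in (-1/e, 0) has two real preimages under z e^z
w0 = lambertw(z, k=0)    # principal branch, W_0(z) >= -1
wm1 = lambertw(z, k=-1)  # lower branch, W_{-1}(z) <= -1
print(w0, wm1)
print(w0 * np.exp(w0), wm1 * np.exp(wm1))  # both give back z, up to rounding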

It turns out that the \beta that maximises h(\beta) can be expressed in terms of Lambert’s W-function. This astounding fact can be discovered by the standard procedure of differentiating h and setting the result to zero. The derivative is given by

h^\prime (\beta) = \frac{ (2 \beta^2+1) e^{-\beta^2} - 1 }{\beta^2}.

Since \beta \neq 0 at the maximum, we’re looking for a real-valued \beta such that the numerator is zero. The trick is to write it in terms of an expression that looks like z e^z on one side and something else on the other. This can be done by adding and subtracting \tfrac{1}{2} in the exponential,

(2 \beta^2+1) e^{- (\beta^2 + \frac{1}{2}) + \frac{1}{2}} = 1.

If we call \gamma = - \beta^2 - \tfrac{1}{2}, then we have

-2 \gamma e^\gamma e^{\frac{1}{2}} = 1,

or

\gamma e^\gamma = - \tfrac{1}{2} e^{ - \frac{1}{2}}.

You might say this equation is easy to solve: simply use \gamma = -\tfrac{1}{2} and Bob’s your uncle. Too bad that would result in \beta = 0, so what’s going on here? As it turns out, there are two real-valued solutions, which you can see in the plot below.

lambert_two_roots.png

These two roots represent precisely the two branches of Lambert’s W-function. The one solution \gamma = - \tfrac{1}{2} is given by W_0, while the other root is given by W_{-1}. Since we’re looking for some nonzero \beta, we should try the other branch.

We can now simply work out the value of \beta, which comes out as

\beta  = \sqrt{ -W_{-1} \left( -\tfrac{1}{2} e^{-\frac{1}{2}} \right) - \tfrac{1}{2} }.

A sanity check here is that W_{-1} \leq -1, so that the expression under the square root is positive and \beta comes out real. The numerical value of \beta is roughly 1.121, although it too has an unending aperiodic expansion due to its transcendental nature. The corresponding constant is \alpha = h(\beta), that is,

\alpha = \frac{1-e^{-\beta^2}}{\beta},

which is roughly 0.638. Together, \beta and \alpha form the best possible estimates of the given form: the maximum is located at x = \frac{\beta}{\sqrt{k}} and its value is \max f_k(x) = \alpha \sqrt{k}, for large k.
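For the record, here’s how these constants can be computed with scipy’s implementation of the W-function (a minimal sketch):

import numpy as np
from scipy.special import lambertw

# beta^2 = -W_{-1}(-exp(-1/2)/2) - 1/2, using the lower branch.
arg = -0.5 * np.exp(-0.5)
beta = np.sqrt(-lambertw(arg, k=-1).real - 0.5)
alpha = (1 - np.exp(-beta**2)) / beta
print(beta, alpha)  # roughly 1.121 and 0.638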

Below, I’ve plotted f_{100} together with the simple approximation from earlier and the smarter approximation. The position of the smarter approximation seems to be right on the money, but the value still seems to be a bit off. At any rate, it’s a lot better than the previous approximation.

filtering_function_smarter_maximum

A family of bounds

As a final thought, we’ll derive a smarter collection of bounds using the transformation above. Instead of looking at the family f_k, we look at the transformed family of functions

\tilde{f}_k(x) = \frac{1}{\sqrt{k}} f_k \left( \frac{x}{\sqrt{k}} \right).

This family doesn’t live on a fixed domain; rather, the domain of \tilde{f}_k is [0,\sqrt{k}). As k becomes larger and larger, \tilde{f}_k approaches h.

beta_function_limits.png

Above I’ve plotted a collection of functions from the family \tilde{f}_k. What’s important to notice here is that they approach h from above. Moreover, we have

\tilde{f}_{k+1}(x) \leq \tilde{f}_k(x),

for all x in [0,\sqrt{k}). For any fixed K, we can find a uniform bound on \tilde{f}_k for all k \geq K. We do this simply by finding the maximum of \tilde{f}_K.
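The monotonicity in k is easy to check numerically; here’s a small sketch assuming numpy:

import numpy as np

def f_tilde(x, k):
    # The rescaled family: f~_k(x) = f_k(x / sqrt(k)) / sqrt(k).
    return (1 - (1 - x**2 / k) ** k) / x

x = np.linspace(0.01, np.sqrt(10), 5000)  # common domain for all k >= 10
prev = f_tilde(x, 10)
for k in (11, 12, 13):
    cur = f_tilde(x, k)
    assert np.all(cur <= prev + 1e-12)  # f~_{k+1} <= f~_k pointwise
    prev = cur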

As a consequence, we can find a bound on f_k for all k \geq K. This is due to the fact that the transformation is just a rescaling of the x-axis, together with a scaling of the function values by \sqrt{k}. So we can find a uniform bound as

f_k \leq \sqrt{k} \, \max \tilde{f}_K,

for all k\geq K.

For the smallest values of K, say K\leq 3, we can find the maximum analytically. For larger values it’s likely to be impossible, as the polynomial degree becomes too high. Fortunately, we don’t need an exact figure, as there’s a bit of a gap between the maxima of successive \tilde{f}_K. We can find the maximum numerically; we only need to make sure that the approximate value for \tilde{f}_K is larger than the maximum value of \tilde{f}_{K+1}. If we do that, we have a valid bound on our hands.

To give an example, we can set K = 2 and find a bound for all k \geq 2. We first work out

\tilde{f}_2(x) = \frac{1}{x} \left( 1 - \left( 1- \frac{x^2}{2} \right)^2 \right) = x - \tfrac{1}{4} x^3.

Demanding that the derivative vanishes, we find

1- \tfrac{3}{4} x^2 = 0.

We’re looking for a positive root so that we have

x = \frac{2}{\sqrt{3}}.

Consequently, the maximum value of \tilde{f}_2 is given by

\max \tilde{f}_2 = \frac{4}{3\sqrt{3}},

which is roughly 0.770. Hence, we have that any f_k with k\geq 2 is bounded by

f_k \leq \frac{4}{3\sqrt{3}} \sqrt{k}.
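As a numerical sanity check (a sketch assuming numpy), the bound indeed holds across a range of k \geq 2:

import numpy as np

c = 4 / (3 * np.sqrt(3))  # roughly 0.770
for k in (2, 3, 10, 100):
    x = np.linspace(1e-3, 1.0, 100000, endpoint=False)
    max_fk = ((1 - (1 - x**2) ** k) / x).max()
    assert max_fk <= c * np.sqrt(k)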

Another example: if we’d like to find a uniform bound for all k \geq 100, we find the maximum of \tilde{f}_{100} numerically, for instance using Newton’s method. Doing so, we find a constant of 0.64019. We need to make sure that it’s bigger than the same constant for k=101, which numerically comes out as 0.64017. Note that in this case, it’s important to take a sufficient number of decimals.
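Rather than hand-rolling Newton’s method on the derivative, one can also use a ready-made optimiser. Here’s a minimal sketch with scipy; it should reproduce the constants above up to the solver’s tolerance:

import numpy as np
from scipy.optimize import minimize_scalar

def f_tilde(x, k):
    return (1 - (1 - x**2 / k) ** k) / x

for K in (100, 101):
    res = minimize_scalar(lambda x: -f_tilde(x, K), bounds=(0.5, 2.0), method="bounded")
    print(K, -res.fun)  # roughly 0.6402, decreasing slightly with K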

Rounding off

We attacked the problem of finding the maximum of a family of functions that are central to algebraic iterative methods. Through studying the problem from various perspectives, we were able to make something like a similarity transform. I find it very interesting that in mathematics, a shift in perspective usually enables one to solve the problem. Here, the shift in perspective was very subtle indeed. We needed to rescale the axes in order to find that the entire family approaches a limiting function. This limiting function did allow us to find the maximum exactly, but only by resorting to special functions, in particular Lambert’s W-function.

Like the previous maths problem I discussed, a lot of different branches came together. Here we needed calculus, infinite series, approximation theory and special functions. In the end, we were able to provide approximate values for the maximum and uniform bounds. Maybe it isn’t exactly solving the problem we set out to solve, but it’s certainly close enough.
