Today’s article mainly discusses polynomial multiplication. This is probably the most hardcore article I’ve written so far, but I will try my best to explain it in the most accessible language possible, and I believe that all of you with a high school education can understand it.
Polynomial multiplication has applications in daily life, for example in investment and financial management, market forecasting, and building construction:
- Compound interest calculation: When calculating compound interest, if the investment period is long and involves multiple interest calculations, the idea of polynomial multiplication can be used to understand it. Suppose the principal is $P$, the annual interest rate is $r$, and the investment runs for $n$ years; then the calculation of the principal-plus-interest sum for each year is similar to polynomial expansion. The sum after the first year is $P(1 + r)$, after the second year $P(1 + r)(1 + r)=P(1 + r)^2$, and so on, up to $P(1 + r)^n$ after the $n$-th year. Here, $(1 + r)^n$ expanded is an $n$-degree polynomial in $r$. Through the principle of polynomial multiplication, you can clearly see the impact of each year’s interest on the final yield, helping investors accurately calculate returns.
- Sales trend analysis: When a merchant analyzes the sales trend of a certain commodity, they might build a polynomial model based on the sales data of the past few months or years. For example, using time $t$ as a variable, sales $S$ might be represented in the form $S = a_0 + a_1t + a_2t^2+\cdots+a_nt^n$, where $a_i$ are the coefficients of each term. Through polynomial multiplication, sales for a future period can be predicted based on the current point in time, helping merchants reasonably arrange inventory, formulate marketing strategies, etc.
At the same time, the multiplication of two integers can also be calculated through polynomial multiplication. From this, it can be seen that polynomial multiplication is extremely important in life.
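To make the integer connection concrete, here is a small sketch I added (the function name is my own): treat the base-10 digits of each number as polynomial coefficients, multiply the two polynomials, and evaluate the product at $x = 10$.

```python
def int_mul_via_poly(x, y):
    """Multiply two non-negative integers by treating their base-10
    digits as polynomial coefficients, then evaluating at x = 10."""
    A = [int(d) for d in str(x)][::-1]  # least-significant digit first
    B = [int(d) for d in str(y)][::-1]
    # polynomial product: C[k] = sum of A[i] * B[j] over all i + j == k
    C = [0] * (len(A) + len(B) - 1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            C[i + j] += a * b
    # evaluate the product polynomial at 10 (this also resolves carries)
    return sum(c * 10 ** k for k, c in enumerate(C))
```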
Problem Description
For two polynomials:
\[\begin{align}& A(x) = a_0 + a_1x^1 + \dots + a_nx^n \\ & B(x) = b_0 + b_1x^1 + \dots + b_mx^m\end{align}\]We want to find the coefficients of the polynomial $C(x) = A(x) \times B(x)$.
We might as well assume here that the two input polynomials are at most of degree $n$, and the output polynomial is at most of degree $2\cdot n$. Since the difference between polynomials is mainly reflected in the coefficients, we can represent a degree-$n$ polynomial by the vector of its $n + 1$ coefficients.
```
A[i] = a_i  =>  A(x) = A[0] + A[1] * x ** 1 + ...
```
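With this representation, evaluating $A$ at a point is straightforward; a minimal helper sketch using Horner's rule (`eval_poly` is my own illustrative name):

```python
def eval_poly(A, x):
    """Evaluate A(x) = A[0] + A[1]*x + A[2]*x**2 + ... via Horner's rule."""
    result = 0
    for a in reversed(A):       # process coefficients from highest degree down
        result = result * x + a
    return result
```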
Note: Here we assume that the time complexity of multiplication is $O(1)$.
Algorithm I: Loops
For such a problem, the most naive way is to compute each coefficient of $C(x)$ with two nested loops.
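A runnable Python sketch of the double loop (assuming `A` and `B` are coefficient lists; names are illustrative):

```python
def poly_mul_naive(A, B):
    """Compute C[k] = sum of A[i] * B[j] over all i + j == k."""
    C = [0] * (len(A) + len(B) - 1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            C[i + j] += a * b
    return C
```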
Obviously, the time complexity of this algorithm is $O(n^2)$.
Algorithm II: Polynomial Interpolation
Next, let’s introduce a theorem that needs to be used:
Theorem: For any $n$-degree polynomial, its coefficients are uniquely determined by its values at any $n + 1$ (or more) distinct $x$’s.
Before the proof, let me introduce some definitions needed later:
- Matrix multiplication: If the element in the $i$-th row and $j$-th column of matrix $A_{m \times n}$ is denoted as $a_{ij}$, and likewise $B_{n \times s}[j, k] = b_{jk}$, then
\[(A \cdot B)[i, k] = \sum_{j = 1}^n a_{ij}b_{jk}\]
- Matrix inversion: The inverse of a matrix can be understood as the “reciprocal” of a matrix: a matrix (number) multiplied by its inverse (reciprocal) equals the identity matrix ($1$). It is worth noting that only square matrices with a non-zero determinant can be inverted.
The proof is as follows:
\[\begin{align*}&\begin{cases}a_0 + a_1x_1 + \dots + a_nx_1^n = y_1\\a_0 + a_1x_2 + \dots + a_nx_2^n = y_2\\ \dots\\ a_0 + a_1x_{n+1} + \dots + a_nx_{n+1}^n = y_{n+1}\end{cases} \implies \begin{pmatrix}1 & x_1 & \dots & x_1^n \\ 1 & x_2 & \dots & x_2^n \\ &\dots \\ 1 & x_{n+1} & \dots &x_{n+1}^n\end{pmatrix}\cdot\begin{pmatrix}a_0\\a_1\\\dots\\a_n\end{pmatrix} = \begin{pmatrix}y_1\\y_2\\\dots\\y_{n+1}\end{pmatrix} \\ &A = \begin{pmatrix}1 & x_1 & \dots & x_1^n \\ 1 & x_2 & \dots & x_2^n \\ &\dots \\ 1 & x_{n+1} & \dots &x_{n+1}^n\end{pmatrix}\text{has a non-zero determinant since the }x_i\text{ are distinct, therefore we can find }A^{-1}\text{. We call matrices of the form }A\text{ Vandermonde matrices} \\ &\therefore \begin{pmatrix}a_0\\a_1\\\dots\\a_n\end{pmatrix} = A^{-1}\cdot \begin{pmatrix}y_1\\y_2\\\dots\\y_{n+1}\end{pmatrix}\end{align*}\]
This theorem is relatively intuitive. Essentially, the familiar facts that “two points determine a line” and “three points determine a parabola” are manifestations of this theorem in low degrees.
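As a numeric sanity check (my own sketch; the `solve` helper is an illustrative Gaussian-elimination routine, not from the original), we can recover the coefficients of a known quadratic from its values at three distinct points:

```python
def solve(M, y):
    """Solve M a = y by Gauss-Jordan elimination (M square, invertible)."""
    n = len(M)
    A = [row[:] + [v] for row, v in zip(M, y)]  # augmented matrix [M | y]
    for col in range(n):
        # partial pivoting: bring the largest entry in this column up
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# recover the coefficients of P(x) = 2 + 3x + x^2 from its values at 0, 1, 2
xs = [0, 1, 2]
ys = [2 + 3 * x + x * x for x in xs]          # [2, 6, 12]
V = [[x ** j for j in range(3)] for x in xs]  # 3x3 Vandermonde matrix
coeffs = solve(V, ys)                          # [2.0, 3.0, 1.0]
```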
So, for the polynomial $C = A \times B$, we can separately determine the values of $A$ and $B$ at $2n + 1$ distinct $x$’s: $A(x_1), A(x_2), \dots, A(x_{2n+1})$ and $B(x_1), B(x_2), \dots, B(x_{2n+1})$, and multiply them correspondingly, to obtain $C(x_1), C(x_2), \dots, C(x_{2n+1})$, hence we can determine the coefficients of $C$!
Finding the coefficient vector of $C$ is also very simple: following the proof, pick any $2n + 1$ distinct $x$’s and compute the vector $Y$ of values of $C$ at them. Next, form the $(2n+1) \times (2n+1)$ Vandermonde matrix $A$ corresponding to these $x$’s, and find its inverse $A^{-1}$. Finally, the coefficient vector of $C$ is $C = A^{-1}\cdot Y$. The pseudocode is as follows:
```python
def poly_mul(A, B):
    n = len(A) - 1                  # A and B both have degree n
    X = [i for i in range(2 * n + 1)]
    Y = [0 for i in range(2 * n + 1)]

    for i in range(2 * n + 1):      # O(n^2): evaluate A and B at each X[i]
        a, b = 0, 0
        for j in range(n + 1):
            a += A[j] * X[i] ** j
            b += B[j] * X[i] ** j
        Y[i] = a * b                # C(X[i]) = A(X[i]) * B(X[i])

    V = [[0 for _ in range(2 * n + 1)] for _ in range(2 * n + 1)]

    for i in range(2 * n + 1):      # O(n^2): build the Vandermonde matrix
        for j in range(2 * n + 1):
            V[i][j] = X[i] ** j

    C = get_inverse(V) * Y          # invert V, then matrix-vector product
    return C
```
From the pseudocode, it can be seen that this algorithm has at least an $O(n^2)$ time complexity.
Actually, the time complexity of `get_inverse(V)` is $O(n^3)$. So, the time complexity of this algorithm is even worse than Algorithm I.
This is too bad! Is such a clever method useless? If only we could quickly find a Vandermonde matrix and its inverse.
Algorithm III: Fast Fourier Transform (FFT)
First, let’s recall some high school knowledge needed later:
- Complex plane: Every point in the complex plane corresponds to a complex number, where $(a, b)$ corresponds to $a + bi$. It is worth noting that we can also use polar coordinates to represent a point: $(\rho, \theta)$ represents the complex number $\rho e^{\theta i}$.
- Complex number multiplication: The product of the complex numbers represented by $(\rho_1, \theta_1)$ and $(\rho_2, \theta_2)$ in polar coordinates is $(\rho_1\cdot\rho_2, \theta_1+\theta_2)$: moduli multiply and angles add. This fact is easy to prove using the angle-sum formulas for sine and cosine, so I won’t go into details here.
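A quick check of the polar multiplication rule with Python's `cmath` (a sketch I added; the numbers are chosen arbitrarily):

```python
import cmath
import math

# z1 has modulus 2 and angle pi/6; z2 has modulus 3 and angle pi/3
z1 = cmath.rect(2, math.pi / 6)
z2 = cmath.rect(3, math.pi / 3)

# moduli multiply and angles add: rho == 6, theta == pi/2
rho, theta = cmath.polar(z1 * z2)
```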
Let’s consider a complex number $\omega = e^{2\pi i / n}$. The point corresponding to this complex number in polar coordinates is $(1, 2\pi /n)$. At the same time, the point corresponding to $\omega^2$ is $(1, 4\pi /n)$, and $\omega^k$ is $(1, 2k\pi / n)$. It can be found that $1, \omega, \omega^2, \dots, \omega^{n-1}$ correspond to a series of points on the unit circle in the complex plane constantly rotating counterclockwise by $2\pi /n$ starting from the x-axis.
Obviously, $\omega$ has the following properties:
- $\omega^{i} = \omega^{i + n}$
- $\bar{\omega^i} = \omega^{n-i}$, where $\bar{a}$ is the complex conjugate of $a$.
At the same time, consider the Vandermonde matrix $V_{ij} = (\omega^i)^j = \omega^{ij}$. Since the geometric series $\sum_{k=0}^{n-1}\omega^{(i-j)k}$ equals $n$ when $i = j$ and $0$ otherwise, it can be derived that: $V^{-1} = \frac{1}{n}\bar{V}$.
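We can verify $V^{-1} = \frac{1}{n}\bar{V}$ numerically for a small $n$ (my own sketch, not the author's code):

```python
import cmath

n = 4
w = cmath.exp(2j * cmath.pi / n)    # omega = e^{2*pi*i/n}
V = [[w ** (i * j) for j in range(n)] for i in range(n)]

# entries of (1/n) * V * conj(V); this should be the identity matrix
prod = [
    [sum(V[i][k] * V[k][j].conjugate() for k in range(n)) / n
     for j in range(n)]
    for i in range(n)
]
```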
Wow! This way we can quickly obtain a Vandermonde matrix and its inverse! Looking back at Algorithm II, there is still one problem: naively calculating $C(\omega), C(\omega^2), \dots$ is still an $O(n^2)$ algorithm.
What should we do! Are our efforts going to waste again!!!
Of course not! Let’s consider the value of a polynomial $P$ at $x$:
\[\begin{align}P(x) = & a_{0} + a_{1}x + a_{2}x^{2} + \dots \\= & (a_{0} + a_{2}x^{2} + a_{4}(x^{2})^{2} + \dots) + (a_{1} + a_{3}x^{2} + a_{5}(x^{2})^{2} + \dots)x\\= & P_{even}(x^{2}) + P_{odd}(x^{2})x\end{align}\]Therefore, to find $P(\omega^{0}), P(\omega^{1}), \dots, P(\omega^{n-1})$, we only need to find
\[P_{even}(\omega^{0}), P_{even}(\omega^{2}), \dots, P_{even}(\omega^{2n-2})\]and
\[P_{odd}(\omega^{0}), P_{odd}(\omega^{2}), \dots, P_{odd}(\omega^{2n-2})\]And since $\omega^{i} = \omega^{n+i}$, the points $\omega^{0}, \omega^{2}, \dots, \omega^{2n-2}$ take only $\frac{n}{2}$ distinct values, so these are the values of $2$ polynomials with $\frac{n}{2}$ coefficients each at $\frac{n}{2}$ points. We can therefore derive a recurrence formula for the time complexity $T(n)$:
\[T(n) = 2T\left( \frac{n}{2} \right) + \Theta(n)\implies T(n) = O(n\log n)\](This is the same recurrence as merge sort, so the bound follows from the master theorem.) Yay! Our polynomial multiplication has finally become an $O(n\log n)$ algorithm!
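Putting everything together, here is a compact recursive implementation sketch (my own code; `fft` and `poly_mul_fft` are illustrative names, and the coefficient lists are padded to a power-of-two length):

```python
import cmath

def fft(P, invert=False):
    """Evaluate the polynomial with coefficient list P at all len(P)-th
    roots of unity (len(P) must be a power of two). With invert=True the
    conjugate roots are used; dividing by len(P) afterwards then gives
    the inverse transform."""
    n = len(P)
    if n == 1:
        return P[:]
    even = fft(P[0::2], invert)   # P_even evaluated at the (n/2)-th roots
    odd = fft(P[1::2], invert)    # P_odd evaluated at the (n/2)-th roots
    sign = -1 if invert else 1
    w = cmath.exp(sign * 2j * cmath.pi / n)
    res = [0] * n
    wk = 1                         # omega^k
    for k in range(n // 2):
        # P(w^k) = P_even(w^{2k}) + w^k * P_odd(w^{2k}); the second half
        # uses w^{k + n/2} = -w^k
        res[k] = even[k] + wk * odd[k]
        res[k + n // 2] = even[k] - wk * odd[k]
        wk *= w
    return res

def poly_mul_fft(A, B):
    """Multiply two integer-coefficient polynomials via FFT."""
    size = len(A) + len(B) - 1
    n = 1
    while n < size:                # pad to a power of two
        n *= 2
    fa = fft(A + [0] * (n - len(A)))
    fb = fft(B + [0] * (n - len(B)))
    fc = [x * y for x, y in zip(fa, fb)]   # pointwise products C(w^k)
    c = fft(fc, invert=True)
    return [round((v / n).real) for v in c[:size]]
```

For integer coefficients, rounding the real parts recovers exact results; production code would typically use an iterative in-place FFT or a number-theoretic transform to avoid floating-point error.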