Montgomery’s Trick
Note: This is extracted from our article on the power of GPU parallelization.
The objective is to invert a batch or array of field elements. In a naïve approach, one would invert each element in the array. This would mean applying the extended Euclidean algorithm for each element to compute its inverse. However, since field inversion is an expensive operation, we will use something called Montgomery’s trick.
It starts with an array of elements to invert.
It then computes the accumulated products up to each element.
This could be done with as many products as there are elements in the array.
In the next step, we compute the field inversion for the final accumulated product, using the extended Euclidean algorithm.
This will be the only time we use this expensive operation.
Now, we want to calculate the inversion of each element from the inversion of the final accumulated product. The trick uses the previously accumulated products for this, as follows.
How does that work? Well, we could explain it by imagining the field inversion as a division.
This is an intuitive representation only, since there is no such thing as division in
The inversion of the accumulated product is equal to the product of all the individual inverses. By multiplying it with the elements we don’t want, those inversions are canceled out, and only the element we want remains.
Now it can calculate the inversion of the previous accumulated product.
The same idea applies.
This idea would now be sequentially repeated for the remaining elements in the array.
Element | Inversion of the accumulated product | Inversion of the element |
---|---|---|
Which leaves us with every element’s inversion.
But why have we done all this?
Well, for an array with
- A naïve implementation would need
field inversions. - Montgomery’s trick needs
field multiplications and only one field inversion.- Needs
multiplications to compute the accumulated products. - Needs one field inversion for the final accumulated product.
- Needs
multiplications to invert each element.
- Needs
Knowing how expensive the extended Euclidean algorithm is, we will gladly take the trade.