I’ve realized that some design choices in my bn256 implementation don’t seem well-motivated to everybody… including myself. So I’d like to document here all of the tricks I find myself forgetting.

### Subgroup Security

**Background.** Imagine that I have a secret scalar $k$ and I’ll multiply
$k$ by any point $P$ that you give me, and return the output $kP$ to
you as long as $P$ passes a simple publicly-known test. Your goal is to learn
$k$.

If I don’t check anything about your point (and we’re operating on a Weierstrass
curve $y^2 = x^3 + ax + b$), your best bet is an **Invalid Curve Attack**. See,
elliptic curve addition and doubling formulas only incorporate the $a$
coefficient of a Weierstrass curve. So if you give me any random $(x, y)$
and I start doing arithmetic on it, I’m operating on an implicit curve
$y^2 = x^3 + ax + b'$ where $b'$ is chosen to satisfy the equation, and this
curve will likely have a drastically different structure than the one I intended
to work on. In particular, the implicit curve may have subgroups which are small
enough for the discrete logarithm problem (DLP) to be solvable by brute force,
and you may choose $P$ to lie in one of these subgroups. When you get back
$kP$, you can solve the DLP and learn $k \bmod q$ where $q$ is the
order of the small subgroup. Given $k \bmod q$ for a handful of different
relatively-prime $q$, it’s easy to reconstruct the full 256-bit value of
$k$ with the Chinese-Remainder
Theorem.
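To make the recombination step concrete, here’s a sketch with made-up toy numbers (the subgroup orders and the secret are illustrative assumptions, not values from a real attack):

```python
from math import prod

def crt(residues, moduli):
    """Recombine k mod m_i into k mod prod(m_i) (moduli pairwise coprime)."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m) is the inverse mod m
    return total % M

# Pretend solving DLPs in five small invalid-curve subgroups leaked these:
secret = 123456
moduli = [11, 13, 17, 19, 23]            # toy small-subgroup orders
residues = [secret % m for m in moduli]  # what the attacker learns
assert prod(moduli) > secret             # enough residues to pin the secret down
print(crt(residues, moduli))  # prints 123456
```

With a 256-bit secret the attacker simply keeps collecting residues until the product of the moduli exceeds $2^{256}$.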

This makes it necessary for me to check that your point is *on the curve*, which
can easily be done by checking that $(x, y)$ satisfies the curve formula. Is
this sufficient? Sometimes yes, but not always: it’s possible that the correct
curve also has subgroups where DLP is easy, outside of the subgroup we’re using
for cryptography where DLP is hard. In this case, you would mount an **Invalid
Subgroup Attack**, which is exactly the same as an Invalid Curve Attack but
restricted to just using different subgroups of the same curve.

Should a sufficient number of small subgroups exist on the curve to make an
Invalid Subgroup Attack feasible, then I’d need to check both that your point is
on the curve as well as *in the correct subgroup*. This is quite expensive
because I’d need to multiply $P$ by the 256-bit order of the correct subgroup
and check that the result is the point at infinity, which is at least as
expensive as multiplying by $k$ in the first place. (Side note: Subgroup
security for Montgomery curves is different than discussed, and involves **Twist
Attacks**; there are no Twist Attacks on a Weierstrass curve.)
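Both checks can be sketched on a toy short-Weierstrass curve ($y^2 = x^3 + 3$ over $\mathbb{F}_7$, which has prime order 13, so here the subgroup check is redundant, just as it is for a prime-order G1). This is an illustration, not the bn256 code:

```python
p, a, b, n = 7, 0, 3, 13   # field prime, curve coefficients, group order
O = None                   # the point at infinity

def is_on_curve(P):
    x, y = P
    return (y * y - (x * x * x + a * x + b)) % p == 0

def add(P, Q):
    if P is O: return Q
    if Q is O: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                                        # P + (-P) = infinity
    if P == Q:
        s = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # tangent slope
    else:
        s = (y2 - y1) * pow(x2 - x1, -1, p) % p         # chord slope
    x3 = (s * s - x1 - x2) % p
    return (x3, (s * (x1 - x3) - y1) % p)

def mul(k, P):
    R = O
    while k:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

def in_correct_subgroup(P):
    # the expensive check: multiply by the subgroup order and
    # verify the result is the point at infinity
    return is_on_curve(P) and mul(n, P) is O

print(is_on_curve((1, 3)))          # False: not a point on this curve
print(in_correct_subgroup((1, 2)))  # True
```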

**Discussion.** I’ve given all this background to say that my bn256
implementation implements solely `IsOnCurve` checks on both G1 and G2. This is
sufficient for G1 because it is a *prime curve*, meaning it has only one
prime-order subgroup, and that’s the subgroup we use for cryptography. G2, on
the other hand, is quite a large curve and has many subgroups which are not used
for cryptography. However, we can compute G2’s order and see that these subgroups
aren’t sufficient to launch an Invalid Subgroup Attack.

The full order of G2 is given by a formula, thanks to the way BN curves are constructed: $\#G_2 = h(x) \cdot r(x)$, where $x$ is the BN-parameter, $r(x) = 36x^4 + 36x^3 + 18x^2 + 6x + 1$ is the order of the subgroup we’re using for cryptography, and $h(x) = 36x^4 + 36x^3 + 30x^2 + 6x + 1$ is the cofactor. Finally, $\#G_2$ factors as:

```
13 *
7369 *
678523854563781785784486348657681406965565833599613983254937284932557401 *
65000549695646603732796438742359905742570406053903786389881062969044166799969
```

The final, largest factor is $r$. I confess that it is clearly ideal to *not*
accept adversary-provided points for G2 and rely on the stronger structure of
G1. But if it is unavoidable, in order to recover a secret scalar an adversary
would likely have to solve DLP in the subgroup corresponding to the third prime
factor. This would take around $2^{119}$ steps.
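As a sanity check on that estimate, Pollard’s rho costs roughly $\sqrt{q}$ group operations in a subgroup of prime order $q$:

```python
# Back-of-the-envelope cost of Pollard's rho in the subgroup of order q,
# the third prime factor above: roughly sqrt(q) group operations.
q = 678523854563781785784486348657681406965565833599613983254937284932557401
print(q.bit_length())       # prints 239: q is a ~239-bit prime...
print(q.bit_length() // 2)  # prints 119: ...so rho needs about 2^119 steps
```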

[1]

### Lattice Decomposition

Lattice Decomposition is a generic technique for speeding up scalar multiplication on certain kinds of elliptic curves. I open-sourced an implementation of this technique as an un-merged feature branch.

**Background.** Take the curve $y^2 = x^3 + 3$ over
$\mathbb{F}_7$ (playground). The first interesting
aspect of this curve is that it has 13 points, so it’s a prime-order curve. You
can choose any point on this curve and say that it maps to 1 (besides the point
at infinity, which maps to 0), and from there build up a map from any other
point on the curve to an integer 2 through 12 which respects both addition on
the curve and integer addition mod 13. This map is a group
isomorphism.

The second interesting aspect is that both the ‘base field’ (integers mod 7) and ‘curve field’ (integers mod 13) have cubic roots of unity, or solutions to the equation $u^3 = 1$. For the base field, this number is 2 since $2^3 = 8 \equiv 1 \pmod{7}$, and for the curve field this number is 3 since $3^3 = 27 \equiv 1 \pmod{13}$. Cubic roots of unity are nice because we can multiply the x-coordinate of a point by 2 and get another point on the curve since $(2x)^3 = 2^3 x^3 = x^3 \pmod{7}$, and they ultimately end up playing well with the elliptic curve’s addition law. That is, the function $\phi(x, y) = (2x, y)$ on our curve is equivalent to multiplying the point by 3. You can see this in the playground, where for example, $P = (1, 2)$, then $\phi(P) = (2, 2)$, and finally $3P = (2, 2)$ as well.
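The claims above are small enough to check exhaustively. A short script (toy formulas again, assuming the curve $y^2 = x^3 + 3$ over $\mathbb{F}_7$) that enumerates the points and verifies $\phi = [3]$ on all of them:

```python
p, b = 7, 3  # the toy curve y^2 = x^3 + 3 over F_7

points = [(x, y) for x in range(p) for y in range(p)
          if (y * y - x * x * x - b) % p == 0]
print(len(points) + 1)  # prints 13: 12 affine points plus infinity

def add(P, Q):
    # affine addition; the combinations used below never hit infinity
    (x1, y1), (x2, y2) = P, Q
    if P == Q:
        s = 3 * x1 * x1 * pow(2 * y1, -1, p) % p
    else:
        s = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (s * s - x1 - x2) % p
    return (x3, (s * (x1 - x3) - y1) % p)

for P in points:
    triple = add(add(P, P), P)               # 3P the slow way
    assert ((2 * P[0]) % p, P[1]) == triple  # phi(P) = (2x, y) = 3P
print("phi = [3] verified")
```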

From the point of view of optimization, what we’ve found is a way to multiply a
curve point by a fixed scalar with a single base field operation. For curves of a
cryptographically-useful size, a single base field multiplication is going to be
more than 1000x faster than scalar multiplication on the curve. Now we just need
to find a way to harness this speed-up to multiply by *any* scalar, not just our
cubic root of unity.

**Lattices.** Our strategy for integrating this information into our scalar
multiplication routine is to take the user-provided point $P$, use this trick
to quickly compute $\phi(P)$ ($3P$ in the example above), multiply $P$ and
$\phi(P)$ by different (hopefully small) scalars $k_1$ and $k_2$, and output
$k_1 P + k_2 \phi(P) = kP$ where $k$ is the user-provided scalar.
There’s something called Shamir’s Trick that makes computing $k_1 P + k_2 Q$,
given $P$ and $Q$, about as fast as computing $k_1 P$ or $k_2 Q$
individually.
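A minimal sketch of Shamir’s Trick on the toy curve (the `add`/`mul` helpers here are illustrative, not the bn256 API): both scalars share one sequence of doublings, with at most one extra addition per bit.

```python
p = 7      # toy curve y^2 = x^3 + 3 over F_7
O = None   # the point at infinity

def add(P, Q):
    if P is O: return Q
    if Q is O: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O
    if P == Q:
        s = 3 * x1 * x1 * pow(2 * y1, -1, p) % p
    else:
        s = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (s * s - x1 - x2) % p
    return (x3, (s * (x1 - x3) - y1) % p)

def mul(k, P):  # plain double-and-add, for comparison
    R = O
    while k:
        if k & 1: R = add(R, P)
        P = add(P, P); k >>= 1
    return R

def shamir(k1, P, k2, Q):
    """Compute k1*P + k2*Q with one shared sequence of doublings."""
    table = {(0, 1): Q, (1, 0): P, (1, 1): add(P, Q)}  # precompute P + Q
    R = O
    for i in range(max(k1.bit_length(), k2.bit_length()) - 1, -1, -1):
        R = add(R, R)                         # one doubling per bit
        bits = ((k1 >> i) & 1, (k2 >> i) & 1)
        if bits != (0, 0):
            R = add(R, table[bits])           # at most one addition per bit
    return R

P = (1, 2)
Q = mul(3, P)  # phi(P) in the running example
assert shamir(5, P, 4, Q) == add(mul(5, P), mul(4, Q))
print("ok")
```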

So by minimizing the size of both $k_1$ and $k_2$, we minimize the time spent in the algorithm. Then what is the smallest pair $(k_1, k_2)$ such that $f(k_1, k_2) = k_1 + 3 k_2 \equiv k \pmod{n}$, where $n$ is the order of the curve? ($n = 13$ in the example above) There’s a sharp edge here to be mindful of: the inputs to the function $f$, $k_1$ and $k_2$, are arbitrary integers, but $f$’s output is computed modulo $n$.

The first thing we observe is that $f$ is additive: if we have $f(a_1, a_2) = x$ and $f(b_1, b_2) = y$, then we know that $f(a_1 + b_1, a_2 + b_2) = x + y$. But honestly, it’s nicest when $x$ and $y$ are zero because then we can add multiples of $(a_1, a_2)$ and $(b_1, b_2)$ to each other and get as many inputs to $f$ that output zero as we want. Wait, hold up?

- Saying that any integer combination of solutions to $f(v) = 0$ gives
another solution, is the same as saying that these solutions have
**closure** under addition.
- The set of solutions includes the **zero** vector since $f(0, 0) = 0$.
- If $v$ is a solution, then so is the **inverse** solution $-v$. We call it
an inverse because $v + (-v) = (0, 0)$, the zero vector.

We’ve ticked all the boxes that make this a group! Though we call it a lattice,
specifically, since the solutions are integer vectors combined under addition.
What this says about the output of $f$ as a whole, is that its output sort of
*tessellates* according to the lattice of inputs that take it to zero, and that
the smallest $(k_1, k_2)$ such that $f(k_1, k_2) = k$ for any particular $k$ is
going to be the $(k_1, k_2)$ that does so in the lattice’s fundamental
region. To find this $(k_1, k_2)$, we only need to observe that $f(k, 0) = k$,
and that if we find the nearest vector $v = (v_1, v_2)$ of the lattice to
$(k, 0)$, then $k_1 = k - v_1$ and $k_2 = -v_2$ will be
in the fundamental region. This is not as easy as it sounds though, so let’s
continue to unpack it.

**Technicalities.** The standard algorithm for finding the closest vector in
a lattice to some other arbitrary vector is called Babai’s Rounding Technique:

- Find a *short basis* for the lattice. While there’s an infinite number of bases for any particular lattice, a short basis is a set of linearly independent, roughly orthogonal vectors of minimal bit length. This basis generates the lattice in a relatively intuitive way, such that a linear combination of the basis vectors with large coefficients doesn’t incidentally result in a very small vector.
- Use linear algebra to find a linear combination of the basis vectors with *rational* coefficients that exactly equals the target vector, and then round the rational coefficients to integers. Perform the linear combination of the basis vectors with the rounded integer coefficients to get an element of the lattice that is *close to* (but not typically equal to) the target vector, and output this. Specifically, we embed the basis vectors as column vectors in a matrix $B$ and solve the equation $Bc = t$ for the rational vector $c$, where $t$ is the target vector. Note that since our target vector always has the form $(k, 0)$, we only need to store the first column vector of $B^{-1}$, and in our implementation we make this column vector (which would normally be rational) an integer vector by storing a common denominator separately. The common denominator is called `det` because it ends up being the determinant of $B$, due to how matrix inversion works.
- *(Optional)* The last remaining issue is that Babai’s Rounding Technique will sometimes return vectors with negative components, which can break lazily-written scalar multiplication algorithms. To prevent this from happening, we add the additional constraint to our short basis that at least one vector has all-positive components. We add a small multiple of this basis vector to every output, enough to ensure that every component of the output vector is positive as well.
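To make the rounding step concrete, here is a sketch of Babai’s Rounding Technique on the running toy example: the lattice of $(k_1, k_2)$ with $k_1 + 3k_2 \equiv 0 \pmod{13}$. The short basis below was found by inspection rather than LLL, and is an illustrative assumption, not the implementation’s actual basis.

```python
from fractions import Fraction

n, lam = 13, 3                 # group order and the endomorphism's scalar
B = [[3, 1], [-1, 4]]          # basis columns b1 = (3, -1), b2 = (1, 4);
                               # f(3, -1) = 0 and f(1, 4) = 13 = 0 mod n
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]  # 13, matching the note above

def decompose(k):
    # Solve B c = (k, 0) exactly. B^-1 = [[4, -1], [1, 3]] / det, so
    # c = k * (first column of B^-1): only that column is ever needed.
    c1 = round(Fraction(4 * k, det))
    c2 = round(Fraction(1 * k, det))
    # Nearest lattice vector v = c1*b1 + c2*b2; output (k, 0) - v.
    v = (c1 * B[0][0] + c2 * B[0][1], c1 * B[1][0] + c2 * B[1][1])
    return (k - v[0], -v[1])

for k in range(n):
    k1, k2 = decompose(k)
    assert (k1 + lam * k2) % n == k       # same scalar mod n...
    assert abs(k1) <= 2 and abs(k2) <= 2  # ...but with tiny components
print(decompose(12))  # prints (-1, 0): 12 = -1 mod 13
```

The `det` trick from step 2 appears here directly: the first column of $B^{-1}$ is stored as the integer vector $(4, 1)$ alongside the common denominator 13.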

A short basis is easily found by giving the LLL algorithm some example solutions to $f(v) = 0$, and tinkering with the output until a basis that satisfies the additional constraint in step 3 is found. The overall approach generalizes to groups with more than one efficiently-computable endomorphism; in the discussion so far we’ve only used the endomorphism provided by a cubic root of unity. With GT, for example, we have multiple Frobenius endomorphisms to choose from.

**Results / Caveats.** The lattice decomposition technique described so far
makes scalar multiplication in G1 and G2 twice as fast, and four times faster in
GT. One major caveat to keep in mind though, is that the algorithm relies on
input points having a prescribed order to work correctly. If it is applied
to attacker-provided points that are on an invalid curve or invalid subgroup, it
will produce unexpected results.

[2]