The magnitude of the f and g variables generally decreases as the algorithm progresses. Take advantage of this by tracking how many limbs are in use and, once the values are small enough, operating on fewer limbs to reduce the cost of arithmetic on them.
Partial backport of secp256k1#831:
https://github.com/bitcoin-core/secp256k1/pull/831/commits/ebc1af700f9ec6e96586152b7090a2a6494308c3#diff-91cfb587705679268ee32d45895a62884faf262add85ba385cf55f74fcd51471R32-R575
Depends on D9410.