pqc/external/flint-2.4.3/todo.txt

TODO
====

general
-------

* Write a flint2 memory manager, both reentrant and non-reentrant stack based
  versions

* [maybe] Add a type mpfr which is an alias for __mpfr_struct and use it
  throughout


fmpz
----

* [maybe] Improve the functions fmpz_get_str and fmpz_set_str

* [maybe] figure out how to write robust test code for fmpz_read (which reads
  from stdin), perhaps using a pipe

* Inline or create inline versions of core fmpz functions.

* [maybe] Avoid the double allocation of both an mpz struct and limb data,
  having an fmpz point directly to a combined structure. This would require
  writing replacements for most mpz functions.


ulong_extras
------------

* in is_prime_pocklington allow the cofactor to be a perfect power not just
  prime

* factor out some common code between n_is_perfect_power235 and
  n_factor_power235

* n_mod2_preinv may be slower than the hardware division instruction on Core2,
  since that chip can pipeline 2 divisions. Check all occurrences of this
  function and replace them with divisions where that speeds things up on
  Core2. Beware: this will slow things down on AMD, so it must be done per
  architecture. The macros in nmod_vec will also be faster than the functions
  in ulong_extras, so they should be tried first.

* add profile code for factor_trial, factor_one_line, factor_SQUFOF

* [maybe] make n_factor_t an array of length 1 so it can be passed by reference
  automatically, as per mpz_t's, etc

* [enhancement] Implement a primality test which only requires factoring of
  n-1 up to n^(1/3) or n^(1/4)

* [enhancement] Implement a combined p-1 and p+1 primality test as per
  http://primes.utm.edu/prove/prove3_3.html

* [enhancement] Implement a quadratic sieve and use it in n_factor once things
  get too large for SQUFOF
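
The length-1 array idea for n_factor_t mentioned above can be sketched as
follows. This is only an illustration of the typedef trick used by mpz_t; the
names demo_factor_struct, demo_factor_t, demo_factor_init and
demo_factor_trial are hypothetical and do not reflect the actual n_factor_t
layout.

```c
#include <assert.h>

#define DEMO_MAX_FACTORS 15

/* Hypothetical stand-in for a factor structure (not FLINT's n_factor_t). */
typedef struct
{
    int num;                              /* number of factors stored */
    unsigned long p[DEMO_MAX_FACTORS];    /* distinct prime factors */
    int exp[DEMO_MAX_FACTORS];            /* matching exponents */
} demo_factor_struct;

/* An array of length 1: declaring one allocates the struct, and passing it
   to a function decays to a pointer, so callers never write '&'. */
typedef demo_factor_struct demo_factor_t[1];

void demo_factor_init(demo_factor_t f)
{
    f->num = 0;
}

/* Plain trial division, used here only to have something to store. */
void demo_factor_trial(demo_factor_t f, unsigned long n)
{
    unsigned long p;

    assert(n >= 1);
    for (p = 2; p <= n / p; p++)
    {
        if (n % p == 0)
        {
            int e = 0;
            while (n % p == 0)
            {
                n /= p;
                e++;
            }
            assert(f->num < DEMO_MAX_FACTORS);
            f->p[f->num] = p;
            f->exp[f->num] = e;
            f->num++;
        }
    }
    if (n > 1)   /* whatever is left is a prime */
    {
        assert(f->num < DEMO_MAX_FACTORS);
        f->p[f->num] = n;
        f->exp[f->num] = 1;
        f->num++;
    }
}
```

A caller declares `demo_factor_t f;` and then writes `demo_factor_init(f);`
and `demo_factor_trial(f, 360);`, with the same calling convention as mpz_t.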


long_extras
-----------

* write and use z_gcd and z_invert in fmpz_gcd and fmpz_invert, respectively


fmpz_vec
--------

* add a cache of mpfr's which can be used as temporaries for functions like
  _mpfr_vec_scalar_product

* add test code for ulong_extras/revbin.c

* add test code for numerous mpfr_vec functions and mpfr_poly_mul_classical

* make use of mpfr type throughout LLL, mpfr_vec and mpfr_mat modules


fmpz_factor
-----------

* Add primality testing, perfect power testing, fast factorisation
  (Brent-Pollard, QS, ...)


fmpz_mpoly / nmod_mpoly
-----------------------

* Write fmpz_mpoly_max_bits, use in t-mul_heap test code and mul_heap

* Write ACCUM2 and ACCUM3 assembly functions and use in mul_heap

* Make mul_heap take arrays of fmpz's as arguments and document function


nmod_poly
---------

* Make some assembly optimisations to nmod_poly module.

* Add basecase versions of log, sqrt, invsqrt series

* Add O(M(n)) powering mod x^n based on exp and log

* Implement fast mulmid and use to improve Newton iteration

* Determine cutoffs for compose_series_divconquer for default use in
  compose_series (only when one polynomial is small).

* Add asymptotically fast resultants?

* Optimise, write an underscore version of, and test
  nmod_poly_remove

* Improve powmod and powpowmod using precomputed Newton inverse and
  2^k-ary/sliding window powering.

* Maybe restructure the code in factor.c

* Add a (fast) function to convert an nmod_poly_factor_t to
  an expanded polynomial
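
The 2^k-ary powering mentioned above can be sketched on word-sized integers.
This is a minimal illustration, assuming a modulus below 2^32 so that products
fit in 64 bits; it works with integers rather than polynomials, it does not
use a precomputed Newton inverse, and the names mulmod, powmod_2k_ary and
WINDOW_BITS are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BITS 4   /* k in "2^k-ary"; must divide 64 in this sketch */

/* (a * b) mod n, valid because a, b < n <= 2^32 - 1. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
{
    return (a * b) % n;
}

/* Computes base^e mod n with a fixed window of WINDOW_BITS bits. */
uint64_t powmod_2k_ary(uint64_t base, uint64_t e, uint64_t n)
{
    uint64_t table[1 << WINDOW_BITS];
    uint64_t result;
    int i, shift;

    assert(n >= 1 && n <= UINT32_MAX);

    /* Precompute base^0, base^1, ..., base^(2^k - 1) mod n. */
    base %= n;
    table[0] = 1 % n;
    for (i = 1; i < (1 << WINDOW_BITS); i++)
        table[i] = mulmod(table[i - 1], base, n);

    /* Scan the exponent in base 2^k, most significant digit first:
       k squarings, then one multiplication by a table entry. */
    result = 1 % n;
    for (shift = 64 - WINDOW_BITS; shift >= 0; shift -= WINDOW_BITS)
    {
        unsigned digit = (unsigned) ((e >> shift) & ((1u << WINDOW_BITS) - 1));

        for (i = 0; i < WINDOW_BITS; i++)
            result = mulmod(result, result, n);
        result = mulmod(result, table[digit], n);
    }
    return result;
}
```

A sliding-window variant would skip runs of zero bits and store only odd
powers in the table. The fixed window is shown because it is the simpler of
the two.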


fmpz_poly
---------

* add test code for fmpz_poly_max_limbs

* Improve the implementations of fmpz_poly_divrem, _div, and _rem, check that
  the documentation still applies, and write test code for them. All of this
  makes more sense once there is a choice of algorithms

* Include test code for fmpz_poly_inv_series, once this method does anything 
  better than aliasing fmpz_poly_inv_newton

* Sort out the fmpz_poly_pseudo_div and _rem code.  Currently this is just 
  a hack to call fmpz_poly_pseudo_divrem

* Fix the inefficient cases in CRT_ui, and move the relevant parts of this
  function to the fmpz_vec module

* Avoid redundant work and memory allocation in fmpz_poly_bit_unpack
  and fmpz_poly_bit_unpack_unsigned.

* Add functions for composition, multiplication and division
  by a monic linear factor, i.e. P(x +/- c), P * (x +/- c), P / (x +/- c).

* xgcd_modular is really slow. But it is not clear why. 1/3 of the time is
  spent in resultant, but the CRT code or the nmod_poly_xgcd code may also
  be to blame.

* Make resultants use fast GCD?

* In fmpz_poly_pseudo_divrem_divconquer, fix the repeated memory allocation 
  of size O(lenA) in the case when lenA >> lenB.
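
The three operations with a monic linear factor listed above can be sketched
with machine-word coefficients. This is only an illustration using long in
place of fmpz, with no overflow handling; the function names are hypothetical,
and coefficients are stored lowest degree first.

```c
#include <assert.h>

/* Composition: replaces P(x) by P(x + c) in place (Horner-style shift). */
void poly_taylor_shift(long *poly, long len, long c)
{
    long i, j;

    for (i = len - 2; i >= 0; i--)
        for (j = i; j < len - 1; j++)
            poly[j] += c * poly[j + 1];
}

/* Multiplication: res = P * (x - c). res must hold len + 1 coefficients. */
void poly_mul_linear(long *res, const long *poly, long len, long c)
{
    long i;

    assert(len >= 1);
    res[len] = poly[len - 1];
    for (i = len - 1; i >= 1; i--)
        res[i] = poly[i - 1] - c * poly[i];
    res[0] = -c * poly[0];
}

/* Division: quot = P / (x - c) by synthetic division; quot holds len - 1
   coefficients. Returns the remainder, which equals P(c). */
long poly_div_linear(long *quot, const long *poly, long len, long c)
{
    long i, carry;

    assert(len >= 1);
    carry = poly[len - 1];
    for (i = len - 2; i >= 0; i--)
    {
        quot[i] = carry;
        carry = poly[i] + c * carry;
    }
    return carry;
}
```

Passing -c gives the x + c variants of the multiplication and division.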

fmpq_poly
---------

* add fmpq_poly_fprint_pretty

* Rewrite _fmpq_poly_interpolate_fmpz_vec to use the Newton form as done
  in the fmpz_poly interpolation function. In general this should
  be much faster.

* Add versions of fmpq_poly_interpolate_fmpz_vec for fmpq y values,
  and fmpq values of both x and y.

* Add mulhigh

* Add asymptotically fast resultants?

* Add pow_trunc


fmpz_mod_poly
-------------

* Replace fmpz_mod_poly_rem by a proper implementation which 
  actually saves work over divrem.  Then, also add test code.

* Implement a faster GCD function than the Euclidean one, then
  make the wrapping GCD function choose the appropriate algorithm,
  and add some test code

fmpz_poly_mat
-------------

* Tune multiplication cutoffs.

* Take sparseness into account when selecting between algorithms.

* Investigate more clever pivoting strategies in row reduction.


arith
-----

* Think of a better name for this module and/or move parts of it
  to other modules.

* Write profiling code.

* Write a faster arith_divisors using the merge sort algorithm
  (see Sage's implementation). Alternatively (or as a complement)
  write a supernaturally fast _fmpz_vec_sort.

* Improve arith_divisors by using long and longlong arithmetic
  for divisors that fit in 1 or 2 limbs.

* Optimise memory management in mpq_harmonic.

* Maybe move the helper functions in primorial.c to the mpn_extras
  module.

* Implement computation of generalised harmonic numbers.

* Maybe: move Stirling number matrix functions to the fmpz_mat module.

* Implement computation of Bernoulli numbers modulo a prime
  (e.g. porting the code from flint 1)

* Implement multimodular computation of large Bernoulli numbers
  (e.g. porting bernmm)

* Implement rising factorials and falling factorials (x)_n, (x)^n
  as fmpz_poly functions, and add fmpz functions for their
  direct evaluation.

* Implement the binomial coefficient binomial(x,n) as an fmpq_poly
  function.

* Implement Fibonacci polynomials and fmpz Fibonacci numbers.

* Implement orthogonal polynomials (Jacobi, Hermite, Laguerre, Gegenbauer).

* Implement hypergeometric polynomials and series.

* Change the partition function code to use an fmpz (or mpz) instead of
  ulong for n, to allow n larger than 10^9 on 32 bits (or 10^19 on 64 bits!)

* Write tests for the arith_hrr_expsum_factored functions.
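
One way to read the merge-based arith_divisors item above is sketched below:
the sorted divisor list is grown one prime power at a time by merging it with
a scaled copy of itself, so no final sort is needed. This is an illustration
with unsigned long values, not Sage's implementation; the names merge_sorted
and divisors_merge are hypothetical, and the input primes are assumed
distinct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merges sorted a (length m) and sorted b (length n) into out. */
static void merge_sorted(unsigned long *out,
                         const unsigned long *a, size_t m,
                         const unsigned long *b, size_t n)
{
    size_t i = 0, j = 0, k = 0;

    while (i < m && j < n)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < m)
        out[k++] = a[i++];
    while (j < n)
        out[k++] = b[j++];
}

/* Writes the sorted divisors of prod p[i]^e[i] to res and returns how many
   there are. res must have room for prod (e[i] + 1) entries. */
size_t divisors_merge(unsigned long *res, const unsigned long *p,
                      const int *e, size_t nfac)
{
    size_t total = 1, len = 1, i, j;
    unsigned long *cur, *tmp;
    int k;

    for (i = 0; i < nfac; i++)
    {
        assert(e[i] >= 0);
        total *= (size_t) (e[i] + 1);
    }
    cur = malloc(total * sizeof *cur);
    tmp = malloc(total * sizeof *tmp);
    if (cur == NULL || tmp == NULL)
        abort();

    res[0] = 1;
    for (i = 0; i < nfac; i++)
    {
        size_t base_len = len;   /* divisors not yet involving p[i] */

        memcpy(cur, res, base_len * sizeof *cur);
        for (k = 1; k <= e[i]; k++)
        {
            /* cur becomes p[i]^k times the old list; it stays sorted. */
            for (j = 0; j < base_len; j++)
                cur[j] *= p[i];
            merge_sorted(tmp, res, len, cur, base_len);
            len += base_len;
            memcpy(res, tmp, len * sizeof *res);
        }
    }
    free(cur);
    free(tmp);
    return len;
}
```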

fmpz_mat
--------

* Add fmpz_mat/randajtai2.c based on Stehle's matrix.cpp in fpLLL
  (requires mpfr_t's).

* Add element getter and setter methods.

* Implement Strassen multiplication.

* Implement fast multiplication when results are smaller than
  2^(FLINT_BITS-1) by using fmpz arithmetic directly. Also use 2^FLINT_BITS
  as one of the "primes" for multimodular multiplication, along with
  fast CRT code for this purpose.

* Write multiplication functions optimised for sparse matrices by changing
  the loop order and discarding zero multipliers.

* Implement fast null space computation.

* The Dixon p-adic solver should implement output-sensitive termination.

* The Dixon p-adic solver currently spends most of the time computing
  integer matrix-vector products. Instead of using a single prime, it
  is likely to be faster to use a product of primes to increase the
  proportion of time spent on modular linear algebra. The code should also
  use fast radix conversion instead of building up the result incrementally
  to improve performance when the output gets large.

* Maybe optimise multimodular multiplication by pre-transposing
  so that transposed nmod_mat multiplication can be used directly instead of
  creating a transposed copy in nmod_mat_mul. However, this doesn't help
  in the Strassen range unless there also is a transpose version of
  nmod_mat_mul_strassen.

* Use _fmpz_vec functions instead of for loops in some more places.

* Add transpose versions of common functions, in-place addmul etc.

* Take sparseness into account when selecting between algorithms.

* Maybe simplify the interface for row reduction by guaranteeing
  that the denominator is the determinant.
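
The sparse multiplication item above (changing the loop order and discarding
zero multipliers) can be sketched with machine-word entries. This is an
illustration only, using long instead of fmpz and row-major arrays instead of
fmpz_mat_t; the function name is hypothetical.

```c
#include <assert.h>

/* C (m x p) = A (m x n) * B (n x p), all stored row-major.
   The i-k-j loop order lets a zero entry A[i][k] skip a whole row update. */
void mat_mul_skip_zeros(long *C, const long *A, const long *B,
                        long m, long n, long p)
{
    long i, j, k;

    assert(m >= 0 && n >= 0 && p >= 0);

    for (i = 0; i < m * p; i++)
        C[i] = 0;

    for (i = 0; i < m; i++)
    {
        for (k = 0; k < n; k++)
        {
            long a = A[i * n + k];

            if (a == 0)          /* zero multiplier: nothing to add */
                continue;
            for (j = 0; j < p; j++)
                C[i * p + j] += a * B[k * p + j];
        }
    }
}
```

With the more common i-j-k order, a zero in A saves only a single
multiplication. Here it saves p of them, and the inner loop walks rows of B
and C contiguously.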


nmod_mat
--------

* Support BLAS and use this for multiplication when entries fit in a double
  before reduction. Even for large moduli, it might be faster to use
  repeated BLAS multiplications modulo a few small primes followed by CRT.
  Linear algebra operations would benefit from BLAS versions of triangular
  solving as well.

* Improve multiplication with packed entries using SSE. Maybe also write
  a Strassen for packed entries that does additions faster.

* Investigate why the constant of solving/rref/inverse compared to
  multiplication appears to be worse than in theory (recent paper by Jeannerod,
  Pernet and Storjohann).

* See if Strassen can be improved using combined addmul operations.

* Consider getting rid of the row pointer array, using offsets instead of
  window matrices. The row pointer is only useful for Gaussian elimination,
  but there we end up working with a separate permutation array anyway.

* Add element getter and setter methods, more convenience functions
  for setting the zero matrix, identity matrix, etc.

* Implement nmod_mat_pow.

* Add functions for computing A*B^T and A^T*B, using transpose
  multiplications directly to avoid creating a temporary copy.

* Maybe: add asserts to check that the modulus is a prime
  where this is assumed.

* Add transpose versions of common functions, in-place addmul etc.

* The current addmul/submul functions are misnamed since they
  implement a more general operation.

* Improve rref and inverse to perform everything in-place.
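
The condition "entries fit in a double before reduction" from the BLAS item
above can be made concrete: a dot product of n terms, each below mod^2, is
exact in double arithmetic as long as n * (mod - 1)^2 < 2^53. The sketch below
checks that bound and uses a plain triple loop where a BLAS matrix product
would go. Both function names are hypothetical, and no BLAS routine is called.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if n * (mod - 1)^2 <= 2^53 - 1, so that the unreduced dot
   product of two length-n vectors of residues is exact in a double. */
int dot_fits_in_double(uint64_t n, uint64_t mod)
{
    uint64_t m;

    assert(mod >= 1);
    m = mod - 1;
    if (m == 0 || n == 0)
        return 1;
    if (m >> 27)     /* m >= 2^27 gives m^2 >= 2^54, already too large */
        return 0;
    return n <= ((UINT64_C(1) << 53) - 1) / (m * m);
}

/* C (r x c) = A (r x n) * B (n x c) mod `mod`, row-major, inputs reduced.
   Accumulates in doubles and reduces once per entry. */
void mat_mul_mod_double(uint64_t *C, const uint64_t *A, const uint64_t *B,
                        long r, long n, long c, uint64_t mod)
{
    long i, j, k;

    assert(r >= 0 && n >= 0 && c >= 0);
    assert(dot_fits_in_double((uint64_t) n, mod));

    for (i = 0; i < r; i++)
    {
        for (j = 0; j < c; j++)
        {
            double acc = 0.0;

            for (k = 0; k < n; k++)
                acc += (double) A[i * n + k] * (double) B[k * c + j];
            C[i * c + j] = (uint64_t) acc % mod;   /* acc is an exact integer */
        }
    }
}
```

When the bound fails for a large modulus, the item above suggests running the
same kind of product modulo a few small primes and combining the results with
CRT.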


fmpq
----

* Add more functions for generating random numbers.

* Write a subquadratic fmpq_get_cfrac

* Implement subquadratic rational reconstruction. Also improve detection
  of integers, etc. and perhaps add CRT functions to hide the intermediate
  step going from residues -> integer -> rational.
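
For reference, the classical (quadratic) rational reconstruction that a
subquadratic version would replace can be sketched as a truncated extended
Euclidean algorithm. This is an illustration on word-sized integers, assuming
a modulus below 2^31; the names gcd_i64 and rational_reconstruct are
hypothetical. The result is unique when 2*N*D < m.

```c
#include <assert.h>
#include <stdint.h>

static int64_t gcd_i64(int64_t a, int64_t b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0)
    {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Looks for n/d with n == a*d (mod m), |n| <= N, 0 < d <= D, gcd(n, d) = 1.
   Returns 1 and sets *n, *d on success, 0 if no such fraction is found. */
int rational_reconstruct(int64_t *n, int64_t *d,
                         int64_t a, int64_t m, int64_t N, int64_t D)
{
    int64_t r0 = m, r1 = a;   /* remainder sequence */
    int64_t t0 = 0, t1 = 1;   /* cofactors: r_i == t_i * a (mod m) */

    assert(m > 1 && m < (INT64_C(1) << 31));
    assert(a >= 0 && a < m);
    assert(N >= 0 && D >= 1);

    /* Run Euclid only until the remainder drops to N or below. */
    while (r1 > N)
    {
        int64_t q = r0 / r1;
        int64_t r2 = r0 - q * r1;
        int64_t t2 = t0 - q * t1;

        r0 = r1; r1 = r2;
        t0 = t1; t1 = t2;
    }

    if (t1 < 0)   /* normalise to a positive denominator */
    {
        r1 = -r1;
        t1 = -t1;
    }
    if (t1 == 0 || t1 > D || gcd_i64(r1, t1) != 1)
        return 0;

    *n = r1;
    *d = t1;
    return 1;
}
```

An integer shows up as the special case d = 1, which is where the integer
detection mentioned above would hook in.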


fmpq_mat
--------

* Add more random functions.

* Add a user-friendly function for LUP decomposition.

* Add a nullspace function.

padic
-----

* Add test code for the various output formats; 
  perhaps in the form of examples?

* Implement padic_val_fac for generic inputs

ALL: Add flint 2014-05-18 22:03:37 +00:00