358 lines
10 KiB
Plaintext
358 lines
10 KiB
Plaintext
|
TODO
|
||
|
====
|
||
|
|
||
|
general
|
||
|
-------
|
||
|
|
||
|
* Write a flint2 memory manager, both reentrant and non-reentrant stack based
|
||
|
versions
|
||
|
|
||
|
* [maybe] a type mpfr which is an alias for __mpfr_struct and using throughout
|
||
|
|
||
|
|
||
|
fmpz
|
||
|
----
|
||
|
|
||
|
* [maybe] Improve the functions fmpz_get_str and fmpz_set_str
|
||
|
|
||
|
* [maybe] figure out how to write robust test code for fmpz_read (which reads
|
||
|
from stdin), perhaps using a pipe
|
||
|
|
||
|
* Inline or create inline versions of core fmpz functions.
|
||
|
|
||
|
* [maybe] Avoid the double allocation of both an mpz struct and limb data,
|
||
|
having an fmpz point directly to a combined structure. This would require
|
||
|
writing replacements for most mpz functions.
|
||
|
|
||
|
|
||
|
ulong_extras
|
||
|
------------
|
||
|
|
||
|
* in is_prime_pocklington allow the cofactor to be a perfect power not just
|
||
|
prime
|
||
|
|
||
|
* factor out some common code between n_is_perfect_power235 and
|
||
|
n_factor_power235
|
||
|
|
||
|
* n_mod2_preinv may be slower than the chip on Core2 due to the fact that it can
|
||
|
pipeline 2 divisions. Check all occurrences of this function and replace with
|
||
|
divisions where it will speed things up on Core2. Beware, this will slow things
|
||
|
down on AMD, so it is necessary to do this per architecture. The macros in
|
||
|
nmod_vec will also be faster than the functions in ulong_extras, thus they should
|
||
|
be tried first.
|
||
|
|
||
|
* add profile code for factor_trial, factor_one_line, factor_SQUFOF
|
||
|
|
||
|
* [maybe] make n_factor_t an array of length 1 so it can be passed by reference
|
||
|
automatically, as per mpz_t's, etc
|
||
|
|
||
|
* [enhancement] Implement a primality test which only requires factoring of
|
||
|
n-1 up to n^1/3 or n^1/4
|
||
|
|
||
|
* [enhancement] Implement a combined p-1 and p+1 primality test as per
|
||
|
http://primes.utm.edu/prove/prove3_3.html
|
||
|
|
||
|
* [enhancement] Implement a quadratic sieve and use it in n_factor once things
|
||
|
get too large for SQUFOF
|
||
|
|
||
|
|
||
|
long_extras
|
||
|
-----------
|
||
|
|
||
|
* write and use z_gcd and z_invert in fmpz_gcd and fmpz_invert, respectively
|
||
|
|
||
|
|
||
|
fmpz_vec
|
||
|
--------
|
||
|
|
||
|
* add a cache of mpfr's which can be used as temporaries for functions like
|
||
|
_mpfr_vec_scalar_product
|
||
|
|
||
|
* test code for ulong_extras/revbin.c
|
||
|
|
||
|
* add test code for numerous mpfr_vec functions and mpfr_poly_mul_classical
|
||
|
|
||
|
* make use of mpfr type througout LLL, mpfr_vec and mpfr_mat modules
|
||
|
|
||
|
|
||
|
fmpz_factor
|
||
|
-----------
|
||
|
|
||
|
* Add primality testing, perfect power testing, fast factorisation
|
||
|
(Brent-Pollard, QS, ...)
|
||
|
|
||
|
|
||
|
fmpz_mpoly / nmod_mpoly
|
||
|
-----------------------
|
||
|
|
||
|
* Write fmpz_mpoly_max_bits, use in t-mul_heap test code and mul_heap
|
||
|
|
||
|
* Write ACCUM2 and ACCUM3 assembly functions and use in mul_heap
|
||
|
|
||
|
* Make mul_heap take arrays of fmpz's as arguments and document function
|
||
|
|
||
|
|
||
|
nmod_poly
|
||
|
---------
|
||
|
|
||
|
* Make some assembly optimisations to nmod_poly module.
|
||
|
|
||
|
* Add basecase versions of log, sqrt, invsqrt series
|
||
|
|
||
|
* Add O(M(n)) powering mod x^n based on exp and log
|
||
|
|
||
|
* Implement fast mulmid and use to improve Newton iteration
|
||
|
|
||
|
* Determine cutoffs for compose_series_divconquer for default use in
|
||
|
compose_series (only when one polynomial is small).
|
||
|
|
||
|
* Add asymptotically fast resultants?
|
||
|
|
||
|
* Optimise, write an underscore version of, and test
|
||
|
nmod_poly_remove
|
||
|
|
||
|
* Improve powmod and powpowmod using precomputed Newton inverse and
|
||
|
2^k-ary/sliding window powering.
|
||
|
|
||
|
* Maybe restructure the code in factor.c
|
||
|
|
||
|
* Add a (fast) function to convert an nmod_poly_factor_t to
|
||
|
an expanded polynomial
|
||
|
|
||
|
|
||
|
fmpz_poly
|
||
|
---------
|
||
|
|
||
|
* add test code for fmpz_poly_max_limbs
|
||
|
|
||
|
* Improve the implementations of fmpz_poly_divrem, _div, and _rem, check that
|
||
|
the documentations still apply, and write test code for this --- all of this
|
||
|
makes more sense once there is a choice of algorithms
|
||
|
|
||
|
* Include test code for fmpz_poly_inv_series, once this method does anything
|
||
|
better than aliasing fmpz_poly_inv_newton
|
||
|
|
||
|
* Sort out the fmpz_poly_pseudo_div and _rem code. Currently this is just
|
||
|
a hack to call fmpz_poly_pseudo_divrem
|
||
|
|
||
|
* Fix the inefficient cases in CRT_ui, and move the relevant parts of this
|
||
|
function to the fmpz_vec module
|
||
|
|
||
|
* Avoid redundant work and memory allocation in fmpz_poly_bit_unpack
|
||
|
and fmpz_poly_bit_unpack_unsigned.
|
||
|
|
||
|
* Add functions for composition, multiplication and division
|
||
|
by a monic linear factor, i.e. P(x +/- c), P * (x +/- c), P / (x +/- c).
|
||
|
|
||
|
* xgcd_modular is really slow. But it is not clear why. 1/3 of the time is
|
||
|
spent in resultant, but the CRT code or the nmod_poly_xgcd code may also
|
||
|
be to blame.
|
||
|
|
||
|
* Make resultants use fast GCD?
|
||
|
|
||
|
* In fmpz_poly_pseudo_divrem_divconquer, fix the repeated memory allocation
|
||
|
of size O(lenA) in the case when lenA >> lenB.
|
||
|
|
||
|
fmpq_poly
|
||
|
---------
|
||
|
|
||
|
* add fmpq_poly_fprint_pretty
|
||
|
|
||
|
* Rewrite _fmpq_poly_interpolate_fmpz_vec to use the Newton form as done
|
||
|
in the fmpz_poly interpolation function. In general this should
|
||
|
be much faster.
|
||
|
|
||
|
* Add versions of fmpq_poly_interpolate_fmpz_vec for fmpq y values,
|
||
|
and fmpq values of both x and y.
|
||
|
|
||
|
* Add mulhigh
|
||
|
|
||
|
* Add asymptotically fast resultants?
|
||
|
|
||
|
* Add pow_trunc
|
||
|
|
||
|
|
||
|
fmpz_mod_poly
|
||
|
-------------
|
||
|
|
||
|
* Replace fmpz_mod_poly_rem by a proper implementation which
|
||
|
actually saves work over divrem. Then, also add test code.
|
||
|
|
||
|
* Implement a faster GCD function then the euclidean one, then
|
||
|
make the wrapping GCD function choose the appropriate algorithm,
|
||
|
and add some test code
|
||
|
|
||
|
fmpz_poly_mat
|
||
|
-------------
|
||
|
|
||
|
* Tune multiplication cutoffs.
|
||
|
|
||
|
* Take sparseness into account when selecting between algorithms.
|
||
|
|
||
|
* Investigate more clever pivoting strategies in row reduction.
|
||
|
|
||
|
|
||
|
arith
|
||
|
-----
|
||
|
|
||
|
* Think of a better name for this module and/or move parts of it
|
||
|
to other modules.
|
||
|
|
||
|
* Write profiling code.
|
||
|
|
||
|
* Write a faster arith_divisors using the merge sort algorithm
|
||
|
(see Sage's implementation). Alternatively (or as a complement)
|
||
|
write a supernaturally fast _fmpz_vec_sort.
|
||
|
|
||
|
* Improve arith_divisors by using long and longlong arithmetic
|
||
|
for divisors that fit in 1 or 2 limbs.
|
||
|
|
||
|
* Optimise memory management in mpq_harmonic.
|
||
|
|
||
|
* Maybe move the helper functions in primorial.c to the mpn_extras
|
||
|
module.
|
||
|
|
||
|
* Implement computation of generalised harmonic numbers.
|
||
|
|
||
|
* Maybe: move Stirling number matrix functions to the fmpz_mat module.
|
||
|
|
||
|
* Implement computation of Bernoulli numbers modulo a prime
|
||
|
(e.g. porting the code from flint 1)
|
||
|
|
||
|
* Implement multimodular computation of large Bernoulli numbers
|
||
|
(e.g. porting bernmm)
|
||
|
|
||
|
* Implement rising factorials and falling factorials (x)_n, (x)^n
|
||
|
as fmpz_poly functions, and add fmpz functions for their
|
||
|
direct evaluation.
|
||
|
|
||
|
* Implement the binomial coefficient binomial(x,n) as an fmpq_poly
|
||
|
function.
|
||
|
|
||
|
* Implement Fibonacci polynomials and fmpz Fibonacci numbers.
|
||
|
|
||
|
* Implement orthogonal polynomials (Jacobi, Hermite, Laguerre, Gegenbauer).
|
||
|
|
||
|
* Implement hypergeometric polynomials and series.
|
||
|
|
||
|
* Change the partition function code to use an fmpz (or mpz) instead of
|
||
|
ulong for n, to allow n larger than 10^9 on 32 bits (or 10^19 on 64 bits!)
|
||
|
|
||
|
* Write tests for the arith_hrr_expsum_factored functions.
|
||
|
|
||
|
fmpz_mat
|
||
|
--------
|
||
|
|
||
|
* Add fmpz_mat/randajtai2.c based on Stehle's matrix.cpp in fpLLL
|
||
|
(requires mpfr_t's).
|
||
|
|
||
|
* Add element getter and setter methods.
|
||
|
|
||
|
* Implement Strassen multiplication.
|
||
|
|
||
|
* Implement fast multiplication when when results are smaller than
|
||
|
2^(FLINT_BITS-1) by using fmpz arithmetic directly. Also use 2^FLINT_BITS
|
||
|
as one of the "primes" for multimodular multiplication, along with
|
||
|
fast CRT code for this purpose.
|
||
|
|
||
|
* Write multiplication functions optimised for sparse matrices by changing
|
||
|
the loop order and discarding zero multipliers.
|
||
|
|
||
|
* Implement fast null space computation.
|
||
|
|
||
|
* The Dixon p-adic solver should implement output-sensitive termination.
|
||
|
|
||
|
* The Dixon p-adic solver currently spends most of the time computing
|
||
|
integer matrix-vector products. Instead of using a single prime, it
|
||
|
is likely to be faster to use a product of primes to increase the
|
||
|
proportion of time spent on modular linear algebra. The code should also
|
||
|
use fast radix conversion instead of building up the result incrementally
|
||
|
to improve performance when the output gets large.
|
||
|
|
||
|
* Maybe optimise multimodular multiplication by pre-transposing
|
||
|
so that transposed nmod_mat multiplication can be used directly instead of
|
||
|
creating a transposed copy in nmod_mat_mul. However, this doesn't help
|
||
|
in the Strassen range unless there also is a transpose version of
|
||
|
nmod_mat_mul_strassen.
|
||
|
|
||
|
* Use _fmpz_vec functions instead of for loops in some more places.
|
||
|
|
||
|
* Add transpose versions of common functions, in-place addmul etc.
|
||
|
|
||
|
* Take sparseness into account when selecting between algorithms.
|
||
|
|
||
|
* Maybe simplify the interface for row reduction by guaranteeing
|
||
|
that the denominator is the determinant.
|
||
|
|
||
|
|
||
|
nmod_mat
|
||
|
--------
|
||
|
|
||
|
* Support BLAS and use this for multiplication when entries fit in a double
|
||
|
before reduction. Even for large moduli, it might be faster to use
|
||
|
repeated BLAS multiplications modulo a few small primes followed by CRT.
|
||
|
Linear algebra operations would benefit from BLAS versions of triangular
|
||
|
solving as well.
|
||
|
|
||
|
* Improve multiplication with packed entries using SSE. Maybe also write
|
||
|
a Strassen for packed entries that does additions faster.
|
||
|
|
||
|
* Investigate why the constant of solving/rref/inverse compared to
|
||
|
multiplication appears to be worse than in theory (recent paper by Jeannerod,
|
||
|
Pernet and Storjohann).
|
||
|
|
||
|
* See if Strassen can be improved using combined addmul operations.
|
||
|
|
||
|
* Consider getting rid of the row pointer array, using offsets instead of
|
||
|
window matrices. The row pointer is only useful for Gaussian elimination,
|
||
|
but there we end up working with a separate permutation array anyway.
|
||
|
|
||
|
* Add element getter and setter methods, more convenience functions
|
||
|
for setting the zero matrix, identity matrix, etc.
|
||
|
|
||
|
* Implement nmod_mat_pow.
|
||
|
|
||
|
* Add functions for computing A*B^T and A^T*B, using transpose
|
||
|
multiplications directly to avoid creating a temporary copy.
|
||
|
|
||
|
* Maybe: add asserts to check that the modulus is a prime
|
||
|
where this is assumed.
|
||
|
|
||
|
* Add transpose versions of common functions, in-place addmul etc.
|
||
|
|
||
|
* The current addmul/submul functions are misnamed since they
|
||
|
implement a more general operation.
|
||
|
|
||
|
* Improve rref and inverse to perform everything in-place.
|
||
|
|
||
|
|
||
|
fmpq
|
||
|
----
|
||
|
|
||
|
* Add more functions for generating random numbers.
|
||
|
|
||
|
* Write a subquadratic fmpq_get_cfrac
|
||
|
|
||
|
* Implement subquadratic rational reconstruction. Also improve detection
|
||
|
of integers, etc. and perhaps add CRT functions to hide the intermediate
|
||
|
step going from residues -> integer -> rational.
|
||
|
|
||
|
|
||
|
fmpq_mat
|
||
|
--------
|
||
|
|
||
|
* Add more random functions.
|
||
|
|
||
|
* Add a user-friendly function for LUP decomposition.
|
||
|
|
||
|
* Add a nullspace function.
|
||
|
|
||
|
padic
|
||
|
-----
|
||
|
|
||
|
* Add test code for the various output formats;
|
||
|
perhaps in the form of examples?
|
||
|
|
||
|
* Implement padic_val_fac for generic inputs
|
||
|
|