Efficiency Improvements in Maple 9 - Maple Programming Help

Efficiency Improvements in Maple 9

 A number of efficiency improvements have been applied to make Maple 9 faster and use less memory.

Long Integer Arithmetic

 • Maple 9 uses GNU Multiple Precision (GMP) library for integer arithmetic when the operands are longer than kernelopts(gmpthreshold) digits. GMP is very fast for arbitrary precision arithmetic on integers. The speed of GMP is achieved by using fullwords as the basic arithmetic type and sophisticated algorithms, and by including carefully optimized assembly code for the most common inner loops for many different CPUs. For more details about GMP library, see GMP.
 The following example runs in 160 seconds on Maple 8 but only in 6 seconds on Maple 9.

Examples

 > $\mathrm{add}\left(\frac{1}{{k}^{2}},k=1..100000\right):$

rtable_eval

 • Accessing elements from a Matrix, Vector, or Array inside a procedure is now more efficient in Maple 9.  Evaluation rules for these structures, also known as rtables, have been changed so that it is possible to avoid repeated deep evaluation when elements are referenced.
 Applying eval() to an rtable returns immediately without changing the given rtable.  This is unlike a list, which maps eval() onto each element and returns a copy.  Evaluation of rtable elements is normally deferred until elements are referenced.  If the same elements are repeatedly referenced, evaluation of those elements is also repeated.  Again, this is in contrast to a list, which when individual elements are referenced, needs only to apply one level of evaluation.  Using the rtable_eval(A) command provides a way to evaluate every element once so that, like lists, future references can bypass evaluation.
 When Maple has enough information to determine that a given rtable does not need full evaluation, it automatically uses the one level evaluation scheme.  This can be done, for example, if the datatype is set to a hardware type, or if the rtable only ever contained literal data.  In addition, many internal algorithms now use rtable_eval to avoid repeated full evaluations.
 The following example compares the cost of an $\mathrm{O}\left({N}^{3}\right)$ operation with and without rtable_eval() using a Matrix containing polynomials that are expensive to evaluate.
 > F := proc(n) option remember; if n <= 2 then n else x*F(n-1)+F(n-2); end if; end proc:
 > M := Matrix(5,5,(i,j)->F(5*i+j)):
 > dostuff1 := proc(M) local i, j, k, y;     for i from 1 to op([1,1],M) do         for j from 1 to op([1,2],M) do             for k from 1 to op([1,2],M) do                 y := M[i,j];             end do;         end do;     end do; end proc:
 > dostuff2 := proc(M) local i, j, k, y;     rtable_eval(M,'inplace');     for i from 1 to op([1,1],M) do         for j from 1 to op([1,2],M) do             for k from 1 to op([1,2],M) do                 y := M[i,j];             end do;         end do;     end do; end proc:
 > time(dostuff1(M));
 ${0.944}$ (1)
 > time(dostuff2(M));
 ${0.206}$ (2)

Freelist Changes

 • Before Maple 9, Maple used a linked list to store large free memory blocks. To allocate a large block, a linear scan of the list was performed to find the first block large enough to satisfy the request.  In most cases this was sufficient.  However, as problem sizes increased and larger settings for gcfreq became common, the average length of these free lists increased. Internal testing found examples whose running time increased as gcfreq increased (usually increasing gcfreq decreases running time).  The problem was long free lists which took a long time to search.  For Maple 9, the free lists have been replaced by a tree.  The tree allows for faster searches, especially when lists would have become long.
 In general there is no slowdown related to using the tree.  For those cases where long free lists were a problem, the tree accelerates allocations considerably.  A test file that takes 56 seconds in Maple 8, takes 9 seconds for Maple 9.  Another test that took 50 seconds in Maple 8 now takes only 8. As well, this change fixes reports involving Maple performance decreasing as memory allocation increases.

Dynamic Hashtables

 • The hashtables in Maple 8 were designed to give good performance for most cases.  However in extreme cases, when a very large amount of data is inserted into the table, their performance decreased.  To fix this, Maple 9 has dynamic hashtables.  Dynamic hashtables reorganize themselves so that table operations remain fast even when large amounts of data are inserted.  Due to the overhead of reorganization, a dynamic hashtable is only used when a normal hashtable becomes overloaded.  Therefore using dynamic hashtables does not cause a slowdown.
 A simple way to test the performance of the dynamic hashtables is to perform a large number of insertions and searches on a table.  The following example runs in 430 seconds in Maple 8, but only 200 seconds in Maple 9.
 > num := 2^22:
 > a := table():
 > r := rand( 1..num ):
 > for i from 1 to num do     a[r()] := r(); end do: for i from 1 to num do     a[r()]: end do:
 Dynamic hashtables also interact better with the Maple memory manager.  This can mean a significant reduction in memory allocation when dealing with very large tables.  The example above allocates 62 Mb in Maple 8 and 40 Mb in Maple 9.

LinearSolve

 • The LinearSolve function in the LinearAlgebra package has a new solver, which is activated by supplying the optional parameter method=hybrid. This solver does the LU factorization in hardware floating-point mode followed by iterative refinement of the solution done in software floating-point mode. This can be measurably more efficient, especially at higher settings of Digits.

Examples

 > with(LinearAlgebra):
 > UseHardwareFloats:=false:
 > m := RandomMatrix(100,outputoptions=[datatype=sfloat]):
 > v := RandomVector(100,outputoptions=[datatype=sfloat]):
 > Digits := 40:
 > st:=time():hsol:=LinearSolve(m,v,method=hybrid):time()-st;
 ${0.620}$ (3)
 > st:=time():lsol:=LinearSolve(m,v):time()-st;
 ${1.112}$ (4)
 > Norm(m.hsol-v);
 ${6.2}{}{{10}}^{{-36}}$ (5)
 > Norm(m.lsol-v);
 ${2.64}{}{{10}}^{{-35}}$ (6)

LinearAlgebra Operations

 • The Basic Linear Algebra Subprograms (BLAS) libraries for Sun and Linux platforms now have built-in architecture-specific optimizations, as well as expected updates resulting from an upgrade to newer versions of ATLAS BLAS.
 The following is a quick comparison between Maple 8 and Maple 9 of the run time for multiplication of two 1000x1000 random float[8] matrices:

 Platform/Arch Maple 8 Maple 9 Linux/PII 8.45 7.46 Linux/PIII 4.38 4.35 Linux/P4 1.96 0.99 Linux/Athlon 3.62 2.55 Linux/Athlon XP 1.77 0.98 Sun/Sparc 45.74 42.28 Sun/Ultra 5.11 3.14

sparse numeric rtables

 • When using storage=sparse, and a hardware datatype with an Array, Matrix, or Vector, the NAG sparse format is used.  The internal data structure is the same as in previous versions of Maple, except that the index vectors are now maintained in sorted and unsorted blocks.  A resort is automatically triggered after the insertion of kernelopts (sparse_sort_cutoff) new elements.
 The following example, which took 10.200 seconds to initialize a 200 x 200 corner of a sparse Matrix in Maple 8, now takes under 0.2 seconds on the same computer in Maple 9.

Examples

 > M := Matrix(10000,10000,storage=sparse,datatype=float[8]):
 > t:=time():
 > for i from 1 to 200 do   for j from 1 to 200 do     M[i,j]:=i+j:   end do: end do: time()-t;
 ${0.147}$ (7)

combinat[randperm]

 • The procedure combinat[randperm] has been optimized. It is now 10 to 20 percent faster, and uses nearly an order of magnitude less memory.

expand

 • Expansion of large powers of some kinds of expressions is now faster. For example, the following computation is much faster in Maple 9 than in Maple 8.
 > time( expand( ((sqrt(5)-1)/2)^10000 ) );
 ${0.010}$ (8)