Wednesday, 31 July 2013

Slow progress

This morning I implemented the new matrix expression evaluation code which operates without temporary merging. This was done without particular difficulty, and works as expected. I did not implement the full class fmpz_matxx for now, though. Instead, I will brush up the classes we have so far with respect to the things which I have learned since I started:

  • lazy functions with more than two arguments
  • lazy functions with more than one return value
  • more member functions, and in general the changes in naming conventions discussed on the list
  • generic coefficient access
  • static assert

The one thing I have worked on today is lazy functions with more than one return value, implemented using a lazy tuple class. This is a bit tricky (as expected), but I think I have a working design and implementation now. It allows writing things such as

fmpzxx a;
int b;
ltupleref(a, b) = some_lazy_function(17) 
tassert(some_lazy_function(17) == ltuple(a, b))

Monday, 29 July 2013

A bit of a detour

This morning Bill suggested that create_doc.c should use a parser generator like peg/leg. Since I had criticized the code somewhat harshly before, I felt I should undertake the trivial exercise of switching it over to an automatically generated parser.

Long story short, while this really wasn't hard, it was also not completely straightforward (just a bunch of "soft" difficulties related to writing a working grammar and stuff like that). I spent much more time on this than I probably should have, but there is a working version in my gsoc-doc branch now.

After that I thought about matrix temporary allocation jumbles. I think Bill's solution (allocating temporaries of sufficient size to fit every intermediate result, and using cheap resizing before each assignment) will work. I started the fmpz_matxx class, but haven't really gotten anything done yet. But everything is ready to go tomorrow.

Thursday, 25 July 2013

More polys

I "finished" fmpz_polyxx (except for newton bases, hensel lifting, input, modular reduction, and addmul), and then started fmpq_polyxx. This is simply a laborious process, not much to tell.

Wednesday, 24 July 2013

polys polys polys

I spent all day wrapping fmpz_poly. And I'm not even close to finished. I have wrapped the functions from all sections up to and including 13.25 (evaluation), and also 13.28 (composition).

The most interesting thing I implemented today (on other days this would hardly be worth mentioning) is to make the following work:

    fmpz_polyxx p, q;
    p = "3  1 0 1";
    q = "3  0 0 1";

    tassert(p(fmpzxx(2)) == 5);
    tassert(p(q).to_string() == "5  1 0 0 0 1");

You see, the operator() is overloaded to do either composition or evaluation, just like in real life. Isn't this exciting? ...

Tuesday, 23 July 2013

More easy stuff

Today I kept working on writing wrappers. I begun by extending/fixing create_doc.c (the program we use to parse documentation files) to allow functions with empty arguments, and with a const qualifier on the right side. While at it I fixed all the parsing exceptions in the flint sources. This revealed that I had forgotten a few functions for fmpzxx; I have now added all of them which are not pertaining to bitpacking or chinese remaindering.

After that I started the fmpz_polyxx class; nothing much there yet.

Monday, 22 July 2013

Things are taking shape.

I "finished" fmpqxx today, and also padicxx except for documentation.

Don't expect more exciting updates any time soon. There are quite a number of data types in FLINT...

Friday, 19 July 2013

Some comments on error codes and their improvements

I spent the morning (well, quite a bit more than that by now) looking through the error message reports I generated yesterday, and seeing how to improve them. I think I have made some progress without being too intrusive. For now the work is in my branch gsoc-errors. One interesting thing to note is that this actually made many error messages longer (I'll comment on this below).

Here is a rundown of what typical error messages look like. I'll concentrate on gcc, since I assume this is the primary target. One typical problem is trying to construct an object from the wrong arguments. Here is an excerpt from the error message report:

    const fmpzxx a;
    fmpzxx_ref ar(a);
Compiler error output is 37 line(s), 8535 characters
------START COMPILER ERROR OUTPUT------- In function ‘int main()’: error: no matching function for call to ‘flint::fmpzxx_expression<flint::operations::immediate, flint::flint_classes::ref_data<flint::fmpzxx_expression<flint::operations::immediate, flint::detail::fmpz_data>, long int> >::fmpzxx_expression(const fmpzxx&)’ note: candidates are:
In file included from
/home/ness/src/flint/fmpzxx.h:51:5: note: template<class T, class U> flint::fmpzxx_expression::fmpzxx_expression(const T&, const U&, typename flint::mp::enable_if<flint::fmpzxx_expression<Operation, Data>::enablec2<T, U> >::type*)
/home/ness/src/flint/fmpzxx.h:43:7: note:   no known conversion for argument 1 from ‘const fmpzxx {aka const flint::fmpzxx_expression<flint::operations::immediate, flint::detail::fmpz_data>}’ to ‘const flint::fmpzxx_expression<flint::operations::immediate, flint::flint_classes::ref_data<flint::fmpzxx_expression<flint::operations::immediate, flint::detail::fmpz_data>, long int> >&’

The "...." denotes a lot more similar messages I have skipped. This is one of the error messages which got longer, but much more descriptive. Basically what is happening is that none of the constructors is applicable (as it states in the second line!), and then the compiler lists all five constructors (which is not particularly helpful in our case, but I suppose usually a good strategy). In the old error message one of the constructors was invoked, which then failed way down in the implementation details.

One thing to observe is that printing out all the type names in full is really distracting. Maybe I should write a sed script to shorten them.

An example of similar nature is trying to assign to something for which there simply is no rule to do the assignment (e.g. to a constant reference). Here is an excerpt:

    fmpzxx a;
    fmpzxx_srcref b(a);
    b = a;
Compiler error output is 7 line(s), 3014 characters
In file included from /home/ness/src/flint/fmpzxx.h:32:0,
/home/ness/src/flint/flintxx/expression.h: In instantiation .......
/home/ness/src/flint/fmpzxx.h:50:5:   required from ..........   required from here
/home/ness/src/flint/flintxx/expression.h:255:1: error: invalid application of ‘sizeof’ to incomplete type ‘flint::detail::STATIC_ASSERT_FAILED<false>’
/home/ness/src/flint/flintxx/expression.h:256:9: error: ‘doit’ is not a member of ‘rule_t {aka flint::rules::assignment<flint::fmpzxx_expression<flint::operations::immediate, flint::flint_classes::srcref_data<flint::fmpzxx_expression<flint::operations::immediate, flint::detail::fmpz_data>, flint::fmpzxx_expression<flint::operations::immediate, flint::flint_classes::ref_data<flint::fmpzxx_expression<flint::operations::immediate, flint::detail::fmpz_data>, long int> >, long int> >, flint::fmpzxx_expression<flint::operations::immediate, flint::detail::fmpz_data>, void>}’

This time the "........." denote really really really long lines I chopped off. We see a simulated STATIC_ASSERT here (and this line is the only thing new compared with yesterday's results). It's not as great as one might like, but at least the STATIC_ASSERT assures us that the following errors are most likely garbage. There is also a message associated with the STATIC_ASSERT, which can be found by looking in the source. Alternatively, when compiling in C++11 mode, the error message is displayed directly:

/home/ness/src/flint/flintxx/expression.h:254:9: error: static assertion failed: no rule to assign expression T to derived_t (after evaluating)

As a final example, consider trying to do unimplemented arithmetic operations. Here is an excerpt:

    fmpzxx a;
    newtype n;
    a + (a*a + (a / n) + a)*a;
Compiler error output is 10 line(s), 1387 characters
------START COMPILER ERROR OUTPUT------- In function ‘int main()’: error: no match for ‘operator/’ in ‘a / n’ note: candidate is:
In file included from /home/ness/src/flint/fmpzxx.h:32:0,
/home/ness/src/flint/flintxx/expression.h:490:1: note: template<class Expr1, class Expr2> typename flint::detail::binary_op_helper<Expr1, flint::operations::divided_by, Expr2>::enable::type flint::operator/(const Expr1&, const Expr2&)
/home/ness/src/flint/flintxx/expression.h:490:1: note:   template argument deduction/substitution failed:
/home/ness/src/flint/flintxx/expression.h: In substitution of ‘template<class Expr1, class Expr2> typename flint::detail::binary_op_helper<Expr1, flint::operations::divided_by, Expr2>::enable::type flint::operator/(const Expr1&, const Expr2&) [with Expr1 = flint::fmpzxx_expression<flint::operations::immediate, flint::detail::fmpz_data>; Expr2 = newtype]’:   required from here
/home/ness/src/flint/flintxx/expression.h:490:1: error: no type named ‘type’ in ‘flint::detail::binary_op_helper<flint::fmpzxx_expression<flint::operations::immediate, flint::detail::fmpz_data>, flint::operations::divided_by, newtype>::enable {aka struct flint::mp::enable_if<flint::traits::is_implemented<flint::rules::UNIMPLEMENTED>, int>}’

This is unchanged compared to yesterday, because it is actually pretty good.

Thursday, 18 July 2013

Documentation and similar non-code

This morning I went on documenting, this time how to extend flintxx. I felt that the easiest way to get the basics across (to someone who already knows how to use filntxx, which admittedly the small amount of documentation I wrote yesterday is probably not going to achieve) would be by a heavily-commented toy example. So I wrote an example file fooxx.cpp, which is an amalgamation of a fake C interface, part of a C++ wrapper for it, and an example program. Then I added some explaining remarks and a general overview to the appendix of flint-manual.tex. This effort is far from complete, but it should give anyone who is eager to try out flintxx an idea how to proceed.

Then I gave some further thought to static_assert. I didn't really reach any conclusions, but I decided that at the very least we need a method to run the compiler against a set of test cases (which are known to fail) and record the output. I wrote a small (probably not terribly representative) set of test cases, and a script to do the execution. Here are the results for g++-4.7 and clang++-3.4. I'll not comment on them for now.

Wednesday, 17 July 2013

Completing fmpzxx

During the  first half of today I completed the fmpzxx.h wrapper header. This means I added wrapper functions for all (documented) fmpz functions. There are still a few shortcomings (marked TODO in the code), but fmpzxx is now basically feature complete.

The rest of the day I documented the whole thing.

ps: There was no blog post yesterday because I could not log into blogger.

Monday, 15 July 2013

Some boring refactoring

Today I spent refactoring fmpzxx.h and fmpqxx.h. In particular, any code in fmpzxx.h which was not directly pertaining to the fmpz wrapper class was abstracted and moved to a new helper file flint_classes.h. (This is not entirely true, I still have to complete this process for ternary expression evaluation.) This made fmpzxx.h quite a bit cleaner (imho) and shorter (but at the expense of creating flint_classes.h, obviously). Then I applied the same changes to fmpqxx.h. Of course the point being that this can now reuse flint_classes.h.

I will continue the same process tomorrow.

Thursday, 11 July 2013

Some honest progress

Today was a fun day, no grave problems. I wrote a wrapper for fmpq, and extended the wrapper for padic I started yesterday. A few minor things came up which I will talk about in my email tomorrow, and a few bugs where uncovered, but all in all things went smoothly. The two wrappers are in a state similar to fmpzxx: all arithmetic and some functions implemented, some details still to do (recall that this is still part of my "case study").

Wednesday, 10 July 2013

Finally, reference types

This morning I coded up another implementation of reference types ("view classes"). And all tests (including codegen) still pass. Yay.

So what has been going on the last two days? Let's first recap the problem statement. In addition to the class fmpzxx, we also want classes fmpzxx_ref and fmpzxx_cref, which we would like to behave essentially like fmpzxx& and const fmpzxx&, except that it should be possible to instantiate them from the underlying C type fmpz*. (NB: you might think that one can use const fmpzxx_ref in place of fmpzxx_cref. But to do so, one has to disable copying of the reference types (i.e. always pass them by reference), and that is not desirable in the applications we have in mind.)

The main problem we have with this is that we do not want to have to write rules for all combinations of argument types (reference, const-reference, non-reference). The natural idea to deal with this is to replace all use of (say) const fmzxx& by fmpzxx_cref, and somehow have the core library deal with this (e.g. using implicit conversion constructors, and translation types, as outlined in my friday email). This takes some work, but is doable, and the result is fairly nice (imho). I have saved the work in my gsoc-views branch.

Unfortunately, this messes with the optimizer. Apparently, the increased number of temporary objects and indirections push the compiler beyond its limits. The result is code like this - which is hilariously bad, and also completely unacceptable. This sort of thing - misestimated register pressure causing the introduction of unnecessary spill slot and then blocking scalar to aggregate optimization - seems to be observable across all versions of gcc, and also clang.

This means two things. Secondly, at some later point in time, I should revisit some of the design decisions of the core library which rely on the optimizer. (I am fairly confident that this will be doable without having to change the concrete classes.) But firstly, I had to abandon Monday's work and start afresh. We need a system in which reference-objects and non-references can be mixed, without converting references to non-reference objects to reference-objects. This can be done (using a kind of duck typing approach), and in fact is what I did this morning. I don't think it's quite as pretty, but still good enough.

Tuesday, 9 July 2013

Still no luck

This morning I finished the transition which enables view classes for fmpzxx. Unfortunately, as it turns out, this pushes the compiler too far. The optimizer produces correct but incredibly crappy code. (Interestingly enough, the optimizers in various versions of gcc all seem to optimize some functions well, and fail hilariously at others. No version can do all my test, but every test can be done by some version. What sort of rubbish is that??) After some quick discussion on the gcc-help mailing list, it seems like I may have placed too much confidence in the capability of compiler optimizers.

I have now come up with a third strategy, which should allow the compiler to keep producing good code as long as no "views" are involved, while still retaining all the features when views are involved.

Hopefully I can explain details tomorrow, after implementing the new idea. Fingers crossed...

Monday, 8 July 2013

Quick update

Since I didn't send out any emails today, here is a quick update. I have been working on implementing "view classes", aka references to fmpz without underlying instance of fmpzxx. I said a few things about this in my email on Friday; it turns out a few more things should be said. Anyway, I didn't manage to complete what I wanted to do today; I will post a fuller update tomorrow.

Tuesday, 2 July 2013

Ternary expressions, take 2

I spent the morning finishing the ternary operation support for fmpz. Well, there is some testing still to implement, but things look good. As anticipated, the required additional code is not much - the whole ternary expression code is now 600 LOC. (On the other hand the whole wrapper library is 1800 LOC so far, so this is a quite significant amount.)

I also extended the code in t-mpz.cpp to take care of all the new cases. By then the memory usage and compile time in gcc had become ridiculous (resident set > 4GB). This wasn't particularly unexpected, since the relevant test code size tripled, and apparently gcc does not seem to discard certain data structures, yielding to linear growth. I improved the situation a bit by wrapping the test lines into inner structs, apparently that tricks gcc into giving up a bit of its data, and reduced the resident set size to about 1.5GB again (and compile time to about 1 minute).

Just now I did what I was most wary about - I tested the code generation. Consider the following function:

void test1(mpz& out, const mpz& a, const mpz& b, const mpz& c, const mpz& d)
    out = a + ((a+b) + ((c+a) + (a+d))*d);

This is a typical case where a ternary operation can be used (it is of the form out = a + (x + y*z), and the term on the right goes into a temporary, so we can use addmul). It is not the most complicated possible case (note that z in the above expression corresponds to d, an immediate), but still reasonably representative (note that evaluating x requires one temporary, y two, and z none).

The most efficient hand-coded version of this I could come up with is the following:

void test2(mpz& out, const mpz& a, const mpz& b, const mpz& c, const mpz& d)
    fmpz_t tmp1, tmp2;

    fmpz_add(tmp1, c._data(), a._data());
    fmpz_add(tmp2, a._data(), d._data());
    fmpz_add(tmp1, tmp1, tmp2);
    fmpz_add(tmp2, a._data(), b._data());
    fmpz_addmul(tmp2, tmp1, d._data());
    fmpz_add(out._data(), a._data(), tmp2);


The first function compiles to this, whereas the second one compiles to this. The results are quite similar to my earlier investigations. On the negative side, the first function is compiled to asm which
  • is 10% longer
  • uses 10% more storage instructions
  • uses 1 more register
On the positive side, both functions use the same number of call statements, conditional and unconditional jumps.

Monday, 1 July 2013

ternary operations, take1

Today my plan was to implement use of ternary flint operations, i.e. addmul and submul. However, I must confess I severely underestimated the complexity of this. To get an idea of what is going on, suppose we want to evaluate the expression a + b*c into a temporary. The idea is that, if a is itself an expression template, and assuming for simplicity that both b and c are ordinary fmpzs, we first evaluate a into the temporary (henceforth tmp) and then call addmul(tmp, b, c).

The devil hides in "for simplicity". Namely, if b and/or c are allowed to be expression templates as well, things get difficult. The optimal order of evaluation depends on the number of temporaries required for evaluating each of a, b, and c (henceforth t1, t2 and t3). Moreover, the total number of temporaries required depends not only on the maximum of t1, t2 and t3, but also on the number of coincidences among these. Additionally, if b or c is an immediate, then evaluation still has to proceed differently, because in this case we cannot determine where the evaluation ends up (because we don't evaluate anything at all). A final complication arises because even for purposes of evaluating a, b, and c, they are not on equal footing. Namely, we need a to end up in tmp, whereas b and c need to end up in newly allocated temporaries.

I now have a first implementation of addmul on github. It takes up about 550 lines of code. The code takes care of all the above difficulties; adding some missing bits (handling a*b + c, and submul) should not significantly increase the code size. Neither should making this a bit more generic (since it is currently tied to fmpz). I am making one simplifying assumption, though: that the expression we are evaluating is homogeneous. By this I mean that it only involves one data type (fmpz in this case). This way instead of merging temporaries, we just need to count them.

Tomorrow I will finish up this code. Missing features I have indicated above; beyond that we need a lot more testing. I am already testing all the different scenarios alluded to in the second paragraph. Additionally, I will need to test the new functionality, and some more complex expressions. But most importantly, we need more codegen tests. This expression template evaluation code is highly complex, and involves a lot of "nominally runtime" code which has to be optimized away; I have to verify that.

One interesting observation is that this code also seems to stress test the compiler quite a bit. Compiling t-mpz.cpp using gcc requires about 2GB of memory, and takes on the order of 2 minutes on my machine (intel Core 2 Duo P8700 @ 2.53GHz, 4 GB main memory -- most of the time is probably spent allocating memory). In comparison, clang requires about 400MB and 10 seconds.