out = ((a + b) + (c + d)) + ((a + c) + (b + d))
Currently, this should be turned into the C equivalent of
fmpz_t tmp1, tmp2, tmp3, tmp4, tmp5, tmp6;
fmpz_init(tmp1);
fmpz_init(tmp2);
fmpz_init(tmp3);
fmpz_init(tmp4);
fmpz_init(tmp5);
fmpz_init(tmp6);
fmpz_add(tmp1, a._data(), b._data());
fmpz_add(tmp2, c._data(), d._data());
fmpz_add(tmp3, tmp1, tmp2);
fmpz_add(tmp4, a._data(), c._data());
fmpz_add(tmp5, b._data(), d._data());
fmpz_add(tmp6, tmp4, tmp5);
fmpz_add(out._data(), tmp3, tmp6);
fmpz_clear(tmp1);
fmpz_clear(tmp2);
fmpz_clear(tmp3);
fmpz_clear(tmp4);
fmpz_clear(tmp5);
fmpz_clear(tmp6);
(This is of course a rather inefficient way of doing it, but that's beside the point. The question is whether the compiler really manages to turn the first line into the second snippet.)
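To make the question a bit more concrete, here is a minimal sketch of the expression-template idea using a toy `Int` type in place of the real fmpz wrapper. All the names in it (`Int`, `Sum`, `value_of`, `is_expr`, `temporaries_created`, `demo`) are made up for illustration; this is not how the actual wrapper is implemented, only a small model of the same pattern. Leaves are used in place, every inner node is evaluated into a fresh temporary, and the root is written directly into the assignment target.

```cpp
#include <cassert>
#include <type_traits>

template <class L, class R> struct Sum;  // forward declaration

// Toy stand-in for the fmpz wrapper class (hypothetical, not the real type).
struct Int {
    long v;
    Int() : v(0) {}
    explicit Int(long x) : v(x) {}

    // Assigning an expression tree evaluates it straight into *this,
    // like passing out._data() as the destination of the last fmpz_add.
    template <class L, class R>
    Int& operator=(const Sum<L, R>& e) {
        e.eval_into(*this);
        return *this;
    }
};

// Counts temporaries: the toy analogue of the fmpz_init(tmpN) calls.
static int temporaries_created = 0;

// A leaf operand is used in place, like a._data().
const Int& value_of(const Int& x) { return x; }

// An inner node is first evaluated into a fresh temporary.
template <class L, class R>
Int value_of(const Sum<L, R>& e) {
    Int tmp;
    ++temporaries_created;
    e.eval_into(tmp);
    return tmp;
}

// Node of the expression tree: records the operands, computes nothing yet.
// It holds references, so it must be consumed within the same full expression.
template <class L, class R>
struct Sum {
    const L& lhs;
    const R& rhs;

    void eval_into(Int& out) const {
        const Int& l = value_of(lhs);  // a leaf, or a lifetime-extended temporary
        const Int& r = value_of(rhs);
        out.v = l.v + r.v;             // toy analogue of fmpz_add(out, l, r)
    }
};

// Restrict operator+ to Int and Sum operands.
template <class T> struct is_expr : std::false_type {};
template <> struct is_expr<Int> : std::true_type {};
template <class L, class R> struct is_expr<Sum<L, R> > : std::true_type {};

template <class L, class R,
          class = typename std::enable_if<is_expr<L>::value && is_expr<R>::value>::type>
Sum<L, R> operator+(const L& l, const R& r) {
    return Sum<L, R>{l, r};
}

// Evaluates the expression from the text and returns the result.
long demo(long a_, long b_, long c_, long d_) {
    Int a(a_), b(b_), c(c_), d(d_), out;
    temporaries_created = 0;
    out = ((a + b) + (c + d)) + ((a + c) + (b + d));
    assert(temporaries_created == 6);  // one per tmp1..tmp6 in the C version
    return out.v;
}
```

Calling `demo(1, 2, 3, 4)` walks the same tree as the C snippet: six inner nodes get a temporary each, and the root sum lands in `out`. Whether an optimising compiler flattens all of this into straight-line code is exactly what the assembly comparison is meant to check.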
When compiling with exception support disabled (exceptions clutter the output with stack-unwinding code and make the comparison harder), on my machine the first snippet is turned into this. For comparison, the hand-written code turns into this. The key differences are as follows:
- the hand-written code is 10% shorter
- the hand-written code uses one fewer register
I'm a bit wary about the increase in register usage and code length, but I think I will press on for now.