Thursday, 20 June 2013

A detour into assembly

As I mentioned last time, today I decided to look at the assembly code my prototype compiles to, to verify that the compiler performs the necessary optimizations. The example I looked at is

    out = ((a + b) + (c + d)) + ((a + c) + (b + d))

Currently, this should be turned into the C equivalent of

    fmpz_t tmp1, tmp2, tmp3, tmp4, tmp5, tmp6;
    fmpz_init(tmp1);
    fmpz_init(tmp2);
    fmpz_init(tmp3);
    fmpz_init(tmp4);
    fmpz_init(tmp5);
    fmpz_init(tmp6);

    fmpz_add(tmp1, a._data(), b._data());
    fmpz_add(tmp2, c._data(), d._data());
    fmpz_add(tmp3, tmp1, tmp2);
    fmpz_add(tmp4, a._data(), c._data());
    fmpz_add(tmp5, b._data(), d._data());
    fmpz_add(tmp6, tmp4, tmp5);
    fmpz_add(out._data(), tmp3, tmp6);

    fmpz_clear(tmp1);
    fmpz_clear(tmp2);
    fmpz_clear(tmp3);
    fmpz_clear(tmp4);
    fmpz_clear(tmp5);
    fmpz_clear(tmp6);

(This is of course a rather inefficient way of doing it, but that's beside the point. The question is whether the compiler really manages to turn the one-line expression into the equivalent of the second snippet.)
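To make the comparison a bit more concrete, here is a rough sketch of the kind of expression-template machinery involved. It is not the actual prototype: `Int`, `Sum`, `value`, `add` and `run_example` are made-up names for illustration, a plain `long` stands in for an `fmpz`, and `add` plays the role of `fmpz_add`. Two counters record how many additions and temporaries the one-line expression produces.

```cpp
#include <cassert>
#include <type_traits>

// Counters for "library calls" and temporaries (illustration only).
static int add_calls = 0;
static int temp_count = 0;

template <class L, class R> struct Sum;

// Hypothetical stand-in for an fmpz wrapper: just a long.
struct Int {
    long v;
    explicit Int(long x = 0) : v(x) {}
    // Assigning an expression evaluates it directly into *this.
    template <class L, class R>
    Int& operator=(const Sum<L, R>& e);
};

// Stand-in for fmpz_add(out, x, y): out = x + y.
inline void add(Int& out, const Int& x, const Int& y) {
    out.v = x.v + y.v;
    ++add_calls;
}

// A leaf is used in place; a nested sum is evaluated into a temporary.
inline const Int& value(const Int& x) { return x; }
template <class L, class R> Int value(const Sum<L, R>& s);

// Node of the expression tree; holds references to its operands,
// so it must be evaluated within the statement that built it.
template <class L, class R>
struct Sum {
    const L& lhs;
    const R& rhs;
    void eval_into(Int& out) const {
        const Int& l = value(lhs);  // temporary lifetime is extended
        const Int& r = value(rhs);
        add(out, l, r);
    }
};

template <class L, class R>
Int value(const Sum<L, R>& s) {
    Int tmp;
    ++temp_count;
    s.eval_into(tmp);
    return tmp;
}

template <class L, class R>
Int& Int::operator=(const Sum<L, R>& e) {
    e.eval_into(*this);
    return *this;
}

// Restrict operator+ to Int and Sum operands.
template <class T> struct is_expr : std::false_type {};
template <> struct is_expr<Int> : std::true_type {};
template <class L, class R> struct is_expr<Sum<L, R>> : std::true_type {};

template <class L, class R,
          class = typename std::enable_if<is_expr<L>::value &&
                                          is_expr<R>::value>::type>
Sum<L, R> operator+(const L& l, const R& r) {
    return Sum<L, R>{l, r};  // builds a tree node, computes nothing yet
}

struct Counts { long result; int adds; int temps; };

// Evaluates the example expression and reports what it cost.
inline Counts run_example() {
    add_calls = 0;
    temp_count = 0;
    Int a(1), b(2), c(3), d(4), out;
    out = ((a + b) + (c + d)) + ((a + c) + (b + d));
    assert(out.v == 20);  // (1+2)+(3+4) + (1+3)+(2+4)
    return Counts{out.v, add_calls, temp_count};
}
```

With these stand-ins, `run_example()` performs seven calls to `add` and creates six temporaries, the same counts as the hand-written snippet. Whether a compiler then inlines all the template layers down to just those seven calls is exactly what the assembly comparison is meant to check.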

When compiling without exception support (which clutters things up with stack unwinding code and makes the comparison harder), on my machine the one-line expression is turned into this. For comparison, the hand-written code turns into this. The key differences are as follows:

  • the hand-written code is 10% shorter
  • the hand-written code uses one register less

On the other hand, both contain exactly the same number of call instructions. This is good, because it indicates that everything is inlined as expected.

I'm a bit wary about the increase in register usage and code length, but I think I will press on for now.
