Tuesday, July 3, 2012

Comparison of JIT with Oct files

In my last post I tested the Octave JIT compiler on a problem presented on the mailing list. I got a request for a comparison with oct files. I think this is an interesting comparison, because ideally the JIT compiler should reduce the need to rewrite Octave scripts as oct files.

Oct file

The oct file is mostly equivalent to the loopy version in my previous post.
#include <octave/oct.h>
#include <octave/parse.h>

DEFUN_DLD (oct_loopy, args, , "TODO")
{
  feval ("tic");

  octave_value ret;
  int nargin = args.length ();
  if (nargin != 2)
    print_usage ();
  else
    {
      NDArray data = args(0).array_value ();
      octave_idx_type nconsec;
      nconsec = static_cast<octave_idx_type> (args(1).double_value ());

      if (!error_state)
        {
          double *vec = data.fortran_vec ();
          octave_idx_type counter = 0;
          octave_idx_type n = data.nelem ();
          for (octave_idx_type i = 0; i < n; ++i)
            {
              if (vec[i])
                ++counter;
              else
                {
                  if (counter > 0 && counter < nconsec)
                    std::fill (vec + i - counter, vec + i, 0);

                  counter = 0;
                }
            }

          if (counter > 0 && counter < nconsec)
            std::fill (vec + n - counter, vec + n, 0);

          ret = octave_value (data);
        }
    }

  feval ("toc");
  return ret;
}

Results

I ran each test five times, taking the lowest time. I have also separated out the compile/link time from the run time. For JIT, compile time was determined by running the function twice, and subtracting the first run time from the second run time. The compile time for the oct file was determined by timing mkoctfile. The initial parameters were a random vector, A, of size 1,000,000 and a K = 3.

Compile time Run time
JIT 14ms 21ms
OCT 2400ms 3.3ms

When using JIT, the compile time is part of the run time for the first execution of the loop. This means that for this example, JIT is currently about 10 times slower than the oct file. However, if we were to execute the function 50 times on 1,000,000 element vectors, then JIT would be 6 times slower.

After looking at the assembly, it looks like JIT runs into issues with checks for matrix index validity and that loop variables are doubles (in loops like `for ii=1:5' ii is a double). It should be possible to fix these issues in JIT, but it will result in a larger compile time.

2 comments:

  1. "It should be possible to fix these issues in JIT, but it will result in a larger compile time."

    Can you share any thoughts on the trade-off between JIT compile time and the performance of the generated code?

    Will there be user-accessible configuration options for the JIT compiler? If so, maybe include an option for the user to pick a compiler optimization level?

    Thanks!

    ReplyDelete
    Replies
    1. For now, I'm not worrying too much about the trade-off. I would rather spend my time compiling more of the language.

      I think giving the user control over the optimization level is a good idea, but we need multiple optimization levels first :)

      Delete