Monday, November 5, 2012

JIT, Debugging, and Interrupts

I finally found some time to work on Octave last weekend. There has been some talk on the mailing list recently about releasing a new version of Octave, so I figured I should clean up a few loose ends in JIT.

Breakpoints

Up until now, JIT-compiled code has simply skipped breakpoints. While there are several issues with supporting breakpoints in JIT, the biggest one is that the Octave debugger allows the execution of arbitrary statements. Arbitrary statement execution is a very powerful and useful feature, but it lets users change the type of a variable in the middle of code execution.

JIT gets its performance improvement by making assumptions about variable types, so entering debug mode means we need to exit JIT. I took the simple way out here: JIT is not started at all if there are any breakpoints in the code (see hg).
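As a rough sketch of the idea (the names here are purely illustrative, not Octave's actual API):
// Illustrative only: skip JIT compilation entirely for code that contains
// breakpoints, so the debugger always runs it through the interpreter.
struct code_block
{
  bool has_breakpoints;   // hypothetical flag
};

bool should_jit_compile (const code_block& code, bool jit_enabled)
{
  return jit_enabled && ! code.has_breakpoints;
}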

Interrupts

In Octave, if you hit Control-C, execution stops and you are returned to the Octave prompt (unless debug_on_interrupt is set). The interpreter does this by calling octave_quit, which checks the interrupt state and throws an octave_interrupt_exception if an interrupt has occurred.

Ideally, to support interrupts in JIT, a call to octave_quit should be inserted once per loop iteration. Unfortunately, it is not that simple: after an interrupt occurs, the current variable values need to be reported back to the interpreter. For example,
i = 0;
while 1
  i += 1;
endwhile
If the user interrupts the loop, the interpreter needs to save the current value of i. This means JIT needs a way to catch and rethrow the octave_interrupt_exception. While LLVM does have a mechanism for handling exceptions, Octave does not yet have the infrastructure to make use of it.

Instead, I inserted a check of octave_interrupt_state into the compiled code. If octave_interrupt_state is greater than zero, we need to exit to the Octave prompt. I reused the existing code for checking error_state to achieve this.
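Conceptually, the check emitted once per loop iteration amounts to something like the sketch below. This is only an illustration: the declaration of octave_interrupt_state mirrors my recollection of Octave's quit.h, and exit_to_interpreter is a hypothetical stand-in for the existing error_state bail-out path, not actual Octave API.
#include <signal.h>

// Flag set asynchronously by Octave's interrupt (SIGINT) handler.
extern sig_atomic_t octave_interrupt_state;

// Hypothetical stand-in for the error_state bail-out path: write the
// compiled loop's variables back to the interpreter, then return control
// to the Octave prompt.
void exit_to_interpreter (void);

// What JIT conceptually emits once per loop iteration.
inline void check_interrupt (void)
{
  if (octave_interrupt_state > 0)
    exit_to_interpreter ();
}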

Now that JIT handles interrupts and breakpoints in a manner consistent with the interpreter, I can't think of ways in which JIT and the interpreter differ (besides speed). The amount of code which JIT can compile is still fairly limited. Hopefully, I will get some time over winter break to make it easier to extend JIT and improve what JIT can compile. In its current state JIT should be ready to include as an experimental feature in the next Octave release.

Tuesday, August 7, 2012

Multidimensional indexing and end

This week I added support for multidimensional matrix indexing and for using end in JIT. Support for the end keyword is interesting. Take the following, for example:
y = A(1, sin (end));
From that line alone, it is not clear what end refers to. If sin is a matrix, then end refers to the end of sin. If sin is a function, then end refers to the end of A (along its second dimension).

I solved this problem by keeping track of the full context information for each end. Let's take a look at a slightly more complicated example:
y = A(1, 2, B(sin (end), 5));
Then during type inference our context might look something like this:
type       identifier   index   count
function   sin          0       1
matrix     B            0       2
matrix     A            2       4
In this context, end then refers to the end of matrix B at index 0.
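To make the bookkeeping concrete, here is a minimal sketch of such a context stack. The names are illustrative only, not Octave's actual classes; assume one entry is pushed for each enclosing index expression and popped when it is left.
#include <cstddef>
#include <string>
#include <vector>

// One entry per enclosing index expression; the fields mirror the columns
// of the table above: type, identifier, index, and count.
struct end_context
{
  bool is_matrix;      // false when the identifier resolves to a function
  std::string name;    // the identifier being indexed, e.g. "A" or "B"
  std::size_t index;   // argument position currently being visited
  std::size_t count;   // total number of arguments in the index expression
};

// During type inference, end binds to the innermost entry whose identifier
// is a matrix -- matrix B at index 0 in the example above.
static std::vector<end_context> context_stack;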

Friday, July 27, 2012

OctConf Reflection and Project Plan

I had a great time at OctConf 2012. There were a lot of interesting people there, and it is nice to be able to put faces to names. I definitely hope I will be able to make it next year.

I recently realized that there are only two weeks left before the suggested GSoC pencils-down date (8/13/2012). In the remaining time I plan to focus my effort on better matrix support and on supporting more builtin functions; I should be able to make significant progress on both.

After GSoC is done, I plan on working on compiling user functions and function handles. I think adding support for user functions in JIT is important, but I'm not sure if I will be able to complete it in two weeks.

Tuesday, July 3, 2012

Comparison of JIT with Oct files

In my last post I tested the Octave JIT compiler on a problem presented on the mailing list. I got a request for a comparison with oct files. I think this is an interesting comparison, because ideally the JIT compiler should reduce the need to rewrite Octave scripts as oct files.

Oct file

The oct file is mostly equivalent to the loopy version in my previous post.
#include <octave/oct.h>
#include <octave/parse.h>

DEFUN_DLD (oct_loopy, args, , "TODO")
{
  feval ("tic");

  octave_value ret;
  int nargin = args.length ();
  if (nargin != 2)
    print_usage ();
  else
    {
      NDArray data = args(0).array_value ();
      octave_idx_type nconsec;
      nconsec = static_cast<octave_idx_type> (args(1).double_value ());

      if (!error_state)
        {
          double *vec = data.fortran_vec ();
          octave_idx_type counter = 0;
          octave_idx_type n = data.nelem ();
          for (octave_idx_type i = 0; i < n; ++i)
            {
              if (vec[i])
                ++counter;
              else
                {
                  if (counter > 0 && counter < nconsec)
                    std::fill (vec + i - counter, vec + i, 0);

                  counter = 0;
                }
            }

          if (counter > 0 && counter < nconsec)
            std::fill (vec + n - counter, vec + n, 0);

          ret = octave_value (data);
        }
    }

  feval ("toc");
  return ret;
}

Results

I ran each test five times, taking the lowest time. I have also separated out the compile/link time from the run time. For JIT, compile time was determined by running the function twice and subtracting the second run time from the first. The compile time for the oct file was determined by timing mkoctfile. The initial parameters were a random vector, A, with 1,000,000 elements and K = 3.

       Compile time   Run time
JIT    14 ms          21 ms
OCT    2400 ms        3.3 ms

When using JIT, the compile time is part of the run time for the first execution of the loop. This means that for this example the first JIT execution (14 ms compile + 21 ms run) is currently about 10 times slower than a run of the oct file. However, if we were to execute the function 50 times on 1,000,000 element vectors, the compile time would be amortized and JIT would be about 6 times slower (roughly 21 ms versus 3.3 ms per run).

After looking at the generated assembly, it looks like JIT loses time on checks for matrix index validity and on checks that loop variables are doubles (in loops like `for ii=1:5', ii is a double). It should be possible to fix these issues in JIT, but doing so will increase compile time.

Thursday, June 28, 2012

A Realistic Test

There was an interesting post on the mailing list (https://mailman.cae.wisc.edu/pipermail/help-octave/2012-June/052642.html). The problem is: given a logical array, A = [1 0 0 1 0 0 1 1 1 ...], and a minimum length of consecutive ones, K, all runs of ones shorter than K should be filtered out (set to zero). For example, with K = 3, A = [1 0 1 1 0 1 1 1 1] becomes [0 0 0 0 0 1 1 1 1].

The exciting part for me is that the JIT compiler is currently far enough along to compile the loopy solution.

Input generation

I used a simple script to generate some random input data. A double matrix is used, because JIT does not yet work for logical matrices.
function result = gen_test (n)
  result = double (rand (1, n) > .01);
endfunction

Vectorized

Vectorized code (based on code from the mailing list):
function z = vectorized (A, K)
  tic;
  temp = ones (1, K);
  z = conv (A, temp);
  z = z > K-1;
  z = conv (z, temp);
  z = z(K:end-K+1);
  z = z >= 1;
  toc;
endfunction

Loopy

I didn't do anything fancy here.
function z = loopy (A, K)
  tic;
  z = A;
  n = numel (A);
  counter = 0;
  for ii=1:n
    if z(ii)
      counter = counter + 1;
    else
      if counter > 0 && counter < K
        z(ii-counter:ii-1) = 0;
      endif
      counter = 0;
    endif
  endfor

  if counter > 0 && counter < K
    z(end-counter+1:end) = 0;
  endif
  toc;
endfunction

Results

These numbers were taken on an AMD FX(tm)-4100 quad-core processor with 8 GB of RAM. I just ran each test once. For each test, the number of elements in A was 1,000,000.

        Vectorized   Loopy JIT   Loopy JIT (no overhead)   Loopy (No JIT)
K=3     0.078 s      0.059 s     0.028 s                   5.27 s
K=100   0.43 s       0.063 s     0.028 s                   5.66 s
K=500   1.58 s       0.082 s     0.033 s                   5.73 s

These results are expected: the efficiency of the vectorized approach depends on K, while the loopy version does not. While JIT support is not complete or stable yet [1], I think this shows that the current JIT implementation is able to handle a few practical examples, not just interpreter benchmarks.

hg id for regular octave branch: 52cb71787cd1
hg id for jit branch: f649b66ef1af

[1] Occasionally I break functionality like assignment, addition, and function calls.

Monday, June 18, 2012

Matrix Support

The Octave Language

An interesting feature of the Octave language is that matrices are treated as values, not references. For example, the following code
A = [1 2 3];
B = A;
A(1) = 100;
disp (B(1));
will display the value 1, not 100. Matrices are also passed by value in function calls.

Interpreter Implementation

Octave currently uses a really cool system to minimize the number of times matrices are copied: it combines copy-on-write (COW) with reference counting.

The idea is to delay copies until a matrix is mutated. Then the copy is only made if the reference count is greater than one. Take the previous example,
A = [1 2 3];
B = A; # No copy
A(1) = 100; # Copy, as both A and B refer to [1 2 3]
This algorithm has two nice properties. First, it is simple. Second, copies are only made when required.
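To make the scheme concrete, here is a minimal, self-contained sketch of copy-on-write with reference counting. It is not Octave's actual Matrix implementation; the names and layout are purely illustrative.
#include <algorithm>
#include <cstddef>

// Copies of the object share one representation and only duplicate the
// underlying data when a shared copy is mutated.
class cow_matrix
{
public:
  explicit cow_matrix (std::size_t n) : rep (new rep_t (n)) { }

  // Copying is cheap: share the representation and bump its refcount.
  cow_matrix (const cow_matrix& other) : rep (other.rep) { ++rep->count; }

  cow_matrix& operator= (const cow_matrix& other)
  {
    ++other.rep->count;
    release ();
    rep = other.rep;
    return *this;
  }

  ~cow_matrix () { release (); }

  double operator() (std::size_t i) const { return rep->data[i]; }

  // Mutation: duplicate the data first if anyone else still shares it.
  void set (std::size_t i, double value)
  {
    if (rep->count > 1)
      {
        rep_t *copy = new rep_t (rep->n);
        std::copy (rep->data, rep->data + rep->n, copy->data);
        --rep->count;
        rep = copy;
      }
    rep->data[i] = value;
  }

private:
  struct rep_t
  {
    explicit rep_t (std::size_t sz)
      : data (new double[sz] ()), n (sz), count (1) { }
    ~rep_t () { delete [] data; }
    double *data;
    std::size_t n;
    std::size_t count;
  };

  void release () { if (--rep->count == 0) delete rep; }

  rep_t *rep;
};
With this layout, B = A only bumps a counter, and the first mutation of A after that triggers the single copy, matching the Octave example above.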

JIT

I plan on using the same algorithm in JIT. This decision means that every assignment of the form
A(idx) = x; # Where idx and x are scalar
requires a check to see if a copy is required. This actually is not such a big issue, because we already require a check to ensure the index is legal. In order to speed up the normal case, we can directly inline the condition check.

This leads us to the pseudocode for the inlined check:
if (isint (idx) && idx >= 1 && idx <= nelem (A) && count (A) == 1)
  A(idx) = x;
else
  A = handle_odd_assign (A, idx, x);
Hopefully, this will allow the normal case to be quick, but still provide a correct implementation for more complicated cases.