VMMA Documentation
John Strawn
18 September 1987

Here are the basic steps to be performed:

        move    x:(r1),x0
        move    y:(r5),y0
        move    x:(r0),x1
        move    y:(r4),y1
        mpy     x0,y0,a
        macr    x1,y1,a
        move    a,x:(r6)

That produces two arithmetic instructions plus five moves (three on the X side and two on the Y side), which means that at least three instructions will be required for one output element (and this is the solution used in the code):

        ; loop setup
        move                    x:(r1),x0       y:(r5),y0
        mpy     x0,y0,a         x:(r0),x1       y:(r4),y1
        macr    x1,y1,a         x:(r1),x0       y:(r5),y0
        move                    x:(r0),x1       y:(r4),y1
        ; inner loop
        mpy     x0,y0,a         a,x:(r6)
        macr    x1,y1,a         x:(r0),x1       y:(r4),y1
        move                    x:(r1),x0       y:(r5),y0

For the sake of completeness, here are some alternatives.

Doubling up to use two accumulators will lose, because writing out the results with (R_O) will always write to the same side of memory. Here is a best-case example. By listing the operations to be done, it becomes obvious that there are 6 x moves and 4 y moves, which will require 6 instructions minimum for two output elements. That is the same three instructions per element as before, so no saving is possible by doubling up the accumulators.

        move    x:(r1),x0
        move    y:(r5),y0
        move    x:(r0),x1
        move    y:(r4),y1
        mpy     x0,y0,a
        macr    x1,y1,a
        move    a,x:(r6)
        move    x:(r1),x0
        move    y:(r5),y0
        move    x:(r0),x1
        move    y:(r4),y1
        mpy     x0,y0,b
        macr    x1,y1,b
        move    b,x:(r6)

An alternative might be to use two accumulators with an explicit round. The hope would be that the accumulators could be doubled up to save execution time:

        move    x:(r1),x0
        move    y:(r5),y0
        move    x:(r0),x1
        move    y:(r4),y1
        mpy     x0,y0,a
        mpy     x1,y1,b
        add     a,b
        rnd     b
        move    b,x:(r6)

But since this results in four arithmetic instructions per element (mpy, mpy, add, rnd), one more than the three instructions needed above, no saving is possible.

Yet another possibility might be to forgo Motorola's nifty rounding algorithm and add in a rounding constant; but then *that* constant would have to come from somewhere, and no registers are left for storing constants, and moving a constant from memory would in most cases add yet another instruction. So this is a bad idea.
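The instruction counts quoted above all follow from one counting rule. Assuming, as the listings suggest, that a single instruction can carry at most one arithmetic operation, one X-side move, and one Y-side move, the minimum number of instructions is the largest of those three counts. The Python sketch below only models that argument; the function name min_instructions is hypothetical and is not part of the macro, and the result is a lower bound that ignores data dependencies.

```python
def min_instructions(alu_ops, x_moves, y_moves):
    """Lower bound on instruction count, assuming each instruction holds
    at most one arithmetic op, one X-memory move, and one Y-memory move.
    (Modeling assumption for illustration, not taken from the macro source.)"""
    return max(alu_ops, x_moves, y_moves)


# Basic scheme, one output element: mpy + macr, 3 X moves, 2 Y moves.
single = min_instructions(alu_ops=2, x_moves=3, y_moves=2)

# Two accumulators, two output elements: both results go out on the X side.
doubled = min_instructions(alu_ops=4, x_moves=6, y_moves=4)

# Explicit round, one output element: mpy, mpy, add, rnd.
explicit_round = min_instructions(alu_ops=4, x_moves=3, y_moves=2)

print("basic, per element:         ", single)
print("doubled, per two elements:  ", doubled)
print("explicit round, per element:", explicit_round)
```

Under this model the doubled version needs as many instructions per element as the basic scheme, and the explicit-round version needs one more, which matches the conclusions in the text.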
WARNING: This macro ends with

        move    M_X,R_L

Therefore, the next instruction after the end of this macro should not use the R_L register.