VMMA Documentation
John Strawn
18 September 1987
Here are the basic steps to be performed:
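move x:(r1),x0 ; x0 = first x-memory operand
move y:(r5),y0 ; y0 = first y-memory operand
move x:(r0),x1 ; x1 = second x-memory operand
move y:(r4),y1 ; y1 = second y-memory operand
mpy x0,y0,a ; a = x0*y0
macr x1,y1,a ; a = a + x1*y1, then round
move a,x:(r6) ; write the result to x memory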
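As a cross-check on what these seven steps compute, here is a minimal reference sketch in Python (not DSP56000 code). It assumes floating-point values and does not model the fixed-point rounding or saturation performed by macr. The function and argument names are hypothetical and simply mirror the address registers used above.

```python
def vmma_element(x_r1, y_r5, x_r0, y_r4):
    """Model of one output element: (x_r1 * y_r5) + (x_r0 * y_r4)."""
    a = x_r1 * y_r5          # mpy  x0,y0,a
    a = a + x_r0 * y_r4      # macr x1,y1,a (rounding not modeled)
    return a                 # move a,x:(r6)


result = vmma_element(0.5, 0.5, 0.25, 0.5)
print(result)
```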
That makes two arithmetic operations plus five moves. Each
instruction can carry at most one x move and one y move, and
three of the five moves are x moves, so at least three
instructions will be required for one output element (and this
is the solution used in the code):
; loop setup
move x:(r1),x0 y:(r5),y0
mpy x0,y0,a x:(r0),x1 y:(r4),y1
macr x1,y1,a x:(r1),x0 y:(r5),y0
move x:(r0),x1 y:(r4),y1
; inner loop
mpy x0,y0,a a,x:(r6)
macr x1,y1,a x:(r0),x1 y:(r4),y1
move x:(r1),x0 y:(r5),y0
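The bound of three per output element can be checked with a small tally. The sketch below is Python, not DSP56000 code, and rests on one assumption taken from the reasoning in this document: each instruction carries at most one arithmetic operation, one x-memory move, and one y-memory move. The function name is hypothetical.

```python
def min_instructions(x_moves, y_moves, alu_ops):
    """Lower bound on instruction count; the busiest resource sets the limit."""
    return max(x_moves, y_moves, alu_ops)


# One output element: the reads through r1 and r0 and the write through r6
# are x moves, the reads through r5 and r4 are y moves, and mpy and macr
# are the two arithmetic operations.
per_element = min_instructions(x_moves=3, y_moves=2, alu_ops=2)
print(per_element)
```

This is only a lower bound. The three-instruction inner loop above shows that the bound is actually reached.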
For the sake of completeness, here are some alternatives.
Doubling up to use two accumulators does not help, because
writing out the results with (R_O) will always write to the same
side of memory. Here is a best-case example, which computes two
output elements. Listing the operations to be done shows that
there are 6 x moves and 4 y moves, which will require 6
instructions minimum, or three per element as before. So no
saving is possible by doubling up the accumulators.
move x:(r1),x0
move y:(r5),y0
move x:(r0),x1
move y:(r4),y1
mpy x0,y0,a
macr x1,y1,a
move a,x:(r6)
move x:(r1),x0
move y:(r5),y0
move x:(r0),x1
move y:(r4),y1
mpy x0,y0,b
macr x1,y1,b
move b,x:(r6)
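The move counts for this listing can be tallied mechanically. The following Python sketch (not DSP56000 code) classifies each line of the two-accumulator listing by its text. The helper name and the simple string matching are assumptions made for illustration and would not cope with general assembler syntax.

```python
LISTING = """\
move x:(r1),x0
move y:(r5),y0
move x:(r0),x1
move y:(r4),y1
mpy x0,y0,a
macr x1,y1,a
move a,x:(r6)
move x:(r1),x0
move y:(r5),y0
move x:(r0),x1
move y:(r4),y1
mpy x0,y0,b
macr x1,y1,b
move b,x:(r6)
"""


def tally(listing):
    """Count x-memory moves, y-memory moves, and arithmetic operations."""
    counts = {"x": 0, "y": 0, "alu": 0}
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        opcode, operands = fields[0], fields[1:]
        if opcode != "move":
            counts["alu"] += 1       # mpy, macr
        for field in operands:
            if "x:(" in field:
                counts["x"] += 1     # x-memory read or write
            elif "y:(" in field:
                counts["y"] += 1     # y-memory read
    return counts


counts = tally(LISTING)
print(counts)                        # {'x': 6, 'y': 4, 'alu': 4}
print(max(counts.values()) / 2)      # 3.0 instructions per output element
```

The x side is the bottleneck: six x moves for two output elements is no better than three instructions per element.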
An alternative might be to use two accumulators with an explicit
round. The hope would be that the accumulators could be doubled
up to save execution time:
move x:(r1),x0
move y:(r5),y0
move x:(r0),x1
move y:(r4),y1
mpy x0,y0,a
mpy x1,y1,b
add a,b
rnd b
move b,x:(r6)
But since this results in four explicit operations per element
(two mpy, one add, and one rnd), no saving is possible. Yet
another possibility might be to forgo Motorola's nifty rounding
algorithm and add in a rounding constant; but then *that*
constant would have to come from somewhere, and no registers are
left for storing constants, and moving a constant from memory
would in most cases add yet another instruction. So this is a bad
idea.
WARNING: This macro ends with
move M_X,R_L
Therefore, the next instruction after the end of this macro
should not use the R_L register.