Vmsa Documentation
John Strawn
5 August 1987
1. How the code was developed.
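Here is the basic algorithm for one output element: fetch one element
from each input vector, multiply them, add the scalar, round, and store
the result. We assume that the scalar is already in y1.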
move y:(r1),y0
move x:(r2),x0
mpy x0,y0,a
add y1,a
rnd a
move a,y:(r6)
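As a cross-check on the arithmetic, here is a minimal sketch of the same six steps in Python rather than in assembly, so that it can be run anywhere. It assumes 24-bit signed fractional operands (Q23) held as Python integers and assumes that rnd rounds ties to even; it ignores saturation when the accumulator is written out. The names q23, rnd, and vmsa_model are made up for this sketch.

```python
# Assumption: operands are 24-bit signed fractions (Q23) held as Python ints.
MASK24 = (1 << 24) - 1
HALF = 1 << 23


def q23(value):
    """Convert a float in [-1, 1) to a Q23 integer (helper for this sketch)."""
    return int(round(value * (1 << 23)))


def rnd(acc):
    """Round a Q47 accumulator value to Q23, ties to even (assumed behaviour)."""
    high, low = acc >> 24, acc & MASK24
    if low > HALF or (low == HALF and high & 1):
        high += 1
    return high


def vmsa_model(a_vec, b_vec, scalar):
    """Return rnd(a_vec[i] * b_vec[i] + scalar) for every element i."""
    out = []
    for y0, x0 in zip(a_vec, b_vec):  # move y:(r1),y0 / move x:(r2),x0
        acc = (x0 * y0) << 1          # mpy x0,y0,a (Q23 * Q23 -> Q47)
        acc += scalar << 24           # add y1,a (y1 aligned to the upper word)
        out.append(rnd(acc))          # rnd a / move a,y:(r6)
    return out
```

For example, vmsa_model([q23(0.5)], [q23(0.5)], q23(0.25)) returns [q23(0.5)], since 0.5 * 0.5 + 0.25 = 0.5.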
One solution is to scrunch that code together as follows:
mpy x0,y0,a x:(r2),x0 a,y:(r6)
add y1,a y:(r1),y0
rnd a
But it makes sense to examine the macr instruction too. Here is
the basic code:
tfr y1,a
move y:(r1),y0
move x:(r5),x0
macr x0,y0,a
move a,y:(r6)
This looks more promising, since it reduces to two operations
plus three bus moves. Here is a canonical scrunch:
tfr y1,a a,y:(r6)
macr x0,y0,a x:(r5),x0 y:(r1),y0
but the problem with that is that the tfr overwrites the product
in a before it can be written out. Here is what one solution
(for a certain combination of sinp_a, sinp_b, sout) might look
like:
tfr y1,a x:(r1),y0 a,y:(r6)
macr x0,y0,a x:(r5),x0
Before enumerating all possible combinations, I should point out
that trying to double up the macrs, using both a and b registers,
won't gain much in processing speed. Here's what the raw
material would be:
move y1,a
move y:(r1),y0
move x:(r2),x0
macr x0,y0,a
move a,y:(r6)
move y1,b
move y:(r1),y0
move x:(r2),x0
macr x0,y0,b
move b,y:(r6)
There are so many operations there that you will need at least four
instructions to execute them all. Therefore, doubling up won't save
execution time. Even if it did save you time, you can't take the code
segment just listed and scrunch it down to, say, two macr operations plus
two double moves, because you can't possibly fit the "y1,a" move into a
double move. You can pair that transfer with a double move if you replace
each of those moves with a tfr instruction:
tfr y1,a x:(r2),x0 y:(r1),y0
macr x0,y0,a b,y:(r6)
tfr y1,b x:(r2),x0 y:(r1),y0
macr x0,y0,b a,y:(r6)
But then you've still got four instructions to calculate two output
elements.
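The counting argument can be sketched as a small lower-bound calculation. The sketch is in Python and assumes that one instruction word carries at most one data ALU operation and at most two parallel moves. It also counts the "y1,a" register copy as an ordinary move, which is optimistic. The function name min_instructions is made up for this sketch.

```python
from math import ceil

MOVES_PER_INSTRUCTION = 2    # assumption: at most two parallel moves per word
ALU_OPS_PER_INSTRUCTION = 1  # assumption: one data ALU operation per word


def min_instructions(alu_ops, bus_moves):
    """Lower bound on the instruction words needed for the given work."""
    return max(ceil(alu_ops / ALU_OPS_PER_INSTRUCTION),
               ceil(bus_moves / MOVES_PER_INSTRUCTION))


# Doubled-up raw material: 2 macr, and 8 moves (2 "y1" copies, 4 loads, 2 stores).
raw_doubled = min_instructions(alu_ops=2, bus_moves=8)

# With tfr replacing the "y1" copies: 4 ALU operations (2 tfr, 2 macr), 6 moves.
tfr_doubled = min_instructions(alu_ops=4, bus_moves=6)

# Single-accumulator loop: 2 ALU operations (tfr, macr) and 3 moves per element.
single = min_instructions(alu_ops=2, bus_moves=3)
```

Under these assumptions both doubled-up forms need at least four instructions for two elements, and the single-accumulator loop needs at least two instructions for one element, so the cost per output element is the same.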
Here, then, we enumerate the core loop for all possible combinations of
sinp_a, sinp_b, and sout (the memory spaces, x or y, of the two input
vectors and of the output vector):
; sinp_a = x (r1), sinp_b = x (r5), sout = x (r6)
tfr y1,a a,x:(r6)
move x:(r5),x0
macr x0,y0,a x:(r1),y0
; sinp_a = x (r1), sinp_b = x (r5), sout = y (r6)
tfr y1,a x:(r1),y0 a,y:(r6)
macr x0,y0,a x:(r5),x0
; sinp_a = x (r1), sinp_b = y (r5), sout = x (r6)
tfr y1,a a,x:(r6)
macr x0,y0,a x:(r1),y0 y:(r5),x0
; sinp_a = x (r1), sinp_b = y (r5), sout = y (r6)
tfr y1,a a,y:(r6)
macr x0,y0,a x:(r1),y0 y:(r5),x0
; sinp_a = y (r1), sinp_b = x (r5), sout = x (r6)
tfr y1,a a,x:(r6)
macr x0,y0,a x:(r5),x0 y:(r1),y0
; sinp_a = y (r1), sinp_b = x (r5), sout = y (r6)
tfr y1,a a,y:(r6)
macr x0,y0,a x:(r5),x0 y:(r1),y0
; sinp_a = y (r1), sinp_b = y (r5), sout = x (r6)
tfr y1,a a,x:(r6) y:(r1),y0
macr x0,y0,a y:(r5),x0
; sinp_a = y (r1), sinp_b = y (r5), sout = y (r6)
tfr y1,a a,y:(r6)
move y:(r5),x0
macr x0,y0,a y:(r1),y0
And here those enumerated cases are collapsed where possible:
; a)
; sinp_a = x (r1), sinp_b = x (r5), sout = x (r6)
; sinp_a = y (r1), sinp_b = y (r5), sout = y (r6)
tfr y1,a a,x:(r6) ; change x: for all 3
move x:(r5),x0
macr x0,y0,a x:(r1),y0
; b)
; sinp_a = x (r1), sinp_b = x (r5), sout = y (r6)
; sinp_a = y (r1), sinp_b = y (r5), sout = x (r6)
tfr y1,a x:(r1),y0 a,y:(r6) ; exchange x: and y:
macr x0,y0,a x:(r5),x0
; c)
; sinp_a = x (r1), sinp_b = y (r5), sout = x (r6)
; sinp_a = x (r1), sinp_b = y (r5), sout = y (r6)
tfr y1,a a,x:(r6) ; change x:
macr x0,y0,a x:(r1),y0 y:(r5),x0
; d)
; sinp_a = y (r1), sinp_b = x (r5), sout = x (r6)
; sinp_a = y (r1), sinp_b = x (r5), sout = y (r6)
tfr y1,a a,x:(r6) ; change x:
macr x0,y0,a x:(r5),x0 y:(r1),y0
; for c) to d), exchange x: with y: in the second instruction
For a change of pace in this macro, I have switched the roles of r1 and r5
between c) and d). The test that selects among the cases can occur outside
of the main loop, which simplifies the appearance of the main loop
significantly.
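The collapse from eight combinations to four cases can be checked mechanically. The sketch below, in Python, enumerates every combination of memory spaces and assigns each to one of the case letters a) through d) used above. The function name loop_variant and the three-letter strings (sinp_a, sinp_b, sout in that order) are conventions of the sketch only.

```python
from itertools import product


def loop_variant(sinp_a, sinp_b, sout):
    """Return the collapsed case letter for one memory-space combination."""
    if sinp_a == sinp_b:
        # Both inputs share a space: a) if the output does too, else b).
        return "a" if sout == sinp_a else "b"
    # Inputs in different spaces: the output space does not change the case.
    return "c" if sinp_a == "x" else "d"


variants = {}
for spaces in product("xy", repeat=3):
    variants.setdefault(loop_variant(*spaces), []).append("".join(spaces))
```

After the loop, variants maps "a" to ["xxx", "yyy"], "b" to ["xxy", "yyx"], "c" to ["xyx", "xyy"], and "d" to ["yxx", "yxy"], which matches the grouping listed above. A test like this runs once per macro invocation, not once per element.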
Here is loop initialization for the case sinp_a = sinp_b = sout = x:
move x:(r0),y1 ; initialize scalar
move x:(r5),x0
tfr y1,a x:(r1),y0
macr x0,y0,a
; loop start
tfr y1,a a,x:(r6) ; change x: for all 3
move x:(r5),x0
macr x0,y0,a x:(r1),y0
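To see how the prologue and the loop body fit together, here is a rough Python model of the data flow for this case. It rests on several assumptions that the listing does not show: every memory access post-increments its pointer, a parallel store of a writes out the value a held before the tfr in the same instruction, and the prologue's macr also fetches the next sinp_a element so that y0 is fresh on loop entry. Rounding is left out and plain integers are used. The name vmsa_pipelined is made up for this sketch.

```python
def vmsa_pipelined(a_vec, b_vec, scalar):
    """Model of the all-x loop: store result i while starting result i+1."""
    n = len(a_vec)                   # the sketch requires n >= 1
    out = []
    r1 = r5 = 0                      # read indices standing in for r1 and r5

    def next_a():
        # The DSP would read one word past the end; the model substitutes 0.
        nonlocal r1
        value = a_vec[r1] if r1 < n else 0
        r1 += 1
        return value

    y1 = scalar                      # move x:(r0),y1
    x0 = b_vec[r5]                   # move x:(r5),x0
    r5 += 1
    a = y1                           # tfr y1,a ...
    y0 = next_a()                    # ... x:(r1),y0
    a += x0 * y0                     # macr x0,y0,a
    y0 = next_a()                    # assumed fetch of the next sinp_a element

    for _ in range(n - 1):           # loop start
        out.append(a)                # a,x:(r6): the old a is stored ...
        a = y1                       # ... while tfr y1,a reloads the scalar
        x0 = b_vec[r5]               # move x:(r5),x0
        r5 += 1
        a += x0 * y0                 # macr x0,y0,a uses the y0 fetched earlier
        y0 = next_a()                # x:(r1),y0 for the next pass

    out.append(a)                    # epilogue: flush the last result
    return out
```

With a_vec = [1, 2, 3], b_vec = [4, 5, 6], and scalar = 10, the model returns [14, 20, 28], the same as computing a_vec[i] * b_vec[i] + scalar directly. Note that each result is stored one pass after it is computed, so the last one has to be written out after the loop ends.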
Further analysis shows that there are no great gains to be had by optimizing
loop initialization for the various cases. At best you could do something
like
move x:(r0),y1
tfr y1,a x:(r1),y0 y:(r5),x0
macr x0,y0,a
but testing for that would needlessly complicate the code.
2. Locations of symbols.
Name Type Value Section Attributes
ans1.............int X:000020B0 GLOBAL
ans999...........int X:000020BE GLOBAL
ax_vec...........int X:00002000 GLOBAL
ay_vec...........int Y:00002018 GLOBAL
bx_vec...........int X:0000200C GLOBAL
by_vec...........int Y:00002024 GLOBAL
ixself1_vec......int X:00002030 GLOBAL
ixself2_vec......int X:00002036 GLOBAL
ixself3_vec......int X:0000203C GLOBAL
iyself1_vec......int Y:00002042 GLOBAL
iyself2_vec......int Y:00002048 GLOBAL
out1_vec.........int X:00002050 GLOBAL
out2_vec.........int Y:00002057 GLOBAL
out3_vec.........int X:0000205E GLOBAL
out4_vec.........int Y:00002065 GLOBAL
out5_vec.........int X:0000206C GLOBAL
out6_vec.........int Y:00002073 GLOBAL
out7_vec.........int X:0000207A GLOBAL
out8_vec.........int Y:00002081 GLOBAL
out9_vec.........int Y:00002088 GLOBAL
out11_vec........int Y:00002096 GLOBAL
sum1.............int X:000020A2 GLOBAL
sum14............int X:000020AF GLOBAL
t1...............int P:0000004C GLOBAL
t2...............int P:00000062 GLOBAL
t3...............int P:00000077 GLOBAL
t4...............int P:0000008C GLOBAL
t5...............int P:000000A1 GLOBAL
t6...............int P:000000B6 GLOBAL
t7...............int P:000000CB GLOBAL
t8...............int P:000000E0 GLOBAL
t9...............int P:000000F6 GLOBAL
t10..............int P:0000010B GLOBAL
t11..............int P:00000120 GLOBAL
t12..............int P:00000135 GLOBAL
t13..............int P:0000014B GLOBAL
t14..............int P:00000160 GLOBAL
xscal............int X:0000204E GLOBAL
yscal............int Y:0000204F GLOBAL