Vmsa Documentation
John Strawn
5 August 1987

1. How the code was developed.

Here is the basic algorithm. We assume that the scalar is already in y1.

        move    y:(r1),y0
        move    x:(r2),x0
        mpy     x0,y0,a
        add     y1,a
        rnd     a
        move    a,y:(r6)

One solution is to scrunch that code together as follows:

        mpy     x0,y0,a     x:(r2),x0   a,y:(r6)
        add     y1,a        y:(r1),y0
        rnd     a

But it makes sense to examine the macr instruction too. Here is the basic code:

        tfr     y1,a
        move    y:(r1),y0
        move    x:(r5),x0
        macr    x0,y0,a
        move    a,y:(r6)

This looks more promising, since it reduces to two operations plus three bus moves. Here is a canonical scrunch:

        tfr     y1,a        a,y:(r6)
        macr    x0,y0,a     x:(r5),x0   y:(r1),y0

but the problem with that is that the tfr overwrites the product in a before it can be written out. Here is what one solution (for a certain combination of sinp_a, sinp_b, sout) might look like:

        tfr     y1,a        x:(r1),y0   a,y:(r6)
        macr    x0,y0,a     x:(r5),x0

Before enumerating all possible combinations, I should point out that trying to double up the macrs, using both the a and b registers, won't gain much in processing speed. Here's what the raw material would be:

        move    y1,a
        move    y:(r1),y0
        move    x:(r2),x0
        macr    x0,y0,a
        move    a,y:(r6)
        move    y1,b
        move    y:(r1),y0
        move    x:(r2),x0
        macr    x0,y0,b
        move    b,y:(r6)

There are so many operations there that you will need at least four instructions to execute them all. Therefore, doubling up won't save execution time.

Even if it did save you time, you can't take the code segment just listed and scrunch it down to, say, two macr operations plus two double moves. The reason is that you can't possibly fit the "y1,a" move into a double move. You can if you replace those moves with a tfr instruction:

        tfr     y1,a        x:(r2),x0   y:(r1),y0
        macr    x0,y0,a     b,y:(r6)
        tfr     y1,b        x:(r2),x0   y:(r1),y0
        macr    x0,y0,b     a,y:(r6)

But then you've still got four instructions to calculate two output elements.
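As a cross-check on the instruction sequences above, the per-element arithmetic (scalar plus product, rounded, then stored) can be modelled on a host machine. What follows is a minimal sketch in Python rather than DSP56000 assembly. It assumes 24-bit two's-complement fractional operands held as plain integers, round-half-to-even rounding for rnd and macr, and saturation when the accumulator is stored. The name vmsa_model and the integer representation are illustrative and do not come from the original code.

```python
def vmsa_model(sinp_a, sinp_b, scalar):
    """Host-side model of out[i] = round(scalar + sinp_a[i] * sinp_b[i]).

    All values are 24-bit signed fractions stored as Python ints
    (0x400000 stands for 0.5). This is an assumed, simplified model,
    not a cycle-accurate emulation.
    """
    out = []
    for x, y in zip(sinp_a, sinp_b):
        # Fractional multiply: the integer product is shifted left by one
        # so that it lines up with the scalar placed in the upper word.
        acc = (scalar << 24) + ((x * y) << 1)
        low = acc & 0xFFFFFF      # bits that rounding discards
        high = acc >> 24          # upper word (arithmetic shift)
        # Round half to even (assumed behaviour of rnd / macr).
        if low > 0x800000 or (low == 0x800000 and (high & 1)):
            high += 1
        # Saturate to the 24-bit range, as assumed for the final store.
        high = max(-0x800000, min(0x7FFFFF, high))
        out.append(high)
    return out
```

For example, vmsa_model([0x400000], [0x400000], 0x100000) models 0.5 * 0.5 + 0.125 and returns [0x300000], that is, 0.375.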
Here, then, we enumerate the core loop for all possible combinations of sinp_a, sinp_b, and sout:

; sinp_a = x (r1), sinp_b = x (r5), sout = x (r6)
        tfr     y1,a        a,x:(r6)
        move    x:(r5),x0
        macr    x0,y0,a     x:(r1),y0

; sinp_a = x (r1), sinp_b = x (r5), sout = y (r6)
        tfr     y1,a        x:(r1),y0   a,y:(r6)
        macr    x0,y0,a     x:(r5),x0

; sinp_a = x (r1), sinp_b = y (r5), sout = x (r6)
        tfr     y1,a        a,x:(r6)
        macr    x0,y0,a     x:(r1),y0   y:(r5),x0

; sinp_a = x (r1), sinp_b = y (r5), sout = y (r6)
        tfr     y1,a        a,y:(r6)
        macr    x0,y0,a     x:(r1),y0   y:(r5),x0

; sinp_a = y (r1), sinp_b = x (r5), sout = x (r6)
        tfr     y1,a        a,x:(r6)
        macr    x0,y0,a     x:(r5),x0   y:(r1),y0

; sinp_a = y (r1), sinp_b = x (r5), sout = y (r6)
        tfr     y1,a        a,y:(r6)
        macr    x0,y0,a     x:(r5),x0   y:(r1),y0

; sinp_a = y (r1), sinp_b = y (r5), sout = x (r6)
        tfr     y1,a        a,x:(r6)    y:(r1),y0
        macr    x0,y0,a     y:(r5),x0

; sinp_a = y (r1), sinp_b = y (r5), sout = y (r6)
        tfr     y1,a        a,y:(r6)
        move    y:(r5),x0
        macr    x0,y0,a     y:(r1),y0

And here those enumerated cases are collapsed where possible:

; a)
; sinp_a = x (r1), sinp_b = x (r5), sout = x (r6)
; sinp_a = y (r1), sinp_b = y (r5), sout = y (r6)
        tfr     y1,a        a,x:(r6)    ; change x: for all 3
        move    x:(r5),x0
        macr    x0,y0,a     x:(r1),y0

; b)
; sinp_a = x (r1), sinp_b = x (r5), sout = y (r6)
; sinp_a = y (r1), sinp_b = y (r5), sout = x (r6)
        tfr     y1,a        x:(r1),y0   a,y:(r6)    ; exchange x: and y:
        macr    x0,y0,a     x:(r5),x0

; c)
; sinp_a = x (r1), sinp_b = y (r5), sout = x (r6)
; sinp_a = x (r1), sinp_b = y (r5), sout = y (r6)
        tfr     y1,a        a,x:(r6)    ; change x:
        macr    x0,y0,a     x:(r1),y0   y:(r5),x0

; d)
; sinp_a = y (r1), sinp_b = x (r5), sout = x (r6)
; sinp_a = y (r1), sinp_b = x (r5), sout = y (r6)
        tfr     y1,a        a,x:(r6)    ; change x:
        macr    x0,y0,a     x:(r5),x0   y:(r1),y0
; for c) to d), exchange x: with y: in second

For a change of pace in this macro, I have switched the roles of r1 and r5 between c) and d). The test for which case applies can occur outside of the main loop, which simplifies the appearance of the main loop significantly.
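The collapse of eight combinations into the four cases a) through d) follows a simple rule: when both inputs share a memory space, only the output space decides between a) and b); when they differ, the space of sinp_a alone decides between c) and d). A minimal sketch of that selection, written in Python purely for illustration (the function name loop_variant and the use of the strings "x" and "y" are assumptions, and the real macro would make this choice at assembly time):

```python
def loop_variant(sinp_a, sinp_b, sout):
    """Return the collapsed case letter for one memory-space combination.

    Each argument is "x" or "y". The mapping mirrors the a) to d)
    grouping listed above.
    """
    for space in (sinp_a, sinp_b, sout):
        if space not in ("x", "y"):
            raise ValueError("memory space must be 'x' or 'y'")
    if sinp_a == sinp_b:
        # Both inputs in one space: a) if the output is there too, else b).
        return "a" if sout == sinp_a else "b"
    # Inputs in different spaces: the output space does not matter.
    return "c" if sinp_a == "x" else "d"
```

Because the result depends only on the three space arguments, the choice can be made once before the loop starts, which is the point made in the preceding paragraph.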
Here is loop initialization for the case sinp_a = sinp_b = sout = x:

        move    x:(r0),y1               ; initialize scalar
        move    x:(r5),x0
        tfr     y1,a        x:(r1),y0
        macr    x0,y0,a
; loop start
        tfr     y1,a        a,x:(r6)    ; change x: for all 3
        move    x:(r5),x0
        macr    x0,y0,a     x:(r1),y0

Further analysis shows that there are no great gains to be had by optimizing loop initialization for the various cases. At best you could do something like

        move    x:(r0),y1
        tfr     y1,a        x:(r1),y0   y:(r5),x0
        macr    x0,y0,a

but testing for that would needlessly complicate the code.

2. locations of symbols:

Name             Type  Value       Section  Attributes
ans1.............int   X:000020B0           GLOBAL
ans999...........int   X:000020BE           GLOBAL
ax_vec...........int   X:00002000           GLOBAL
ay_vec...........int   Y:00002018           GLOBAL
bx_vec...........int   X:0000200C           GLOBAL
by_vec...........int   Y:00002024           GLOBAL
ixself1_vec......int   X:00002030           GLOBAL
ixself2_vec......int   X:00002036           GLOBAL
ixself3_vec......int   X:0000203C           GLOBAL
iyself1_vec......int   Y:00002042           GLOBAL
iyself2_vec......int   Y:00002048           GLOBAL
out1_vec.........int   X:00002050           GLOBAL
out2_vec.........int   Y:00002057           GLOBAL
out3_vec.........int   X:0000205E           GLOBAL
out4_vec.........int   Y:00002065           GLOBAL
out5_vec.........int   X:0000206C           GLOBAL
out6_vec.........int   Y:00002073           GLOBAL
out7_vec.........int   X:0000207A           GLOBAL
out8_vec.........int   Y:00002081           GLOBAL
out9_vec.........int   Y:00002088           GLOBAL
out11_vec........int   Y:00002096           GLOBAL
sum1.............int   X:000020A2           GLOBAL
sum14............int   X:000020AF           GLOBAL
t1...............int   P:0000004C           GLOBAL
t2...............int   P:00000062           GLOBAL
t3...............int   P:00000077           GLOBAL
t4...............int   P:0000008C           GLOBAL
t5...............int   P:000000A1           GLOBAL
t6...............int   P:000000B6           GLOBAL
t7...............int   P:000000CB           GLOBAL
t8...............int   P:000000E0           GLOBAL
t9...............int   P:000000F6           GLOBAL
t10..............int   P:0000010B           GLOBAL
t11..............int   P:00000120           GLOBAL
t12..............int   P:00000135           GLOBAL
t13..............int   P:0000014B           GLOBAL
t14..............int   P:00000160           GLOBAL
xscal............int   X:0000204E           GLOBAL
yscal............int   Y:0000204F           GLOBAL