Notes on the implementation of VSADD

John Strawn
20 July 1987

1.  How the inner loop came to be developed

The single scalar input is pointed to by R_I2, which is never 
incremented. Assuming sinp != sins ("input scalar side"), the 
inner loop might look like:

          do   x1,pf\vsadd_\ic\loop1
          move sinp:(R_I1)+N_I1,x0 sins:(R_I2),a
          add  x0,a      a,sout:(R_O)+N_O
pf\vsadd_\ic\loop1 

An alternate would be something like:

          move sins:(R_I2),a
          move sinp:(R_I1)+N_I1,b
          add  b,a 
          do   x1,pf\vsadd_\ic\loop2
          add  b,a a,sout:(R_O)+N_O
          move sinp:(R_I1)+N_I1,a
pf\vsadd_\ic\loop2
          
which is even better because there are no sins vs sinp vs sout 
conflicts.   Note that you can't fold that last move into the 
second move slot in the add instruction because you'd be moving 
into register a, which is what the add is writing into.

In that last example, the "add" doesn't use one of the two move 
fields.  Assuming sinp != sout, you can effectively "double up" 
the instructions.  Pipelining must be carefully handled to avoid 
the screw case cnt=1.  Of course, you've already tested against 
cnt=0.  Assume x1 contains cnt/2. You'd have something like

          move sins:(R_I2),y0
          move sinp:(R_I1)+N_I1,a
          move sinp:(R_I1)+N_I1,b
          add  y0,a
          do   x1,pf\vsadd_\ic\l3
          add  y0,b  move sinp:(R_I1)+N_I1,a a,sout:(R_O)+N_O
          add  y0,a  move sinp:(R_I1)+N_I1,b b,sout:(R_O)+N_O   
pf\vsadd_\ic\l3 
          (if odd then:)
               move a,sout:(R_O)+N_O

As Julius says, this is asymptotically twice as fast as the 
loop1 and loop2 solutions given earlier. Of course, if sinp==sout, 
then the solutions are identical in execution time. 
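
For checking results on a host machine, the one-per-iteration and 
two-per-iteration loop shapes can be modelled roughly in C. This is 
only a sketch under assumptions: the names vsadd_simple and 
vsadd_paired are made up here, int stands in for the 24-bit data 
word, and neither the pipelining nor the sinp/sins/sout memory-space 
constraints are modelled. It does show the cnt/2 loop count and the 
odd leftover, including the cnt=1 case.

```c
#include <assert.h>
#include <stddef.h>

/* One element per iteration: the shape of loop1 and loop2.
   si and so are the input and output strides (N_I1 and N_O). */
static void vsadd_simple(const int *in, size_t si, int scalar,
                         int *out, size_t so, size_t cnt)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < cnt; i++)
        out[i * so] = in[i * si] + scalar;
}

/* Two elements per iteration: the loop runs cnt/2 times and a
   single leftover element is stored afterwards when cnt is odd. */
static void vsadd_paired(const int *in, size_t si, int scalar,
                         int *out, size_t so, size_t cnt)
{
    assert(in != NULL && out != NULL);
    size_t half = cnt / 2;          /* plays the role of x1 */
    size_t i = 0;                   /* element index */
    for (size_t k = 0; k < half; k++) {
        int a = in[i * si] + scalar;        /* first of the pair  */
        int b = in[(i + 1) * si] + scalar;  /* second of the pair */
        out[i * so] = a;
        out[(i + 1) * so] = b;
        i += 2;
    }
    if (cnt & 1)                    /* (if odd then:) */
        out[i * so] = in[i * si] + scalar;
}
```

With cnt=1 the paired loop body never runs and only the final store 
happens; with cnt=0 nothing is written.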

2.  Debugging.  

Of the possible combinations of sins, sinp, and sout:

          sinp sins sout
     1    x    x    x
     2    x    x    y
     3    x    y    x
     4    x    y    y
     5    y    x    x
     6    y    x    y
     7    y    y    x
     8    y    y    y

Actually, sins is outside the main loop, so it should be adequate 
to test one x and one y of sins.  We must test sinp==sout and 
sinp!=sout for sinp==x and sinp==y.  So the four tests chosen are 
1, 2, 5, and 8, numbered t1 through t4 in tvsadd.asm. 
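
The coverage argument can be checked mechanically. The following C 
sketch is illustrative only (the type and function names are 
invented here, not taken from tvsadd.asm); it encodes rows 1, 2, 5, 
and 8 of the table and confirms that they hit both values of sins 
and all four combinations of sinp side with sinp==sout or 
sinp!=sout.

```c
#include <assert.h>
#include <stdbool.h>

enum space { X = 0, Y = 1 };
struct combo { enum space sinp, sins, sout; };

/* Rows 1, 2, 5 and 8 of the table, i.e. tests t1 through t4. */
static const struct combo chosen[4] = {
    { X, X, X },    /* row 1 */
    { X, X, Y },    /* row 2 */
    { Y, X, X },    /* row 5 */
    { Y, Y, Y }     /* row 8 */
};

/* True if the chosen rows cover sins in x and in y, and cover
   sinp==sout and sinp!=sout for sinp in x and for sinp in y. */
static bool chosen_tests_cover(void)
{
    bool sins_seen[2] = { false, false };
    bool io_seen[2][2] = { { false, false }, { false, false } };

    for (int i = 0; i < 4; i++) {
        int same = (chosen[i].sinp == chosen[i].sout) ? 1 : 0;
        sins_seen[chosen[i].sins] = true;
        io_seen[chosen[i].sinp][same] = true;
    }
    return sins_seen[X] && sins_seen[Y] &&
           io_seen[X][0] && io_seen[X][1] &&
           io_seen[Y][0] && io_seen[Y][1];
}
```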

3. For testing for even and odd with:

          ror  b                   ; b gets cnt/2 
          jcc  pf\_vsadd_\ic\_l1
          move      #1,a1
pf\_vsadd_\ic\_l1     

you really want any arbitrary non-zero value in a1, because later 
you will do this: 

          move y1,b
          neg  b    
          jeq  pf\_vsadd_\ic\_l2   ; if cnt odd,
               move a,sout:(R_O)+N_O    ; store final element
pf\_vsadd_\ic\_l2

Nominally you could collapse the jcc followed by move into a 
tcc, but I can't find a *foolproof* source of non-zero anywhere 
in the sources listed for the tcc instruction.
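
As a rough host-side picture of the even/odd test, here is a C 
sketch. It assumes the carry is clear going into the shift (so 
nothing is rotated into the top bit) and that the flag starts out 
as zero; both are assumptions, and the names split_count and 
odd_flag are hypothetical. The dropped low bit plays the role of 
the carry that jcc tests.

```c
#include <assert.h>
#include <stdint.h>

struct split {
    uint32_t half;   /* cnt/2, the loop count */
    int carry;       /* the bit shifted out: 1 if cnt was odd */
};

/* One-bit right shift with the dropped bit captured as carry. */
static struct split split_count(uint32_t cnt)
{
    struct split s;
    s.carry = (int)(cnt & 1u);
    s.half = cnt >> 1;
    assert(2u * s.half + (uint32_t)s.carry == cnt);
    return s;
}

/* The flag stays zero when the carry is clear (the jcc path);
   otherwise it is set to some non-zero value, here 1. */
static int odd_flag(uint32_t cnt)
{
    int flag = 0;
    if (split_count(cnt).carry)
        flag = 1;
    return flag;
}
```

Only zero versus non-zero matters when the flag is tested later, 
which is why any non-zero value would do in place of 1.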

4.  Here are the addresses of the input and output vectors:

_SYMBOL X
xscal            I 0000201E
ax_vec           I 00002000
ixself_vec       I 00002018
out1_vec         I 00002020
out3_vec         I 0000202E
sum1             I 0000205D
sum2             I 0000205E
sum3             I 0000205F
sum4             I 00002060
sum5             I 00002061
sum6             I 00002062
sum7             I 00002063
sum8             I 00002064
sum9             I 00002065
ans1             I 00002066
ans2             I 00002067
ans3             I 00002068
ans4             I 00002069
ans5             I 0000206A
ans6             I 0000206B
ans7             I 0000206C
ans8             I 0000206D
ans9             I 0000206E
ans999           I 0000206F

_SYMBOL Y
yscal            I 0000201F
ay_vec           I 0000200C
out2_vec         I 00002027
out4_vec         I 00002035
out5_vec         I 0000203C
out6_vec         I 00002043
out7_vec         I 0000204A
out8_vec         I 00002051