Notes on the implementation of VSADD
John Strawn
20 July 1987
1. How the inner loop came to be developed
The single scalar input is pointed to by R_I2, which is never
incremented. Assuming sinp != sins ("input scalar side"), the
inner loop might look like:
        do      x1,pf\vsadd_\ic\loop1
        move    sinp:(R_I1)+N_I1,x0     sins:(R_I2),a
        add     x0,a    a,sout:(R_O)+N_O
pf\vsadd_\ic\loop1
An alternate would be something like:
        move    sins:(R_I2),a
        move    sinp:(R_I1)+N_I1,b
        add     b,a
        do      x1,pf\vsadd_\ic\loop2
        add     b,a     a,sout:(R_O)+N_O
        move    sinp:(R_I1)+N_I1,a
pf\vsadd_\ic\loop2
which is even better because there are no sins vs sinp vs sout
conflicts. Note that you can't fold that last move into the
second move slot in the add instruction because you'd be moving
into register a, which is what the add is writing into.
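As a plain reference for what both loops are meant to compute, here is a minimal C sketch. It is not a translation of the DSP code. The name vsadd_ref and the use of long for a data word are assumptions made for illustration; in_stride and out_stride stand in for N_I1 and N_O, and the scalar pointer plays the role of R_I2, read once and never advanced.

```c
#include <assert.h>
#include <stddef.h>

/* Reference model (hypothetical helper, not from the sources):
 * out[i * out_stride] = in[i * in_stride] + *scalar, for i < cnt.
 * in and out may point at the same vector (the sinp == sout case). */
void vsadd_ref(const long *in, size_t in_stride,
               const long *scalar,
               long *out, size_t out_stride,
               size_t cnt)
{
    assert(in != NULL && scalar != NULL && out != NULL);

    /* The scalar is fetched once, outside the loop. */
    const long s = *scalar;

    for (size_t i = 0; i < cnt; i++) {
        out[i * out_stride] = in[i * in_stride] + s;
    }
}
```

Calling it with in and out set to the same array gives the in-place case; a model like this is mainly useful for generating expected answers to compare against the DSP output.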
In that last example, the "add" doesn't use one of the two move
fields. Assuming sinp != sout, you can effectively "double up"
the instructions. Pipelining must be carefully handled to avoid
the screw case cnt=1. Of course, you've already tested against
cnt=0. Assume x1 contains cnt/2. You'd have something like:
        move    sins:(R_I2),y0
        move    sinp:(R_I1)+N_I1,a
        move    sinp:(R_I1)+N_I1,b
        add     y0,a                    ; the scalar is in y0
        do      x1,pf\vsadd_\ic\l3
        add     y0,b    move sinp:(R_I1)+N_I1,a    a,sout:(R_O)+N_O
        add     y0,a    move sinp:(R_I1)+N_I1,b    b,sout:(R_O)+N_O
pf\vsadd_\ic\l3
(if odd then:)
        move    a,sout:(R_O)+N_O
As Julius says, this is asymptotically twice as fast as the loop1
and loop2 solutions given earlier. Of course, if sinp==sout,
then the solutions are identical in execution time.
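The control structure of the doubled-up version (cnt/2 trips through the loop, then one extra store when cnt is odd) can be sketched in C. This only models the trip count and the odd tail, not the register pipelining; the name vsadd_unrolled2 and the long word type are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the doubled-up loop: two elements per trip,
 * plus a single trailing element when cnt is odd. */
void vsadd_unrolled2(const long *in, size_t in_stride,
                     const long *scalar,
                     long *out, size_t out_stride,
                     size_t cnt)
{
    assert(in != NULL && scalar != NULL && out != NULL);

    const long s = *scalar;
    const size_t pairs = cnt / 2;   /* the notes keep cnt/2 in x1 */
    size_t i = 0;                   /* index of the next element */

    for (size_t k = 0; k < pairs; k++) {
        /* Load both inputs before storing, so in-place use is safe. */
        const long a = in[i * in_stride] + s;
        const long b = in[(i + 1) * in_stride] + s;
        out[i * out_stride] = a;
        out[(i + 1) * out_stride] = b;
        i += 2;
    }

    /* Odd count: one element is left over. For cnt == 1 the loop
     * above runs zero times and only this store happens. */
    if (cnt & 1u) {
        out[i * out_stride] = in[i * in_stride] + s;
    }
}
```

Running this for cnt = 0, 1, 2, and 3 covers the empty case, the awkward cnt=1 case, a pure even case, and loop-plus-tail.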
2. Debugging.
The possible combinations of sinp, sins, and sout are:
        sinp    sins    sout
   1     x       x       x
   2     x       x       y
   3     x       y       x
   4     x       y       y
   5     y       x       x
   6     y       x       y
   7     y       y       x
   8     y       y       y
Actually, sins is only read outside the main loop, so it should be
adequate to test one x and one y of sins. We must test sinp==sout and
sinp!=sout for sinp==x and sinp==y. So the four tests chosen are
1, 2, 5, and 8, numbered t1 through t4 in tvsadd.asm.
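The coverage argument can be checked mechanically. The sketch below decodes a case number from the table into its three memory sides and then tests whether a chosen set of cases hits sinp==sout and sinp!=sout for each side of sinp, plus both sides of sins. All type and function names here are hypothetical, introduced only for illustration.

```c
#include <assert.h>

typedef enum { MEM_X = 0, MEM_Y = 1 } mem_t;
typedef struct { mem_t sinp, sins, sout; } combo_t;

/* Decode table row n (1..8): sinp is the slowest-changing column,
 * sout the fastest, exactly as the rows are listed. */
combo_t combo_for_case(int n)
{
    assert(n >= 1 && n <= 8);
    const int bits = n - 1;
    combo_t c;
    c.sinp = (mem_t)((bits >> 2) & 1);
    c.sins = (mem_t)((bits >> 1) & 1);
    c.sout = (mem_t)(bits & 1);
    return c;
}

/* Returns 1 if the chosen cases cover every required condition. */
int covers_required(const int *cases, int n)
{
    int seen_pair[2][2] = {{0, 0}, {0, 0}};  /* [sinp][sinp == sout] */
    int seen_sins[2] = {0, 0};

    for (int i = 0; i < n; i++) {
        const combo_t c = combo_for_case(cases[i]);
        seen_pair[c.sinp][c.sinp == c.sout] = 1;
        seen_sins[c.sins] = 1;
    }
    return seen_pair[0][0] && seen_pair[0][1] &&
           seen_pair[1][0] && seen_pair[1][1] &&
           seen_sins[0] && seen_sins[1];
}
```

With the set {1, 2, 5, 8} this returns 1, whereas a set such as {1, 2, 3, 4} returns 0 because sinp never takes the y side.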
3. When testing for even and odd with:
        ror     b                       ; b gets cnt/2
        jcc     pf\_vsadd_\ic\_l1
        move    #1,a1
pf\_vsadd_\ic\_l1
you really want any arbitrary non-zero value in a1, because later
you will do this:
        move    y1,b
        neg     b
        jeq     pf\_vsadd_\ic\_l2       ; if cnt odd,
        move    a,sout:(R_O)+N_O        ; store final element
pf\_vsadd_\ic\_l2
Nominally you could collapse the jcc followed by move into a
tcc, but I can't find a *foolproof* source of non-zero anywhere
in the sources listed for the tcc instruction.
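In C terms, the even/odd test amounts to splitting cnt into a halved loop count and a flag taken from the bit that the rotate shifts out. This is a rough sketch under that reading, not the macro's actual code; split_t, split_count, and needs_tail are made-up names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t half;  /* loop count: cnt/2 */
    int flag;       /* non-zero when cnt is odd */
} split_t;

/* Model of the rotate-and-branch sequence: the low bit of cnt is
 * what ends up in carry, and the rest is cnt/2. */
split_t split_count(uint32_t cnt)
{
    split_t s;
    s.flag = (int)(cnt & 1u);
    s.half = cnt >> 1;
    assert(s.half * 2u + (uint32_t)s.flag == cnt);
    return s;
}

/* Only zero versus non-zero matters for the later test, which is
 * why any non-zero value would do for the flag. */
int needs_tail(split_t s)
{
    return s.flag != 0;
}
```

For example, a count of 7 splits into three trips through the doubled loop plus one final store.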
4. Here are the addresses of the input and output vectors:
_SYMBOL X
xscal           I 0000201E
ax_vec          I 00002000
ixself_vec      I 00002018
out1_vec        I 00002020
out3_vec        I 0000202E
sum1            I 0000205D
sum2            I 0000205E
sum3            I 0000205F
sum4            I 00002060
sum5            I 00002061
sum6            I 00002062
sum7            I 00002063
sum8            I 00002064
sum9            I 00002065
ans1            I 00002066
ans2            I 00002067
ans3            I 00002068
ans4            I 00002069
ans5            I 0000206A
ans6            I 0000206B
ans7            I 0000206C
ans8            I 0000206D
ans9            I 0000206E
ans999          I 0000206F
_SYMBOL Y
yscal           I 0000201F
ay_vec          I 0000200C
out2_vec        I 00002027
out4_vec        I 00002035
out5_vec        I 0000203C
out6_vec        I 00002043
out7_vec        I 0000204A
out8_vec        I 00002051