As a preliminary step in designing the compiler, a program is presented that will take a series of assignment statements such as A=A+2 and assemble them into 6502 machine code. This program, Assignment, will then form the basis for the complete compiler.
This program compiles in a single pass, reading the input line and generating assembler statements as it goes. The following operations, which work on 8-bit numbers, are handled by the program
+ : add - : subtract & : logical AND | : logical OR >> : right shift << : left shift
All these operators have the same priority, and brackets can be used to alter the order of evaluation. Note that the shift operators must have a constant as their right-hand operand, as in:
A=B>> 2
Variable names can have up to six upper-case letters, and are automatically assigned to zero-page memory locations by the program.
The program uses a stack to hold intermediate results during compilation. Addresses are represented by numbers in the range 1 to FFFF (hexadecimal), and constants as the negative of their value; i.e. by numbers in the range 0 to -255. When the compiler reads an operator it performs the following sequence:
The intermediate results during the compilation of a complicated expression, such as A=(B+2) & (C+3), are associated with temporary locations, TT(0), TT(1) etc.
Using this procedure the program would compile the statement:
MAX=31&(VAR+15)
as:
LDA VAR CLC ADC @15 STA TT(0) LDA @31 AND TT(0) STA TT(1) LDA TT(1) STA MAX
The superfluous STA TT(1) and LDA TT(1) are eliminated by keeping track of the address whose contents are in the accumulator at any time. Instead of generating code to load the accumulator, a call is made to a subroutine that first checks to make sure that the accumulator does not already contain the required value. Furthermore, code is only generated to store the accumulator's contents when absolutely necessary; that is, when a new value is to be loaded into the accumulator.
Sample Run
The following section shows a sample run of the Assignment program, on the Atom, for various statements which are typed in after the '?' prompt. In the listing the first column, which is only present in the Atom version, shows the line in the compiler program that generated the assembler code. The next column gives the address of the machine code, followed by the one, two, or three bytes of the instruction, and the assembler statement. The symbols L, U, and H in the assembler statements are used by the program, and take different values at different points in the program.
>RUN ?VAR=1
7210 3A00 A9 01 LDA @-L 7200 3A02 85 51 STA H
The program has allocated location #51 for the variable VAR.
?VAR=VAR+17040 3A04 A5 51 LDA L 3070 3A06 18 CLC 3070 3A07 69 01 ADC @-U 7200 3A09 85 51 STA H?MAX=31&(VAR+15)7040 3A0B A5 51 LDA L 3070 3A0D 18 CLC 3070 3A0E 69 0F ADC @-U 7200 3A10 85 80 STA H 7210 3A12 A9 1F LDA @-L 3065 3A14 25 80 AND U 7200 3A16 85 52 STA H
The program has allocated location #52 for MAX. The bracketed expression is evaluated first, and stored in a temporary location #80.
?MIN=(1|VAR)-(MAX>>4)7210 3A18 A9 01 LDA @-L 3060 3A1A 05 51 ORA U 7200 3A1C 85 80 STA H 7040 3A1E A5 52 LDA L 3180 3A20 4A LSR A 3180 3A21 4A LSR A 3180 3A22 4A LSR A 3180 3A23 4A LSR A 7200 3A24 85 81 STA H 7040 3A26 A5 80 LDA L 3055 3A28 38 SEC 3055 3A29 E5 81 SBC U 7200 3A2B 85 53 STA H
Here MIN is allocated location #53, and two temporary locations are used. Note that the right-hand operand of '>>' or '<<' must be a constant.
?X=1+(VAR-) BRACKET MISSING X=l+(VAR-) ^
Finally, a statement which generated an error.
BBC Computer Version
5 REM ... Assignment ... 10 HIMEM=&2800 20 DIM SS(20),ID$(30),JJ(30) 30 DIM TT(20),X%7,Z%256:A$=CHR$(6)
Important addresses:
MC - Start address for machine code.
VARS - Start address for variables and arrays.
TEMPS - 20 temporary locations.
35 MC=&3800: VARS=&50:TEMPS=&80:PRARG=&94:SADD=&2800 115 FOR N=0T020:TT(N)=0:NEXT 140 I=0:P=MC:S=0:R=0:H=0:T=0
One pass of compilation. Initialise pointers, and make sure accumulator is stored finally.
200 REPEAT INPUT$Z%: A=Z% 210 PROCSTMT:PROCSTA 220 UNTIL FALSE
PROCSTMT - Statement. Skip blanks, then read symbol.
1000 DEF PROCSTMT 1010 PROCSP:PROCSYM 1110 PROCSP 1135 PROCV 1160 PROCRHS:H=V:PROCSTA:ENDPROC
PROCRHS - Right-hand side of assignment statement.
1180 DEF PROCRHS 1185 IF ?A<>ASC"=" PRINTA$"NO =-":PROCERR 1190 A=A+1:PROCEXP:L=FNPUL 1195 PROCLDA:V=FNPUL:ENDPROC
PROCIDENT - Read an identifier.
2000 DEF PROCIDENT 2010 PROCSYM:PROCV:ENDPROC
PROCV - Look up identifier Z$ an symbol table. If symbol does not already exist (N=I) allocate address for it. Push address to stack.
2020 DEF PROCV 2030 IF N=0 ENDPROC 2040 PROCLOOK 2050 IFN=I:I=I+1:R=R+1:JJ(N)=R+VARS 2070 IFI>30PRINT"TOO MANY VARIABLES":PROCERR 2080 U=JJ(N):PROCPSH(U):ENDPROC
PROCONST - Read a decimal constant. If not found, N=0. If found, push minus its value.
2100 DEF PROCONST 2105 PROCSP 2110 N=-1:C=0:REPEAT D=C:N=N+1:C=C*10 2120 C=C+A?N-ASC"0" 2130 UNTILA?N<ASC"0"ORA?N>ASC"9" 2140 IFN=0 ENDPROC 2150 A=A+N 2160 U=-D:PROCPSH(U):N=l:ENDPROC
PROCLOOK Look up X% in symbol table, ID$(0), ID$(1) ... If not found, N=I.
2400 DEF PROCLOOK 2410 ID$(I)=$X%:N=-1 2420 REPEAT N=N+1:UNTILID$(N)=ID$(I):ENDPROC
PROCEXP - Assemble code to calculate an expression, of the
form:
<factor> <operator> <factor>
where <operator> is one of:
+ : add - : subtract | : OR & : AND << : left shift >> : right shift
Then push the address of the result on the stack.
3000 DEF PROCEXP PROCSP:PROCFACTOR 3010 PROCSP 3020 IF?A=ASC"+"OR?A=ASC"-"OR?A=ASC"&" OR?A=ASC"|"O=?A:A=A+l:GOTO 3035 3025 IF NOT( (?A=ASC "> "AND A?1=ASC ">")OR (?A=ASC "< "AND A?1=ASC<"))ENDPROC 3030 O=?A:A=A+2 3035 PROCPSH(O) 3040 PROCFACTOR:U=FNPUL:O=FNPUL:L=FNPUL:PROCLDA 3045 IF U<=0 GOTO 3070 3050 IFO=ASC"+"[CLC:ADC U:] 3055 IFO=ASC"-"[SEC:SBC U:] 3060 IFO=ASC"|"[ORA U:] 3065 IFO=ASC"&("[AND U:] 3068 GOTO 3190 3070 IFO=ASC"+"[CLC:ADC #-U:] 3075 IFO=ASC"-"[SEC:SBC #-U:] 3080 IFO=ASC"|"[ORA #-U:] 3085 IFO=ASC"&"[AND #-U:l 3160 IFO=ASC"<"FOR N=1TO-U:[ASL A:]:NEXT 3180 IFO=ASC">"FOR N=1TO-U:[LSR A:]:NEXT 3190 L=U:PROCRELEASE(L):PROCTEMP 3195 GOTO 3010
PROCFACTOR - Factor. Check for symbol, constant, or bracketed expression. If the symbol is followed by '(' or '[ than it is a function or an array respectively.
3600 DEF PROCFACTOR 3610 PROCSYM:IFN=0GOT03630 3620 PROCV 3625 ENDPROC 3630 PROCONST: IF N ENDPROC 3635 IF?A<>ASC" ( " PRINTA$"BRACKET MISSING ": PROCERR 3640 A=A+1:PROCEXP:PROCSP 3650 IF?A<>ASC")" PRINTA$"BRACKET MISSING":PROCERR 3660 A=A+1: ENDPROC
PROCPSH - Push argument onto stack.
5020 DEF PROCPSH (U ) SS ( S ) =U: S=S+1: IFS <21 ENDPROC 5021 PRINTA$ "STACK FULL":PROCERR
FNPUL - Pull from stack.
5030 DEF FNPUL:S=S-1:IFS>=0 =SS(S) 5031 PRINTA$"STACK ERROR": PROCERR
PROCSP - Skip blanks.
5040 DEF PROCSP 5042 IF?A=32 REPEAT A=A+1:UNTIL?A<>32 5049 ENDPROC
PROCTEMP - Generate a temporary location TT(N ; return its address in T, set H to the address, and push the address.
5100 DEF PROCTEMP 5110 N=-1:REPEAT N=N+1: IF N>20PRINTA$"NOT ENOUGH TEMP":PROCERR 5120 UNTILTT(N)=0 5130 T=N+TEMPS:TT(N)=T:U=T:H=T:PROCPSH(U):ENDPROC
PROCSYM - Read a symbol into $X% from $A. Returns N=0 if no symbol found.
6000 DEF PROCSYM 6010 PROCSP:N=-1:REPEAT N=N+1: N?X%=A?N 6020 UNTILA?N>ASC"Z"ORA?N<ASC"A"ORN=7 6030 IF N=0 ENDPROC 6040 IF N<7 N?X%=&D:A=A+N:ENDPROC 6050 PRINTA$"SYMBOL TOO LONG":PROCERR
PROCLDA - Assemble code to load the accumulator with L. If accumulator already contains L (L=H) then do nothing; otherwise store its previous contents (PROCSTA) and load new contents.
7000 DEF PROCLDA 7010 IFL=H AND L>0 PROCRELEASE(L):ENDPROC 7020 PROCSTA 7030 IFL<=0 [LDA #-L:]:ENDPROC 7040 [LDA L:]:PROCRELEASE(L) 7050 ENDPROC
PROCSTA - Assemble code to store accumulator's contents to location H.
7100 DEF PROCSTA 7200 IFH>0[STA H:]:H=0 7210 ENDPROC
PROCRELEASE - Release temporary variable for re-use.
7300 DEF PROCRELEASE(L) 7310 IF L>=TEMPS AND L<TEMPS+20:TT(L-TEMPS)=0 7320 ENDPROC
PROCERR - Output error.
9000 DEF PROCERR 9010 PRINT '$Z% 9030 PRINT TAB(A-Z%-1);""":END
Variables:
A - Pointer to current position in expression being compiled C - Used to evaluate constant H - Address whose contents are currently in accumulator. H=0 means ignore previous contents I - Number of next free symbol ID$(0)..ID$(30) - Pointers to symbol names JJ(0)..JJ(30) - Addresses of symbols L - Value or address to be loaded into accumulator; used by PROCLDA N - Temporary variable O - Operator read by PROCEXP P - Program location counter, used by assembler RR(0)..RR(2) - Constant addresses R - Number of variable locations used up S - Next free location on SS stack SS(0)..SS(20) - Stack used by compiler T - Temporary location assigned by PROCTEMP TT(0) ..TT(20) - Flags for temporary locations; value=0 if location is free for use U - Value to be pushed by PROCPSH V - Used by FNPUL X% - String into which symbols and keywords are read by PROCSYM Z% - Input buffer
Atom Version
10 ... ASSIGNMENT ... 20 DIM SS(20),LL(20),II (30),JJ(30) 30 DIM X(7),TT(20),RR(2)
Important addresses:
RR0 - Start address of machine code.
RR1 - Start address of variables and arrays.
RR2 - 20 temporary locations.
35 RR0=#3A00; RRl=#50; RR2=#80 40 F.N=0TO 30; DIMI( 6 ); IIN=I; JJN=RR0; N. 115 F.N=0TO20; TTN=0; LLN=RR0; N. 140 Z=#100; G=0; I=0; P=RR0; S=0; R=0; H=0; T=0
One pass of compilation. Initialise pointers, and make sure accumulator is stored finally.
200 DO INPUT$Z;A=Z 210 GOS.s;GOS.m 220 UNTIL 0
s - Statement. Skip blanks, then read symbol.
1000sREM STATEMENT 1010 GOS.b;GOS.x 1110 GOS.b 1135 GOS.j 1160 GOS.d;H=V;GOS.m;R.
d - Right-hand side of assignment statement.
1180dIF?A<>CH"=" P.$6 "NO =";G.o 1190 A=A+1;GOS.e;GOS.v;L=V;GOS.1;GOS.v;R.
i - Read an identifier.
2000iREM IDENTIFIER 2010 GOS.x
j - Look up identifier X in symbol table. If symbol does not already exist (N=I) allocate address for it. Push address to stack.
2030jIF N=0 R. 2040 GOS.y 2050 IFN=I;I=I+1;R=R+1;JJN=R+RR1 2070 IFI>30P.$6"TOO MANY VARIABLES";G.o 2080 U=JJN;GOS.u;R.
c - Read a decimal constant. If not found, N=0. If found, push minus its value.
2100cREM CONSTANT 2105 GOS.b 2110 N=-1;C=0;DO D=C;N=N+1;C=C*10 2120 C=C+A?N-CH"0" 2130 U.A?N<CH"0"ORA?N>CH"9" 2140 IFN=0 R. 2150 A=A+N 2160 U=-D;GOS.u;N=l;R.
y - Look up X in symbol table, $II(0), $II(l) ...
If not found, N=I.
2400yREM LOOKUP 2410 $III=$X;N=-1 2420 DO N=N+1;U.$IIN=$III;R.
e - Assemble code to calculate an expression, of the form:
<factor> <operator> <factor>
where <operator> is one of:
+ : add - : subtract | : OR & : AND << : left shift >> : right shift
Then push that address of the result on the stack.
3000eREM EXPRESSION 3010 GOS.b;GOS.f 3015 Gos.b 3020 IF?A=CH"+"OR?A=CH"-"OR?A=CH"&"OR ?A=CH"|"O=?A;A=A+1;G.3035 3025 IF((?A=CH">"A.A?1=CH">")OR (?A=CH "< "A.A? 1=CH "< " ) ):1 R. 3030 O=?A;A=A+2 3035 U=O;GOS.u 3040 GOS.f;GOS.v;U=V;GOS.v;O=V;GOS.v;L=V;GOS.l 3045 IF U<=0 G.3070 3050 IFO=CH"+"[CLC;ADC U;] 3055 IFO=CH"-"[SEC;SBC U;] 3060 IFO=CH"|"[ORA U;] 3065 IFO=CH"&"[AND U;] 3068 G.3190 3070 IFO=CH"+" [CLC;ADC @-U; ] 3075 IFO=CH"-"[SEC;SBC @-U;] 3080 IFO=CH"|"[ORA @-U;) 3085 IFO=CH"&"[AND @-U;] 3160 IFO=CH"<"F.N=1TO-U;[ASL A;];N. 3180 IFO=CH">"F.N=1TO-U;[LSR A;];N. 3190 L=U;GOS.r;GOS.t 3195 G.3015
f - Factor. Check for symbol, constant, or bracketed expressipn. If the symbol is followed by '(' or '[' then it is a function or an array respectively.
3600fREM FACTOR 3610 GOS.x;IFN=0G.3630 3620 GOS.j 3625 R. 3630 GOS.c;IF N R. 3635 IF?A<>CH"(" P.$6"BRACKET MISSING";G.o 3640 A=A+l;GOS.e;GOS.b 3650 IF?A<>CH")" P.$6"BRACKET MISSING";G.o 3660 A=A+1;R.
u - Push U onto stack.
5020uSSS=U;S=S+1;IFS<21 R. 5021 P.$6"STACK FULL";G.o
v - Pull V from stack.
5030vS=S-1;IFS>=0V=SSS; R. 5031 P.$6"STACK ERROR";G.o
b - Skip blanks.
5040bIF?A=32 DO A=A+1;U.?A<>32 5043 R.
t - Generate a temporary location TTN; return its address in T, set H to the address, and push the address.
5100tREM TEMP. LOC. 5110 N=-1;DO N=Ntl; IF N>20P.$6"NOT ENOUGH TEMP";G.o 5120 U.TTN=0 5130 T=N+RR2; TTN=T; U=T; H=T; G.u
x - Read a symbol into $X from $A. Returns N=0 if no symbol found.
6000xREM READ SYMBOL 6010 GOS.b;N=-1;DO N=N+1; N?X=A?N 6020 U.A? N>CH "Z "ORA?N<CH "A" ORN=7 6030 IF N=0 R. 6040 IF N<7 N?X=#D;A=A+N;R. 6050 P.$6"SYMBOL TOO LONG";G.o
l - Assemble code to load the accumulator with L. If accumulator already contains L (L=H) then do nothing; otherwise store its previous contents (GOS.m) and load new contents.
7000lREM LOAD ACCUMULATOR 7010 IFL=H AND L>0 G.r 7020 GOS.m 7030 IFL<=0 [LDA @-L;];R. 7040 [LDA L;];G.r
m - Assemble code to store accumulator's contents to location H.
7100mREM STORE ACCUMULATOR 7200 IFH>0[STA H;];H=0 7210 R.
r - Release temporary variable with address L for re-use.
7300rREM RELEASE VARIABLE 7310 IF L>=RR2 AND L<RR3;TT(L-RR2)=0 7320 R.
o - Output error.
9000oREM ERROR 9020 P.'$Z' 9030 F.N=Z TOA-2;P." ";N.;P."^"';E.
Variables:
A - Pointer to current position in expression being compiled C - Used to evaluate constant H - Address whose contents are currently in accumulator. H=0 means ignore previous contents I - Number of next free symbol II(0)..II(30) - Pointers to symbol names JJ(0)..JJ(30) - Addresses of symbols L - Value or address to be loaded into accumulator; used by subroutine l N - Temporary variable O - Operator read by subroutine e P - Program location counter, used by assembler RR(0)..RR(2) - Constant addresses R - Number of variable locations used up S - Next free location on SS stack SS(0)..SS(20) - Stack used by compiler T - Temporary location assigned by subroutine t TT(0)..TT(20) - Flags for temporary locations; value=0 if location is free for use U - Value to be pushed by subroutine u V - Value to be pulled by subroutine v X - String into which symbols and keywords are read by subroutine x Z - Input buffer