1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
|
=-----------------------------------------------------------------------------=
* *
* Paren Stack *
* *
=-----------------------------------------------------------------------------=
At the heart of this algorithm are two stacks.
There is the Paren Stack (PS) and the Frame stack.
The PS (pse in the code) keeps track of braces, parens,
if/else/switch/do/while/etc items -- anything that is nestable.
Complex statements go through some of these BS_ stages:
BS_PAREN1 - paren on if/for/switch/while, etc
BS_PAREN2 - paren on do{}while()
BS_BRACE_DO - brace set on do{}
BS_BRACE2 - brace on if/else/for/switch/while
BS_ELSE - expecting 'else' after 'if'
BS_ELSEIF - expecting 'if' after 'else'
BS_WHILE - expecting 'while' after 'do'
The file is processed one token at a time to support #if/#else/#endif
preprocessors at any point.
Take this simple if statement as an example:
if ( x )
{
x--;
}
The stack would look like so:
The format is first the token processed and then the PSE stack as it appears
AFTER the token is processed.
'if' [IF - PAREN1]
'(' [IF - PAREN1] [SPAREN OPEN]
'x' [IF - PAREN1] [SPAREN OPEN]
')' [IF - BRACE2] <- note that the stage was changed on SPAREN_CLOSE
'{' [IF - BRACE2] [BRACE OPEN]
'x' [IF - BRACE2] [BRACE OPEN]
'--' [IF - BRACE2] [BRACE OPEN]
';' [IF - BRACE2] [BRACE OPEN]
'}' [IF - ELSE]
<- lack of else kills the ELSE, closes statement
Virtual brace example:
if ( x )
x--;
else if (y)
y--;
else
z++;
'if' [IF - PAREN1]
'(' [IF - PAREN1] [SPAREN OPEN]
'x' [IF - PAREN1] [SPAREN OPEN]
')' [IF - BRACE2]
'x' [IF - BRACE2] [VBRACE OPEN] <- VBrace open inserted before because
the token was not '{'
'--' [IF - BRACE2] [VBRACE OPEN]
';' [IF - ELSE] <- Semicolon causes a VBrace close to be
inserted after the semicolon
'else' [ELSE - ELSEIF] <- IF changed into ELSE, expect IF or BRACE
'x' [ELSE - BRACE2] [VBRACE OPEN] <- lack of '{' -> VBrace
'++' [ELSE - BRACE2] [VBRACE OPEN]
';' <- VBrace close inserted after semicolon
ELSE removed after statement close
Nested virtual brace example: (EOF represents the end of the file)
if ( x )
if (y)
y--;
else
z++;
EOF
'if' [IF - PAREN1]
'(' [IF - PAREN1] [PAREN OPEN]
'x' [IF - PAREN1] [PAREN OPEN]
')' [IF - BRACE2]
'if' [IF - BRACE2] [VBRACE OPEN] [IF - PAREN1] <- VBrace on BRACE2, IF opened
'(' [IF - BRACE2] [VBRACE OPEN] [IF - PAREN1] [SPAREN OPEN]
'y' [IF - BRACE2] [VBRACE OPEN] [IF - PAREN1] [SPAREN OPEN]
')' [IF - BRACE2] [VBRACE OPEN] [IF - BRACE2]
'y' [IF - BRACE2] [VBRACE OPEN] [IF - BRACE2] [VBRACE OPEN]
'--' [IF - BRACE2] [VBRACE OPEN] [IF - BRACE2] [VBRACE OPEN]
';' [IF - BRACE2] [VBRACE OPEN] [IF - ELSE]
'else' [IF - BRACE2] [VBRACE OPEN] [ELSE - ELSEIF]
'z' [IF - BRACE2] [VBRACE OPEN] [ELSE - BRACE2] [VBRACE OPEN]
'++' [IF - BRACE2] [VBRACE OPEN] [ELSE - BRACE2] [VBRACE OPEN]
';' [IF - BRACE2] [VBRACE OPEN] [ELSE - BRACE2] - step1
[IF - BRACE2] [VBRACE OPEN] - step2
[IF - ELSE] - step3
EOF
-- this last semi is more complicated - first it terminates the VBRACE and then
the else, which then, since it is the end of a statement, terminates the
VBRACE. That bumps the IF stage to ELSE.
The EOF kills that off (since it is not an else)
Order of operation:
1) if TOS=VBRACE && PC=SEMI, insert VBRACE close, PC=>VBRACE close
2) if PC=VBRACE close or PC=BRACE close, and TOS is complex (if/else/etc)
then advance complex stage. If statement ends, pop and advance
Stages for each complex statement:
if
IF-PAREN1, IF-BRACE2, IF-ELSE
if/else
IF-PAREN1, IF-BRACE2, IF-ELSE, ELSE-ELSEIF, ELSE-BRACE2
if/else if/else
IF-PAREN1, IF-BRACE2, IF-ELSE, ELSE-ELSEIF, IF-PAREN1, IF-BRACE2, IF-ELSE, ELSE-ELSEIF, ELSE-BRACE2
for
FOR-PAREN1, FOR-BRACE2
while
WHILE-PAREN1, WHILE-BRACE2
switch
SWITCH-PAREN1, SWITCH-BRACE2
synchronized
SYNCHRONIZED-PAREN1
do/while
DO-BRACE_DO, DO-WHILE, WHILE-PAREN2
Another less-interesting example:
{
if (x)
volatile
{
y++;
}
return y;
}
'{' [BRACE OPEN]
'if' [BRACE OPEN] [IF - PAREN1]
'(' [BRACE OPEN] [IF - PAREN1] [PAREN OPEN]
'x' [BRACE OPEN] [IF - PAREN1] [PAREN OPEN]
')' [BRACE OPEN] [IF - BRACE2]
'volatile' [BRACE OPEN] [IF - BRACE2] [VBRACE OPEN] [VOLATILE - BRACE2]
'{' [BRACE OPEN] [IF - BRACE2] [VBRACE OPEN] [VOLATILE - BRACE2] [BRACE OPEN]
'y' [BRACE OPEN] [IF - BRACE2] [VBRACE OPEN] [VOLATILE - BRACE2] [BRACE OPEN]
'++' [BRACE OPEN] [IF - BRACE2] [VBRACE OPEN] [VOLATILE - BRACE2] [BRACE OPEN]
';' [BRACE OPEN] [IF - BRACE2] [VBRACE OPEN] [VOLATILE - BRACE2] [BRACE OPEN]
'}' [BRACE OPEN] [IF - ELSE] <- the brace close ends brace-open,
volatile-brace2 and vbrace-open
'return' [BRACE OPEN] <- not else
'y' [BRACE OPEN]
';' [BRACE OPEN]
'}' <- empties the stack
=-----------------------------------------------------------------------------=
* *
* Parse Frames *
* *
=-----------------------------------------------------------------------------=
The pse stack is kept on a frame stack.
The frame stack is need for languages that support preprocessors (C, C++, C#)
that can arbitrarily change code flow. It also isolates #define macros so
that they are indented independently and do not affect the rest of the program.
When an #if is hit, a copy of the current frame is push on the frame stack.
When an #else/#elif is hit, a copy of the current stack is pushed under the
#if frame and the original (pre-#if) frame is copied to the current frame.
When #endif is hit, the top frame is popped.
This has the following effects:
- a simple #if / #endif does not affect program flow
- #if / #else /#endif - continues from the #if clause
When a #define is entered, the current frame is pushed and cleared.
When a #define is exited, the frame is popped.
Take this example, which isn't very exciting, as both the #if and #else parts
end up with the same paren stack. This is the usual case.
{
foo(param1,
#ifdef DEBUG
"debug");
#else
"release");
#endif
}
Right before the #ifdef, we have this for the paren stack:
Top> [BRACE OPEN] [PAREN OPEN]
The #ifdef pushes a copy of the current stack, so we have this:
Top> [BRACE OPEN] [PAREN OPEN]
[BRACE OPEN] [PAREN OPEN]
The close paren after "debug" closes out the PAREN-OPEN on the top of the stack.
Top> [BRACE OPEN]
[BRACE OPEN] [PAREN OPEN]
The #else swaps the top two frames.
Top> [BRACE OPEN] [PAREN OPEN]
[BRACE OPEN]
Right after the #else, we hit another close paren after the "release".
Top> [BRACE OPEN]
[BRACE OPEN]
At the #endif, the top of stack is thrown out, which restores us to the #if path.
Top> [BRACE OPEN]
|