--accepting rule #n

Rules are numbered sequentially, with the first one being 1. Rule #0 is executed when the scanner backtracks; rule #(n+1), where n is the number of rules, indicates the default action; rule #(n+2) indicates that the input buffer is empty and needs to be refilled and the scan then restarted. Rules beyond #(n+2) are end-of-file actions.
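The numbering scheme can be summarized as a small lookup. The following Python sketch is illustrative only: describe_rule is a hypothetical helper, not part of aflex, and it assumes that n is the number of rules in the specification.

```python
def describe_rule(k, n):
    """Describe accepting-rule number k for a scanner with n user rules.

    Hypothetical helper that mirrors the numbering described in the text.
    """
    if k < 0:
        raise ValueError("rule numbers are non-negative")
    if k == 0:
        return "backtrack"
    if k <= n:
        return "user rule %d" % k
    if k == n + 1:
        return "default action"
    if k == n + 2:
        return "refill input buffer and restart scan"
    return "end-of-file action"


if __name__ == "__main__":
    # Walk through every category for a specification with four rules.
    for k in range(0, 8):
        print(k, describe_rule(k, 4))
```

With four rules, numbers 1 to 4 name user rules, 5 is the default action, 6 is the buffer refill, and anything larger is an end-of-file action.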
Figure 3: Example of File Containing Lexical Analyzer
" \ { } [ ] ^ $ < > ? . * + | ( ) /

The meaning of each operator is summarized below:

    x       -- the character "x"
Rules               Interpretations
-----               ---------------
a or "a"            The character a
Begin or "Begin"    The string Begin
\"Begin\"           The string "Begin"
^\t or ^"\t"        The tab character \t at the beginning of line.
\n$                 The newline character \n at the end of line.

There are a few special characters which can be specified in a regular expression:

    \n      -- newline
Rules               Interpretations
-----               ---------------
[^abc]              Any character except a, b, or c.
[abc]               The single character a, b, or c.
[-+0-9]             The - or + sign or any digit from 0 to 9.
[\t\n\b]            The tab, newline, or backspace character.
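The character-class rows can be checked mechanically. The sketch below uses Python's re module rather than aflex itself, on the assumption that the two notations agree for these particular classes; the helper name matches and the sample characters are chosen only for illustration.

```python
import re


def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# (pattern, sample input, expected result) for each row of the table.
CHECKS = [
    (r"[^abc]", "d", True),
    (r"[^abc]", "a", False),
    (r"[abc]", "c", True),
    (r"[abc]", "d", False),
    (r"[-+0-9]", "-", True),
    (r"[-+0-9]", "7", True),
    (r"[-+0-9]", "x", False),
    (r"[\t\n\b]", "\t", True),
    (r"[\t\n\b]", "\b", True),   # inside a class, \b means backspace
    (r"[\t\n\b]", " ", False),
]

if __name__ == "__main__":
    for pattern, text, expected in CHECKS:
        print(pattern, repr(text), matches(pattern, text) == expected)
```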
Rules               Interpretations
-----               ---------------
ab?c                Matches either abc or ac.
ab.c                Matches all strings of length 4 having a, b and c as
                    the first, second and fourth letter, where the third
                    character is not a newline.
Rules                   Interpretations
-----                   ---------------
[a-z]+                  Matches all strings of lower case letters.
[A-Za-z][A-Za-z0-9]*    Indicates all alphanumeric strings with a leading
                        alphabetic character.
Rules               Interpretations
-----               ---------------
ab|cd               Matches either ab or cd.
(ab|cd+)?(ef)*      Matches such strings as abefef, efefef, cdef, or cddd;
                    but not abc, abcd, or abcdef.
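The accepted and rejected strings listed for (ab|cd+)?(ef)* can be confirmed with a short Python sketch. It relies on Python's re module, not on aflex, and assumes the two agree on alternation, grouping, and the ?, + and * operators; whole_match is a hypothetical helper name.

```python
import re

# The pattern from the table, compiled once.
PATTERN = re.compile(r"(ab|cd+)?(ef)*")

ACCEPTED = ["abefef", "efefef", "cdef", "cddd"]
REJECTED = ["abc", "abcd", "abcdef"]


def whole_match(text):
    """Return True when the entire string is matched by PATTERN."""
    return PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ACCEPTED + REJECTED:
        print(text, whole_match(text))
```

The rejected strings fail because the optional group may contribute only one of ab or cd+, and whatever follows it must be a sequence of ef.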
Rules               Interpretations
-----               ---------------
^ab                 Matches ab at the beginning of line.
ab$                 Matches ab at the end of line.
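The effect of the two anchors can be seen by searching a multi-line sample. This is a sketch in Python's re module, assuming that its MULTILINE mode behaves like the aflex anchors when every line ends with a newline; the sample text is invented for illustration.

```python
import re

# Three newline-terminated lines; "ab" appears at various positions.
TEXT = "ab cd ab\nab xx\nxx ab\n"

# Offsets where "ab" occurs at the beginning of a line.
line_starts = [m.start() for m in re.finditer(r"^ab", TEXT, re.MULTILINE)]

# Offsets where "ab" occurs at the end of a line.
line_ends = [m.start() for m in re.finditer(r"ab$", TEXT, re.MULTILINE)]

if __name__ == "__main__":
    print("beginning of line:", line_starts)
    print("end of line:", line_ends)
```

In this sample, ^ab is found on the first and second lines, and ab$ on the first and third.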
Rules               Interpretations
-----               ---------------
{INTEGER}           If INTEGER is defined in the macro definition section,
                    then it will be expanded here.
definitions section
%%
rules section
%%
user defined section
##
user defined section

where %% is used as a delimiter between sections and ## indicates where function yylex will be placed. Both %% and ## must occur in column one.
name expression

where name must begin with a letter and contain only letters, digits and underscores, and expression is any string of characters that will be textually substituted for name wherever it is found in the rules section. At least one space must separate name from expression in the definition. No syntax checking is done on the expression; instead, the whole rule is parsed after expansion. The macro facility is very useful for writing regular expressions that have common substrings, and for defining often-used ranges like digit and letter. Perhaps its greatest advantage is that it gives a mnemonic name to a rather strange regular expression, making it easier for the programmer to debug the expressions. These macros, once defined, can be used in a regular expression by surrounding them with { and }, e.g., {DIGIT}. For example, the rule
[a-zA-Z]([0-9a-zA-Z])*    {put_line ("Found an identifier");}
[0-9]+                    {put_line ("Found a number");}

defines identifiers and integer numbers. With macros, the source file is
LETTER  [a-zA-Z]
DIGIT   [0-9]
%%
{LETTER}({DIGIT}|{LETTER})*    {put_line ("Found an identifier");}
{DIGIT}+                       {put_line ("Found a number");}
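The textual substitution performed for macros can be sketched in a few lines of Python. This is not aflex's implementation: expand_macros is a hypothetical helper, it does not expand macros nested inside other macro definitions, and wrapping each expansion in parentheses is an assumption made here so that a following operator such as + applies to the whole definition.

```python
import re

# A macro reference is a name in braces; names start with a letter and
# contain only letters, digits and underscores.
MACRO_REF = re.compile(r"\{([A-Za-z][A-Za-z0-9_]*)\}")


def expand_macros(pattern, macros):
    """Replace each {NAME} in pattern by its parenthesized definition."""
    def substitute(match):
        name = match.group(1)
        if name not in macros:
            return match.group(0)        # leave unknown names untouched
        return "(" + macros[name] + ")"  # assumed grouping, see lead-in
    return MACRO_REF.sub(substitute, pattern)


MACROS = {"LETTER": "[a-zA-Z]", "DIGIT": "[0-9]"}
IDENTIFIER = expand_macros("{LETTER}({DIGIT}|{LETTER})*", MACROS)
NUMBER = expand_macros("{DIGIT}+", MACROS)

if __name__ == "__main__":
    print(IDENTIFIER)
    print(NUMBER)
```

After expansion the two patterns accept the same strings as the macro-free rules: for instance, x25 is an identifier and 9lives is not.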
%Start cond1 cond2 ...

where cond1 and cond2 indicate start conditions. Note that %Start may be abbreviated as %S or %s.
ENTER(cond1);

Aflex also provides exclusive start conditions. These are similar to normal start conditions, except that when an exclusive start condition is active, no other rules are active. Exclusive start conditions are declared and used like normal start conditions, except that the declaration is done with %x instead of %s.
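The difference between normal and exclusive start conditions can be modeled as a filter over the rule list. The Python sketch below is a model only, not aflex code: active_rules, the rule list, and the condition names are hypothetical, and it assumes (as in flex) that rules written without a start condition stay active under a normal condition.

```python
def active_rules(rules, current, exclusive):
    """Return the patterns that are active under the current condition.

    rules     -- list of (conditions, pattern); conditions is a set of
                 start-condition names, empty for an untagged rule
    current   -- name of the start condition currently in effect
    exclusive -- set of names declared with %x rather than %s
    """
    active = []
    for conditions, pattern in rules:
        if current in conditions:
            active.append(pattern)          # rule tagged with this condition
        elif not conditions and current not in exclusive:
            active.append(pattern)          # untagged rule, normal condition
    return active


# Hypothetical specification: one untagged rule and two tagged rules.
RULES = [
    (set(), "[a-z]+"),
    ({"cond1"}, "foo"),
    ({"cond2"}, "bar"),
]
EXCLUSIVE = {"cond2"}   # as if declared with: %x cond2

if __name__ == "__main__":
    for condition in ("cond1", "cond2"):
        print(condition, active_rules(RULES, condition, EXCLUSIVE))
```

Under the normal condition cond1 the untagged rule remains available alongside foo; under the exclusive condition cond2 only bar is active.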
pattern {action}

where pattern is a regular expression and action is an Ada code fragment enclosed between { and }. A pattern must always begin in column one.
%%
begin|BEGIN    {copy (yytext, buffer);
                Install (yytext,symbol_table);
                return RESERVED;}

recognizes the reserved word "begin" or "BEGIN", copies the token string into the buffer, inserts it in the symbol table, and returns the value RESERVED. Note that the user must provide the procedures copy and install, along with all necessary types and variables, in the user defined section.
return (token_val);

to return the appropriate token value. Ayacc creates a package defining this token type from its specification file, and this package should in turn be with'ed at the beginning of the user defined section. Thus, the token package must be compiled before the lexical analyzer. The user is encouraged to read the Ayacc User Manual [] for more information on the interaction between aflex and ayacc.
LOWER   [a-z]
UPPER   [A-Z]

%%

{LOWER}+   { Lower_Case := Lower_Case + 1;
             TEXT_IO.PUT(To_Upper_Case(Example_DFA.YYText)); }

           -- convert all alphabetic words in lower case
           -- to upper case

{UPPER}+   { Upper_Case := Upper_Case + 1;
             TEXT_IO.PUT(Example_DFA.YYText); }

           -- write uppercase word as is

\n         { TEXT_IO.NEW_LINE;}

.          { TEXT_IO.PUT(Example_DFA.YYText); }
           -- write anything else as is

%%
with U_Env; -- VADS environment package for UNIX
procedure Example is

    type Token is (End_of_Input, Error);

    Tok        : Token;
    Lower_Case : NATURAL := 0;   -- frequency of lower case words
    Upper_Case : NATURAL := 0;   -- frequency of upper case words

    function To_Upper_Case (Word : STRING) return STRING is
    Temp : STRING(1..Word'LENGTH);
    begin
      for i in 1.. Word'LENGTH loop
        Temp(i) := CHARACTER'VAL(CHARACTER'POS(Word(i)) - 32);
      end loop;
      return Temp;
    end To_Upper_Case;

    -- function YYlex will go here!!
##

begin -- Example

    Example_IO.Open_Input (U_Env.argv(1).s);

    Read_Input :
    loop
      Tok := YYLex;
      exit Read_Input
        when Tok = End_of_Input;
    end loop Read_Input;

    TEXT_IO.NEW_LINE;
    TEXT_IO.PUT_LINE("Number of lowercase words is => " &
                     INTEGER'IMAGE(Lower_Case));
    TEXT_IO.PUT_LINE("Number of uppercase words is => " &
                     INTEGER'IMAGE(Upper_Case));
end Example;

This source file is run through aflex using the command
% aflex example.l

aflex produces an output file called example.a along with two packages, example_dfa.a and example_io.a. Assuming that the main procedure, Example, is used to construct an object file called example.out, the Unix command
% example.out example.l

prints the contents of the file example.l to the screen with all letters in uppercase, i.e., the output to the screen is
LOWER   [A-Z]
UPPER   [A-Z]

%%

{LOWER}+   { LOWER_CASE := LOWER_CASE + 1;
             TEXT_IO.PUT(TO_UPPER_CASE(EXAMPLE_DFA.YYTEXT)); }

           -- CONVERT ALL ALPHABETIC WORDS IN LOWER CASE
           -- TO UPPER CASE

{UPPER}+   { UPPER_CASE := UPPER_CASE + 1;
             TEXT_IO.PUT(EXAMPLE_DFA.YYTEXT); }

           -- WRITE UPPERCASE WORD AS IS

\N         { TEXT_IO.NEW_LINE;}

.          { TEXT_IO.PUT(EXAMPLE_DFA.YYTEXT); }
           -- WRITE ANYTHING ELSE AS IS

%%
WITH U_ENV; -- VADS ENVIRONMENT PACKAGE FOR UNIX
PROCEDURE EXAMPLE IS

    TYPE TOKEN IS (END_OF_INPUT, ERROR);

    TOK        : TOKEN;
    LOWER_CASE : NATURAL := 0;   -- FREQUENCY OF LOWER CASE WORDS
    UPPER_CASE : NATURAL := 0;   -- FREQUENCY OF UPPER CASE WORDS

    FUNCTION TO_UPPER_CASE (WORD : STRING) RETURN STRING IS
    TEMP : STRING(1..WORD'LENGTH);
    BEGIN
      FOR I IN 1.. WORD'LENGTH LOOP
        TEMP(I) := CHARACTER'VAL(CHARACTER'POS(WORD(I)) - 32);
      END LOOP;
      RETURN TEMP;
    END TO_UPPER_CASE;

    -- FUNCTION YYLEX WILL GO HERE!!
##

BEGIN -- EXAMPLE

    EXAMPLE_IO.OPEN_INPUT (U_ENV.ARGV(1).S);

    READ_INPUT :
    LOOP
      TOK := YYLEX;
      EXIT READ_INPUT
        WHEN TOK = END_OF_INPUT;
    END LOOP READ_INPUT;

    TEXT_IO.NEW_LINE;
    TEXT_IO.PUT_LINE("NUMBER OF LOWERCASE WORDS IS => " &
                     INTEGER'IMAGE(LOWER_CASE));
    TEXT_IO.PUT_LINE("NUMBER OF UPPERCASE WORDS IS => " &
                     INTEGER'IMAGE(UPPER_CASE));
END EXAMPLE;

Number of lowercase words is => 144
Number of uppercase words is => 120
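The behavior of the example scanner can be modeled in a few lines of Python, which may help in predicting its output for other inputs. This is a sketch, not a translation of the generated Ada code: upcase_words is a hypothetical name, and Python's regular-expression alternation stands in for the four aflex rules.

```python
import re

# One alternative per rule of the example: a run of lowercase letters,
# a run of uppercase letters, and any other single character
# (including a newline).
TOKEN = re.compile(r"(?P<lower>[a-z]+)|(?P<upper>[A-Z]+)|(?P<other>\n|.)")


def upcase_words(text):
    """Return (output, lowercase_runs, uppercase_runs) for the input text."""
    pieces = []
    lower_count = 0
    upper_count = 0
    for match in TOKEN.finditer(text):
        token = match.group(0)
        if match.lastgroup == "lower":
            lower_count += 1
            pieces.append(token.upper())   # convert lowercase run
        elif match.lastgroup == "upper":
            upper_count += 1
            pieces.append(token)           # uppercase run written as is
        else:
            pieces.append(token)           # anything else written as is
    return "".join(pieces), lower_count, upper_count


if __name__ == "__main__":
    print(upcase_words("Hello WORLD foo\n"))
```

Note that a mixed-case word such as Hello is split into an uppercase run (H) and a lowercase run (ello), so it contributes to both counters. This is why the counts reported by the example are counts of letter runs rather than of whole words.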
definitions section
%%
rules section
%%
user defined section
##    -- places yylex function
user defined section
Create_Output("/dev/tty");

be made. This will still work, but because of differences in implementation it may cause difficulties in redirecting output using Unix shell pipes and redirection. Instead, just don't call Open_Input, and output will go to the default standard_output.
{DIG} [0-9] -- a digit

in which the pushed-back text is "([0-9] -- a digit)".