This manual describes the Pure programming language and how to invoke the Pure
interpreter program. To read the manual inside the interpreter, just type
help at the command prompt. See the Online Help section for details.
Pure is a functional programming language based on term rewriting. This means
that all your programs are essentially just collections of symbolic equations
which the interpreter uses to reduce expressions to their simplest (“normal”)
form. This makes for a rather powerful and flexible programming model
featuring dynamic typing and general polymorphism. In addition, Pure programs
are compiled to efficient native code on the fly, using the LLVM compiler
framework, so programs are executed reasonably fast and interfacing to C is
very easy. If you have the necessary 3rd party compilers installed then you
can even inline functions written in C and a number of other languages and
call them just like any other Pure function. The ease with which you can
interface to 3rd party software makes Pure useful for a wide range of
applications from symbolic algebra and scientific programming to database, web
and multimedia applications.
The Pure language is implemented by the Pure interpreter program. Just like
other programming language interpreters, the Pure interpreter provides an
interactive environment in which you can type definitions and expressions,
which are executed as you type them at the interpreter’s command prompt.
However, despite its name the Pure interpreter never really “interprets” any
Pure code. Rather, it acts as a frontend to the Pure compiler, which takes
care of incrementally compiling Pure code to native (machine) code. This has
the benefit that the compiled code runs much faster than the usual kinds of
“bytecode” that you find in traditional programming language interpreters.
You can use the interpreter as a sophisticated kind of “desktop calculator”
program. Simply run the program from the shell as follows:
$ pure
Pure 0.46 (x86_64-unknown-linux-gnu) Copyright (c) 2008-2010 by Albert Graef
(Type 'help' for help, 'help copying' for license information.)
Loaded prelude from /usr/local/lib/pure/prelude.pure.
>
The interpreter prints its sign-on message and leaves you at its ‘> ‘ command
prompt, where you can start typing definitions and expressions to be
evaluated:
> 17/12+23;
24.4166666666667
> fact n = if n>0 then n*fact (n-1) else 1;
> map fact (1..10);
[1,2,6,24,120,720,5040,40320,362880,3628800]
Typing the quit command or the end-of-file character (Ctrl-d on
Unix systems) at the beginning of the command line exits the interpreter and
takes you back to the shell.
Instead of typing definitions and evaluating expressions in an interactive
fashion as shown above, you can also put the same code in an (ASCII or UTF-8)
text file called a Pure program or script which can then be executed by
the interpreter in “batch mode”, or compiled to a standalone executable which
can be run directly from the command line. As an aid for writing script files,
a bunch of syntax highlighting files and programming modes for various popular
text editors are included in the Pure sources.
More information about invoking the Pure interpreter can be found in the
Invoking Pure section below. This is followed by a description of the Pure
language in Pure Overview and subsequent sections. The interactive
facilities of the Pure interpreter are discussed in the Interactive Usage
section, while the Batch Compilation section explains how to translate Pure
programs to native executables and a number of other object file formats. The
Caveats and Notes section discusses useful tips and tricks, as well as
various pitfalls and how to avoid them. The manual concludes with some
authorship and licensing information and pointers to related software.
This manual is not intended as a general introduction to functional
programming, so at least some familiarity with this programming style is
assumed. If Pure is your first functional language then you might want to look
at the Functional Programming wikipedia article to see what it is all about
and find pointers to current literature on the subject. In any case we hope
that you’ll find Pure helpful in exploring functional programming, as it is
fairly easy to learn but a very powerful language.
As already mentioned, Pure uses term rewriting as its underlying computational
model, which goes well beyond functional programming in some ways. Term
rewriting has long been used in computer algebra systems, and Michael
O’Donnell pioneered its use as a programming language already in the
1980s. But until recently implementations have not really been efficient
enough to be useful as general-purpose programming languages; Pure strives to
change that. A good introduction to the theory of the term rewriting calculus
and its applications is the book by Baader and Nipkow.
Program examples are always set in typewriter font. Here’s what a typical code
sample looks like:
fact n = if n>0 then n*fact(n-1) else 1;
These can either be saved to a file and then loaded into the interpreter, or
you can also just type them directly in the interpreter. If some lines start
with the interpreter prompt ‘> ‘, this indicates an example interaction with
the interpreter. Everything following the prompt (excluding the ‘> ‘ itself)
is meant to be typed exactly as written. Lines lacking the ‘> ‘ prefix show
results printed by the interpreter. Example:
> fact n = if n>0 then n*fact(n-1) else 1;
> map fact (1..10);
[1,2,6,24,120,720,5040,40320,362880,3628800]
Similarly, lines starting with the ‘$ ’ prompt indicate shell interactions.
For instance,
$ pure
indicates that you should type the command pure on your system’s command
line.
The grammar notation in this manual uses an extended form of BNF (Backus-Naur
form), which looks as follows:
expression ::= "{" expr_list (";" expr_list)* [";"] "}"
expr_list ::= expression (',' expression)*
Parentheses are used to group syntactical elements, while brackets denote
optional elements. We also use the regular expression operators * and
+ to denote repetitions (as usual, * denotes zero or more, + one
or more repetitions of the preceding element). Terminals (literal elements
such as keywords and delimiters) are enclosed in double or single quotes.
These EBNF rules are used for both lexical and syntactical elements, but note
that the former are concerned with entities formed from single characters and
thus tokens are meant to be typed exactly as written, whereas the latter deal
with larger syntactical structures where whitespace between tokens is
generally insignificant.
The Pure interpreter is invoked as follows:
pure [options ...] [script ...] [-- args ...]
pure [options ...] -x script [args ...]
Use pure -h to get help about the command line options. As already
mentioned, just the pure command without any command line parameters
invokes the interpreter in interactive mode, see Running Interactively
below for details. Some other important ways to invoke the interpreter are
summarized below.
- pure -g
- Runs the interpreter interactively, with debugging support.
- pure script ...
- Runs the given scripts in batch mode.
- pure -i script ...
- Runs the given scripts in batch mode as above, but then enters the
interactive command loop. (Add -g to also get debugging support,
and -q to suppress the sign-on message.)
- pure -x script [arg ...]
- Runs the given script with the given parameters. The script name and command
line arguments are available in the global argv variable.
- pure -c script [-o prog]
- Batch compilation: Runs the given script, compiling it to a native
executable prog (a.out by default).
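For instance, here are some typical invocations, using a hypothetical script
file hello.pure:
$ pure hello.pure              # run the script in batch mode
$ pure -i -q hello.pure        # run it, then enter the interactive loop
$ pure -c hello.pure -o hello  # compile it to a native executable hello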
Depending on your local setup, there may be additional ways to run the Pure
interpreter. In particular, if you have Emacs Pure mode installed, then you
can just open a script in Emacs and run it with the C-c C-c keyboard
command. For Emacs aficionados, this is probably the most convenient way to
execute a Pure script interactively in the interpreter.
The interpreter accepts various options which are described in more detail
below.
- -c
- Batch compilation.
- --ctags, --etags
- Create a tags file in ctags (vi) or etags (emacs) format.
- --eager-jit
- Enable eager JIT compilation. This requires LLVM 2.7 or later; otherwise
  this flag will be ignored.
- -fPIC, -fpic
- Create position-independent code (batch compilation).
- -g
- Enable symbolic debugging.
- -h, --help
- Print help message and exit.
- -i
- Force interactive mode (read commands from stdin).
- -I directory
- Add a directory to be searched for included source scripts.
- -L directory
- Add a directory to be searched for dynamic libraries.
- -l libname
- Library to be linked in batch compilation.
- --noediting
- Disable command-line editing.
- -n, --noprelude
- Do not load the prelude.
- --norc
- Do not run the interactive startup files.
- -o filename
- Output filename for batch compilation.
- -q
- Quiet startup (suppresses sign-on message in interactive mode).
- -T filename
- Tags file to be written by --ctags or --etags.
- -u
- Do not strip unused functions in batch compilation.
- -v[level]
- Set verbosity level. See below for details.
- --version
- Print version information and exit.
- -w
- Enable compiler warnings.
- -x
- Execute script with given command line arguments.
- --
- Stop option processing and pass the remaining command line arguments in
  the argv variable.
(Besides these, the interpreter also understands a number of other command
line switches for setting various code generation options; please see Code
Generation Options below for details.)
If any source scripts are specified on the command line, they are loaded and
executed, after which the interpreter exits. Otherwise the interpreter enters
the interactive read-eval-print loop, see Running Interactively below. You
can also use the -i option to enter the interactive loop (continue
reading from stdin) even after processing some source scripts.
Options and source files are processed in the order in which they are given on
the command line. Processing of options and source files ends when either the
-- or the -x option is encountered. The -x
option must be followed by the name of a script to be executed, which becomes
the “main script” of the application. In either case, any remaining parameters
are passed to the executing script by means of the global argc and
argv variables, denoting the number of arguments and the list of the
actual parameter strings, respectively. In the case of -x this also
includes the script name as argv!0. The -x option is useful, in
particular, to turn Pure scripts into executable programs by including a
“shebang” like the following as the first line in your main script. (This
trick only works with Unix shells, though.)
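Assuming that the interpreter is installed as /usr/local/bin/pure (the actual
path may differ on your system), such a shebang line might look as follows:
#!/usr/local/bin/pure -x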
On startup, the interpreter also defines the version variable, which is
set to the version string of the Pure interpreter, and the sysinfo
variable, which provides a string identifying the host system. These are
useful if parts of your script depend on the particular version of the
interpreter and the system it runs on. (Moreover, Pure 0.21 and later also
define the variable compiling which indicates whether the program is
executed in a batch compilation, see Compiling Scripts below.)
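For instance, you can inspect these variables interactively; the values shown
here are from a hypothetical installation and will differ on your system:
> version;
"0.46"
> sysinfo;
"x86_64-unknown-linux-gnu"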
If available, the prelude script prelude.pure is loaded by the interpreter
prior to any other definitions, unless the -n or
--noprelude option is specified. The prelude is searched for in the
directory specified with the PURELIB environment variable. If the
PURELIB variable is not set, a system-specific default is
used. Relative pathnames of other source scripts specified on the command line
are interpreted relative to the current working directory. In addition, the
executed program may load other scripts and libraries via a using
declaration in the source, which are searched for in a number of locations,
including the directories named with the -I and -L
options; see the Declarations and C Interface sections for details.
The interpreter automatically compiles scripts, as well as definitions that
you enter interactively. This is done in an incremental fashion, as the
code is needed, and is therefore known as JIT (just-in-time) compilation.
Thus the interpreter never really “interprets” the source program or some
intermediate representation, it just acts as a frontend to the compiler,
taking care of compiling source code to native machine code before it gets
executed.
Pure’s LLVM backend does “lazy JIT compilation” by default, meaning that each
function (global or local) is only compiled when it is run for the first
time. With the --eager-jit option, however, it will also compile all
other (global or local) functions that may be called by the compiled
function. (The PURE_EAGER_JIT environment variable, when set to any
value, has the same effect, so that you do not have to specify the
--eager-jit option each time you run the interpreter.) Eager JIT
compilation may be more efficient in some cases (since bigger chunks of
compilation work can be done in one go) and less efficient in others (e.g.,
eager JITing may compile large chunks of code which aren’t actually called
later, except in rare circumstances).
Note that the eager JIT mode is only available with LLVM 2.7 or later;
otherwise this option will be ignored.
It is also possible to compile your scripts to native code beforehand, using
the -c batch compilation option. This option forces the interpreter
to non-interactive mode (unless -i is specified as well, which
overrides -c). Any scripts specified on the command line are then
executed as usual, but after execution the interpreter takes a snapshot of the
program and compiles it to one of several supported output formats, LLVM
assembler (.ll) or bitcode (.bc), native assembler (.s) or object (.o), or a
native executable, depending on the output filename specified with
-o. If the output filename ends in the .ll extension, an LLVM
assembler file is created which can then be processed with the LLVM
toolchain. If the output filename is just ‘-‘, the assembler file is written
to standard output, which is useful if you want to pass the generated code to
the LLVM tools in a pipeline. If the output filename ends in the .bc
extension, an LLVM bitcode file is created instead.
The .ll and .bc formats are supported natively by the Pure interpreter, no
external tools are required to generate these. If the target is an .s, .o or
executable file, the Pure interpreter creates a temporary bitcode file on
which it invokes the LLVM tools opt and llc to create a
native assembler file, and then uses gcc to assemble and link the
resulting program (if requested). You can also specify additional libraries to
be linked into the executable with the -l option. If the output
filename is omitted, it defaults to a.out (a.exe on Windows).
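To illustrate, the following commands (using a hypothetical script hello.pure)
produce each of the supported output formats in turn:
$ pure -c hello.pure -o hello.ll   # LLVM assembler
$ pure -c hello.pure -o hello.bc   # LLVM bitcode
$ pure -c hello.pure -o hello.s    # native assembler
$ pure -c hello.pure -o hello.o    # native object file
$ pure -c hello.pure -o hello      # native executable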
The -c option provides a convenient way to quickly turn a Pure
script into a standalone executable which can be invoked directly from the
shell. One advantage of compiling your script is that this eliminates the JIT
compilation time and thus considerably reduces the startup time of the
program. Another reason to prefer a standalone executable is that it lets you
deploy the program on systems without a full Pure installation (usually only
the runtime library is required on the target system). On the other hand,
compiled scripts also have some limitations, mostly concerning the use of the
built-in eval function. Please see the Batch Compilation section
for details.
The -v64 (or -v0100) verbosity option can be used to have the
interpreter print the commands it executes during compilation, see Verbosity
and Debugging Options below. When creating an object file, this also prints
the suggested linker command (including all the dynamic modules loaded by the
script, which also have to be linked in to create a working executable), to
which you only have to add the options describing the desired output file.
Pure programs often have declarations and definitions of global symbols
scattered across many different source files. The --ctags and
--etags options let you create a tags file which allows you to
quickly locate these items in text editors such as vi and
emacs which support this feature.
If --ctags or --etags is specified, the interpreter enters
a special mode in which it only parses source files without executing them and
collects information about the locations of global symbol declarations and
definitions. The collected information is then written to a tags file in the
ctags or etags format used by vi and emacs,
respectively. The desired name of the tags file can be specified with the
-T option; it defaults to tags for --ctags and TAGS for
--etags (which matches the default tags file names used by
vi and emacs, respectively).
The tags file contains information about the global constant, variable, macro,
function and operator symbols of all scripts specified on the command line, as
well as the prelude and other scripts included via a using clause.
Tagged scripts which are located in the same directory as the tags file (or,
recursively, in one of its subdirectories) are specified using relative
pathnames, while scripts outside this hierarchy (such as included scripts from
the standard library) are denoted with absolute pathnames. This scheme makes
it possible to move an entire directory together with its tags file and have
the tags information still work in the new location.
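For example, the following command (with hypothetical script names) writes a
TAGS file for use with emacs:
$ pure --etags foo.pure bar.pure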
If the interpreter runs in interactive mode, it repeatedly prompts you for
input (which may be any legal Pure code or some special interpreter commands
provided for interactive usage), and prints computed results. This is also
known as the read-eval-print loop and is described in much more detail in
the Interactive Usage section. To exit the interpreter, just type the
quit command or the end-of-file character (Ctrl-d on Unix) at the
beginning of the command line.
The interpreter may also source a few additional interactive startup files
immediately before entering the interactive loop, unless the --norc
option is specified. First .purerc in the user’s home directory is read, then
.purerc in the current working directory. These are ordinary Pure scripts
which can be used to provide additional definitions for interactive
usage. Finally, a .pure file in the current directory (containing a dump from
a previous interactive session) is loaded if it is present.
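Since a .purerc file is just an ordinary Pure script, it might contain
definitions like the following (a hypothetical example):
// ~/.purerc: personal definitions for interactive usage
square x = x*x;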
When the interpreter is in interactive mode and reads from a tty, unless the
--noediting option is specified, commands are usually read using
readline or some compatible replacement, providing completion for
all commands listed under Interactive Usage, as well as for symbols defined
in the running program. When exiting the interpreter, the command history is
stored in ~/.pure_history, from where it is restored the next time you run the
interpreter.
The interpreter also provides a simple source level debugger when run in
interactive mode, see Debugging for details. To enable the debugger, you need
to specify the -g option when invoking the interpreter. This option
causes your script to run much slower, so you should only use it if you
actually want to run the debugger.
The -v option is useful for debugging the interpreter, or if you are
interested in the code your program gets compiled to. The level argument is
optional; it defaults to 1. Seven different levels are implemented at this
time (one more bit is reserved for future extensions). Only the first two
levels will be useful for the average Pure programmer; the remaining levels
are mostly intended for maintenance purposes.
- 1 (0x1, 001)
- denotes echoing of parsed definitions and expressions.
- 2 (0x2, 002)
- adds special annotations concerning local bindings (de Bruijn indices,
subterm paths; this can be helpful to debug tricky variable binding
issues).
- 4 (0x4, 004)
- adds descriptions of the matching automata for the left-hand sides of
equations (you probably want to see this only when working on the guts of
the interpreter).
- 8 (0x8, 010)
- dumps the “real” output code (LLVM assembler, which is as close to the
native machine code for your program as it gets; you definitely don’t want
to see this unless you have to inspect the generated code for bugs or
performance issues).
- 16 (0x10, 020)
- adds debugging messages from the bison(1) parser; useful for debugging the
parser.
- 32 (0x20, 040)
- adds debugging messages from the flex(1) lexer; useful for debugging the
lexer.
- 64 (0x40, 0100)
- turns on verbose batch compilation; this is useful if you want to see
exactly which commands get executed during batch compilation
(-c).
These values can be or’ed together, and, for convenience, can be specified in
either decimal, hexadecimal or octal. Thus 0xff or 0777 always gives you full
debugging output (which isn’t likely to be used by anyone but the Pure
developers). Some useful flag combinations for experts are (in octal) 007
(echo definitions along with de Bruijn indices and matching automata), 011
(definitions and assembler code) and 021 (parser debugging output along with
parsed definitions).
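For example, to have both the parsed definitions and the generated assembler
code echoed for a (hypothetical) script, you could invoke the interpreter as
follows:
$ pure -v011 hello.pure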
Note that the -v option is only applied after the prelude has been
loaded. If you want to debug the prelude, use the -n option and
specify the prelude.pure file explicitly on the command line. Verbose output
is also suppressed for modules imported through a using clause. As
a remedy, you can use the interactive show command (see the Interactive
Usage section) to list definitions along with additional debugging
information.
The -w option enables some additional warnings which are useful to
check your scripts for possible errors. Right now it will report implicit
declarations of function symbols which might indicate missing or mistyped
symbols that need to be fixed, see Symbol Lookup and Creation for details.
Besides the options listed above, the interpreter also understands some
additional command line switches and corresponding environment variables to
control various code generation options. The options take the form --opt
and --noopt, respectively, where opt denotes the option name (see
below for a list of supported options). By default, these options are all
enabled; --noopt disables the option, --opt reenables it. In addition,
for each option opt there is also a corresponding environment variable
PURE_NOOPT (with the option name in uppercase) which, when set, disables
the option by default. (Setting this variable to any value will do; the
interpreter only checks whether the variable exists in the environment.)
For instance, the checks option controls stack and signal checks. Thus
--nochecks on the command line disables the option, and setting the
PURE_NOCHECKS environment variable makes this the default, in which case
you can use --checks on the command line to reenable the option.
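For instance (a shell session sketch, assuming a Bourne-compatible shell):
$ export PURE_NOCHECKS=1   # disable the checks option by default
$ pure                     # now runs as if --nochecks was given
$ pure --checks            # reenables the checks for this run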
Each code generation option can also be used as a pragma (compiler
directive) in source code so that you can control it on a per-rule basis. The
pragma must be on a line by itself, starting in column 1, and takes the
following form (using --nochecks as an example):
#! --nochecks // line-oriented comment may go here
Currently, the following code generation options are recognized:
- --checks, --nochecks
- Enable or disable various extra stack and signal checks. By default, the
  interpreter checks for stack overflows (if the PURE_STACK
  environment variable is set) and pending signals on entry to every
  function, see Stack Size and Tail Recursion and Handling of
  Asynchronous Signals for details. This is needed to catch these
  conditions in a reliable way, so we recommend leaving this enabled.
  However, these checks also make programs run a little slower (typically
  some 5%, YMMV). If performance is critical then you can disable the checks
  with the --nochecks option. (Even then, a minimal amount of
  checking will be done, usually on entry to every global function.)
- --const, --noconst
- Enable or disable the precomputing of constant values in batch compilation
  (cf. Compiling Scripts). If enabled (which is the default), the values
  of constants in const definitions are precomputed at compile
  time (if possible) and then stored in the generated executable. This
  usually yields faster startup times but bigger executables. You can disable
  this option with --noconst to get smaller executables at the
  expense of slower startup times. Please see the Batch Compilation
  section for an example.
- --fold, --nofold
- Enable or disable constant folding in the compiler frontend. This means
  that constant expressions involving int and double values and the usual
  arithmetic and logical operations on these are precomputed at compile
  time. (This is mostly for cosmetic purposes; the LLVM backend will perform
  this optimization anyway when generating machine code.) For instance:
  > foo x = 2*3*x;
  > show foo
  foo x = 6*x;
  Disabling constant folding in the frontend causes constant expressions to
  be shown as you entered them:
  > #! --nofold
  > bar x = 2*3*x;
  > show bar
  bar x = 2*3*x;
- --tc, --notc
- Enable or disable tail call optimization (TCO). TCO is needed to make
  tail-recursive functions execute in constant stack space, so we recommend
  leaving this enabled. However, at the time of this writing LLVM’s TCO
  support is still bug-ridden on some platforms, so the --notc
  option allows you to disable it. (Note that TCO can also be disabled when
  compiling the Pure interpreter, in which case these options have no effect;
  see the installation documentation for details.)
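For instance, a tail-recursive loop like the following (a small illustration;
the count function is hypothetical) executes in constant stack space as long
as TCO is enabled:
> count n = loop 0 with loop i = if i<n then loop (i+1) else i end;
> count 1000000;
1000000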
Besides these, there are the following special pragmas affecting the code
generation of some given function, which is specified in the pragma. These
pragmas can only be used in source code, there are no command line options for
them.
- --eager fun
- Instruct the interpreter to JIT-compile the given function eagerly. This
  means that native code will be created for the function, as well as all
  other (global or local) functions that may be called by the compiled
  function, as soon as the function gets recompiled. This avoids the hiccups
  you get when a function is compiled on the fly if it is run for the first
  time, which is particularly useful for functions which are to be run in
  realtime (typically in multimedia applications). Please note that, in
  contrast to the --eager-jit option, this feature is available
  for all LLVM versions (it doesn’t require LLVM 2.7 or later).
- --required fun
- Inform the batch compiler (cf. Compiling Scripts) that the given
  function symbol fun should never be stripped from the program. This is
  useful, e.g., if a function is never called explicitly but only through
  eval. Adding a --required pragma for the function then
  makes sure that the function is always linked into the program. Please see
  the Batch Compilation section for an example.
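In source code these pragmas are written just like the other pragmas shown
above, e.g. (with a hypothetical function foo):
#! --required foo // keep foo even if it's only ever called through eval
foo x = x+1;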
The interpreter may source various files during its startup. These are:
- ~/.pure_history
- Interactive command history.
- ~/.purerc, .purerc, .pure
- Interactive startup files. The latter is usually a dump from a previous
  interactive session.
- prelude.pure
- Standard prelude. If available, this script is loaded before any other
  definitions, unless -n was specified.
Various aspects of the interpreter can be configured through the following
shell environment variables:
- BROWSER
- If the PURE_HELP variable is not set (see below), this specifies
  a colon-separated list of browsers to try for reading the online
  documentation. See http://catb.org/~esr/BROWSER/.
- PURELIB
- Directory to search for library scripts, including the prelude. If
  PURELIB is not set, it defaults to some location specified at
  installation time.
- PURE_EAGER_JIT
- Enable eager JIT compilation (same as --eager-jit), see
  Compiling Scripts for details.
- PURE_HELP
- Command used to browse the Pure manual. This must be a browser capable of
  displaying html files. Default is w3m.
- PURE_INCLUDE
- Additional directories (in colon-separated format) to be searched for
  included scripts.
- PURE_LIBRARY
- Additional directories (in colon-separated format) to be searched for
  dynamic libraries.
- PURE_MORE
- Shell command to be used for paging through output of the show command,
  when the interpreter runs in interactive mode. PURE_LESS does the
  same for evaluation results printed by the interpreter.
- PURE_PS
- Command prompt used in the interactive command loop (“> ” by default).
- PURE_STACK
- Maximum stack size in kilobytes (default: 0 = unlimited).
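For example, you might preset some of these variables in your shell startup
files (a sketch with hypothetical values):
$ export PURELIB=/usr/local/lib/pure
$ export PURE_PS="pure> "
$ export PURE_STACK=8192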
Besides these, the interpreter also understands a number of other environment
variables for setting various code generation options (see Code Generation
Options above) and commands to invoke different LLVM compilers on inline
code (see Inline Code).
Pure is a fairly simple yet powerful language. Programs are basically
collections of term rewriting rules, which are used to reduce expressions to
normal form in a symbolic fashion. For convenience, Pure also offers some
extensions to the basic term rewriting calculus, like global variables and
constants, nested scopes of local function and variable definitions, anonymous
functions (lambdas), exception handling and a built-in macro facility. These
are all described below and in the following sections.
Most basic operations are defined in the standard prelude.
This includes the usual arithmetic and logical operations, as well as the
basic string, list and matrix functions. The prelude is always loaded by the
interpreter, so that you can start using the interpreter as a sophisticated
kind of desktop calculator right away. Other useful operations are provided
through separate library modules. Some of these, like the system interface and
the container data structures, are distributed with the interpreter, others
are available as separate add-on packages from the Pure website. A (very)
brief overview of some of the modules distributed with the Pure interpreter
can be found in the Standard Library section.
Here’s a first example which demonstrates how to define a simple recursive
function in Pure, entered interactively in the interpreter (note that the ‘> ‘
symbol at the beginning of each input line is the interpreter’s default
command prompt):
> // my first Pure example
> fact 0 = 1;
> fact n::int = n*fact (n-1) if n>0;
> let x = fact 10; x;
3628800
Pure is a free-format language, i.e., whitespace is insignificant (unless it
is used to delimit other symbols). Thus, in contrast to “layout-based”
languages like Haskell, you must use the proper delimiters (;) and
keywords (end) to terminate definitions and block structures. In
particular, as shown in the example above, definitions and expressions at the
toplevel have to be terminated with a semicolon, even in interactive mode.
Comments use the same syntax as in C++: // for line-oriented, and
/* ... */ for multiline comments. The latter must not be nested. Lines
beginning with #! are treated as comments, too; as already discussed
above, on Unix-like systems this allows you to add a “shebang” to your main
script in order to turn it into an executable program.
A few ASCII symbols are reserved for special uses, namely the semicolon, the
“at” symbol @, the equals sign =, the backslash \, the Unix pipe
symbol |, parentheses (), brackets [] and curly braces {}.
(Among these, only the semicolon is a “hard delimiter” which is always a
lexeme by itself; the other symbols can be used inside operator symbols.)
Moreover, there are some keywords which cannot be used as identifiers:
case const def else end extern if
infix infixl infixr let namespace nonfix of
otherwise outfix postfix prefix private public then
using when with
Pure fully supports the Unicode character set or, more precisely, UTF-8.
This is an ASCII extension capable of representing all Unicode characters,
which provides you with thousands of characters from most of the languages of
the world, as well as an abundance of special symbols for almost any purpose.
If your text editor supports the UTF-8 encoding (most editors do nowadays),
you can use all Unicode characters in your Pure programs, not only inside
strings, but also for denoting identifiers and special operator and constant
symbols.
The customary notations for identifiers, numbers and strings are all
provided. In addition, Pure also allows you to define your own operator
symbols. Identifiers and other symbols are described by the following grammar
rules in EBNF format:
symbol ::= identifier | special
identifier ::= letter (letter | digit)*
special ::= punct+
letter ::= "A"|...|"Z"|"a"|...|"z"|"_"|...
digit ::= "0"|...|"9"
punct ::= "!"|"#"|"$"|"%"|"&"|...
Pure uses the following rules to distinguish “punctuation” (which may only
occur in declared operator and constant symbols) and “letters” (identifier
constituents). In addition to the punctuation symbols in the 7 bit ASCII
range, the following code points in the Unicode repertoire are considered as
punctuation: U+00A1 through U+00BF, U+00D7, U+00F7, and U+20D0 through
U+2BFF. This comprises the special symbols in the Latin-1 repertoire, as well
as the Combining Diacritical Marks for Symbols, Letterlike Symbols, Number
Forms, Arrows, Mathematical Symbols, Miscellaneous Technical Symbols, Control
Pictures, OCR, Enclosed Alphanumerics, Box Drawing, Blocks, Geometric Shapes,
Miscellaneous Symbols, Dingbats, Miscellaneous Mathematical Symbols A,
Supplemental Arrows A, Supplemental Arrows B, Miscellaneous Mathematical
Symbols B, Supplemental Mathematical Operators, and Miscellaneous Symbols and
Arrows. This should cover almost everything you’d ever want to use in an
operator symbol. All other extended Unicode characters are effectively treated
as “letters” which can be used as identifier constituents. (Charts of all
Unicode symbols can be found at the Code Charts page of the Unicode
Consortium.)
The following are examples of valid identifiers: foo, foo_bar,
FooBar, BAR, bar99. Case is significant in identifiers, so Bar
and bar are distinct identifiers, but otherwise the case of letters
carries no meaning. Special symbols consist entirely of punctuation, such as
::=. These may be used as operator symbols, but have to be declared before
they can be used (see Symbol Declarations).
Pure also has a notation for qualified symbols which carry a namespace prefix.
These take the following format (note that no whitespace is permitted between
the namespace prefix and the symbol):
qualified_symbol ::= [qualifier] symbol
qualified_identifier ::= [qualifier] identifier
qualifier ::= [identifier] "::" (identifier "::")*
Example: foo::bar.
Number literals come in three flavours: integers, bigints (denoted with an
L suffix) and floating point numbers (indicated by the presence of the
decimal point and/or a base 10 scaling factor). Integers and bigints may be
written in different bases (decimal, binary, octal and hexadecimal), while
floating point numbers are always denoted in decimal.
number ::= integer | integer "L" | float
integer ::= digit+
| "0" ("X"|"x") hex_digit+
| "0" ("B"|"b") bin_digit+
| "0" oct_digit+
oct_digit ::= "0"|...|"7"
hex_digit ::= "0"|...|"9"|"A"|...|"F"|"a"|...|"f"
bin_digit ::= "0"|"1"
float ::= digit+ ["." digit+] exponent
| digit* "." digit+ [exponent]
exponent ::= ("E"|"e") ["+"|"-"] digit+
Examples: 4711, 4711L, 1.2e-3. Numbers in different bases:
1000 (decimal), 0x3e8 (hexadecimal), 01750 (octal),
0b1111101000 (binary).
String literals are arbitrary sequences of characters enclosed in double
quotes, such as "Hello, world!".
string ::= '"' char* '"'
Special escape sequences may be used to denote double quotes and backslashes
(\", \\), control characters (\b, \f, \n, \r, \t,
these have the same meaning as in C), and arbitrary Unicode characters given
by their number or XML entity name (e.g., \169, \0xa9 and
\&copy; all denote the Unicode copyright character, code point U+00A9). As
indicated, numeric escapes can be specified in any of the supported bases for
integer literals. For disambiguating purposes, these can also be enclosed in
parentheses. E.g., "\(123)4" is a string consisting of the character
\123 followed by the digit 4.
On the surface, Pure is quite similar to other modern functional languages
like Haskell and ML. But under the hood it is a much more dynamic language,
more akin to Lisp. In particular, Pure is dynamically typed, so functions can
be fully polymorphic and you can add to the definition of an existing function
at any time. For instance, we can extend our first example above to make the
fact function work with floating point numbers, too:
> fact 0.0 = 1.0;
> fact n::double = n*fact (n-1) if n>0;
> fact 10.0;
3628800.0
> fact 10;
3628800
Note the n::double construct on the left-hand side of the second
equation, which means that the equation is only to be applied for (double
precision) floating point values n. This construct is also called a “type
tag” in Pure parlance, which is actually a simple form of pattern matching
(see below). Similarly, our previous definition at the beginning of this
section employed the int tag to indicate that the n parameter is an
integer value. The int and double types are built into the Pure
language, but it is also possible to introduce your own type tags for
user-defined data structures. This will be explained in more detail under
Type Tags in the Rule Syntax section below.
Expressions are generally evaluated from left to right, innermost expressions
first, i.e., using call by value semantics. Pure also has a few built-in
special forms (most notably, conditional expressions, the short-circuit
logical connectives && and ||, the sequencing
operator $$, the lazy evaluation operator &, and the
quote) which take some or all of their arguments unevaluated, using
call by name.
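For instance, since && is a special form evaluated in short-circuit mode, its
second operand is left untouched when the first one already decides the result
(a small interactive illustration using the fact function defined earlier):
> 0 && fact 100;
0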
Like in Haskell and ML, functions are often defined by pattern matching, i.e.,
the left-hand side of a definition is compared to the target expression,
binding the variables in the pattern to their actual values accordingly:
> foo (bar x) = x-1;
> foo (bar 99);
98
Due to its term rewriting semantics, Pure goes beyond most other functional
languages in that it can do symbolic evaluations just as well as “normal”
computations:
> square x = x*x;
> square 4;
16
> square (a+b);
(a+b)*(a+b)
In fact, leaving aside the built-in support for some common data structures
such as numbers and strings, all the Pure interpreter really does is evaluate
expressions in a symbolic fashion, rewriting expressions using the equations
supplied by the programmer, until no more equations are applicable. The result
of this process is called a normal form which represents the “value” of the
original expression. In keeping with the tradition of term rewriting, there’s
no distinction between “defined” and “constructor” function symbols in Pure.
Consequently, any function symbol or operator can be used anywhere on the
left-hand side of an equation, and may act as a constructor symbol if it
happens to occur in a normal form term. This enables you to work with
algebraic rules like associativity and distributivity in a direct fashion:
> (x+y)*z = x*z+y*z; x*(y+z) = x*y+x*z;
> x*(y*z) = (x*y)*z; x+(y+z) = (x+y)+z;
> square (a+b);
a*a+a*b+b*a+b*b
Here’s another basic symbolic algebra example, which lets you compute the
disjunctive normal form of logical expressions:
// eliminate double negations:
~~a = a;
// de Morgan's laws:
~(a || b) = ~a && ~b;
~(a && b) = ~a || ~b;
// distributivity:
a && (b || c) = a && b || a && c;
(a || b) && c = a && c || b && c;
// associativity:
(a && b) && c = a && (b && c);
(a || b) || c = a || (b || c);
Example:
> a || ~(b || (c && ~d));
a||~b&&~c||~b&&d
Note that the above isn’t possible in languages like Haskell and ML which
always enforce the so-called “constructor discipline”, which stipulates that
only pure constructor symbols (without any defining equations) may occur as a
subterm on the left-hand side of a definition. Thus equational definitions
like the above are forbidden in these languages. It’s possible to work around
this, but only at the cost of an extra layer of interpretation, which treats
the expressions to be evaluated as data manipulated by an evaluation function.
In Pure this extra layer is not necessary, you can just add equations like the
above to your Pure program. In addition, you can also reduce an expression in
a local context of algebraic equations specified in a with clause.
This can be done with the reduce macro defined in the prelude:
expand = reduce with
(a+b)*c = a*c+b*c;
a*(b+c) = a*b+a*c;
end;
factor = reduce with
a*c+b*c = (a+b)*c;
a*b+a*c = a*(b+c);
end;
Example:
> expand ((a+b)*2);
a*2+b*2
> factor (a*2+b*2);
(a+b)*2
Taking a look at the above examples, you might have been wondering how the
Pure interpreter figures out what the parameters (a.k.a. “variables”) in an
equation are. This is quite obvious in rules involving just variables and
special operator symbols, such as (x+y)*z = x*z+y*z. However, what about
an equation like foo (foo bar) = bar? Since most of the time we don’t
declare any symbols in Pure, how does the interpreter know that foo is a
literal function symbol here, while bar is a variable?
The answer is that the interpreter considers the different positions in the
left-hand side expression of an equation. Basically, a Pure expression is just
a tree formed by applying expressions to other expressions, with the atomic
subexpressions like numbers and symbols at the leaves of the tree. (This is
true even for infix expressions like x+y, since in Pure these are always
equivalent to a function application of the form (+) x y which has the
atomic subterms (+), x and y at its leaves.)
Now the interpreter divides the leaves of the expression tree into “head” (or
“function”) and “parameter” (or “variable”) positions, depending on whether or
not a leaf is leftmost in a function application. Thus, in an expression like f
x y z, f is in the head or function position, while x, y and
z are in parameter or variable positions. (Note that in an infix
expression like x+y, (+) is the head symbol, not x, as the
expression is really parsed as (+) x y, see above.)
Identifiers in head positions are taken as literal function symbols by the
interpreter, while identifiers in variable positions denote, well,
variables. We also refer to this convention as the head = function rule. It
is quite intuitive and lets us get away without declaring the variables in
equations. (There are some corner cases not covered here, however. In
particular, Pure allows you to declare special constant symbols, if you need a
symbol to be recognized as a literal even if it occurs in a variable
position. This is done by means of a nonfix declaration, see
Symbol Declarations for details.)
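The following interactive snippet illustrates the head = function rule with
the foo/bar equation from above:
> foo (foo bar) = bar;
> foo (foo 99);
99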
Like in other functional languages, expressions are the central ingredient of
all Pure programs. All computation performed by a Pure program consists in the
evaluation of expressions, and expressions also form the building blocks of
the equational rules which are used to define the constants, variables,
functions and macros of a Pure program.
Pure’s expression syntax can be summarized in the following grammar rules:
expr ::= "\" prim_expr+ "->" expr
| "case" expr "of" rules "end"
| expr "when" simple_rules "end"
| expr "with" rules "end"
| "if" expr "then" expr "else" expr
| simple_expr
simple_expr ::= simple_expr op simple_expr
| op simple_expr
| simple_expr op
| application
application ::= application prim_expr
| prim_expr
rules ::= rule (";" rule)* [";"]
simple_rules ::= simple_rule (";" simple_rule)* [";"]
(Note that the rule and simple_rule elements are part of the
definition syntax, which is explained in the Rule Syntax section.)
prim_expr ::= qualified_symbol
| number
| string
| "(" op ")"
| "(" left_op right_op ")"
| "(" simple_expr op ")"
| "(" op simple_expr ")"
| "(" expr ")"
| left_op expr right_op
| "[" exprs "]"
| "{" exprs (";" exprs)* [";"] "}"
| "[" expr "|" simple_rules "]"
| "{" expr "|" simple_rules "}"
exprs ::= expr ("," expr)*
op ::= qualified_symbol
left_op ::= qualified_symbol
right_op ::= qualified_symbol
Typical examples of the different expression types are summarized in the
following table. Note that lambdas bind most weakly, followed by the special
case, when and with constructs, followed by
conditional expressions (if-then-else),
followed by the simple expressions. Operators are a part of the simple
expression syntax, and are parsed according to their declared precedences and
associativities (cf. Symbol Declarations). Function application binds
stronger than all operators. Parentheses can be used to group expressions and
override default precedences as usual.
Type         Example                          Description
-----------  -------------------------------  ----------------------------
Lambda       \x->x+1                          anonymous function
Block        case x of y = z; ... end         pattern-matching conditional
             x when y = z; ... end            local variable definition
             x with f y = z; ... end          local function definition
Conditional  if x then y else z               conditional expression
Simple       x+y, -x, x mod y                 operator application
             sin x, max a b                   function application
Primary      4711, 1.2e-3                     number
             "Hello, world!\n"                string
             foo, x, (+)                      function or variable symbol
             [1,2,3], {1,2;3,4}               list and matrix
             [x,-y | x=1..n; y=1..m; x<y]     list comprehension
             {i==j | i=1..n; j=1..m}          matrix comprehension
The Pure language provides built-in support for machine integers (32 bit),
bigints (implemented using GMP), floating point values (double precision IEEE
754) and character strings (UTF-8 encoded). These can all be denoted using the
corresponding literals described in Lexical Matters. Truth values are
encoded as machine integers; as you might expect, zero denotes false and any
non-zero value true, and the prelude also provides symbolic constants
false and true to denote these. Pure also supports generic C
pointers, but these don’t have a syntactic representation in Pure, except that
the predefined constant NULL may be used to denote a generic null
pointer; other pointer values need to be created with external C functions.
Finally, Pure also provides some built-in support for compound primaries in
the form of lists and matrices, although most of the corresponding operations
are actually defined in the prelude.
Together, these “atomic” types of expressions make up Pure’s primary
expression syntax. Here is a brief rundown of the primary expression types.
- Numbers: 4711, 4711L, 1.2e-3
- The usual C notations for integers (decimal: 1000, hexadecimal:
0x3e8, octal: 01750) and floating point values are all provided.
Integers can also be denoted in base 2 by using the 0b or 0B
prefix: 0b1111101000. Integer constants that are too large to fit into
machine integers are promoted to bigints automatically. Moreover, integer
literals immediately followed by the uppercase letter L are always
interpreted as bigint constants, even if they fit into machine integers.
This notation is also used when printing bigint constants, to distinguish
them from machine integers.
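For instance (an interactive illustration of bigint promotion and the L
suffix):
> 4294967296;
4294967296L
> 4711L;
4711L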
- Strings: "Hello, world!\n"
- String constants are double-quoted and terminated with a null character,
like in C. In contrast to C, strings are always encoded in UTF-8, and
character escapes in Pure strings have a more flexible syntax (borrowed
from the author’s Q language) which provides notations to specify any
Unicode character. Please refer to Lexical Matters for details.
- Function and variable symbols: foo, foo_bar, BAR, foo::bar
- These consist of the usual sequence of letters (including the underscore)
and digits, starting with a letter. Case is significant, thus foo,
Foo and FOO are distinct identifiers. The ‘_‘ symbol, when
occurring on the left-hand side of an equation, is special; it denotes the
anonymous variable which matches any value without actually binding a
variable. Identifiers can also be prefixed with a namespace identifier,
like in foo::bar. (This requires that the given namespace has already
been created, as explained under Namespaces in the Declarations section.)
- Operator and constant symbols: +, ==, not
For convenience, Pure also provides you with a limited means to extend the
syntax of the language with special operator and constant symbols by means
of a corresponding fixity declaration, as discussed in section Symbol
Declarations. Besides the usual infix, prefix and postfix operators,
Pure also provides outfix (bracket) and nonfix (constant) symbols. (Nonfix
symbols actually work more or less like ordinary identifiers, but the
nonfix attribute tells the compiler that when such a symbol
occurs on the left-hand side of an equation, it is always to be
interpreted as a literal constant, cf. Variables in Equations.)
Operator and constant symbols may take the form of an identifier or a
sequence of punctuation characters. They must always be declared before
use. Once declared, they are always special, and can’t be used as ordinary
identifiers any more. However, like in Haskell, by enclosing an operator
in parentheses, such as (+) or (not), you can turn it into an
ordinary function symbol. Also, operators and constant symbols can be
qualified with a namespace just like normal identifiers.
Note
The common operator symbols like +, -, *, /
etc. are all declared at the beginning of the prelude, see the
Pure Library Manual for a list of these. Arithmetic and relational operators
mostly follow C conventions. However, out of necessity (!, &
and | are used for other purposes in Pure) the logical and bitwise
operations, as well as the negated equality predicates are named a bit
differently: ~, && and || denote logical negation,
conjunction and disjunction, while the corresponding bitwise operations
are named not, and and or. Moreover, following these
conventions, inequality is denoted ~=. Also note that && and
|| are special forms which are evaluated in short-circuit mode (see
Special Forms below), whereas the bitwise connectives receive their
arguments using call-by-value, just like the other arithmetic
operations.
- Lists: [x,y,z], x:xs, x..y, x:y..z
Pure’s basic list syntax is the same as in Haskell, thus [] is the
empty list and x:xs denotes a list with head element x and tail
list xs. The infix constructor symbol ‘:‘ is declared in
the prelude. The usual syntactic sugar for list values in brackets is
provided, thus [x,y,z] is exactly the same as x:y:z:[].
There’s also a way to denote arithmetic sequences such as 1..5, which
denotes the list [1,2,3,4,5]. Haskell users should note the missing
brackets. In contrast to Haskell, Pure doesn’t use any special syntax for
arithmetic sequences, the ‘..‘ symbol is just an ordinary
infix operator declared and defined in the prelude. Sequences with
arbitrary stepsizes can be written by denoting the first two sequence
elements using the ‘:‘ operator, as in 1.0:1.2..3.0. To
prevent unwanted artifacts due to rounding errors, the upper bound in a
floating point sequence is always rounded to the nearest grid point. Thus,
e.g., 0.0:0.1..0.29 actually yields [0.0,0.1,0.2,0.3], as does
0.0:0.1..0.31.
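A short interactive illustration of arithmetic sequences:
> 1..5;
[1,2,3,4,5]
> 1:3..11;
[1,3,5,7,9,11]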
- Tuples: x,y,z
Pure’s tuples are a bit unusual: They are constructed by just “pairing”
things using the ‘,‘ operator, for which the empty tuple
() acts as a neutral element (i.e., (),x is just x, as is
x,()). Pairs always associate to the right, meaning that x,y,z ==
x,(y,z) == (x,y),z, where x,(y,z) is the normalized representation.
This implies that tuples are always flat, i.e., there are no nested tuples
(tuples of tuples); if you need such constructs then you should use lists
instead.
Note
Syntactically, tuples aren’t really primary expressions, but we
still include them here because they are closely related to lists which
are also defined in the prelude. Also, tuples are often used as a
simpler replacement for lists, in particular in function arguments and
return values, when no elaborate hierarchical structure is needed.
Also note that parentheses are generally only used to group expressions
and are not part of the tuple syntax in Pure, although they will be
needed to include a tuple in a list or matrix. E.g.,
[(1,2),3,(4,5)] is a three element list consisting of the tuple
1,2, the integer 3, and another tuple 4,5. Likewise,
[(1,2,3)] is a list with a single element, the tuple 1,2,3.
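For instance, here is how tuples flatten and how the empty tuple acts as a
neutral element (an interactive illustration):
> (1,2),3;
1,2,3
> (),1;
1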
- Matrices: {1.0,2.0,3.0}, {1,2;3,4}, {1L,y+1;foo,bar}
Pure also offers matrices, a kind of two-dimensional arrays, as a built-in
data structure which provides efficient storage and element access. These
work more or less like their Octave/MATLAB equivalents, but using curly
braces instead of brackets. As indicated, commas are used to separate the
columns of a matrix, semicolons for its rows. In fact, the {...}
construct is rather general and allows you to construct new matrices from
any collection of individual elements (“scalars”) and submatrices,
provided that all dimensions match up. Here, any expression which doesn’t
yield a matrix denotes a scalar, which is considered to be a 1x1 matrix
for the purpose of matrix construction. The comma arranges submatrices in
columns, while the semicolon arranges them in rows. So, if both x and
y are nxm matrices, then {x,y} becomes an n x
2*m matrix consisting of all the columns of x followed by all the
columns of y. Likewise, {x;y} becomes a 2*n x m matrix
(all the rows of x above of all rows of y). In addition, {...}
constructs can be nested to an arbitrary depth. Thus {{1;3},{2;4}} is
another way to write the 2x2 matrix {1,2;3,4} in a kind of
“column-major” format (however, internally all matrices are stored in C’s
row-major format).
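For instance (an interactive illustration of the nested matrix construction
described above):
> {1,2;3,4};
{1,2;3,4}
> {{1;3},{2;4}};
{1,2;3,4}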
Pure supports both numeric and symbolic matrices. The former are
homogeneous arrays of double, complex double or (machine) int matrices,
while the latter can contain any mixture of Pure expressions. Pure will
pick the appropriate type for the data at hand. If a matrix contains
values of different types, or Pure values which cannot be stored in a
numeric matrix, then a symbolic matrix is created instead (this also
includes the case of bigints, which are considered as symbolic values as
far as matrix construction is concerned). Numeric matrices use an internal
data layout that is fully compatible with the GNU Scientific Library
(GSL), and can readily be passed to GSL routines via the C interface. (The
Pure interpreter does not require GSL, however, so numeric matrices will
work even if GSL is not installed.)
More information about matrices and corresponding examples can be found in
the Examples section below.
Note
While the [...] and {...} constructs look superficially
similar, they work in very different ways. The former is just syntactic
sugar for a corresponding constructor term and can thus be used as a
pattern on the left-hand side of an equation, cf. Patterns. In
contrast, the latter is special syntax for a built-in operation which
creates objects of a special matrix type. Thus matrix expressions can
not be used as patterns (instead, matrix values can be matched as a
whole using the special matrix type tag).
- Comprehensions: [x,y | x=1..n; y=1..m; x<y], {f x | x=1..n}
Pure provides both list and matrix comprehensions as a convenient means to
construct list and matrix values from a “template” expression and one or
more “generator” and “filter” clauses. The former bind a pattern to values
drawn from a list or matrix, the latter are just predicates determining
which generated elements should actually be added to the result. Both list
and matrix comprehensions are in fact syntactic sugar for a combination of
nested lambdas, conditional expressions and “catmaps” (a collection of
operations which combine list or matrix construction and mapping a
function over a list or matrix, defined in the prelude), but they are
often much easier to write.
Matrix comprehensions work pretty much like list comprehensions, but
produce matrices instead of lists. List generators in matrix
comprehensions alternate between row and column generation so that most
common mathematical abbreviations carry over quite easily. Examples of
both kinds of comprehensions can be found in the Examples section below.
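For instance (small interactive illustrations of both kinds of
comprehensions):
> [x,y | x=1..3; y=1..3; x<y];
[(1,2),(1,3),(2,3)]
> {i*j | i=1..2; j=1..3};
{1,2,3;2,4,6}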
The rest of Pure’s expression syntax mostly revolves around the notion of
function applications. For convenience, Pure also allows you to declare pre-,
post-, out- and infix operator symbols, but these are in fact just syntactic
sugar for function applications; see Symbol Declarations for details.
Function and operator applications are used to combine primary expressions to
compound terms, also referred to as simple expressions; these are the data
elements which are manipulated by Pure programs.
As in other modern FPLs, function applications are written simply as
juxtaposition (i.e., in “curried” form) and associate to the left. This means
that in fact all functions only take a single argument. Multi-argument
functions are represented as chains of single-argument functions. For
instance, in f x y = (f x) y first the function f is applied to the
first argument x, yielding the function f x which in turn gets applied
to the second argument y. This makes it possible to derive new functions
from existing ones using partial applications which only specify some but
not all arguments of a function. For instance, taking the max function
from the prelude as an example, max 0 is the function which, for a given
x, returns x itself if it is nonnegative and zero otherwise. This
works because (max 0) x = max 0 x is the maximum of 0 and x.
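For instance, a partial application like max 0 can be passed around just
like any other value; here's a little made-up example which uses it as an
argument to the prelude's map function:
> map (max 0) [-3,2,-1,5];
[0,2,0,5]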
Note
The major advantage of having curried function applications is that,
without any further ado, functions become first-class objects. That is,
they can be passed around freely both as parameters and as function return
values. Functions which take other functions as arguments and/or yield them
as results are also known as higher-order functions (HOFs). Much of the
power of functional programming languages stems from this feature, so the
treatment of functions as first-class values is generally considered as one
of the defining characteristics of functional languages.
Operator applications are written using prefix, postfix, outfix or infix
notation, as the declaration of the operator demands, but are just ordinary
function applications in disguise. As already mentioned, enclosing an operator
in parentheses turns it into an ordinary function symbol, thus x+y is
exactly the same as (+) x y. For convenience, partial applications of
infix operators can also be written using so-called operator sections. A
left section takes the form (x+) which is equivalent to the partial
application (+) x. A right section takes the form (+x) and is
equivalent to the term flip (+) x. (This uses the flip combinator
from the prelude which is defined as flip f x y = f y x.) Thus (x+) y
is equivalent to x+y, while (+x) y reduces to y+x. For instance,
(1/) denotes the reciprocal and (+1) the successor function. (Note
that, in contrast, (-x) always denotes an application of unary minus; the
section (+-x) can be used to indicate a function which subtracts x
from its argument.)
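Here are a few made-up examples of sections in action:
> map (+1) (1..5);
[2,3,4,5,6]
> map (1/) [1,2,4];
[1.0,0.5,0.25]
> map (+-1) (1..5);
[0,1,2,3,4]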
Some special notations are provided for conditional expressions as well as
anonymous functions (lambdas) and blocks of local function and variable
definitions.
- Conditional expressions: if x then y else z
- Evaluates to y or z depending on whether x is “true” (i.e., a
nonzero integer). An exception is raised if the condition is not an
integer.
- Lambdas: \x -> y
- These denote anonymous functions and work pretty much like in Haskell.
Pure supports multiple-argument lambdas (e.g., \x y -> x*y), as well as
pattern-matching lambda abstractions which match one or more patterns
against the lambda arguments, such as \(x,y) -> x*y. An exception is
raised if the actual lambda arguments do not match the given patterns.
- Case expressions: case x of rule; ... end
- Matches an expression, discriminating over a number of different cases,
similar to the Haskell case construct. The expression x is matched in
turn against each left-hand side pattern in the rule list, and the first
pattern which matches x gives the value of the entire expression, by
evaluating the corresponding right-hand side with the variables in the
pattern bound to their corresponding values. An exception is raised if the
target expression doesn’t match any of the patterns.
- When expressions: x when rule; ... end
- An alternative way to bind local variables by matching a collection of
subject terms against corresponding patterns, similar to Aardappel‘s
when construct. A single binding such as x when u = v end is
equivalent to case v of u = x end, but the former is often more
convenient to write. A when clause may contain multiple definitions,
which are processed from left to right, so that later definitions may
refer to the variables in earlier ones. This is exactly the same as
several nested single definitions, with the first binding being the
“outermost” one.
- With expressions: x with rule; ... end
- Defines local functions. Like Haskell’s where construct, but it can be
used anywhere inside an expression (just like Aardappel’s where, but
Pure uses the keyword with which better lines up with case and
when). Several functions can be defined in a single with clause,
and the definitions can be mutually recursive and consist of as many
equations as you want. (Some small examples of all these constructs follow
below.)
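For instance, here are some little contrived specimens of each of these
constructs in action:
> if 5>0 then "positive" else "nonpositive";
"positive"
> (\x y -> x*y) 6 7;
42
> case [1,2,3] of x:_ = x; [] = 0 end;
1
> y+z when y,z = 10,20 end;
30
> square 5 with square x = x*x end;
25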
At the toplevel, a Pure program basically consists of rewriting rules (which
are used to define functions and macros), constant and variable definitions,
and expressions to be evaluated:
script ::= item*
item ::= "let" simple_rule ";"
       | "const" simple_rule ";"
       | "def" simple_rule ";"
       | rule ";"
       | expr ";"
(The syntax of the rule and simple_rule elements is
discussed in the Rule Syntax section below. Also, a few additional toplevel
elements are provided in the declaration syntax, see Declarations.)
-
lhs = rhs;
Rewriting rules always combine a left-hand side pattern (which must be a
simple expression) and a right-hand side (which can be any kind of Pure
expression described above). The same format is also used in
with, when and case expressions. In
toplevel rules, with and case expressions, this basic
form can also be augmented with a condition if guard tacked on to the
end of the rule, where guard is an integer expression which determines
whether the rule is applicable. Moreover, the keyword otherwise
may be used to denote an empty guard which is always true (this is
syntactic sugar to point out the “default” case of a definition; the
interpreter just treats this as a comment). Pure also provides some
abbreviations for factoring out common left-hand or right-hand sides in
collections of rules; see the Rule Syntax section for details.
-
def lhs = rhs;
A rule starting with the keyword def defines a macro
function. No guards or multiple left-hand and right-hand sides are
permitted here. Macro rules are used to preprocess expressions on the
right-hand side of other definitions at compile time, and are typically
employed to implement user-defined special forms and simple kinds of
optimization rules. See the Macros section below for details and examples.
-
let lhs = rhs;
Binds every variable in the left-hand side pattern to the corresponding
subterm of the right-hand side (after evaluating it). This works like a
when clause, but serves to bind global variables occurring free
on the right-hand side of other function and variable definitions (see the
example following this list).
-
const lhs = rhs;
An alternative form of let which defines constants rather than
variables. (These are not to be confused with nonfix symbols which simply
stand for themselves!) Like let, this construct binds the
variable symbols on the left-hand side to the corresponding values on the
right-hand side (after evaluation). The difference is that const
symbols can only be defined once, and thus their values do not change
during program execution. This also allows the compiler to apply some
special optimizations such as constant folding.
-
expr;
A singleton expression at the toplevel, terminated with a semicolon, simply
causes the given value to be evaluated (and the result to be printed, when
running in interactive mode).
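By way of illustration, here's a little made-up session showing a pattern
binding with let as well as a const definition (the limit and check
symbols are just for demonstration):
> let x:xs = 1..5;
> x, xs;
1,[2,3,4,5]
> const limit = 100;
> check n = n<=limit;
> check 50, check 200;
1,0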
A few remarks about the scope of identifiers and other symbols are in order
here. Like most modern functional languages, Pure uses lexical or static
binding for local functions and variables. What this means is that the binding
of a local name is completely determined at compile time by the surrounding
program text, and does not change as the program is being executed. In
particular, if a function returns another (anonymous or local) function, the
returned function captures the environment it was created in, i.e., it becomes
a (lexical) closure. For instance, the following function, when invoked with
a single argument x, returns another function which adds x to its
argument:
> foo x = bar with bar y = x+y end;
> let f = foo 99; f;
bar
> f 10, f 20;
109,119
This works the same no matter what other bindings of x may be in effect
when the closure is invoked:
> let x = 77; f 10, (f 20 when x = 88 end);
109,119
Global bindings of variable and function symbols work a bit differently,
though. Like many languages which are to be used interactively, Pure binds
global symbols dynamically, so that the bindings can be changed easily at
any time during an interactive session. This is mainly a convenience for
interactive usage, but works the same no matter whether the source code is
entered interactively or read from a script, in order to ensure
consistent behaviour between interactive and batch mode operation.
So, for instance, you can easily bind a global variable to a new value by just
entering a corresponding let command:
> foo x = c*x;
> foo 99;
c*99
> let c = 2; foo 99;
198
> let c = 3; foo 99;
297
This works pretty much like global variables in imperative languages, but note
that in Pure the value of a global variable can only be changed with a
let command at the toplevel. Thus referential transparency is
unimpaired; while the value of a global variable may change between different
toplevel expressions, it will always take the same value in a single
evaluation.
Similarly, you can also add new equations to an existing function at any
time:
> fact 0 = 1;
> fact n::int = n*fact (n-1) if n>0;
> fact 10;
3628800
> fact 10.0;
fact 10.0
> fact 1.0 = 1.0;
> fact n::double = n*fact (n-1) if n>1;
> fact 10.0;
3628800.0
> fact 10;
3628800
(In interactive mode, it is even possible to completely erase a definition,
see section Interactive Usage for details.)
So, while the meaning of a local symbol never changes once its definition has
been processed, toplevel definitions may well evolve while the program is
being processed, and the interpreter will always use the latest definitions at
a given point in the source when an expression is evaluated. This means that,
even in a script file, you have to define all symbols needed in an evaluation
before entering the expression to be evaluated.
Basically, the same rule syntax is used in all kinds of global and local
definitions. However, some constructs (specifically, when,
let, const and def) use a restricted rule
syntax where no guards or multiple left-hand and right-hand sides are
permitted. The syntax of these elements is captured by the following grammar
rules:
rule ::= pattern ("|" pattern)* "=" expr [guard]
         (";" "=" expr [guard])*
simple_rule ::= pattern "=" expr | expr
pattern ::= simple_expr
guard ::= "if" simple_expr
        | "otherwise"
        | guard "when" simple_rules "end"
        | guard "with" rules "end"
When matching against a function or macro call, or the subject term in a
case expression, the rules are always considered in the order in
which they are written, and the first matching rule (whose guard evaluates to
a nonzero value, if applicable) is picked. (Again, the when
construct is treated differently, because each rule is actually a separate
definition.)
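For instance, the following made-up definition relies on this order; the
last equation only applies if the guards of the two preceding equations
fail:
> sign x = 1 if x>0;
> sign x = -1 if x<0;
> sign _ = 0;
> map sign [-2,0,5];
[-1,0,1]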
The left-hand side of a rule is a special kind of simple expression, called a
pattern. Patterns consist of function and operator applications as well as
any of the “atomic” expression types (symbols, numbers, strings and list
values). Not permitted are any of the special expression types (lambda,
case, when, with, conditional expressions, as
well as list and matrix comprehensions). For technical reasons, the current
implementation also forbids matrix values in patterns, but it is possible to
match a matrix value as a whole using the matrix type tag, see below.
As already mentioned, the ‘_‘ symbol is special in patterns; it denotes
the anonymous variable which matches an arbitrary value (independently for
all occurrences) without actually binding a variable. For instance:
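foo _ _ = 0;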
This will match the application of foo to any combination of two
arguments (and just ignore the values of these arguments).
Constants in patterns must be matched literally. For instance:
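foo 0 = 1;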
This will only match an application of foo to the machine integer 0,
not 0.0 or 0L (even though these compare equal to 0 using the
‘==‘ operator).
In contrast to Haskell, patterns may contain repeated variables (other than
the anonymous variable), i.e., they may be non-linear. Thus rules like the
following are legal in Pure, and will only be matched if all occurrences of
the same variable in the left-hand side pattern are matched to the same
value:
> foo x x = x;
> foo 1 1;
1
> foo 1 2;
foo 1 2
Non-linear patterns are particularly useful for computer algebra where you
will frequently encounter rules such as the following:
> x*y+x*z = x*(y+z);
> a*(3*4)+a*5;
a*17
The notion of “sameness” employed here is that of syntactical identity, which
means that the matched subterms must be identical in structure and content.
The prelude provides syntactic equality as a function same and a
comparison predicate ‘===‘. Thus the above definition of foo
is roughly equivalent to the following:
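foo x y = x if same x y;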
It is important to note the differences between syntactic equality embodied by
same and ‘===‘, and the “semantic” equality operator
‘==‘. The former are always defined on all terms, whereas
‘==‘ is only available on data where it has been defined
explicitly, either in the prelude or by the programmer. Also note that
‘==‘ may assert that two terms are equal even if they are
syntactically different. Consider, e.g.:
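> 0==0.0;
1
> 0===0.0;
0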
This distinction is actually quite useful. It gives the programmer the
flexibility to define ‘==‘ in any way that he sees fit, which is
consistent with the way the other comparison operators like ‘<‘
and ‘>‘ are handled in Pure.
Patterns may also contain the following special elements which are not
permitted in right-hand side expressions:
- A Haskell-style “as” pattern of the form variable @ pattern
binds the given variable to the expression matched by the subpattern
pattern (in addition to the variables bound by pattern itself). This
is convenient if the value matched by the subpattern is to be used on the
right-hand side of an equation.
- A left-hand side variable (including the anonymous variable) may be followed
by a type tag of the form :: name, where name is either one
of the built-in type symbols int, bigint, double, string,
matrix, pointer, or an existing identifier denoting a custom
constructor symbol for a user-defined data type. The variable can then match
only values of the designated type. Thus, for instance, ‘x::int‘
only matches machine integers. See the Type Tags section below for
details.
To these ends, the expression syntax is augmented with the following grammar
rule (but note that this form of expression is in fact only allowed on the
left-hand side of a rule):
prim_expr ::= qualified_identifier
              ("::" qualified_identifier | "@" prim_expr)
As shown, both “as” patterns and type tags are primary expressions, and the
subpattern of an “as” pattern is a primary expression, too. Thus, if a
compound expression is to be used as the subpattern, it must be
parenthesized. For instance, the following function duplicates the head
element of a list:
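foo xs@(x:_) = x:xs;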
Note that if you accidentally forget the parentheses around the subpattern
x:_, you still get a syntactically correct definition:
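foo xs@x:_ = x:xs;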
But this gets parsed as (foo xs@x):_ = x:xs, which is most certainly not
what you want. It is thus a good idea to just always enclose the subpattern
with parentheses in order to prevent such glitches.
Another potential pitfall is that the notation foo::bar is also used to
denote “qualified symbols” in Pure, cf. Namespaces. Usually this will be
resolved correctly, but if foo happens to also be a valid namespace then
most likely you’ll get an error message about an undeclared symbol. You can
always work around this by adding spaces around the ‘::‘ symbol, as in
foo :: bar. Spaces are never permitted in qualified symbols, so this makes
it clear that the construct denotes a type tag.
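For instance, here's a little made-up definition which is restricted to
machine integers by means of a type tag; note that applying it to a double
simply leaves the expression unevaluated:
> twice x::int = 2*x;
> twice 5; twice 5.0;
10
twice 5.0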
The most general type of rule, used in function definitions and
case expressions, consists of a left-hand side pattern, a
right-hand side expression and an optional guard. The left-hand side of a rule
can be omitted if it is the same as for the previous rule. This provides a
convenient means to write out a collection of equations for the same left-hand
side which discriminates over different conditions:
lhs = rhs if guard;
    = rhs if guard;
    ...
    = rhs otherwise;
For instance:
fact n = n*fact (n-1) if n>0;
       = 1 otherwise;
Pure also allows a collection of rules with different left-hand sides but the
same right-hand side(s) to be abbreviated as follows:
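lhs1 | lhs2 | ... = rhs;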
This is useful if you need different specializations of the same rule which
use different type tags on the left-hand side variables. For instance:
fact n::int |
fact n::double |
fact n = n*fact(n-1) if n>0;
       = 1 otherwise;
In fact, the left-hand sides don’t have to be related at all, so that you can
also write something like:
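foo x | bar y = 0; // a contrived example with unrelated left-hand sides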
However, this construct is most useful when using an “as” pattern to bind a
common variable to a parameter value after checking that it matches one of
several possible argument patterns (which is slightly more efficient than
using an equivalent type-checking guard). E.g., the following definition binds
the xs variable to the parameter of foo, if it is either the empty
list or a list starting with an integer:
foo xs@[] | foo xs@(_::int:_) = ... xs ...;
The same construct also works in case expressions, which is
convenient if different cases should be mapped to the same value, e.g.:
case ans of "y" | "Y" = 1; _ = 0; end;
Sometimes it is useful if local definitions (when and
with) can be shared by the right-hand side and the guard of a
rule. This can be done by placing the local definitions behind the guard, as
follows (we only show the case of a single when clause here, but of
course there may be any number of when and with clauses
behind the guard):
lhs = rhs if guard when defns end;
Note that this is different from the following, which indicates that the
definitions only apply to the guard but not the right-hand side of the rule:
lhs = rhs if (guard when defns end);
Conversely, definitions placed before the guard only apply to the right-hand
side but not the guard (no parentheses are required in this case):
lhs = rhs when defns end if guard;
An example showing the use of a local variable binding spanning both the
right-hand side and the guard of a rule is the following quadratic equation
solver, which returns the (real) solutions of the equation x^2+p*x+q = 0
if the discriminant d = p^2/4-q is nonnegative:
> using math;
> solve p q = -p/2+sqrt d,-p/2-sqrt d if d>=0 when d = p^2/4-q end;
> solve 4 2; solve 2 4;
-0.585786437626905,-3.41421356237309
solve 2 4
Note that the above definition leaves the case of a negative discriminant
undefined.
As already mentioned, when, let and const use
a simplified kind of rule syntax which just consists of a left-hand and a
right-hand side separated by the equals sign. In this case the meaning of the
rule is to bind the variables in the left-hand side of the rule to the
corresponding subterms of the value of the right-hand side. This is also
called a pattern binding.
Guards or multiple left-hand or right-hand sides are not permitted in these
rules. However, it is possible to omit the left-hand side if it is just the
anonymous variable ‘_‘ by itself, indicating that you don’t care about the
result. The right-hand side is still evaluated, if only for its side-effects,
which is handy, e.g., for adding debugging statements to your code. For
instance, here is a variation of the quadratic equation solver which also
prints the discriminant after it has been computed:
> using math, system;
> solve p q = -p/2+sqrt d,-p/2-sqrt d if d>=0
> when d = p^2/4-q; printf "The discriminant is: %g\n" d; end;
> solve 4 2;
The discriminant is: 2
-0.585786437626905,-3.41421356237309
> solve 2 4;
The discriminant is: -3
solve 2 4
Note that simple rules of the same form lhs = rhs are also used in macro
definitions (def), to be discussed in the Macros section. In
this case, however, the rule denotes a real rewriting rule, not a pattern
binding, hence the left-hand side is mandatory in these rules.
Here are a few examples of simple Pure programs.
The factorial:
fact n = n*fact (n-1) if n>0;
       = 1 otherwise;
let facts = map fact (1..10); facts;
The Fibonacci numbers:
fib n = a when a,b = fibs n end
        with fibs n = 0,1 if n<=0;
                    = case fibs (n-1) of
                        a,b = b,a+b;
                      end;
        end;
let fibs = map fib (1..30); fibs;
It is worth noting here that Pure performs tail call optimization so that
tail-recursive definitions like the following will be executed in constant
stack space (see Stack Size and Tail Recursion in the Caveats and Notes
section for more details on this):
// tail-recursive factorial using an "accumulating parameter"
fact n = loop 1 n with
  loop p n = if n>0 then loop (p*n) (n-1) else p;
end;
Here is an example showing how constants are defined and used. Constant
definitions take pretty much the same form as variable definitions with
let (see above), but work more like the definition of a
parameterless function whose value is precomputed at compile time:
> extern double atan(double);
> const pi = 4*atan 1.0;
> pi;
3.14159265358979
> foo x = 2*pi*x;
> show foo
foo x = 6.28318530717959*x;
Note that the compiler normally computes constant subexpressions at compile
time, such as 2*pi in the foo function. This works with all simple
scalars (machine ints and doubles), see Constant Definitions for details.
List comprehensions are Pure’s main workhorse for generating and processing
all kinds of list values. Here’s a well-known example, a variation of
Eratosthenes’ classical prime sieve:
primes n = sieve (2..n) with
  sieve [] = [];
  sieve (p:qs) = p : sieve [q | q = qs; q mod p];
end;
(This definition is actually rather inefficient; there are much better,
albeit more complicated, implementations of this sieve.)
For instance:
> primes 100;
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
If you dare, you can actually have a look at the catmap-lambda-if-then-else
expression the comprehension expanded to:
> show primes
primes n = sieve (2..n) with sieve [] = []; sieve (p:qs) = p:sieve
(catmap (\q -> if q mod p then [q] else []) qs) end;
List comprehensions are also a useful device to organize backtracking
searches. For instance, here’s an algorithm for the n queens problem, which
returns the list of all placements of n queens on an n x n board (encoded as
lists of n pairs (i,j) with i = 1..n), so that no two queens hold each other
in check:
queens n = search n 1 [] with
  search n i p = [reverse p] if i>n;
               = cat [search n (i+1) ((i,j):p) | j = 1..n; safe (i,j) p];
  safe (i,j) p = ~any (check (i,j)) p;
  check (i1,j1) (i2,j2)
    = i1==i2 || j1==j2 || i1+j1==i2+j2 || i1-j1==i2-j2;
end;
(Again, this algorithm is rather inefficient, see the examples included in the
Pure distribution for a much better algorithm by Libor Spacek.)
As already mentioned, lists can also be evaluated in a “lazy” fashion, by just
turning the tail of a list into a future. This special kind of list is also
called a stream. Streams enable you to work with infinite lists (or finite
lists which are so huge that you would never want to keep them in memory in
their entirety). E.g., here’s one way to define the infinite stream of all
Fibonacci numbers:
> let fibs = fibs 0L 1L with fibs a b = a : fibs b (a+b) & end;
> fibs;
0L:#<thunk 0xb5d54320>
Note the & on the tail of the list in the definition of the local
fibs function. This turns the result of fibs into a stream, which is
required to prevent the function from recursing into samadhi. Also note that
we work with bigints in this example because the Fibonacci numbers grow quite
rapidly, so with machine integers the values would soon start wrapping around
to negative integers.
Streams like these can be worked with in pretty much the same way as with
lists. Of course, care must be taken not to invoke “eager” operations such as
# (which computes the size of a list) on infinite streams, to prevent
infinite recursion. However, many list operations work with infinite streams
just fine, and return the appropriate stream results. E.g., the take
function (which retrieves a given number of elements from the front of a list)
works with streams just as well as with “eager” lists:
> take 10 fibs;
0L:#<thunk 0xb5d54350>
Hmm, not much progress there, but that’s just how streams work (or rather they
don’t, they’re lazy bums indeed!). Nevertheless, the stream computed with
take is in fact finite and we can readily convert it to an ordinary
list, forcing its evaluation:
> list (take 10 fibs);
[0L,1L,1L,2L,3L,5L,8L,13L,21L,34L]
An easier way to achieve this is to cut a “slice” from the stream:
> fibs!!(0..10);
[0L,1L,1L,2L,3L,5L,8L,13L,21L,34L,55L]
Also note that since we bound the stream to a variable, the already computed
prefix of the stream has been memoized, so that this portion of the stream is
now readily available in case we need to have another look at it later. By
these means, possibly costly reevaluations are avoided, trading memory for
execution speed:
> fibs;
0L:1L:1L:2L:3L:5L:8L:13L:21L:34L:55L:#<thunk 0xb5d54590>
Let’s take a look at some of the other convenience operations for generating
stream values. The prelude defines infinite arithmetic sequences, using
inf or -inf to denote an upper (or lower) infinite bound for the
sequence, e.g.:
> let u = 1..inf; let v = -1.0:-1.2..-inf;
> u!!(0..10); v!!(0..10);
[1,2,3,4,5,6,7,8,9,10,11]
[-1.0,-1.2,-1.4,-1.6,-1.8,-2.0,-2.2,-2.4,-2.6,-2.8,-3.0]
Other useful stream generator functions are iterate, which keeps
applying the same function over and over again, repeat, which just
repeats its argument forever, and cycle, which cycles through the
elements of the given list:
> iterate (*2) 1!!(0..10);
[1,2,4,8,16,32,64,128,256,512,1024]
> repeat 1!!(0..10);
[1,1,1,1,1,1,1,1,1,1,1]
> cycle [0,1]!!(0..10);
[0,1,0,1,0,1,0,1,0,1,0]
Moreover, list comprehensions can draw values from streams and return the
appropriate stream result:
> let rats = [m,n-m | n=2..inf; m=1..n-1; gcd m (n-m) == 1]; rats;
(1,1):#<thunk 0xb5d54950>
> rats!!(0..10);
[(1,1),(1,2),(2,1),(1,3),(3,1),(1,4),(2,3),(3,2),(4,1),(1,5),(5,1)]
Finally, let’s rewrite our prime sieve so that it generates the infinite
stream of all prime numbers:
all_primes = sieve (2..inf) with
  sieve (p:qs) = p : sieve [q | q = qs; q mod p] &;
end;
Note that we can omit the empty list case of sieve here, since the sieve
now never becomes empty. Example:
> let P = all_primes;
> P!!(0..20);
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73]
> P!299;
1987
You can also just print the entire stream. This will run forever, so hit
Ctrl-c when you get bored:
> using system;
> do (printf "%d\n") all_primes;
2
3
5
...
(Make sure that you really use the all_primes function instead of the
P variable to print the stream. Otherwise, because of memoization the
stream stored in P will grow with the number of elements printed until
memory is exhausted. Calling do on a fresh instance of the stream of
primes allows do to get rid of each “cons” cell after having printed
the corresponding stream element.)
Pure offers a number of basic matrix operations, such as matrix construction,
indexing, slicing, as well as getting the size and dimensions of a matrix
(these are briefly described in the Standard Library section). However, it
does not supply built-in support for matrix arithmetic and other linear
algebra algorithms. The idea is that these can and should be provided through
separate libraries (please check the Pure website for the pure-gsl module
which is an ongoing project to provide a full GSL interface for the Pure
language).
But Pure’s facilities for matrix and list processing also make it easy to roll
your own, if desired. First, the prelude provides matrix versions of the
common list operations like map, fold, zip etc., which
provide a way to implement common matrix operations. E.g., multiplying a
matrix x with a scalar a amounts to mapping the function (a*) to
x, which can be done as follows:
> a * x::matrix = map (a*) x if ~matrixp a;
> 2*{1,2,3;4,5,6};
{2,4,6;8,10,12}
Likewise, matrix addition and other element-wise operations can be realized
using zipwith, which combines corresponding elements of two matrices
using a given binary function:
> x::matrix + y::matrix = zipwith (+) x y;
> {1,2,3;4,5,6}+{1,2,1;3,2,3};
{2,4,4;7,7,9}
Second, matrix comprehensions make it easy to express a variety of algorithms
which would typically be implemented using for loops in conventional
programming languages. To illustrate the use of matrix comprehensions, here is
how we can define an operation to create a square identity matrix of a given
dimension:
> eye n = {i==j | i = 1..n; j = 1..n};
> eye 3;
{1,0,0;0,1,0;0,0,1}
Note that the i==j term is just a Pure idiom for the Kronecker
symbol. Another point worth mentioning here is that the generator clauses of
matrix comprehensions alternate between row and column generation
automatically, if values are drawn from lists as in the example above. (More
precisely, the last generator, which varies most quickly, yields a row, the
next-to-last one a column of these row vectors, and so on.) This makes matrix
comprehensions resemble customary mathematical notation very closely.
Of course, matrix comprehensions can also draw values from other matrices
instead of lists. In this case the block layout of the component matrices is
preserved. For instance:
> {x,y|x={1,2};y={a,b;c,d}};
{(1,a),(1,b),(2,a),(2,b);(1,c),(1,d),(2,c),(2,d)}
Note that a matrix comprehension involving filters may fail because the
filtered result isn’t a rectangular matrix any more. E.g.,
{2*x|x={1,2,3,-4};x>0} works, as does {2*x|x={-1,2;3,-4};x>0}, but
{2*x|x={1,2;3,-4};x>0} doesn’t because the rows of the result matrix have
different lengths.
As a slightly more comprehensive example (no pun intended!), here is a
definition of matrix multiplication in Pure. The building block here is the
“dot” product of two vectors which can be defined as follows:
> sum = foldl (+) 0;
> dot x::matrix y::matrix = sum $ zipwith (*) (rowvector x) (rowvector y);
> dot {1,2,3} {1,0,1};
4
The general matrix product now boils down to a simple matrix comprehension
which just computes the dot product of all rows of x with all columns of
y (the rows and cols functions are prelude operations
found in matrices.pure):
> x::matrix * y::matrix = {dot u v | u = rows x; v = cols y};
> {0,1;1,0;1,1}*{1,2,3;4,5,6};
{4,5,6;1,2,3;5,7,9}
(For the sake of simplicity, this doesn’t do much error checking. In
production code you’d check at least the conformance of matrix dimensions, of
course.)
Well, that was easy. So let’s take a look at a more challenging example,
Gaussian elimination, which can be used to solve systems of linear
equations. The algorithm brings a matrix into “row echelon” form, a
generalization of triangular matrices. The resulting system can then be solved
quite easily using back substitution.
Here is a Pure implementation of the algorithm. Note that the real meat is in
the pivoting and elimination step (step function) which is iterated over
all columns of the input matrix. In each step, x is the current matrix,
i the current row index, j the current column index, and p keeps
track of the current permutation of the row indices performed during
pivoting. The algorithm returns the updated matrix x, row index i and
row permutation p.
gauss_elimination x::matrix = p,x
  when n,m = dim x; p,_,x = foldl step (0..n-1,0,x) (0..m-1) end;

// One pivoting and elimination step in column j of the matrix:
step (p,i,x) j
  = if max_x==0 then p,i,x
    else
      // updated row permutation and index:
      transp i max_i p, i+1,
      {// the top rows of the matrix remain unchanged:
       x!!(0..i-1,0..m-1);
       // the pivot row, divided by the pivot element:
       {x!(i,l)/x!(i,j) | l=0..m-1};
       // subtract suitable multiples of the pivot row:
       {x!(k,l)-x!(k,j)*x!(i,l)/x!(i,j) | k=i+1..n-1; l=0..m-1}}
  when
    n,m = dim x; max_i, max_x = pivot i (col x j);
    x = if max_x>0 then swap x i max_i else x;
  end with
    pivot i x = foldl max (0,0) [j,abs (x!j)|j=i..#x-1];
    max (i,x) (j,y) = if x<y then j,y else i,x;
  end;
Please refer to any good textbook on numerical mathematics for a closer
description of the algorithm. But here is a brief rundown of what happens in
each elimination step: First we find the pivot element in column j of the
matrix. (We’re doing partial pivoting here, i.e., we only look for the element
with the largest absolute value in column j, starting at row i. That’s
usually good enough to achieve numerical stability.) If the pivot is zero then
we’re done (the rest of the pivot column is already zeroed out). Otherwise, we
bring it into the pivot position (swapping row i and the pivot row),
divide the pivot row by the pivot, and subtract suitable multiples of the
pivot row to eliminate the elements of the pivot column in all subsequent
rows. Finally we update i and p accordingly and return the result.
In order to complete the implementation, we still need the following little
helper functions to swap two rows of a matrix (this is used in the pivoting
step) and to apply a transposition to a permutation (represented as a list):
swap x i j = x!!(transp i j (0..n-1),0..m-1) when n,m = dim x end;
transp i j p = [p!tr k | k=0..#p-1]
  with tr k = if k==i then j else if k==j then i else k end;
Finally, let us define a convenient print representation of double matrices a
la Octave (the meaning of the __show__ function is explained in The
__show__ Function):
using system;
__show__ x::matrix
  = strcat [printd j (x!(i,j))|i=0..n-1; j=0..m-1] + "\n"
    with printd 0 = sprintf "\n%10.5f"; printd _ = sprintf "%10.5f" end
    when n,m = dim x end if dmatrixp x;
Example:
> let x = dmatrix {2,1,-1,8; -3,-1,2,-11; -2,1,2,-3};
> x; gauss_elimination x;
   2.00000   1.00000  -1.00000   8.00000
  -3.00000  -1.00000   2.00000 -11.00000
  -2.00000   1.00000   2.00000  -3.00000
[1,2,0],
   1.00000   0.33333  -0.66667   3.66667
   0.00000   1.00000   0.40000   2.60000
   0.00000   0.00000   1.00000  -1.00000
As already mentioned, matrices may contain not just numbers but any kind of
Pure value, in which case they become symbolic matrices. Symbolic matrices
are a convenient data structure for storing arbitrary collections of values
which provides fast random access to its members. In particular, symbolic
matrices can also be nested, and thus arrays of arbitrary dimension can be
realized as nested symbolic vectors. However, you have to be careful when
constructing such values, as the {...} construct normally combines
submatrices to larger matrices. For instance:
> {{1,2},{3,4}};
{1,2,3,4}
One way to inhibit this “splicing” of the submatrices in a larger matrix is to
use the “quote” operator (cf. The Quote):
> '{{1,2},{3,4}};
{{1,2},{3,4}}
(Note that this result is really different from {1,2;3,4}. The latter is a
2x2 integer matrix, while the former is a symbolic vector a.k.a. 1x2 matrix
whose elements happen to be two integer vectors.)
Unfortunately, the quote operator in fact inhibits evaluation of all
embedded subterms which may be undesirable if the matrix expression contains
arithmetic (as in '{{1+1,2*3}}), so this method works best for constant
matrices. A more general way to create a symbolic vector of matrices is
provided by the vector function from the prelude, which is applied to
a list of the vector elements as follows:
> vector [{1,2},{3,4}];
{{1,2},{3,4}}
Calls to the vector function can be nested to an arbitrary depth to
obtain higher-dimensional “arrays”:
> vector [vector [{1,2}],vector [{3,4}]];
{{{1,2}},{{3,4}}}
This obviously becomes a bit unwieldy for higher dimensions, but in Pure you
can easily define yourself some more convenient notation if you like. For
instance, the following macro may be used to define a pair of “non-splicing”
vector brackets:
> outfix {: :};
> def {: xs@(_,_) :} = vector (list xs);
> def {: x :} = vector [x];
> {:{:{1,2}:},{:{3,4}:}:};
{{{1,2}},{{3,4}}}
(Both macros and outfix symbol declarations are described later in
the appropriate sections, see Macros and Symbol Declarations.)
Symbolic matrices also provide a means to represent simple record-like data,
by encoding records as symbolic vectors consisting of “hash pairs” of the form
key => value. This kind of data structure is very convenient to represent
aggregates with lots of different components. Since the components of records
can be accessed by indexing with key values, you don’t have to remember which
components are stored in which order, just knowing the keys of the required
members is enough. In contrast, tuples, lists and other kinds of constructor
terms quickly become unwieldy for such purposes.
The keys used for indexing the record data must be either symbols or strings,
while the corresponding values may be arbitrary Pure values. The prelude
provides some operations on these special kinds of matrices, which let you
retrieve vector elements by indexing and perform non-destructive updates, see
the Record Functions section in the Pure Library Manual for details. Here
are a few examples which illustrate how to create records and work with them:
> let r = {x=>5, y=>12};
> recordp r, member r x;
1,1
> r!y; r!![y,x];
12
{12,5}
> insert r (x=>99);
{x=>99,y=>12}
> insert ans (z=>77);
{x=>99,y=>12,z=>77}
> delete ans z;
{x=>99,y=>12}
Note the use of the “hash rocket” => which denotes the key=>value
associations in a record. The hash rocket is a constructor declared as an
infix operator in the prelude, see the Prelude section in the
Pure Library Manual. There’s one caveat here, however. Since neither ‘=>‘ nor
‘!‘ treats its key operand in a special way, you’ll have to take care
that the key symbols do not evaluate to something else, as might be the case
if they are bound to a global or local variable or parameterless function:
> let u = 99;
> {u=>u};
{99=>99}
In the case of global variables and function symbols, you might also protect
the symbol with a quote (see The Quote):
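> {'u=>u};
{u=>99}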
However, even the quote doesn’t save you from local variable substitution, so
you’ll just have to rename the local variable in such cases:
> {'u=>u} when u = 99 end;
{99=>99}
> {'u=>v} when v = 99 end;
{u=>99}
It’s also possible to use strings as keys instead, which may actually be more
convenient in some cases:
> let r = {"x"=>5, "y"=>12};
> keys r; vals r;
{"x","y"}
{5,12}
> update r "y" (r!"y"+1);
{"x"=>5,"y"=>13}
You can also mix strings and symbols as keys in the same record (but note that
strings and symbols are always distinct, so y and "y" are really two
different keys here):
> insert r (y=>99);
{"x"=>5,"y"=>12,y=>99}
As records are in fact just special kinds of matrices, the standard matrix
operations can be used on record values as well. For instance, the matrix
constructor provides an alternative way to quickly augment a record with a
collection of new key=>value associations:
> let r = {x=>5, y=>12};
> let r = {r, x=>7, z=>3}; r;
{x=>5,y=>12,x=>7,z=>3}
> r!x, r!z;
7,3
> delete r x;
{x=>5,y=>12,z=>3}
> ans!x;
5
As the example shows, this may produce duplicate keys, but these are handled
gracefully; indexing and updates will always work with the last association
for a given key in the record. If necessary, you can remove duplicate entries
from a record as follows; this will only keep the last association for each
key:
> record r;
{x=>7,y=>12,z=>3}
In fact, the record operation not only removes duplicates, but also
orders the record entries by keys. This produces a kind of normalized
representation which is useful if you want to compare or combine two record
values irrespective of the ordering of the fields. For instance:
> record {x=>5, y=>12} === record {y=>12, x=>5};
1
The record function can also be used to construct a normalized record
directly from a list or tuple of hash pairs:
> record [x=>5, x=>7, y=>12];
{x=>7,y=>12}
Other matrix operations such as map, foldl, etc., and matrix
comprehensions can be applied to records just as easily. This enables you to
perform bulk updates of record data in a straightforward way. For instance,
here’s how you can define a function maprec which applies a function to
all values stored in a record:
> maprec f = map (\(u=>v) -> u=>f v);
> maprec (*2) {x=>5,y=>12};
{x=>10,y=>24}
Another example: The following ziprec function collects pairs of values
stored under common keys in two records (we also normalize the result here so
that duplicate keys are always removed):
> ziprec x y = record {u=>(x!u,y!u) | u = keys x; member y u};
> ziprec {a=>3,x=>5,y=>12} {x=>10,y=>24,z=>7};
{x=>(5,10),y=>(12,24)}
Thus the full power of generic matrix operations is available for records,
which turns them into a very versatile data structure, much more powerful than
records in conventional programming languages which are usually limited to
constructing records and accessing or modifying their components. Note that
since the values stored in records can be arbitrary Pure values, you can also
have mutable records by making use of Pure’s expression references (see
Expression References in the library manual). And of course records can
be nested, too:
> let r = {a => {b=>1,c=>2}, b => 2};
> r!a, r!b, r!a!b;
{b=>1,c=>2},2,1
As already mentioned in Special Forms, the quote operation quotes
an expression, so that it can be passed around and manipulated freely until
its value is needed, in which case you can pass it to the eval
function to obtain its value. For instance:
> let x = '(2*42+2^12); x;
2*42+2^12
> eval x;
4180.0
The quote also inhibits evaluation inside matrix values, including the
“splicing” of embedded submatrices:
> '{1,2+3,2*3};
{1,2+3,2*3}
> '{1,{2,3},4};
{1,{2,3},4}
Lisp programmers will be well familiar with this operation which enables some
powerful metaprogramming techniques. However, there are some notable
differences to Lisp’s quote. First, quote only inhibits the evaluation
of global variables, local variables are substituted as usual:
> (\x -> '(2*x+1)) 99;
2*99+1
> foo x = '(2*x+1);
> foo 99; foo $ '(7/y);
2*99+1
2*(7/y)+1
> '(x+1) when x = '(2*3) end;
2*3+1
> '(2*42+2^n) when n = 12 end;
2*42+2^12
This may come as a surprise (or even annoyance) to real Lisp weenies, but it
does have its advantages. In particular, it makes it easy to fill in the
variable parts in a quoted “template” expression, without any need for an
arguably complex tool like Lisp’s “quasiquote”. (It is quite easy to define
quasiquote in Pure if you want it, however. See the Recursive Macros
section for a simplified version; a full implementation can be found in the
Pure library sources.)
Another useful feature of Lisp’s quasiquote is the capability to splice
arguments into a function application. It is possible to achieve pretty much
the same in Pure with the following variation of the $ operator which
“curries” its second (tuple) operand:
infixr 0 $@ ;
f $@ () = f;
f $@ (x,xs) = f x $@ xs;
f $@ x = f x;
Now you can write, e.g.:
> '(foo 1 2) $@ '(2/3,3/4);
foo 1 2 (2/3) (3/4)
Second, it is in fact possible to perform arbitrary computations right in the
middle of a quoted expression. This is because quote only ever quotes
simple expressions, embedded special expressions (conditionals, lambda
and the case, when and with constructs) are
evaluated as usual. Example:
> '(x+1 when x = '(2*3) end);
2*3+1
> '(2*42+(2^n when n = 2*6 end));
2*42+4096.0
The downside of this is that there is no way to quote special expressions.
Macro expansion is inhibited in quoted expressions, however, so it is
possible to work around this limitation by defining a custom special form (see
Macros) to be used as a symbolic representation for, say, a lambda
expression, which reduces to a real lambda when evaluated. To these ends, the
eval function can be invoked with a string argument as follows:
def lambda x y = eval $ "\\ "+str ('x)+" -> "+str ('y);
Example:
> let l = 'lambda x (x+1); l;
lambda x (x+1)
> let f = eval l; f; f 9;
#<closure 0x7fdc3ca45be8>
10
Other special constructs, such as case, when and
with can be handled in a similar fashion.
Pure is a very terse language by design. Usually you don’t declare much stuff,
you just define it and be done with it. However, there are a few constructs
which let you declare symbols with special attributes and manage programs
consisting of several source modules:
- symbol declarations determine “scope” and “fixity” of a symbol;
- extern declarations specify external C functions (described in
the C Interface section);
- using clauses let you include other scripts in a Pure script;
- namespace declarations let you avoid name clashes and thereby
make it easier to manage large programs consisting of many separate modules.
These are toplevel elements (cf. Toplevel):
item ::= symbol_decl | extern_decl | using_decl | namespace_decl
The syntax of each of these is described in the following subsections, except
extern_decl which can be found in the C Interface section.
symbol_decl ::= scope symbol+ ";"
              | [scope] fixity symbol+ ";"
scope ::= "public" | "private"
fixity ::= "nonfix" | "outfix"
         | ("infix"|"infixl"|"infixr"|"prefix"|"postfix") precedence
precedence ::= integer | "(" op ")"
Scope declarations take the following form:
-
public symbol ...;
-
private symbol ...;
This declares the listed symbols as public or private, respectively. Each
symbol must either be an identifier or a sequence of punctuation
characters. The latter kind of symbols must always be declared before use,
whereas ordinary identifiers can be used without a prior declaration in which
case they are declared implicitly and default to public scope, meaning that
they are visible everywhere in a program. An explicit public declaration of
ordinary identifiers is thus rarely needed (unless you want to declare symbols
as members of a specific namespace, see Namespaces below). Symbols can also
be declared private, meaning that the symbol is visible only in the namespace
it belongs to. This is explained in more detail under Private Symbols in
the Namespaces section below.
Note that to declare several symbols in a single declaration, you can list
them all with whitespace in between. The same syntax applies to the other
types of symbol declarations discussed below. (Commas are not allowed as
delimiters here, as they may occur as legal symbol constituents in the list of
symbols.) The public and private keywords can also be
used as a prefix in any of the special symbol declarations discussed below, to
specify the scope of the declared symbols (if the scope prefix is omitted, it
defaults to public).
The following “fixity” declarations are available for introducing special
operator and constant symbols. This changes the way that these symbols are
parsed and thus provides you with a limited means to extend the Pure language
at the lexical and syntactical level.
-
infix level symbol ...;
-
infixl level symbol ...;
-
infixr level symbol ...;
-
prefix level symbol ...;
-
postfix level symbol ...;
Pure provides you with a theoretically unlimited number of different
precedence levels for user-defined infix, prefix and postfix operators.
Precedence levels are numbered starting at 0; larger numbers indicate higher
precedence. (For practical reasons, the current implementation does require
that precedence numbers can be encoded as 24 bit unsigned machine integers,
giving you a range from 0 to 16777215, but this should be large enough to
incur no real limitations on applications. Also, the operator declarations in
the prelude have been set up to leave enough “space” between the “standard”
levels so that you can easily sneak in new operator symbols at low, high or
intermediate precedences.)
On each precedence level, you can declare (in order of increasing precedence)
infix (binary non-associative), infixl (binary
left-associative), infixr (binary right-associative),
prefix (unary prefix) and postfix (unary postfix)
operators. For instance, here is a typical excerpt from the prelude (the full
table can be found in the Prelude section of the Pure Library Manual):
infix 1800 < > <= >= == ~= ;
infixl 2200 + - ;
infixl 2300 * / div mod ;
infixr 2500 ^ ;
prefix 2600 # ;
Instead of denoting the precedence by an explicit integer value, you can also
specify an existing operator symbol enclosed in parentheses. Thus the
following declaration gives the ++ operator the same precedence as +:
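infixl (+) ++ ;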
The given symbol may be of a different fixity than the declaration, but it
must have a proper precedence level (i.e., it must be an infix, prefix or
postfix symbol). E.g., the following declaration gives ^^ the same
precedence level as the infix ^ symbol, but turns it into a postfix
operator:
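postfix (^) ^^ ;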
Pure also provides unary outfix operators, which work like in Wm Leler’s
constraint programming language Bertrand. These can be declared as follows:
-
outfix left right ...;
Outfix operators let you define your own bracket structures. The operators
must be given as pairs of matching left and right symbols (which must be
distinct). For instance:
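outfix |: :| BEGIN END;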
After this declaration you can write bracketed expressions like |:x:| or
BEGIN foo, bar END. These are always at the highest precedence level
(i.e., syntactically they work like parenthesized expressions). Just like
other operators, you can turn outfix symbols into ordinary functions by
enclosing them in parentheses, but you have to specify the symbols in matching
pairs, such as (BEGIN END).
Pure also has a notation for “nullary” operators, i.e., “operators without
operands”, which are used to denote special constants. These are introduced
using a nonfix declaration:
-
nonfix symbol ...;
For instance:
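nonfix red green blue;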
Syntactically, these work just like ordinary identifiers, so they may stand
wherever an identifier is allowed (no parentheses are required to “escape”
them). The difference to ordinary identifiers is that nonfix symbols are
always interpreted as literals, even if they occur in a variable position on
the left-hand side of a rule. So, with the above declaration, you can write
something like:
> foo x = case x of red = green; green = blue; blue = red end;
> map foo [red,green,blue];
[green,blue,red]
Thus nonfix symbols are pretty much like nullary constructor symbols in
languages like Haskell. Non-fixity is just a syntactic attribute,
however. Pure doesn’t enforce that such values are really “constant”, so you
can still write a “constructor equation” like the following:
> red = blue;
> map foo [red,green,blue];
[blue,blue,blue]
Examples for all types of symbol declarations can be found in the
prelude which declares a bunch of standard (arithmetic,
relational, logical) operator symbols as well as the list and pair
constructors ‘:‘ and ‘,‘, and a few nonfix symbols (mostly for
denoting different kinds of exceptions).
One final thing worth noting here is that unary minus plays a special role in
the syntax. Like in Haskell and following mathematical tradition, unary minus
is the only prefix operator symbol which is also used as an infix operator,
and is always on the same precedence level as binary minus, whose precedence
may be chosen freely in the prelude. (The minus operator is the only symbol
which gets that special treatment; all other operators must have distinct
lexical representations.) Thus, with the standard prelude, -x+y will be
parsed as (-x)+y, whereas -x*y is the same as -(x*y). Also note
that the notation (-) always denotes the binary minus operator; the unary
minus operation can be denoted using the built-in neg function.
using_decl ::= "using" name ("," name)* ";"
name ::= qualified_identifier | string
While Pure doesn’t offer separate compilation, the using
declaration provides a simple but effective way to assemble a Pure program
from several source modules. It takes the following form (note that in
contrast to symbol declarations, the comma is used as a delimiter symbol
here):
-
using name, ...;
This causes each given script to be included in the Pure program at the given
point (if it wasn’t already included before), which makes available all the
definitions of the included script in your program. Note that each included
script is loaded only once, when the first using clause for the
script is encountered. Nested imports are allowed, i.e., an imported module
may itself import other modules, etc. A Pure program is then basically the
concatenation of all the source modules given as command line arguments, with
other modules listed in using clauses inserted at the corresponding
source locations.
(The using clause also has an alternative form which allows dynamic
libraries and LLVM bitcode modules to be loaded, this will be discussed in the
C Interface section.)
For instance, the following declaration causes the math.pure script from the
standard library to be included in your program:
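using math;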
You can also import multiple scripts in one go:
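using array, dict, set;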
Moreover, Pure provides a notation for qualified module names which can be
used to denote scripts located in specific package directories, e.g.:
using examples::libor::bits;
In fact this is equivalent to the following using clause which
spells out the real filename of the script between double quotes (the
.pure suffix can also be omitted in which case it is added
automatically):
using "examples/libor/bits.pure";
Both notations can be used interchangeably; the former is usually more
convenient, but the latter allows you to denote scripts whose names aren’t
valid Pure identifiers.
Script identifiers are translated to the corresponding filenames by replacing
the ‘::‘ symbol with the pathname separator ‘/‘ and tacking on the
‘.pure‘ suffix. The following table illustrates this with a few examples.
Script identifier          Filename
math                       "math.pure"
examples::libor::bits      "examples/libor/bits.pure"
::pure::examples::hello    "/pure/examples/hello.pure"
Note the last example, which shows how an absolute pathname can be denoted
using a qualifier starting with ‘::‘.
Unless an absolute pathname is given, the interpreter performs a search to
locate the script. The search algorithm considers the following directories in
the given order:
- the directory of the current script, which is the directory of the script
containing the using clause, or the current working directory if
the clause was read from standard input (as is the case, e.g., in an
interactive session);
- the directories named in -I options on the command line (in the
given order);
- the colon-separated list of directories in the PURE_INCLUDE
environment variable (in the given order);
- finally the directory named by the PURELIB environment variable.
Note that the current working directory is not searched by default (unless the
using clause is read from standard input), but of course you can
force this by adding the option -I. to the command line, or by
including ‘.’ in the PURE_INCLUDE variable.
The directory of the current script (the first item above) can be skipped by
specifying the script to be loaded as a filename in double quotes, prefixed
with the special sys: tag. The search then starts with the “system”
directories (-I, PURE_INCLUDE and PURELIB)
instead. This is useful, e.g., if you want to provide your own custom version
of a standard library script which in turn imports that library script. For
instance, a custom version of math.pure might employ the following
using clause to load the math.pure script from the Pure library:
using "sys:math";
// custom definitions go here
log2 x = ln x/ln 2;
The interpreter compares script names (to determine whether two scripts are
actually the same) by using the canonicalized full pathname of the script,
following symbolic links to the destination file (albeit only one level). Thus
different scripts with the same basename, such as foo/utils.pure and
bar/utils.pure can both be included in the same program (unless they link to
the same file).
More precisely, canonicalizing a pathname involves the following steps:
- relative pathnames are expanded to absolute ones, using the search rules
discussed above;
- the directory part of the pathname is normalized to the form returned by the
getcwd system call;
- the ‘.pure‘ suffix is added if needed;
- if the resulting script name is actually a symbolic link, the interpreter
follows that link to its destination, albeit only one level. (This is only
done on Unix-like systems.)
The directory of the canonicalized pathname is also used when searching other
scripts included in a script. This makes it possible to have an executable
script with a shebang line in its own directory, which is then executed via a
symbolic link placed on the system PATH. In this case the script
search performed in using clauses will use the real script
directory and thus other required scripts can be located there. This is the
recommended practice for installing standalone Pure applications in source
form which are to be run directly from the shell.
namespace_decl ::= "namespace" [name] ";"
                 | "namespace" name "with" item+ "end" ";"
                 | "using" "namespace" [name_spec ("," name_spec)*] ";"
name_spec ::= name ["(" symbol+ ")"]
To facilitate modular development, Pure also provides namespaces as a means to
avoid name clashes between symbols, and to keep the global namespace tidy and
clean. Namespaces serve as containers holding groups of related identifiers
and other symbols. Inside each namespace, symbols must be unique, but the same
symbol may be used to denote different objects (variables, functions, etc.) in
different namespaces. (Pure’s namespace system was heavily inspired by C++ and
works in a very similar fashion. So if you know C++ you should feel right at
home, and skimming this section to pick up Pure’s syntax for the namespace
constructs should be all you need to start using them.)
The global namespace is always available. By default, new symbols are created
in this namespace, which is also called the default namespace. Additional
namespaces can be created with the namespace declaration, which
also switches to the given namespace (makes it the current namespace), so
that new symbols are then created in that namespace rather than the default
one. The current namespace also applies to all kinds of symbol declarations,
including operator and constant symbol declarations, as well as
extern declarations (the latter are described in the C Interface
section).
The basic form of the namespace declaration has the following
syntax (there’s also a “scoped” form of the namespace declaration
which will be discussed in Scoped Namespaces at the end of this section):
namespace name;
// declarations and definitions in namespace 'name'
namespace;
The second form switches back to the default namespace. For instance, in order
to define two symbols with the same print name foo in two different
namespaces foo and bar, you can write:
namespace foo;
foo x = x+1;
namespace bar;
foo x = x-1;
namespace;
We can now refer to the symbols we just defined using qualified symbols of
the form namespace::symbol:
> foo::foo 99;
100
> bar::foo 99;
98
This avoids any potential name clashes, since the qualified identifier
notation always makes it clear which namespace the given identifier belongs
to.
A namespace can be “reopened” at any time to add new symbols and definitions
to it. This allows namespaces to be created that span several source
modules. You can also create several different namespaces in the same module.
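For instance (a hypothetical continuation of the example above), we might
later add another function to the foo namespace, possibly in a different
source file:
namespace foo; // reopen the namespace
baz x = 2*x;   // defines foo::baz
namespace;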
Similar to the using declaration, a namespace
declaration accepts either identifiers or double-quoted strings as namespace
names. E.g., the following two declarations are equivalent:
namespace foo;
namespace "foo";
The latter form also allows more descriptive labels which aren’t identifiers,
e.g.:
namespace "Private stuff, keep out!";
Note that the namespace prefix in a qualified identifier must be a legal
identifier, so it isn’t possible to access symbols in namespaces with such
descriptive labels in a direct fashion. The only way to get at the symbols in
this case is to use a namespace or using namespace
declaration (for the latter see Using Namespaces below).
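For instance (a contrived example), such symbols can still be reached by
putting the namespace with the descriptive label on the search list:
namespace "Private stuff, keep out!";
secret x = 2*x;
namespace;
using namespace "Private stuff, keep out!";
secret 99; // yields 198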
Since it is rather inconvenient if you always have to write identifiers in
their qualified form outside of their “home” namespace, Pure allows you to
specify a list of search namespaces which are used to look up symbols not in
the default or the current namespace. This is done with the using
namespace declaration, which takes the following form:
using namespace name1, name2, ...;
// ...
using namespace;
(As with namespace declarations, the second form without any
namespace arguments gets you back to the default empty list of search
namespaces.)
For instance, consider this example:
namespace foo;
foo x = x+1;
namespace bar;
foo x = x-1;
bar x = x+1;
namespace;
The symbols in these namespaces can be accessed unqualified as follows:
> using namespace foo;
> foo 99;
100
> using namespace bar;
> foo 99;
98
> bar 99;
100
This method is often to be preferred over opening a namespace with the
namespace declaration, since using namespace only gives
you “read access” to the imported symbols, so you can’t accidentally mess up
the definitions of the namespace you’re using. Another advantage is that the
using namespace declaration also lets you search multiple
namespaces at once:
using namespace foo, bar;
Be warned, however, that this brings up the very same issue of name clashes
again:
> using namespace foo, bar;
> foo 99;
<stdin>, line 15: symbol 'foo' is ambiguous here
In such a case you’ll have to resort to namespace qualifiers again, in order
to resolve the name clash:
> foo::foo 99;
100
To avoid this kind of mishap, you can also selectively import just a few
symbols from a namespace instead. This can be done with a declaration of the
following form:
using namespace name1 ( sym1 sym2 ... ), name2 ... ;
As indicated, the symbols to be imported can optionally be placed as a
whitespace-delimited list inside parentheses, following the corresponding
namespace name. For instance:
> using namespace foo, bar (bar);
> foo 99;
100
> bar 99;
100
> bar::foo 99;
98
Note that now we have no clash on the foo symbol any more, because we
restricted the import from the bar namespace to the bar symbol, so
that bar::foo has to be denoted with a qualified symbol now.
Pure’s rules for looking up and creating symbols are fairly straightforward
and akin to those in other languages featuring namespaces. However, there are
some intricacies involved, because the rewriting rule format of definitions
allows “referential” use of symbols not only in the “body” (right-hand side)
of a definition, but also in the left-hand side patterns. We discuss this in
detail below.
The compiler searches for symbols first in the current namespace (if any),
then in the currently active search namespaces (if any), and finally in the
default (i.e., the global) namespace, in that order. This automatic lookup can
be bypassed by using an absolute namespace qualifier of the form
::foo::bar. In particular, ::bar always denotes the symbol bar in
the default namespace, while ::foo::bar denotes the symbol bar in the
foo namespace. (Normally, the latter kind of notation is only needed if
you have to deal with nested namespaces, see Hierarchical Namespaces
below.)
If no existing symbol is found, a new symbol is created automatically, by
implicitly declaring a public symbol with default attributes. New
unqualified symbols are always created in the current namespace, while new
qualified symbols are created in the namespace given by the namespace prefix
of the symbol. However, note that in the latter case the compiler always
checks that the given namespace prefix matches the current namespace:
> namespace foo;
> namespace;
> foo::bar x = 1/x;
<stdin>, line 3: undeclared symbol 'foo::bar'
Thus it’s only possible to introduce a new symbol in a given namespace if that
namespace is the current one. These error messages are somewhat annoying, but
they provide at least some protection against typos and other silly mistakes
and prevent you from accidentally clobbering the contents of other
namespaces. To make these errors go away it’s enough to just declare the
symbols in their proper namespaces.
New symbols are also created if a global unqualified (and yet undeclared)
symbol is being “defined” in a rewriting rule or
let/const definition, even if a symbol with the same
print name from another namespace is already visible in the current scope. To
distinguish “defining” from “referring” uses of a global symbol, Pure uses the
following (purely syntactic) notions:
- A defining occurrence of a global function or macro symbol is any
occurrence of the symbol as the head symbol on the left-hand side of a
rewriting rule.
- A defining occurrence of a global variable or constant symbol is any
occurrence of the symbol in a variable position (as given by the “head =
function” rule, cf. Variables in Equations) on the left-hand side of a
let or const definition.
- All other occurrences of global symbols on the left-hand side, as well as
all symbol occurrences on the right-hand side of a definition are
referring occurrences.
The following example illustrates these notions:
namespace foo;
bar (bar x) = bar x;
let x,y = 1,2;
namespace;
Here, the first occurrence of bar on the left-hand side bar (bar x) of
the first rule is a defining occurrence, as are the occurrences of x and
y on the left-hand side of the let definition. Hence these
symbols are created as new symbols in the namespace foo. On the other
hand, the other occurrences of bar in the first rule, as well as the ‘,‘
symbol on the left-hand side of the let definition, are referring
occurrences. In the former case, bar refers to the bar
symbol defined by the rule, while in the latter case the ‘,‘ operator
is actually declared in the prelude and thus imported from the global
namespace.
As an additional safety measure against missing or mistyped symbols, the
interpreter provides the option -w (see Invoking Pure) to check
your scripts for non-defining uses of undeclared unqualified function
symbols. For instance:
$ pure -w
> puts "bla"; // missing import of system module
<stdin>, line 1: warning: implicit declaration of 'puts'
puts "bla"
For legitimate uses (such as forward uses of a symbol which is defined later),
you can make these warnings go away by declaring the symbol before using it.
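For instance (a hypothetical forward use), the following passes pure -w
without any warnings:
public bar;        // forward declaration of bar
foo x = bar (x+1); // forward use, no warning now
bar x = 2*x;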
Note that special operator (and nonfix) symbols always require an explicit
declaration. This works as already discussed in the Symbol Declarations
section, except that you first switch to the appropriate namespace before
declaring the symbols. For instance, here is how you can create a new +
operation which multiplies its operands rather than adding them:
> namespace my;
> infixl 2200 +;
> x+y = x*y;
> 5+7;
35
Note that the new + operation really belongs to the namespace we
created. The + operation in the default namespace works as before, and in
fact you can use qualified symbols to pick the version that you need:
> namespace;
> 5+7;
12
> 5 ::+ 7;
12
> 5 my::+ 7;
35
Here’s what you get if you happen to forget the declaration of the +
operator:
> namespace my;
> x+y = x*y;
<stdin>, line 2: infixl symbol '+' was not declared in this namespace
Thus the compiler will never create a new instance of an operator symbol on
the fly; an explicit declaration is always needed in such cases.
Note that if you really wanted to redefine the global + operator, you
can do this even while the my namespace is current. You just have to use a
qualified identifier in this case, as follows:
> namespace my;
> x ::+ y = x*y;
> a+b;
a*b
This should rarely be necessary (in the above example you might just as well
enter this rule while in the global namespace), but it can be useful in some
circumstances. Specifically, you might want to “overload” a global function
or operator with a definition that makes use of private symbols of a namespace
(which are only visible inside that namespace; see Private Symbols
below). For instance:
> namespace my;
> private bar;
> bar x y = x*y;
> x ::+ y = bar x y;
> a+b;
a*b
(The above is a rather contrived example, since the very same functionality
can be accomplished much more easily, but there are some situations where
this approach is necessary.)
Pure also allows you to have private symbols, as a means to hide away internal
operations which shouldn’t be accessed directly outside the namespace in which
they are declared. The scope of a private symbol is confined to its namespace,
i.e., the symbol is only visible when its “home” namespace is current. Symbols
are declared private by using the private keyword in the symbol
declaration:
> namespace secret;
> private baz;
> // 'baz' is a private symbol in namespace 'secret' here
> baz x = 2*x;
> // you can use 'baz' just like any other symbol here
> baz 99;
198
> namespace;
Note that, at this point, secret::baz is now invisible, even if you have
secret in the search namespace list:
> using namespace secret;
> // this actually creates a 'baz' symbol in the default namespace:
> baz 99;
baz 99
> secret::baz 99;
<stdin>, line 27: symbol 'secret::baz' is private here
The only way to bring the symbol back into scope is to make the secret
namespace current again:
> namespace secret;
> baz 99;
198
> secret::baz 99;
198
Namespace identifiers can themselves be qualified identifiers in Pure, which
enables you to introduce a hierarchy of namespaces. This is useful, e.g., to
group related namespaces together under a common “umbrella” namespace:
namespace my;
namespace my::old;
foo x = x+1;
namespace my::new;
foo x = x-1;
Note that the namespace my, which serves as the parent namespace, must be
created before the my::old and my::new namespaces, even if it does not
contain any symbols of its own. After these declarations, the my::old and
my::new namespaces are part of the my namespace and will be considered
in name lookup accordingly, so that you can write:
> using namespace my;
> old::foo 99;
100
> new::foo 99;
98
This works pretty much like a hierarchy of directories and files, where the
namespaces play the role of the directories (with the default namespace as the
root directory), the symbols in each namespace correspond to the files in a
directory, and the using namespace declaration functions similar to
the shell’s PATH variable.
Sometimes it is necessary to tell the compiler to use a symbol in a specific
namespace, bypassing the usual symbol lookup mechanism. For instance, suppose
that we introduce another global old namespace and define yet another
version of foo in that namespace:
namespace old;
foo x = 2*x;
namespace;
Now, if we want to access that function, with my still active as the
search namespace, we cannot simply refer to the new function as old::foo,
since this name will resolve to my::old::foo instead. As a remedy, the
compiler accepts an absolute qualified identifier of the form
::old::foo. This bypasses name lookup and thus always yields exactly the
symbol in the given namespace (if it exists; as mentioned previously, the
compiler will complain about an undeclared symbol otherwise):
> old::foo 99;
100
> ::old::foo 99;
198
Also note that, as a special case of the absolute qualifier notation,
::foo always denotes the symbol foo in the default namespace.
Pure also provides an alternative scoped namespace construct which
makes nested namespace definitions more convenient. This construct takes the
following form:
namespace name with ... end;
The part between with and end may contain arbitrary
declarations and definitions, using the same syntax as the toplevel. These are
processed in the context of the given namespace, as if you had written:
namespace name;
...
namespace;
However, the scoped namespace construct always returns you to the namespace
which was active before, and thus these declarations may be nested:
namespace foo with
// declarations and definitions in namespace foo
namespace bar with
// declarations and definitions in namespace bar
end;
// more declarations and definitions in namespace foo
end;
Note that this kind of nesting does not necessarily imply a namespace
hierarchy as discussed in Hierarchical Namespaces. However, you can achieve
this by using the appropriate qualified namespace names:
namespace foo with
// ...
namespace foo::bar with
// ...
end;
// ...
end;
Another special feature of the scoped namespace construct is that
using namespace declarations are always local to the current
namespace scope (and other nested namespace scopes inside it). Thus the
previous setting is restored at the end of each scope:
using namespace foo;
namespace foo with
// still using namespace foo here
using namespace bar;
// now using namespace bar
namespace bar with
// still using namespace bar here
using namespace foo;
// now using namespace foo
end;
// back to using namespace bar
end;
// back to using namespace foo at toplevel
Finally, here’s a more concrete example which shows how scoped namespaces
might be used to declare two namespaces and populate them with various
functions and operators:
namespace foo with
infixr (::^) ^;
foo x = x+1;
bar x = x-1;
x^y = 2*x+y;
end;
namespace bar with
outfix <: :>;
foo x = x+2;
bar x = x-2;
end;
using namespace foo(^ foo), bar(bar <: :>);
// namespace foo
foo x;
x^y;
// namespace bar
bar x;
<: x,y :>;
Pure’s namespaces can thus be used pretty much like “modules” or “packages” in
languages like Ada or Modula-2. They provide a structured way to describe
program components offering collections of related data and operations, which
can be brought into scope in a controlled way by making judicious use of
using namespace declarations. They also provide an abstraction
barrier, since internal operations and data structures can be hidden away
employing private symbols.
Please note that these facilities are not Pure’s main focus and thus they are
somewhat limited compared to programming languages specifically designed for
big projects and large teams of developers. Nevertheless they should be useful
if your programs grow beyond a small collection of simple source modules, and
enable you to manage most Pure projects with ease.
Macros are a special type of functions to be executed as a kind of
“preprocessing stage” at compile time. In Pure these are typically used to
define custom special forms and to perform inlining of function calls and
other simple kinds of source-level optimizations.
Whereas the macro facilities of most programming languages simply provide a
kind of textual substitution mechanism, Pure macros operate on symbolic
expressions and are implemented by the same kind of rewriting rules that are
also used to define ordinary functions in Pure. In contrast to these, macro
rules start out with the keyword def, and only simple kinds of
rules without any guards or multiple left-hand and right-hand sides are
permitted.
Syntactically, a macro definition looks just like a variable or constant
definition, using def in lieu of let or
const, but they are processed in a different way. Macros are
substituted into the right-hand sides of function, constant and variable
definitions. All macro substitution happens before constant substitutions and
the actual compilation step. Macros can be defined in terms of other macros
(also recursively), and are evaluated using call by value (i.e., macro calls
in macro arguments are expanded before the macro gets applied to its
parameters).
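For instance (using two hypothetical macros dbl and quad), you can observe
the call-by-value expansion order with show:
> def dbl x = x+x;
> def quad x = dbl (dbl x);
> foo y = quad y;
> show foo
foo y = y+y+(y+y);
Here quad y first expands to dbl (dbl y); the inner dbl call is then expanded
before the outer one, giving dbl (y+y) and finally (y+y)+(y+y).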
Pure macros also have their limitations. Specifically, the left-hand side of a
macro rule must be a simple expression, just like in ordinary function
definitions. This restricts the kinds of expressions which can be rewritten by
a macro. But Pure macros are certainly powerful enough for most common
preprocessing purposes, while still being robust and easy to use.
Here is a simple example, showing a rule which expands saturated calls of the
succ function (defined in the prelude) at compile time:
> def succ x = x+1;
> foo x::int = succ (succ x);
> show foo
foo x::int = x+1+1;
Rules like these can be useful to help the compiler generate better code. Note
that a macro may have the same name as an ordinary Pure function, which is
essential if you want to optimize calls to an existing function, as in the
previous example. (Just as for ordinary functions, the number of parameters
in each rule for a given macro must be the same, but a macro may have a
different number of arguments than the corresponding function.)
A somewhat more practical example is the following rule from the prelude,
which eliminates saturated instances of the right-associative function
application operator:
def f $ x = f x;
Like in Haskell, this low-priority operator is handy to write cascading
function calls. With the above macro rule, these will be “inlined” as ordinary
function applications automatically. Example:
> foo x = bar $ bar $ 2*x;
> show foo
foo x = bar (bar (2*x));
Here are two slightly more tricky rules from the prelude, which optimize the
case of “throwaway” list comprehensions. This is useful if a list
comprehension is evaluated solely for its side effects:
def void (catmap f x) = do f x;
def void (listmap f x) = do f x;
Note that the void function simply throws away its argument and
returns () instead. The do function applies a function to
every member of a list (like map), but throws away all intermediate
results and just returns (), which is much more efficient if you don’t
need those results anyway. These are both defined in the prelude.
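For instance (using puts from the system module), do can be employed directly
to print each member of a list, throwing away the results of the individual
calls:
> using system;
> do (puts . str) (1..3);
1
2
3
()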
Before we delve into this example, a few remarks are in order about the way
list comprehensions are implemented in Pure. As already mentioned, list
comprehensions are just syntactic sugar; the compiler immediately transforms
them to an equivalent expression involving only lambdas and a few other list
operations. Note that list comprehensions are essentially equivalent to piles
of nested lambdas, filters and maps, but for various reasons they are actually
implemented using two special helper operations, catmap and
listmap.
The catmap operation combines map and cat; this is
needed, in particular, to accumulate the results of nested generators, such as
[i,j | i = 1..n; j = 1..m]. The same operation is also used to implement
filter clauses, as you can see in the examples below. However, for
efficiency simple generators like [2*i | i = 1..n] are translated to a
listmap instead (which is basically just map, but works with
different aggregate types, so that list comprehensions can draw values from
aggregates other than lists, such as matrices).
Now let’s see how the rules above transform a list comprehension if we
“voidify” it:
> using system;
> f = [printf "%g\n" (2^x+1) | x=1..5; x mod 2];
> g = void [printf "%g\n" (2^x+1) | x=1..5; x mod 2];
> show f g
f = catmap (\x -> if x mod 2 then [printf "%g\n" (2^x+1)] else []) (1..5);
g = do (\x -> if x mod 2 then [printf "%g\n" (2^x+1)] else []) (1..5);
Ok, so the catmap got replaced with a do, which is just what we
need to make this code run essentially as fast as a for loop in
conventional programming languages (up to constant factors, of course). Here’s
what it looks like when we run the g function:
> g;
3
9
33
()
It’s not all roses, however, since the above macro rules will only get rid of
the outermost catmap if the list comprehension binds multiple
variables:
> u = void [puts $ str (x,y) | x=1..2; y=1..3];
> show u
u = do (\x -> listmap (\y -> puts (str (x,y))) (1..3)) (1..2);
If you’re bothered by this, you’ll have to apply void recursively,
creating a nested list comprehension which expands to a nested do:
> v = void [void [puts $ str (x,y) | y=1..3] | x=1..2];
> show v
v = do (\x -> do (\y -> puts (str (x,y))) (1..3)) (1..2);
(It would be nice to have this handled automatically, but the left-hand side
of a macro definition must be a simple expression, and thus it’s not possible
to write a macro which descends recursively into the lambda argument of
catmap.)
Macros can also be recursive, in which case they usually consist of multiple
rules and make use of pattern-matching like ordinary function definitions. As
a simple example, let’s implement a Pure version of Lisp’s quasiquote which
allows you to create a quoted expression from a “template” while substituting
variable parts of the template. (For the sake of brevity, our definition is
somewhat simplified and does not cover some corner cases. See the Pure
distribution for a full version of this example.)
def quasiquote (unquote x) = x;
def quasiquote (f@_ (splice x)) = foldl ($) (quasiquote f) x;
def quasiquote (f@_ x) = quasiquote f (quasiquote x);
def quasiquote x = quote x;
(Note the f@_, which is an anonymous “as” pattern forcing the compiler to
recognize f as a function variable, rather than a literal function
symbol. See Head = Function in the Caveats and Notes section for an
explanation of this trick.)
The first rule above takes care of “unquoting” embedded subterms. The second
rule “splices” an argument list into an enclosing function application. The
third rule recurses into subterms of a function application, and the fourth
and last rule takes care of quoting the “atomic” subterms. Note that
unquote and splice themselves are just passive constructor symbols,
the real work is done by quasiquote, using foldl at runtime to
actually perform the splicing. (Putting off the splicing until runtime makes
it possible to splice argument lists computed at runtime.)
If we want, we can also add some syntactic sugar for Lisp weenies. (Note that
we cannot have ‘,‘ for unquoting, so we use ‘,$‘ instead.)
prefix 9 ` ,$ ,@ ;
def `x = quasiquote x; def ,$x = unquote x; def ,@x = splice x;
Examples:
> `(2*42+2^12);
2*42+2^12
> `(2*42+,$(2^12));
2*42+4096.0
> `foo 1 2 (,@'[2/3,3/4]) (5/6);
foo 1 2 (2/3) (3/4) (5/6)
> `foo 1 2 (,@'args) (5/6) when args = '[2/3,3/4] end;
foo 1 2 (2/3) (3/4) (5/6)
We mention in passing here that, technically, Pure macros are just as powerful
as (unconditional) term rewriting systems and thus they are
Turing-complete. This implies that a badly written macro may well send the
Pure compiler into an infinite recursion, which results in a stack overflow at
compile time. See Stack Size and Tail Recursion in the Caveats and Notes
section for information on how to deal with these by setting the
PURE_STACK environment variable.
The quasiquote macro in the preceding subsection also provides an example
of how you can use macros to define your own special forms. This works because
the actual evaluation of macro arguments is put off until runtime, and thus we
can safely pass them to built-in special forms and other constructs which
defer their evaluation until runtime. In fact, the right-hand side of a macro
rule may be an arbitrary Pure expression involving conditional expressions,
lambdas, binding clauses, etc. These are never evaluated during macro
substitution, they just become part of the macro expansion (after substituting
the macro parameters).
Here is another useful example of a user-defined special form, the macro
timex which employs the system function clock to report the cpu time
in seconds needed to evaluate a given expression, along with the computed
result:
> using system;
> def timex x = (clock-t0)/CLOCKS_PER_SEC,y when t0 = clock; y = x end;
> sum = foldl (+) 0L;
> timex $ sum (1L..100000L);
0.43,5000050000L
Note that the above definition of timex wouldn’t work as an ordinary
function definition, since by virtue of Pure’s basic eager evaluation strategy
the x parameter would have been evaluated already before it is passed to
timex, making timex always return a zero time value. Try it!
Here’s yet another example, which is handy if you need to trace function
calls. (Note that the interpreter also has its own built-in debugging
facility, see Debugging. However, the following macro allows you to trace
functions using your own custom output format, and may thus be useful in
situations where the built-in debugger is not appropriate.)
using system;
def trace f x y = printf "** exit %s: %s -> %s\n" (str f,str x,str y) $$ y
when y = printf "** call %s: %s\n: " (str f,str x) $$ gets $$ y end;
This macro is invoked with the function to be traced, the arguments (or
whatever you want to be printed as additional debugging information) and the
actual function call as parameters. (This is a rather simplistic version,
which just prints a prompt on function entry and the final reduction after the
call. You can easily make this as elaborate as you like. E.g., you might want
to keep track of recursive levels and profiling information, add various
interactive commands to selectively enable and disable tracing during the
evaluation, etc.)
We can still make this a bit more convenient by introducing the following
ordinary function definition:
trace f x = trace f x (f x);
This lets us patch up a call to trace a given function, as shown below,
without having to change the definition of the function at all. This trick
only works with global functions; for local functions you’ll have to add an
explicit call of the trace macro to the local definition yourself. Also
note that the definition above only works with functions taking a single
parameter; see the trace.pure example in the distribution for the full version
which can deal with any number of arguments.
// Uncomment this line to trace calls to the 'fact' function.
def fact n = trace fact n;
// Sample function to be traced.
fact n = if n>0 then n*fact(n-1) else 1;
Here’s a trace of the fact function obtained in this fashion (hit carriage
return after each ‘:‘ prompt to proceed with the computation):
> fact 2;
** call fact: 2
:
** call fact: 1
:
** call fact: 0
:
** exit fact: 0 -> 1
** exit fact: 1 -> 1
** exit fact: 2 -> 2
2
Note that by just removing the macro definition for fact above, you can
make the function run untraced as usual again. This scheme is quite flexible;
the only real drawback is that you have to explicitly add some code for each
function you want to trace.
Pure macros are lexically scoped, i.e., the binding of symbols in the
right-hand-side of a macro definition is determined statically by the text of
the definition, and macro parameter substitution also takes into account
binding constructs, such as with and when clauses, in
the right-hand side of the definition. Macro facilities with these pleasant
properties are also known as hygienic macros. They are not susceptible to
so-called “name capture,” which makes macros in less sophisticated languages
bug-ridden and hard to use.
Macro hygiene is a somewhat esoteric topic for most programmers, so let us
take a brief look at what it’s all about. The problem avoided by hygienic
macros is that of name capture. There are actually two kinds of name capture
which may occur in unhygienic macro systems:
- A free symbol in the macro body inadvertently becomes bound to the value
of a local symbol in the context in which the macro is called.
- A free symbol in the macro call inadvertently becomes bound to the value
of a local symbol in the macro body.
Pure’s hygienic macros avoid both pitfalls. Here is an example for the first
form of name capture:
> def G x = x+y;
> G 10 when y = 99 end;
10+y
Note that the expansion of the G macro correctly uses the global instance
of y, even though y is locally defined in the context of the macro
call. (In some languages this form of name capture is sometimes used
deliberately in order to make the macro use the binding of the symbol which is
active at the point of the macro call. This never works in Pure, hence in such
cases you will have to explicitly pass such symbols to the macro.)
In contrast, the second form of name capture is usually not intended, and is
therefore more dangerous. Consider the following example:
> def F x = x+y when y = x+1 end;
> F y;
y+(y+1)
Pure again gives the correct result here. You’d have to be worried if you got
(y+1)+(y+1) instead, which would result from the literal expansion y+y
when y = y+1 end, where the (free) variable y passed to F gets
captured by the local binding of y. In fact, that’s exactly what you get
with C macros:
#define F(x) { int y = x+1; return x+y; }
Here F(y) expands to { int y = y+1; return y+y; } which is usually
not what you want.
Pure makes it very easy to call C functions (as well as functions in a number
of other languages supported by the GNU compiler collection). To call an
existing C function, you just need an extern declaration of the
function, as described below. By these means, all functions in the standard C
library and the Pure runtime are readily available to Pure scripts. Functions
can also be loaded from dynamic libraries and LLVM bitcode files at
runtime. In the latter case, you don’t even need to write any
extern declarations, the interpreter will do that for you. As of
Pure 0.45, you can also add inline C/C++ and Fortran code to your Pure scripts
and have the Pure interpreter compile them on the fly, provided that you have
the corresponding compilers from the LLVM project installed.
In some cases you will still have to rely on big and complicated third-party
and system libraries which aren’t readily available in bitcode form. It goes
without saying that writing all the extern declarations for such
libraries can be a daunting task. Fortunately, there is a utility to help with
this, by extracting the extern declarations automatically from C
headers. Please see External C Functions in the Caveats and Notes
section for details.
To access an existing C function in Pure, you need an extern
declaration of the function, which is a simplified kind of C prototype. The
syntax of these declarations is described by the following grammar rules:
extern_decl ::= [scope] "extern" prototype ("," prototype)* ";"
prototype ::= c_type identifier "(" [parameters] ")" ["=" identifier]
parameters ::= parameter ("," parameter)*
parameter ::= c_type [identifier]
c_type ::= identifier "*"*
Extern functions can be called in Pure just like any other. For instance, the
following commands, entered interactively in the interpreter, let you use the
sin function from the C library (of course you could just as well put the
extern declaration into a script):
> extern double sin(double);
> sin 0.3;
0.29552020666134
An extern declaration can also be prefixed with a
public/private scope specifier:
private extern double sin(double);
Multiple prototypes can be given in one extern declaration,
separating them with commas:
extern double sin(double), double cos(double), double tan(double);
For clarity, the parameter types can also be annotated with parameter names
(these only serve informational purposes and are for the human reader; they
are effectively treated as comments by the compiler):
extern double sin(double x);
Pointer types are indicated by following the name of the element type with one
or more asterisks, as in C. For instance:
> extern char* strchr(char *s, int c);
> strchr "foo bar" (ord "b");
"bar"
As you can see in the previous example, some pointer types get special
treatment, allowing you to pass certain kinds of Pure data (such as Pure
strings as char* in this example). This is discussed in more detail in C
Types below.
The interpreter makes sure that the parameters in a call match; if not, then
by default the call is treated as a normal form expression:
> extern double sin(double);
> sin 0.3;
0.29552020666134
> sin 0;
sin 0
This gives you the opportunity to augment the external function with your own
Pure equations. To make this work, you have to make sure that the
extern declaration of the function comes first. For instance, we
might want to extend the sin function with a rule to handle integers:
> sin x::int = sin (double x);
> sin 0;
0.0
Sometimes it is preferable to replace a C function with a wrapper function
written in Pure. In such a case you can specify an alias under which the
original C function is known to the Pure program, so that you can still call
the C function from the wrapper. An alias is introduced by terminating the
extern declaration with a clause of the form = alias. For instance:
> extern double sin(double) = c_sin;
> sin x::double = c_sin x;
> sin x::int = c_sin (double x);
> sin 0.3; sin 0;
0.29552020666134
0.0
As an alternative, you can also declare the C function in a special namespace
(cf. Namespaces in the Declarations section):
> namespace c;
> extern double sin(double);
> c::sin 0.3;
0.29552020666134
Note that the namespace qualification only affects the Pure side; the
underlying C function is still called under the unqualified name as usual. The
way in which such qualified externs are accessed is the same as for ordinary
qualified symbols. In particular, the using namespace declaration
applies as usual, and you can declare such symbols as private if
needed. It is also possible to combine a namespace qualifier with an alias:
> namespace c;
> extern double sin(double) = mysin;
> c::mysin 0.3;
0.29552020666134
Different aliases of the same external function (or, equivalently, instances
of the function declared in different namespaces) can be declared in slightly
different ways, which makes it possible to adjust the interpretation of
pointer values on the Pure side. This is particularly useful for string
arguments which, as described below, may be passed both as char* (which
implies copying and conversion to or from the system encoding) and as
void* (which simply passes through the character pointers). For instance:
> extern char *strchr(char *s, int c) = foo;
> extern void *strchr(void *s, int c) = bar;
> foo "foo bar" 98; bar "foo bar" 98;
"bar"
#<pointer 0x12c2f24>
Also note that, as far as Pure is concerned, different aliases of an external
function are really different functions. In particular, they can each have
their own set of augmenting Pure equations. For instance:
> extern double sin(double);
> extern double sin(double) = mysin;
> sin === sin;
1
> sin === mysin;
0
> sin 1.0; mysin 1.0;
0.841470984807897
0.841470984807897
> sin x::int = sin (double x);
> sin 1; mysin 1;
0.841470984807897
mysin 1
As indicated in the previous section, the data types in extern
declarations are either C type names or pointer types derived from these. The
special expr* pointer type is simply passed through; this provides a means
to deal with Pure data in C functions in a direct fashion. For all other C
types, Pure values are “marshalled” (converted) from Pure to C when passed as
arguments to C functions, and the result returned by the C function is then
converted back from C to Pure. All of this is handled by the runtime system in
a transparent way, of course.
Note that, to keep things simple, Pure does not provide any notations for C
structs or function types, although it is possible to represent pointers to
such objects using void* or some other appropriate pointer types. In
practice, this simplified system should cover most kinds of calls that need to
be done when interfacing to C libraries, but there are ways to work around
these limitations if you need to access C structs or call back from C to Pure,
see External C Functions in the Caveats and Notes section for details.
Pure supports the usual range of basic C types: void, bool, char,
short, int, long, float, double, and converts between
these and the corresponding Pure data types (machine ints, bigints and double
values) in a straightforward way.
The void type is only allowed in function results. It is converted to the
empty tuple ().
Both float and double are supported as floating point types. Single
precision float arguments and return values are converted from/to Pure’s
double precision floating point numbers.
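For instance (using floorf from the C math library), a float function can be
called with ordinary Pure doubles, and its result is converted back to a
double:
> extern float floorf(float);
> floorf 3.7;
3.0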
A variety of C integer types (bool, char, short, int,
long) are provided which are converted from/to the available Pure integer
types in a straightforward way. In addition, the synonyms int8, int16
and int32 are provided for char, short and int, respectively,
and int64 denotes 64 bit integers (a.k.a. ISO C99 long long). Note
that long is equivalent to int32 on 32 bit systems, whereas it is the
same as int64 on most 64 bit systems. To make it easier to interface to
various system routines, there’s also a special size_t integer type which
usually is 4 bytes on 32 bit and 8 bytes on 64 bit systems.
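For instance (using strlen from the C library; the c_strlen alias guards
against any existing definition of the strlen symbol):
> extern size_t strlen(char *s) = c_strlen;
> c_strlen "hello";
5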
All integer parameters take both Pure ints and bigints as actual arguments;
truncation or sign extension is performed as needed, so that the C interface
behaves as if the argument was “cast” to the C target type. Returned integers
use the smallest Pure type capable of holding the result, i.e., int for the C
char, short and int types, bigint for int64.
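For instance (using abs from the C library under a hypothetical alias, so
that it doesn’t clash with the prelude’s abs function), a bigint argument is
simply truncated to the C int type:
> extern int abs(int) = c_abs;
> c_abs (-3); c_abs (-3L);
3
3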
Pure considers all integers as signed quantities, but it is possible to pass
unsigned integers as well (if necessary, you can use a bigint to pass positive
values which are too big to fit into a machine int). Also note that when an
unsigned integer is returned by a C routine, which is too big to fit into the
corresponding signed integer type, it will “wrap around” and become
negative. In this case, depending on the target type, you can use the
ubyte, ushort, uint, ulong and uint64
functions provided by the prelude to convert the result back to an unsigned
quantity.
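For instance, if a C routine declared to return int actually yields the
unsigned value 0xffffffff, it comes out as -1 on the Pure side; uint recovers
the unsigned quantity:
> uint (-1);
4294967295L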
The use of pointer types is also fairly straightforward, but Pure has some
special rules for the conversion of certain pointer types which make it easy
to pass aggregate Pure data to and from C routines, while also following the
most common idioms for pointer usage in C. The following types of pointers are
recognized both as arguments and return values of C functions.
Bidirectional pointer conversions:
- char* is used for string arguments and return values which are converted
from Pure’s internal utf-8 based string representation to the system
encoding and vice versa. (Thus a C routine can never modify the raw Pure
string data in-place; if this is required then you’ll have to pass the
string argument as a void*, see below.)
- void* is for any generic pointer value, which is simply passed through
unchanged. When used as an argument, you can also pass Pure strings,
matrices and bigints. In this case the raw underlying data pointer
(char* in the case of strings, int*, double* or expr* in the
case of numeric and symbolic matrices, and the GMP type mpz_t in the
case of bigints) is passed, which allows the data to be modified in place
(with care). In particular, passing bigints as void* makes it possible
to call most GMP integer routines directly from Pure.
- dmatrix*, cmatrix* and imatrix* allow you to pass numeric Pure
matrices of the appropriate types (double, complex, int). Here a pointer to
the underlying GSL matrix structure is passed (not just the data itself).
This makes it possible to transfer GSL matrices between Pure and GSL
routines in a direct fashion without any overhead. (For convenience, there
are also some other pointer conversions for marshalling matrix arguments to
numeric C vectors, which are described in Pointers and Matrices below.)
- expr* is for any kind of Pure value. A pointer to the expression node is
passed to or from the C function. This type is to be used for C routines
which are prepared to deal with pristine Pure data, using the corresponding
functions provided by the runtime. You can find many examples of this in the
standard library.
All other pointer types are simply taken at face value, allowing you to pass
Pure pointer values as is, without any conversions. This also includes
pointers to arbitrary named types which don’t have a predefined meaning in
Pure, such as FILE*. As of Pure 0.45, the interpreter keeps track of the
actual names of all pointer types and checks (at runtime) that the types match
in an external call, so that you can’t accidentally get a core dump by
passing, say, a FILE* for a char*. (The call will then simply fail and
yield a normal form, which gives you the opportunity to hook into the function
with your own Pure definitions which may supply any desired data conversions.)
Typing information about pointer values is also available to Pure scripts by
means of corresponding library functions; please see the Tagged Pointers
section in the Pure Library Manual for details.
The following additional pointer conversions are provided to deal with Pure
matrix values in arguments of C functions, i.e., on the input side. These
enable you to pass Pure matrices for certain kinds of C vectors. Note that in
any case, you can also simply pass a suitable plain pointer value instead.
Also, these types aren’t special in return values, where they will simply
yield a pointer value (with the exception of char* which gets special
treatment as explained in the previous subsection). Thus you will have to
decode such results manually if needed. The standard library provides various
routines to do this; please see the String Functions and Matrix Functions
sections in the Pure Library Manual for details.
Numeric pointer conversions (input only):
- char*, short*, int*, int64*, float*, double* can be
used to pass numeric matrices as C vectors. This kind of conversion passes
just the matrix data (not the GSL matrix structure, as the dmatrix* et
al conversions do) and does conversions between integer or floating point
data of different sizes on the fly. You can either pass an int matrix as a
char*, short*, int* or int64* argument, or a double or
complex matrix as a float* or double* argument (complex values are
then represented as two separate double numbers, first the real, then the
imaginary part, for each matrix element).
- char**, short**, int**, int64**, float**, double**
provide yet another way to pass numeric matrix arguments. This works
analogously to the numeric vector conversions above, but here a temporary C
vector of pointers is passed to the C function, whose elements point to the
rows of the matrix.
Argv-style conversions (input only):
- char** and void** can be used to pass argv-style vectors as
arguments to C functions. In this case, the Pure argument must be a symbolic
vector of strings or generic pointer values. char** converts the string
elements to the system encoding, whereas void** passes through character
string data and other pointers unchanged (and allows in-place modification
of the data). A temporary C vector of these elements is passed to the C
function, which is always NULL-terminated and can thus be used for
almost any purpose which requires such argv-style vectors.
Note that in the numeric pointer conversions, the matrix data is passed “per
reference” to C routines, i.e., the C function may modify the data “in
place”. This is true even for target data types such as short* or
float** which involve automatic conversions and hence need temporary
storage. In this case the data from the temporary storage is written back to
the original matrix when the function returns, to maintain the illusion of
in-place modification. Temporary storage is also needed when the GSL matrix
has the data in non-contiguous storage. You may want to avoid this if
performance is critical, by always using “packed” matrices (see pack
in Matrix Functions) of the appropriate types.
Let’s finally have a look at some instructive examples to explain some of the
trickier pointer types.
First, the matrix pointer types dmatrix*, cmatrix* and imatrix*
can be used to pass double, complex double and int matrices to GSL functions
taking pointers to the corresponding GSL types (gsl_matrix,
gsl_matrix_complex and gsl_matrix_int) as arguments or returning them
as results. (Note that there is no special marshalling of Pure’s symbolic
matrix type, as these aren’t supported by GSL anyway.) Also note that matrices
are always passed by reference. Thus, if you need to pass a matrix as an
output parameter of a GSL matrix routine, you should either create a zero
matrix or a copy of an existing matrix to hold the result. The prelude
provides various operations for that purpose (in particular, see the
dmatrix, cmatrix, imatrix and pack functions
in matrices.pure). For instance, here is how you can quickly wrap up GSL’s
double matrix addition function in a way that preserves value semantics:
> using "lib:gsl";
> extern int gsl_matrix_add(dmatrix*, dmatrix*);
> x::matrix + y::matrix = gsl_matrix_add x y $$ x when x = pack x end;
> let x = dmatrix {1,2,3}; let y = dmatrix {2,3,2}; x; y; x+y;
{1.0,2.0,3.0}
{2.0,3.0,2.0}
{3.0,5.0,5.0}
Most GSL matrix routines can be wrapped in this fashion quite easily. A
ready-made GSL interface providing access to all of GSL’s numeric functions is
in the works; please check the Pure website for details.
For convenience, it is also possible to pass any kind of numeric matrix for a
char*, short*, int*, int64*, float* or double*
parameter. This requires that the pointer and the matrix type match up;
conversions between char, short, int64 and int data and,
likewise, between float and double are handled automatically,
however. For instance, here is how you can call the puts routine from the
C library with an int matrix encoding the string "Hello, world!" as byte
values (ASCII codes):
> extern int puts(char*);
> puts {72,101,108,108,111,44,32,119,111,114,108,100,33,0};
Hello, world!
14
Pure 0.45 and later also support char**, short**, int**,
int64**, float** and double** parameters which encode a matrix as
a vector of row pointers instead. This kind of matrix representation is often
found in audio and video processing software (where the rows of the matrix
might denote different audio channels, display lines or video frames), but
it’s also fairly convenient to do any kind of matrix processing in C. For
instance, here’s how to do matrix multiplication (the naive algorithm):
void matmult(int n, int l, int m, double **x, double **y, double **z)
{
int i, j, k;
for (i = 0; i < n; i++)
for (j = 0; j < m; j++) {
z[i][j] = 0.0;
for (k = 0; k < l; k++)
z[i][j] += x[i][k]*y[k][j];
}
}
As you can see, this multiplies an n times l matrix x with an l
times m matrix y and puts the result into the n times m matrix
z:
> extern void matmult(int, int, int, double**, double**, double**);
> let x = {0.11,0.12,0.13;0.21,0.22,0.23};
> let y = {1011.0,1012.0;1021.0,1022.0;1031.0,1032.0};
> let z = dmatrix (2,2);
> matmult 2 3 2 x y z $$ z;
{367.76,368.12;674.06,674.72}
Also new in Pure 0.45 is the support for passing argv-style vectors as
arguments. For instance, here is how you can use fork and execvp to
implement a poor man’s version of the C system function. (This is
UNIX-specific and doesn’t do much error-checking, but you get the idea.)
extern int fork();
extern int execvp(char *path, char **argv);
extern int waitpid(int pid, int *status, int options);
system cmd::string = case fork of
// child: execute the program, bail out if error
0 = execvp "/bin/sh" {"/bin/sh","-c",cmd} $$ exit 1;
// parent: wait for the child and return its exit code
pid = waitpid pid status 0 $$ status!0 >> 8
when status = {0} end if pid>=0;
end;
system "echo Hello, world!";
system "ls -l *.pure";
system "exit 1";
By default, external C functions are resolved by the LLVM runtime, which first
looks for the symbol in the C library and Pure’s runtime library (or the
interpreter executable, if the interpreter was linked statically). Thus all C
library and Pure runtime functions are readily available in Pure programs.
Other functions can be provided by adding them to the runtime, or by linking
them into the runtime or the interpreter executable. Better yet, you can just
“dlopen” shared libraries at runtime with a special form of the
using clause:
using "lib:libname[.ext]";
For instance, if you want to call the functions from library libxyz directly
from Pure:
using "lib:xyz";
After this declaration the functions from the given library will be ready to
be imported into your Pure program by means of corresponding extern
declarations.
Shared libraries opened with using clauses are searched for in the same way as
source scripts (see section Modules and Imports above), using the
-L option and the PURE_LIBRARY environment variable in
place of -I and PURE_INCLUDE. If the library isn’t found
by these means, the interpreter will also consider other platform-specific
locations searched by the dynamic linker, such as the system library
directories and LD_LIBRARY_PATH on Linux. The necessary filename
suffix (e.g., .so on Linux or .dll on Windows) will be supplied automatically
when needed. Of course you can also specify a full pathname for the library if
you prefer that. If a library file cannot be found, or if an extern
declaration names a function symbol which cannot be resolved, an appropriate
error message is printed.
As of Pure 0.44, the interpreter also provides a direct way to import LLVM
bitcode modules in Pure scripts. The main advantage of this method over the
“plain” C interface explained above is that the bitcode loader knows all the
call interfaces and generates the necessary extern declarations
automatically. This is more than just a convenience, as it also eliminates at
least some of the mistakes in extern declarations that may arise
when importing functions manually from dynamic libraries.
LLVM bitcode is loaded in a Pure script using the following special format of
the using clause:
using "bc:modname[.bc]";
(Here the bc tag indicates a bitcode file, and the default .bc bitcode
filename extension is supplied automatically. Also, the bitcode file is
searched for on the usual library search path.)
That’s it, no explicit extern declarations are required on the Pure
side. The Pure interpreter automatically creates extern
declarations (in the current namespace) for all the external functions defined
in the LLVM bitcode module, and generates the corresponding wrappers to make
the functions callable from Pure. (This also works when batch-compiling a Pure
script. In this case, the bitcode file actually gets linked into the output
code, so the loaded bitcode module only needs to be present at compile time.)
By default the imported symbols will be public. You can also specify the
desired scope of the symbols explicitly, by placing the public or
private keyword before the module name. For instance:
using private "bc:modname";
You can also import the same bitcode module several times, possibly in
different namespaces. This will not actually reload the module, but it will
create aliases for the external functions in different namespaces:
namespace foo;
using "bc:modname";
namespace bar;
using private "bc:modname";
You can load any number of bitcode modules along with shared libraries in a
Pure script, in any order. The JIT will try to satisfy external references in
modules and libraries from other loaded libraries and bitcode modules. This is
deferred until the code is actually JIT-compiled, so that you can make sure
beforehand that all required libraries and bitcode modules have been loaded.
If the JIT fails to resolve a function, the interpreter will print its name
and also raise an exception at runtime when the function is being called from
other C code. (You can then run your script in the debugger to locate the
external visible in Pure from which the unresolved function is called.)
Let’s take a look at a concrete example to see how this actually
works. Consider the following C code which defines a little function to
compute the greatest common divisor of two (machine) integers:
int mygcd(int x, int y)
{
if (y == 0)
return x;
else
return mygcd(y, x%y);
}
Let’s say that this code is in the file mygcd.c, then you’d compile it to
a bitcode module using llvm-gcc as follows:
llvm-gcc -emit-llvm -c mygcd.c -o mygcd.bc
Or, if you prefer to use clang, the new LLVM-based C/C++ compiler:
clang -emit-llvm -c mygcd.c -o mygcd.bc
Note that the -emit-llvm -c options instruct llvm-gcc or clang to build an
LLVM bitcode module. Of course, you can also add optimizations and other
options to the compile command as desired.
You can now load the resulting bitcode module and run the mygcd function
in the Pure interpreter simply as follows:
> using "bc:mygcd";
> mygcd 75 105;
15
To actually see the generated extern declaration of the imported
function, you can use the interactive show command:
> show mygcd
extern int mygcd(int, int);
Some more examples showing how to use the bitcode interface can be found in
the Pure sources. In particular, the interface also works with Fortran (using
llvm-gfortran), and there is special support for interfacing to Grame’s
functional DSP programming language Faust (the latter uses a special variant
of the bitcode loader, which is selected with the dsp tag in the
using clause). Please refer to the corresponding examples in the
distribution for further details.
Please note that at this time the LLVM bitcode interface is still somewhat
experimental, and there are some known limitations:
- LLVM doesn’t distinguish between char* and void* in bitcode, so all
void* parameters and return values in C code will be promoted to
char* on the Pure side. Also, pointers to types which neither have a
predefined meaning in Pure nor a proper type name in the bitcode file, will
become a generic pointer type (void*, void**, etc.) in Pure. If this
is a problem then you can just redeclare the corresponding functions under
an alias after loading the bitcode module, giving the proper argument and
result types (see Extern Declarations above).
- The bitcode interface is limited to the same range of C types as Pure’s
plain C interface. In practice, this should cover most C code, but it’s
certainly possible that you run into unsupported types for arguments and
return values. The compiler will then print a warning; the affected
functions will still be linked in, but they will not be callable from Pure.
Also note that calling conventions for passing C structs by value depend
on the host ABI, so you should have a look at the resulting
extern declaration (using show) to determine how the function
is actually to be called from Pure.
Instead of manually compiling source files to bitcode modules, you can also
just place the source code into a Pure script, enclosing it in %{ ... %}.
(Optionally, the opening brace may also be preceded with a public
or private scope specifier, which is used in the same way as the
scope specifier following the using keyword when importing bitcode
files.)
For instance, here is a little script showing inline code for the mygcd
function from the previous subsection:
%{
int mygcd(int x, int y)
{
if (y == 0)
return x;
else
return mygcd(y, x%y);
}
%}
mygcd 75 105;
The interpreter automatically compiles the inlined code to LLVM bitcode which
is then loaded as usual. (Of course, this will only work if you have the
corresponding LLVM compilers installed.) This method has the advantage that
you don’t have to write a Makefile and you can create self-contained Pure
scripts which include all required external functions. The downside is that
the inline code sections will have to be recompiled every time you run the
script with the interpreter which may considerably increase startup times. If
this is a problem then it’s usually better to import a separate bitcode module
instead (see Importing LLVM Bitcode), or at least batch-compile your script
to an executable (see Batch Compilation).
Currently, C, C++, Fortran and Faust are supported as foreign source
languages, with llvm-gcc, llvm-g++, llvm-gfortran and faust2 as the
corresponding compilers. Alternatively, the LLVM clang and clang++ compilers
can be used for C/C++ compilation (this will actually be the default if the Pure
interpreter itself was compiled with clang). Examples for all of these can be
found in the Pure sources.
C is the default language. The desired source language can be selected by
placing an appropriate tag into the inline code section, immediately after the
opening brace. (The tag is removed before the code is submitted to
compilation.) For instance:
%{ -*- Fortran90 -*-
function fact(n) result(p)
  integer n, p
  p = 1
  do i = 1, n
    p = p*i
  end do
end function fact
%}
fact n::int = fact_ {n};
map fact (1..10);
As indicated, the language tag takes the form -*- lang -*- where lang
can currently be any of c, c++, fortran and dsp (the latter
indicates the Faust language). Case is insignificant here, so you can also
write C, C++, Fortran, DSP etc. For the fortran tag, you
may also have to specify the appropriate language standard, such as
fortran90 which is used in the example above. The language tag can also be
followed by a module name, using the format -*- lang:name -*-. This is
optional for all languages except Faust (where the module name specifies the
namespace for the interface routines of the Faust module). So, e.g., a Faust
DSP named test would be specified with a dsp:test tag. Case is
significant in the module name.
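For instance, here’s a minimal sketch of an inline Faust module using the
dsp:test tag mentioned above (the dsp itself is just a trivial example):
%{ -*- dsp:test -*-
process = +; // a dsp which simply adds its two input signals
%}
The interface routines of this unit would then end up in the test namespace.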
The Pure interpreter has some built-in knowledge on how to invoke the LLVM
compilers to produce a working bitcode file ready to be loaded by the
interpreter, so the examples above should work out of the box if you have the
required compilers installed on your PATH. However, there are also
some environment variables you can set for customization purposes.
Specifically, PURE_CC is the command to invoke the C compiler. This
variable lets you specify the exact name of the executable along with any
debugging and optimization options that you may want to add. Likewise,
PURE_CXX, PURE_FC and PURE_FAUST are used for
the C++, Fortran and Faust compilers, respectively.
For instance, if you prefer to use clang as your C compiler, and you’d like to
invoke it with the -O3 optimization option, you would set
PURE_CC to "clang -O3". (To verify the settings you made, you
can have the interpreter echo the compilation commands which are actually
executed, by running Pure with the -v0100 option, see Verbosity and
Debugging Options.)
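For instance, with a Bourne-compatible shell you might invoke the interpreter
as follows (myscript.pure is a hypothetical script with inline C code):
$ PURE_CC="clang -O3" pure -v0100 myscript.pure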
In interactive mode, the interpreter reads definitions and expressions and
processes them as usual. You can use the -i option to force
interactive mode when invoking the interpreter with some script files.
Additional scripts can be loaded interactively using either a using
declaration or the interactive run command (see the description of the
run command below for the differences between these). Or you can just
start typing away, entering your own definitions and expressions to be
evaluated.
The input language is just the same as for source scripts, and hence
individual definitions and expressions must be terminated with a semicolon
before they are processed. For instance, here is a simple interaction which
defines the factorial and then uses that definition in some evaluations. Input
lines begin with ‘> ‘, which is the interpreter’s default command prompt:
> fact 1 = 1;
> fact n = n*fact (n-1) if n>1;
> let x = fact 10; x;
3628800
> map fact (1..10);
[1,2,6,24,120,720,5040,40320,362880,3628800]
As indicated, in interactive mode the normal forms of toplevel expressions are
printed after each expression is entered. We also call this the
read-eval-print loop. Normal form expressions are usually printed in the
same form as you’d enter them. However, there are a few special kinds of
objects like anonymous closures, thunks (“lazy” values to be evaluated when
needed) and pointers which don’t have a textual representation in the Pure
syntax and will be printed in the format #<object description> by
default. It is also possible to override the print representation of any kind
of expression by means of the __show__ function, see The __show__
Function for details.
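For instance, here’s a little sketch which changes the print representation
of floating point values (this needs the system module for sprintf):
> using system;
> __show__ x::double = sprintf "%0.6f" x;
> 1/7;
0.142857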
Online help is available in the interpreter with the interactive help
command, see Interactive Commands below. You need to have an html browser
installed for that. By default, the help command uses w3m, but
you can change this by setting either the PURE_HELP or the
BROWSER environment variable accordingly.
As of Pure 0.46, the interpreter provides a much improved help command
which gives you access to all the available documentation in html format,
which includes this manual, the Pure Library Manual, as well as all manuals of the
addon modules available from the Pure website.
When invoked without arguments, the help command displays an overview of
the available documentation, from which you can follow the links to the
provided manuals:
> help
(If the interpreter gives you an error message when you do this then you
haven’t installed the documentation yet. The complete set of manuals is
provided as a separate package at the Pure website, please see the Pure
installation instructions for details.)
The help command also accepts a parameter which lets you specify a search
term which is looked up in the global index, e.g.:
> help foldl
If the search term doesn’t appear in the index, it is assumed to be a topic (a
link target) in the Pure manual. Note that the docutils tools used to
generate the html source of the Pure documentation mangle the section titles
so that they are in lowercase and blanks are replaced with hyphens. So to look
up the present section in this manual you’d have to type:
> help online-help
The help files are in html format and located in the docs subdirectory of the
Pure library directory (i.e., /usr/local/lib/pure/docs by default). You can
look up topics in any of the help files with a command like the following:
> help pure-gsl#matrices
Here pure-gsl is the basename of the help file (library path and .html
suffix are supplied automatically), and matrices is a link target in that
document. To just read the pure-gsl.html file without specifying a target,
type the following:
> help pure-gsl#
(Note that just help pure-gsl won’t work, since it would look for a search
term in the index or a topic in the Pure manual.)
Last but not least, you can also point the help browser to any html document
(either a local file or some website) denoted by a proper URL, provided that
your browser program can handle these. For instance:
> help file:mydoc.html#foo
> help http://pure-lang.googlecode.com
When running interactively, the interpreter accepts a number of special
commands useful for interactive purposes. Here is a quick rundown of the
currently supported operations:
- ! command
Shell escape.
- break [symbol ...]
Sets breakpoints on the given function or operator symbols. All symbols
must be specified in fully qualified form, see the remarks below. If
invoked without arguments, prints all currently defined breakpoints. This
requires that the interpreter was invoked with the -g option to
enable debugging support. See Debugging below for details.
- bt
Prints a full backtrace of the call sequence of the most recent evaluation,
if that evaluation ended with an unhandled exception. This requires that
the interpreter was invoked with the -g option to enable
debugging support. See Debugging below for details.
- cd dir
Change the current working directory.
- clear [option ...] [symbol ...]
Purge the definitions of the given symbols (functions, macros, constants or
global variables). All symbols must be specified in fully qualified form,
see the remarks below. If invoked as clear ans, clears the ans
value (see Last Result below). When invoked without any arguments,
clear purges all definitions at the current interactive “level” (after
confirmation) and returns you to the previous level, if any. (It might be a
good idea to first check your current definitions with show or back
them up with dump before you do that.) The desired level can be
specified with the -t option. See the description of the save
command and Definition Levels below for further details. A description
of the common options accepted by the clear, dump and show
commands can be found in Specifying Symbol Selections below.
- del [symbol ...]
Deletes breakpoints and tracepoints on the given function or operator
symbols. All symbols must be specified in fully qualified form, see the
remarks below. If invoked without arguments, clears all currently defined
breakpoints and tracepoints (after confirmation). See Debugging below for
details.
- dump [-n filename] [option ...] [symbol ...]
Dump a snapshot of the current function, macro, constant and variable
definitions in Pure syntax to a text file. All symbols must be specified in
fully qualified form, see the remarks below. This works similar to the
show command (see below), but writes the definitions to a file. The
default output file is .pure in the current directory, which is then
reloaded automatically the next time the interpreter starts up in
interactive mode in the same directory. This provides a quick-and-dirty way
to save an interactive session and have it restored later, but note that
this isn’t perfect. In particular, declarations of extern
symbols won’t be saved unless they’re specified explicitly, and some
objects like closures, thunks and pointers don’t have a textual
representation from which they could be reconstructed. To handle these,
you’ll probably have to prepare a corresponding .purerc file yourself, see
Interactive Startup below.
A different filename can be specified with the -n option, which expects
the name of the script to be written in the next argument, e.g: dump -n
myscript.pure. You can then edit that file and use it as a starting point
for an ordinary script or a .purerc file, or you can just run the file with
the run command (see below) to restore the definitions in a subsequent
interpreter session.
- help [topic]
Display online documentation. If a topic is given, it is looked up in the
index. Alternatively, you can also specify a link target in any of the
installed help files, or any other html document denoted by a proper URL.
Please see Online Help above for details.
- ls [args]
List files (shell ls command).
- mem
Print current memory usage. This reports the number of expression cells
currently in use by the program, along with the size of the freelist (the
number of allocated but currently unused expression cells). Note that the
actual size of the expression storage may be somewhat larger than this,
since the runtime always allocates expression memory in bigger chunks.
Also, this figure does not reflect other heap-allocated memory in use by
the program, such as strings or malloc’ed pointers.
- override
Enter “override” mode. This allows you to add equations “above” existing
definitions in the source script, possibly overriding existing
equations. See Definition Levels below for details.
- pwd
Print the current working directory (shell pwd command).
- quit
Exits the interpreter.
- run [-g|script]
When invoked without arguments or with the -g option, run does a
“cold” restart of the interpreter, with the scripts and options given on
the interpreter’s original command line. If just -g is specified as the
argument, the interpreter is run with debugging enabled. Otherwise the
interpreter is invoked without debugging support. (This overrides the
corresponding option from the interpreter’s command line.) This command
provides a quick way to rerun the interpreter after changes in some of the
loaded script files, or if you want to enable or disable debugging on the
fly (which requires a restart of the interpreter). You’ll also lose any
definitions that you entered interactively in the interpreter, so you may
want to back them up with dump beforehand.
When invoked with a script name as argument, run loads the given script
file and adds its definitions to the current environment. This works more
or less like a using clause, but only searches for the script in
the current directory and places the definitions in the script at the
current temporary level, so that clear can be used to remove them
again. Also note that namespace and pragma settings of scripts loaded with
run stick around after loading the script. This allows you to quickly
set up your environment by just running a script containing the necessary
namespace declarations and compiler directives. (Alternatively, you can
also use the interpreter’s startup files for that purpose, see Interactive
Startup below.)
- save
Begin a new level of temporary definitions. A subsequent clear command
(see above) will purge the definitions made since the most recent save
command. See Definition Levels below for details.
- show [option ...] [symbol ...]
Show the definitions of symbols in various formats. See The show Command
below for details. All symbols must be specified in fully qualified form,
see the remarks below. A description of the common options accepted by the
clear, dump and show commands can be found in Specifying
Symbol Selections below.
- stats [-m] [on|off]
Enables (default) or disables “stats” mode, in which some statistics are
printed after an expression has been evaluated. Invoking just stats or
stats on only prints the cpu time in seconds for each evaluation. If
the -m option is specified, memory usage is printed along with the cpu
time, which indicates the maximum amount of expression memory (in terms of
expression cells) used during the computation. Invoking stats off
disables stats mode, while stats -m off just disables the printing of
the memory usage statistics.
- trace [symbol ...]
Sets tracepoints on the given function or operator symbols. This works like
the break command (see above) but only prints rule invocations and
reductions without actually interrupting the evaluation. See Debugging
below for details.
- underride
Exits “override” mode. This returns you to the normal mode of operation,
where new equations are added “below” previous rules of an existing
function. See Definition Levels below for details.
Note that these special commands are only recognized at the beginning of the
interactive command line (they are not reserved keywords of the Pure
language). Thus it’s possible to “escape” identifiers looking like commands by
entering a space at the beginning of the line.
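For instance (note the leading space in the second input line, which makes
the interpreter evaluate the variable instead of executing the ls command;
the variable itself is just a made-up example):
> let ls = [1,2,3];
>  ls;
[1,2,3]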
Also note that symbols (identifiers, operators etc.) must always be specified
in fully qualified form. No form of namespace lookup is performed by these
commands, so they always work the same no matter what namespace and
using namespace declarations are currently in effect.
Another convenience for interactive usage is the ans function, which
retrieves the most recent result printed in interactive mode. For instance:
> fact n = if n<=1 then 1 else n*fact (n-1);
> map fact (1..10);
[1,2,6,24,120,720,5040,40320,362880,3628800]
> scanl (+) 0 ans;
[0,1,3,9,33,153,873,5913,46233,409113,4037913]
Note that ans is just an ordinary function, defined in the prelude,
not a special command. However, there is a special clear ans command which
purges the ans value. This is useful, e.g., if you got a huge result which
you want to erase from memory before starting the next computation.
The clear, dump and show commands all accept the following options
for specifying a subset of symbols and definitions on which to operate. All
symbols must be specified in fully qualified form. Options may be combined,
thus, e.g., show -mft is the same as show -m -f -t. Some options
specify optional numeric parameters; these must follow immediately behind the
option character if present, as in -t0.
-c
  Selects defined constants.
-f
  Selects defined functions.
-g
  Indicates that the following symbols are actually shell glob patterns and
  that all matching symbols should be selected.
-m
  Selects defined macros.
-pflag
  Selects only private symbols if flag is nonzero (the default), otherwise
  (flag is zero) only public symbols. If this option is omitted then both
  private and public symbols are selected.
-tlevel
  Selects symbols and definitions at the given “level” of definitions and
  above. This is described in more detail below. Briefly, the executing
  program and all imported modules (including the prelude) are at level 0,
  while “temporary” definitions made interactively in the interpreter are at
  level 1 and above. Thus a level of 1 restricts the selection to all
  temporary definitions, whereas 0 indicates all definitions (i.e.,
  everything, including the prelude). If level is omitted, it defaults to
  the current definitions level.
-v
  Selects defined variables.
In addition, the -h option prints a short help message describing all
available options of the command at hand.
If none of the -c, -f, -m and -v options are specified, then
all kinds of symbols (constants, functions, macros and variables) are
selected, otherwise only the specified categories will be considered.
A reasonable default is used if the -t option is omitted. By default, if
no symbols are specified, only temporary definitions are considered, which
corresponds to -t1. Otherwise the command applies to all corresponding
definitions, no matter whether they belong to the executing program, the
prelude, or some temporary level, which has the same effect as -t0. This
default choice can be overridden by specifying the desired level explicitly.
As a special case, just clear (without any other options or symbol
arguments) always backs out to the previous definitions level (instead of
level #1). This is inconsistent with the rules set out above, but is
implemented this way for convenience and backward compatibility. Thus, if you
really want to delete all your temporary definitions, use clear -t1
instead. When used in this way, the clear command will only remove
temporary definitions; if you need to remove definitions at level #0, you must
specify those symbols explicitly.
Note that clear -g * will have pretty much the same disastrous
consequences as the Unix command rm -rf *, so don’t do that. Also note
that a macro or function symbol may well have defining equations at different
levels, in which case a command like clear -tn foo might only affect some
part of foo‘s definition. The dump and show commands work
analogously (albeit less destructively). See Definition Levels below for
some examples.
The show command can be used to obtain information about defined symbols
in various formats. Besides the common selection options discussed above, this
command recognizes the following additional options for specifying the content
to be listed and the format to use.
-a
  Disassembles pattern matching automata. Works like the -v4 option of the
  interpreter.
-d
  Disassembles LLVM IR, showing the generated LLVM assembler code of a
  function. Works like the -v8 option of the interpreter.
-e
  Annotates printed definitions with lexical environment information (de
  Bruijn indices, subterm paths). Works like the -v2 option of the
  interpreter.
-l
  Long format, prints definitions along with the summary symbol information.
  This implies -s.
-s
  Summary format, prints just summary information about listed symbols.
Symbols are always listed in lexicographic order. Note that some of the
options (in particular, -a and -d) may produce excessive amounts of
information. By setting the PURE_MORE environment variable, you can
specify a shell command to be used for paging, usually more or
less.
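E.g., with a Bourne-compatible shell:
$ export PURE_MORE=less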
For instance, to list all temporary definitions made in an interactive
session, simply say:
> show
You can also list a specific symbol, no matter whether it comes from the
interactive command line, the executing script or the prelude:
> show foldl
foldl f a x::matrix = foldl f a (list x);
foldl f a s::string = foldl f a (chars s);
foldl f a [] = a;
foldl f a (x:xs) = foldl f (f a x) xs;
Wildcards can be used with the -g option, which is useful if you want to
print an entire family of related functions, e.g.:
> show -g foldl*
foldl f a x::matrix = foldl f a (list x);
foldl f a s::string = foldl f a (chars s);
foldl f a [] = a;
foldl f a (x:xs) = foldl f (f a x) xs;
foldl1 f x::matrix = foldl1 f (list x);
foldl1 f s::string = foldl1 f (chars s);
foldl1 f (x:xs) = foldl f x xs;
Or you can just specify multiple symbols as follows (this also works with
multiple glob patterns when you add the -g option):
> show min max
max x y = if x>=y then x else y;
min x y = if x<=y then x else y;
You can also select symbols by category. E.g., the following command shows
summary information about all the variable symbols along with their current
values (using the “long” format):
> show -lvg *
argc       var  argc = 0;
argv       var  argv = [];
compiling  var  compiling = 0;
sysinfo    var  sysinfo = "x86_64-unknown-linux-gnu";
version    var  version = "0.46";
5 variables
Or you can list just private symbols of the namespace foo, as follows:
> show -pg foo::*
The following command will list each and every symbol that’s currently defined
(instead of -g * you can also use the -t0 option):
> show -g *
This usually produces a lot of output and is rarely needed, unless you’d like
to browse through an entire program including all library imports. (In that
case you might consider using the dump command instead, which writes the
definitions to a file which can then be loaded into a text editor for easier
viewing. This may occasionally be useful for debugging purposes.)
Finally, there are two alternate forms of the show command: show
namespace which lists the current and search namespaces, and show
namespaces which lists all declared namespaces. These come in handy if you
have forgotten what namespaces are currently active and which other namespaces
are available in your program. For instance:
> show namespace
> show namespaces
namespace C;
namespace matrix;
> using namespace C;
> namespace my;
> show namespace
namespace my;
using namespace C;
To help with incremental development, the interpreter offers some commands to
manipulate the current set of definitions interactively. To these ends,
definitions are organized into different subsets called levels. As already
mentioned, the prelude, as well as other source programs specified when
invoking the interpreter, are always at level 0, while the interactive
environment starts at level 1. Each save command introduces a new
temporary level, and each subsequent clear command (without any arguments)
“pops” the definitions on the current level and returns you to the previous
one (if any). This gives you a “stack” of temporary environments which enables
you to “plug and play” in a (more or less) safe fashion, without affecting the
rest of your program.
For all practical purposes, this stack is unlimited, so that you can create as
many levels as you like. However, this facility also has its limitations. The
interpreter doesn’t really keep a full history of everything you entered
interactively; it only records the level that each variable, constant and
function or macro rule belongs to, so that the corresponding definitions can be removed
again when the level is popped. On the other hand, intermediate changes in
variable values are not recorded anywhere and cannot be undone. Moreover,
global declarations (which encompasses using clauses,
extern declarations and special symbol declarations) always apply
to all levels, so they can’t be undone either.
That said, the temporary levels can still be pretty useful when you’re playing
around with the interpreter. Here’s a little example which shows how to use
clear to quickly get rid of a definition that you entered interactively:
> foo (x:xs) = x+foo xs;
> foo [] = 0;
> show
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
55
> clear
This will clear all temporary definitions at level #1.
Continue (y/n)? y
> show
> foo (1..10);
foo [1,2,3,4,5,6,7,8,9,10]
We’ve seen already that normally, if you enter a sequence of equations, they
will be recorded in the order in which they were written. However, it is also
possible to override definitions in lower levels with the override
command:
> foo (x:xs) = x+foo xs;
> foo [] = 0;
> show
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
55
> save
save: now at temporary definitions level #2
> override
> foo (x:xs) = x*foo xs;
> show
foo (x:xs) = x*foo xs;
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
warning: rule never reduced: foo (x:xs) = x+foo xs;
0
Note that the equation foo (x:xs) = x*foo xs was inserted before the
previous rule foo (x:xs) = x+foo xs, which is at level #1. (The latter
equation is now “shadowed” by the rule we just entered, hence the compiler
warns us that this rule can’t be reduced any more.)
Even in override mode, new definitions will be added after other definitions
at the current level. This allows us to just continue adding more
high-priority definitions overriding lower-priority ones:
> foo [] = 1;
> show
foo (x:xs) = x*foo xs;
foo [] = 1;
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
warning: rule never reduced: foo (x:xs) = x+foo xs;
warning: rule never reduced: foo [] = 0;
3628800
Again, the new equation was inserted above the existing lower-priority rules,
but below our previous equation foo (x:xs) = x*foo xs entered at the same
level. As you can see, we have now effectively replaced our original
definition of foo with a version that calculates list products instead of
sums, but of course we can easily go back one level to restore the previous
definition:
> clear
This will clear all temporary definitions at level #2.
Continue (y/n)? y
clear: now at temporary definitions level #1
clear: override mode is on
> show
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
55
Note that clear reminded us that override mode is still enabled (save
will do the same if override mode is on while pushing a new definitions
level). To turn it off again, use the underride command. This will revert
to the normal behaviour of adding new equations below existing ones:
> underride
It’s also possible to use clear to back out multiple levels at once, if
you specify the target level to be cleared with the -t option. For instance:
> save
save: now at temporary definitions level #2
> let bar = 99;
> show
let bar = 99;
foo (x:xs) = x+foo xs;
foo [] = 0;
> // this scraps all our scribblings!
> clear -t1
This will clear all temporary definitions at level #1 and above.
Continue (y/n)? y
clear: now at temporary definitions level #1
> show
>
Finally, it is worth noting here that the facilities described above are also
available to Pure programs, as the save and clear commands can also be
executed under program control using the evalcmd primitive; see
Reflection in the Caveats and Notes section for details.
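For instance, a program might capture its own definitions in a string as
follows (a sketch, reusing the foo function from the example above; the exact
formatting of the captured text may differ):
> evalcmd "show foo";
"foo (x:xs) = x+foo xs;\nfoo [] = 0;\n"
The save and clear commands can be submitted through evalcmd in the same
fashion.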
The interpreter provides a simple but reasonably convenient symbolic debugging
facility when running interactively. To make this work, you have to specify
the -g option when invoking the interpreter (pure -g). If you’re
already at the interpreter’s command line, you can also use the run -g
command to enable the debugger. The -g option disables tail call
optimization (see Stack Size and Tail Recursion) to make it easier to debug
programs. It also causes special debugging code to be generated which will
make your program run much slower. Therefore the -g option should
only be used if you actually need the debugger.
One common use of the debugger is “post mortem” debugging after an evaluation
ended with an unhandled exception. In such a case, the bt command of the
interpreter prints a backtrace of the call sequence which caused the
exception. Note that this only works if debugging mode was enabled. For
instance:
> [1,2]!3;
<stdin>, line 2: unhandled exception 'out_of_bounds' while evaluating '[1,2]!3'
> bt
   [1] (!): (x:xs)!n::int = xs!(n-1) if n>0;
     n = 3; x = 1; xs = [2]
   [2] (!): (x:xs)!n::int = xs!(n-1) if n>0;
     n = 2; x = 2; xs = []
   [3] (!): []!n::int = throw out_of_bounds;
     n = 1
>> [4] throw: extern void pure_throw(expr*) = throw;
     x1 = out_of_bounds
The last call, which is also marked with the >> symbol, is the call that
raised the exception. The format is similar to the p command of the
debugger, see below, but bt always prints a full backtrace. (As with the
show command of the interpreter, you can set the PURE_MORE
environment variable to pipe the output through the corresponding command, or
use evalcmd to capture the output of bt in a string.)
The debugger can also be used interactively. To these ends, you can set
breakpoints on functions with the break command. The debugger then gets
invoked as soon as a rule for one of the given functions is
executed. Example:
> fact n::int = if n>0 then n*fact (n-1) else 1;
> break fact
> fact 1;
** [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
(Type 'h' for help.)
:
** [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 0
:
++ [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 0
     --> 1
** [2] (*): x::int*y::int = x*y;
     x = 1; y = 1
:
++ [2] (*): x::int*y::int = x*y;
     x = 1; y = 1
     --> 1
++ [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
     --> 1
1
Lines beginning with ** indicate that the evaluation was interrupted to
show the rule (or external) which is currently being considered, along with
the current depth of the call stack, the invoked function and the values of
parameters and other local variables in the current lexical environment. In
contrast, the prefix ++ denotes reductions which were actually performed
during the evaluation and the results that were returned by the function call
(printed as --> return value).
Sometimes you might also see funny symbols like #<closure>, #<case> or
#<when> instead of the function name. These indicate lambdas and the
special variable-binding environments, which are all implemented as anonymous
closures in Pure. Also note that the debugger doesn’t know about the argument
names of external functions (which are optional in Pure and not recorded
anywhere), so it will display the generic names x1, x2 etc. instead.
At the debugger prompt ‘:‘ you can enter various special debugger
commands, or just keep on hitting the carriage return key to walk through an
evaluation step by step, as we did in the example above. (Command line editing
works as usual at the debugger prompt, if it is enabled.) The usual commands
are provided to walk through an evaluation, print and navigate the call stack,
step over the current call, or continue the evaluation unattended until you
hit another breakpoint. If you know other source level debuggers like
gdb then you should feel right at home. You can type h at the
debugger prompt to print the following list:
: h
Debugger commands:
a       auto: step through the entire program, run unattended
c [f]   continue until next breakpoint, or given function f
h       help: print this list
n       next step: step over reduction
p [n]   print rule stack (n = number of frames)
r       run: finish evaluation without debugger
s       single step: step into reduction
t, b    move to the top or bottom of the rule stack
u, d    move up or down one level in the rule stack
x       exit the interpreter (after confirmation)
.       reprint current rule
! cmd   shell escape
? expr  evaluate expression
<cr>    single step (same as 's')
<eof>   step through program, run unattended (same as 'a')
The command syntax is very simple. Besides the commands listed above you can
also enter comment lines (// comment text) which will just be
ignored. Extra arguments on commands which don’t expect any will generally be
ignored as well. The single letter commands all have to be separated from any
additional parameters with whitespace, whereas the ‘!‘, ‘?‘ and
‘.‘ commands count as word delimiters and can thus be followed immediately
by an argument. For convenience, the ‘?‘ command can also be omitted if
the expression to be evaluated doesn’t start with a single letter or one of
the special punctuation commands.
The debugger can be exited or suspended in the following ways:
- You can type c to continue the evaluation until the next breakpoint, or
c foo in order to proceed until the debugger hits an invocation of the
function foo.
- You can type r to run the rest of the evaluation without the debugger.
- The a (“auto”) command single-steps through the rest of the evaluation,
running unattended. This command can also be entered by just hitting the
end-of-file key (Ctrl-d on Unix systems) at the debugger prompt.
- You can also type x to exit from the debugger and the interpreter
immediately (after confirmation).
At the debugger prompt, you can use the u (“up”), d (“down”), t
(“top”) and b (“bottom”) commands to move around on the current call
stack. The p command prints a range of the call stack centered around the
currently selected stack frame, which is indicated with the >> tag,
whereas ** denotes the current bottom of the stack (which is the rule to
be executed with the single step command). The p command can also be
followed by a numeric argument which indicates the number of stack frames to
be printed (this will then become the default for subsequent invocations of
p). The n command steps over the call selected with the stack
navigation commands. For instance:
> fact 3;
** [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 3
: c *
** [4] (*): x::int*y::int = x*y;
     x = 1; y = 1
: p
   [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 3
   [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 2
   [3] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
** [4] (*): x::int*y::int = x*y;
     x = 1; y = 1
: u
>> [3] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
: u
>> [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 2
: p
   [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 3
>> [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 2
   [3] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
** [4] (*): x::int*y::int = x*y;
     x = 1; y = 1
: n
++ [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 2
     --> 2
** [2] (*): x::int*y::int = x*y;
     x = 3; y = 2
:
If you ever get lost, you can reprint the current rule with the ‘.‘
command:
: .
** [2] (*): x::int*y::int = x*y;
     x = 3; y = 2
Another useful feature is the ? command which lets you evaluate any Pure
expression, with the local variables of the current rule bound to their
corresponding values. Like the n command, ? applies to the current
stack frame as selected with the stack navigation commands. The expression
must be entered on a single line, and the trailing semicolon is optional. For
instance:
> fact 3;
** [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 3
: c *
** [4] (*): x::int*y::int = x*y;
     x = 1; y = 1
: ?x+y
2
: u
>> [3] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
: n>0, fact n
1,1
A third use of the debugger is to trace function calls. For that the
interpreter provides the trace command which works similarly to break,
but sets so-called “tracepoints” which only print rule invocations and
reductions instead of actually interrupting the evaluation. For instance,
assuming the same example as above, let’s first remove the breakpoint on
fact (using the del command) and then set it as a tracepoint instead:
> del fact
> trace fact
> fact 1;
** [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
** [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 0
++ [2] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 0
     --> 1
++ [1] fact: fact n::int = if n>0 then n*fact (n-1) else 1;
     n = 1
     --> 1
1
The break and trace commands can also be used in concert if you want
to debug some functions while only tracing others.
The current sets of breakpoints and tracepoints can be changed with the
break, trace and del commands, as shown above, and just break
or trace without any arguments lists the currently defined breakpoints or
tracepoints, respectively. Please see Interactive Commands above for
details. Also note that these are really interpreter commands, so to enter
them you first have to exit the debugger (using the a or r command) if
an evaluation is currently in progress. (However, it’s possible to set a
temporary breakpoint in the debugger with the c command, see above.)
In interactive mode, the interpreter also runs some additional scripts at
startup, after loading the prelude and the scripts specified on the command
line. This lets you tailor the interactive environment to your liking.
The interpreter first looks for a .purerc file in the user’s home directory
(as given by the HOME environment variable) and then for a .purerc
file in the current working directory. These are just ordinary Pure scripts
which may contain any additional definitions that you need. The .purerc file
in the home directory is for global definitions which should always be
available when running interactively, while the .purerc file in the current
directory can be used for project-specific definitions.
Finally, you can also have a .pure initialization file in the current
directory, which is usually created with the dump command (see
above). This file is loaded after the .purerc files if it is present.
The interpreter processes all these files in the same way as with the run
command (see above). When invoking the interpreter, you can specify the
--norc option on the command line if you wish to skip these
initializations.
The interpreter’s -c option provides a means to turn Pure scripts
into standalone executables. This feature is still a bit experimental. In
particular, note that the compiled executable is essentially a static
snapshot of your program which is executed on the “bare metal”, without a
hosting interpreter. Only a minimal runtime system is provided. This
considerably reduces startup times, but also implies the following quirks and
limitations:
- All toplevel expressions and let bindings are evaluated after
all functions have been defined. This might cause inconsistent behaviour
with an interpreted run of the same program, which executes expressions and
variable definitions immediately, as the program is being processed. To
avoid these semantic differences, you’ll have to make sure that expressions
are evaluated after all functions used in the evaluation have been defined
completely.
- Toplevel expressions won’t be of much use in a batch-compiled program,
unless, of course, they are evaluated for their side-effects. Usually your
program will have to include at least one of these to play the role of the
“main program” in your script. In most cases these expressions are best
placed after all the function and variable definitions, at the end of your
program.
- The eval function can only be used to evaluate plain toplevel
expressions. You can define local functions and variables in
with and when clauses inside an expression, but you
can’t use eval to define new global variables and functions. In
other words, anything which changes the executing program is “verboten”.
Moreover, the introspective capabilities provided by evalcmd
(discussed under Reflection in the Caveats and Notes section) won’t work
either because the interactive commands are all disabled. If you need any of
these capabilities, you have to run your program with the interpreter.
- Constant and macro definitions, being compile time features, aren’t
available in the compiled program. If you need to use these with
eval at run time, you have to provide them through variable and
function definitions instead. Also, the compiler usually strips unused
functions from the output code, so that only functions which are actually
called somewhere in the static program text are available to eval.
(The -u option and the --required pragma can be used to
avoid this, see Code Size and Unstripped Executables below.)
- Code which gets executed to compute constant values at compile time will
generally not be executed in the compiled executable, so your program
shouldn’t rely on side-effects of such computations (this would be bad
practice anyway). There is an exception to this rule, however, namely if a
constant value contains run time data such as pointers and local functions
which requires an initialization at run time, then the batch compiler will
generate code for that. (The same happens if the --noconst option
is used to force computation of constant values at run time, see Code Size
and Unstripped Executables.)
What all this boils down to is that anything which requires the compile time
or interactive facilities of the interpreter is unavailable. These
restrictions only apply at run time, of course. At compile time the program
is being executed by the interpreter so you can use eval and
evalcmd in any desired way. See the description of the
compiling variable below for how to distinguish these cases in your
script.
For most kinds of scripts, the above restrictions aren’t really that much of
an obstacle, or can easily be worked around. For the few scripts which
actually need the full dynamic capabilities of Pure you’ll just have to run
the script with the interpreter. This isn’t a big deal either, only the
startup will be somewhat slower because the script is compiled on the
fly. Once the JIT has done its thing the “interpreted” script will run every
bit as fast as the “compiled” one, since in fact both are compiled (only at
different times) to exactly the same code!
Also note that during a batch compilation, the compiled program is actually
executed as usual, i.e., the script is also run at compile time. This might
first seem to be a big annoyance, but it actually opens the door for some
powerful programming techniques like partial evaluation. It is also a
necessity because of Pure’s highly dynamic nature. For instance, Pure allows
you to define constants by evaluating an arbitrary expression (see Constant
Definitions below), and using eval a program can easily modify
itself in even more unforeseeable ways. Therefore pretty much anything in your
program can actually depend on previous computations performed while the
program is being executed.
For the sake of a concrete example, consider the following little script:
using system;
fact n = if n>0 then n*fact (n-1) else 1;
main n = do puts ["Hello, world!", str (map fact (1..n))];
if argc<=1 then () else main (sscanf (argv!1) "%d");
When invoked from the command line, with the number n as the first
parameter, this program will print the string "Hello, world!" and the list
of the first n factorials:
$ pure -x hello.pure 10
Hello, world!
[1,2,6,24,120,720,5040,40320,362880,3628800]
Note the condition on argc in the last line of the script. This prevents
the program from producing an exception if no command line parameters are
specified, so that the program can also be run interactively:
$ pure -i -q hello.pure
> main 10;
Hello, world!
[1,2,6,24,120,720,5040,40320,362880,3628800]
()
> quit
To turn the script into an executable, we just invoke the Pure interpreter
with the -c option, using the -o option to specify the
desired output file name:
$ pure -c hello.pure -o hello
$ ./hello 10
Hello, world!
[1,2,6,24,120,720,5040,40320,362880,3628800]
Next suppose that we’d like to supply the value n at compile rather than
run time. To these ends we want to turn the value passed to the main
function into a compile time constant, which can be done as follows:
const n = if argc>1 then sscanf (argv!1) "%d" else 10;
(Note that we provide 10 as a default if n isn’t specified on the
command line.)
Moreover, in such a case we usually want to skip the execution of the main
function at compile time. The Pure runtime provides a special system variable
compiling which holds a truth value indicating whether the program is
actually running under the auspices of the batch compiler, so that it can
adjust accordingly. In our example, the evaluation of main becomes:
if compiling then () else main n;
Our program now looks as follows:
using system;
fact n = if n>0 then n*fact (n-1) else 1;
main n = do puts ["Hello, world!", str (map fact (1..n))];
const n = if argc>1 then sscanf (argv!1) "%d" else 10;
if compiling then () else main n;
This script “specializes” n to the first (compile time) parameter when
being batch-compiled, and it still works as before when we run it through the
interpreter in both batch and interactive mode, too:
$ pure -i -q hello.pure
Hello, world!
[1,2,6,24,120,720,5040,40320,362880,3628800]
> main 5;
Hello, world!
[1,2,6,24,120]
()
> quit
$ pure -x hello.pure 7
Hello, world!
[1,2,6,24,120,720,5040]
$ pure -o hello -c -x hello.pure 7
$ ./hello
Hello, world!
[1,2,6,24,120,720,5040]
You’ll rarely need an elaborate setup like this, most of the time something
like our simple first example will do the trick. But, as you’ve seen, Pure can
easily do it.
By default, the batch compiler strips unused functions from the output code,
to keep the code size small. You can disable this with the -u
option, in which case the output code includes all functions defined in the
compiled program or imported through a using clause, even if they
don’t seem to be used anywhere. This considerably increases compilation times
and makes the compiled executable much larger. For instance, on a 64 bit Linux
system with ELF binaries, the executable of our hello.pure example is about
thrice as large:
$ pure -o hello -c -x hello.pure 7 && ls -l hello
-rwxr-xr-x 1 ag users 178484 2010-01-12 06:21 hello
$ pure -o hello -c -u -x hello.pure 7 && ls -l hello
-rwxr-xr-x 1 ag users 541941 2010-01-12 06:21 hello
(Note that even the stripped executable is fairly large when compared to
compiled C code, as it still contains the symbol table of the entire program,
which is needed by the runtime environment.)
Stripped executables should be fine for most purposes, but you have to be
careful when using eval in your compiled program. The compiler only
does a static analysis of which functions might be reached from the
initialization code (i.e., toplevel expressions and let
bindings). It does not take into account code run via the eval
routine. Thus, functions used only in evaled code will be stripped
from the executable, as if they were never defined at all. If such a function
is then being called using eval at runtime, it will evaluate to a
plain constructor symbol.
If this is a problem then you can either use the -u option to
produce an unstripped executable, or you can force functions to be included in
the stripped executable with the --required pragma (cf. Code
Generation Options). For instance:
#! --required foo
foo x = bar (x-1);
eval "foo 99";
There is another code generation option which may have a substantial effect on
code size, namely the --noconst option. Normally, constant values
defined in a const definition are precomputed at compile time and then
stored in the generated executable; this reduces startup times but may
increase the code size considerably if your program contains big constant
values such as lists. If you prefer smaller executables then you can use the
--noconst option to force the value of the constant to be recomputed
at run time (which effectively turns the constant into a kind of read-only
variable). For instance:
#! --noconst
const xs = 1L..100000L;
sum = foldl (+) 0;
using system;
puts $ str $ sum xs;
On my 64 bit Linux system this produces a 187115 bytes executable. Without
--noconst the code becomes almost an order of magnitude larger in
this case (1788699 bytes). On the other hand, the smaller executable also
takes a little longer to run since it must first recompute the value of the
list constant at startup. So you have to consider the tradeoffs in a given
situation. Usually big executables aren’t much of a problem on modern
operating systems, but if your program contains a lot of big constants then
this may become an important consideration. However, if a constant value takes
a long time to compute then you’ll be better off with the default behaviour of
precomputing the value at compile time.
Note that while the batch compiler generates native executables by default, it
can just as well create object files which can be linked into other C/C++
programs and libraries:
$ pure -o hello.o -c -x hello.pure 7
The .o extension tells the compiler that you want an object file. When linking
the object module, you also need to supply an initialization routine which
calls the __pure_main__ function in hello.o to initialize the compiled
module. This routine is declared in C/C++ code as follows:
extern "C" void __pure_main__(int argc, char** argv);
As indicated, __pure_main__ is to be invoked with two parameters, the
argument count and NULL-terminated argument vector which become the
argc and the argv of the Pure program, respectively. (You can also
just pass 0 for both arguments if you don’t need to supply command line
parameters.) The purpose of __pure_main__ is to initialize a shell
instance of the Pure interpreter which provides the minimal runtime support
necessary to execute the Pure program, and to invoke all “initialization code”
(variable definitions and toplevel expressions) of the program itself.
A minimal C main function which does the job of initializing the Pure
module looks as follows:
extern void __pure_main__(int argc, char** argv);

int main(int argc, char** argv)
{
  __pure_main__(argc, argv);
  return 0;
}
If you link the main routine with the Pure module, don’t forget to also
pull in the Pure runtime library. Assuming that the above C code is in
pure_main.c:
$ gcc -c pure_main.c -o pure_main.o
$ g++ -o hello hello.o pure_main.o -lpure
$ ./hello
Hello, world!
[1,2,6,24,120,720,5040]
(The C++ compiler is used as the linker here so that the standard C++ library
gets linked in, too. This is necessary because Pure’s runtime library is
actually written in C++.)
In fact, this is pretty much what pure -c actually does for you when
creating an executable.
If your script loads dynamic libraries (using "lib:...";) then you’ll also
have to link with those; all external references have to be resolved at
compile time. This is taken care of automatically when creating
executables. Otherwise it is a good idea to run pure -c with the
-v0100 verbosity option so that it prints the libraries to be linked (in
addition to the commands which are invoked in the compilation process):
$ pure -v0100 -c hello.pure -o hello.o
opt -f -std-compile-opts hello.o.bc | llc -f -o hello.o.s
gcc -c hello.o.s -o hello.o
Link with: g++ hello.o -lpure
Well, we already knew that, so let’s consider a slightly more interesting
example from Pure’s ODBC module:
$ pure -v0100 -c pure-odbc/examples/menagerie.pure -o menagerie.o
opt -f -std-compile-opts menagerie.o.bc | llc -f -o menagerie.o.s
gcc -c menagerie.o.s -o menagerie.o
Link with: g++ menagerie.o /usr/local/lib/pure/odbc.so -lpure
$ g++ -shared -o menagerie.so menagerie.o /usr/local/lib/pure/odbc.so -lpure
Note that the listed link options are necessary but might not be sufficient;
pure -c just makes a best guess based on the Pure source. On most systems
this will be good enough, but if it isn’t, you can just add options to the
linker command as needed to pull in additional required libraries.
As this last example shows, you can also create shared libraries from Pure
modules. However, on some systems (most notably x86_64), this requires that
you pass the -fPIC option when batch-compiling the module, so that
position-independent code is generated:
$ pure -c -fPIC pure-odbc/examples/menagerie.pure -o menagerie.o
Also note that even when building a shared module, you’ll have to supply an
initialization routine which calls __pure_main__ somewhere.
Last but not least, pure -c can also generate just plain LLVM assembler
code:
pure -c hello.pure -o hello.ll
Note the .ll extension; this tells the compiler that you want an LLVM
assembler file. An LLVM bitcode file can be created just as easily:
pure -c hello.pure -o hello.bc
In these cases you’ll have to handle the rest of the compilation
yourself. This gives you the opportunity, e.g., to play with special
optimization and code generation options provided by the LLVM
toolchain. Please refer to the LLVM documentation (in particular, the
description of the opt and llc programs) for details.
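For instance, the manual route might look something like this (a sketch; the
exact opt and llc options vary with the LLVM version, and pure_main.o is the
wrapper module from the previous section):
$ pure -c hello.pure -o hello.bc
$ opt -std-compile-opts hello.bc | llc -o hello.s
$ gcc -c hello.s -o hello.o
$ g++ -o hello hello.o pure_main.o -lpure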
Another point worth mentioning here is that you can’t just call Pure functions
in a batch-compiled module directly. That’s because in order to call a Pure
function, at least in the current implementation, you have to set up a Pure
stack frame for the function. However, there’s a convenience function called
pure_funcall in the runtime API to handle this. This function takes a
pointer to the Pure function, the argument count and the arguments themselves
(as pure_expr* objects) as parameters. For instance, here is a pure_main.c
module which can be linked against the hello.pure program from above, which
calls the fact function from the Pure program:
#include <stdio.h>
#include <pure/runtime.h>

extern void __pure_main__(int argc, char** argv);
extern pure_expr *fact(pure_expr *x);

int main()
{
  int n = 10, m;
  __pure_main__(0, NULL);
  if (pure_is_int(pure_funcall(fact, 1, pure_int(n)), &m))
    printf("fact %d = %d\n", n, m);
  return 0;
}
And here’s how you can compile, link and run this program:
$ pure -o hello.o -c -x hello.pure 7
$ gcc -o pure_main.o -c pure_main.c
$ g++ -o myhello hello.o pure_main.o -lpure
$ ./myhello
Hello, world!
[1,2,6,24,120,720,5040]
fact 10 = 3628800
Note that the first two lines are output from the Pure program; the last line
is what gets printed by the main routine in pure_main.c.
This section is a grab bag of casual remarks, useful tips and tricks, and
information on common pitfalls, quirks and limitations of the current
implementation and how to deal with them.
People keep asking me what’s so “pure” about Pure. The long and apologetic
answer is that Pure tries to stay as close as possible to the spirit of term
rewriting without sacrificing practicality. It’s possible and in fact quite
easy to write purely functional programs in Pure, and you’re encouraged to do
so whenever this is possible or at least reasonable. On the other hand, Pure
doesn’t get in your way if you want to call external operations with side
effects; it does allow you to call any C function after all.
The short (and true) answer is that I simply liked the name, and there wasn’t
any programming language named “Pure” yet (quite a feat nowadays), so there’s
one now. If you insist on a (recursive) backronym, just take “PURE” to stand
for the “Pure Universal Rewriting Engine”.
Pure is based on the author’s earlier Q language, but it offers many new and
powerful features and programs run much faster than their Q equivalents. The
language also went through a thorough facelift in order to modernize the
syntax and make it more similar to other modern-style functional languages, in
particular Miranda and Haskell. Thus porting Q scripts to Pure often
involves a substantial amount of manual work, but it can be (and has been) done.
Since its modest beginnings in April 2008, Pure has gone through a lot of
major and minor revisions which raise various backward compatibility issues.
We document these in the following, in order to facilitate the porting of
older Pure scripts.
Pure 0.7 introduced built-in matrix structures, which called for some minor
changes in the syntax of comprehensions and arithmetic sequences.
Specifically, the template expression and generator/filter clauses of a
comprehension are now separated with | instead of ;. Moreover,
arithmetic sequences with arbitrary stepsize are now written x:y..z
instead of x,y..z, and the ‘..‘ operator now has a higher precedence
than the ‘,‘ operator. This makes writing matrix slices like
x!!(i..j,k..l) much more convenient.
In Pure 0.13 the naming of the logical and bitwise operations was changed, so
that these are now called ~, &&, || and not/and/or,
respectively. (Previously, ~ was used for bitwise, not for logical
negation, which was rather inconsistent, albeit compatible with the naming of
the not operation in Haskell and ML.) Also, to stay in line with this
naming scheme, inequality was renamed to ~= (previously !=).
Pure 0.14 introduced the namespaces feature. Consequently, the scope of
private symbols is now confined to a namespace rather than a source module;
scripts making use of private symbols need to be adapted accordingly. Also
note that syntax like foo::int may now also denote a qualified symbol
rather than a tagged variable, if foo has been declared as a
namespace. You can work around such ambiguities by renaming the variable, or
by placing spaces around the ‘::‘ delimiter (these aren’t permitted in a
qualified symbol, so the construct foo :: int is always interpreted as a
tagged variable, no matter whether foo is also a valid namespace).
Pure 0.26 extended the namespaces feature to add support for hierarchical
namespaces. This means that name lookup works in a slightly different fashion
now (see Hierarchical Namespaces for details), but old code which doesn’t
use the new feature should continue to work unchanged.
Pure 0.26 also changed the nullary keyword to nonfix,
which is more consistent with the other kinds of fixity declarations.
Moreover, the parser was enhanced so that it can cope with a theoretically
unbounded number of precedence levels, and the system of standard operators in
the prelude was modified so that it becomes possible to sneak in new operator
symbols with ease; details can be found in the Symbol Declarations section.
Pure 0.41 added support for optimization of indirect tail calls, so that any
previous restrictions on the use of tail recursion in indirect function calls
and mutually recursive globals have been removed. Moreover, the logical
operators && and || are now tail-recursive in their second operand and
can also be extended with user-defined equations, just like the other
builtins. Note that this implies that the values returned by && and ||
aren’t normalized to the values 0 and 1 any more (this isn’t possible with
tail call semantics). If you need this then you’ll have to make sure that
either the operands are already normalized, or you’ll have to normalize the
result yourself.
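For instance (a quick illustration; with the new semantics && simply returns
its second operand here, since the first one is nonzero):
> 5 && 7;
7
> (5 && 7) ~= 0;  // normalize the result yourself if you need 0 or 1
1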
Also, as of Pure 0.41 the batch compiler produces stripped executables by
default. To create unstripped executables you now have to use the -u
option, see Code Size and Unstripped Executables for details. The
-s option to produce stripped executables is still provided for
backward compatibility, but it won’t have any effect unless you use it to
override a previous -u option.
Pure 0.43 changed the rules for looking up symbols in user-defined namespaces.
Unqualified symbols are now created in the current (rather than the global)
namespace by default, see Symbol Lookup and Creation for details. The
-w option can be used to get warnings about unqualified symbols
which are resolved to a different namespace than previously. It also provides
a means to check your scripts for implicit declarations which might indicate
missing or mistyped function symbols.
Pure 0.45 added support for checking arbitrary pointer types in the C
interface, so that you don’t have to worry about passing the wrong kinds of
pointers to system and library routines any more. Moreover, the interpretation
of numeric pointer arguments (int* etc.) was changed to bring them in line
with the other new numeric matrix conversions (int** etc.). In particular,
the matrix data can now be modified in-place and type checking is more strict
(int* requires an int matrix, etc.). Also, there’s now support for
argv-style vector arguments (char** and void**). Please see the C
Types section for details.
The parser uses a fairly simplistic panic mode error recovery which tries to
catch syntax errors at the toplevel only. This seems to work reasonably well,
but might catch some errors much too late. Unfortunately, Pure’s terseness
makes it rather difficult to design a better scheme. As a remedy, the parser
accepts an empty definition (just ; by itself) at the toplevel only. Thus,
in interactive usage, if the parser seems to eat away your input without doing
anything, entering an extra semicolon or two should break the spell, putting
you back at the toplevel where you can start typing the definition again.
__show__ x
This function provides a “hook” to override the print representations of
expressions at runtime, which works in a fashion similar to Haskell’s
show function.
__show__ is just an ordinary Pure function expected to return a string
with the desired custom representation of a normal form value given as the
function’s single argument. The interpreter prints the strings returned by
__show__ just as they are. It will not check whether they conform to
Pure syntax and/or semantics, or modify them in any way. Also, the library
doesn’t define this function anywhere, so you are free to add any rules that
you want.
Custom print representations are most useful for interactive purposes, if
you’re not happy with the default print syntax of some kinds of objects. One
particularly useful application of __show__ is to change the format of
numeric values. Here are some examples:
> using system;
> __show__ x::double = sprintf "%0.6f" x;
> 1/7;
0.142857
> __show__ x::int = sprintf "0x%0x" x;
> 1786;
0x6fa
> using math;
> __show__ (x::double +: y::double) = sprintf "%0.6f+%0.6fi" (x,y);
> cis (-pi/2);
0.000000+-1.000000i
The prelude function str, which returns the print representation of
any Pure expression, calls __show__ as well:
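> str (1/7);
"0.142857"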
Conversely, you can call the str function from __show__, but
in this case it always returns the default representation of an
expression. This prevents the expression printer from going recursive, and
allows you to define your custom representation in terms of the default
one. E.g., the following rule removes the L suffixes from bigint values:
> __show__ x::bigint = init (str x);
> fact n = foldl (*) 1L (1..n);
> fact 30;
265252859812191058636308480000000
Of course, your definition of __show__ can also call __show__
itself recursively to determine the custom representation of an object.
One case which needs special consideration is that of thunks (futures). The
printer will never use __show__ for those, to prevent them from being forced
inadvertently. In fact, you can use __show__ to define custom
representations for thunks, but only in the context of a rule for other kinds
of objects, such as lists. For instance:
> nonfix ...;
> __show__ (x:xs) = str (x:...) if thunkp xs;
> 1:2:(3..inf);
1:2:3:...
Another case which needs special consideration is that of numeric matrices.
For efficiency, the expression printer will always use the default
representation for these, unless you override the representation of the
matrix as a whole. E.g., the following rule for double matrices mimics
Octave’s default output format (for the sake of simplicity this isn’t
perfect, but you get the idea):
> __show__ x::matrix =
> strcat [printd j (x!(i,j))|i=0..n-1; j=0..m-1] + "\n"
> with printd 0 = sprintf "\n%10.5f"; printd _ = sprintf "%10.5f" end
> when n,m = dim x end if dmatrixp x;
> {1.0,1/2;1/3,4.0};
1.00000 0.50000
0.33333 4.00000
Finally, by just purging the definition of the __show__ function you
can easily go back to the standard print syntax:
> clear __show__
> 1/7; 1786; cis (-pi/2);
0.142857142857143
1786
6.12303176911189e-17+:-1.0
Note that if you have a set of definitions for the __show__ function
which should always be loaded at startup, you can put them into the
interpreter’s interactive startup files, see Interactive Usage.
As explained in section Patterns, Pure allows multiple occurrences of the
same variable in a pattern (so-called non-linearities):
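foo x x = x;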
This rule will only be matched if both occurrences of x are bound to the
same value. More precisely, the two instances of x will be checked for
syntactic equality during pattern matching, using the same primitive
provided by the prelude. This may need time proportional to the sizes of both
argument terms, and thus become quite costly for big terms. In fact,
same might not even terminate at all if the compared terms are both
infinite lazy data structures, such as in foo (1..inf) (1..inf). So you
have to be careful to avoid such uses.
When using non-linearities in conjunction with “as” patterns, you also have to
make sure that the “as” variable does not occur inside the corresponding
subpattern. Thus a definition like the following is illegal:
> foo xs@(x:xs) = x;
<stdin>, line 1: error in pattern (recursive variable 'xs')
The explanation is that such a pattern couldn’t possibly be matched by a
finite list anyway. Indeed, the only match for xs@(x:xs) would be an
infinite list of x‘s, and there’s no way that this condition could be
verified in a finite amount of time. Therefore the interpreter reports a
“recursive variable” error in such situations.
In the current implementation, “as” patterns cannot be placed on the “spine”
of a function definition. Thus rules like the following, which have the
pattern somewhere in the head of the left-hand side, will all provoke an error
message from the compiler:
a@foo x y = a,x,y;
a@(foo x) y = a,x,y;
a@(foo x y) = a,x,y;
This is because the spine of a function application is not available when the
function is called at runtime. “As” patterns in pattern bindings
(let, const, case, when) are not
affected by this restriction since the entire value to be matched is available
at runtime. For instance:
> case bar 99 of y@(bar x) = y,x+1; end;
bar 99,100
“As” patterns are also a useful device if you need to manipulate function
applications in a generic way. Note that the “head = function” rule means that
the head symbol f of an application f x1 ... xn occurring on (or
inside) the left-hand side of an equation, variable binding, or
pattern-matching lambda expression, is always interpreted as a literal
function symbol (not a variable). This implies that you cannot match the
“function” component of an application against a variable, at least not
directly. An anonymous “as” pattern like f@_ does the trick, however,
since the anonymous variable is always recognized, even if it occurs as the
head symbol of a function application. Here’s a little example which
demonstrates how you can convert a function application to a list containing
the function and all arguments:
> foo x = a [] x with a xs (x@_ y) = a (y:xs) x; a xs x = x:xs end;
> foo (a b c d);
[a,b,c,d]
This may seem a little awkward, but as a matter of fact the “head = function”
rule is quite useful since it covers the common cases without forcing the
programmer to declare “constructor” symbols (except nonfix symbols). On the
other hand, generic rules operating on arbitrary function applications are not
all that common, so having to “escape” a variable using the anonymous “as”
pattern trick is a small price to pay for that convenience.
Sometimes you may also run into the complementary problem, i.e., to match a
function argument against a given function. Consider this code fragment:
foo x = x+1;
foop f = case f of foo = 1; _ = 0 end;
You might expect foop to return true for foo, and false on all other
values. Better think again, because in reality foop will always return
true! In fact, the Pure compiler will warn you about the second rule of the
case expression not being used at all:
> foop 99;
warning: rule never reduced: _ = 0;
1
This happens because an identifier on the left-hand side of a rule, which is
neither the head symbol of a function application nor a nonfix
symbol, is always considered to be a variable (cf. Variables in Equations),
even if that symbol is defined as a global function elsewhere. So foo
isn’t a literal name in the above case expression, it’s a variable!
(As a matter of fact, this is rather useful, since otherwise a rule like f g
= g+1 would suddenly change meaning if you happen to add a definition like
g x = x-1 somewhere else in your program, which certainly isn’t
desirable.)
A possible workaround is to “escape” the function symbol using an empty
namespace qualifier:
foop f = case f of ::foo = 1; _ = 0 end;
This trick works in case expressions and function definitions, but
fails in circumstances in which qualified variable symbols are permitted
(i.e., in variable and constant definitions). A better solution is to employ
the syntactic equality operator === defined in the prelude to match the
target value against the function symbol. This allows you to define the
foop predicate as follows:
> foop f = f===foo;
> foop foo, foop 99;
1,0
Another way to deal with the situation would be to just declare foo as a
nonfix symbol. However, this makes the foo symbol “precious”, i.e., after
such a declaration it cannot be used as a local variable anymore. It’s usually
a good idea to avoid that kind of thing, at least for generic symbols, so the
above solution is preferred in this case.
A common source of confusion is that Pure provides two different constructs to
bind local function and variable symbols, respectively. This distinction is
necessary because Pure does not segregate defined functions and constructors,
and thus there is no magic to figure out whether an equation like foo x =
y by itself is meant as a definition of a function foo with formal
parameter x and return value y, or a pattern binding defining the
local variable x by matching the pattern foo x against the value of
y. The with construct does the former, when the
latter. (As a mnemonic, you may consider that when conveys a sense
of time, as the individual variable definitions in a when clause
are executed in order, while the function definitions in a with
clause are all done simultaneously.)
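To illustrate the difference, here is a little sketch with two hypothetical
functions bar1 and bar2 (this assumes that foo has no global definition, so
that foo 42 is a normal form):
> bar1 y = foo 99 with foo x = y end;  // 'with': foo is a local function
> bar1 42;
42
> bar2 y = x when foo x = y end;       // 'when': foo x is a pattern matched against y
> bar2 (foo 42);
42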
Another speciality is that with and when clauses are
tacked on to the end of the expression they belong to. This mimics
mathematical language and makes it easy to read and understand a definition in
a “top-down” fashion. This style differs considerably from other
block-structured programming languages, however, which often place local
definitions in front of the code they apply to. To grasp the operational
meaning of such nested definitions, it can be helpful to read the nested
scopes “in reverse” (from bottom to top). Some people also prefer to write
their programs that way. In contrast to Haskell and ML which have
let expressions to support that kind of notation, Pure doesn’t
provide any special syntax for this. But note that you can always write
when clauses in the following style which places the “body” at the
bottom of the clause:
result when
y = foo (x+1);
z = bar y;
result = baz z;
end;
This doesn’t incur any overhead, since the compiler will always eliminate the
trivial “tail binding” for the result value. E.g., the above will compile to
exactly the same code as:
baz z when
y = foo (x+1);
z = bar y;
end;
If possible, you should decorate numeric variables on the left-hand sides of
function definitions with the appropriate type tags, like int or
double. This often helps the compiler to generate better code and makes
your programs run faster. The | syntax makes it easy to add the necessary
specializations of existing rules to your program. E.g., taking the
polymorphic implementation of the factorial as an example, you only have to
add a left-hand side with the appropriate type tag to make that definition go
as fast as possible for the special case of machine integers:
fact n::int |
fact n = n*fact(n-1) if n>0;
       = 1 otherwise;
(This obviously becomes unwieldy if you have to deal with several numeric
arguments of different types, however, so in this case it is usually better to
just use a polymorphic rule.)
Also note that int (the machine integers), bigint (the GMP “big” integers) and
double (floating point numbers) are all different kinds of objects. While they
can be used in mixed operations (such as multiplying an int with a bigint
which produces a bigint, or a bigint with a double which produces a double),
the int tag will only ever match a machine int, not a bigint or a
double. Likewise, bigint only matches bigints (never int or double
values), and double only doubles. Thus, if you want to define a function
operating on different kinds of numbers, you’ll also have to provide equations
for all the types that you need (or a polymorphic rule which catches them
all). This also applies to equations matching against constant values of these
types. In particular, a small integer constant like 0 only matches machine
integers, not bigints; for the latter you’ll have to use the “big L” notation
0L. Similarly, the constant 0.0 only matches doubles, but not ints or
bigints.
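For instance (a little sketch using a hypothetical function foo in a fresh
session):
> foo 0 = "machine int"; foo 0L = "bigint"; foo 0.0 = "double";
> foo 0, foo 0L, foo 0.0;
"machine int","bigint","double"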
Constants differ from variables in that they cannot be redefined (that’s their
main purpose after all) so that their values, once defined, can be substituted
into other definitions which use them. For instance:
> const c = 2;
> foo x = c*x;
> show foo
foo x = 2*x;
> foo 99;
198
While a variable can be rebound to a new value at any time, you will get an
error message if you try to do this with a constant:
> const c = 3;
<stdin>, line 5: symbol 'c' is already defined as a constant
Note that in interactive mode you can work around this by purging the old
definition with the clear command. However, this won’t affect any earlier
uses of the symbol:
> clear c
> const c = 3;
> bar x = c*x;
> show foo bar
bar x = 3*x;
foo x = 2*x;
(You’ll also have to purge any existing definition of a variable if you want
to redefine it as a constant, or vice versa, since Pure won’t let you redefine
an existing constant or variable as a different kind of symbol. The same also
holds if a symbol is currently defined as a function or a macro.)
Constants can also be used in patterns (i.e., on the left-hand side of a rule
in a definition or a case expression), but only if you also declare
the corresponding symbol as nonfix. This is useful, e.g., if you’d
like to use constants such as true and false on the
left-hand side of a definition, just like other nonfix symbols:
> show false true
const false = 0;
const true = 1;
> nonfix false true;
> check false = "no"; check true = "yes";
> show check
check 0 = "no";
check 1 = "yes";
> check (5>0);
"yes"
Note that without the nonfix declaration, the above definition of
check wouldn’t work as intended, because the true and
false symbols on the left-hand side of the two equations would be
interpreted as local variables. Also note that the standard library never
declares any constant symbols as nonfix, since once a symbol is
nonfix there’s no going back. Thus the library leaves this to the
programmer to decide.
As the value of a constant is known at compile time, the compiler can apply
various optimizations to uses of such values. In particular, the Pure compiler
inlines constant scalars (numbers, strings and pointers) by literally
substituting their values into the output code, and it also precomputes simple
constant expressions involving only (machine) integer and double
values. Example:
> extern double atan(double);
> const pi = 4*atan 1.0;
> show pi
const pi = 3.14159265358979;
> foo x = 2*pi*x;
> show foo
foo x = 6.28318530717959*x;
In addition, the LLVM backend eliminates dead code automatically, so you can
employ a constant to configure your code for different environments, without
any runtime penalties:
const win = index sysinfo "mingw32" >= 0;
check boy = bad boy if win;
= good boy otherwise;
In this case the code for one of the branches of check will be completely
eliminated, depending on the outcome of the configuration check.
For efficiency, constant aggregates (lists, tuples, matrices and other kinds
of non-scalar terms) receive special treatment. Here, the constant is computed
once and stored in a read-only variable which then gets looked up at runtime,
just like an ordinary global variable. However, there’s an important
difference: If a script is batch-compiled (cf. Batch Compilation), the
constant value is normally computed at compile time only; when running the
compiled executable, the constant value is simply reconstructed, which is
often much more efficient than recomputing its value. For instance, you might
use this to precompute a large table whose computation may be costly or
involve functions with side effects:
const table = [foo x | x = 1..1000000];
process table;
Note that this only works with const values which are completely
determined at compile time. If a constant contains run time objects such as
pointers and (local) functions, this is impossible, and the batch compiler
will instead create code to recompute the value of the constant at run time.
For instance, consider:
using system;
const p = malloc 100;
foo p;
Here, the value of the pointer p of course critically depends on its
computation (involving a side effect which sets aside a corresponding chunk of
memory). It would become unusable without actually executing the
initialization, so the compiler generates the appropriate run time
initialization code in this case. For all practical purposes, this turns the
constant into a read-only variable. (There’s also a code generation option to
force this behaviour even for “normal” constants for which it’s not strictly
necessary, in order to create smaller executables; see Code Size and
Unstripped Executables for details.)
The interpreter always takes your extern declarations of C routines
at face value. It will not go and read any C header files to determine whether
you actually declared the function correctly! So you have to be careful to
give the proper declarations, otherwise your program might well give a
segfault when calling the function. (This problem can to some extent be
alleviated by using the bitcode interface. See Importing LLVM Bitcode and
Inline Code in the C Interface section.)
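For instance, here’s a proper declaration of the C library’s puts function;
note that the interpreter would have accepted a wrong declaration (say, one
with a double argument) just as readily, and the resulting call might well
have crashed. (The integer printed after the text is puts’s return value,
which may vary with the C library.)
> extern int puts(char*);
> puts "Hello, world!";
Hello, world!
14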
Another limitation of the C interface is that it does not offer any special
support for C structs and C function parameters. However, an optional addon
module is available which interfaces to the libffi library to provide that
kind of functionality, please see pure-ffi for details.
Last but not least, to make it easier to create Pure interfaces to large C
libraries, there’s a separate pure-gen program available at the Pure website.
This program takes a C header (.h) file and creates a corresponding Pure
module with definitions and extern declarations for the constants
and functions declared in the header. Please refer to pure-gen: Pure interface generator for
details.
Pure does lazy evaluation in the same way as Alice ML, providing an
explicit operation (&) to defer evaluation and create a “future” which
is called by need. However, note that like any language with a basically eager
evaluation strategy, Pure cannot really support lazy evaluation in a fully
automatic way. That is, coding an operation so that it works with infinite
data structures usually requires additional thought, and sometimes special
code will be needed to recognize futures in the input and handle them
accordingly. This can be hard, but of course in the case of the prelude
operations this work has already been done for you, so as long as you stick to
these, you’ll never have to think about these issues. (It should be noted here
that lazy evaluation has its pitfalls even in fully lazy FPLs, such as hidden
memory leaks and other kinds of subtle inefficiencies or non-termination
issues resulting from definitions being too lazy or not lazy enough. You can
read about that in any good textbook on Haskell.)
The prelude goes to great lengths to implement all standard list operations in
a way that properly deals with streams (a.k.a. lazy lists). What this all
boils down to is that all list operations which can reasonably be expected to
operate in a lazy way on streams, will do so. (Exceptions are inherently eager
operations such as #, reverse and foldl.) Only those
portions of an input stream will be traversed which are strictly required to
produce the result. For most purposes, this works just like in fully lazy FPLs
such as Haskell. However, there are some notable differences:
- Since Pure uses dynamic typing, some of the list functions may have to peek
ahead one element in input streams to check their arguments for validity,
meaning that these functions will be slightly more eager than their Haskell
counterparts.
- Pure’s list functions never produce truly cyclic list structures such as the
ones you get, e.g., with Haskell’s cycle operation. (This is actually a
good thing, because the current implementation of the interpreter cannot
garbage-collect cyclic expression data.) Cyclic streams such as cycle
[1] or fix (1:) will of course work as expected, but, depending
on the algorithm, memory usage may increase linearly as they are traversed.
- Pattern matching is always refutable (and therefore eager) in Pure. If you
need something like Haskell’s irrefutable matches, you’ll have to code them
explicitly using futures. See the definition of the unzip function
in the prelude for an example showing how to do this.
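To give a simple example of lazy list processing with the prelude operations
(just a sketch; only the needed prefix of the infinite stream is ever
evaluated):
> let xs = 1..inf;
> xs!!(0..4);
[1,2,3,4,5]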
Here are some common pitfalls with lazy data structures in Pure that you
should be aware of:
- Laziness and side effects don’t go well together, as most of the time you
can’t be sure when a given thunk will be executed. So as a general guideline
you should avoid side effects in thunked data structures. If you can’t avoid
them, then at least make sure that all accesses to the affected resources
are done through a single instance of the thunked data structure. E.g., the
following definition lets you create a stream of random numbers:
> using math;
> let xs = [random | _ = 1..inf];
This works as expected if only a single stream created with random
exists in your program. However, as the random function in the
math module modifies an internal data structure to produce a
sequence of pseudorandom numbers, using two or more such streams in your
program will in fact modify the same underlying data structure and thus
produce two disjoint subsequences of the same underlying pseudorandom
sequence which might not be distributed uniformly any more.
- You should avoid keeping references to potentially big (or even infinite)
thunked data structures when traversing them (unless you specifically need
to memoize the entire data structure). In particular, if you assign such a
data structure to a local variable, the traversal of the data structure
should then be invoked as a tail call. If you fail to do this, it forces the
entire memoized part of the data structure to stay in main memory while it
is being traversed, leading to rather nasty memory leaks. Please see the
all_primes function in Lazy Evaluation and Streams for an example.
Pure versions since 0.12 offer some basic reflection capabilities via the
evalcmd primitive. This function provides access to interactive
commands like clear, save and show, which enable you to inspect
and modify the running program. The only “canonical” way to represent an
entire Pure program in Pure itself is the program text, hence evalcmd
only provides a textual interface at this time. But of course custom
higher-level representations can be built on top of that, similar to those
discussed in section The Quote.
Here’s an example showing what can be done using the show command and a
little bit of trivial text processing. The following sym_info function
retrieves information about a given collection of global symbols in a way
which can be processed in a Pure program. The cat argument can be any
combination of the letters “c”, “v”, “f” and “m” denoting the categories of
constants, variables, functions and macros, respectively. (You can also just
leave this empty if you don’t care about the type of symbol.) The pat
argument is a shell-like glob pattern for the name of symbols which should be
listed (just “*” matches all symbols). The result is a list of tuples (name,
value, cat, descr) with the name of the symbol and its value, as well as the
category and description of the symbol, as provided by show -s.
using system;
sym_info cat::string pat::string
  = [name,eval ("("+name+")"),descr | name,descr = info]
when
  // Get the info about matching symbols from the 'show' command.
  info = evalcmd $ sprintf "show -sg%s %s" (cat,pat);
  // Split into lines.
  info = if null info then [""] else split "\n" $ init info;
  // Get rid of the last line with the summary information.
  info = init info;
  // Retrieve the information that we need.
  info = [x | x@(s,_) = map fields info;
          // Get rid of extra lines containing extern and fixity declarations.
          s ~= "extern" && s ~= "nonfix" && s ~= "outfix" &&
          s ~= "prefix" && s ~= "postfix" && ~fnmatch "infix*" s 0];
end with
  // Regex call to split the summary information about one symbol, as
  // returned by 'show -s', into the name and description parts.
  fields s::string = tuple $
    [info!2 | info = tail $ regs $ reg_info $
     regex "([^ ]+)[ ]+([a-z]*)[ ]*(.*)" REG_EXTENDED s 0];
end;
E.g., this call retrieves information about all defined macros:
> sym_info "m" "*";
[("$",($),"mac","2 args, 1 rules"),(".",(.),"mac","3 args, 1 rules"),
("void",void,"mac","1 args, 6 rules")]
As mentioned in the Macro Hygiene section, Pure macros are lexically scoped
and thus “hygienic”. So Pure macros are not susceptible to name capture, but
there is one Pure-related caveat here: the expression printer currently
doesn’t check for different bindings of the same variable identifier when it
prints a (compile time) expression. For instance, consider:
> def F x = x+y when y = x+1 end;
> foo y = F y;
> show foo
foo y = y+y when y = y+1 end;
This looks as if y got captured, but in fact it isn’t; it’s just the
show command which displays the definition in a misleading way. If you
add the -e option to show, which prints the deBruijn indices of locally
bound symbols, you can see that the actual bindings are all right
(note that the number before the colon is the actual deBruijn index, while
the sequence of bits after it is the subterm path):
> show -e foo
foo y/*0:1*/ = y/*1:1*/+y/*0:*/ when y/*0:*/ = y/*0:1*/+1 end;
Alas, this means that if you use dump to write such a definition to a text
file and read it back with run later, then you’ll get the wrong
definition. This is an outright bug in the expression printer which will
hopefully be fixed some time. But for the time being you will have to correct
such glitches manually.
Pure programs may need a considerable amount of stack space to handle
recursive function and macro calls, and the interpreter itself also takes its
toll. So you should configure your system accordingly (8 MB of stack space is
recommended for 32 bit systems, systems with 64 bit pointers probably need
more). If the PURE_STACK environment variable is defined, the
interpreter performs advisory stack checks on function entry and raises a Pure
exception if the current stack size exceeds the given limit. The value of
PURE_STACK should be the maximum stack size in kilobytes. Please
note that this is only an advisory limit which does not change the program’s
physical stack size. Your operating system should supply you with a command
such as ulimit(1) to set the real process stack size. (The
PURE_STACK limit should be a little less than that, to account for
temporary stack usage by the interpreter itself.)
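For instance, on a typical Unix system with bash you might set this up along
the following lines (the numbers are just a plausible choice, not a
recommendation):
$ ulimit -s 8192          # set the physical stack size to 8 MB
$ export PURE_STACK=7680  # advisory limit a little below that
$ pure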
Like Scheme, Pure does proper tail calls (if LLVM provides that feature on the
platform at hand), so tail-recursive definitions should work fine in limited
stack space. For instance, the following little program will loop forever if
your platform supports the required optimizations:
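loop with loop = loop end;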
This also works if your definition involves function parameters, guards and
multiple equations, of course. Moreover, conditional expressions
(if-then-else) are tail-recursive in both
branches, and the logical operators && and ||,
as well as the sequence operator $$, are tail-recursive in their
second operand.
Also note that tail call optimization is always disabled if the debugger is
enabled (-g). This makes it much easier to debug programs, but means that you
may run into stack overflows when debugging a program that does deep tail
recursion.
As described in section Exception Handling, signals delivered to the
process can be caught and handled with Pure’s exception handling facilities.
This has its limitations, however. Since Pure code cannot be executed directly
from a C signal handler, checks for pending signals are only done on function
entry. This means that in certain situations (such as the execution of an
external C routine), delivery of a signal may be delayed by an arbitrary
amount of time. Moreover, if more than one signal arrives between two
successive signal checks, only the last one will be reported in the current
implementation.
When delivering a signal which has been remapped to a Pure exception, the
corresponding exception handler (if any) will be invoked as usual. Further
signals are blocked while the exception handler is being executed.
A fairly typical case is that you have to handle signals in a tail-recursive
function. This can be done with code like the following:
using system;
// Remap some common POSIX signals.
do (trap SIG_TRAP) [SIGHUP, SIGINT, SIGTERM];
loop = catch handler process $$ loop
with handler (signal k) = printf "Hey, I got signal %d.\n" k end;
process = sleep 1; // do something
Running the above loop function enters an endless loop reporting all
signals delivered to the process. Note that to make this work, the
tail-recursive invocation of loop must immediately follow the
signal-handling code, so that signals don’t escape the exception handler.
Of course, in a real application you’d probably want the loop function to
carry around some data to be processed by the process routine, which then
returns an updated value for the next iteration. This can be implemented as
follows:
loop x = loop (catch handler (process x))
with handler (signal k) = printf "Hey, I got signal %d.\n" k $$ 0 end;
process x = printf "counting: %d\n" x $$ sleep 1 $$ x+1;
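A run of this version might look as follows (a hypothetical transcript; on
most systems SIGINT is signal 2, delivered here by typing Ctrl-c, after which
the handler restarts the count at 0):
> loop 0;
counting: 0
counting: 1
counting: 2
^C
Hey, I got signal 2.
counting: 0
counting: 1
...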