Math 481/581 Lecture 21: Netlib, C and FORTRAN Compilers
Today we'll cover Netlib and the usage of C and FORTRAN compilers.
If you run emacs on a script, you see the entire source code for the program. You can copy the script into your account, modify it, run it, and immediately see the effects of your changes.
If you run emacs on a native machine executable (hereafter referred to as a "binary" [because it ain't ASCII a'tall]), you will see gibberish. If you alter some of the gibberish and save your changes, the program will almost certainly cease to function --- when you run it, you will probably get a "core dumped" message.
One of the differences between these two types of programs is that scripts are interpreted and binaries are compiled.
The executable file for a program such as "fgrep" contains raw CPU instructions. When you run the fgrep program, the contents of the executable file are loaded into memory (well, not really, but close enough for us) and the CPU directly executes the in-memory copy as a sequence of CPU instructions.
In other words, the gibberish you see when you type
emacs /usr/bin/fgrep
has meaning to the computer's
CPU. It is important to know that different types of CPUs
interpret the gibberish differently -- so a machine executable
that works on a Sun will not work on a PC or SGI. Different
operating systems or even different versions of the same OS
may also have different binary executables.
So how do "they" come up with machine executables? On UNIX systems, most binaries come from code written in the C programming language. There is a program called a C compiler that translates C source code into binary executables that can run on the local CPU.
In other words, the compiler translates source code for some language into raw CPU instructions. C, C++, and FORTRAN all work this way. In a minute, we'll see why you might want to do this.
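To make the distinction concrete, here is a small sketch (the file names hello.c and hello.sh are made up for illustration, and I'm assuming your system's C compiler answers to cc) that builds a compiled C program and an equivalent interpreted shell script:

```shell
# Write a tiny C program and compile it to a machine-code binary.
cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { printf("hello\n"); return 0; }
EOF
cc -o hello hello.c      # translate the C source into raw CPU instructions
./hello                  # the CPU executes the binary directly

# An equivalent shell script; no compilation step at all.
printf 'echo hello\n' > hello.sh
sh hello.sh              # the interpreter reads and executes each line
```

Both print "hello", but the file hello contains CPU instructions for this machine only, while hello.sh remains readable text that any shell can interpret.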
Shell scripts and things like maple, mathematica, and matlab
programs are interpreted as opposed to compiled. For
example, your login shell prompts you to enter a command. When
you do so, the shell figures out what to do and then does it
for you. In other words, interpreters execute program statements
directly. Another example: when you invoke the svd
function in matlab, matlab ends up calling a C/FORTRAN routine
on your behalf that computes the singular value decomposition
for you. This routine is probably very long and horribly
complex, but you don't care -- all you have to say is "svd".
So why in the world would you want to write code in C or FORTRAN? The main reason is speed: compiled code typically runs much faster than the equivalent interpreted code. Of course, there are downsides to using compiled languages: development takes considerably longer, and the code is harder to write and debug than an interpreted prototype.
You may have heard the word "RAD" (in a sentence not containing the words "dude" and/or "gnarly"), which stands for "Rapid Application Development". This term is usually used to describe specific programming languages, but it also embodies a program development philosophy that's best explained by example.
If you are going to write a code to solve the 3D Navier-Stokes equation, you could start by scribbling an outline of your program on a piece of toilet paper, and then fire up emacs and begin:
int main(int argc, char *argv[])
{
    /* your code here */
    return 0;
}

In a couple of months, you might have a working code that consists of several thousand lines. Or you might end up with several thousand lines of junk.
If you use the RAD approach, you will probably prototype your code in something like matlab. You can develop pieces of the code as small, independent scripts that can be tuned and debugged separately. You'll usually run "small" test cases to check yourself as you go along.
Once your algorithm is working properly (e.g., near the parameter values of interest), you need to decide whether the code you have runs fast enough, and whether it can handle the full-size problem you actually need to solve.
If the answer to either question is "no", you need to decide whether it is worthwhile translating your code into C or FORTRAN (or something else).
As always, you have to make intelligent choices. For example, writing your own SVD in C probably isn't going to gain you any speed over matlab's SVD. Your best bet in choosing between a compiled and an interpreted program is to weigh the expected speedup against the effort of recoding.
A word on the speed issue: if you decide to recode your application in a compiled language because it runs too slowly, remember to factor in your own programming time. For example, you might have a code that you need to run five times. If it takes two days per run, this works out to ten days of runtime, plus the week it took you to develop it, for seventeen days total.
If you recode it in C, you might get it down to six hours of runtime, which appears to be a Big Win, until you add in the three months of effort it took to recode the thing!
On many systems, you enjoy a large gain in speed if you write your code in FORTRAN instead of C. By "large", I mean a factor of two to three (based on personal experience).
On most UNIX systems, the vendor-supplied compilers go by the following names:

Language | Executable Name
---|---
C | cc
C++ | CC, c++, C++, or cxx
FORTRAN 77 | f77
FORTRAN 90 | f90
If your system has the FSF's GNU compilers installed, you can reach the compilers at:

Language | Executable Name
---|---
GNU C | gcc
GNU C++ | g++
GNU FORTRAN 77 | g77
You'll need to consult your system's manpages for details on your particular platform.
Sometimes, you'll have the GNU compilers available to you and sometimes not. For example, cc is gcc on Linux systems. Some evil code requires gcc in order to compile, but this is pretty rare. As a rule of thumb, you'll want to use your vendor-supplied compilers instead of gcc on DEC UNIX and SGI IRIX systems. I'm not sure about Sun Solaris 2+, IBM AIX, or HP/UX, as I have never used these operating systems. On SunOS 4.x (aka Solaris 1.x), you definitely want to use gcc if it is available.
My experience has been that following the above recommendations will gain you about a factor of two in speed (i.e., code compiled with gcc on DEC UNIX systems runs half as fast as if you had compiled it with cc; on SunOS 4.x, the opposite is true).
Finally, it appears to be the case that gcc version 2.8.x produces binaries that run about half as fast as those compiled with gcc version 2.7.x. You can type gcc -v to find out what version you have. I have not had an opportunity to compare different versions of vendor-supplied compilers.
The important thing is that choosing the right compiler for the language your code is written in can gain you a significant speedup. We'll touch on this briefly in the section on the optimizer.
Before getting into this, let's take a brief look at the two phases of compilation. First, you run the compiler on your source code to produce object files. Next you run the compiler again to combine all the object files and support libraries into the executable.
During the first compilation phase, your source code is translated into machine instructions by the compiler. For example, if you have a file of C program source called prog.c, you can achieve this with the following command:

cc -c prog.c

The "-c" flag tells the compiler to translate prog.c to machine instructions and place these instructions in the object file prog.o. The command for FORTRAN, etc. is analogous.
When you have created an object file for each source file, you are ready to build the final executable. Although it is possible to invoke the linker directly (available via the ld command on many systems), it is best to let the compiler invoke it for you. Continuing the above example, we'd link prog.o to produce an executable named prog with the following:

cc -o prog prog.o -lm

The "-o" flag tells the compiler what to name the output file. If you don't use this flag, the compiler picks a default output filename, usually a.out.
The "-lm" bit tells the compiler to also link in the math runtime library. Basically, a library is an archive of plain old object files that perform a bunch of operations for you. In the C language, the math library contains the trig and transcendental functions such as sin, cos, log, etc. If your source code calls any of these functions and you forget to add in the "-lm", the linker will be unable to produce the final executable and will issue an error message similar to "unresolved symbol" or "unresolved external".
You can link against your own libraries, too. For example, either of the following commands will link your code against a library called libjunk.a in the mystuff subdirectory in your account:

cc -o prog prog.o $HOME/mystuff/libjunk.a -lm
cc -o prog prog.o -L$HOME/mystuff -ljunk -lm
You need to know two commands to create your own library: ar and ranlib.
Making a library archive is pretty simple. If you want to build a library called libjunk.a from the object files funcs1.o, funcs2.o, and funcs3.o (these files presumably contain a bunch of functions or subroutines that you commonly use), type:

ar crv libjunk.a funcs1.o funcs2.o funcs3.o
On some systems, you need to "bless" the library you just built with ranlib. Move the library to its final resting place (for example, the directory $HOME/mystuff) and type:

ranlib $HOME/mystuff/libjunk.a

If your system has a ranlib command, you probably need to use it; otherwise, you can skip this step.
You don't have to name your libraries lib<whatever>.a; you can, in fact, call them whatever you like. The advantage of using the lib<whatever>.a scheme is that you can use the -L<path> -l<whatever> syntax (which is considered Good UNIX Form).
There are a couple of good reasons to make your own libraries. First, you save a little compilation time because you don't have to recompile the files funcs1.c, funcs2.c, and funcs3.c every time you rebuild one of your programs.
There is an even nastier problem with having separate
copies of the three "funcs" files in each of your
source code directories. If you find a bug in, say,
funcs2.c
you will have to make changes
in every copy of this file. It is very easy to
forget to do this --- the end result is that you'll
get bitten by the same bug six months from now!
By putting your functions into a library and maintaining the library's source code in a single location, you largely avoid this problem. If you modify the library's source code, you simply need to relink each program that uses the library.
To compile a code with optimization, use the "-O" flag. Most compilers support various levels of optimization; the respective manpage will give you all the details. Optimization only affects the compile stage; it has no effect when linking.
Here's how to turn on the optimizer:
cc -O -c prog.c
cc -o prog prog.o -lm

In general, your code will run twice as fast if you compile it with optimization.
Finding and correcting logic errors is beyond the scope of this course. Normally code containing such errors will execute, but give the wrong answer. Logic errors usually reflect a problem with your algorithm or your implementation of the algorithm. Going over your code with a fine-toothed comb is usually your only recourse in this case.
Core dumps occur when your code performs some illegal operation, such as attempting to access memory that is outside your process' address space.
When this happens, the operating system causes your program to abort and write the process' entire virtual address space into a file called "core".
This file can be very useful for debugging. In particular, it can help you figure out exactly where your code performed the fatal operation.
For most compilers, you need to turn off the optimizer and compile with the "-g" flag in order to make use of the core file. Our ongoing example looks like:

cc -g -c prog.c
cc -o prog prog.o -lm

When your code dumps core, you can figure out where it died by typing dbx ./prog core. This will get you to the dbx debugger prompt. If you type "where", a stack trace will be printed. Type "quit" to get out of dbx. Dbx has online help available in its manpage. You can also type "help" at the prompt for detailed help.
If you compile your code with the GNU compilers, you'll want to use the GNU debugger "gdb" instead of dbx.
Finally, a word of warning: NEVER attempt to link object files produced with your vendor's C compiler against object files produced with gcc. Sometimes it works, but sometimes it doesn't. When it doesn't work, it isn't at all clear what's wrong -- in other words, it will probably take you a long, long time to figure out what the problem is.
Netlib is a fairly comprehensive resource for all sorts of freely available mathematical software. Most of the code is written in FORTRAN-77, but some things have been ported to C, C++, and FORTRAN-9x.
The quality of the code is generally very high (from a numerical analysis standpoint). Unfortunately, the code tends to be a little difficult to use. If you need the best available implementation of an algorithm (e.g., your problem is numerically sensitive), Netlib is a good place to look.
If you're just getting started, or want to get something up and running quickly, I'd recommend prototyping in something like Matlab. If you need extra speed, you can port the Matlab code to C or FORTRAN and use something like Numerical Recipes. If Numerical Recipes doesn't supply the algorithm you want, or if you decide that their implementation is not robust enough, you can probably find some code on Netlib that'll do what you need.
The Guide to Available Mathematical Software, or GAMS, can be extremely helpful when you are trying to locate something on Netlib.