
Lecture 2 Notes





Math 481/581 Lecture 2: UNIX Intro

© 1998 by Mark Hays <hays@math.arizona.edu>. All rights reserved.


Today we'll go through a bunch of UNIX concepts and commands. Do not try to write it all down -- there's too much of it.

A copy of these notes is available in the course mailing list archives:

http://majordomo.math.arizona.edu/scicomp


Overview

UNIX is a multitasking multiuser operating system.

Multitasking  This means that the operating system can run multiple programs simultaneously.
Multiuser  This means that multiple users can use the system concurrently, without interfering with one another.

UNIX's primary strength is its ability to process files. A file is a linear array of bytes stored on disk.


The UNIX Filesystem

UNIX stores all files in filesystems. Several types of files can appear in a filesystem; the two you are most likely to encounter as a regular user are regular files and directories, described below.

For each file, the filesystem also maintains bookkeeping information such as its name, owner, permissions, size, and modification time.

Regular files are just what you expect: they contain some sort of data.

Directories lend structure to the filesystem. A directory is a container for other files and directories. If you have used UNIX or Windows before, you are probably familiar with this. Macintosh users usually refer to directories as "folders" -- but it's the same concept.

The fact that directories can contain other directories implies a containment hierarchy that looks like an inverted tree:

                       /
                    .  . .
                .     .    .
            .        .       .
        .           .          .
    .              .             .
 file1          disk1/         aplm1/
                  .
                  .
                  .
               people/
             .        .
           .            .
        hays/           rbc/
       .    .          .   .
      .      .        .     .
   file1    file2   file1  dir1/

The top-level node of the tree is called the "root" of the filesystem and is denoted "/". The root directory contains several files and subdirectories. UNIX uses the "/" character to separate the subdirectory component names from one another. Therefore, the absolute pathname of the file "file1" in hays' home directory (that's me) is denoted "/disk1/people/hays/file1".

Any file in the filesystem is contained in a directory; in this way, each file can be uniquely specified by its absolute path name. Another way to look at it is that a file that is not contained in any directory is totally inaccessible (ie, it has been deleted).

Every directory contains two special directories: "." and "..". The "." directory points to the directory in which it is contained; for example, the absolute paths "/a/b/c/." and "/a/b/c" are identical. The ".." entry points to the parent node in the tree and so "/a/b/c/.." is the same as "/a/b". The root node has no parent so "/.." is the same as "/".

Each user on a UNIX system has his/her own home directory. This is a directory that is owned by the user and contains that user's personal files. Your home directory is like your desk: it is your private workspace.

Life without directories (as in early versions of DOS) would be extremely annoying. For one thing, it would be impossible for multiple users to have files with the same name. For another thing, finding your files would be nearly impossible: the departmental Sun cluster consists of over 150,000 files -- imagine if they all "lived" in a single location!

If you are a Mac or DOS/Windows user you may have noticed that there are no device designations like "C:" or "D:" in the UNIX filesystem tree. In UNIX, each physical disk contains files and subdirectories just like the overall tree. To construct the overall tree, the root directories of each physical disk are grafted into the overall tree at the appropriate places (called "mount points"). A typical Math Department Sun workstation's filesystem tree is built from several disks. Hiding the devices from the user makes it easier to add disks to the system without inconveniencing users.
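
If you are curious which disks make up the tree on a particular machine, the standard "df" command lists each disk along with its mount point. The exact output format varies from system to system; the output below is a made-up example:

	> df -k
	Filesystem            kbytes    used   avail capacity  Mounted on
	/dev/sd0a             492422  401200   41980    91%    /
	/dev/sd1a            2097152 1532020  355400    81%    /disk1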


UNIX Commands

When you log in to shell.u.arizona.edu, you are presented with the shell prompt "> ". The shell is a special program that waits for you to input a command by typing the command and pressing the RETURN key. At this point, the shell executes the command. The command might produce some output text -- this text will appear on your terminal. When the command completes, the shell reissues the prompt and awaits the next command.

The syntax for most UNIX commands is:

command [options] [other args]

The square brackets denote optional arguments which vary from command to command. The "command" part specifies the operation to be performed. You can think of it as the verb. The "options" are used to modify the behavior of the command and can be thought of as adverbs. The interpretation of the "other args" depends on the command but can most often be thought of as direct objects.
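
For example, in the command below (using the "ls" command described later in these notes), "ls" is the verb, "-l" and "-F" are the adverbs, and "/tmp" is the direct object:

	> ls -l -F /tmp		# command, options, other args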

Some commands are interpreted directly by the shell; for example, the "exit" or "logout" command is recognized by the shell as a signal to terminate. The result of this command is to log you off of the system.

Most shells also support the notion of aliases. Aliases are shortcuts that allow you to execute groups of commands that are annoying to type over and over. For example, I have an alias called "pss" that shows all of my jobs on the machine. The "pss" alias is expanded by the shell and, in reality, causes the following to be executed:

ps awux | fgrep hays
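
If your shell is tcsh (the default on u.arizona.edu; other shells use a slightly different syntax), such an alias could be defined like this:

	> alias pss 'ps awux | fgrep hays'	# define the shortcut
	> pss					# now "pss" runs the whole pipeline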

If the command you type is neither a shell-builtin nor an alias, the shell searches the system for an executable file whose name is the same as the "command" you typed. If such a file is found, it is executed and the rest of the arguments you typed are passed to the command for further interpretation. If such a file cannot be found, an error message will be printed. We'll see how the search process works a little later.

The typical UNIX system contains a large number of executable commands; for example, there are almost 2,000 executables on my Linux box at home. I probably know what 25% of them do and I probably use 5% of them on a daily basis. Part of the philosophy of UNIX is that complex commands are built out of simple commands. This creates modular layers of software, which is generally considered a Good Thing from a design standpoint.

Unfortunately, this paradigm, while nice in theory, tends to cause frustration among new users. The intent of the UNIX portion of this course is to make the learning curve easier by showing you how to make effective use of the top 50 or so commands.

You won't find too many "all in one" monolithic software packages on most UNIX machines. For example, you won't find Microsoft Office, which incorporates a word processor, a spreadsheet, a database, and a bunch of other stuff in a single neat package. Part of the reason for this is that such packages are expensive to develop. Most of the available UNIX software is free: it has been developed by universities and government agencies and subsequently released free of charge. It is hard to imagine the NSF funding a $10 million grant to develop a UNIX Office clone.

Instead, you will normally find several different spreadsheets, databases, and text editors. The advantage of this is that it allows each user to use the programs he/she is comfortable with. For instance, some people like to use the "emacs" editor, some like "vi", some like "pico", etc. The disadvantage is that the tools are not usually well-integrated in the way that Office is. The problem with monolithic packages is that they tend to be less flexible: the end-user is trapped in the integrated environment.

Before delving into specific commands, you need to know that a UNIX process is a command in a state of execution. This is important because every process, regardless of its associated command, carries a certain amount of administrative baggage with it.

Now let's look at some of the more common UNIX commands.


Directory Commands

Every process has a "current working directory" (CWD) associated with it. For example, when you first log in, a shell process is started on your behalf and its current working directory is set to your home directory. Commands that expect one or more filename arguments will generally accept three kinds of names: an absolute pathname (one beginning with "/"), a pathname relative to the CWD (such as "../rbc/file1"), or a bare filename (such as "file1"), which refers to a file in the CWD itself. Being able to use relative pathnames saves a lot of typing.

Here is a list of directory-related commands:

pwd The pwd command prints out the absolute pathname of your current working directory. In other words, it tells you where you are.
cd [dirname] The cd command changes your CWD to "dirname" (which can be absolute or relative). If you omit "dirname", you change to your home directory.
ls [-a -l -F]   [filenames] The ls command lists the contents of one or more directories. If "filenames" are specified, those items are listed; otherwise, the contents of the CWD are listed. If the "-l" option is given, you get a long listing which shows you file sizes, modification dates, permissions, etc. If the "-F" option is given, the files in the listing will be marked by type. Normally, files whose name begins with a "." will not be included in the output; the "-a" option reverses this behavior.
mkdir dirnames The mkdir command creates the specified directories.
rmdir dirnames The rmdir command removes the specified directories. The operation will only succeed if the directory is empty (contains no files or subdirectories -- other than "." and "..").

Here is a sample transcript using these commands. The things that I type appear after the "> " prompt:

	> pwd			# where am I?
	/disk1/people/hays
	> mkdir dir1		# make a directory
	> ls -F			# look at files by type
	file1   program*   dir1/
	> cd ../rbc		# try to go to Bob's directory
	/disk1/people/rbc: Permission denied.
	> rmdir dir1		# remove a directory
	>


File Commands

The following commands are useful when dealing with files:

cat [filenames] This is similar to the DOS "type" command. It concatenates the specified files (or stdin if none are specified) and sends the results to stdout. The terms "stdin" and "stdout" are explained below.
more [filename] The "more" program is a "pager" -- it displays files on your terminal one screenful at a time. This is useful when you need to look at a large file. When you are looking at a file with more, the following keystrokes have the following effects:
q Causes "more" to quit.
space  Shows the next screenful of text.
less    [filenames] The "less" pager is more's big brother. It has the following additional keystroke commands:
b Shows the previous screenful of text.
/  Less will prompt you to enter a search string. It will search for and highlight any matches found. You can jump to the next match by pressing "/" again and hitting the RETURN key.
?  The same as "/" but the search proceeds backwards.
h  Shows a (long) list of all available keystroke commands.
rm [-i -r]    filenames The rm command removes the specified files. If you give the "-i" option, rm will ask for confirmation before removing each file. If you give the "-r" option and any of the "filenames" is a directory, that directory and all its files and subdirectories will be recursively removed. This one is dangerous! UNIX, due to its multiuser nature, DOES NOT have an "undelete" feature.
mv [-i] srcs    dst The "mv" command has two purposes: it moves and renames files and directories. If dst is a directory and "srcs" consists of multiple items, it moves each of "srcs" to "dst". Otherwise, "srcs" must consist of a single item, and a move/rename operation is performed. If the "-i" option is given, mv will ask for confirmation before clobbering existing files.
cp [-i -r] srcs    dst The "cp" command copies files in a manner similar to "mv", except that none of "srcs" can be a directory unless the "-r" (recursive) option is given.
fgrep string    [filenames] This command prints (to stdout) all lines containing "string" in the specified "filenames" (or stdin, if no filenames are specified).
touch    [filenames] This command creates any of the specified files that do not already exist. It also sets the modification time of each of the files to the current time. It's a handy way to create a bunch of files while you are playing around with the other commands.
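
Here is a short sample transcript using some of these commands (the filenames are invented for illustration):

	> touch a b c			# create three empty files
	> ls
	a   b   c
	> cp a d			# copy a to d
	> mv d notes			# rename d to notes
	> fgrep hoho notes		# look for "hoho" in notes (no output -- the file is empty)
	> rm -i a b c notes		# remove everything, confirming each file
	rm: remove `a'? y
	rm: remove `b'? y
	rm: remove `c'? y
	rm: remove `notes'? y
	> ls
	>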


Miscellaneous Useful Commands

Here are some miscellaneous commands:

cal mm yyyy  Prints a calendar for the specified month and year.
clear Clears the terminal window.
date Prints the current date and time.
finger user Shows information about another user.
quota Reports your disk quota in kilobytes. If you go over quota, attempts to create new files or append to existing files will fail. At u.arizona.edu, your "hard limit" is 11 MB.
rlogin host Connect to another UNIX system using the RLOGIN protocol.
telnet host Connect to another UNIX system using the TELNET protocol.


Pipes and Alligators

Several of the commands described above make use of things called "stdin" and "stdout". When the shell executes a command, that command is normally connected to three streams. By default all three streams are connected to your terminal.

stdin This is a byte stream through which a program can read data.
stdout  This is a byte stream onto which a program can write data.
stderr This stream is almost always left connected to the terminal. Programs normally use this stream for printing error and status messages.

The nice thing about streams is that they can be redirected. For example, suppose that we want to see all lines containing the string "hoho" in the files a, b, c, d, and e. One way to do it is to:

	> fgrep hoho a
	...
	> fgrep hoho b
	...
	> fgrep hoho c
	...
	> fgrep hoho d
	...
	> fgrep hoho e
	...
This will work. However, suppose that there are so many matches that all of the results scroll by at high speed.

Well, we'd like to use "cat" to join all 5 files together, use "fgrep" to get the matching lines, and "less" to view the output. Enter the "pipe" operator:

	> cat a b c d e | fgrep hoho | less
It works like this: the shell executes less, fgrep, and cat. The stdin stream of the "less" process is connected to the stdout stream of the "fgrep" process. Similarly, fgrep's stdin is connected to cat's stdout. So cat acts as a source of data, fgrep acts as a filter and processes the data as it comes in, and less acts as a data sink. All of the stderrs are left connected to the terminal -- so that if there are any errors, you will get to see them.

The command you submit to the shell is referred to as a job. The job may consist of a single UNIX command, or it may be a command pipeline as above. Each job consists of one or more processes; for example, the pipeline shown above becomes three processes.

Note that programs do not necessarily make use of the standard streams; for example, text editors do not typically make use of stdin or stdout. In other words, the three streams are available for use -- the online documentation for each program will give you the exact details.

Now suppose that the results of fgrep are so interesting that we'd like to save them to a file called "results". Enter alligator #1:

	> cat a b c d e | fgrep hoho > results
The ">" operator connects that command's stdout to the specified file -- so any output generated by the command goes into the file. If the file "results" already exists, the fate of the command depends on your shell and its configuration. Some shells refuse to clobber existing files with an alligator. You can get around this by first doing an "rm results" or by disabling this feature with "unset noclobber" (or your shell's equivalent).

Assuming that your shell either doesn't support "noclobber" or that "noclobber" has been disabled, the ">" operator will erase the previous contents of "results".

Oops! Suppose we want to look for "hoho" in files "f" and "g", too. One way to do it is to rerun the whole thing with the extra files:

	> cat a b c d e f g | fgrep hoho > results
A faster way (in terms of typing and CPU time) is to use alligator #2:
	> cat f g | fgrep hoho >> results
The ">>" operator works just like ">", except that it appends to the file instead of overwriting it.

Can you redirect stdin? Yup. Alligator #3 does the trick:

	> fgrep hoho < a | less	# these 2 do the same thing
	> fgrep hoho   a | less
The "<" operator connects the command's stdin to the specified file.

The syntax for redirecting stderr is shell dependent and will not be covered at this time.


Globbing

Most shells support filename globbing; for example, if you want to remove all TeX DVI files in your CWD (and confirm each removal), you can simply do:
	> ls
	a.dvi   a.tex   b.dvi   b.tex
	> rm -i *.dvi
	rm: remove `a.dvi'? y
	rm: remove `b.dvi'? n
	> ls
	a.tex   b.dvi   b.tex
	>
If you don't want to bother confirming each removal, you can do:
	> rm *.dvi
	> ls
	a.tex   b.tex
Be advised: the asterisk is dangerous -- it matches anything. The addition of a single space results in disaster:
	> rm * .dvi	# note extra space
	rm: .dvi: No such file or directory
	> ls		# huh?
	>		# aieeeee!
Most shells define the following metacharacters for filename globbing:

* Matches any string of zero or more characters.
? Matches any single character.
[range]  Matches any character in "range". For example, [abc] matches "a", "b", or "c". The pattern [A-Z] matches any uppercase letter. [^A-Z] matches anything that is not an uppercase letter. Finally, the pattern "[A-Za-z024]" matches any single letter or one of the digits "0", "2", and "4".

You can use globbing to define some pretty complicated sets of filenames: consider "[A-Z]*[^0-9]*.dv?".
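
Here is a small, made-up example of the "?" and "[range]" patterns in action:

	> ls
	a1.dvi   a2.dvi   b1.dvi   notes
	> ls a?.dvi		# "a", any single character, ".dvi"
	a1.dvi   a2.dvi
	> ls [ab]1.dvi		# "a" or "b", followed by "1.dvi"
	a1.dvi   b1.dvi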


Job Control

Normally when you execute a command from the shell, you cannot type any more commands until the current command completes. Most shells provide some sort of job control facilities. Those that do have such features generally implement (at a minimum) the commands described below.

To terminate a running command, you can normally hit CTRL-C. This means that you hold down the key marked "CTRL" and then hit the "C" key. Normally, the running program will terminate and the shell will re-issue its prompt.

Sometimes you would like to suspend a job for a few seconds. To do this, hit CTRL-Z. This will immediately put the job to sleep, and the shell will emit a message saying that the job is stopped and re-issue its prompt. When you are ready to restart the job, issue the "fg" command to bring the process back "into the foreground". The shell will wait for the job to complete before issuing another prompt.
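
A typical suspend/resume session looks something like this (the editing session is invented, and the exact wording of the shell's messages varies from shell to shell):

	> vi notes		# start an editing session
	(press CTRL-Z)
	Stopped
	> ls			# do something else for a moment
	file1   notes   program*
	> fg			# back to the editor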

If you want the job to simply run without any further interaction, you have two choices. You can either start it in the background using an ampersand:

	> cat f g | fgrep hoho >> results &
Or you can start the job, suspend it, and use the "bg" command to tell it to run in the background. Either way, you will be able to execute other commands while the background commands run to completion. When it finally does complete, the shell will print a message notifying you of this fact.

Note that if a background job expects user input, it will stop itself until you bring it to the foreground with "fg" and provide the necessary input. Also, if the last command in a pipeline produces output on stdout, this output will appear on your terminal.

To get a list of jobs running under your current login shell, issue the "jobs" command. You will see something like the following:

	[1]-  Running                 emacs &
	[2]+  Running                 netscape &
You can manipulate jobs by number by using "fg %1", "bg %2", etc.

Every UNIX process is assigned a unique number called a "process ID". To see a list of processes running under the current login shell, use the "ps" command. The output on my machine looks something like:

	> ps
	PID TTY STAT  TIME COMMAND
	171   1 S    0:00 /bin/login -- hays 
	176   1 S    0:00 -bash 
	223   1 S    0:00 sh /usr/X11R6/bin/startx 
	256   1 S    0:00 wish /home/hays/bin/iclock 
	257   1 S    0:00 kaudioserver 
	...
	>
The PID column shows the process ID, and the COMMAND column shows the name and arguments of the command that is running.

To get a list of all processes running on the machine, use "ps awux" (on shell.u.arizona.edu -- on other systems, the options will be different). For example,

	> ps awux
	USER       PID %CPU %MEM  SIZE   RSS TTY STAT START   TIME COMMAND
	bin        107  0.0  0.5   836   324  ?  S   07:52   0:00 portmap 
	hays       176  0.0  0.9  1148   628   1 S   07:52   0:00 -bash 
	hays       257  0.0  2.8  5092  1836   1 S   07:52   0:00 kaudioserver 
	hays       258  0.0  3.9  4924  2532   1 S   07:52   0:00 kwmsound 
	root        96  0.0  0.5   832   372  ?  S   07:52   0:00 crond 
	root       118  0.0  0.5   816   324  ?  S   07:52   0:00 inetd 
	...
	>

If you need to kill a job that won't die with CTRL-C, you can suspend it with CTRL-Z, do a "ps awux | fgrep username", find the process ID of the thing you'd like to kill, and terminate it for sure with "kill -9 PID". You should use this as a last resort.
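
The whole sequence might look something like this (the runaway program and its process ID are made up):

	> ps awux | fgrep hays
	hays      1234 97.3  1.5  1460   952   1 R   14:02  12:31 runaway
	...
	> kill -9 1234		# terminate it for sure
	>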

If you try to log out with jobs suspended, the shell will usually complain. The best thing to do is run "jobs" and "fg" each job that is marked "Stopped" and terminate it cleanly. Oftentimes, you'll have a suspended editing session -- you might want to do one last save before logging out to avoid data loss.

If you log out with background jobs running, they continue to run until they complete. If you log in later and want to see if they're still running, neither "jobs" nor "ps" will help -- you'll have to run "ps awux | fgrep username | less" to find them.

Every process has an associated priority that determines how often it gets to run. If you are going to run a background job for a long period of time (more than 30 minutes, say), you should start it with a low priority by using the nice command:

	> nice command &
Late at night when nobody is using the machine, your jobs will get lots of CPU time. But when people are using the machine during the day, running it at low priority tends to keep it out of other users' (and your) way.

In summary:

CTRL-C Terminate the current foreground command.
CTRL-Z Suspend the current foreground command.
command & Execute "command" in the background.
fg [job] Bring a suspended or background job into the foreground.
bg [job] Run a suspended job in the background.
jobs Display a list of currently active jobs.
ps [awux] Display a summary of processes running under the current login shell, or, if "awux" is specified, display a detailed listing of all processes running on the machine.
kill -9 PID Terminate (with extreme prejudice) the process with process ID PID. Use this as a last resort.
w Displays the current time, system up-time, load average, and prints a summary of who is logged in.
nice cmd Execute "cmd" at low priority.
nohup cmd & Execute "cmd" in such a way that it keeps running after logout -- only Korn Shell users should need to worry about this.


Getting More Information

Almost all of the commands described so far have many other useful options. One nice thing about UNIX is that almost every command has associated online documentation that describes what it does and how to use it.

If you are going to use UNIX, you will need to learn how to read manpages. If you are trying to locate some particular piece of information, do not read the manpage from beginning to end like a novel. In a moment, we'll see how to read manpages using "less". Since "less" lets you search, you can home in on what you are looking for without reading 2000+ lines of computerese gobbledygook.

A typical manpage is divided into several standard sections (NAME, SYNOPSIS, DESCRIPTION, OPTIONS, SEE ALSO, and so on).

To access the online documentation for "command", type

	> man [section] command
The UNIX manual is organized into sections. User commands appear in section 1, which is searched first. If you type "man time", you will get a description of the "time" command in section 1. This command prints runtime statistics for a command. If you type "man 2 time", you get the manpage for the time() C language function call, which tells you what time it is. Section 1 of the manual is of primary interest to us. Sections 2 and 3 are of interest to C programmers. Sections 4-8 are mostly used by UNIX system administrators.

To obtain a list of likely commands pertaining to "topic", type

	> man -k topic		# on some systems
	> man -k topic | less	# on other systems
The output of "man -k" is a list of manpages, sections, and a brief synopsis of the manpage. Once you find what you are looking for, you can do a "man" on the appropriate command for detailed information. Again, it takes a little savvy to narrow the list. The command "man -k copy" produces a pretty long list, whereas "man -k copy | fgrep file | less" narrows it down to a few possibilities (commands dealing with copying files). The "cp" command heads the resulting list on my system.


Environment Variables

Each UNIX process has an associated set of environment variables. Each such variable has a name and an associated value -- both the name and the value are text strings.

Environment variables are normally set up when you first log in. Your shell reads a set of configuration files (which are simply text files that contain shell commands) which set environment variables, define shell aliases (remember them?), etc.

Environment variables have no significance in and of themselves. In fact, it is up to each UNIX command to decide which (if any) environment variables it observes and how to act on the associated value.

To see a list of all your environment variables and their values, use the env command:

	> env
	TERM=xterm
	AUTHSTATE=compat
	SHELL=/bin/tcsh
	HOME=/home/u7/mhays
	USER=mhays
	PATH=/usr/bin:/etc:/usr/ucb:/usr/bin/X11:/usr/local/bin:/usr/local/bin/X11:.
	TZ=MST7
	...
Hmmm. All kinds of stuff. You can look at a sorted list with:
	> env | sort | less

There are two particularly important environment variables that are always defined: HOME and PATH. HOME contains the absolute pathname of your home directory. It can be handy sometimes:

	> cd /a/b/c/d/e		# go far far away
	> echo $HOME		# see what's in HOME
	/home/u7/mhays
	> cp $HOME/file .	# use HOME

Recall that several things happen when you give the shell a command: first, the shell looks to see if it's a command built in to the shell. If not, it sees whether there is an alias that matches what you typed. If not, the shell searches the system for an executable with the same name as the command.

The PATH variable controls this search process. The value of PATH is a list of absolute pathnames separated by colons. When the shell is looking for an executable, it searches each of these directories in order until it finds an executable file of the correct name. For example, if you type telnet shell.azstarnet.com and the telnet program lives in /usr/ucb/telnet, the shell really executes /usr/ucb/telnet shell.azstarnet.com. By modifying your PATH, you can make commands disappear, make new commands appear, or replace existing commands with your own special version.
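
For example, if you keep your own programs in a directory called $HOME/bin, you could append that directory to the search path using the "setenv" command described below:

	> setenv PATH ${PATH}:${HOME}/bin	# add $HOME/bin to the end of the search path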

To find the absolute path to a particular command, you can use the "which" command:

	> which ls
	/usr/bin/ls

The MANPATH variable is used by the "man" program to locate online manual pages. Like PATH, its value is a colon-separated list of absolute pathnames.

The PAGER variable is used by many programs to select a default pager program. If PAGER is undefined, a reasonable system default, usually "more", is used. By setting PAGER to "less", you can cause "man" and your mail program to use "less" instead of "more".

Programs such as mail and news readers often want to start up a text editor on your behalf; eg, to edit a mail message. These programs usually check the EDITOR and/or VISUAL variables for the name of the editor to use.

How do you set environment variables? The answer depends on the shell you are using. The default shell on the u.arizona.edu system is called "tcsh" and you set environment variables with:

	> setenv PAGER less
	> man ls		# now you've got less!
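
EDITOR and VISUAL work the same way; for example, to have your mail program start "pico" on your behalf:

	> setenv EDITOR pico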
If you are not using "tcsh", you will need to consult your shell's manpage for details.

We will see how to make such changes permanent in a couple of lectures.