Unix Operating System

Marc P. Thomas

California State University, Bakersfield

This document is for classroom use and is taken from the Encyclopedia of Information Systems, published and copyrighted by Academic Press.


Outline

  1. Interactive Multiuser Operating Systems
  2. The Design Objectives of Unix, File-store Organization, Text Processing, and Programming
  3. The Role of the C Programming Language with Regard to Portability and Reliable System Software
  4. Process Control: Signals and Fork
  5. Error Logging and Recovery From System Failures
  6. Modifiability and Application Programmer Interface (API)
  7. The User's Perspective on Unix

Glossary


Introduction

Strictly speaking, UNIX (in capitals) has been a registered trademark of UNIX System Laboratories (first owned by AT&T, then sold in 1993 to Novell, Inc., then sold in 1995 to SCO); the UNIX® trademark is currently owned by The Open Group. But the word "Unix" has also come to refer to a collection of very closely related operating systems (e.g. AT&T UNIX System V, BSD 4.3 Unix, Sun Microsystems' Solaris, Silicon Graphics' Irix, DEC/Compaq Tru64 Unix, IBM's AIX, Hewlett-Packard's HP-UX, FreeBSD, NetBSD, SCO UNIX, Minix, Linux, and many others) which have found use on a wide variety of computing platforms, ranging from single-user personal computers to large multiprocessor network servers. Many of these variants are proprietary, but others have source code freely available. While there are differences, and disputes concerning which version is the "standard," it is the design similarities and the wide range of computing hardware supported that are significant. For the purposes of this article, features of the Unix operating system will refer to those features which are common to almost all modern versions of Unix.

It is helpful to give a brief outline of Unix history. UNIX was the creation of a group of researchers at Bell Laboratories (now Lucent Technologies) during the years 1969-1970. This effort was led by Ken Thompson, Dennis Ritchie, M. D. McIlroy, and J. F. Ossanna. The desire was to produce a multiuser, multiprocessing operating system which would support research in computer science, but which had modest hardware requirements (unlike the earlier MULTICS project). Users were expected to connect to the system primarily via terminals over RS-232 serial communication lines. Additional improvements were made over the next few years. The kernel was rewritten in C in 1973. By 1974 Bell Laboratories Version 5 of the UNIX system was available for a nominal charge with full source code (but, of course, officially unsupported). Within academic computer science programs it quickly became a popular choice. The first commercial version was released as System V in 1983 and, in some sense, became the first mature standardized version of UNIX.

Interest in UNIX spread and spurred modifications which added functionality. The most successful were the modifications introduced by researchers and graduate students at the University of California at Berkeley during the late 1970's and early 1980's. One of the most significant milestones was the addition (in version BSD 4.2 in 1983) of TCP/IP networking via the software abstraction of a network socket. This made it possible for application programmers to write portable code which accessed a network. The mature version of this line of development was BSD 4.3 Unix.

Some commercial vendors chose to market Berkeley variants, two popular ones being Sun Microsystems' SunOS and Digital Equipment Corporation's Ultrix. Other vendors, including Data General, IBM, Hewlett-Packard, and Silicon Graphics, adopted AT&T's System V. A third version, called Xenix, was developed by Microsoft and licensed to the Santa Cruz Operation (SCO). It incorporated some PC-specific features such as support for the PCLAN NETBIOS protocol. This proliferation added greatly to the popularity of Unix but was at odds with software portability issues.

Unix was primarily attractive to smaller startup ventures because the cost of writing an operating system from scratch is prohibitive to most small vendors. In addition, the Unix application programmer interface (API) is quite flexible and this saves coding time. Many of the machines marketed were in the workstation class, that is, their performance and cost put them above personal computers such as the Apple Macintosh and IBM PC but below most minicomputers. Since this was a smaller market than the rapidly growing personal computer market and since there were at least three competing standards for Unix, most Unix software was priced much higher than personal computer software. This situation tended to keep Unix out of the personal computer market where PC-DOS, MS-DOS, and later, Windows 3.1/95/98/NT, would develop as the de facto standards on the IBM PC clone. Only with the growth of Linux in the 1990's did truly low cost Unix software become available for personal computer users.

An attempt at consolidation and the adoption of a standard for Unix in the late 1980's produced two camps. One group was formed by the AT&T and Sun Microsystems agreement to merge features of both their systems as System V Release 4 (SVR4). This was marketed by Sun under the Solaris name. A second group (including Apollo Computer, Digital Equipment Corporation, Hewlett-Packard, and IBM) was formed around the Open Software Foundation (OSF) agreement. By this time the line between BSD and System V Unixes had become blurred and the products of both of these groups had features drawn from both parents.

The full Unix operating system had been ported only to the top end of the early personal computer market, since Unix required CPU, memory, and disk resources well in excess of what was available on the majority of these early machines. However, the growing popularity of the IBM PC clone as a personal computing platform, and the need for a system which students could work on, encouraged Andrew Tanenbaum to write Minix, a small version of Unix for the PC platform, in 1986. Although it originally supported multiprocessing with a single user and only floppy disk drives for storage, it evolved over the years into Minix 2.0, with many additional features not found in the early versions. The full source code is available. Unlike most Unix operating systems, Minix uses a microkernel-based design.

Richard Stallman had founded the GNU Project in 1983 to supply Unix compatible compilers, development tools, and utilities under a copylefted software license. This excellent package of freely available software with sources (and with the encouragement to make freely available enhancements) became a mainstay to computer science programs everywhere. Even more importantly, the GNU Project enabled a new Unix operating system to rise to prominence during the 1990's.

Graphical user interfaces for Unix were available in the early 1980's, but all were proprietary, and communication at the graphical level between two different Unix platforms was generally not possible. This situation was remedied when the Athena Project at MIT provided a standard, platform-independent graphical interface for Unix and other operating systems that can be used over a network. It is usually referred to as the X Window System, X11, or simply X. In addition, it decouples display and execution, so that a remote graphics program can be run in a local graphics window. The X Window System has continued to this day to be the main graphical interface for Unix. It was primarily enhanced and distributed commercially by the X Consortium, whose membership included most of the major vendors (including Compaq, Hewlett-Packard, Hummingbird, IBM, Silicon Graphics, and Sun). These duties have since been transferred to The Open Group (which includes the above vendors but is a larger consortium). The Open Group also licenses commercial products such as the Motif interface for the X Window System and the Common Desktop Environment (CDE). The licensing does allow the existence of free implementations such as XFree86 (which is usually packaged with Linux). Although originally developed for Unix, X is flexible enough to be used in conjunction with proprietary windowing systems (e.g. Hummingbird's Exceed product family for use with Windows 95/98/NT), thereby allowing graphical exchange and connectivity between Unix and non-Unix systems.

The most significant development in the 1990's has been the availability of a freely distributable Unix operating system called Linux. Linux was originally ported only to the 80386 PC platform. By 1999 Linux had matured and was available on a wide range of platforms, from personal computers (with Intel Pentium family processors) to a number of RISC processor machines (e.g. Alpha processor machines, Sun SPARC processor machines, and MIPS processor machines), and supported a very wide range of boards and other hardware devices.

Linux was developed primarily by Linus Torvalds, but a key factor which made possible its early release in 1991 and its subsequent popularity was the suite of GNU tools. Linux also owes debts to Minix, notwithstanding design disagreements between Tanenbaum and Torvalds over issues of portability and kernel type (microkernel versus monolithic). An interesting repartee between the two took place in 1992 on Usenet, with several other experts joining in; it is still worth reading.

Finally, it should be noted that although sources of information on Unix are legion, one finds information scattered among articles, books, web pages, circulated notes, programming handouts, source code, Usenet news posts, and folklore. One also finds an element of strong opinion in most of these sources. This is probably due to the fact that almost all of the pioneers were programmers, interested in their subject, somewhat partisan, not inclined to waste words (nor suffer fools gladly). While this is somewhat refreshing in an industry which suffers from excessive marketing hype, it does, however, present a hurdle to those users who are interested in Unix but come from a non-technical background. It also makes it difficult to compile a static and stable list of sources in the same way that one does for a research article in a professional journal. This will be evident from the Bibliography given at the end of this article.


I. Interactive Multiuser Operating Systems

Before discussing the features and internals of UNIX, it is necessary first to discuss interactive multiuser operating systems in general. From the perspective of the users, any modern operating system, if it wishes to be competitive, will have to satisfy a broad range of user requirements and expectations.

Satisfying user requirements and expectations is a necessary step in operating system design. Other considerations are the introduction of new hardware, the scalability of the design, the long-range stability of the platform as unanticipated changes force modifications of the design, and the demands placed upon the system administrators who will have to manage a system of growing complexity. When these factors are considered, one can list a number of desirable and more specific design features, which fall into five broad areas:

  1. A consistent, hierarchical file store, with good text processing and programming support
  2. Portability and reliable system software, achieved through the C programming language
  3. Clean and flexible process control
  4. Error logging and recovery from system failures
  5. Modifiability and a well-defined application programmer interface (API)

Each of the above five areas will be discussed in a separate section with attention given to ways in which the Unix operating system is distinctive or unique (see Table I).

The last section will discuss users' reactions to the Unix operating system as compared to non-Unix operating systems.

Table I. Unix System Architecture (for a monolithic kernel)

    Kernel Memory     User Memory                        User Interface
    ----------------  ---------------------------------  ----------------------------------
    kernel            daemons and user processes         Shells: sh, csh, ksh, zsh, tcsh,
                      (the fork, exec, wait, exit,       bash, and the X Window System
                      SIGCHLD cycle)                     interface
    --------------------------------------------------------------------------------------
    Abstraction layer: descriptors for files, pipes, network sockets, ...
    --------------------------------------------------------------------------------------
    Hierarchical file store: files, pipes, the process tree, sockets, and other devices:
    /proc /etc /bin /sys /dev /mnt /usr ...


II. The Design Objectives of Unix, File-store Organization, Text Processing, and Programming

A simplified point of view is that a computer runs programs in main memory, and a program accepts data in the form of input and produces results as some form of output. Since main memory is very limited and generally volatile, it was realized very early on that some form of secondary storage would be needed to keep the input data and store the output results. The modern secondary storage device of choice is usually some type of disk drive. It may have magnetic media, optical media, or magneto-optical media. The media may be removable or fixed. Disk drives themselves are usually divided into smaller chunks called partitions. Usually, the data within a partition is organized into files according to some format. The files are usually grouped into directories (which may have subdirectories, so there is usually a hierarchy of directories). Since even a small system may contain multiple disk drives, it is necessary to organize the local file structures on each partition of each disk drive into some global logical structure which will be clear and convenient from the user's point of view.

Many non-Unix operating systems still require that a file be fully referenced by a name which tells which partition (e.g. "C:\winnt\system32\cmd.exe" where "C:" indicates the partition) or which physical device (e.g. "DUA1:[faculty.smith]memorandum.txt" where "DUA1:" indicates the formatted drive) the file is on.

In contrast to this physical path naming, Unix assumes that the directory structure is hierarchical (i.e. a tree, with the forward slash "/" used as a separator) onto which partitions have been mounted at various nodes or subdirectories. Mounting is the process whereby the local file structure on a partition gets mapped into the global file tree. For example, one partition which has a local file named quota might be mounted on /usr/faculty and another partition which has a (different) local file named quota might be mounted on /usr/student. The first file is referred to by the logical path name /usr/faculty/quota and the second by /usr/student/quota. Neither faculty members nor students need to know which physical partition these files are on, because they do their work with respect to the logical paths rather than the physical paths. In addition, Unix allows symbolic links for user convenience. For example, the directory /database may actually be symbolically linked to a very long logical path name, something like /usr/local/applications/7.5.6/database, but the symbolic path makes it easier for users to reference and work with the files in the directory. This is a feature which some newer operating systems have only recently implemented (cf. the Single Instance Store in Microsoft Windows 2000, which is a slight modification of a symbolic link).
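
To make the symbolic link mechanism concrete, the following minimal C sketch creates a link and then reads back its target using the standard symlink() and readlink() system calls. The long target path is the hypothetical one from the example above, and the link is created in the current working directory so that no special privileges are needed.

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char target[4096];
        ssize_t n;

        /* Create a convenience link pointing at a long logical path.
           (Both names are hypothetical; a dangling target is still legal.) */
        if (symlink("/usr/local/applications/7.5.6/database", "database-link") == -1) {
            perror("symlink");
            return 1;
        }

        /* Read back the target of the link; readlink() does not add a '\0'. */
        n = readlink("database-link", target, sizeof(target) - 1);
        if (n == -1) {
            perror("readlink");
            return 1;
        }
        target[n] = '\0';
        printf("database-link -> %s\n", target);
        return 0;
    }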

More importantly, the designers of Unix, as well as those who have modified the system over the years, have consistently mapped all system devices into the file store as well. For example, the /dev directory contains an entry for each physical device: raw disk drives, formatted partitions, the keyboard, the mouse, printer ports, serial ports, network interfaces, terminals, etc. Many of these devices require root privileges for access, but their presence in the file store makes writing system software much easier. More recent Unix ports include a /proc directory so that system configuration and process activity can be gathered very easily. In addition, the fact that most of the information about the running system can be obtained from the file store (rather than, for example, having to give students executables marked suid root which are able to read and decode kernel memory) makes Unix an excellent platform for teaching operating system principles.
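
The exact contents of /proc differ from one Unix variant to another, but where it is present the information can be read like any ordinary text file. The short sketch below assumes the Linux layout, in which /proc/loadavg holds the current load averages:

    #include <stdio.h>

    int main(void)
    {
        char line[256];
        /* On Linux, /proc/loadavg is a plain readable text file. */
        FILE *fp = fopen("/proc/loadavg", "r");

        if (fp == NULL) {
            perror("fopen");
            return 1;
        }
        if (fgets(line, sizeof(line), fp) != NULL)
            printf("load averages: %s", line);
        fclose(fp);
        return 0;
    }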

Unix thus attempts to provide a consistent, logical, and hierarchical file store which unifies the organization not only of the files but of all the physical devices in the system; an application program gains access to any of these (a file, a pipe, a network socket, etc.) by making a system call and obtaining a descriptor for it, thereby masking hardware differences.
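
Because every such object is reached through a descriptor, the same code can service any of them. The following minimal sketch copies whatever a descriptor refers to onto standard output; here the descriptor comes from opening /etc/passwd (chosen only because it exists on virtually every Unix system), but it could equally have come from a pipe or a network socket.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Copy everything readable from descriptor fd to standard output. */
    static void copy_to_stdout(int fd)
    {
        char buf[4096];
        ssize_t n;

        while ((n = read(fd, buf, sizeof(buf))) > 0)
            if (write(STDOUT_FILENO, buf, (size_t)n) < 0)
                break;
    }

    int main(void)
    {
        int fd = open("/etc/passwd", O_RDONLY);  /* could be a device, a pipe end,
                                                    or a socket just as well */
        if (fd == -1) {
            perror("open");
            return 1;
        }
        copy_to_stdout(fd);
        close(fd);
        return 0;
    }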

A related design aspect of Unix is the use of text files whenever possible. Every operating system needs to store its configuration information somewhere. Many operating systems put configuration information in binary files (for example, so-called "registry" files). If this information becomes corrupted, special utilities are usually needed to correct or restore it, and a utility is needed even to read the information.

With the exception of password encryption files on some systems, all configuration files in Unix are human readable text files, as are the scripts which are used to build the Unix kernel. Among the benefits are that configuration can be read, compared, and repaired with nothing more than a text editor, and that the standard text-processing utilities described below can be applied to it directly.

In addition, Unix provides a selection of shells with powerful scripting capabilities (e.g. sh, csh, ksh, bash) and auxiliary text processing utilities (e.g. awk, eval, find, grep, read, sed, sort, test), so that one can often combine these and accomplish a task in a fraction of the time needed to write a program to do the same job. This is aided by shell constructs which support input and output redirection and piping the output of one program to the input of another.
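
At the system-call level, redirection and piping rest on a handful of primitives, chiefly pipe(), dup2(), fork(), and the exec family. The sketch below is illustrative only; it wires up by hand the equivalent of the pipeline "ls | sort -r" and omits most error handling.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int pipefd[2];                          /* [0] = read end, [1] = write end */

        if (pipe(pipefd) == -1) {
            perror("pipe");
            exit(1);
        }

        if (fork() == 0) {                      /* first child runs "ls" */
            dup2(pipefd[1], STDOUT_FILENO);     /* its stdout becomes the write end */
            close(pipefd[0]);
            close(pipefd[1]);
            execlp("ls", "ls", (char *)NULL);
            perror("execlp ls");
            _exit(127);
        }

        if (fork() == 0) {                      /* second child runs "sort -r" */
            dup2(pipefd[0], STDIN_FILENO);      /* its stdin becomes the read end */
            close(pipefd[0]);
            close(pipefd[1]);
            execlp("sort", "sort", "-r", (char *)NULL);
            perror("execlp sort");
            _exit(127);
        }

        close(pipefd[0]);                       /* parent closes both ends and */
        close(pipefd[1]);                       /* waits for the two children  */
        wait(NULL);
        wait(NULL);
        return 0;
    }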

The ability to do powerful and flexible document preparation was a feature of Unix from the start. All Unix systems support nroff for basic document formatting and the on-line manual pages, but many users have moved to the markup languages TeX and LaTeX, especially if mathematical constructs such as equations, subscripts, and special symbols are required.

One primary design goal of Unix was to support research in computer science, so many things in Unix are designed for the convenience of the programmer. The make utility is invaluable for keeping object code for the various modules up to date; it has proved so useful that it is almost always provided with compiler packages even on non-Unix platforms. For more elaborate software development, the Concurrent Versions System (cvs) is available from GNU; it has been very important in the development of the BSD Unix systems. Finally, there are tools for lexical analysis (lex) and syntactic analysis (yacc). This scripting, text processing, and software development environment is available to both system users and general users.


III. The Role of the C Programming Language with Regard to Portability and Reliable System Software

The common practice, prior to Unix, was for operating systems to be written in the assembly language of the given processor line. This virtually assured that the resulting code would not be portable to other systems unless the hardware was identical. One reason this was done was to keep the operating system code efficient and to ensure that not too large a percentage of time was spent executing system calls (as opposed to running user programs). Most compilers for high-level languages did not produce code efficient enough to compete with what an assembly language programmer could write. In addition, most high-level languages did not even support the low-level bit operations, arrays of typeless pointers, and other constructs that operating system code requires.

Ken Thompson had written a typeless language called "B" in 1970 for the first Unix system on the Digital Equipment PDP-7. It owed various features to another typeless language called "BCPL," which had been written in 1967 by Martin Richards. With these influences, Dennis Ritchie designed the C language to be essentially hardware independent while providing the features needed for writing portable operating system code. Ritchie wrote the first C compiler in 1972 and implemented it on a Digital Equipment PDP-11. From 1973 on, all Unix operating systems were essentially written in C (with some assembly language in the low-level device drivers).

Several specific features of C are of primary importance for writing portable and efficient operating system code, among them its low-level bit operations, its pointer types (including typeless pointers to untyped memory), its explicit control over data layout, and compiled code efficient enough to compete with hand-written assembly language.
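
As a hedged illustration of these features (the device status bits below are invented for the example, and the byte-copying routine simply mimics the standard memcpy()), C allows this kind of low-level manipulation to be written in a portable way:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define STATUS_READY  0x01u   /* hypothetical "device ready" bit */
    #define STATUS_ERROR  0x80u   /* hypothetical "device error" bit */

    /* Generic byte-wise copy using typeless (void *) pointers. */
    static void copy_bytes(void *dst, const void *src, size_t n)
    {
        unsigned char *d = dst;
        const unsigned char *s = src;

        while (n-- > 0)
            *d++ = *s++;
    }

    int main(void)
    {
        uint8_t status = 0;
        char src[] = "kernel buffer";
        char dst[sizeof(src)];

        status |= STATUS_READY;                 /* set the ready bit   */
        status &= (uint8_t)~STATUS_ERROR;       /* clear the error bit */
        if (status & STATUS_READY)
            printf("device ready (status = 0x%02x)\n", status);

        copy_bytes(dst, src, sizeof(src));      /* pointer-based, type-agnostic copy */
        printf("%s\n", dst);
        return 0;
    }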

C revolutionized not only the writing of the Unix operating system but the writing of operating systems in general. With the standardization of the language as ANSI C (completed in 1989), it has been used to write most new operating systems introduced in the last 15 years, except in a few cases where processor power and memory space were very limited (for example, the 80286 port of OS/2).


IV. Process Control: Signals and Fork

One of the responsibilities of an operating system is to manage both the system and user processes which do the work of the system. It is necessary to have clean, efficient mechanisms for creating, controlling, and ending processes, and to perform these operations at the correct times. A process always executes in the context of some environment; the environment includes variables for terminal type, paths used to search for link libraries and executable files, and other system information. A process has a data area in which to store important current information (more precisely, a process has local data storage in a stack area and global data storage in a data area, but this distinction will not be needed for the following discussion). It has descriptors, which are handles to files or devices. Finally, it has a process id, which is a unique number used to reference it. All of this information, together with the current point of execution and register contents, constitutes the state of the process.
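
A few of these pieces of state are directly visible to the process itself. The short sketch below prints the process id, the parent's process id, and two common environment variables (TERM and PATH are assumed to be set in the caller's environment, as they usually are):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const char *term = getenv("TERM");      /* environment variables */
        const char *path = getenv("PATH");

        printf("process id: %d\n", (int)getpid());
        printf("parent id:  %d\n", (int)getppid());
        printf("TERM=%s\n", term ? term : "(not set)");
        printf("PATH=%s\n", path ? path : "(not set)");
        return 0;
    }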

Suppose, for example, that it is desired to have a process handling non-authenticated requests made over a TCP/IP network such as the Internet (for example, the process, or daemon, might be handling World Wide Web requests). It will probably be a process with system privileges, so when a request is received (for example, to view a particular home page) this parent process will generally create a child process which has no special privileges (that is, it will be assigned to a non-root user such as nobody and may be restricted to accessing only a portion of the full file store) to handle the request. In order to handle the request, it is usually the case that some of the information known by the parent will have to be copied to the data area of the child. When the request has been fully satisfied, the child process will end itself and terminate in such a way that the parent is informed that it has finished its task. The parent process will have to keep track of how many active child processes it has running at any given time so that system resources (such as the number of open network sockets, buffer space, and system load) are not exceeded.

Under Unix the above sequence of events would be handled as follows. The parent daemon calls fork, which creates a child process that automatically inherits copies of the parent's environment, descriptors, and data. The child drops its special privileges (and may call exec to run a separate program), handles the request, and then calls exit. The kernel then delivers a SIGCHLD signal to the parent, whose signal handler calls wait to collect the child's exit status and to decrement its count of active children.
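
The skeleton below is a minimal sketch of this cycle for a hypothetical daemon. The handle_request() function is a placeholder, the "requests" are simulated by a simple counting loop, and a real server would also drop privileges in the child and check every return value.

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static volatile sig_atomic_t active_children = 0;

    /* SIGCHLD handler: reap every child that has finished its task. */
    static void reap_children(int sig)
    {
        (void)sig;
        while (waitpid(-1, NULL, WNOHANG) > 0)
            active_children--;
    }

    /* Placeholder for the real work done on behalf of one request. */
    static void handle_request(int request)
    {
        printf("child %d handling request %d\n", (int)getpid(), request);
    }

    int main(void)
    {
        struct sigaction sa;
        sigset_t chld, none;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = reap_children;          /* parent is told when a child exits */
        sigaction(SIGCHLD, &sa, NULL);

        sigemptyset(&chld);
        sigaddset(&chld, SIGCHLD);
        sigemptyset(&none);
        sigprocmask(SIG_BLOCK, &chld, NULL);    /* deliver SIGCHLD only in sigsuspend */

        for (int request = 0; request < 5; request++) {   /* five simulated requests */
            while (active_children >= 2)        /* throttle the number of live children */
                sigsuspend(&none);              /* sleep until a SIGCHLD arrives */

            active_children++;                  /* parent tracks its children   */
            if (fork() == 0) {                  /* child inherits environment,  */
                handle_request(request);        /* descriptors, and data        */
                _exit(0);                       /* exit triggers SIGCHLD above  */
            }
        }

        while (active_children > 0)             /* wait for the remaining children */
            sigsuspend(&none);
        return 0;
    }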

The above sequence is efficient and elegant, but its power may not be apparent at first glance until one compares it to systems which lack this fork, exec, wait, exit, SIGCHLD cycle.

Many non-Unix systems do not use the fork and exec calls, substituting a spawn system call instead. The complication is that spawn allows starting a new program but does not provide an automatic mechanism for the parent to pass its environment, descriptors, or data to the spawned offspring. Consequently, spawn does not recapture the full power of fork (nor even the full power of fork ... exec). In order to pass information from its environment, descriptors, or data to a spawned offspring, it is necessary either to encode the information as text and pass it on the command line or to use some operating system message-passing construct, thereby increasing the overhead of the call. This greatly limits how much information can be shared between parent and child processes.

Consequently, operating systems which substitute the spawn call tend to use multiple threads in the daemon, rather than spawned offspring, whenever the task requires much shared data. While this requires less operating system overhead, it is also inherently less secure, because all of the threads run within the daemon's single address space and with the daemon's (often privileged) credentials: a fault or compromise in the code serving one request can expose or corrupt the data belonging to every other request, and there is no opportunity to give an individual request the reduced privileges and restricted view of the file store that a separate child process can be given.

But using threads is not an option confined to non-Unix operating systems. Almost all modern versions of Unix support standard POSIX thread calls. The modern practice in Unix is to use a parent process with multiple child processes if security is the primary concern and to use multiple threads in the parent process if efficiency is the primary concern.
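
For comparison, the POSIX thread interface mentioned above looks like the following minimal sketch, in which the worker function is merely a stand-in for real request handling (compile with the -pthread option):

    #include <pthread.h>
    #include <stdio.h>

    /* Trivial worker: each thread reports the request number it was handed. */
    static void *worker(void *arg)
    {
        printf("thread handling request %d\n", *(int *)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[4];
        int request[4];

        for (int i = 0; i < 4; i++) {           /* start four worker threads */
            request[i] = i;
            pthread_create(&tid[i], NULL, worker, &request[i]);
        }
        for (int i = 0; i < 4; i++)
            pthread_join(tid[i], NULL);         /* wait for all workers to finish */
        return 0;
    }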

In summary, one can say that the Unix process control constructs, specifically the fork, exec, wait, exit, SIGCHLD cycle together with the option of POSIX threads, allow maximum flexibility with regard to all aspects of process control.


V. Error Logging and Recovery From System Failures

All systems crash occasionally. Very often this happens due to the upgrading of some hardware or some unforeseen interaction between privileged processes. Unix supports a number of features which help to speed recovery from system failures, among them centralized error logging through the syslog facility and automatic file system consistency checking (fsck) when the system is rebooted.
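
As a small illustration of the logging side (the daemon name and configuration path here are hypothetical), a process reports events to the system logger through the standard syslog interface, and the messages end up in the text log files that are consulted after a failure:

    #include <syslog.h>

    int main(void)
    {
        /* Register with the system logger under a (hypothetical) daemon name. */
        openlog("exampled", LOG_PID, LOG_DAEMON);

        syslog(LOG_INFO, "daemon started");
        syslog(LOG_ERR, "cannot open configuration file %s", "/etc/exampled.conf");

        closelog();
        return 0;
    }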

Unix recovery procedures are quite robust by comparison to what is available from non-Unix operating systems. This has certainly been a factor when decisions have been made in choosing a server platform. It has not been as much of a factor in the case of client platforms for the following reason. Over the past decade, rapid growth of small networks has resulted in a shortage of experienced system administrators who are able to diagnose and recover a crashed system. In the case of a client machine used primarily by a single user it may be more cost effective to simply reload and rebuild the operating system, especially if there are a large number of identical clients at the site. This has the advantage that it does not require nearly as much expertise from the system administrator; it has the disadvantage that some problems are never really solved.


VI. Modifiability and Application Programmer Interface (API)

Most languages have a standard run-time library for routine services such as text and numeric input and output, random number generation, mathematical functions, and text string operations. In this respect ANSI C is typical. One problem for the application programmer is that anything else may be platform dependent. A classic example of platform-dependent code is graphics software, where, at the current time, a number of competing standards exist (e.g. Silicon Graphics' OpenGL, the Graphical Kernel System, and others). This is primarily due to the continual upgrading of graphics display hardware to speed up 2-dimensional and 3-dimensional drawing, texturing, and the use of more sophisticated lighting schemes.

We could make a table of additional types of functions which the application programmer needs but which may be operating system dependent (see Table II).

Table II. Unix Application Programmer Function Calls

    Standard                      Category                     Sample Function Calls and Constructs
    ----------------------------  ---------------------------  ------------------------------------
    ANSI C                        Standard I/O                 fprintf fscanf fgets fputs
                                                               unlink rewind fseek
    POSIX                         Advanced file and            mkdir rmdir opendir readdir
    (not in ANSI C)               directory operations         closedir rewinddir dup dup2
                                  Special terminal I/O         tcsetattr tcgetattr,
                                                               the termios struct
                                  Run-time information         getenv time localtime ctime
                                                               sysconf pathconf
                                  Process control              fork exec wait exit,
                                                               and signal handling
    BSD Sockets                   Networking under TCP/IP      socket connect listen accept
    (not covered by ANSI C,                                    bind send sendto recv recvfrom
    POSIX)
    X Window System               2D raster graphics           X11 function calls
    (not covered by ANSI C,       and windowing
    POSIX, BSD Sockets)
    Proprietary                   3D graphics and              OpenGL, GKS, and a variety of
    (not covered by ANSI C,       other graphics               other graphics packages
    POSIX, BSD Sockets,
    X Window System)

Historically, software standardization has lagged behind improvements in hardware functionality. This presents a dilemma to the applications programmer, who would like to incorporate new functionality into software products as soon as possible while at the same time covering as many different platforms as possible.

Unix is not perfect in this respect, but, with the POSIX, BSD Sockets, and X Window System suites of calls, it has done more for standardization than most platforms. Unix has certainly also provided a much richer environment of function calls than most other platforms.


VII. The User's Perspective on Unix

In the end, it is usually software costs, maintenance costs, and user preferences which decide whether one operating system is adopted rather than another. The history of computing contains many examples where a technically superior solution did not succeed in the marketplace.

This is pertinent to Unix, which, although more powerful and flexible than most other operating systems, has a steep initial learning curve. This, more than any other factor, has worked against acceptance of Unix by the general, non-technical user.

Users who are proficient with programming and who need to modify their systems to adapt to special needs generally prefer Unix to non-Unix operating systems. This is not surprising given the original intent of the designers that Unix be suitable for all tasks from programming on up to doing research in computer science.

However, general users, including those who insist that "I just want to get my work done," almost always prefer proprietary operating systems where the various functional units of the machine are all integrated into one desktop and many standard operations (in some configurations, only standard operations) can be done by pointing and clicking with a mouse.

This trend toward hiding implementation details has spawned much confusion in the popular press, even to the effect that, for example, Windows NT has been described as a point-and-click system and Unix as a command-line system. A right click on any program icon, followed by Properties and then the Shortcut tab, reveals a command line in the Target box, showing what a delusion this point of view is. The real issue is that many users prefer an elaborate graphical interface which will, in effect, type the command line for them when they double-click on the program icon, will allow them to move files between directories by dragging icons with the mouse, and will initiate standard actions by clicking on a data icon. The X Window System, with a variety of choices of desktop (Motif, CDE, and others) and window manager, provides this functionality under Unix, but it seems that many users find X harder to get accustomed to than many proprietary windowing systems.

If one considers the history of the automobile, there are certainly some parallels. It took almost seventy-five years for the general motorist to be able simply to drive an automobile with a standard control system, without needing to know anything about the underlying principles, such as the thermodynamics of the Carnot cycle or the use of negative feedback to stabilize the handling.

Electrical systems are inherently more difficult to stabilize than mechanical ones, and software failure is at least an order of magnitude more common than hardware failure. Even allowing for the general speedup of technological development during the 20th century, it is probably too much to ask that computing be in the same situation of having stable, standard interfaces after only fifty years. Consequently, the prudent purchaser should demand flexibility with regard to system software, even if it requires learning something about the underlying principles of computer hardware and system software.


Bibliography

Note that all URLs given below were verified at the time of publication but cannot, of course, be guaranteed indefinitely into the future.