19.1 The Directory Data Structure

Each file must have at least one name. A file may have more than one distinct name, but the same name may not be shared by two distinct files, i.e. each name must define a unique file.

A name may be multipart. When written, the parts or components of the name are separated by slashes (“/”). The order of components within a name is significant, i.e. “a/b/c” is different from “a/c/b”.
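As a minimal sketch of the point about ordered components, the following Python fragment splits a written name at the slashes and compares two names component by component. The function name `components` is hypothetical and chosen only for illustration; discarding empty components is an assumption of this sketch, not something stated above.

```python
def components(name):
    """Split a slash-separated name into its ordered list of components.

    Assumption of this sketch: empty components (from a leading, trailing
    or doubled slash) are simply discarded.
    """
    return [part for part in name.split("/") if part]


# The same three components in a different order give a different name.
first = components("a/b/c")
second = components("a/c/b")
print(first, second, first == second)
```

Because the result is a list rather than a set, the comparison is sensitive to order, which is the property the text requires.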

If file names are divided into two parts, an initial part or “stem” and a final part or “ending”, then two files whose names have identical stems are usually related in some way. They may reside on the same disk, they may belong to the same user, etc. Users make initial reference to files by quoting the file name, e.g. in the “open” system call. An important operating system function is to decode the name into the corresponding “inode” entry. To do this, UNIX creates and maintains a directory data structure. This structure is equivalent to a directed graph with named edges.
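The idea of decoding a name by following named edges can be sketched in a few lines of Python. This is an illustrative model only, not the UNIX implementation: the table `directories`, the function `resolve`, the node numbers and the file names are all hypothetical, and the node numbers merely stand in for “inode” entries.

```python
ROOT = 1  # hypothetical number for the root node

# Each interior node maps edge names to successor node numbers.
# Nodes absent from this table (5 and 6) are leaves, i.e. data files.
directories = {
    1: {"usr": 2, "etc": 3},
    2: {"ken": 4},
    3: {"passwd": 5},
    4: {"notes": 6, "pw": 5},
}


def resolve(name, start=ROOT):
    """Follow the edges named by each component; return the final node.

    Returns None if a component names no edge, or if the walk tries to
    continue past a leaf.
    """
    node = start
    for part in name.split("/"):
        if not part:
            continue  # assumption: ignore empty components
        edges = directories.get(node)
        if edges is None or part not in edges:
            return None
        node = edges[part]
    return node


print(resolve("usr/ken/notes"))
print(resolve("usr/missing"))
```

Each step consumes one component and moves along one edge, so the cost of decoding a name grows with the number of components rather than with the number of files.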

In its purest form, the graph is a tree, i.e. it has a single root node, with exactly one path between the root and any node. More commonly in UNIX (but not so commonly in other operating systems) the graph is a lattice which may be obtained from a tree by coalescing one or more groups of leaves.

In this case, while there is still only one path between the root and any interior node, there may be more than one path between the root and a leaf. Leaves are nodes without successors and correspond to data files. Interior nodes are nodes with successors and correspond to directory files.

The name for a file is obtained from the names of the edges of the path between the root and the node corresponding to the file. (For this reason, the name is often referred to as a “pathname”.) If there are several paths, then the file has several names.
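To illustrate how one file can acquire several names, the sketch below enumerates every path from the root to a given node and joins the edge names with slashes. It is a hedged model under stated assumptions: the table `directories`, the function `names_of` and all node numbers are hypothetical, and the graph is assumed to contain no cycles, so the recursion terminates.

```python
ROOT = 1  # hypothetical number for the root node

# Interior nodes map edge names to successor node numbers.
# Leaf 5 has two incoming edges, i.e. two leaves have been coalesced.
directories = {
    1: {"usr": 2, "etc": 3},
    2: {"ken": 4},
    3: {"passwd": 5},
    4: {"notes": 6, "pw": 5},
}


def names_of(target, node=ROOT, prefix=()):
    """Return every pathname leading from `node` to `target`.

    Assumes the graph is acyclic; edges are visited in sorted order so
    the result is deterministic.
    """
    found = []
    for edge, successor in sorted(directories.get(node, {}).items()):
        path = prefix + (edge,)
        if successor == target:
            found.append("/".join(path))
        # Keep searching below the successor (a leaf has no edges).
        found.extend(names_of(target, successor, path))
    return found


print(names_of(5))
print(names_of(6))
```

In this model node 5 is reachable along two paths and so has two names, while node 6 has exactly one.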