Difference between revisions of "In UNIX Everything is a File"
Revision as of 11:27, 10 January 2020
Warren Young brilliantly wrote this dissertation in 2014. Giving him due credit, it is mirrored here and sourced.
"Everything is a file" is a bit glib. "Everything appears somewhere in the filesystem" is closer to the mark, and even then, it's more an ideal than a law of system design.
For example, Unix domain sockets are not files, but they do appear in the filesystem. You can ls -l a domain socket to display its attributes, cat data to/from one, modify its access control via chmod, etc.
But although regular TCP/IP network sockets are created and manipulated with the same BSD sockets system calls as Unix domain sockets, TCP/IP sockets do not show up in the filesystem,¹ even though there is no especially good reason that this should be true.
Another example of non-file objects appearing in the filesystem is Linux's /proc filesystem. This feature exposes a great amount of detail about the kernel's run-time operation to user space, mostly as virtual plain text files. Many /proc entries are read-only, but a lot of /proc is also writeable, so you can change the way the system runs using any program that can modify a file. Alas, here again we have a nonideality: BSD type Unixes generally run without /proc, and the System V Unixes expose a lot less via /proc than Linux does.
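As a small illustration of reading kernel state with ordinary file tools, here is a sketch that assumes a Linux box with /proc mounted. The two entries used, /proc/sys/kernel/ostype and /proc/uptime, are standard on Linux but will not exist on BSD type systems.

```shell
# Linux-only sketch: the kernel's run-time state read with plain file tools.
# Assumes /proc is mounted, which is the norm on Linux but not on the BSDs.
ostype=$(cat /proc/sys/kernel/ostype)          # kernel name, "Linux" on Linux
uptime_seconds=$(cut -d' ' -f1 /proc/uptime)   # first field: seconds since boot
echo "kernel: $ostype, up for $uptime_seconds seconds"
```

Writable entries work the same way in the other direction: with sufficient privileges, a tunable can be changed by redirecting echo into the corresponding file.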
I can't contrast that to MS Windows
First, much of the sentiment you can find online and in books about Unix being all about file I/O and Windows being "broken" in this regard is obsolete. Windows NT fixed a lot of this.
Modern versions of Windows have a unified I/O system, just like Unix, so you can read network data from a TCP/IP socket via ReadFile() rather than the Windows Sockets specific API WSARecv(), if you want to. This exactly parallels the Unix Way, where you can read from a network socket with either the generic read(2) Unix system call or the sockets-specific recv(2) call.²
Nevertheless, Windows still fails to take this concept to the same level as Unix, even here in 2018. There are many areas of the Windows architecture that cannot be accessed through the filesystem, or that can't be viewed as file-like. Some examples:
1. Drivers
Windows' driver subsystem is easily as rich and powerful as Unix's, but to write programs to manipulate drivers, you generally have to use the Windows Driver Kit, which means writing C or .NET code.
On Unix type OSes, you can do a lot to drivers from the command line. You've almost certainly already done this, if only by redirecting unwanted output to /dev/null.³
2. Inter-program communication
Windows programs don't communicate easily with each other.
Unix command line programs communicate easily via text streams and pipes. GUI programs are often either built on top of command line programs or export a text command interface, so that the same simple text-based communication mechanisms work with GUI programs, too.
3. The registry
Unix has no direct equivalent of the Windows registry. The same information is scattered through the filesystem, most of it in /etc, /proc and /sys.
If you don't see that drivers, pipes, and Unix's answer to the Windows registry have anything to do with "everything is a file," read on.
How does the "Everything is a file" philosophy make a difference here?
I will explain that by expanding on my three points above, in detail.
Long Answer, part 1: Drives vs Device Files
Let's say your CF card reader appears as E: under Windows and /dev/sdc under Linux. What practical difference does it make?
It is not just a minor syntax difference.
On Linux, I can say dd if=/dev/zero of=/dev/sdc to overwrite the contents of /dev/sdc with zeroes.
Think about what that means for a second. Here I have a normal user space program (dd(1)) that I asked to read data in from a virtual device (/dev/zero) and write what it read out to a real physical device (/dev/sdc) via the unified Unix filesystem. dd doesn't know it is reading from and writing to special devices. It will work on regular files just as well, or on a mix of devices and files, as we will see below.
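To see that dd really doesn't care what is on either end, here is a harmless sketch that never touches a real disk. It copies from a device to a regular file, then from that regular file back to a device, using the same command both times. The scratch file from mktemp and the small block counts are assumptions chosen for illustration.

```shell
# Sketch: the same dd invocation works with devices and regular files.
scratch=$(mktemp)                       # a plain file standing in for a disk

# Device -> regular file: 4 blocks of 512 zero bytes.
dd if=/dev/zero of="$scratch" bs=512 count=4 2>/dev/null
size=$(wc -c < "$scratch")              # 4 * 512 = 2048 bytes

# Regular file -> device: same tool, roles swapped.
dd if="$scratch" of=/dev/null bs=512 2>/dev/null
copy_status=$?

echo "wrote $size bytes, copy back exited with $copy_status"
rm -f "$scratch"
```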
There is no easy way to zero the E: drive on Windows, because Windows makes a distinction between files and drives, so you cannot use the same commands to manipulate them. The closest you can get is to do a disk format without the Quick Format option, which zeroes most of the drive contents, but then writes a new filesystem on top of it. What if I don't want a new filesystem? What if I really do want the disk to be filled with nothing but zeroes?
Let's be generous and say that we really do want a fresh new filesystem on E:. To do that in a program on Windows, I have to call a special formatting API.⁴ On Linux, you don't need to write a program to access the OS's "format disk" functionality. You just run the appropriate user space program for the filesystem type you want to create: mkfs.ext4, mkfs.xfs, or what have you. These programs will write a filesystem onto whatever file or /dev node you pass.
Because mkfs type programs on Unixy systems work on files without making artificial distinctions between devices and normal files, it means I can create an ext4 filesystem inside a normal file on my Linux box:
$ dd if=/dev/zero of=myfs bs=1k count=1k
$ mkfs.ext4 -F myfs
That literally creates a 1 MiB disk image in the current directory, called myfs. I can then mount it as if it were any other external filesystem:
$ mkdir mountpoint
$ sudo mount -o loop myfs mountpoint
$ grep $USER /etc/passwd > mountpoint/my-passwd-entry
$ sudo umount mountpoint
Now I have an ext4 disk image with a file called my-passwd-entry in it which contains my user's /etc/passwd entry.
If I want, I can blast that image onto my CF card:
$ sudo dd if=myfs of=/dev/sdc1
Or, I can pack that disk image up, mail it to you, and let you write it to a medium of your choosing, such as a USB memory stick:
$ gzip myfs
$ echo "Here's the disk image I promised to send you." |
  mutt -s "Password file disk image" -a myfs.gz -- you@example.com
All of this is possible on Linux⁵ because there is no artificial distinction between files, filesystems, and devices. Many things on Unix systems either are files, or are accessed through the filesystem so that they look like files, or in some other way look sufficiently file-like that they can be treated as such.
Windows' concept of the filesystem is a hodgepodge; it makes distinctions between directories, drives, and network resources. There are three different syntaxes, all blended together in Windows: the Unix-like ..\FOO\BAR path system, drive letters like C:, and UNC paths like \\SERVER\PATH\FILE.TXT. This is because it's an accretion of ideas from Unix, CP/M, MS-DOS, and LAN Manager, rather than a single coherent design. It is why there are so many illegal characters in Windows file names.
Unix has a unified filesystem, with everything accessed by a single common scheme. To a program running on a Linux box, there is no functional difference between /etc/passwd, /media/CF_CARD/etc/passwd, and /mnt/server/etc/passwd. Local files, external media, and network shares all get treated the same way.⁶
Windows can achieve similar ends to my disk image example above, but you have to use special programs written by uncommonly talented programmers. This is why there are so many "virtual DVD" type programs on Windows. The lack of a core OS feature has created an artificial market for programs to fill the gap, which means you have a bunch of people competing to create the best virtual DVD type program. We don't need such programs on *ix systems, because we can just mount an ISO disk image using a loop device.
The same goes for other tools like disk wiping programs, which we also don't need on Unix systems. Want your CF card's contents irretrievably scrambled instead of just zeroed? Okay, use /dev/random as the data source instead of /dev/zero:
$ sudo dd if=/dev/random of=/dev/sdc
On Linux, we don't keep reinventing such wheels because the core OS features not only work well enough, they work so well that they're used pervasively. A typical scheme for booting a Linux box involves a virtual disk image, for just one example, created using techniques like I show above.⁷
I feel it's only fair to point out that if Unix had integrated TCP/IP I/O into the filesystem from the start, we wouldn't have the netcat vs socat vs Ncat vs nc mess, the cause of which was the same design weakness that led to the disk imaging and wiping tool proliferation on Windows: lack of an acceptable OS facility.
Long Answer, part 2: Pipes as Virtual Files
Despite its roots in DOS, Windows never has had a rich command line tradition.
This is not to say that Windows doesn't have a command line, or that it lacks many command line programs. Windows even has a very powerful command shell these days, appropriately called PowerShell.
Yet, there are knock-on effects of this lack of a command-line tradition. You get tools like DISKPART, which is almost unknown in the Windows world because most people do disk partitioning and such through the Computer Management MMC snap-in. Then when you do need to script the creation of partitions, you find that DISKPART wasn't really made to be driven by another program. Yes, you can write a series of commands into a script file and run it via DISKPART /S scriptfile, but it's all-or-nothing. What you really want in such a situation is something more like GNU parted, which will accept single commands like parted /dev/sdb mklabel gpt. That allows your script to do error handling on a step-by-step basis.
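A rough sketch of what step-by-step error handling can look like in a shell script follows. The run_step helper is hypothetical, not something parted provides, and the steps shown are harmless stand-ins (true and false) so that the sketch can be run safely. A real script would put one parted call in each step, as in the comment.

```shell
# Hypothetical helper (not part of parted): run one step, report it,
# and tell the caller whether it worked.
run_step() {
    description=$1
    shift
    if "$@"; then
        echo "ok: $description"
    else
        echo "FAILED: $description" >&2
        return 1
    fi
}

# Stand-in commands so the sketch is harmless to run. On a real system,
# as root, each step would be a single parted call, for example:
#   run_step "write GPT label" parted /dev/sdb mklabel gpt || exit 1
run_step "first step" true
run_step "second step" false || echo "stopping here; later steps are skipped"
```

Because each step is its own command with its own exit status, the script can decide what to do after every one of them, which is exactly what a single all-or-nothing script file cannot offer.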
What does all this have to do with "everything is a file"? Easy: pipes make command line program I/O into "files," of a sort. Pipes are unidirectional streams, not random-access like a regular disk file, but in many cases the difference is of no consequence. The important thing is that you can attach two independently-developed programs and make them communicate via simple text. In that sense, any two programs designed with the Unix Way in mind can communicate.
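For instance, here is a minimal sketch in which five small, independently written programs cooperate through nothing but text on pipes to find the most frequent word in a list. The word list is made up for the example.

```shell
# Each program only reads text on stdin and writes text on stdout;
# none of them knows what is on the other end of its pipe.
most_common=$(
    printf '%s\n' pear apple pear fig apple pear |
        sort |            # group identical lines together
        uniq -c |         # prefix each distinct line with its count
        sort -rn |        # highest count first
        head -n 1 |       # keep only the winner
        awk '{print $2}'  # drop the count, keep the word
)
echo "$most_common"
```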
In those cases where you really do need a file, it is easy to turn program output into a file:
$ some-program --some --args > myfile
$ vi myfile
But why write the output to a temporary file when the "everything is a file" philosophy gives you a better way? If all you want to do is read the output of that command into a vi editor buffer, vi can do that for you directly. From the vi "normal" mode, say:
:r !some-program --some --args
That inserts that program's output into the active editor buffer at the current cursor position. Under the hood, vi is using pipes to connect the output of the program to a bit of code that uses the same OS calls it would use to read from a file instead. I wouldn't be surprised if the two cases of :r — that is, with and without the ! — both used the same generic data reading loop in all common implementations of vi. I can't think of a good reason not to.
This isn't a recent feature of vi, either; it goes clear back to the ancient ed(1) text editor.⁸
This powerful idea pops up over and over in Unix.
For a second example of this, recall my mutt email command above. The only reason I had to write that as two separate commands is that I wanted the temporary file to be named *.gz, so that the email attachment would be correctly named. If I didn't care about the file's name, I could have used process substitution to avoid creating the temporary file:
$ echo "Here's the disk image I promised to send you." |
  mutt -s "Password file disk image" -a <(gzip -c myfs) -- you@example.com
That avoids the temporary by turning the output of gzip -c into a FIFO (which is file-like) or a /dev/fd object (which is file-like). (Bash chooses the method based on the system's capabilities, since /dev/fd isn't available everywhere.)
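The FIFO variant can be imitated by hand in plain POSIX shell, which makes the "file-like" part easy to see. This is only a sketch of the idea, not a description of what Bash does internally; the directory and FIFO names are arbitrary.

```shell
# Sketch: a named pipe (FIFO) has a path like a file, but carries a stream.
workdir=$(mktemp -d)
mkfifo "$workdir/pipe"

# Writer: runs in the background and blocks until a reader opens the FIFO.
printf 'hello through a FIFO\n' > "$workdir/pipe" &

# Reader: cat opens the FIFO by name, exactly as it would a regular file.
received=$(cat "$workdir/pipe")
wait

echo "$received"
rm -rf "$workdir"
```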
For yet a third way this powerful idea appears in Unix, consider gdb on Linux systems. This is the debugger used for any software written in C and C++. Programmers coming to Unix from other systems look at gdb and almost invariably gripe about it, "Yuck, it's so primitive!" Then they go searching for a GUI debugger, find one of several that exist, and happily continue their work...often never realizing that the GUI just runs gdb underneath, providing a pretty shell on top of it. There aren't competing low-level debuggers on most Unix systems because there is no need for programs to compete at that level. All we need is one good low-level tool that we can all base our high-level tools on, if that low-level tool communicates easily via pipes.
This means we now have a documented debugger interface which would allow drop-in replacement of gdb, but unfortunately, the primary competitor to gdb didn't take the low-friction path.
Still, it is at least possible that some future gdb replacement would drop in transparently simply by cloning its command line interface. To pull the same thing off on a Windows box, the creators of the replaceable tool would have had to define some kind of formal plugin or automation API. That means it doesn't happen except for the very most popular programs, because it's a lot of work to build both a normal command line user interface and a complete programming API.
This magic happens through the grace of pervasive text-based IPC.
Although Windows' kernel has Unix-style anonymous pipes, it's rare to see normal user programs use them for IPC outside of a command shell, because Windows lacks this tradition of creating all core services in a command line version first, then building the GUI on top of it separately. This leads to being unable to do some things without the GUI, which is one reason why there are so many remote desktop systems for Windows, as compared to Linux: Windows is very hard to use without the GUI.
By contrast, it's common to administer Unix, BSD, OS X, and Linux boxes remotely via SSH. And how does that work, you ask? SSH connects a network socket (which is file-like) to a pseudo tty at /dev/pty* (which is file-like). Now your remote system is connected to your local one through a connection that so seamlessly matches the Unix Way that you can pipe data through the SSH connection, if you need to.
Are you getting an idea of just how powerful this concept is now?
A piped text stream is indistinguishable from a file from a program's perspective, except that it's unidirectional. A program reads from a pipe the same way it reads from a file: through a file descriptor. FDs are absolutely core to Unix; the fact that files and pipes share the same I/O abstraction should tell you something.⁹
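A short sketch makes the point concrete. The count_lines function below is a made-up example that reads only from its standard input, file descriptor 0, and gives the same answer whether that descriptor is attached to a regular file or to a pipe.

```shell
# Made-up example function: counts lines arriving on file descriptor 0.
# It has no idea whether that descriptor is a file or a pipe.
count_lines() {
    n=0
    while IFS= read -r _line; do
        n=$((n + 1))
    done
    echo "$n"
}

scratch=$(mktemp)
printf 'a\nb\nc\n' > "$scratch"

from_file=$(count_lines < "$scratch")            # fd 0 is a regular file
from_pipe=$(printf 'a\nb\nc\n' | count_lines)    # fd 0 is a pipe

echo "file: $from_file, pipe: $from_pipe"
rm -f "$scratch"
```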
The Windows world, lacking this tradition of simple text communications, makes do with heavyweight OOP interfaces via COM or .NET. If you need to automate such a program, you must also write a COM or .NET program. This is a fair bit more difficult than setting up a pipe on a Unix box.
Windows programs lacking these complicated programming APIs can only communicate through impoverished interfaces like the clipboard or File/Save followed by File/Open.
Long Answer, part 3: The Registry vs Configuration Files
The practical difference between the Windows registry and the Unix Way of system configuration also illustrates the benefits of the "everything is a file" philosophy.
On Unix type systems, I can look at system configuration information from the command line merely by examining files. I can change system behavior by modifying those same files. For the most part, these configuration files are just plain text files, which means I can use any tool on Unix to manipulate them that can work with plain text files.
Scripting the registry is not nearly so easy on Windows.
The easiest method is to make your changes through the Registry Editor GUI on one machine, then blindly apply those changes to other machines with regedit via *.reg files. That isn't really "scripting," since it doesn't let you do anything conditionally: it's all or nothing.
If your registry changes need any amount of logic, the next easiest option is to learn PowerShell, which basically amounts to learning .NET system programming. It would be like if Unix only had Perl, and you had to do all ad hoc system administration through it. Now, I'm a Perl fan, but not everyone is. Unix lets you use any tool you happen to like, as long as it can manipulate plain text files.
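By way of comparison, here is a sketch of a conditional configuration change done with nothing but grep and sed. It works on a scratch file with made-up contents in the style of sshd_config rather than on anything under /etc, so it is safe to run; the file contents and the policy being enforced are assumptions for illustration.

```shell
# Scratch file standing in for a real configuration file.
conf=$(mktemp)
printf 'PermitRootLogin yes\nPort 22\n' > "$conf"

# Only rewrite the file if the setting is currently "yes".
if grep -q '^PermitRootLogin yes' "$conf"; then
    sed 's/^PermitRootLogin yes/PermitRootLogin no/' "$conf" > "$conf.new" &&
        mv "$conf.new" "$conf"
fi

setting=$(grep '^PermitRootLogin' "$conf")
echo "$setting"
rm -f "$conf"
```

The same change could just as well be made with awk, Perl, ed, or a text editor, which is the point: the only requirement is that the tool can handle plain text.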
Footnotes:
¹ Plan 9 fixed this design misstep, exposing network I/O via the /net virtual filesystem.
Bash has a feature called /dev/tcp that allows network I/O via regular filesystem functions. Since it is a Bash feature, rather than a kernel feature, it isn't visible outside of Bash or on systems that don't use Bash at all. This shows, by counterexample, why it is such a good idea to make all data resources visible through the filesystem.
² By "modern Windows," I mean Windows NT and all of its direct descendants, which includes Windows 2000, all versions of Windows Server, and all desktop-oriented versions of Windows from XP onward. I use the term to exclude the DOS-based versions of Windows, being Windows 95 and its direct descendants, Windows 98 and Windows ME, plus their 16-bit predecessors.
You can see the distinction by the lack of a unified I/O system in those latter OSes. You cannot pass a TCP/IP socket to ReadFile() on Windows 95; you can only pass sockets to the Windows Sockets APIs. See Andrew Schulman's seminal article, Windows 95: What It's Not for a deeper dive into this topic.
³ Make no mistake, /dev/null is a real kernel device on Unix type systems, not just a special-cased file name, as is the superficially equivalent NUL in Windows.
Although Windows tries to prevent you from creating a NUL file, it is possible to bypass this protection with mere trickery, fooling Windows' file name parsing logic. If you try to access that file with cmd.exe or Explorer, Windows will refuse to open it, but you can write to it via Cygwin, since it opens files using similar methods to the example program, and you can delete it via similar trickery.
By contrast, Unix will happily let you rm /dev/null, as long as you have write access to /dev, and let you recreate a new file in its place, all without trickery, because that dev node is just another file. While that dev node is missing, the kernel's null device still exists; it's just inaccessible until you recreate the dev node via mknod.
You can even create additional null device dev nodes elsewhere: it doesn't matter if you call it /home/grandma/Recycle Bin, as long as it's a dev node for the null device, it will work exactly the same as /dev/null.
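You can check the "real device" claim without root, using only the shell's own file tests. This sketch assumes nothing beyond a standard /dev/null being present.

```shell
# -c is true only for character device nodes, not for regular files.
if [ -c /dev/null ]; then
    null_kind="character device"
else
    null_kind="something else"
fi

# Writes succeed and vanish; reads hit end-of-file immediately.
echo "discarded" > /dev/null
null_size=$(wc -c < /dev/null)

echo "/dev/null is a $null_kind and yields $null_size bytes"
```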
⁴ There are actually two high-level "format disk" APIs in Windows: SHFormatDrive() and Win32_Volume.Format().
There are two for a very...well...Windows sort of reason. The first one asks Windows Explorer to display its normal "Format Disk" dialog box, which means it works on any modern version of Windows, but only while a user is interactively logged in. The other you can call in the background without user input, but it wasn't added to Windows until Windows Server 2003. That's right, core OS behavior was hidden behind a GUI until 2003, in a world where Unix shipped mkfs from day 1.
My copy of Unix V5 from 1974 includes /etc/mkfs, a 4136 byte statically-linked PDP-11 executable. (Unix didn't get dynamic linkage until the late 1980s, so it's not like there's a big library somewhere else doing all the real work.) Its source code — included in the V5 system image as /usr/source/s2/mkfs.c — is an entirely self-contained 457-line C program. There aren't even any #include statements!
This means you can not only examine what mkfs does at a high level, you can experiment with it using the same tool set Unix was created with, just like you're Ken Thompson, four decades ago. Try that with Windows. The closest you can come today is to download the DOS source code, first released in 2014, which you find amounts to just a pile of assembly sources. It will only build with obsolete tools you probably won't have on-hand, and in the end you get your very own copy of DOS 2.0, an OS far less powerful than 1974's Unix V5, despite its being released nearly a decade later.
(Why talk about Unix V5? Because it is the earliest complete Unix system still available. Earlier versions are apparently lost to time. There was a project that pieced together a V1/V2 era Unix, but it appears to be missing mkfs, despite the existence of the V1 manual page linked above proving it must have existed somewhere, somewhen. Either those putting this project together couldn't find an extant copy of mkfs to include, or I suck at finding files without find(1), which also doesn't exist in that system. :))
Now, you might be thinking, "Can't I just call format.com? Isn't that the same on Windows as calling mkfs on Unix?" Alas, no, it isn't the same, for a bunch of reasons:
First, format.com wasn't designed to be scripted. It prompts you to "press ENTER when ready", which means you need to send an Enter key to its input, or it'll just hang.
Then, if you want anything more than a success/failure status code, you have to open its standard output for reading, which is far more complicated on Windows than it has to be. (On Unix, everything in that linked article can be accomplished with a simple popen(3) call.)
Having gone through all this complication, the output of format.com is harder to parse for computer programs than the output of mkfs, being intended primarily for human consumption.
If you trace what format.com does, you find that it does a bunch of complicated calls to DeviceIoControl(), ufat.dll, and such. It is not simply opening a device file and writing a new filesystem onto that device. This is the sort of design you get from a company that employs 126000 people, and needs to keep employing them.
⁵ When talking about loop devices, I talk only about Linux rather than Unix in general because loop devices aren't portable between Unix type systems. There are similar mechanisms in OS X, BSD, etc., but the syntax varies somewhat.
⁶ Back in the days when disk drives were the size of washing machines and cost more than the department head's luxury car, big computer labs would share a larger proportion of their collective disk space as compared to modern computing environments. The ability to transparently graft a remote disk into the local filesystem made such distributed systems far easier to use. This is where we get /usr/share, for instance.
Contrast Windows, where a remote disk is typically either mapped to a drive letter or must be accessed through a UNC path, rather than integrated transparently into the local filesystem. Drive letters offer you few choices for symbolic expression; does P: refer to the "public" space on BigServer, or to the "packages" directory on the software mirror server? UNC paths mean you have to remember which server your remote files are on, which gets difficult in a large organization with hundreds or thousands of file servers.
Windows didn't get symlinks until Windows Vista, released in 2007, which introduced NTFS symbolic links. Windows' symbolic links are a bit more powerful than Unix's symbolic links (a feature of Unix since 1977) in that they can also point to a remote file share, not just to a local path. Unix did that differently, via NFS in 1984, which builds on top of Unix's preexisting mount point feature, which it has had since the beginning.
So, depending on how you look at it, Windows trailed Unix by roughly 2 or 3 decades.
Even then, symbolic links aren't a normal part of a Windows user's experience, for a couple of reasons.
First, you can only create them with the backward command line program MKLINK. You can't create them from Windows Explorer, whereas the Unix equivalents to Windows Explorer typically do let you create symlinks.
Second, the default Windows configuration prevents normal users from creating symbolic links, requiring that you either run the command shell as Administrator or give the user permission to create them via an obscure path in a tool your average user has never even seen, much less knows how to use. (And unlike with most Admin privilege problems in Windows, UAC is of no help in this case.)
⁷ Linux boxes don't always use a virtual disk image in the boot sequence. There are a bunch of different ways to do it.
⁸ man ed
⁹ Network socket descriptors are FDs underneath, too, by the way.