Title: Practice of computing
Author: Alexander Arkhipov
Created: Before 2023-06-24
Modified: 2024-03-23

In this small text I discuss the choice of software, file hierarchies, and programming style. I find that these topics are among the ones where our various educators (themselves influenced by similar misdesign and misdirection) cripple us the most, perhaps never even explicitly or intentionally.

SOFTWARE CHOICE

In the year of 2023 people shouldn't have to use proprietary operating systems, or any other proprietary software. The few cases I can think of where the use of proprietary software is justified are:

- You are trying to reverse-engineer something.
- You are being forced to -- that usually occurs when some "boss-man" tries to get everyone to some lowest common denominator. If you are reading this, you are probably better than that, and may wish to leave this organisation (or whatever it is) soon.
- You are forced to use weird hardware that requires weird software. This one really sucks because it often has to do with international politics, and lacks a common-sense solution.

In the latter two situations you can often get somewhat around the restriction by running the untrusted proprietary bits in a virtual machine. Even if you can't use a VM for everything, there's usually still nothing stopping you from running mostly-free software on your hardware.

When I first started using free software, I used to play a lot of computer games, and had to run Micro$oft's office software (bloody university didn't accept my LaTeX reports!). I kept the latter monstrosity on a VM, and played to my heart's content using wine. Later, I stopped using the office because I graduated, and I stopped playing games because I found them so terribly boring compared to all the things that I could do now.

FILE HIERARCHIES

This document focuses on maintaining good file hierarchies under Unices, so many of its presumptions and opinions follow from that.

File hierarchies are hard.
What makes it worse is that we've all been damaged from childhood with all those graphical file-managing tools that are both slower than the command line and promote some very bad habits. Natural consequences of such damage are even getting standardised: there are a lot of programs that create weirdly-named dirs like ~/Downloads and ~/Documents, and there is the XDG base dir specification.

It is my opinion that file hierarchies should facilitate three things primarily:

1. It should be easy to navigate to the file by typing its name.
2. It should be easy to manipulate files via scripts.
3. It should be easy to perform similar operations on a group of closely-related files, e.g. with only one or two commands all my configs for $program should be able to get archived, be copied, change permissions and ownership, etc.

Generally I assume that the user knows where most of his (important) files are and can find the rest with find(1) or locate(1) or something.

Firstly, please limit your file names to [A-Za-z0-9_.-]. Avoid minuses (dashes/whatever) as the first character (or at all, really): they cause all sorts of problems like "rm -i *" becoming "rm -i -f file1 file2". Minuses are OK for semantic purposes, however, such as naming files after dates ("2013-05-05.photo.jpg" (though depending on how it will be used it might be fine to name it "20130505.photo.jpg")), or versioning them ("prog-0.4.tar"). If you use them for separating "fields" in filenames, consider using periods instead.

There are also some characters, which are not mandated by POSIX, but which are still commonly used by some programs: comma (,) and colon (:) are used in maildirs, I believe some GNU utilities like tilde (~), I think systemd uses at (@), and there's lost+found on many (most?) Unix filesystems, and test(1) can often be invoked as `[', or more specifically, `/bin/['. I'd advise against using those for normal files, however.
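The naming rule above is easy to check mechanically. What follows is only a minimal sketch: name_ok() is a hypothetical helper made up for this illustration, not part of any library, and it treats a leading minus as an error even though the text merely advises against it. The character set is spelled out by hand rather than tested with isalnum(3), whose answer can depend on the locale.

```c
#include <string.h>

/*
 * Hypothetical helper: returns 1 if name consists only of
 * [A-Za-z0-9_.-] and does not start with a minus, 0 otherwise.
 */
int
name_ok(const char *name)
{
	static const char allowed[] =
	    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	    "abcdefghijklmnopqrstuvwxyz"
	    "0123456789_.-";

	/* Reject empty names and names that look like options. */
	if (name[0] == '\0' || name[0] == '-')
		return 0;

	/* strspn() returns the length of the leading run of allowed chars. */
	return name[strspn(name, allowed)] == '\0';
}
```

With this, "prog-0.4.tar" passes, while "-f", "my file" and "lost+found" are rejected.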
Be conservative with numbers and underscores and only capitalise letters with VERY IMPORTANT files that should be sorted first like README, Makefile, CHANGES, INSTALL etc. That is no reason to name directories in your ~ so: they are already sorted, because the "unimportant" files have a period as the first character.

Naturally, keep the names short. Of course if you have some, e.g., photo saved on your disc, but you only open it once a decade, it's fine to give it a longer, more descriptive name like "1998.tortoise_on_red_sea_beach.png" (do tortoises live on the Red Sea?). On the other hand if you have a copy of "the brave soldier Schweik" on the disc, there is no need to name it "the_fateful_adventures_of_the_brave_soldier_schweik_during_the_world_war_by_jaroslav_hashek.pdf": that doesn't even fit on the line! Instead, name it "schweik.pdf" and be done with it.

Also, people coming from the Micro Soft-Disastrous Operating System (MS-DOS) and its successors tend to name files like `foo.txt' rather than just `foo'. On Unix this is usually unnecessary, except for web servers and similar, which can quickly evaluate content-type based on the filename, rather than magic numbers.

Another thing to avoid is creating too many separate files/directories just for the sake of it. In particular, I sometimes see the following:

- People abuse directories like foo.conf.d/ by putting each small $thing into a separate file, without any real logical reason to do so.
- People write programs where every single function is its own file.

I'd avoid symlinks: they are sometimes necessary between files, but directory symlinks are evil. Just imagine how many files have been wrongly `rm -rf'ed since the introduction of symlinks!

PROGRAMMING

This advice mostly relates to C. I am very inconsistent with my sh, awk and perl styles, and I don't have one for other programming languages.
There are already a lot of good stylistic references for C (and probably for other languages, but we'll stick with C here). Good ones in particular are OpenBSD's style(9) (google it on http://man.openbsd.org) and the style employed by K&R in their book "The C Programming Language". A lot of good advice is offered in "The Practice of Programming" by Brian Kernighan and Rob Pike. Do read those for a more complete overview of good style. Here I shall instead go over some techniques that are critically underemployed and some common mistakes.

I find it very helpful to compile with flags `-std=c99 -Wall -Wextra -pedantic -O0 -g'. Of course, this is just for debugging purposes; for "real" compilation you'd probably remove at least the `-O0 -g'.

A common piece of advice is to use linters. I use them *sometimes*. I often ignore them, however, because they tend to misreport more than find actual issues.

Portable programs tend to not only cause much less pain, but to also be better written in general, so try to stay within the mainstream, only using local extensions when necessary. Definitely do not rely on a particular compiler's or standard library's quirk.

In the vast majority of cases code alignment (except indentation) is a huge waste of effort that doesn't even pay off, so I usually avoid that.

When possible I position my function names at the start of their own line like so:

	int
	foo(void)
	...

This makes the code very easily grep(1)pable (grep '^foo(' *.c). And no need for ctags! Generally it is worth writing in a way that'll make moving around simpler.

Side effects should generally be avoided, except for some well-understood idioms like *a++ = *b++.

There are many programs whose portability depends upon preprocessor instructions like #if and #ifdef. It is easily observed that even in small numbers they can bring much confusion. Avoid such constructs when at all possible. Even the debug macros are often not really necessary.
Compare the two:

	#define DEBUG
	#ifdef DEBUG
		printf("this is a debug statement\n");
	#endif /* DEBUG */

and

	enum { DEBUG = 1 };
	if (DEBUG)
		printf("this is a debug statement\n");

If we set DEBUG to 0 in the second case, the compiler will optimise the statement away, but it'll still warn us if we mess up the actual debug code.

Comments are often used in silly ways. In fact some people even have their editors configured to make the comments dimmer because so many of them are useless. Instead, they should have them configured to make comments brighter and/or bolder. Comments are very powerful and should therefore be used with much discretion.

Good comments include the ones that go briefly over functions and global data, and the ones which help the reader to understand a complicated algorithm used in code. They should definitely not state the obvious, or be seen as a compensation for bad code, which should be rewritten instead. Such comments have a tendency of eventually contradicting the code, which is the great danger of comments.

Here are examples of really bad comments that I've actually seen (actual text changed):

	/*******************************************************
	 *                                                     *
	 *                                                     *
	 *               MY HELLO WORLD ROUTINE                *
	 *                                                     *
	 *                                                     *
	 *******************************************************/

	do_stuff() //!!!

	/**** Maybe I should draw more asterisks... ****/

And another thing is to definitely not just leave code commented out, at least not after debugging. Though if you do write a debugging function, leave it there, for you might need it in the future.

There are a lot of reasons to divide programs into multiple files. Do create as many as are needed, especially for portability, and when compilation becomes too long. Do not, however, put things into new files just for the sake of it. Many programmers seem to think that if they put every single function into its own file it'll somehow make things better. In fact it only makes their code a navigational nightmare.
Speaking of multiple files, don't include files in your included files. There is, of course, a "protection" often employed to avoid cycles:

	#ifndef THISFILE_H
	#define THISFILE_H

	#include "a.h"
	#include "b.h"

	/* file contents... */

	#endif /* THISFILE_H */

which does not, however, protect your preprocessor from processing it over and over again. Instead simply comment what files should be included before that one and let the actual includer worry about what to include:

	/*
	 * Description of what the file is for.
	 *
	 * #include "a.h"
	 * #include "b.h"
	 * #include "thisfile.h"
	 */

A thing that I, thankfully, see more rarely, but still do sometimes, is variables getting burdened with as many qualifiers as the programmer managed to come up with. In truth, there should be as *few* qualifiers as possible. In particular many people seem to make variables unsigned for no reason other than that they should never be negative. That not only encourages integer overflow errors, but can make the code much more complicated. Observe, for instance, this piece of code, which calculates the column corresponding to the nth character in some line:

	...
	for (off = 0, col = -1; off <= n; off++)
		if (s[off] == '\t')
			col = ((col+1)/8 + 1) * 8 - 1;
		else if (isprint(s[off]))
			col++;
	...

col here should never end up negative, however it is still useful to assign it to -1 initially to simplify the program, and possibly to indicate an error.

Lastly there's a construct I rarely see talked about:

	if (bad)
		return 0;
	else {
		/* The rest of the function's code here. */
	}

Are such constructs the reason why people insist on two (or one ultrawide) monitors, and writing lines hundreds of columns wide? A much better version is very simple:

	if (bad)
		return 0;
	/* The rest of the function's code here. */
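
The last two points fit together in one small function. This is only a sketch under stated assumptions: the name column(), its arguments and its error convention are made up for illustration, tab stops are taken to be 8 columns wide as in the loop above, and n is assumed to be a valid offset into s. The cast to unsigned char is an addition not present in the loop above; isprint(3) expects a value representable as unsigned char (or EOF).

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical wrapper around the column loop: returns the 0-based
 * column of s[n] with tab stops every 8 columns, or -1 on bad input.
 * The caller must make sure that n is a valid offset into s.
 */
int
column(const char *s, int n)
{
	int off, col;

	/* Bail out early; the rest needs no else and no extra indent. */
	if (s == NULL || n < 0)
		return -1;

	/* A signed col can start at -1, i.e. "before the first column". */
	for (off = 0, col = -1; off <= n; off++)
		if (s[off] == '\t')
			col = ((col+1)/8 + 1) * 8 - 1;
		else if (isprint((unsigned char)s[off]))
			col++;
	return col;
}
```

For the line "a\tb" the tab ends at column 7, so column("a\tb", 2) gives 8. Because col is a plain int, the same -1 serves both as the starting value and as the error return; an unsigned col would need a special case for each.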