My first C program: simple process supervisor like daemontools/runit in 359 LoC

lxfontes · on May 25, 2012

Good job !! Consistent coding convention :)

As a constructive note, think about reversing your conditions (check if something NOT happened) and break huge chunks of code into functions. This will guarantee a 'sane' amount of nested 'ifs' and improve code readability. Ex:

  for (int i = dn - 1; i >= 0; i--) {
    if (!has_child(dlist[i]->d_name)) {
      if ((fp = fopen(path, "r")) != NULL) {
	  .............
      } else {
        slog(LOG_ERR, "Can't read %s: %m", path);
      }
    }
  }

Notice how the error message is 20 lines far from the actual condition. You can rework that as:

      if ((fp = fopen(path, "r")) != NULL) {
	  do_some_work_with(fp);
      } else {
        slog(LOG_ERR, "Can't read %s: %m", path);
      }

or

      if (!(fp = fopen(path, "r")) {
        slog(LOG_ERR, "Can't read %s: %m", path);
        continue;
      }
      //removed 1 nesting level
      ........

fallenhitokiri · on May 25, 2012

I, as most developers, appreciate well commented code. But as others have mentioned you are going a bit too far.

  // The user has given and illegal number or type of arguments. The program
  // usage is printed to the standard error stream and we exit abnormally.
  fprintf(stderr, USAGE);
  exit(EX_USAGE);

What else is it supposed to be? This is, even for me who does not use C(++) on a regular basis, obvious.

It looks like you are interested in writing well documented code. I think a good example you can learn a lot from is Redis[1]. Also written in C.

[1] https://github.com/antirez/redis

//edit: fixed typo

codebeaker · on May 25, 2012

I adore that you used literate style documentation for your source, as someone who is also learning to love programming efficient tools in C I very much appreciate that.

I question the need for yet-another-process-monitor, but I acknowledge that upstart, monit, etc are by and large overcomplicated and messy. They provide more features, of course, but they violate my interpretation of the unix philosophy.

I'm sure many people on HN will condemn you for writing this but, I for one am very excited to try it.

I had a similar idea of a daemon that would allow inter-process signaling for users to signal processes which shared their GID, to simplify web application deployment without relying on sudo privilege restrictions.

You may have just inspired me to take your lead, and just do it - a few hundred lines of C, and smart documentation have a lot of value as learning, experience, proof-of-concept and a discussion point; and that in itself is reason enough for this to exist!

0xbadcafebee · on May 25, 2012

I can't really comment on code since i'm no master of C programming myself, but there are really way too many comments. It actually makes it harder to read code if there are comments on every new line, and harder still if you have multiple lines of comments to explain one line of code.

My recommendation is to have one comment to explain a large chunk of code and let the programmer discern what part is doing what instead of annotating it piece by piece.

m_for_monkey · on May 25, 2012

It is meant as a literate program: http://uggedal.github.com/going/going.c.html. (Scroll down on this page!)

0xbadcafebee · on May 25, 2012

I saw that, and I still think it's quite excessive. Perl modules often come with POD documentation in-line with code, but it's usually just to document something like a function or method with maybe an example or two. (And yes, POD is not literate programming, but it's more like the comments you'd see in a normal program source code)

I suppose this is up to the individual, but I don't like literate programming partly because it interferes with a dynamic that coders around the world have. Most people include comments only when it's necessary. It signifies something important to take note of and clarifies unusual behavior. Literate programming partly strips that away without giving you a really long-term benefit... It seems useful to me mostly for the initial implementation and ends up making maintenance a chore.

If I can modify my original comment a bit for literate programming: Make a comment at the beginning of your function with a bullet list of how the function will work, step by step, and then just write the code. Easier to read and you still have your [mostly-]literate program.

tptacek · on May 25, 2012

If you keep commenting like this, the metadiscussion about comments might become competitively long.

I liked the comments, not because I needed them to help understand what the program is doing (this is a bog standard C program that I think most Unix developers have written several times over) but as document of someone learning C, and as something I can show other people who want to learn C.

azelfrath · on May 25, 2012

I'm a sub-par C programmer and the comments definitely helped me out. Instead of assuming the reader knows what each function does, they are explained in enough detail that the average reader could at least Google for more info based on keywords.

Camillo · on May 25, 2012

It's actually a pretty nice example of the problems with overcommenting. For example, redundancy: "atexit(cleanup_children)" is perfectly self-explanatory, yet he repeats the same thing with the circumlocution "We setup our cleanup function as an exit handler which will be called at normal process termination." That's just a waste of both the writer and the reader's time.

And of course, redundancy spawns inconsistency: the code says "spawn_unquarantined_children()", but the comment says "All quarantined (a newly initialized child structure is quarantined by default) children is spawned for the first time." Which is it, quarantined or unquarantined? And the comment breaks number agreement, too.

I find it remarkably illustrative that comment rot has already set in into version 1.0 of a "359 LoC" program (actually 797, according to github).

tptacek · on May 25, 2012

I give him kudos for using atexit() at all; it's sorely underused.

lshevtsov · on May 25, 2012

This. Having a paragraph of explanation text for each line is not a replacement for source code that is readable by itself.

I understand, though, that explaining each and every line of code was done for the sake of learning C.

tptacek · on May 25, 2012

This is very readable C code, as C code goes. The documentation isn't there as a substitute for that.

sp332 · on May 25, 2012

Agreed. Even literate programming (where the comments often outweigh the code) doesn't stop to explain #include <stdlib.h> !

Edit: ok I forgot this was his first C program.

tptacek · on May 25, 2012

No, but you can see from that page that if you wanted to write a simple C program as a document on how C works that the format works nicely for it.

It's his first C program. Mine were similarly prolix. Another habit I got out of after a year or so was wrapping every use of every new API in a me-friendly wrapper function. Also, of writing my own linked list code over and over again.

advisedwang · on May 25, 2012

Literate programming tools like CWEB often have a tool (in CWEB called CTANGLE) that extracts just the machine-readable code. Perhaps there is some tool you can use to decomment this code (maybe even just grep :D)

antirez · on May 25, 2012

congrats for your first C project! A few reamarks:

1) // style commands are C99 only. If you don't have other C99 dependencies it's somewhat a shame it could not compile in older systems just for this.

2) inline should rarely be used if not into performance critical code.

3) two space indent is not C-ish ;)

4) While 'bool' is valid C99 code, it is more or less never used in C code.

rcfox · on May 25, 2012

> While 'bool' is valid C99 code, it is more or less never used in C code.

I disagree. It's used quite a bit in the code I work on. Little things like an explicit statement that this variable is only true or false significantly reduce the cognitive load when reading code.

Anyway, even if it were true that nobody uses bool, there's absolutely no reason to discourage its use.

zvrba · on May 25, 2012

> Little things like an explicit statement that this variable is only true or false significantly reduce the cognitive load when reading code.

Since it's allowed to use bool in arithmetic context (some_int + some_bool will compile fine, for example), having a bool type is useless. (Useless in C and C++ that is; Java, for example, has sane bool handling.)

I have no idea why C99 introduced _Bool type when they could have just inserted typedef char bool; in stdbool.h. Has anybody ever used for something useful the fact that memory representation of _Bool is constrained to read as 0 or 1?

fusiongyro · on May 25, 2012

1. It's semantically valuable, or you wouldn't see it defined in everyone's code

2. C lets you commit all kinds of atrocities in arithmetic contexts with or without it.

3. Having a standard typedef saves us from all having to make our own

4. None of your points support the case of "don't use stdbool"

zvrba · on May 25, 2012

> 4. None of your points support the case of "don't use stdbool"

Huh? I was saying that the language did not have to be extended to get a bool typedef. C99 defines a new type, _Bool (name from the reserved identifier space), which is different from existing types. stdbool is a simple typedef of _Bool to bool.

Also, introducing bool would have had meaning if implicit conversion to int wasn't allowed. But since it is, it's a useless type.

My viewpoint is that types exist for enforcing semantic constraints; what use of a bool type that allows performing semantically nonsense operation, such as implicit conversion to int in arithmetic context? The C99 committee messed it up big with bool.

fusiongyro · on May 25, 2012

Adding it to the language saves us all the work of typedefing it. Which is a very common practice.

C's approach to typing is, in general, pretty weak. Mostly they're there so the compiler can decide how much space to set aside. There's no point singling out this particular instance of nonsense when everywhere you turn in C you're greeted by other forms of the same nonsense. There's no point being pedantic about bool if you're not going to be pedantic about all the other types.

Enough bikeshedding. Use it or don't.

zvrba · on May 26, 2012

> There's no point singling out this particular instance of nonsense when everywhere you turn in C you're greeted by other forms of the same nonsense.

By your reasoning, since nonsense already exists, it's pointless to try to do the right thing. And they could have done it.

alexchamberlain · on May 25, 2012

Presumably some optimiser could use this fact to pack several bools into the same register?

VMG · on May 25, 2012

it still is a useful annotation

ajross · on May 25, 2012

I'm aware of no "older systems" still in reasonable use that don't support the // comment convention. That got added to pretty much every C toolchain as soon as anyone started shipping a C++ preprocessor. I believe it is disabled by the -ansi switch to gcc though; use -std=c99 instead if you care about that stuff (or better: don't use -ansi, it doesn't actually make your code more portable, it just whines a lot).

maybird · on May 25, 2012

(3) You bring up an interesting issue. Why is there so much variance out there? Google uses 2 spaces, the linux kernel uses 8, and most people I know use 4.

Are people using such a wide range of font sizes, and font aspect ratios, that 4 spaces doesn't make sense for everyone?

udp · on May 25, 2012

I used 3 on my last open source C project, just to troll every faction of the indent war. I actually think it looks pretty nice.

https://raw.github.com/udp/json-parser/master/json.c

dschulz · on May 25, 2012

indent -kr json.c

problem solved ;-)

graue · on May 25, 2012

When I was writing a lot of C, I settled on 8-column tabs (a style I copied from OpenBSD). Combined with an 80-column wide terminal, it prevents you from nesting too far down — 4 or 5 levels and you keep having to wrap, which is an automatic sign that "something is wrong, I need to refactor this."

tptacek · on May 25, 2012

I was an 8spcr (default tabstop, tab-to-indent, learned C in pico). I've gone a year or two writing nothing but Ruby (where 2spc is the rule) and recently had to build a small C program, and oh my god is 8spc terrible.

But I'm also a single-return fetishist, so I tend to nest deeper than normal.

jonathansizz · on May 25, 2012

It's funny how so many coders take offence at such trivial things. I always get annoyed by cuddled elses, although I don't suppose I could give a rational explanation for this!

pivo · on May 25, 2012

I also feel annoyed by cuddled elses. My reason (rationalization?) is that because the else clause is visually further indented it seems to suggest that the clause is somehow less important, and it's also harder to see the structure of the code when the if and else don't align.

basugasubaku · on May 25, 2012

Your get_tail_child always returns NULL

  for (tail_ch = head_ch; tail_ch; tail_ch = tail_ch->next);

  return tail_ch;

uggedal · on May 26, 2012

That was embarrasing. Should be fixed now: https://github.com/uggedal/going/commit/15a83397076ffbebe585...

jws · on May 25, 2012

Well spotted!

Going to want to add a test case with more than one child at a time. I think as is it loses all but the most recent child.

jws · on May 25, 2012

Congratulations! You have written completely non-terrifying C code. This would not inspire any sense of dread in me if I had to go in and look for a problem, so I will.

Comments – I see you have alarmed the HN crowd, but I was reading the annotated page and it makes sense in that context. As a data point, I only looked at the lefthand column paragraphs about twice, related to the quarantine handling in the forever loop. There is an odd effect if children are dying faster than the quarantine period, but I convinced myself that there was a finite number of children and they would eventually get quarantined if they kept that up allowing the period to finally expire. I looked at the titles though.

Design – I'd lose the fixed size string buffers in the child_t. 256 sounds like a lot for a command until someone uses a wildcard in a directory. strdup() and free() are your friends here. I see you have a centralized free function for the child_t and are using valgrind. I trust you will not leak or shoot off a body part.

Portability – You are using "%m" to get the strerror(errno) into your log messages. This is a GNU libc extension. I don't have a pure BSD machine around, but it doesn't work on OS X. This will likely also crop up for very small Linux systems that use alternative C libraries for space reasons, such as OpenWRT.

strncmp() and the strnXXX() functions in general – These are not really what you want. Think of them as functions for fixed length string fields with optional NUL terminators. Then never use them.

Consider…

  strncmp("foo", "foobar", 3)

It returns 0, that means equal. That is seldom what you actually intended. In your case, strcmp() is fine because you are working with guaranteed NUL terminated strings.

I see a strnlen()…

  inline bool str_not_empty(char *str) {  
    return strnlen(str, 1) == 1;
  }

Most C programmers would rather see…

  inline bool str_not_empty(char *str) {  
    return *str != 0;
  }

… but they'd also leave off the "inline" and use "int" instead of bool. "inline" is the compiler's problem, let it take care of that. The part you might do for performance is to mark the argument as "const" since you aren't going to change it. That gives the optimizer more freedom around the calls, though in this case the optimizer will probably inline it and it won't matter. "const" is also a nice hint to people reading your code.

I suppose summing up strings, "char *" is not an opaque type to C programmers.

Signals and file descriptors across forks – Signals are generally a bit tricky, and what to do with open file descriptors across a fork is also fussy. I'll leave that to you. (Frequently in cases like this, closing all file descriptors is the right answer. A shame there isn't a portable way to do that. Remember you might have something open from the syslog functions.)

alexchamberlain · on May 25, 2012

* Using strncmp is more secure than strcmp.

* What's the harm in `inline`? I would encourage its use, rather than discourage?

jws · on May 25, 2012

Using strncmp is an error. It returns erroneous results if you are using NUL terminated strings and one or both strings is too long.

At one time (shortly after the Morris worm) people flocked to using strncpy() for security. That was also an error. strncpy() will happily leave you with a missing NUL terminator, plus it has sad performance implications since it fills the whole destination with NUL, so if you make a "plenty big" buffer you do a lot of useless memory writing.

To reiterate: The strnXXX functions are not for NUL terminated strings. They are for fixed size buffers of characters padded with optional NUL characters.

As to "inline*. The compiler knows better and will probably ignore it anyway. I think the place where it is wanted is if you are declaring inline functions in header files. This prevents the duplicate function errors at link time that result if included in more than one object file. The standards and compilers are a mess on precisely how to do this.

xyzzyz · on May 25, 2012

… but they'd also leave off the "inline" and use "int" instead of bool.

Why use int, when there's bool in C99 that clearly communicates the intent?

zvrba · on May 25, 2012

Because 1) bool is a useless type in C and C++ since it's an arithmetic type, 2) it's no clearer than an int. (consider a function that returns true on failure, logic being that the rest of the program should be aborted.)

You expect that "true" is success, but you can't know without checking, and then you're back in the same situation as with int.

In practice, idiomatically, you write something like if(!some_function(blah)), and in this context it totally doesn't matter whether the function returns bool or int.

jws · on May 25, 2012

I like the human communication aspect of "bool", but dislike the compiler language dependency.

The C cultural expectation is that anything named like a predicate is going to return an int for true or false. In another language with a different culture I would use a Boolean or whatever it had.

xyzzyz · on May 27, 2012

Every modern C compiler (not counting Microsoft one with terribly lame support for C99) supports bool, so there's no compiler dependency.

utf8guy · on May 25, 2012

Is this really you're first C program. That's very, very impressive! My first C program was something like:

    main() {
      printf("Hello, world\n");
    }

travelstacker · on May 25, 2012

Gasp! I'm sure your first program is meant to read:

#include <stdio.h>

main() { printf("hello, world\n"); }

;)