
Pmnetd 2.0
Various random notes.
Greg Cronau
gregc@pm-tech.com
October 8th, 1999

This re-write began because of a desire to use hylafax, running on a BSDI
3.1 system, with a rack of modems attached to a Portmaster 2e. We first
tried using pmnetd version 1.5.

It didn't work.

It actually was able to talk to the modems, but the fax transmissions
themselves were full of errors, dropped scan lines, and modem timeouts.

On the other hand, this new version of pmnetd has been able to consistently
send clean faxes to 8 modems simultaneously.

While this version was re-written with the express purpose of getting
hylafax working, I tried to make all changes from a generic viewpoint.
Testing was done with programs other than hylafax. And except for one
very small bit of hylafax-specific voodoo in the pipeloop() function,
there is nothing in the current code that is specific to hylafax.

And even that small bit of code in pipeloop() should not hurt other uses
of this daemon, and removal of that code will not hurt hylafax either.
The voodoo is mostly a performance improvement. It prevents an annoying,
but non-fatal, timeout of about 10 seconds at the beginning of each fax
transmission. That code has further comments in the source.

I fully expect this version to work fine for all applications. However,
there are 2 issues, both noted on the portmaster users' mailing list,
that I haven't been able to address:

1.) Sending breaks through this daemon:
    As near as I can tell, this can't be done. The pty specification has
    no mechanism for passing this kind of out-of-band information across
    the pty/tty interface. I've seen reports that there was a version of
    the older in.pmd for SunOS that made this work, but I can't see how
    it was done without a kernel patch.

2.) I've also seen a report (and a patch from Livingston) to fix a problem
    in which an offline printer on the portmaster propagates back to
    pmnetd and causes it to spin and suck up 100% of the CPU. I'm not
    sure of what exactly is happening here. I don't have a serial printer
    to test this problem. And frankly, I can't see the mechanism that
    causes this to happen. Hopefully, the redesign has eliminated the
    source of this problem. But if it still exists, I'd like to hear
    about it.

Please feel free to mail any suggestions, problems, bug-fixes, enhancements,
or large wads of cash to the above email address.


-----------------------------------------------------------------------------
Building and installing:

Compiling is fairly simple. It's a one-line command and is documented in
the source code. I saw no reason to create a makefile.

I install my binary in /usr/local/sbin and the pm-devices file in
/usr/local/etc. In rc.local I put the command:

"/usr/local/sbin/pmnetd -l"


-----------------------------------------------------------------------------
Changes from Livingston (Lucent) version 1.5:

1.) The biggest problem with the original code is that it was not 8-bit
    clean. In fact, I'd call it 7-bit dirty... :-) The old code used
    strlen()'s to check the size of its internal buffers, and just threw
    away any NULs it found in the data. Of course, this completely screwed
    up any binary transfers!

    Code was changed to keep a count of characters in each buffer, and to
    allow for proper filling of a buffer that still contains data, and
    for write()s that only write a part of the data.

2.) Old code didn't use non-blocking I/O. If pmnetd was only driving 1 line
    at a time, this wasn't a problem. Anything more and it fell apart.
    If the OS signaled, via the select() call, that an fd was ready for
    writing, pmnetd might try to write 1000 chars, but the OS might only
    be ready for 500. The write() would block at this point. This would
    bring all multiplexing to a halt and everything would hang until that
    write() finally completed. This probably wouldn't cause much of a
    problem with printers, or interactive sessions, but fax modems appear
    to operate like a streaming tape drive. Unless you keep the data stream
    pumped, they will fail with a "DTE to DCE underrun" error.

    Code was fixed to use proper non-blocking I/O on all file and socket
    descriptors.

    (Note: I still use blocking I/O during the portmaster secure login
     phase. I will probably change this in the future.)

3.) Buffer sizes were increased to allow better throughput.

4.) A number of improperly implemented malloc()'s were replaced by static
    arrays.

5.) The main pipeloop() function was pretty thoroughly overhauled. It had
    the look of something that grew in fits and starts and was tweaked to
    fix problems as they were found. There was very little orthogonality
    to something that should have been very orthogonal. It may have had
    several different authors.

6.) Buffer handling was re-thought so that every effort is made to deliver
    all characters to their destination. Buffers are only flushed when an
    error occurs or the destination closes before the characters can be
    delivered.

7.) Socket and pty opening and closing operations were rethought and
    hopefully they're done in a more logical manner now.

8.) Added a new debug switch "-d" for additional debugging info. This is
    info above and beyond the normal info sent to stdout or the logfile.

9.) Logfile switch was changed so that if just "-l" is used, info is sent
    to the default logfile. Currently that is "/var/log/pmnetd.log".

10.) Default devices file is now "/usr/local/etc/pm-devices".

11.) SIGHUP handling was changed. The first reception of SIGHUP simply
     closes and re-opens the logfile. This is useful so that nightly/weekly
     logfile rolling programs can roll the logfile without affecting any
     current users. However, if a second SIGHUP is received within 60
     seconds of the first SIGHUP, then pmnetd will treat that as a restart.
     All connections will be closed, the devices file will be re-read, and
     all data structures will be re-initialized.
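
To make changes 1.) and 2.) above a little more concrete, here is a rough
sketch of a counted buffer drained through a non-blocking descriptor. This
is NOT the actual pmnetd code. The names cbuf, cbuf_fill(), cbuf_drain()
and set_nonblock(), and the BUFSZ value, are made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BUFSZ 8192                  /* hypothetical size, not pmnetd's */

struct cbuf {                       /* hypothetical name */
    unsigned char data[BUFSZ];
    size_t        len;              /* bytes held; never derived from strlen() */
};

/* Put fd into non-blocking mode. Returns 0 for success, -1 for failure. */
static int set_nonblock(int fd)
{
    int fl = fcntl(fd, F_GETFL, 0);

    if (fl == -1)
        return -1;
    return fcntl(fd, F_SETFL, fl | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Append up to n bytes (NULs included) behind any data already queued.
 * Returns the number of bytes actually accepted. */
static size_t cbuf_fill(struct cbuf *b, const void *src, size_t n)
{
    size_t room = BUFSZ - b->len;

    if (n > room)
        n = room;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return n;
}

/* Write as much as fd will take right now and keep the rest queued.
 * Returns bytes written (possibly 0), or -1 on a real error. */
static ssize_t cbuf_drain(struct cbuf *b, int fd)
{
    ssize_t w;

    if (b->len == 0)
        return 0;
    w = write(fd, b->data, b->len);
    if (w == -1) {
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            return 0;               /* not ready after all; try again later */
        return -1;
    }
    /* Partial write: slide the unwritten tail to the front of the buffer. */
    memmove(b->data, b->data + w, b->len - (size_t)w);
    b->len -= (size_t)w;
    return w;
}
```

In a select() loop built this way, a descriptor would only be put in the
write set while its buffer has len > 0, and a short write simply leaves
the remainder for the next pass instead of stalling every other line.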

The original code had enough problems that I had no confidence in
the correctness of any of it. I decided to go over every function
line-by-line and rethink the operation of each one. This resulted in
the following changes:

12.) A number of functions were eliminated or combined with other functions.

13.) A fair number of new functions were created to combine redundant code,
     and to provide better structure to the system.

14.) All failed system calls are now trapped and produce some kind of logfile
     message. This should produce a better trace of failures and should make
     debugging easier in the event of problems.

15.) As much as possible, functions were changed to produce consistent
     return values. Unless otherwise noted, functions that return pointers
     return NULL for error, and functions that return int return -1 for
     failure and 0 for success.

16.) A number of bugs and inconsistencies were discovered and fixed.

17.) A lot of sloppy code was cleaned up and reduced to something that
     I think is a bit more compact and understandable. The code wasn't
     necessarily wrong, but it was done in a painful manner. Compare
     the function ResolveDNS() from version 1.5 with my new version.
     (In particular, note the *creative* use of malloc() and free() at
      the beginning of the old function. :-))

18.) I also reformatted the code into a format I'm used to working in.
     For the most part, it's fairly normal, with a few eccentricities
     on my part. I use a tab setting of 4. If you work on or view this
     code, I highly suggest you use that tab setting.
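
As an illustration of the conventions in 14.) and 15.) above, the pattern
looks something like the sketch below. None of these names come from the
pmnetd source: log_syserr(), open_log(), find_dev(), the dev struct and
the table contents are all hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

static FILE *logfp = NULL;          /* hypothetical; NULL means use stderr */

/* Log a failed system call along with the errno text.
 * Always returns -1 so callers can write "return log_syserr(...)". */
static int log_syserr(const char *where, const char *call)
{
    int saved = errno;

    fprintf(logfp ? logfp : stderr, "%s: %s failed: %s\n",
            where, call, strerror(saved));
    errno = saved;
    return -1;
}

/* int-returning function: 0 for success, -1 for failure. */
static int open_log(const char *path)
{
    FILE *fp = fopen(path, "a");

    if (fp == NULL)
        return log_syserr("open_log", "fopen");
    if (logfp != NULL)
        fclose(logfp);
    logfp = fp;
    return 0;
}

struct dev {                        /* hypothetical device entry */
    const char *name;
    int         port;
};

static struct dev devtab[] = {      /* made-up entries, static array */
    { "dev0", 6000 },
    { "dev1", 6001 },
};

/* Pointer-returning function: NULL for error (here, "not found"). */
static struct dev *find_dev(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof devtab / sizeof devtab[0]; i++)
        if (strcmp(devtab[i].name, name) == 0)
            return &devtab[i];
    return NULL;
}
```

With every function following the same rule, a caller can test for
"== -1" or "== NULL" without having to look up each function's quirks.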


-----------------------------------------------------------------------------
Implementation notes:
  (More here later when I get time.)

-----------------------------------------------------------------------------
Platform problems:
  (More here later when I get time.)

