The Archie system is designed to maintain several information catalogs of various types. It was originally conceived, however, to maintain a catalog of files available by anonymous FTP, and this is still the application for which it is most popular today.
Now that the arserver, arexchange, and arretrieve programs have been configured (according to the instructions in “Configuring the Basic System”), you can set up the system to maintain an anonymous FTP catalog.
In general, to set up the anonymous FTP catalog, anonftp, follow these steps:
Make sure the configuration files listed in the previous sections have been modified to reflect your system and information retrieval needs. In particular, make sure that the domains database has been configured (see ardomains).
Load the Data Host information into the Host Information Files. Use the host_manage program to add, delete and modify individual host entries. This is explained below.
You need only enter into the Host Databases those Data Hosts for which you plan to be directly responsible. You need not (and should not) enter other sites. Once you start participating in the global inter-Archie data exchanges, the Data Hosts for which you are not responsible will be entered into your database automatically. Similarly, you need not delete sites for which you are not responsible. The exchange subsystem will propagate this information from the “master” Archie system responsible for each site.
Archie can now retrieve pre-generated ls-lR.gz files from anonymous FTP sites. To activate this, you will need to set up the file in the following way.
Where is where the program is located on your system.
You also need to fix the file by replacing the line
by
Hence when using the option in retrieve mode
Archie will first try to locate the ls-lR.gz file. If it cannot, it will look for ls-lR.Z and then ls-lR, in that order, and as a last resort will dynamically create a new listing.
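The lookup order described above can be illustrated with a short sketch. This is not Archie's own code; the function name pick_listing and the idea of passing in a list of file names seen on the remote site are assumptions made purely for illustration.

```python
from typing import Iterable, Optional

# Order described in the text: compressed listings first, then the plain one.
PREFERRED_LISTINGS = ("ls-lR.gz", "ls-lR.Z", "ls-lR")


def pick_listing(available: Iterable[str]) -> Optional[str]:
    """Return the first preferred listing name present in `available`.

    None means no pre-generated listing was found, which corresponds to
    the last-resort case of generating the listing dynamically.
    (Hypothetical helper, not part of the Archie distribution.)
    """
    names = set(available)
    for candidate in PREFERRED_LISTINGS:
        if candidate in names:
            return candidate
    return None
```

For example, a site offering both ls-lR and ls-lR.Z would yield ls-lR.Z, while a site offering neither would yield None, meaning the listing has to be generated.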
When the parsing phase of the anonftp catalog fails on a particular data host, the temporary parse file (with the parse_t suffix) is not removed from the holding directory (). In addition, the filtered file is renamed with the suffix .filtered so that the system administrator can see both the unfiltered and filtered versions. The system administrator may, if desired, manually fix the input data.
The system provides the administrator with the approximate location of the parsing error and displays the line that caused the problem. This error can be viewed with the host_manage program after the update phase of the cycle has completed. Alternatively, the Archie log file contains a more detailed explanation of the error. However, as illustrated in the example in Figure 5, the parse_t file is not the one actually parsed, since the filter program runs on the input first. As a result, the error line reported is taken from the output of the filter.
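When investigating a failed parse, it may help to list the leftover files in the holding directory. The following is a minimal sketch, assuming that the leftover file names simply end in parse_t and .filtered as described above; the function name find_parse_leftovers is hypothetical, and the holding directory path must be supplied by the administrator.

```python
from pathlib import Path
from typing import Dict, List


def find_parse_leftovers(holding_dir: str) -> Dict[str, List[str]]:
    """List files left behind by a failed parse, grouped by suffix.

    Assumes (not confirmed by the distribution) that leftover names end
    in "parse_t" (unfiltered input) or ".filtered" (filter output).
    """
    leftovers: Dict[str, List[str]] = {"parse_t": [], "filtered": []}
    for entry in sorted(Path(holding_dir).iterdir()):
        if not entry.is_file():
            continue
        if entry.name.endswith(".filtered"):
            leftovers["filtered"].append(entry.name)
        elif entry.name.endswith("parse_t"):
            leftovers["parse_t"].append(entry.name)
    return leftovers
```

Because the reported error line refers to the filter output, the .filtered file is usually the one to inspect first.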
By default, the distribution is configured to use perl scripts for the filter_anonftp_unix_bsd filter (which is a soft link to the file ). The perl interpreter is available on many anonymous FTP archive sites. If you do not have perl installed at your site, you can change this soft link to point instead to the file , which is an alternative filter based on the standard UNIX sed(1) program. This second filter is less efficient than the perl filter, so we recommend that you install perl and use it in preference.
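Changing the soft link can be done by hand with ln(1); the sketch below shows one way to do it programmatically without leaving a moment where the link is missing. The function name repoint_symlink and the temporary ".tmp" name are assumptions for illustration, and the actual link and target paths depend on your installation.

```python
import os


def repoint_symlink(link_path: str, new_target: str) -> None:
    """Make the soft link at `link_path` point to `new_target`.

    A temporary link is created first and then renamed over the old one,
    so the link name always resolves to a filter. (Hypothetical helper;
    the ".tmp" suffix is an arbitrary choice.)
    """
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)  # clear a stale temporary link from an earlier run
    os.symlink(new_target, tmp_path)
    os.replace(tmp_path, link_path)  # renames the link itself, not its target
```

The same function can be used to switch back to the perl filter once perl has been installed.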
To ensure the system is properly set up, you can test the following programs by running them from the command line. The results will be written to stdout. Recall that, in normal operation, each of these steps would be run by the cron(8) daemon at predetermined times (see “Configuration”). Almost all programs in the Archie system accept a -v (verbose) command line option, and you may want to invoke the programs with this flag when testing the system.
Load some information into the Host Databases using the host_manage program.
Run arretrieve. This will contact the local arserver and request a set of header files, forming the initial data for the Update Cycle.
Run arcontrol in Data Acquisition mode (the -r command line switch). This will read the header files and connect to the Data Hosts listed in them. It will then perform the required action to obtain the recursive listing from each site.
Run arcontrol in Parse mode (with the -p command line switch). This will clean up and parse the data into the form required for insertion into the catalog.
Finally, run arcontrol in Update mode (the -u switch). This inserts the new data into the anonymous FTP (anonftp) catalog and modifies the Host Information Files.
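The retrieve, acquire, parse and update sequence above can be sketched as a small driver script. This is only an illustration under several assumptions: that arretrieve and arcontrol are on the PATH, that they need no arguments beyond the switches named above, and that both accept -v. The function names cycle_commands and run_cycle are hypothetical. By default the script only prints the commands (a dry run).

```python
import subprocess
from typing import List


def cycle_commands(verbose: bool = True) -> List[List[str]]:
    """Build the command lines for one manual test of the Update Cycle."""
    steps = [
        ["arretrieve"],       # request header files from the local arserver
        ["arcontrol", "-r"],  # Data Acquisition mode
        ["arcontrol", "-p"],  # Parse mode
        ["arcontrol", "-u"],  # Update mode
    ]
    if verbose:
        # Assumption: both programs accept the -v (verbose) option.
        steps = [cmd + ["-v"] for cmd in steps]
    return steps


def run_cycle(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the cycle at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_cycle(cycle_commands())
```

Stopping at the first failure matters here because each step consumes the output of the previous one.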
If you have started the Archie/Prospero server (dirsrv) you can then use any standard Archie client to query the database.
If you have a WWW server, you can use the cgi-client program to query the database.