The database

Overview

The database has been completely revised. The goals of this redesign were a faster system and a more reliable database. The back end of the database is composed of:

Index
Site files
Start database

The index is a tree structure, allowing fast searches of the strings file. Rebuilding the index is a relatively expensive process, so it is not updated after every new key is inserted into the database. To handle a search request, the system therefore checks the fast index and also makes a linear search of the (usually small) un-indexed portion of the strings file. To ensure quick response times, the index should periodically be brought up to date with the strings file.
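The two-part lookup described above can be illustrated with a small sketch. This is not the actual implementation: a sorted Python list stands in for the tree index, a plain list stands in for the un-indexed tail of the strings file, and the class and method names are hypothetical.

```python
from bisect import bisect_left


class HybridIndex:
    """Toy model of an index plus an un-indexed tail (names are hypothetical)."""

    def __init__(self):
        self.indexed = []    # sorted keys; stands in for the fast tree index
        self.unindexed = []  # keys inserted since the last rebuild

    def insert(self, key):
        # New keys are not indexed immediately, because rebuilding is expensive.
        self.unindexed.append(key)

    def contains(self, key):
        # Fast path: binary search in the indexed portion.
        i = bisect_left(self.indexed, key)
        if i < len(self.indexed) and self.indexed[i] == key:
            return True
        # Slow path: linear scan of the (usually small) un-indexed portion.
        return key in self.unindexed

    def rebuild(self):
        # Fold the un-indexed keys into the index and empty the tail.
        self.indexed = sorted(set(self.indexed) | set(self.unindexed))
        self.unindexed = []
```

The longer the tail grows, the more each search costs in the linear scan, which is why a periodic rebuild keeps response times low.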

Within a catalog, files having the prefix , are part of the index. For example, is the strings file, and is the fast index. The index is created differently depending on the database: the anonftp database creates an index file for all substrings, whereas the webindex database creates an index file that contains only the left substrings.
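The difference between the two kinds of index can be made concrete with a short sketch. It assumes that "left substrings" means substrings anchored at the start of the key (prefixes); that reading, and the function names, are assumptions rather than something the text above states.

```python
def all_substrings(key):
    """Every contiguous substring of key (the anonftp style of index)."""
    return {key[i:j] for i in range(len(key)) for j in range(i + 1, len(key) + 1)}


def left_substrings(key):
    """Only substrings that start at the first character (assumed webindex style)."""
    return {key[:j] for j in range(1, len(key) + 1)}
```

Under this reading, an index over all substrings can answer a query that matches anywhere inside a key, while a left-substring index is smaller but only answers queries that match from the start of a key.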

The site files are now independent of one another. In case of corruption, only the site that is corrupted need be considered. In certain cases, such as webindex, extra files may be present with each site file. The files contain excerpts of the different URLs. The files, present in both anonftp and webindex, hold extra information for large site files, in order to speed up searches.

The directory , present in each catalog, holds information that ties the index with the site files.

Building the index

In order to rebuild the index, the program must be run.

The build may take some time to run, depending on the size of and the amount of memory allocated to the build. One would call as follows:

The above example builds the anonftp index using 50 megabytes of memory. Temporary storage is used in the directory. The man page lists all the other switches that can be used. One option of particular importance is “”, which forces the index to be rebuilt, even if the un-indexed portion of the strings file is very small. By default, the program will not rebuild the index if the un-indexed text is less than 1 megabyte.
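The rebuild rule at the end of the previous paragraph amounts to a simple threshold test, sketched below. The function and constant names are hypothetical, and the sketch assumes that 1 megabyte means 1,000,000 bytes; the build program may count it differently.

```python
# Assumption: "1 megabyte" is taken as 10**6 bytes for illustration only.
REBUILD_THRESHOLD_BYTES = 1_000_000


def should_rebuild(unindexed_bytes, force=False, threshold=REBUILD_THRESHOLD_BYTES):
    """Return True when the index would be rebuilt under the rule described above."""
    # A forced build always proceeds; otherwise the un-indexed text
    # must have reached the threshold.
    return force or unindexed_bytes >= threshold
```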

Ordering the results

With the volume of information in the database and the amount of replication on the network, it is often desirable to return the results in some order of closeness. It is now possible to configure the order in which results are returned, according to the name of the domain to which the result site belongs. To avoid slowing down searches, this ordering is done at the time at which the data site is inserted into the database, rather than when the results are being returned.

The order is defined in the file . The following is an example of such a file.

Domains on the same line have the same precedence. Hence, sites and sites will be returned first, in no particular relative order. The sites in the domains and will be listed later. To discourage users from making unnecessary, long distance file transfers, the sites from New Zealand and Australia are returned last. (We assume that users accessing a North American archie server are usually in North America.) The “*” represents all other sites.
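The precedence rule can be sketched as follows. This is only an illustration: the list of groups below is a hypothetical stand-in for the ordering file (only the placement of the New Zealand and Australia domains last follows the description above), and the function name is an assumption. Each inner list corresponds to one line of the file, so its domains share the same rank.

```python
def domain_rank(hostname, groups):
    """Return the rank of the first group whose domain is a suffix of hostname."""
    host = hostname.lower().rstrip(".").split(".")
    fallback = len(groups)  # used only if no "*" group is present
    for rank, group in enumerate(groups):
        for domain in group:
            if domain == "*":
                fallback = rank  # remember where "all other sites" belong
                continue
            suffix = domain.lower().split(".")
            if host[-len(suffix):] == suffix:
                return rank
    return fallback


# Hypothetical ordering: one inner list per line of the ordering file.
ORDER = [["ca", "us"], ["edu", "com"], ["*"], ["nz", "au"]]

sites = ["ftp.example.au", "ftp.example.de", "ftp.example.edu", "ftp.example.ca"]
ordered = sorted(sites, key=lambda h: domain_rank(h, ORDER))
```

The sketch sorts a finished list for brevity. In the scheme described above, the rank would instead be computed once, when the site is inserted into the database, so that no sorting is needed while results are returned.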

One can also specify the domains using pseudo-domains defined in as well as sub-domains such as .

The other programs

All programs prefixed by in aid in managing the database. They are:

Program	Description
db_build	Build the database index
db_check	Verify the consistency of the database
db_dump	Dump the list of sites that are in
db_reorder	Reorder the sites according to
db_siteidx	Build the files for specific sites
db_stats	Compute statistics
fix_start_db	Fix problems in the start_db database

Information about the command line options of these programs may be found in their respective man pages.