The main work done during March was on the prototype indexers, following the master-slave interaction model described in the February report. The following results have been achieved so far:
A prototype master has been developed using a subset of the prototype protocol. It is temporarily called the "consult-ant", since the slaves must consult it before exploring a document; the name will probably change to "account-ant", since this process keeps track of everything. So far, it supports only the CLAIM request and the new UNCLAIM request (see the protocol extension below).
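The consult-ant's claim bookkeeping can be illustrated with a minimal Python sketch. It is not the prototype's code. It assumes, hypothetically, that a CLAIM is granted only if no other correspond-ant currently holds the URL, and that an UNCLAIM from the holder releases the URL so it can be claimed again. The class and method names are illustrative, and the actual CLAIM reply format from the February protocol is not reproduced here, so claim() simply returns a boolean.

```python
class ClaimRegistry:
    """Hypothetical consult-ant bookkeeping: which slave holds which URL."""

    def __init__(self):
        self._claimed = {}  # url -> id of the correspond-ant holding it

    def claim(self, url, slave):
        # Assumed rule: first claimant wins; later claims are refused.
        if url in self._claimed:
            return False
        self._claimed[url] = slave
        return True

    def unclaim(self, url, slave):
        # Release only if this slave actually holds the claim, but always
        # acknowledge, since OKAY is the only permitted reply to UNCLAIM.
        if self._claimed.get(url) == slave:
            del self._claimed[url]
        return "OKAY"
```

Keying the registry on the fully specified URL keeps the master's check a single dictionary lookup per request.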
For each CLAIM request, the consult-ant performs the following steps:
A prototype slave has also been developed, called the "correspond-ant" since it gets assignments and then goes out to follow up on the information. Each correspond-ant is started with a base document, which it explores to build its queue. It then proceeds to cycle through the queue until it finds a document that it may explore. Exploration consists of fetching the document, adding any links it contains to the queue, and storing index information (for the moment, simple word-frequency lists). If the document cannot be fetched (due to a connection timeout, etc.), it is returned to the queue and an UNCLAIM request is sent to the consult-ant.
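The correspond-ant's cycle can be sketched as follows. This is a simplified illustration, not the prototype's code: fetch, claim and unclaim are hypothetical callables standing in for the document fetch and the CLAIM/UNCLAIM requests, a refused claim is assumed to mean another slave owns the document (so it is dropped), the base document is simply treated as the first queue entry, and max_steps is an added bound so the loop cannot spin forever on a document that keeps failing.

```python
import re
from collections import Counter, deque

WORD = re.compile(r"[a-z0-9]+")


def word_frequencies(text):
    """Simple word-frequency list, the index format used for now."""
    return Counter(WORD.findall(text.lower()))


def run_correspond_ant(base_url, fetch, claim, unclaim, max_steps=1000):
    """Hypothetical slave loop.

    fetch(url)   -> (text, links), or None if the document cannot be fetched
    claim(url)   -> True if the consult-ant grants the claim
    unclaim(url) -> sends UNCLAIM for a claimed document we could not explore
    """
    index = {}
    queue = deque([base_url])
    queued = {base_url}  # never enqueue the same link twice
    for _ in range(max_steps):
        if not queue:
            break
        url = queue.popleft()
        if not claim(url):
            continue  # assumed: another correspond-ant has it
        page = fetch(url)
        if page is None:
            queue.append(url)  # return it to the queue for a later attempt
            unclaim(url)
            continue
        text, links = page
        index[url] = word_frequencies(text)
        for link in links:
            if link not in queued:
                queued.add(link)
                queue.append(link)
    return index
```

With a document that always fails, this sketch keeps retrying it until max_steps runs out; a real slave would presumably wait before trying it again.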
An additional request type has been added to the protocol: UNCLAIM. After a correspond-ant has successfully claimed a document, it may still be unable to explore it for some reason (e.g., the document no longer exists or the site is down). In these circumstances, the correspond-ant should send an UNCLAIM request to its consult-ant:
"UNCLAIM url", where url is fully specified. This request says "I could not explore this document." A master may only respond with:
"OKAY", meaning "Acknowledged."
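A sketch of the UNCLAIM exchange on the wire follows. It assumes one newline-terminated request per line with a single space between the verb and the URL; that framing is an assumption, and only the "UNCLAIM url" and "OKAY" forms come from the protocol text above. The slave side refuses to build a request for a URL that is not fully specified, and the master side always answers OKAY. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def format_unclaim(url):
    """Slave side: build an UNCLAIM request line (URL must be fully specified)."""
    parts = urlsplit(url)
    if not (parts.scheme and parts.netloc):
        raise ValueError(f"not a fully specified URL: {url!r}")
    return f"UNCLAIM {url}\n"  # assumed framing: one request per line


def answer_unclaim(line, release):
    """Master side: release the claim via `release(url)`, then acknowledge."""
    verb, _, url = line.strip().partition(" ")
    if verb == "UNCLAIM" and url:
        release(url.strip())
    return "OKAY\n"  # the only reply a master may give to UNCLAIM
```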
A preliminary test run has been made with several correspond-ants (between 2 and 5 at different times) running on different machines here at CMT, and I am happy to report that the cooperation model works! Unfortunately, the consult-ant does get bogged down at times when fetching robots.txt files, although this diminishes over time as more of them are cached locally. More on this next month, once I have been able to tune them more carefully and follow their progress better.
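Per-host caching of robots.txt, as described above, could look like the sketch below, which uses Python's standard urllib.robotparser. The fetch_robots callable, the "WebAnts" user-agent string, and the choice to treat a missing robots.txt as allowing everything are assumptions for illustration. The point is only that each host's robots.txt is fetched once and then reused for every later document on that host.

```python
import urllib.robotparser
from urllib.parse import urlsplit


class RobotsCache:
    """Hypothetical per-host robots.txt cache for the consult-ant."""

    def __init__(self, fetch_robots, agent="WebAnts"):
        self._fetch = fetch_robots  # url -> robots.txt text, or None if absent
        self._agent = agent         # assumed user-agent name
        self._parsers = {}          # "scheme://host" -> parsed robots.txt
        self.fetches = 0            # how many robots.txt files were fetched

    def allowed(self, url):
        parts = urlsplit(url)
        host = f"{parts.scheme}://{parts.netloc}"
        parser = self._parsers.get(host)
        if parser is None:  # cache miss: fetch and parse once per host
            self.fetches += 1
            text = self._fetch(host + "/robots.txt")
            parser = urllib.robotparser.RobotFileParser()
            # A missing file parses as empty, which permits everything.
            parser.parse((text or "").splitlines())
            self._parsers[host] = parser
        return parser.can_fetch(self._agent, url)
```

Only the first document seen on a host pays for the robots.txt fetch, which matches the observation that the slowdown diminishes as more files are cached.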
I will be attending the Third International WWW Conference in Darmstadt, Germany in April to participate in the workshop on indexing the web.
I have received interest from Johnny Irons at the University of Auckland about running an indexer for New Zealand. I am now exploring the idea of territories to facilitate this sort of arrangement.
I have corresponded briefly with Darrell Woelk, head of the InfoSleuth project at MCC, about ways in which our projects could collaborate. I hope that our interests overlap enough to make this work, as such collaboration would be very beneficial.
I corresponded briefly with Jeremy D. Zawodny, a CS student at Bowling Green State University in Ohio. Jeremy was interested in the possibility of doing some independent study work relating to WebAnts. I am looking forward to hearing back from him.