The Current WebAnts(tm) Effort

An alternative search engine model that addresses the resource problem while avoiding the fatal trade-off is to distribute the resource load. That is, instead of having one engine attempt to index the entire web, the problem can be shared among a number of smaller cooperating engines, each of which indexes only a small portion of the web. The engines can share information about which documents have been indexed, so that documents are indexed in parallel without redundancy. This reduces indexing time, which is already becoming a problem for some engines. It has been estimated, for example, that Nikos could index the web every five weeks, based on an assumption of 140,000 documents. Given that Lycos knows of seven times this number of documents, which at the same rate would take about 35 weeks to index, it seems unlikely that a solitary search engine could re-index the web in a reasonable period of time. In addition, a distributed approach eliminates the need for a gigabyte of storage in one place: each cooperating engine need only provide as much storage as is comfortable. Since no site is required to make a large resource commitment, more sites will be able to participate.
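The non-redundant parallel indexing described above can be illustrated with a minimal sketch. It assumes a shared record of claimed documents; the names ClaimRegistry, claim, and crawl, the ant names, and the example.org URLs are all hypothetical and are not taken from any WebAnts implementation.

```python
from typing import Dict, List, Optional


class ClaimRegistry:
    """Hypothetical shared record of which ant has claimed which document."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}

    def claim(self, url: str, ant: str) -> bool:
        """Return True only if `ant` is the first engine to claim `url`."""
        if url in self._owner:
            return False
        self._owner[url] = ant
        return True

    def owner(self, url: str) -> Optional[str]:
        """Return the name of the ant that claimed `url`, if any."""
        return self._owner.get(url)


def crawl(ant: str, frontier: List[str], registry: ClaimRegistry) -> List[str]:
    """Return the documents this ant should index: those it claims first."""
    return [url for url in frontier if registry.claim(url, ant)]


# Two ants with overlapping frontiers end up with disjoint workloads.
registry = ClaimRegistry()
a = crawl("ant-a", ["http://example.org/1", "http://example.org/2"], registry)
b = crawl("ant-b", ["http://example.org/2", "http://example.org/3"], registry)
```

In this sketch the registry is a single in-memory object; how real cooperating engines would exchange this information is left open here.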

Similarly, cooperation can be employed when serving index information to users. Nikos hopes to have a server for its entire index in every country with a significant web presence, thereby distributing the information serving load. By distributing the index, this can be carried one step further. Queries would be submitted to a local server, which would either provide an answer itself or propagate the query to other cooperating engines, as appropriate. By allowing for the propagation of requests and consolidation of results, the entirety of the index is available to every user, without the user having to query every engine and without any one engine having to hold the entire index.
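Query propagation and consolidation of results might look something like the following sketch. It makes simplifying assumptions: each engine holds a partial index from terms to document URLs, the local server always forwards the query one hop to its peers, and results are merged as a sorted set. The class Ant and its methods are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Set


class Ant:
    """Hypothetical cooperating engine holding only part of the index."""

    def __init__(self, name: str, index: Dict[str, Set[str]]) -> None:
        self.name = name
        self.index = index            # term -> set of document URLs
        self.peers: List["Ant"] = []  # other cooperating engines

    def local_lookup(self, term: str) -> Set[str]:
        """Answer from this engine's own portion of the index."""
        return set(self.index.get(term, set()))

    def query(self, term: str) -> List[str]:
        """Answer locally, forward to peers, and consolidate the results."""
        results = self.local_lookup(term)
        for peer in self.peers:
            # One-hop propagation only, so peers never forward back to us.
            results |= peer.local_lookup(term)
        return sorted(results)


local = Ant("local", {"ants": {"http://example.org/a"}})
remote = Ant("remote", {"ants": {"http://example.org/b"},
                        "index": {"http://example.org/c"}})
local.peers.append(remote)

hits = local.query("ants")
```

The user talks only to the local engine, yet sees documents indexed by both engines. Deciding when propagation is "appropriate" is a policy question that this sketch does not address.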

An informal, outdated, and somewhat redundant, but more detailed, description of the drawbacks of current indexing schemes, the advantages of distribution, and other potential applications of cooperative agents is available.

Our approach is that described above: numerous small search engines, or "ants", that cooperate to make indexing more resource-efficient and information serving more effective in terms of load distribution. Specifically, we will build and test prototypes of both a cooperative web indexer and a cooperative web information server.

Testing for each of the web agent prototypes is expected to be phased as follows:

The index structure will be similar to that used by Lycos: one portion of the summary of each document will be based on the document's structure, and another portion will be based on the relative weight of terms within the document as compared to the corpus as a whole. This information structure has proved effective for Lycos and can be implemented straightforwardly. While the design of an index structure is important to a search engine, extensive work in this direction is beyond the scope of the current effort.
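As an illustration of weighting terms within a document relative to the corpus as a whole, the sketch below uses a smoothed TF-IDF score. This is a stand-in chosen for familiarity; the text does not say which formula Lycos or WebAnts uses, and the function names term_weights and top_terms are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def term_weights(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Weight each term in `doc` by its frequency in the document,
    scaled down for terms that appear in many corpus documents."""
    counts = Counter(doc)
    n_docs = len(corpus)
    weights: Dict[str, float] = {}
    for term, count in counts.items():
        tf = count / len(doc)                             # frequency within the document
        df = sum(1 for other in corpus if term in other)  # documents containing the term
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0     # smoothed rarity in the corpus
        weights[term] = tf * idf
    return weights


def top_terms(doc: List[str], corpus: List[List[str]], k: int = 3) -> List[str]:
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    w = term_weights(doc, corpus)
    return sorted(w, key=lambda t: (-w[t], t))[:k]


corpus = [
    ["web", "ants", "index"],
    ["web", "search", "engine"],
    ["web", "index", "server"],
]
summary = top_terms(corpus[0], corpus, k=2)
```

Here "web" appears in every document and so contributes least to the summary of the first document, while "ants", which appears only there, contributes most.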


Last updated 06-Feb-95 by John Leavitt (jrrl@cmu.edu)