SIMULATION RESULTS - Intelligent Agents for Data Mining and

Several keyword(s) were submitted to test the i-agent. The keywords covered a broad range of topics. Keywords submitted were single keywords, such as “Conference” and “Tennis,” and multiple keywords, such as “Confer-ence Australia,” “Intelligent Agents” and “Information Retrieval.” Volunteers tested the results of searching, first, by using the keyword(s) and their chosen search engine (AltaVista, Excite, Google, HotBot, Infoseek, Northernlight, etc.). The volunteers then used the i-agent program for search and retrieval from the WWW by giving the same keyword(s). The results from the two searches were then compared. For example, in one experiment, a volunteer used the keywords “Conference Australia” and performed a search using the AltaVista, Excite, Lycos and Yahoo search engines. These search engines, shown below, returned a large number of results.

Search Engines Number of pages returned

AltaVista Conference: 26,194,461 and Australia: 34,334,654 Excite 2,811,220

Lycos 673,912 Yahoo 40 categories and 257 sites

Table 1. Search Results as of June 2002

Search query: Conference Australia

It is very unlikely that a user will search the 26,194,461 results shown in the AltaVista query in Table 1. This could be due to users’ past experience in not finding what they want, or it could be due to the time constraint that users have when looking for information. A business would consider the cost of obtaining the information and just what value exists after the first 600 pages found. It is very unlikely that a user will search more than 600 pages in a single query, and most users likely will not search more than the first 50.

In a more recent experiment, a volunteer used the search query of

“Conference Australia.” This volunteer extended the search to include several other search engines that are considered more popular in their use. The results illustrate that search engines and the results are changing dynamically. How-ever, it is still very unlikely that a user will, for example, search the 1,300,000 results as shown in Google or the 3,688,456 shown in Lycos. The following table illustrates their results.

The dynamics of web data and information means that the simulation could be done any day and different results will be obtained. The essence of this Table 2. Search Results as of July 2002

Search Engines Number of pages returned AltaVista 766,674

Google 1,300,000 HotBot 807,500 Lycos 3,688,456 Northernlight 103,748

Yahoo 251 pages with 20 hits per page

Search query: Conference Australia

Table 3. Search Results and Evaluation as of June 2002

Search Engines

Number of pages returned Number of relevant pages

AltaVista Conference: 26,194,461 and

Australia: 34,334,654

2,050 180 179 from 200 pages

returned

Excite 2,811,220 1,889

Lycos 673,912 2,210

Yahoo 40 categories and 257 sites 84 sites were relevant

research takes that into account by providing a way to continuously provide the user with their desired information while taking into account the dynamics of information on the Web. To evaluate the performance of the information filtering and retrieval of the above system, several simulations were performed.

Simulation results for some of the keywords that are used to check the performance of the systems were: Conference, Conference Australia, Intelli-gent AIntelli-gents, Information Retrieval, and Tennis. The results obtained were then passed to the evolutionary computing system, and final results were passed to volunteers for testing. The evolutionary computing, as described above, was then used to find the most relevant pages to this search query. The evolutionary computing was run for 200 generations. The top 200 URLs returned from the evolutionary computing were then presented to the volunteers to assess the results. The volunteers then compared the results obtained from i-agents to their own opinion from the web sites they visited and evaluated. It is interesting to see that the results obtained by combined i-agent and evolutionary comput-ing are very good.

The number of URLs returned is feasible, and the users are provided with relevant web pages. Out of another 62 experiments performed, the combined i-agent and evolutionary computing system performance was good. The volunteers have reported that in more than 70 percent of the experiments (out of 62 experiments), the results of the combined i-agent and evolutionary computing system were very satisfactory. Table 4 shows that relevancy of the

Table 4. Search Results by i-Agent and Combined i-Agent and Evolutionary Computing as of July 2002

Search query Number of pages returned i-agent

Number of relevant pages returned by i-agent and evolutionary computing

and fuzzy logic Conference 180 from 215 pages returned 169 from 200 pages returned Conference

and Australia

127 from 223 pages returned 127 from 200 pages returned Intelligent

Agents, and Tennis.

189 from 348 pages returned 132 from 200 pages returned

Information Retrieval

117 from 216 pages returned 111 from 200 pages returned Information

Retrieval

127 from 223 pages returned 108 from 200 pages returned

URLs, based on some of the queries given to i-agent and evolutionary computing.

CONCLUSION

The amount of information potentially available from the World Wide Web (WWW), including such areas as web pages, page links, accessible docu-ments, and databases, continues to increase. This information abundance increases the complexity of searching and locating relevant information for users. This chapter suggests intelligent agents as a way to improve the performance of search and retrieval engines. The use of search engines, the i-agent, evolutionary computing, and fuzzy logic has been shown to be an effective way of filtering data and information retrieval from the World Wide Web.

In this research, an intelligent agent was first constructed to assist in better information filtering, information gathering, and ranking from the World Wide Web than simply using existing search engines. Simulation results show that the performance of the system using the intelligent agent provides better results than conventional search engines. It is not very surprising since agents use the results of search engines and then filter the irrelevant web pages. In most of the simulated cases, the combination of evolutionary computing, fuzzy logic, and i-agents in filtering data and information provided improved results over tradi-tional search engines. This research is unique in the way the agents are constructed and in the way evolutionary computing and fuzzy logic are incorporated to help users. The method described in this chapter demonstrates that using i-agents, evolutionary computing, and fuzzy logic together has a great deal of promise. In particular, the method employed shows that it can:

• collect and rank relevant web pages;

• reduce the size of the result set and recommend the more relevant web pages according to a given query;

• increase the precision of the results by displaying the URLs of relevant web pages.

The methods employed and described here require further study. The first study that is being looked at is directed at individuals. This includes having more volunteers to test and compare the methods. Given the information that these volunteers obtain, evaluate the value of the resulting information. Then,

com-pare the results among the individuals. The second study that is being looked at is for business competitiveness. This study employs the methods described in this chapter and evaluates the information that a business needs in its industry to maintain a watch on its current and new competitors.

ACKNOWLEDGMENT

The authors would like to acknowledge the assistance of Mr. Long Tan and Mr. Thien Huynh in programming efforts on several parts of this research project.

REFERENCES

Bigus, J. P. & Bigus, J. (1998). Constructing Intelligent Agents with Java

— A Programmer’s Guide to Smarter Applications. Hoboken, NJ:

Wiley InterScience.

Cabri, G., Leonardi, L., & Zambonelli, F. (2000, February). Mobile-agent coordination models for Internet applications. IEEE Spectrum.

Chen, H., Chung, Y., Ramsey, M., & Yang, C. (1997). Intelligent spider for Internet searching. In Proceedings of the 30^th Hawaii International Conference on System Sciences (vol. IV, pp. 178-188).

Cho, H. J., Garcia-Molina, H., & Page, L. (1998). Efficient crawling through URL ordering. Proceedings of the 7^th International Web Conference, Brisbane, Australia (April 14-18).

Chong, C. W., Ramachandran, V., & Eswaran, C. (1999). Equivalence class approach for web link classification. Malaysia: University Telekom, Faculty of Engineering and Information Technology (Conference Paper).

Eyeballz. (2002). Search engine marketing in New Zealand. Last accessed May 24, 2002, from: http://www.eyeballz.co.nz/search_

engine_statistics.htm

Hines, M. (2002). The problem with petrabytes. The Information Architect.

Last accessed June 5, 2002, from: http://www.techtarget.com.au Jensen, J. (2002). Using an intelligent agent to enhance search engine

performance. Retrieved January 2002, from: http://www.firstmonday.dk/

issues/issue2_3/jansen/index.htm

Jentzsch, R. & Gobbin, R. (2002). An e-commerce communicative multi-agent agent model. Proceedings of the IRMA 2002 Conference, Seattle, Washington (May 19-22).

Lawrence, S. & Giles, C. L. (1999). Accessibility of information on the Web.

Nature, 400, 107-109.

Lucas, I. & Nissenbaum, H. (2000, June). The politics of search engine. IEEE Spectrum.

Maes, P. (1994). Agents that reduce work and information overload. Commu-nications of the ACM, 19(1), 31-40.

Nwana, H. S. (1996). Software agents: An overview. Knowledge Engineer-ing Review, 11(3), 205-244.

Searchengines.com. (2002). Last accessed June 10, 2002, from: http://

www.searchengines.com/searchEnginesRankings.html

Searchengines.com. (2002). Last accessed June 10, 2002, from: http://

www.searchengines.com/search_engine_statistics.html

Sullivan, D. (ed.). Nielsen//NetRatings search engine ratings. Current Audi-ence Reach. Last accessed May 9, 2002, from: http://

www.searchenginewatch.com/reports/netratings.html

Watson, M. (1997). Intelligent Java Applications for the Internet and Intranet. San Francisco, CA: Morgan Kaufmann.

Dans le document Intelligent Agents for Data Mining and (Page 39-44)