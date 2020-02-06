

The World Wide Web has grown enormously since it was introduced to the scientific and research community in 1991 and then expanded into the public and commercial sectors. It began as a network of linked sites and other digital resources. Very early on, however, it became clear that some resources were so extensive that it made more sense to generate the material each user requested on the fly than to store every individual digital item as a unique object.

Today, countless websites are dynamic: on each visit, information and data are retrieved from a back-end database and displayed to the user as needed. Static pages are easy for search engines to discover, but the database content behind dynamic websites is generally out of their reach. As early as 2001, when there were already several terabytes of public, static web data, the “invisible web” or “hidden web” (not to be confused with the “dark web”) was estimated to be larger than the visible resources.
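To make the static/dynamic distinction concrete, here is a minimal sketch using an in-memory SQLite database. The table, its rows, the page markup and the helper names (`render_search_results`, `extract_links`) are all hypothetical and chosen only for illustration. A crawler that merely follows hyperlinks sees the fixed page, while the database records surface only when a query is submitted through the search form.

```python
import re
import sqlite3

# Hypothetical back-end database for a dynamic site (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT, topic TEXT)")
conn.executemany(
    "INSERT INTO papers (title, topic) VALUES (?, ?)",
    [
        ("Crawling form-based sites", "crawling"),
        ("Genetic algorithms for search", "genetic"),
        ("Ranking static pages", "ranking"),
    ],
)

# A static page: fixed HTML whose hyperlinks a crawler can follow.
STATIC_PAGE = (
    '<html><body><a href="/about">About</a>'
    '<form action="/search"><input name="q"></form></body></html>'
)

def extract_links(html: str) -> list:
    """What a naive link-following crawler sees: only href targets."""
    return re.findall(r'href="([^"]+)"', html)

def render_search_results(query: str) -> str:
    """Build a results page on demand; it exists only for the submitted query."""
    rows = conn.execute(
        "SELECT title FROM papers WHERE topic = ? ORDER BY id", (query,)
    ).fetchall()
    items = "".join(f"<li>{title}</li>" for (title,) in rows)
    return f"<html><body><ul>{items}</ul></body></html>"

print(extract_links(STATIC_PAGE))         # ['/about']
print(render_search_results("genetic"))
```

The link extractor never reaches the paper titles: they appear only in pages generated in response to form input, which is exactly the content a conventional crawler misses.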

In the International Journal of Business Intelligence and Data Mining, a team from India describes an intelligent multi-agent architecture, based on genetic algorithms, that they developed to extract information from the invisible web. The tools could make it possible to crawl, scrape and catalog, for a variety of applications, even material that is supposedly beyond the reach of conventional search engines.
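The article does not describe how the team's genetic algorithm is set up, so the following is only a generic, single-agent sketch of the idea, not the authors' architecture. It assumes a hypothetical hidden database in which each keyword submitted to a search form returns a set of record IDs, plus an assumed budget of three queries. Each genome is a bit string marking which keywords to submit, and fitness is the number of distinct records retrieved.

```python
import random

# Hypothetical hidden database: each keyword, when submitted to a search form,
# returns a set of record IDs. All data here is made up for illustration.
KEYWORD_RESULTS = {
    "crawler": {1, 2, 3},
    "genetic": {3, 4},
    "agent": {5, 6, 7, 8},
    "database": {1, 8, 9},
    "ranking": {2, 10},
    "mining": {9, 10, 11, 12},
}
KEYWORDS = list(KEYWORD_RESULTS)
MAX_QUERIES = 3  # assumed budget: how many form submissions we can afford

def fitness(genome):
    """Distinct records retrieved by the selected keywords; 0 if over budget."""
    chosen = [k for k, bit in zip(KEYWORDS, genome) if bit]
    if len(chosen) > MAX_QUERIES:
        return 0
    retrieved = set()
    for k in chosen:
        retrieved |= KEYWORD_RESULTS[k]
    return len(retrieved)

def crossover(a, b, rng):
    """Single-point crossover of two parent bit strings."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genome, rng, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in genome]

def evolve(generations=50, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in KEYWORDS] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection (elitist)
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=fitness)
    return [k for k, bit in zip(KEYWORDS, best) if bit], fitness(best)

best_keywords, records_found = evolve()
print(best_keywords, records_found)
```

Keeping the better half of each generation means the best solution found so far is never lost; a real hidden-web crawler would replace the lookup table with actual form submissions and would likely use a richer fitness measure.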

D. Weslin of Bharathiar University and Joshva Devadas of the Vellore Institute of Technology describe the details and benefits of their approach in the latest issue of the journal. “The experimental results show that the proposed architecture offers better precision and recall than the existing web crawlers,” the team writes.
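Precision and recall are the standard information-retrieval measures for comparing crawlers. Precision is the fraction of retrieved pages that are relevant, and recall is the fraction of all relevant pages that were actually retrieved. The article does not say how the authors defined relevance, so the page sets below are purely hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical crawl: 4 pages fetched, 3 of them relevant; 6 relevant pages exist.
p, r = precision_recall(
    retrieved={"a", "b", "c", "x"},
    relevant={"a", "b", "c", "d", "e", "f"},
)
print(p, r)  # 0.75 0.5
```

A crawler can raise precision by fetching fewer, safer pages, but usually at the cost of recall, which is why the two are reported together.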

