WebCat

Introduction

WebCat (short for Web Categorizer) is a set of software packages installed on a host operating system (FreeBSD, Linux or another). It runs as an Internet spider, or robot, with a user-selected crawling policy, and fetches pages in order to categorize their contents. Categorization is performed by classification algorithms bundled with the system, and an API lets end users load their own classification algorithms. The user can define a set of similar web sites and then ask WebCat to find other similar web sites across the Internet. The newly fetched pages and sites can be presented in a formatted web page for non-technical users.
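As a rough illustration of the "define similar sites, then find more like them" idea, the following Python sketch shows what a user-supplied classifier could look like. The Classifier interface, the SeedSimilarityClassifier class, the threshold value and the bag-of-words cosine measure are all assumptions made for this example; the actual WebCat API and its bundled algorithms are not described in this document.

```python
import math
import re
from collections import Counter
from typing import Iterable, Protocol


class Classifier(Protocol):
    """Hypothetical plug-in interface (an assumption, not WebCat's real API)."""

    def classify(self, text: str) -> str: ...


def term_counts(text: str) -> Counter:
    """Count lowercase alphabetic words in a page's text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SeedSimilarityClassifier:
    """Labels a page 'similar' when it resembles the user's seed pages."""

    def __init__(self, seed_pages: Iterable[str], threshold: float = 0.3):
        # Merge all seed pages into one term profile.
        self.profile = Counter()
        for page in seed_pages:
            self.profile.update(term_counts(page))
        self.threshold = threshold  # illustrative cut-off, not a tuned value

    def score(self, text: str) -> float:
        return cosine(self.profile, term_counts(text))

    def classify(self, text: str) -> str:
        return "similar" if self.score(text) >= self.threshold else "other"


if __name__ == "__main__":
    seeds = ["acme announces quarterly earnings", "acme earnings beat forecast"]
    clf = SeedSimilarityClassifier(seeds)
    for page in ["acme quarterly earnings rise", "local weather forecast today"]:
        print(page, "->", clf.classify(page))
```

A production classifier would likely use a stronger text model; the point here is only that any object with a classify method could be loaded alongside the bundled algorithms.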

Sample Use Case

The customer gives the system a few pages of criminal data and asks WebCat to find similar pages. The resulting list might be used with LI Tools to let legal organizations trace the criminals, or by GateWatch to limit the bandwidth of users who visit those pages. It is also possible to find specific news related to a corporation's business and display it to internal users on special pages by means of WebCat Refiner.

Product Description

WebCat has three major components: WebCat Core, WebCat Admin and WebCat Refiner.

The most important part is WebCat Core. Core is a server running on UNIX-like operating systems that (1) crawls the web and fetches the pages of sites, (2) stores the fetched data in a storage area, (3) runs multiple classifiers on the fetched pages, (4) displays the category of each fetched page so that the output of all activated classifiers can be compared, (5) responds to commands from WebCat Admin and (6) feeds WebCat Refiner.

WebCat Admin is the second component. It runs on a web server, possibly a different machine from Core, and manages WebCat Core. Admin has two types of users. Managers of Core control the bandwidth usage of the crawlers, add and remove users, and start or stop the classifiers and crawlers. Classifier users log into Admin and act as human classifiers, helping the system move sites from grey lists into predefined categories.

WebCat Refiner is the third component. It creates predesigned web pages for non-technical users.
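Core steps (1) to (4), together with the grey-list handling done by classifier users, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the Core class, its field names and the rule that a page goes to the grey list whenever the active classifiers disagree are all hypothetical, since the document does not say how grey lists are populated. The fetch function is injected so the sketch runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

ClassifierFn = Callable[[str], str]  # page text -> category label


@dataclass
class Core:
    """Hypothetical model of the Core pipeline (names are illustrative only)."""

    fetch: Callable[[str], str]                 # step 1: URL -> page text
    classifiers: Dict[str, ClassifierFn]        # active classifiers by name
    storage: Dict[str, str] = field(default_factory=dict)
    results: Dict[str, Dict[str, str]] = field(default_factory=dict)
    categories: Dict[str, str] = field(default_factory=dict)
    grey_list: List[str] = field(default_factory=list)

    def process(self, url: str) -> Dict[str, str]:
        text = self.fetch(url)                  # (1) fetch the page
        self.storage[url] = text                # (2) store the fetched data
        labels = {name: clf(text)               # (3) run every active classifier
                  for name, clf in self.classifiers.items()}
        self.results[url] = labels              # (4) keep per-classifier output for comparison
        agreed = set(labels.values())
        if len(agreed) == 1:
            self.categories[url] = agreed.pop()
        else:
            # Assumption: disagreement sends the page to the grey list.
            self.grey_list.append(url)
        return labels

    def resolve(self, url: str, category: str) -> None:
        """A human classifier moves a grey-listed page into a category."""
        self.grey_list.remove(url)
        self.categories[url] = category


if __name__ == "__main__":
    pages = {"http://a.example": "acme earnings rise again today",
             "http://b.example": "earnings"}
    core = Core(
        fetch=pages.__getitem__,
        classifiers={
            "keyword": lambda t: "news" if "earnings" in t else "other",
            "length": lambda t: "news" if len(t.split()) > 3 else "other",
        },
    )
    for url in pages:
        print(url, core.process(url))
    print("grey list:", core.grey_list)
```

Keeping the per-classifier labels in results, separate from the final categories, mirrors the requirement that the output of all activated classifiers can be compared side by side.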