
Gigablast

Gigablast is a powerful, new, open source search engine that does real-time indexing!

Features

- Scalable to thousands of servers. Has scaled to over 12 billion web pages on over 200 servers.
- A dual quad core with 32GB of RAM and two 160GB Intel SSDs, running 8 Gigablast instances, can do about 8 qps (queries per second) on an index of 10 million pages. The drives will be close to maximum storage capacity. Doubling the index size will more or less halve the qps rate. (Performance can be made about ten times faster, but I have not gotten around to it yet. Drive space usage will probably remain about the same because it is already pretty efficient.)
- 1 million web pages require 28.6GB of drive space. That includes the index, meta information and the compressed HTML of all the web pages.
- Spider rate is around 1 page per second per core, so a dual quad core can spider and index 8 pages per second, which is 691,200 pages per day.
- 4GB of RAM is required per Gigablast instance. (instance = process)
- Live demo at http://www.gigablast.com/
- Written in C/C++ for optimal performance.
- Over 500,000 lines of C/C++.
- 100% custom. A single binary. The web server, database and everything else are all contained in this source code in a highly efficient manner. This makes administration and troubleshooting easier.
- Reliable. Has been tested in live production since 2002 on billions of queries on an index of over 12 billion unique web pages, 24 billion mirrored.
- Super fast and efficient. One of a small handful of search engines that have hit such big numbers. The only open source search engine that has.
- Supports all languages. Can give results in specified languages a boost over others at query time. Uses UTF-8 representation internally.
- Track record. Has been used by many clients. Has been successfully used in distributed enterprise software.
- Cached web pages with query term highlighting.
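
The sizing figures above (28.6GB per million pages, about 1 page per second per core, about 8 qps on 10 million pages) lend themselves to a quick back-of-the-envelope calculation. The Python sketch below is only an illustration of that arithmetic and is not part of Gigablast. The function names are hypothetical, and the qps estimate assumes that "doubling index size will more or less halve qps rate" can be read as a strict inverse proportion on the same hardware, which is a simplification.

```python
# Rough capacity estimates built only from the figures quoted in the feature list.
# All names here are hypothetical helpers, not Gigablast APIs.

GB_PER_MILLION_PAGES = 28.6    # index + meta information + compressed HTML
PAGES_PER_SEC_PER_CORE = 1.0   # approximate spider rate per core
BASELINE_QPS = 8.0             # measured on the dual quad core box described above
BASELINE_PAGES = 10_000_000    # index size for that measurement
SECONDS_PER_DAY = 86_400


def disk_gb(pages):
    """Approximate drive space in GB needed to hold `pages` web pages."""
    return pages / 1_000_000 * GB_PER_MILLION_PAGES


def pages_per_day(cores):
    """Approximate number of pages spidered and indexed per day."""
    return int(cores * PAGES_PER_SEC_PER_CORE * SECONDS_PER_DAY)


def estimated_qps(pages):
    """Estimated qps on the baseline hardware.

    Assumption: qps is inversely proportional to index size,
    so doubling the index halves the rate.
    """
    return BASELINE_QPS * BASELINE_PAGES / pages


def days_to_spider(pages, cores):
    """Approximate number of days needed to spider `pages` pages."""
    return pages / pages_per_day(cores)


if __name__ == "__main__":
    print(round(disk_gb(10_000_000), 1), "GB for 10 million pages")
    print(pages_per_day(8), "pages per day on 8 cores")  # 691200
    print(estimated_qps(20_000_000), "qps on 20 million pages")
    print(round(days_to_spider(10_000_000, 8), 1), "days to spider 10 million pages")
```

For 10 million pages the disk estimate comes to roughly 286GB, which is consistent with the remark that two 160GB SSDs will be close to full at that index size.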