Building a search engine
By Rayed
Currently I am working on a search engine for a large website; the search engine is based on PHP+MySQL.
You may wonder why I don't just use LIKE in SQL for searching. Of course LIKE works with a few documents, but with tens of thousands of documents it becomes very slow. A pattern such as LIKE '%word%' cannot use an ordinary index, so the DBMS has to scan every record in the database one by one to find the word you are looking for.
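The standard way around a full LIKE scan is an inverted index. Instead of searching each document for a word, you store, for every word, the set of documents that contain it. A query then becomes a few lookups plus a set intersection. The post does not say how this engine stores its index, so what follows is only a minimal in-memory Python sketch of the general technique. The names `tokenize`, `build_index` and `search` are hypothetical. In a PHP+MySQL setup the same mapping would typically live in a table of (word, document id) rows with an index on the word column.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on runs of word characters; \w is Unicode-aware
    # in Python 3, so Arabic letters are kept as part of words.
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every query word (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

For example, with `docs = {1: "PHP and MySQL search", 2: "MySQL full scan is slow", 3: "Arabic search support"}`, `search(build_index(docs), "mysql search")` returns `{1}`. The cost of a query depends on the size of the posting sets for the query words rather than on the total size of the collection.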
I decided to build my own search engine after testing a few applications, mainly mnogosearch and PHPDig. Neither program fit my needs exactly, and their support for Arabic wasn't that good either.
The engine has to index 45,000 documents. Currently the indexer handles about 70 documents per minute, which means indexing the whole collection would take over 10 hours (45,000 / 70 ≈ 643 minutes).
The search engine still needs a lot of tuning, such as removing common words and stripping Arabic Harkat (short-vowel diacritics) and Hamzat (Hamza marks). Hopefully I will end up with a nice, reusable search engine.
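The tuning steps above can be sketched as a normalization pass that runs before words go into the index. The post does not show how the engine does this. The following Python sketch rests on these assumptions:

- Harkat means the Unicode range U+064B to U+0652 (fathatan through sukun).
- "Removing Hamzat" is read as folding Hamza-carrying letters onto their base letters.
- The stop-word list is a tiny placeholder.

All names here (`normalize`, `index_terms`, `HAMZA_MAP`, `STOP_WORDS`) are hypothetical.

```python
import re

# Arabic short-vowel diacritics (Harkat): U+064B (fathatan) to U+0652 (sukun).
HARAKAT = re.compile("[\u064B-\u0652]")

# Assumed Hamza handling: fold Hamza-carrying letters onto their base letter
# so spelling variants of the same word share one index entry.
HAMZA_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda above -> alef
    "\u0624": "\u0648",  # waw with hamza above -> waw
    "\u0626": "\u064A",  # yeh with hamza above -> yeh
})

def normalize(text):
    """Strip Harkat, then fold Hamza variants onto their base letters."""
    return HARAKAT.sub("", text).translate(HAMZA_MAP)

# Placeholder stop-word list, normalized the same way as indexed text.
STOP_WORDS = {normalize(w) for w in (
    "\u0641\u064A",        # "fi" (in)
    "\u0645\u0646",        # "min" (from)
    "\u0639\u0644\u0649",  # "ala" (on)
    "the", "and", "of",
)}

def index_terms(text):
    # Normalize before tokenizing: Python's \w does not match combining
    # marks, so unstripped Harkat would split a word into fragments.
    tokens = re.findall(r"\w+", normalize(text).lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

The same `normalize` step has to be applied to search queries. Otherwise a query typed with diacritics or a Hamza would not match the normalized index entries.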
I’ll post a link when we put it into production.