Working of Search Engine
Indexer : The indexer, which is coded in the indexer.php file, performs the indexing. It has two main functions:
1. It parses the HTML and retrieves all the information from the page and about the page.
2. It stores this information in the database.

The HTML files that have to be parsed are stored in a particular directory, which is placed in the htdocs folder in Apache. Suppose the name of the directory is ‘abc’. When the indexer page is run in the browser, it shows a form with several fields. The first field is the location of the directory that contains the pages to be parsed, in our case ‘http://localhost/abc’. Leave all the other fields blank and tick ‘Indexer should find files via directory scan’ and ‘Show list of indexed files’. Now click on Start Indexing to start the process. On completion of indexing, the indexer displays the page title, the page URL and the number of words parsed for each page.

Working of Indexer : First the indexer takes the directory name and location that the user entered in the directory field and checks that directory for the HTML files stored in it. Each page (file) is parsed, and the sentences in the content of the page are converted into tokens, which are stored in a text array. For each file, the page URL, page title, size of page, indexed words, last index date etc. are stored in a page array, and a unique id is assigned to it.
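The parsing step described above can be sketched as follows. This is only an illustration in Python, not the code of indexer.php: the class and function names (PageTextParser, tokenize, index_page, index_pages) and the dictionary keys are assumptions made for the example, and the records are kept in memory rather than written to MySQL.

```python
import re
from html.parser import HTMLParser


class PageTextParser(HTMLParser):
    """Collects the <title> text and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0:
            self.text_parts.append(data)


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_page(page_id, url, html):
    """Build the record for one page (key names are illustrative only)."""
    parser = PageTextParser()
    parser.feed(html)
    parser.close()
    words = tokenize(" ".join(parser.text_parts))
    return {
        "id": page_id,
        "url": url,
        "title": parser.title.strip(),
        "size": len(html),
        "indexed_words": " ".join(words),
        "word_count": len(words),
    }


def index_pages(pages):
    """Index (url, html) pairs, assigning ids 1, 2, 3, ... in order."""
    return [index_page(i, url, html)
            for i, (url, html) in enumerate(pages, start=1)]
```

Calling index_pages with a list of (url, html) pairs returns one record per page, which corresponds to the title, URL and word count that the indexer reports on completion.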
This information of the page is then entered into the tsep_search table, created in MySQL in the tsep database. Next the indexer checks the directory for the next HTML file and carries out the same operations on each file until all files are indexed.

Search : The file used for searching in the database is search.php. In order to search using the search engine, the search.php file has to be run. On running search.php, you will be presented with a search field and a search button. Enter the query you want to search for in the search field and press the search button. You will be presented with the search results. The time taken for searching is shown above and below the results. If no results are found, the message ‘No Pages Found’ is displayed.

Working of Search : When the query is entered and the search button is clicked, the query is saved in a variable. First the query is formatted properly, i.e. punctuation, symbols, slashes etc., if any are found, are removed from the query. Next the query is compared with the stopwords in tsep_stopwords. If any stopwords are found in the query, a corresponding message is displayed to the user. Then the query is searched for in tsep_search, in the indexed_words field, for each id, i.e. for each page. The results displayed are divided into two parts:
1. Based on the number of times the query appeared on the particular page.
2. Based on the search preference and user history.

For the first part, the pages are arranged according to the number of times the query appears in the page. Suppose that the searched query is ‘computer’. Then the page in which ‘computer’ appears the maximum number of times is shown first, and the remaining pages follow in the same manner.

Search Results using User Preference : The other type of search results shown is based on user history. Here it is checked whether the given query was entered before by the same user, i.e. by the same IP address. The history is checked by comparing with the logs table in the database. It is then checked whether the user clicked on any of the search results for that query. If both conditions are found to be true, those results are shown to the user, indicating that he clicked on them previously for the same query.

Logging : A separate table called tsep_logs is created for storing the logs. Whenever a user searches for a query or clicks on a result, it is sent to log.php for storing in the database. The fields in the log are idlog, time of entry, IP Address, Resolved IP, logstring, typeoflog and urlid. The typeoflog is set to 1 if a query was entered. If a result is clicked, the typeoflog is set to 2. Thus the typeoflog field shows whether logstring is a query or a link. The logs are used while calculating the user history for preference based searching.
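The search and logging steps can be illustrated with a minimal Python sketch. It is not the code of search.php or log.php. The function names, the in-memory lists standing in for the MySQL tables, and the dictionary keys are all assumptions made for the example. Two further assumptions go beyond what the text states: stopwords are dropped from the query before searching, and a clicked result is attributed to the most recent query logged from the same IP address.

```python
import re
import time

QUERY_LOG = 1  # typeoflog value when logstring is a query
CLICK_LOG = 2  # typeoflog value when logstring is a clicked link


def normalize_query(query):
    """Strip punctuation, symbols and slashes; return lower-case words."""
    return re.findall(r"[a-z0-9]+", query.lower())


def split_stopwords(words, stopwords):
    """Return (words to search for, stopwords found in the query)."""
    kept = [w for w in words if w not in stopwords]
    found = [w for w in words if w in stopwords]
    return kept, found


def rank_by_count(pages, words):
    """Order pages by how often the query words occur, highest first."""
    scored = []
    for page in pages:
        tokens = page["indexed_words"].split()
        count = sum(tokens.count(w) for w in words)
        if count > 0:
            scored.append((count, page))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [page for _, page in scored]


def log_entry(logs, ip, logstring, typeoflog, urlid=None, resolved_ip=None):
    """Append one log row; keys mirror the log fields listed in the text."""
    logs.append({
        "idlog": len(logs) + 1,
        "time": time.time(),
        "ip": ip,
        "resolved_ip": resolved_ip,
        "logstring": logstring,
        "typeoflog": typeoflog,
        "urlid": urlid,
    })


def previously_clicked(logs, ip, query):
    """Return urlids this IP clicked after entering the same query.

    Assumption: a click belongs to the latest query from the same IP.
    """
    clicked = []
    last_query = None
    for row in logs:
        if row["ip"] != ip:
            continue
        if row["typeoflog"] == QUERY_LOG:
            last_query = row["logstring"]
        elif row["typeoflog"] == CLICK_LOG and last_query == query:
            if row["urlid"] not in clicked:
                clicked.append(row["urlid"])
    return clicked
```

In this sketch, rank_by_count gives the first part of the results, and previously_clicked gives the ids of the pages to show in the preference based part. An empty list from rank_by_count corresponds to the ‘No Pages Found’ case.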