free-articles-zone.com

WebNavigator Tutorial: create a search script to find jobs and cheap hardware in 4 steps



By Simone Colombara [ 24/08/2005 ]

Tutorial: create a search script in 4 steps.

WebNavigator helps you create search agents that retrieve dynamic data from the Web, data that is usually not accessible through Google. Thanks to a scheduling engine, it retrieves results and notifies you of updates via mail, HTML or RSS.
WebNavigator is a free, open source project.


Creating a search script isn't an easy task, and it depends heavily on the structure of the site you are about to query. The following steps give some ideas on how to proceed.

1: Set up the navigation
Define the script name and the commands to be performed on the site to navigate to the results.

<?xml version="1.0" encoding="UTF-8"?>
<webscript name='demo shop sites'>
<gatherdata>
<webcommands name="kelkoo">
<setUp value="http://www.kelkoo.co.uk"/>
<beginAt value="/"/>
<setFormElement name="siteSearchQuery" value="toilet roll"/>
<submit/>

To find the name of the input field, open the page in an HTML editor (we use Mozilla) and click on the INPUT field.
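
If you prefer to list the field names without an editor, the lookup can be sketched in a few lines of Python. This is only an illustration and not part of WebNavigator: the class InputNameCollector, the helper input_names and the sample form are hypothetical, and only the field name siteSearchQuery comes from the script above.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect the name attribute of every <input> element in a page."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lower case.
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def input_names(html):
    """Return the names of all named <input> fields found in html."""
    collector = InputNameCollector()
    collector.feed(html)
    return collector.names


# Hypothetical page fragment standing in for the real search form.
SAMPLE_PAGE = """
<form action="/search" method="get">
  <input type="text" name="siteSearchQuery">
  <input type="submit" value="Go">
</form>
"""

print(input_names(SAMPLE_PAGE))
```

The name you pick from this list is the one to pass to setFormElement.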

2: Define the result area
First, visually identify the result area. Then use an HTML editor to find the exact starting and ending strings that delimit the result area, and use a regular expression to select the whole area.

Use the group expression (.)* to represent the result area.
<result_selectRegEx>
<![CDATA[
<div class="mod_std_sub">(.)*<div id="pages" class="pageDiv">
]]>
</result_selectRegEx>

To check the regular expression you typed, use a regex editor (we use the QuickREx Eclipse plugin).
Verify that the highlighted area is what you expected.
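
The same check can be sketched outside a regex editor. The snippet below uses Python's re module as a stand-in and runs the expression against a made-up page. The sample HTML is hypothetical, and the DOTALL flag is an assumption: how WebNavigator itself compiles the expression is not stated here, but without such a flag "." does not match line breaks, so a multi-line result area would not be selected.

```python
import re

# The expression from step 2. DOTALL is an assumption made for this sketch,
# so that "." also matches the line breaks inside the result area.
RESULT_AREA = re.compile(
    r'<div class="mod_std_sub">(.)*<div id="pages" class="pageDiv">',
    re.DOTALL,
)

# Hypothetical page: a header, two results, then the pager that ends the area.
SAMPLE_PAGE = """<html><body>
<div class="header">Shop</div>
<div class="mod_std_sub">
<div class="width">Item A</div>
<div class="width">Item B</div>
<div id="pages" class="pageDiv">1 2 3</div>
</body></html>"""

match = RESULT_AREA.search(SAMPLE_PAGE)
# group(0) is the whole selected area, from the start string to the end string.
result_area = match.group(0) if match else None
print(result_area)
```

Note that (.)* is greedy: if the ending string occurs more than once in the page, the selection runs up to its last occurrence.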


3: Define the result data structure

This step can be tricky.

If the results are contained in tables, or in rows of a table, use the corresponding command:

<result_define_data_structure_as_tables/>

That is the case here, but for the sake of example let's say that we want to define the result data structure with a regular expression.

<result_define_data_structure_as_regex>
<![CDATA[
<div class="width">\s*
]]>
</result_define_data_structure_as_regex>

This way, we tell the parsing engine that every result runs from one match of <div class="width">\s* to the next match of <div class="width">\s* (we call this a start/end strategy).
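
A minimal Python sketch of the start/end idea follows. It is not WebNavigator's parsing engine: the function split_records and the sample data are hypothetical, and two details are assumptions made for the sketch, namely that each record keeps its opening marker and that the last record runs to the end of the result area.

```python
import re

# The marker expression from the script above.
RECORD_START = re.compile(r'<div class="width">\s*')


def split_records(result_area):
    """Cut result_area into records, each running from one marker match to the next."""
    starts = [m.start() for m in RECORD_START.finditer(result_area)]
    # Assumption: the last record runs to the end of the result area.
    ends = starts[1:] + [len(result_area)]
    return [result_area[s:e] for s, e in zip(starts, ends)]


# Hypothetical result area containing two results.
SAMPLE_AREA = (
    '<div class="mod_std_sub">\n'
    '<div class="width">\n  Item A</div>\n'
    '<div class="width">\n  Item B</div>\n'
)

records = split_records(SAMPLE_AREA)
for record in records:
    print(repr(record))
```

Because only the start marker has to be recognised, this strategy does not need to know where a record's closing tag is.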

It is also possible to define results as a group expression (we call this a group strategy). Note that it is not easy to balance HTML tags using regular expressions.
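
To illustrate why balancing is hard, here is a small Python sketch of a group strategy. The group expression, the function group_records and both samples are hypothetical and only meant to show the pitfall. The expression works on flat markup but stops at the first closing tag when a result contains a nested div.

```python
import re

# Hypothetical group expression: the capturing group is the body of one result.
RECORD_GROUP = re.compile(r'<div class="width">\s*(.*?)\s*</div>', re.DOTALL)


def group_records(result_area):
    """Return the captured body of every result found in result_area."""
    return RECORD_GROUP.findall(result_area)


# Flat markup: every result closes with the first </div> that follows it.
FLAT = '<div class="width"> Item A</div><div class="width"> Item B</div>'

# Nested markup: the inner </div> is mistaken for the end of the result.
NESTED = '<div class="width"> Item A <div>extra</div> tail</div>'

print(group_records(FLAT))    # ['Item A', 'Item B']
print(group_records(NESTED))  # ['Item A <div>extra'], the text "tail" is lost
```

When results contain nested tags of the same kind, the start/end strategy is usually the safer choice.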


4: Upload and run the script

Use the script management menu to upload the script, run it and examine both the whole results and the detailed data parsed.

The whole script that we have created is:

<?xml version="1.0" encoding="UTF-8"?>
<webscript name='demo shop sites'>
<gatherdata>
<webcommands name="kelkoo">
<setUp value="http://www.kelkoo.co.uk"/>
<beginAt value="/"/>
<setFormElement name="siteSearchQuery" value="toilet roll"/>
<submit/>
<result_selectRegEx>
<![CDATA[
<div class="mod_std_sub">(.)*<div id="pages" class="pageDiv">
]]>
</result_selectRegEx>
<result_define_data_structure_as_tables/>
<result_setIfNew/>
</webcommands>
</gatherdata>
</webscript>
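
Before uploading, it can be worth checking that the script is well-formed XML, since a stray space after "<" or a missing bracket is enough to break it. The sketch below does this with Python's standard library and lists the commands in order. It is only a local sanity check under the assumption that the script is plain XML as shown; it knows nothing about which commands WebNavigator accepts, and the function list_commands is hypothetical.

```python
import xml.etree.ElementTree as ET

# The script from this tutorial.
SCRIPT = """<?xml version="1.0" encoding="UTF-8"?>
<webscript name='demo shop sites'>
<gatherdata>
<webcommands name="kelkoo">
<setUp value="http://www.kelkoo.co.uk"/>
<beginAt value="/"/>
<setFormElement name="siteSearchQuery" value="toilet roll"/>
<submit/>
<result_selectRegEx>
<![CDATA[
<div class="mod_std_sub">(.)*<div id="pages" class="pageDiv">
]]>
</result_selectRegEx>
<result_define_data_structure_as_tables/>
<result_setIfNew/>
</webcommands>
</gatherdata>
</webscript>
"""


def list_commands(script_text):
    """Parse the script and return (tag, attributes, text) for each command.

    ET.fromstring raises xml.etree.ElementTree.ParseError if the script
    is not well-formed.
    """
    root = ET.fromstring(script_text.encode("utf-8"))
    commands = []
    for block in root.iter("webcommands"):
        for step in block:
            text = (step.text or "").strip()
            commands.append((step.tag, dict(step.attrib), text))
    return commands


for tag, attributes, text in list_commands(SCRIPT):
    print(tag, attributes, text)
```

If the script parses, the printed list should match the steps you intended, in the same order.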

More information on WebNavigator is available on the project site.

About the author:
WebNavigator web site
http://webnavigator.sourceforge.net
Author: Simone Colombara, scolombara@users.sourceforge.net


Article Source: http://www.Free-Articles-Zone.com


Article tags: webnavigator, dynamic search engine, java