Wrapper Learning Algorithm
usage data.
Web content mining: the mining, extraction, and integration of useful data, information, and knowledge from Web page contents.
Wrapper: a program for extracting structured data.
Sample
Extract the phone number from the following HTML code.
Name: Joels <p> Phone: <i> (310) 777-1111 </i><p>
R1: SkipTo(<i>)
This rule tells the system to start at the beginning of the page and skip every token until it reaches the first <i> tag. The <i> tag is called a landmark.
Similarly, to identify the end of the text to be extracted, we can use:
R2: SkipTo(</i>)
R1 is called the start rule and R2 is called the end rule.
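
The start and end rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the wrapper system's actual implementation: the names tokenize, skip_to, and extract are hypothetical, the tokenizer simply splits the page into HTML tags and whitespace-separated words, and the end rule is applied forward from the point where the start rule stopped.

```python
import re


def tokenize(page):
    """Split a page into HTML tags and whitespace-separated text tokens.

    Assumption: this simple tokenization is enough for the sample page.
    """
    return re.findall(r"<[^>]+>|[^<\s]+", page)


def skip_to(tokens, landmark, start=0):
    """Skip tokens from `start` until `landmark` is seen.

    Returns the index just after the landmark, or None if it never occurs.
    """
    for i in range(start, len(tokens)):
        if tokens[i] == landmark:
            return i + 1
    return None


def extract(page, start_landmark, end_landmark):
    """Apply a start rule SkipTo(start_landmark) and an end rule
    SkipTo(end_landmark), returning the text between the two landmarks."""
    tokens = tokenize(page)
    begin = skip_to(tokens, start_landmark)        # R1: start rule
    if begin is None:
        return None
    end = skip_to(tokens, end_landmark, begin)     # R2: end rule (applied forward here)
    if end is None:
        return None
    # `end` points just past the end landmark, so drop the landmark itself.
    return " ".join(tokens[begin:end - 1])


page = "Name: Joels <p> Phone: <i> (310) 777-1111 </i><p>"
phone = extract(page, "<i>", "</i>")
print(phone)  # (310) 777-1111
```

If either landmark is missing from the page, extract returns None instead of guessing, which makes a failed rule easy to detect.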