How to extract the anchor text
roger nie
10-29-2009, 02:59 AM
These days I'm working on refining the content of HTML documents. I extract all the hyperlinks and their corresponding anchor text from an HTML document, but to extract the anchor text I just use character matching, such as IndexOf(), and it takes a long time.
Do you know other methods to extract the anchor text of a hyperlink? Please tell me. Thanks!
rberinger
10-29-2009, 12:24 PM
There are several ways of doing this efficiently (jQuery, XPath from DekiScript). If you provide a more specific "road map" of what you're trying to accomplish, I'm sure we can help. What are you doing with the text once you extract it?
I guess he's referring to SGMLReader.
roger nie
11-04-2009, 06:36 AM
I'm working on a meta mobile search engine. I've built the search engine and it works, but on mobile phones the web pages are too big, so I want to extract the main content of each web page for the search results.
By the way, the engine is built in ASP.NET.
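There is no single standard algorithm for extracting the "main content" of a page. One crude heuristic, assumed here and not taken from the thread, is to drop everything inside script, style and head, split the remaining text at block-level tags, and keep only blocks with at least min_words words, so short navigation strips and menus fall away. The tag sets, the threshold of 8 words and the names BlockTextCollector and main_content are arbitrary illustrative choices, again sketched in Python rather than ASP.NET.

```python
from html.parser import HTMLParser

# Illustrative tag sets; real pages may need more tuning
SKIP = {"script", "style", "noscript", "head", "title"}
BLOCK = {"p", "div", "td", "tr", "table", "li", "ul", "ol", "br",
         "h1", "h2", "h3", "h4", "h5", "h6",
         "section", "article", "header", "footer", "nav"}


class BlockTextCollector(HTMLParser):
    """Splits visible text into blocks at block-level tag boundaries."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._current = []
        self._skip_depth = 0  # > 0 while inside script/style/head etc.

    def _flush(self):
        text = " ".join("".join(self._current).split())
        if text:
            self.blocks.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1
        elif tag in BLOCK:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._current.append(data)

    def close(self):
        super().close()  # may deliver trailing buffered text first
        self._flush()


def main_content(html, min_words=8):
    collector = BlockTextCollector()
    collector.feed(html)
    collector.close()
    # Keep only blocks long enough to look like body text
    return [b for b in collector.blocks if len(b.split()) >= min_words]
```

For example, on a page whose body holds a `<div>Home | News | About</div>` menu followed by one long `<p>` paragraph, only the paragraph survives. Heuristics like this are cheap but fragile; pages with short paragraphs may need a lower threshold.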
roger nie
11-04-2009, 06:38 AM
I've used SGMLReader to format the HTML documents, but when I came to the next step, a new question appeared.
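Once SGMLReader has turned a page into well-formed XML, the next step can be an XPath query instead of string matching. In .NET that would typically go through XmlNode.SelectNodes with an expression such as //a[@href]; the sketch below shows the same query with Python's xml.etree.ElementTree, which supports only a subset of XPath. It assumes the cleaned markup has no XML namespace; if the output carries the XHTML namespace, the element names in the query must be namespace-qualified. The function name anchors_from_xhtml is illustrative.

```python
import xml.etree.ElementTree as ET


def anchors_from_xhtml(xhtml):
    """Return (href, text) pairs from a well-formed, namespace-free XHTML string."""
    root = ET.fromstring(xhtml)
    pairs = []
    # ElementTree's limited XPath: every <a> at any depth that has an href attribute
    for a in root.iterfind(".//a[@href]"):
        # itertext() also yields text from nested elements such as <i>
        text = " ".join("".join(a.itertext()).split())
        pairs.append((a.get("href"), text))
    return pairs


doc = ('<html><body><p>Go <a href="/a">first <i>link</i></a>, '
       '<a name="x">no href</a>, <a href="/b">second</a></p></body></html>')
print(anchors_from_xhtml(doc))
# [('/a', 'first link'), ('/b', 'second')]
```

The parsed tree can also be reused for the main-content step, since the same document is then available for further queries without re-scanning the raw HTML.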