Class JsoupBasedHtmlParser
java.lang.Object
org.apache.jmeter.protocol.http.parser.BaseParser
org.apache.jmeter.protocol.http.parser.HTMLParser
org.apache.jmeter.protocol.http.parser.JsoupBasedHtmlParser
- All Implemented Interfaces:
LinkExtractorParser
Parser based on JSOUP
- Since:
- 2.10
TODO: Factor out common code between LagartoBasedHtmlParser and this one (adapter pattern)
-
Field Summary
Fields inherited from class org.apache.jmeter.protocol.http.parser.HTMLParser
ATT_ARCHIVE, ATT_BACKGROUND, ATT_CODE, ATT_CODEBASE, ATT_DATA, ATT_HREF, ATT_IS_IMAGE, ATT_REL, ATT_SRC, ATT_STYLE, ATT_TYPE, DEFAULT_PARSER, ICON, IE_UA, IE_UA_PATTERN, PARSER_CLASSNAME, PRELOAD, SHORTCUT_ICON, STYLESHEET, TAG_APPLET, TAG_BASE, TAG_BGSOUND, TAG_BODY, TAG_EMBED, TAG_FRAME, TAG_IFRAME, TAG_IMAGE, TAG_INPUT, TAG_LINK, TAG_OBJECT, TAG_SCRIPT
-
Constructor Summary
-
Method Summary
Modifier and Type: Iterator<URL>
Method: getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, URLCollection coll, String encoding)
Description: Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, javascript files, applets, etc.
Methods inherited from class org.apache.jmeter.protocol.http.parser.HTMLParser
extractIEVersion, getEmbeddedResourceURLs, getEmbeddedResourceURLs, isEnableConditionalComments, normalizeUrlValue
Methods inherited from class org.apache.jmeter.protocol.http.parser.BaseParser
getParser, isReusable
-
Constructor Details
-
JsoupBasedHtmlParser
public JsoupBasedHtmlParser()
-
-
Method Details
-
getEmbeddedResourceURLs
public Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, URLCollection coll, String encoding) throws HTMLParseException
Description copied from class: HTMLParser
Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, javascript files, applets, etc.
All URLs should be added to the Collection.
Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.
N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.
- Specified by:
getEmbeddedResourceURLs in class HTMLParser
- Parameters:
userAgent - User Agent
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
coll - URLCollection
encoding - Charset
- Returns:
- an Iterator for the resource URLs
- Throws:
HTMLParseException - when parsing the html fails
-
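The contract described above (decode the HTML bytes, collect the resources a browser would fetch automatically, resolve each against the base URL) can be illustrated with a rough, JDK-only sketch. This is not JMeter's implementation and does not use jsoup: the class name EmbeddedResourceSketch, the method embeddedResourceUrls, and the regular expression are hypothetical, and the sketch only looks at img/script src and link href attributes. Unlike the real method, it takes the base URL as a String and silently skips malformed URLs instead of reporting them.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of JMeter or jsoup.
class EmbeddedResourceSketch {

    // Assumption: a deliberately simplified pattern that captures the src of
    // <img>/<script> tags (group 1) and the href of <link> tags (group 2).
    private static final Pattern RESOURCE = Pattern.compile(
            "<(?:img|script)\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']"
            + "|<link\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Decodes the HTML with the given charset name, then returns the embedded
    // resource URLs in document order, resolved against baseUrl.
    static Iterator<URL> embeddedResourceUrls(byte[] html, String baseUrl, String encoding) {
        URL base;
        try {
            base = new URL(baseUrl);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Invalid base URL: " + baseUrl, e);
        }
        String text = new String(html, Charset.forName(encoding));
        List<URL> urls = new ArrayList<>();
        Matcher m = RESOURCE.matcher(text);
        while (m.find()) {
            String value = m.group(1) != null ? m.group(1) : m.group(2);
            try {
                // Relative references are resolved against the page's base URL.
                urls.add(new URL(base, value));
            } catch (MalformedURLException e) {
                // Simplification: malformed URLs are skipped in this sketch.
            }
        }
        return urls.iterator();
    }

    public static void main(String[] args) {
        String page = "<html><head><link rel=\"stylesheet\" href=\"/css/site.css\"></head>"
                + "<body><img src=\"img/logo.png\"></body></html>";
        Iterator<URL> it = embeddedResourceUrls(
                page.getBytes(Charset.forName("UTF-8")),
                "http://example.com/app/index.html", "UTF-8");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

A real HTML parser such as jsoup handles malformed markup, unquoted attributes, and the base element far more reliably than a regular expression, which is presumably why this class delegates to one.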