Web Crawling and Exclude Lists

Tell me about web crawling

The web crawler automatically follows links it encounters in web page HTML. It makes requests to the web server just as the web application itself does when users take certain actions. For example, when the crawler encounters a button in a form, it submits a request for the button URI to the web server. These actions by the web crawler could have undesirable effects (a sketch of this crawling behavior follows the examples).

Examples:

1) If the button is designed to "delete all" data, like accounts and configurations, then all data would be deleted.

2) In an administrator web application, an administrator may click a button to change the authentication type for the subscription account, changing the authentication behavior for all users of the web application.
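
As an illustration only, here is a minimal sketch of this link-following and form-submitting behavior in Python, using the third-party requests and BeautifulSoup libraries; the starting URL is hypothetical, and this is not the scanner's actual implementation:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def crawl(url, visited=None):
    if visited is None:
        visited = set()
    if url in visited:
        return
    visited.add(url)
    page = requests.get(url)
    soup = BeautifulSoup(page.text, "html.parser")
    # Follow every anchor link found in the page HTML.
    for a in soup.find_all("a", href=True):
        crawl(urljoin(url, a["href"]), visited)
    # Submit every form it encounters -- this is where a "delete all"
    # button would be triggered, as described in example 1 above.
    for form in soup.find_all("form"):
        action = urljoin(url, form.get("action", url))
        data = {i.get("name"): i.get("value", "")
                for i in form.find_all("input") if i.get("name")}
        requests.post(action, data=data)

crawl("http://example.com/app/")  # hypothetical starting URL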

Tell me about Global Settings

Customers often want to exclude sensitive resources that they do not want scanned for any purpose. Sometimes these exclusions are based on URL data, and occasionally on IP address. You can define global exclusions across the whole subscription (all web applications), as well as exclude testing based on IP address.
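
As a rough sketch of how such a global check could behave, the following Python snippet tests a URL and its resolved IP against subscription-wide exclusions; the entry values and function name are assumptions for illustration, not the product's configuration syntax:

import ipaddress
from urllib.parse import urlparse

# Hypothetical subscription-wide exclusions (illustrative values only).
GLOBAL_URL_EXCLUSIONS = ["/admin/", "/billing/"]
GLOBAL_IP_EXCLUSIONS = [ipaddress.ip_network("10.0.5.0/24")]

def globally_excluded(url, resolved_ip):
    # Excluded if the URL path contains any excluded fragment.
    if any(fragment in urlparse(url).path for fragment in GLOBAL_URL_EXCLUSIONS):
        return True
    # Excluded if the resolved host address falls in an excluded network.
    return any(ipaddress.ip_address(resolved_ip) in net
               for net in GLOBAL_IP_EXCLUSIONS)

print(globally_excluded("http://example.com/admin/users", "10.0.5.12"))  # True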

What about exclude/allow lists?

It is best practice to define exclude and allow lists to control which parts of the web application will and will not be scanned. You can also add comments to explain why specific exclude list or allow list entries were created. The combined behavior of the two lists is sketched after the list below.

- Set up an exclude list to identify the URLs you do not want our service to scan. Any link that matches an exclude list entry will not be scanned unless it also matches an allow list entry.

- Set up an allow list to identify the URLs you want to be sure our service will scan. When you set up an allow list only (no exclude list), no links will be crawled unless they match an allow list entry.
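
Taken together, the two rules reduce to a small decision function. Here is a sketch in Python, assuming simple substring matching (the service's actual matching rules may differ):

def should_crawl(url, exclude_list, allow_list):
    # An allow list entry overrides a matching exclude list entry.
    if any(entry in url for entry in allow_list):
        return True
    # A link matching an exclude list entry is not scanned.
    if any(entry in url for entry in exclude_list):
        return False
    # With an allow list only, unmatched links are not crawled.
    if allow_list and not exclude_list:
        return False
    return True

print(should_crawl("http://example.com/reports/1", ["/reports/"], []))  # False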

What about POST data exclude lists?

It is good practice to define a POST data exclude list to block form submission for POST requests in your web application, since submitting forms during a scan can have unwanted side effects such as mass emailing.

- Set up a POST data exclude list to identify POST requests whose body you want to block from form submission. Our service blocks form submission for any POST request with data that matches a specified entry, and does not submit the blocked POST data (for example, form fields) during any scan phase.

Example:

Consider the following POST request for a web application:

POST /bodgeit/contact.jsp HTTP/1.1
Host: www.qualys.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/50.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Referer: http://www.qualys.com/bodgeit/contact.jsp
Cookie: JSESSIONID=35D77CBC6EC093213D4840DA053A2BFF
Content-Length: 78
Content-Type: application/x-www-form-urlencoded
Connection: close

user=johndoe&yourname=John&comments=hello%20there&anticsrf=0.19000114507535215

where the body of the request is:

user=johndoe&yourname=John&comments=hello%20there&anticsrf=0.19000114507535215

To prevent the scanner from sending this particular request, you need to specify something in the POST Data Exclude List that is unique to this request body. In this case, you could enter "yourname". You may not want to enter "anticsrf", as that token would very likely appear in all POST requests for this web application and would therefore block every form submission.
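
In code, the blocking check could look like the following sketch, which assumes plain substring matching against the raw request body (an assumption for illustration):

# Hypothetical exclude list entry unique to the contact form above.
POST_DATA_EXCLUDE_LIST = ["yourname"]

def should_submit(post_body):
    # Block form submission when any exclude entry appears in the body.
    return not any(entry in post_body for entry in POST_DATA_EXCLUDE_LIST)

body = "user=johndoe&yourname=John&comments=hello%20there&anticsrf=0.19000114507535215"
print(should_submit(body))  # False -- this request is never sent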

What about logout regular expressions?

It is good practice to define a logout regular expression to ensure that the logout links of your web application will not be scanned; following a logout link during an authenticated scan would end the session.

- Set up logout regular expressions to identify the logout links you want to exclude from scanning. Any link that matches a logout regular expression entry will not be scanned, as sketched below.
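
For example, such a check could be applied like this; the pattern shown is an assumption for the sketch, not a recommended expression:

import re

# Hypothetical pattern matching common logout link forms.
LOGOUT_REGEX = re.compile(r"logout|signoff|log_out", re.IGNORECASE)

def is_logout_link(url):
    # Links matching the logout expression are skipped during crawling.
    return bool(LOGOUT_REGEX.search(url))

print(is_logout_link("http://example.com/account/logout.jsp"))  # True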

What about parameters?

It is good practice to exclude selected parameters from testing to improve a scan’s efficiency and effectiveness. Exclusions can be defined for URL parameters, request body parameters, or cookies.
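
As an illustration, excluding a URL parameter could amount to stripping it before the request is tested; the parameter names below are hypothetical:

from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Hypothetical URL parameters excluded from testing.
EXCLUDED_URL_PARAMS = {"sessionid", "csrf_token"}

def strip_excluded_params(url):
    # Drop excluded query parameters so they are never fuzzed in tests.
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k not in EXCLUDED_URL_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

print(strip_excluded_params("http://example.com/page?id=7&sessionid=abc"))
# prints http://example.com/page?id=7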