Best Practices - Host List Detection

Some background

When API calls pull large sets of data, the backend streams the data in batches to ensure data integrity and to avoid overloading the backend services. As a result, the transfer speed drops briefly while the next batch is retrieved and processed for streaming back to the client. Over the course of a full pull, however, the overall speed averages out.
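Because individual reads can stall between batches, it is more useful to judge a pull by its average rate over the whole transfer than by any single moment. The following is a minimal sketch, assuming the response body is consumed as a binary file-like stream (for example, the object returned by `urllib.request.urlopen()`). The function name `stream_to_file` is hypothetical and is not part of any Qualys API.

```python
import io
import time
from typing import BinaryIO, Callable, Dict


def stream_to_file(source: BinaryIO, dest: BinaryIO,
                   chunk_size: int = 64 * 1024,
                   clock: Callable[[], float] = time.monotonic) -> Dict[str, float]:
    """Copy a streamed response body to dest, chunk by chunk.

    Individual reads may stall while the server prepares its next batch,
    so only the average rate over the whole transfer is reported.
    """
    start = clock()
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # b"" signals the end of the stream
            break
        dest.write(chunk)
        total += len(chunk)
    elapsed = clock() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return {"bytes": total, "seconds": elapsed, "bytes_per_sec": rate}


if __name__ == "__main__":
    # In-memory stand-in for a live HTTP response body.
    fake_response = io.BytesIO(b"x" * 160_000)
    out = io.BytesIO()
    stats = stream_to_file(fake_response, out)
    print(f"{stats['bytes']:.0f} bytes at {stats['bytes_per_sec']:.0f} bytes/s")
```

Writing each chunk to disk as it arrives, rather than buffering the whole response in memory, also keeps client memory use flat no matter how large the pull is.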

You also need to keep in mind the factors that can affect performance on a shared resource. For example, data pulls performed during peak usage will hit congestion and will not be as fast as pulls performed during off-peak hours. Some optional API parameters also add processing before the data is streamed; active_kernels_only is one example.
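If large pulls are automated, a scheduled job can check the clock and start only inside an off-peak window. The following is a minimal sketch. The window hours below are hypothetical placeholders, not published Qualys figures, so choose hours that are actually quiet for your platform and region. The function name `in_off_peak_window` is also hypothetical.

```python
from datetime import datetime, time, timezone

# Hypothetical off-peak window in UTC. These values are placeholders;
# replace them with hours that are quiet for your platform region.
OFF_PEAK_START = time(1, 0)
OFF_PEAK_END = time(5, 0)


def in_off_peak_window(now: datetime,
                       start: time = OFF_PEAK_START,
                       end: time = OFF_PEAK_END) -> bool:
    """Return True if the timezone-aware datetime `now` falls in [start, end) UTC."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    t = now.astimezone(timezone.utc).time()
    if start <= end:
        return start <= t < end
    # The window wraps past midnight, e.g. 22:00-04:00.
    return t >= start or t < end


if __name__ == "__main__":
    if in_off_peak_window(datetime.now(timezone.utc)):
        print("Off-peak: start the large detection pull now.")
    else:
        print("Peak hours: defer the pull to the next off-peak window.")
```

Requiring a timezone-aware datetime avoids silently treating a server's local time as UTC.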

We have been innovating, and will continue to innovate and re-architect, our capabilities for processing large amounts of encrypted data for streaming through the API so that they scale to our customers' needs. Providing customers with all of their vulnerability information as quickly as possible is a primary focus, but every improvement must keep data integrity at the forefront of each release. This takes time, effort, and dedicated resources to ensure that testing covers all aspects. With that in mind, automation, threading, and parallelism are techniques that can help increase the performance of data pulls.

Multi-Threading

When fetching host information in an automated fashion, you can use multi-threading to collect the data in batches for optimum performance.
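The following is a minimal sketch of running batches on a pool of threads with Python's standard `concurrent.futures` module. Threads suit this workload because each call spends most of its time waiting on the network. The per-batch function `fetch_batch` is a hypothetical caller-supplied callable. A real version would request detections for its batch of host IDs and write them to its own output file. The sketch does not show how the Qualys API is called.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Sequence


def fetch_in_parallel(batches: Sequence[Sequence[int]],
                      fetch_batch: Callable[[int, Sequence[int]], str],
                      max_workers: int = 10) -> Dict[int, str]:
    """Run fetch_batch(index, host_ids) for every batch on a pool of threads.

    fetch_batch is hypothetical: it would fetch one batch of host IDs,
    write the result to its own output file, and return that file's path.
    """
    results: Dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_batch, i, batch): i
                   for i, batch in enumerate(batches)}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker thread.
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    def fake_fetch(index: int, host_ids: Sequence[int]) -> str:
        # Placeholder: a real version would call the API for host_ids.
        return f"detections_{index:03d}.xml"

    print(sorted(fetch_in_parallel([[1, 2], [3, 4], [5]], fake_fetch,
                                   max_workers=2).items()))
```

Keying results by batch index keeps the output files traceable to their batches even though batches finish in no fixed order.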

The maximum benefit has been seen when the hosts are divided evenly across the parallel threads. For example, a host detection call that returns 100,000 assets, run with 10 threads in parallel, would benefit most from a batch size of 100,000 / 10 = 10,000. To reduce the chance that one thread slows down the entire process by hitting a congested server, you can split the work further into batches of 5,000 hosts, resulting in 20 output files.
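The batch arithmetic above can be sketched as follows. This is a minimal illustration under one assumption: the full list of host IDs is already known before the detection calls begin. Both function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def batch_size_for(total_hosts: int, threads: int) -> int:
    """Hosts per batch when the hosts are split evenly across the threads."""
    # Integer ceiling division, so no host is left out when the split is uneven.
    return (total_hosts + threads - 1) // threads


def split_into_batches(host_ids: Sequence[T], batch_size: int) -> List[List[T]]:
    """Slice host_ids into consecutive batches of at most batch_size items."""
    return [list(host_ids[i:i + batch_size])
            for i in range(0, len(host_ids), batch_size)]


if __name__ == "__main__":
    host_ids = list(range(1, 100_001))  # stand-in for 100,000 real host IDs
    print(batch_size_for(len(host_ids), 10))          # 10000
    print(len(split_into_batches(host_ids, 10_000)))  # 10
    print(len(split_into_batches(host_ids, 5_000)))   # 20
```

When batches outnumber threads, a thread pool such as Python's `concurrent.futures.ThreadPoolExecutor` hands each idle worker the next waiting batch. A batch that stalls on a congested server therefore holds up only one worker while the others continue.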

Looking for help? See our examples:

Qualys API - Host List Detection API samples - Multithreading (GitHub)