OpenVAS 9 Tips for Large Environments

Introduction

The OpenVAS vulnerability scanner (https://www.openvas.org) has a great UI, an up-to-date library of high-quality tests, and permissive licensing. You can purchase a turn-key appliance from Greenbone Networks (https://www.greenbone.net). It is popular among penetration testers and is easy to set up on Kali Linux. Deploying a multi-user OpenVAS system for a large enterprise network presents performance challenges, both when scanning and when reporting. These are some lessons learned from scaling OpenVAS to scan 10,000 nodes.

OpenVAS Structure, Architecture, and Installation

To understand the scaling behavior of OpenVAS, first we need to review the structure of the system. There are three primary daemons: openvassd, gvmd, and gsad. openvassd implements the Nessus Attack Scripting Language (NASL) and uses it to probe the targets. gvmd (called openvasmd in OpenVAS 9 and earlier) orchestrates openvassd, defining scan tasks to run and collecting the results. gsad is a lightweight web front-end to gvmd's XML API, making it easy to click through target definitions and reports.

For large workloads and segmented networks, OpenVAS supports master-slave operation, with one (or more!) master instance(s) requesting scans from many scanner instances. This can be accomplished with remote openvassd daemons directly, or by communication between a master gvmd and a scanner gvmd. The communication between openvassd and gvmd is chatty and has no native authentication, so it's better to use a gvmd-to-gvmd architecture in a distributed deployment. Master servers should run all three daemons, but scanner servers need only run openvassd and gvmd.

OpenVAS is fully multi-tenant. Any number of users can be administrators, who have full control over the features of the software but cannot see or change the data of other administrators. To share data between users, administrators must create permission records that authorize the data sharing. This is also how lower-privileged users gain access to data. Depending on your use case, this security model may be perfect, or it may be a huge administrative hassle. It also has serious performance implications. More on this later.

There are no official binary packages of OpenVAS, and in my opinion, production users should compile from source code. OpenVAS is a complicated system, and compiling from source will ensure you understand the system well enough to troubleshoot it when problems arise. Operators will often need to use strace or gdb to determine the nature of a problem, and compiling from source will give good context for those investigations.
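Each OpenVAS 9 component follows the usual cmake build flow. A minimal sketch, with example version numbers and an example install prefix; build in dependency order (libraries, then scanner, manager, and GSA):

    # repeat for openvas-scanner, openvas-manager, and greenbone-security-assistant
    tar xf openvas-libraries-9.0.3.tar.gz
    cd openvas-libraries-9.0.3
    mkdir build && cd build
    cmake -DCMAKE_INSTALL_PREFIX=/usr ..
    make && sudo make install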
Reporting Performance

Reporting is the first place you'll struggle with OpenVAS performance in a large deployment. Each scan produces 10 to 1,000 line-item results per host, which need to be joined against several other SQL tables to produce a report. This means that SQL performance is the bottleneck that controls the user experience when clicking through the gsad web interface.

First, you certainly should run a distributed master-scanner architecture, even if you only need one scanner node. Scanning and reporting are both highly CPU dependent, and a distributed architecture will protect your reporting performance from the load of scanning.

While the OpenVAS project recommends PostgreSQL for production deployments, experience shows that SQLite is necessary for the best performance in a large environment. This is because gvmd issues hundreds of thousands of SQL queries while displaying a large report, and it does so completely serially. SQL query latency is therefore the dominating factor in reporting performance. PostgreSQL is well known for scaling well under heavy concurrent load, but gvmd never creates a load like that. SQLite has lower latency at these light loads, and therefore performs significantly better than PostgreSQL for OpenVAS.

Single-core performance of the underlying hardware is the next most powerful influence on reporting performance. gvmd uses only one thread per request when calculating reports, so high core counts are unnecessary. Compared to an ordinary Xeon E5-2697v3 VMware ESXi guest VM, bare-metal Linux on a 5.0GHz overclocked i7-8700K system ran OpenVAS reporting workflows approximately twice as fast. Fast hardware is a must for the manager server of a large OpenVAS deployment.

SQLite compiler options can sometimes have a strong impact on the reporting performance of OpenVAS. When running OpenVAS on ESXi, switching from the stock Debian SQLite to upstream SQLite compiled with the CFLAGS used by Clear Linux gave a massive 5x performance increase. Surprisingly, on a 5.0GHz i7-8700K system, the CFLAGS used to compile SQLite had virtually no effect on OpenVAS performance. The systemd unit file option "Environment" can be used to force-load an optimized SQLite via LD_PRELOAD, making it easy to test different CFLAGS settings and determine what works best in a given environment; a sketch of such a drop-in appears at the end of this section. The Clear Linux CFLAGS can be found on benchmark blogs from organizations like Phoronix.

OpenVAS's permissions model is a performance drag in addition to an administrative overhead. In a large environment with many OpenVAS users, it can be beneficial to modify the permissions model to avoid that overhead. A relatively simple patch can change the OpenVAS security model so that all users can read everything without explicit permission records. This reduces the number of SQL queries significantly, for about a 2x performance improvement.

At the end of a scan is a wrap-up phase that takes place on the controlling manager server: all the results are reviewed and summarized into other parts of the database. This causes a huge read IO load, so ensure the manager server has enough RAM to cache the entire SQLite database; 8GB is probably sufficient. It eventually causes a huge write IO load as well, consisting of many sequential writes at a queue depth of 1. It is certainly important to have the manager server on SSD storage, and it's likely that this phase would be improved by running on Intel Optane or Samsung Z-SSD latency-optimized disks.
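As a concrete illustration of the LD_PRELOAD approach, the sketch below builds an upstream SQLite with candidate CFLAGS and preloads it into the manager via a systemd drop-in. The CFLAGS shown, the /opt/sqlite prefix, and the openvas-manager.service unit name are placeholders for whatever your environment actually uses:

    # Build an upstream SQLite (autoconf tarball, version elided) with candidate CFLAGS
    cd sqlite-autoconf-*/
    ./configure --prefix=/opt/sqlite CFLAGS="-O3 -march=native -pipe"
    make && sudo make install

    # Preload it into the manager with a systemd drop-in, e.g.
    # /etc/systemd/system/openvas-manager.service.d/sqlite-preload.conf
    [Service]
    Environment="LD_PRELOAD=/opt/sqlite/lib/libsqlite3.so.0"

    # Then apply:
    #   systemctl daemon-reload && systemctl restart openvas-manager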
Scanning Architecture

The first thing to note about scanning is that reporting performance needs drive the best designs for the scanning workflow. There are several ways to view reports in OpenVAS, but the quickest is to view the scan history for a given task. Therefore, it is important to structure the scan tasks according to anticipated reporting needs. For example, organizing scan tasks according to business unit and system criticality will steer users toward the higher-performance reporting workflows in OpenVAS.

Another way that scan task organization can help performance is by keeping scan tasks small enough. Smaller scan tasks finish sooner, produce fewer results, and therefore feel more responsive when reading the reports. When a task's reports start to approach 50,000 line items, it's a good idea to split that task into two or more smaller tasks.

Have enough scanner nodes. To avoid bloating firewall logs, one scanner node per network zone is a good start. Then add additional nodes to share the load when a given zone's scanner gets too busy; VMs are just fine for scanner nodes.

Scan Performance

The port-scanning phase of a vulnerability scan is dominated by the nmap options used by that task. These can be configured by duplicating the default "Full and Fast" scan config and altering its parameters. Tuning nmap scan performance is a balancing act between accuracy and speed that is discussed in many places; the Nmap book's performance chapter is a great place to start. OpenVAS supports "network-level scanning," but that feature is poorly supported, reduces scan accuracy, and does not improve performance. Stick with the default one-host-at-a-time port scanning and tune the nmap parameters as appropriate.

The testing phase of a vulnerability scan is dominated by the amount of parallelism configured in the task options. With enough target hosts available, openvassd will spawn up to host_limit x test_limit concurrent test processes. The maximum practical parallelism is eventually limited by the single-threaded redis server used to store intermediate results. That limit is about 16 cores in a scanner node, and a practical maximum parallelism is about 16 hosts and 4 tests per host. Most NASL tests are not CPU-intensive, so there is a benefit to running parallelism higher than the scanner CPU count. A small openvassd patch can improve scanner CPU utilization by loosening tight retry spin loops during connection attempts: openvassd calls poll() with a zero timeout in many places, and raising that timeout to 2ms allows the process to sleep, wasting less CPU time.

Future Work

Reporting performance can be further improved by using a higher-performance SQLite replacement such as LiteTree. Initial testing showed a 1.5x to 2x performance improvement compared to standard SQLite on a 5.0GHz i7-8700K system. Further testing is needed to validate the reliability and long-term performance of LiteTree.

Since the maximum effective parallelism of a scan task is limited by the single-threaded nature of redis, improving redis performance may allow more parallelism. Bare-metal systems with high clock speeds and memory frequencies can post redis benchmark numbers double or more those of an ordinary virtualized server. Further testing is needed to determine whether that translates into the ability to utilize more than 16 cores in an OpenVAS scanner node (see the redis-benchmark sketch at the end of this section).

Further testing is also needed to determine which hardware specifications have the most effect on SQLite and redis performance for OpenVAS. High clock speed, high memory frequency, and tight memory timings usually all come in the same package, but detailed testing could reveal which of those settings is most important and allow further tuning.

Database maintenance can affect reporting performance; often a gvmd job will be seen consuming a full CPU core running queries like "delete from report_counts where ..." that never seem to finish. report_counts is a cache that improves performance of the dashboard display, so it is not critical to prune its data so carefully. A small operational improvement can be made by scheduling a cron job that runs "echo 'delete from report_counts;' | sqlite3 /path/to/tasks.db", taking advantage of SQLite's optimization for delete queries without where clauses. Further improvement may be available by removing the "where" clause in the gvmd code, to simply truncate the table every time.
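A sketch of that cron job as a cron.d-style entry run as root; the schedule and the database path are placeholders:

    # /etc/cron.d/openvas-report-counts
    # Nightly at 03:15: empty the report_counts cache using the fast no-WHERE delete path
    15 3 * * * root echo 'delete from report_counts;' | sqlite3 /path/to/tasks.db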
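For the redis hardware comparisons above, redis ships a small benchmarking tool that makes it easy to compare candidate scanner hardware. A sketch, with an example request count and client count:

    # run against a local redis instance on each candidate system and compare the ops/sec numbers
    redis-benchmark -q -n 100000 -c 4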
openvassd 6 (part of OpenVAS 10) does not seem to have the poll() problem, perhaps because of the refactoring that took place.

gsad 8 (part of OpenVAS 10) has been completely rewritten in ReactJS. This shifts much of the reporting burden onto the client browser, which may make it harder to get good responsiveness on large result sets. More testing is needed.

gvmd 9 (currently in development) has dropped SQLite support. Without extensive database improvements, this seems likely to cause serious performance issues. More testing is needed.