
QA Lab Monitoring with ELK


I got into the ELK (Elasticsearch, Logstash, Kibana) stack when I started working with an IDS called Suricata. I needed a front end to make sense of the events happening on the network, and ELK was a natural fit for Suricata.

ELK provides a way of shipping and analyzing data from various sources. Data events can be logs, system stats (memory, CPU), application data (NGINX, Apache, Suricata, HAProxy), or database statistics (the MySQL slow log, rate of SQL commands, buffer pool efficiency, etc.).

QA Lab Monitoring

I went on to create many dashboards using the ELK stack, including monitors for my QA lab. 

I’ve replicated the process at home using a Debian box/server to run various simulated environments to be monitored. Currently, on my local network at home, I’m monitoring a VM with a MySQL installation that powers a WordPress install. I’m also monitoring the CPU, memory, and disk space of each member of the home lab.

Example Dashboards of the Home lab

QA Lab

For my home lab I used a Debian box/server to set up several VMs. One VM runs as the ELK server, which acts as the central repository for all the data streams. The other VMs run simple apps, simulating QA machines. The non-ELK VMs are monitored with Metricbeat and Filebeat. These agents watch specified logs or data elements and send that data to the ELK server, which processes it into visualizations.


Elastic has several different types of monitoring agents (i.e. Beats). For the full list, refer to Elastic’s official pages. I have experience with Metricbeat, Filebeat, and Packetbeat.

I don’t use Packetbeat, as it tended to have a heavy performance drawdown in my environments; however, it can produce some amazing results while operating as a packet sniffer. It can sniff SIP traffic as well as SQL queries.

Metricbeat monitors system information. By default (without installing modules), it monitors information you might get with top or df – system-level information. It captures the moment-by-moment changes of CPU, memory (free, used, etc.), disk space (free, used), and running processes.
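That default behavior comes from Metricbeat’s built-in system module, which is enabled out of the box. Its configuration lives in modules.d/system.yml and looks roughly like the sketch below (the period and the top-N process counts shown here are adjustable, not required values):

```yaml
# /etc/metricbeat/modules.d/system.yml (excerpt)
- module: system
  period: 10s          # how often to sample
  metricsets:
    - cpu
    - memory
    - filesystem
    - process
  # keep only the heaviest processes to reduce event volume
  process.include_top_n:
    by_cpu: 5
    by_memory: 5
```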

Filebeat is a log parser/shipper. Given a log path, Filebeat captures new entries and ships them to ELK for processing. By default it monitors /var/log/* but can be configured to collect and parse log data from different applications.
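Pointing Filebeat at a custom application log is a small edit to filebeat.yml. A minimal sketch, assuming a recent Filebeat version (which uses the filestream input type; older releases use type: log) – the id and path below are placeholders:

```yaml
# filebeat.yml (excerpt)
filebeat.inputs:
  - type: filestream
    id: qa-app-logs            # hypothetical input id
    enabled: true
    paths:
      - /var/log/qa-app/*.log  # hypothetical application log path
```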


Each Beat has a set of modules that can be installed. These modules not only offer application configuration, but also ship dashboards to Kibana. For example, the MySQL module in Filebeat can be configured to monitor the slow log (noting long queries). Metricbeat also has modules, and its MySQL module reports on the database itself (buffer pool and connection data, counts of INSERTs, DELETEs, etc.).
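Once enabled, the Metricbeat MySQL module gets its connection details from modules.d/mysql.yml. A sketch of that file – the credentials are placeholders, and the DSN follows the user:password@tcp(host:port)/ format the module expects:

```yaml
# /etc/metricbeat/modules.d/mysql.yml (excerpt)
- module: mysql
  metricsets: ["status"]   # server status counters (connections, buffer pool, etc.)
  period: 10s
  # hypothetical monitoring user and password
  hosts: ["qa_monitor:changeme@tcp(127.0.0.1:3306)/"]
```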

Modules are enabled with the format sudo [beat] modules enable [module name], e.g. sudo metricbeat modules enable apache.
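In practice that works out to a couple of commands per machine; the modules list subcommand shows what is available and what is already enabled:

```shell
# See available vs. enabled modules for a beat
sudo metricbeat modules list

# Enable the MySQL module for both beats:
# server-health metrics via Metricbeat, slow/error log parsing via Filebeat
sudo metricbeat modules enable mysql
sudo filebeat modules enable mysql
```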

For a full list of modules per agent, consult the official docs for each Beat.

Installing a Beat

Documentation on installing Beats can be found at Elastic’s main site. Take the documentation on installing Metricbeat as an example. Once Metricbeat is installed (using the steps from the official docs), the metricbeat.yml file needs to be edited.

Briefly speaking: 

  • the Kibana section needs to be uncommented and updated to reflect the ELK server’s IP address. 
  • the Elasticsearch output section needs to be uncommented and updated with the ELK server’s IP, as well as the username/password for the elastic user.
  • if using ELK 8.*, SSL may be required; the self-signed certificate will need to be copied to each monitored machine and referenced in the metricbeat.yml file as well. 
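Put together, the edited sections of metricbeat.yml end up looking something like this sketch – the IP address, password, and certificate path are placeholders for your own environment:

```yaml
# metricbeat.yml (excerpt)
setup.kibana:
  host: "https://192.0.2.10:5601"     # placeholder ELK server IP

output.elasticsearch:
  hosts: ["https://192.0.2.10:9200"]  # placeholder ELK server IP
  username: "elastic"
  password: "changeme"                # placeholder password
  ssl:
    # self-signed CA copied over from the ELK server (8.x)
    certificate_authorities: ["/etc/metricbeat/certs/http_ca.crt"]
```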

Once the metricbeat.yml file is updated, running the command: 

sudo metricbeat setup -e 

will validate the changes and attempt a sample connection. If all works out the Metricbeat service can be started.
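On a systemd-based distro like Debian, starting the service and making it survive reboots looks like:

```shell
# Validate the config and push index templates/dashboards to the stack
sudo metricbeat setup -e

# Start the service now and enable it at boot
sudo systemctl enable --now metricbeat
sudo systemctl status metricbeat
```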

Validating the Beat

One easy way of validating a beat like Metricbeat is to log in to the ELK front end as the superuser generated during install and check the feed. Once logged in, open the Discover page (found under the left nav).

The Discover page should show data flowing through. This data can be filtered down to a specific machine by adding a filter (the + filter) and choosing to filter by agent.hostname. Then click “is” as the second filter value, via the dropdown. This loads a third dropdown field; clicking on it shows all available machines (by their hostname value) sending data to the ELK server. Selecting a value filters the Discover pane down to that machine’s data.
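The same filter can also be typed straight into the Discover search bar using KQL (Kibana Query Language); the hostname below is a placeholder:

```
agent.hostname : "qa-vm-01"
```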

If no data is present, go back to the machine and check the status of the beat, or re-run [beat name] setup -e, which should indicate what issues might be blocking data transmission.
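Beats also ship built-in self-tests, which make that checking much quicker than re-running the full setup:

```shell
# Check the service and its recent logs
sudo systemctl status metricbeat
sudo journalctl -u metricbeat --since "10 min ago"

# Built-in self-tests
sudo metricbeat test config   # validates metricbeat.yml
sudo metricbeat test output   # attempts a connection to Elasticsearch
```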

Creating the QA Dashboard

While many modules and beats provide default dashboards, it may be useful to create a tailored dashboard. I use one that monitors CPU and Memory of each home network VM, including the top processes. 

To create a blank Dashboard: From the left menu, clicking Dashboard and then “Create Dashboard” will open a blank slate. 

Then click “Create Visualization” – this will present the scene below:

Metricbeat is set as the beat (left side). Valid fields are listed under the beat itself. If data is streaming to ELK, clicking on a field will show a popup with sample data.

CPU Monitor

With Metricbeat, CPU usage can be pushed to ELK. A field like system.cpu.system.norm.pct might be a good candidate. If data shows up for it in the popover, dragging it to the center section will automatically load the CPU data. With the timeline on the X axis, the result will look like:

Dragging the agent.hostname field to the visualization will add a right side filter for every machine being monitored and will show their respective line graphs.

Processes by CPU

Using a per-process CPU field in a stacked bar graph will show which processes are consuming the most CPU. This will be an aggregate display of all machines in the network, but filtering on a specific machine will show its respective processes.

Metricbeat can similarly be used to plot free or used memory, disk space, and processes in memory.

Installing modules for a web server can provide even more interesting details. The Metricbeat MySQL module will report data pertaining to server health.

When the command [beat] setup -e is run after installing a module, it uploads various dashboards. Below is an example of several pre-defined dashboards:

A slow log monitor for a database would be useful to spot poor performing queries and map them to various tests run in QA. 

There’s so much that can be monitored in ELK using the Beat agents – from SIP to web, to application logs. It pairs naturally with QA automation, validating NGINX errors, Apache performance, and server metrics while under load or test.