Splunk on VMware and EMC ScaleIO – A quick index performance test

For the last few weeks I’ve been getting acquainted with Splunk, a powerful tool for searching, analysing and visualising logs and events from your infrastructure, live application performance and any other type of machine-generated data. I read the performance blog post that Splunk had previously done on physical bare-metal hardware and Amazon EC2 instances, and wanted to see what I could get in a virtual environment on top of EMC’s scale-out block storage ScaleIO (which I’ve written several posts on here).

Generally speaking, virtualising Splunk has been frowned upon because Splunk consumes a lot of resources, and that consumption grows as you add more data ingestion and more searches. Physical bare-metal servers have been the de facto standard for Splunk servers for years, but I still wanted to see what we could do with virtual instances of it. Here’s the setup:

4 Splunk 6.0 servers, configured in a VMware environment with 12 vCPUs and 12 GB RAM as is recommended in the Splunk Enterprise installation guide.
Each Splunk server has a ScaleIO volume attached to it for the entire /opt/splunk directory, containing the Splunk installation and all log and index files.
These ScaleIO volumes are running on top of EMC’s XtremSF PCIe Flash cards.

For the tests I used a standard tool for performance testing of Splunk, namely Splunkit. This tool can be used for generating a large log file, which can then be tested and indexed by Splunk itself.

To configure Splunkit like I did, edit the file called “pyro.properties” like this:

### SPLUNKIT PROPERTIES ###
# SPLUNK_HOME, the absolute path to the Splunk installation on this machine,
# e.g: on Linux: /home/user/splunk, usually ending in "/splunk"
# e.g. on Windows: C:\Program Files\Splunk

SPLUNK_HOME = /opt/splunk

# Host or IP of the server machine (this machine), as seen by the SplunkIt search user
# Server test process will bind to this address
# User's server_host (defined in splunkit-user/pyro.properties) must match this for proper test operation
# If left blank, will default to this machine's hostname

server_host = 127.0.0.1

# Admin-level login credentials of the Splunk instance
username = admin
password = yourpasswordhere

static_filesize_gb = 150

Then, create the log file by running the following command in the splunkit-server directory:

python bin/gendata.py

When the data has been generated, start the index test by running this command in the same directory:

python bin/indextest.py

Now log in to your Splunk instance and go to the Splunk-on-Splunk tab, and you should see something like this:

[Screenshot: Splunk-on-Splunk graph of the estimated indexing rate]

That graph shows the current estimated indexing rate, which is always interesting (this one shows close to 30,000 KB/sec). But if you want to compare your indexing performance to other benchmarks, you can click the “View results” link to get to another search, and enter the following search term:

index=_internal host="localhost.localdomain" source="*metrics.log" eps="*" group=per_index_thruput series=splunkit_idxtest

This will give you a view of your current “eps”, events per second, which you can then compare to other benchmarks like the ones I mentioned in the beginning of this post.
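
If you would rather work out the average yourself from raw metrics.log lines, something like the following sketch could do it. It assumes the usual comma-separated key=value layout of metrics.log; the sample lines and the function names are made up for illustration and are not measurements from this test.

```python
import re

# Matches key=value pairs such as eps=88000.0 or series="splunkit_idxtest"
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')


def parse_metrics_line(line):
    """Return the key=value fields of one metrics.log line as a dict."""
    return {key: value.strip('"') for key, value in PAIR_RE.findall(line)}


def average_eps(lines, series="splunkit_idxtest"):
    """Average the eps field over per_index_thruput lines for one series."""
    values = []
    for line in lines:
        fields = parse_metrics_line(line)
        if (
            fields.get("group") == "per_index_thruput"
            and fields.get("series") == series
            and "eps" in fields
        ):
            values.append(float(fields["eps"]))
    # None signals that no matching lines were found
    return sum(values) / len(values) if values else None


# Made-up sample lines (hypothetical values, not results from this post)
sample = [
    '01-20-2014 10:00:00.000 +0100 INFO Metrics - group=per_index_thruput, '
    'series="splunkit_idxtest", kbps=29000.0, eps=88000.0, kb=899000.0, ev=2728000',
    '01-20-2014 10:00:31.000 +0100 INFO Metrics - group=per_index_thruput, '
    'series="splunkit_idxtest", kbps=30000.0, eps=90000.0, kb=930000.0, ev=2790000',
    '01-20-2014 10:00:31.000 +0100 INFO Metrics - group=per_index_thruput, '
    'series="_internal", kbps=1.0, eps=10.0, kb=31.0, ev=310',
]

print(average_eps(sample))
```

Lines from other indexes (like the _internal one in the sample) are skipped, so only the Splunkit test index contributes to the average.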

So what eps values did I get out of my virtualised Splunk Enterprise environment? Pretty good ones, I must say. And note that this is on ScaleIO shared scale-out block storage, not individual independent local drives in each server. Also, it’s one volume per server, not a striped volume across multiple virtual drives. So no LVM or anything like that, and regular ext4 filesystems without any tuning. Your basic server setup, so to speak.

System	Splunk Version	Virtual Hardware	Average EPS
Splunk-Index1	6.0	12 vCPUs, 12 GB RAM	86931 eps
Splunk-Index2	6.0	12 vCPUs, 12 GB RAM	90242 eps
Splunk-Index3	6.0	12 vCPUs, 12 GB RAM	87199 eps
Splunk-Index4	6.0	12 vCPUs, 12 GB RAM	92792 eps
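
To put the four results side by side, here is a small sketch that computes the mean and the spread from the numbers in the table above. The combined figure is only meaningful under the assumption that all four indexers were running their tests at the same time, which is my assumption for this example.

```python
# Average EPS per indexer, copied from the table above
results = {
    "Splunk-Index1": 86931,
    "Splunk-Index2": 90242,
    "Splunk-Index3": 87199,
    "Splunk-Index4": 92792,
}

# Combined rate, assuming all four indexers ran concurrently
total_eps = sum(results.values())

# Mean across the four servers
mean_eps = total_eps / len(results)

# Difference between the fastest and the slowest server
spread = max(results.values()) - min(results.values())

print(f"combined: {total_eps} eps")
print(f"mean:     {mean_eps:.0f} eps")
print(f"spread:   {spread} eps ({spread / mean_eps:.1%} of the mean)")
```

The spread between the fastest and slowest server is a few thousand eps, so the four virtual machines behave fairly evenly on the shared storage.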

So as you can see, we’re surpassing the performance numbers of the tests mentioned before, which is great! However, it will be even more interesting when we continue to do massive log input and then add searches on top, to see if we can maintain performance or not. And according to the performance numbers we get from the ScaleIO environment (see below), we’re nowhere near saturated on disk right now, which hopefully means that we can squeeze in the searches without a heavy impact on the indexing performance.

[Screenshot: ScaleIO performance numbers]