code spelunking

Q: 'What's under this rock?' A: 'more rocks...'

Leveraging Systemd Cgroup Integration to Provide SLAs on Fedora 18 & 19


Background

In the not-so-distant past, enterprise data centers would create silos for specific services to guarantee some metric of performance, or “Service Level Agreement” (SLA). However, this approach can be costly to create and maintain.

Enter the modern era of cloud computing, and one might wonder, “Why not just put it in a VM?” For some use cases this works just fine, because the resulting performance is “good enough”. Despite this flexibility, there are many cases where this approach simply won’t meet the required level of performance. I won’t elaborate on the details, but it doesn’t take Schrödinger math to figure this out, because sometimes the cat is dead even before you peek into the box. ;-)

Thus, in this post we will explore leveraging systemd cgroup integration to provide SLAs on Fedora.


References


Prerequisites

  • Fedora 18 or 19 box(es).
  • Make certain you’ve read the references, as I may gloss over some details in this post.

Getting Started

First, choose a service that has been integrated with systemd and that you plan to tune. In this example I will use ‘condor’, but you could use any service you like.

 sudo yum install condor

NOTE: You could do this with raw cgroups, but it becomes difficult to guarantee performance unless every service is in a group. Because systemd already places each service in its own cgroup, it does a lot of the heavy lifting for us.

Next, determine the performance metrics you want to provide for that service. For simplicity, let’s say we want to carve off 50% of the CPU for condor. You can also play with disk I/O bandwidth and network settings, but I will leave those for another post, as this is complicated enough.

In order to divide up your machine you will first need to determine the existing shares on your machine. This can be done by dumping the current cgroup settings to a file which can then be analyzed to determine the new settings.

cgsnapshot -s > cgroup_snap.conf 

If you have a fairly basic setup, you will notice the following pattern:

# Configuration file generated by cgsnapshot
mount {
    cpuset = /sys/fs/cgroup/cpuset;
    cpu = /sys/fs/cgroup/cpu,cpuacct;
    cpuacct = /sys/fs/cgroup/cpu,cpuacct;
    memory = /sys/fs/cgroup/memory;
    devices = /sys/fs/cgroup/devices;
    freezer = /sys/fs/cgroup/freezer;
    net_cls = /sys/fs/cgroup/net_cls;
    blkio = /sys/fs/cgroup/blkio;
    perf_event = /sys/fs/cgroup/perf_event;
}

group system {
    cpu {
            cpu.rt_period_us="1000000";
            cpu.rt_runtime_us="0";
            cpu.cfs_period_us="100000";
            cpu.cfs_quota_us="-1";
            cpu.shares="1024";
    }
    cpuacct {
            cpuacct.usage="147354515620554";
    }
}

group system/condor.service {
    cpu {
            cpu.rt_period_us="1000000";
            cpu.rt_runtime_us="0";
            cpu.cfs_period_us="100000";
            cpu.cfs_quota_us="-1";
            cpu.shares="1024";
    }
    cpuacct {
            cpuacct.usage="146844720798260";
    }
}

... (the remaining *.service groups look about the same)
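If you would rather not eyeball a long snapshot, a small awk filter can pull out just the cpu.shares values. This is a minimal sketch that assumes the cgsnapshot layout shown above (a `group NAME {` header followed by `key="value";` lines); the function name list_cpu_shares is my own, not part of libcgroup.

```shell
# Print "<group> <cpu.shares>" for every group in a cgsnapshot file.
# Assumes the layout shown above; mount and other sections are ignored.
list_cpu_shares() {
    awk '
        # Remember which group the following settings belong to
        $1 == "group" { group = $2 }
        # A line such as: cpu.shares="1024";
        $1 ~ /^cpu\.shares=/ {
            value = $1
            sub(/^cpu\.shares="/, "", value)   # drop the key and opening quote
            sub(/";?$/, "", value)              # drop the closing quote and semicolon
            print group, value
        }
    ' "$1"
}
```

For example, `list_cpu_shares cgroup_snap.conf` prints one line per group, such as `system/condor.service 1024`.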

Analyzing your Configuration

One thing you will notice is that, by default, systemd creates an implied hierarchy on your machine in which every service gets the same cpu.shares value (1024). This means that when all services are contending for resources, each service gets an equal share of the CPU.

Let’s elaborate on shares a bit. Say you have two services, S(a) with 1 share and S(b) with 3 shares, each running multiple processes that all contend for CPU:

%CPU[S] = S.cpu.shares / (sum of cpu.shares of all services at the same level)
%CPU[S(a)] = 1/(1+3) = 25%
%CPU[S(b)] = 3/(1+3) = 75%
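The same arithmetic as a tiny shell helper, using awk for the floating-point division. This is a sketch: share_pct is a hypothetical name used only for this post. It takes the cpu.shares of every sibling at one level and prints each one's percentage when all of them are busy.

```shell
# Print each sibling's CPU percentage under full contention:
# its cpu.shares divided by the sum of all shares at that level.
share_pct() {
    awk -v list="$*" 'BEGIN {
        n = split(list, s, " ")
        for (i = 1; i <= n; i++) total += s[i]
        for (i = 1; i <= n; i++)
            printf "%.2f%s", 100 * s[i] / total, (i < n ? " " : "\n")
    }'
}

# S(a) = 1 share, S(b) = 3 shares, as in the example above
share_pct 1 3    # prints: 25.00 75.00
```

With three services at the default 1024, `share_pct 1024 1024 1024` prints 33.33 for each of them.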

So now let’s extend this idea and create a simple hierarchy with two groups, each containing two services:

            cpu.shares  Overall %
Group 1     1           25%
    S(a)    1           12.5%
    S(b)    1           12.5%
Group 2     3           75%
    S(c)    3           56.25%
    S(d)    1           18.75%
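To double-check a table like this, multiply the fractions level by level: the group's fraction of the top level times the service's fraction within its group. A minimal sketch; nested_pct is a hypothetical helper and only handles one level of nesting.

```shell
# Overall CPU % of a service nested one level below the top:
#   (group shares / sum of top-level shares) * (service shares / sum of shares in that group)
# Arguments: group_shares top_total service_shares group_total
nested_pct() {
    awk -v g="$1" -v gt="$2" -v s="$3" -v st="$4" \
        'BEGIN { printf "%.2f\n", 100 * (g / gt) * (s / st) }'
}

# Reproduce the table: top-level total is 1 + 3 = 4,
# Group 1 children total 1 + 1 = 2, Group 2 children total 3 + 1 = 4.
echo "S(a) $(nested_pct 1 4 1 2)"    # S(a) 12.50
echo "S(b) $(nested_pct 1 4 1 2)"    # S(b) 12.50
echo "S(c) $(nested_pct 3 4 3 4)"    # S(c) 56.25
echo "S(d) $(nested_pct 3 4 1 4)"    # S(d) 18.75
```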

Hopefully this is intuitive, but it can quickly go to plaid. Therefore, it’s important to have a handle on how many services you have planned for a given machine, and on your intended hierarchy. The cost of reliable performance is extra complexity, which isn’t so bad provided you’ve done your math.


Altering your Configuration

So now let’s provision condor such that it has 50% of the CPU. First we need to count the number of services that exist on the machine.

$ cgsnapshot -s | grep '[.]service' | wc -l
30

As you can see from the previous example, and from the documentation, the default cpu.shares given to a service is 1024. Condor is one of those 30 services, so it competes against 29 others at 1024 each. Thus, if we want 50% CPU:

.50 = condor.cpu.shares/(1024*29 + condor.cpu.shares)
512*(29) + .50*condor.cpu.shares = condor.cpu.shares
14848 = (1-.50)*condor.cpu.shares 
condor.cpu.shares = 14848/.50 = 29696
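Solving that equation for the share value in general gives shares = target * others * 1024 / (1 - target). As a sketch in shell integer arithmetic (required_shares is a hypothetical helper; the target is a whole percentage below 100, and the result is truncated):

```shell
# cpu.shares a service needs to get TARGET percent of the CPU when it
# competes with OTHERS sibling services that each keep DEFAULT shares:
#   target/100 = x / (others*default + x)  =>  x = target*others*default / (100 - target)
required_shares() {
    local target=$1 others=$2 default=${3:-1024}
    echo $(( target * others * default / (100 - target) ))
}

required_shares 50 29    # condor vs. 29 other services at 1024: prints 29696
```

`required_shares 75 29` already asks for 89088 shares, and every answer changes whenever a service is added or removed.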

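Seem nutty? GOOD! Don’t do it this way: that number has to be recomputed every time a service is added or removed. Instead, let’s promote condor to its own top-level group, next to the system group. Because system also has cpu.shares=1024 (see the snapshot above), giving condor 1024 shares splits the CPU 50/50 between condor and everything under system whenever both are busy, no matter how many services live there (assuming system is the only other top-level group competing for CPU):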

vim /usr/lib/systemd/system/condor.service

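[Service]
# Own top-level group in the cpu hierarchy (instead of system/condor.service)
ControlGroup=cpu:/condor
# Same weight as the top-level system group: 50/50 when both are busy
CPUShares=1024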

After saving the file, reload systemd’s unit files and restart condor, then verify that it worked.

systemctl daemon-reload
systemctl restart condor.service

You may need to clean out leftover cgroups from the old layout (or simply reboot). Afterwards, take a new snapshot and compare it with the first one; condor should now show up as its own top-level group.

cgsnapshot -s > cgroup_snap_2.conf
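Besides comparing snapshots, you can check which cpu cgroup condor's main process landed in. The helper below is a sketch: cpu_cgroup_of is a hypothetical name, and it assumes the standard /proc/PID/cgroup line format `hierarchy-id:controller-list:path`. The main PID comes from `systemctl show -p MainPID`.

```shell
# Print the cgroup path of a process in the hierarchy that has the
# "cpu" controller, reading a /proc/PID/cgroup style file. Lines look like
#   4:cpuacct,cpu:/condor
cpu_cgroup_of() {
    awk -F: '{
        n = split($2, ctl, ",")          # controller list, e.g. cpuacct,cpu
        for (i = 1; i <= n; i++)
            if (ctl[i] == "cpu") { print $3; exit }
    }' "$1"
}

# On the Fedora box:
#   pid=$(systemctl show -p MainPID condor.service | cut -d= -f2)
#   cpu_cgroup_of "/proc/$pid/cgroup"    # should print /condor after the change
```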

Next you will want to submit a whole bunch of condor jobs, and try to load down the other services. To verify that your machine is behaving as expected you can run:

systemd-cgtop
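systemd-cgtop gives a live view. If you want a number you can log, you can also sample the cumulative cpuacct.usage counters (in nanoseconds) of the condor group and of the hierarchy root, whose counter includes every descendant. This sketch assumes the cpu,cpuacct mount point shown in the snapshot and the /condor group created above; usage_pct and sample_condor_pct are hypothetical names. It reports condor's share of the CPU time actually consumed, and since cpu.shares only take effect under contention, condor can legitimately exceed 50% while the other services are idle.

```shell
# Percentage of consumed CPU time that went to one group between two readings
# of the cumulative cpuacct.usage counters.
# Arguments: group_before group_after total_before total_after
usage_pct() {
    awk -v g0="$1" -v g1="$2" -v t0="$3" -v t1="$4" \
        'BEGIN { printf "%.1f\n", 100 * (g1 - g0) / (t1 - t0) }'
}

# Read the condor group and the root counters INTERVAL seconds apart.
sample_condor_pct() {
    local root=${1:-/sys/fs/cgroup/cpu,cpuacct} interval=${2:-10}
    local g0 t0 g1 t1
    g0=$(cat "$root/condor/cpuacct.usage"); t0=$(cat "$root/cpuacct.usage")
    sleep "$interval"
    g1=$(cat "$root/condor/cpuacct.usage"); t1=$(cat "$root/cpuacct.usage")
    usage_pct "$g0" "$g1" "$t0" "$t1"
}
```

Run `sample_condor_pct` while the load test is going; with both condor and the system services saturating the CPU, it should print a value near 50.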

With 30 services it can be difficult to test accurately that condor really gets 50%, so I recommend promoting a couple of services to the top level and having them compete in a controlled experiment.


In Summary

Systemd’s integration with cgroups is a many-splendored thing, and when used correctly it gives administrators and developers another tool with which to provide SLAs in their data centers.
