code spelunking

Q: 'Whats under this rock?' A: 'more rocks...'

Getting Started With Mesos on Fedora 21 and CentOS 7

Background

For decades now, computer scientists have debated how to coordinate groups of heterogeneous compute resources to solve a set of domain-specific problems. “Scheduling” was the catch-all moniker used to describe this space. This category of problems is old, so the scheduling universe is vast and expansive.

The most recent generation of schedulers strives to address the problem of coordinating several distributed applications across a data center. The reason I find this interesting is that many distributed applications reinvent aspects of a “scheduler”, often without realizing the depth and breadth of the domain they just stepped into. There is a great ACM article that highlights this point: “There’s Just No Getting around It: You’re Building a Distributed System”.

Nowadays, we’re seeing a Cambrian explosion of software stacks, each reinventing pieces of the scheduling wheel. That’s all well and good, but there is a much easier way. Enter Apache Mesos, a cluster manager that provides efficient resource isolation and sharing across distributed applications.

At its core, Mesos is a focused meta-scheduler that provides primitives to express a wide variety of scheduling patterns and use cases. Solutions are written atop Mesos and are targeted at a particular use case. By remaining focused at its core, Mesos is not architecturally encumbered by the domain-specific problems that often exist within other monolithic schedulers.


Overview

I find that first impressions of software are pretty important, and if I have to spend hours setting up an application, that usually colors my perspective on the technology.

Therefore, in this post I will run through how simple it is for you to get started with Mesos on Fedora 21 and CentOS 7. I’ll leave HA deployments for another post, because I want to outline just how simple it is to set up for “trying it out”.


Prerequisites

CentOS 7

Currently, the dependent packages have not been fully pulled into the CentOS 7 or EPEL channels, but I’ve enabled the mesos.spec to build a bundled distribution for those who want to run on CentOS 7. For convenience, rpms can be found here.

But for those who want to rebuild it for themselves, you can download the srpm and run:

$ mock --clean --init -r epel-7-x86_64 --rebuild mesos-0.20.0-2.f421ffd.fc21.src.rpm

You will also need to update your docker installation to 1.X.

Multi-Node Cluster

  • If you are setting up a multi-node cluster, it is recommended that you have DNS set up.
  • You may also want to alter your firewall settings.

Installation

$ sudo yum install mesos python-mesos mesos-devel mesos-java

NOTE: If you are seeing missing dependencies such as (protobuf-python) on CentOS 7, then you do not have the epel repositories installed correctly.


Setup

Make sure docker is running:

$ systemctl start docker

Single Node

Out of the box, the package is configured to run mesos on a single host machine, which is convenient for developers who just want to install and test their applications locally before submitting to their cluster.

Multi-Node

If you want to set up a multi-node cluster, there is just one parameter you need to set, in the /etc/mesos/mesos-slave-env.sh file on each worker node:

export MESOS_master=yourmaster.yourdomain.com:5050
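Since DNS and firewall settings are the usual multi-node stumbling blocks, it can help to confirm from a worker that the master's port is reachable before starting the slave. Here is a minimal sketch using only the Python standard library; the function name `can_reach` is my own invention for illustration, and the host and port are whatever you put in MESOS_master.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_reach("yourmaster.yourdomain.com", 5050)` from a worker node should return True once name resolution and the firewall are sorted out. It only proves the port is open, not that a healthy master is behind it.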

Running

Master Node

$ systemctl start mesos-master

and open a browser to localhost:5050

Worker Node

$ systemctl start mesos-slave

Now check the “Slaves” tab on the browser window to verify.

Smoke Test

$ mesos execute --command="/bin/sleep 10" --master="yourmaster.yourdomain.com:5050" --name="whizbang"

Verify that it ran under the “Frameworks” tab.


Summary

Distributed systems can be a complicated and thorny road, but setting up and deploying them doesn’t have to be, and it’s a breeze with Mesos.

HaPpY HaCkInG!

Hot Rod Hadoop With Tachyon on Fedora 21 (Rawhide)

Background

Within the last couple of years we’ve witnessed a natural evolution in the “Big Data” ecosystem. The common theme you’ve probably heard in the community is that “Memory is King”, and it is. Therefore, if you are looking for performance optimization in your stack, an “in memory” layer should be part of the equation. Enter Tachyon, which provides reliable file sharing across cluster frameworks.

Tachyon can be used to support different frameworks, as well as different filesystems. So to bound the scope of this post, we will outline how to set up Tachyon on a local installation to boost the performance of a map-reduce application whose data is stored in HDFS on Fedora 21.


Special Thanks

Tachyon is a recent addition to the Fedora channels, and it would not have been possible without the efforts of Haoyuan Li, Gil Cattaneo, and William Benton.


Prerequisites


Installation and Setup

Prior to installing Tachyon, please ensure that you have set up your hadoop installation as outlined in the pre-reqs.

First you will need to install the tachyon package:

$ sudo yum install amplab-tachyon

Now you will need to update /etc/hadoop/core-site.xml configuration for hadoop to enable map-reduce to take advantage of tachyon, by appending the following snippet:

<property>
  <name>fs.tachyon.impl</name>
  <value>tachyon.hadoop.TFS</value>
</property>
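If you would rather script this edit than do it by hand, the following is a rough sketch using Python's standard xml.etree.ElementTree. The helper name `add_property` is hypothetical, and note one caveat: the default parser drops XML comments (including the license header), so treat this as an illustration rather than a drop-in tool.

```python
import xml.etree.ElementTree as ET


def add_property(config_xml: str, name: str, value: str) -> str:
    """Return config_xml with a <property> for name, updating it if present."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Property already exists: overwrite (or create) its <value>.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching property found: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Feeding it the contents of core-site.xml along with "fs.tachyon.impl" and "tachyon.hadoop.TFS" yields a configuration containing the snippet above, and running it twice does not duplicate the property.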

Now that all the plumbing is in place you can restart hadoop:

$ sudo systemctl restart hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager

Next, make certain your local HDFS instance is up and running, then you will need to perform a tachyon format.

$ sudo runuser hdfs -s /bin/bash /bin/bash -c "tachyon.sh format"
> Formatting Tachyon @ localhost
> Deleting /var/lib/tachyon/journal/
> Formatting hdfs://localhost:8020/tachyon/data
> Formatting hdfs://localhost:8020/tachyon/workers

Initialization

Prior to running the daemons you will need to mount the in-memory filesystem.

$ sudo tachyon-mount.sh SudoMount

Now you can start the daemons.

$ sudo systemctl start tachyon-master tachyon-slave

For completeness you can inspect the logs which are located in the standard system location

$ ls -la /var/log/tachyon

Operation

Once you’ve verified tachyon is up and running, you can run a simple mapreduce application as seen below:

$ hadoop jar /usr/share/java/hadoop/hadoop-mapreduce-examples.jar wordcount \
  tachyon://localhost:19998/user/tstclair/input/constitution.txt \
  tachyon://localhost:19998/test1

You’ll notice the tachyon:// prefix attached to the input and output locations. This enables hadoop to start the TFS shim, which will load from and write to tachyon. To verify, you can run the following:

$ sudo runuser hdfs -s /bin/bash /bin/bash -c "tachyon.sh tfs ls /test1"
> 16.65 KB  02-17-2014 15:41:11:849  In Memory      /test1/part-r-00000
> 0.00 B    02-17-2014 15:41:12:366  In Memory      /test1/_SUCCESS

If you’re interested in grok’ing further you can probably find the part file under /mnt/ramdisk.

Summary

Tachyon provides reliable in memory file sharing across cluster frameworks, as we have seen in our simple example. It also enables some very interesting prospects for other back end filesystems.

In future posts we’ll explore more elaborate configurations using tachyon atop different frameworks and filesystems.

Bootstrapping Your MapReduce 2.X Programming on Fedora 20

Picture Courtesy of Mauro Flores jr

Background

Recently the BIG DATA SIG has added Hadoop 2.0.5 (or 2.X series) to the Fedora channels. This marks the first addition into any OS-distribution which meets all the standards, and system integration requirements set forth by their steering committee(s). Don’t be fooled, bundling .jars into a package that looks like a .rpm or .deb != a compliant package (not even by a long shot).

So to give some props to all the effort that it took to lasso this elephant, this post will outline how to bootstrap the default installation for MapReduce development.


Prerequisites

  • Fedora 20 Machine

Installation and Setup (as root)

First you will need to install all the default hadoop packages and tools required.

yum install hadoop-common hadoop-hdfs hadoop-libhdfs hadoop-mapreduce hadoop-mapreduce-examples hadoop-yarn maven-* xmvn* 

Next you will need to format your namenode:

runuser hdfs -s /bin/bash /bin/bash -c "hadoop namenode -format"

Once your namenode has been formatted you can now start the daemons using the default service methods:

systemctl start hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager

Finally you will want to create the default directories:

hdfs-create-dirs

Setting up a Users Sandbox (as root)

runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -mkdir /user/tstclair"
runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -chown tstclair /user/tstclair"

Running WordCount (as user)

For simplicity I’ve setup a WordCount example on github that you can copy.

git clone https://github.com/timothysc/hadoop-tests.git

Once it has downloaded you can put the example .txt file into your user location

cd hadoop-tests/WordCount
hadoop fs -put constitution.txt /user/tstclair

Now you can build WordCount against the system installed .jars.

mvn-rpmbuild package 

Finally you can run:

hadoop jar wordcount.jar org.myorg.WordCount /user/tstclair /user/tstclair/output 

Feel free to cat the part-r-00000 file under /user/tstclair/output to see the results.


In Summary

Hadoop 2.0.5 now acts like a standard package, with all the accoutrements folks have come to expect.

Giddyup!

Leveraging Systemd Cgroup Integration to Provide SLAs on Fedora 18 & 19

Background

In the not-so-distant past, enterprise data centers would create silos for specific services to guarantee some metric of performance, or “Service Level Agreement” (SLA). However, this approach can be costly to create and maintain.

Enter the modern era of cloud computing, and one might wonder, “Why not just put it in a VM?”. For some use cases this might work just fine, because the metrics are “good enough”. Despite this flexibility, there are many cases where this approach simply won’t meet some measure of performance. I won’t elaborate on the details, but it doesn’t take Schrödinger math to figure this out, because sometimes the cat is dead even before you peek into the box. ;-)

Thus, in this post we will explore leveraging systemd cgroup integration to provide SLAs on Fedora.


References


Prerequisites

  • Fedora 18 or 19 box(es).
  • Make certain you’ve read the references, as I may gloss over some details in this post.

Getting Started

First you will need to choose a service which has been integrated with systemd that you can plan on tuning. In this example I will use ‘condor’, but you could use any service that you desire.

 sudo yum install condor

NOTE: You could do this with raw cgroups, but it becomes difficult to guarantee performance unless every service is in a group. So systemd does a lot of the heavy lifting for us.

Next you will need to determine the metrics of performance that you want to provide for that service. For the purposes of simplicity, let’s say we want to carve off 50% of the CPU for condor. You can also play with disk I/O bandwidth and network settings, but I think I will leave that for another post, as this can be complicated enough.

In order to divide up your machine you will first need to determine the existing shares on your machine. This can be done by dumping the current cgroup settings to a file which can then be analyzed to determine the new settings.

cgsnapshot -s > cgroup_snap.conf 

If you have a fairly basic setup you will notice the following pattern

# Configuration file generated by cgsnapshot
mount {
    cpuset = /sys/fs/cgroup/cpuset;
    cpu = /sys/fs/cgroup/cpu,cpuacct;
    cpuacct = /sys/fs/cgroup/cpu,cpuacct;
    memory = /sys/fs/cgroup/memory;
    devices = /sys/fs/cgroup/devices;
    freezer = /sys/fs/cgroup/freezer;
    net_cls = /sys/fs/cgroup/net_cls;
    blkio = /sys/fs/cgroup/blkio;
    perf_event = /sys/fs/cgroup/perf_event;
}

group system {
    cpu {
            cpu.rt_period_us="1000000";
            cpu.rt_runtime_us="0";
            cpu.cfs_period_us="100000";
            cpu.cfs_quota_us="-1";
            cpu.shares="1024";
    }
    cpuacct {
            cpuacct.usage="147354515620554";
    }
}

group system/condor.service {
    cpu {
            cpu.rt_period_us="1000000";
            cpu.rt_runtime_us="0";
            cpu.cfs_period_us="100000";
            cpu.cfs_quota_us="-1";
            cpu.shares="1024";
    }
    cpuacct {
            cpuacct.usage="146844720798260";
    }
}

... (the remaining *.service groups look about the same)
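Eyeballing a long snapshot gets old fast, so here is a small sketch that pulls cpu.shares out of a snapshot for each group. It assumes the layout shown above (a `group <name> {` line followed by `cpu.shares="N";` somewhere inside it); the function name `cpu_shares_by_group` is mine, not part of any cgroup tooling.

```python
import re

# Matches lines such as: group system/condor.service {
GROUP_RE = re.compile(r"^group\s+(\S+)\s*\{")
# Matches lines such as: cpu.shares="1024";
SHARES_RE = re.compile(r'cpu\.shares="(\d+)"')


def cpu_shares_by_group(snapshot_text):
    """Map each group name in a cgsnapshot dump to its cpu.shares value."""
    shares = {}
    current = None
    for line in snapshot_text.splitlines():
        group = GROUP_RE.match(line.strip())
        if group:
            current = group.group(1)
            continue
        found = SHARES_RE.search(line)
        if found and current is not None:
            shares[current] = int(found.group(1))
    return shares
```

For example, `cpu_shares_by_group(open("cgroup_snap.conf").read())` gives you a dictionary you can filter for names ending in ".service" to see which services deviate from the default.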

Analyzing your Configuration

One thing you will notice is that systemd creates an implied hierarchy on your machine by default, where each service has an equal amount of cpu.shares. This means when all services are contending for resources, each “service” gets an equal share.

Let’s elaborate on shares a bit. Say you had two services with shares S(a) = 1 and S(b) = 3, and each service has multiple processes all contending for CPU.

%CPU = service.cpu.share /(sum (service shares @ level)) 
%CPU[S(a)] = 1/4 = 25% 
%CPU[S(b)] = 3/4 = 75% 

So now let’s extend this idea and create a simple hierarchy where there are two groups, with each group having two services:

            Share   Overall%
Group 1     1       25%
    S(a)        1       12.5%
    S(b)        1       12.5%
Group 2     3       75%
    S(c)        3       56.25%
    S(d)        1       18.75%
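To double check the table, here is a short sketch that applies the share formula at every level of a hierarchy. The nested-dict representation and the name `cpu_fractions` are just conventions I picked for this example, and the numbers only hold when every group is fully contending for CPU.

```python
def cpu_fractions(tree, parent_fraction=1.0, prefix=""):
    """Map each node path to its overall CPU fraction under full contention.

    tree maps a name to (shares, children), where children is another
    dict of the same shape (empty for a leaf service).
    """
    # Shares only compete against siblings at the same level.
    total = sum(shares for shares, _ in tree.values())
    result = {}
    for name, (shares, children) in tree.items():
        fraction = parent_fraction * shares / total
        path = prefix + name
        result[path] = fraction
        # Children split whatever fraction their parent received.
        result.update(cpu_fractions(children, fraction, path + "/"))
    return result


hierarchy = {
    "Group 1": (1, {"S(a)": (1, {}), "S(b)": (1, {})}),
    "Group 2": (3, {"S(c)": (3, {}), "S(d)": (1, {})}),
}

for path, fraction in cpu_fractions(hierarchy).items():
    print(f"{path:<16} {fraction:.2%}")
```

The printed percentages should line up with the Overall% column above, and you can swap in your own planned hierarchy before touching any unit files.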

Hopefully this is intuitive, however it can quickly go to plaid. Therefore, it’s important to have a handle on how many services you have planned for a given machine, and on your intended hierarchy. Thus the cost of reliable performance is extra complexity, which isn’t so bad provided you’ve done your math.


Altering your Configuration

So now let’s provision condor such that it has 50% of the CPU. First we need to get a count of the number of services that exist on the machine.

$ cgsnapshot -s | grep [.]service | wc -l
30

As you can see from the previous example, and from the documentation, the default cpu.shares given to a service is 1024. Thus if we want 50% CPU:

.50 = condor.cpu.shares/(1024*29 + condor.cpu.shares)
512*(29) + .50*condor.cpu.shares = condor.cpu.shares
14848 = (1-.50)*condor.cpu.shares 
condor.cpu.shares = 14848/.50 = 29696
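The same algebra can be written as a tiny function, which makes it easy to see how quickly the number grows as services are added. This is only a sketch to check the arithmetic; `shares_for_fraction` is a name I made up, and it assumes all services sit at one level with the others left at their current shares.

```python
def shares_for_fraction(target, other_shares_total):
    """Shares a service needs so that shares / (others + shares) == target."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    # Solve target = x / (others + x) for x.
    return target * other_shares_total / (1 - target)


# 29 other services at the default 1024 shares, condor wants 50%.
needed = shares_for_fraction(0.50, 29 * 1024)
print(needed)
```

Every service added to the machine changes the answer, which is exactly why the flat approach is fragile.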

Seem nutty? GOOD! Don’t do it this way! Instead let’s use the idea of promoting a top level group:

vim /usr/lib/systemd/system/condor.service

[Service]
ControlGroup=cpu:/condor
CPUShares=1024

Once we exit we will need to restart the daemon and verify it worked.

systemctl daemon-reload
systemctl restart condor.service

You may need to remove any legacy cruft from cgroups, or reboot, but you can compare the two configuration files.

cgsnapshot -s > cgroup_snap_2.conf

Next you will want to submit a whole bunch of condor jobs, and try to load down the other services. To verify that your machine is behaving as expected you can run:

systemd-cgtop

With 30 services it can be difficult to accurately test that you are guaranteed 50%, so I would recommend that the reader promote a couple of services to the top level and have them compete in a controlled experiment.


In Summary

Systemd’s integration with cgroups is a many-splendored thing, and when used correctly it gives administrators and developers another tool to help create SLAs in their datacenter.

Configuring a Personal Hadoop Development Environment on Fedora 18

Background

The following post outlines a setup and configuration of a “personal hadoop” development environment that is much akin to a “personal condor” setup. The primary purpose is to have a single source for configuration and logs along with a soft-link to development built binaries such that switching to a different build is a matter of updating a soft-link while maintaining all other data and configuration.


Use Cases

  • Comparison testing in a local sandbox without altering an existing system installation.
  • Single source configuration and logs

Disclaimers

  • Currently this is a non-native development setup that uses the existing maven dependencies. For details on native packaging please visit https://fedoraproject.org/wiki/Features/Hadoop
  • The setup listed below is for creating a “Single-Node-Cluster”.

Prerequisites

Configure Password-less ssh

yum install openssh openssh-clients openssh-server
# generate a public/private key, if you don't already have one
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/*

# testing ssh:
ps -ef | grep sshd     # verify sshd is running
ssh localhost          # accept the certification when prompted
sudo passwd root       # Make sure the root has a password

Install Other Build Dependencies

yum install cmake git subversion dh-make ant autoconf automake sharutils libtool asciidoc xmlto curl protobuf-compiler gcc-c++ 

Install Java And Deps

yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel java-1.7.0-openjdk-javadoc *maven*

append to your .bashrc file:

export JVM_ARGS="-Xmx1024m -XX:MaxPermSize=512m"
export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=512m"

NOTE: These instructions have been updated to build against OpenJDK 7 on F18. Currently (4/25/13), builds are clean but there are some test failures. To get a complete list of failed tests run:

 mvn install -Dmaven.test.failure.ignore=true

Building and Setting up a “personal-hadoop”

Building

git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git checkout -b branch-2.0.4-alpha origin/branch-2.0.4-alpha
mvn clean package -Pdist -DskipTests

Creating Your “personal-hadoop” Sandbox

In this configuration we default to /home/tstclair

cd ~
mkdir personal-hadoop
cd personal-hadoop
mkdir -p conf data name logs/yarn
ln -sf <your-git-loc>/hadoop-dist/target/hadoop-2.0.4-alpha home

Override your environment

append to your .bashrc file:

# Hadoop env override:
export HADOOP_BASE_DIR=${HOME}/personal-hadoop
export HADOOP_LOG_DIR=${HOME}/personal-hadoop/logs
export HADOOP_PID_DIR=${HADOOP_BASE_DIR}
export HADOOP_CONF_DIR=${HOME}/personal-hadoop/conf
export HADOOP_COMMON_HOME=${HOME}/personal-hadoop/home
export HADOOP_HDFS_HOME=${HADOOP_COMMON_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_COMMON_HOME}
# Yarn env override:
export HADOOP_YARN_HOME=${HADOOP_COMMON_HOME}
export YARN_LOG_DIR=${HADOOP_LOG_DIR}/yarn
#classpath override to search hadoop loc
export CLASSPATH=/usr/share/java/:${HADOOP_COMMON_HOME}/share
#Finally update your PATH
export PATH=${HADOOP_COMMON_HOME}/bin:${HADOOP_COMMON_HOME}/sbin:${HADOOP_COMMON_HOME}/libexec:${PATH}

Verify your setup

source ~/.bashrc
which hadoop    # verify it should be ${HOME}/personal-hadoop/home/bin  
hadoop -help    # verify classpath is correct.

Creating the Initial Single-Node Configuration

First copy in the default configuration files:

cp ${HADOOP_COMMON_HOME}/etc/hadoop/* ${HADOOP_BASE_DIR}/conf

NOTE: As your configuration testing space expands, it is sometimes useful to make your conf directory a softlink to a set of configuration templates as well.

Next update your hdfs-site.xml with the following:

(hdfs-site.xml)
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Override tstclair with your home directory -->

<configuration>

    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost/</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>file:///home/tstclair/personal-hadoop/name</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>0.0.0.0:50070</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>file:///home/tstclair/personal-hadoop/data</value>
    </property>
    <property>
        <name>dfs.datanode.address</name>
        <value>0.0.0.0:50010</value>
    </property>
    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:50075</value>
    </property>
    <property>
        <name>dfs.datanode.ipc.address</name>
        <value>0.0.0.0:50020</value>
    </property>

</configuration>
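The file above hard-codes /home/tstclair in two places. If you keep it around as a template, a one-function sketch like the following can swap in your own home directory. The helper name `localize_paths` and its `original_home` default are assumptions for this example only.

```python
from pathlib import Path


def localize_paths(xml_text, home=None, original_home="/home/tstclair"):
    """Return xml_text with original_home replaced by the given home directory."""
    # Default to the current user's home directory when none is supplied.
    target = str(Path.home() if home is None else home)
    return xml_text.replace(original_home, target)
```

Read the template, pass its text through `localize_paths`, and write the result to ${HADOOP_BASE_DIR}/conf/hdfs-site.xml. A plain string replace is crude, but for a personal sandbox it keeps the XML header and comments untouched.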

Append, or update, your mapred-site.xml with the following:

(mapred-site.xml)
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Update or append these vars -->

<configuration>
    <property>
        <name>mapreduce.cluster.temp.dir</name>
        <value>
        </value>
        <description>No description</description>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.cluster.local.dir</name>
        <value>
        </value>
        <description>No description</description>
        <final>true</final>
    </property>
</configuration>

Finally update your yarn-site.xml with the following:

(yarn-site.xml)
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8031</value>
        <description>host is the hostname of the resource manager and
                    port is the port on which the NodeManagers contact the Resource Manager.
        </description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
        <description>host is the hostname of the resourcemanager and port is the port
                     on which the Applications in the cluster talk to the Resource Manager.
        </description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
        <description>In case you do not want to use the default scheduler</description>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
        <description>the host is the hostname of the ResourceManager and the port is the port on
                    which the clients can talk to the Resource Manager. </description>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>
        </value>
        <description>the local directories used by the nodemanager</description>
    </property>
    <property>
        <name>yarn.nodemanager.address</name>
        <value>localhost:8034</value>
        <description>the nodemanagers bind to this port</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>10240</value>
        <description>the amount of memory on the NodeManager in MB</description>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce.shuffle</value>
        <description>shuffle service that needs to be set for Map Reduce to run </description>
    </property>
</configuration>

NOTE: You may notice that I’ve included default variables and their corresponding port numbers to ease default hunting.

Starting Your Single Node Hadoop Cluster

Format your namenode (only needed for the 1st setup):

hadoop namenode -format
#verify output is correct.

Start HDFS:

start-dfs.sh

open a browser to http://localhost:50070 and verify you have 1 live node.

Next start yarn:

start-yarn.sh

Verify the logs show it’s running normally.

Finally check to see if you can run an MR application:

cd ${HADOOP_COMMON_HOME}/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.0.4-alpha.jar randomwriter out

HAPPY HACKING!!!

Per-Process Mount Namespaces

Background

“Isolation” in modern computing comes in many flavors and functions. Virtual machines, cgroups, chroots/jails, sandboxing, and namespaces all have a role to play. In this post we will review process mount namespace isolation, which allows a process to isolate its mount points from the outside world. Furthermore, it enables cleanup of that namespace, which is highly useful in grid applications.


References

Before you dive too deep into the code below, it’s important to read up on some kernel and system goodies:


Deep Thoughts

It may take some time to digest all of the reading and figure out how to apply namespaces to your application. I recommend re-reading the use cases outlined in the kernel documentation on shared subtrees. The nugget of goodness that we wish to apply here is under section 4B.

A process wants its mounts invisible to any other process, but
still be able to see the other system mounts.

Solution:

To begin with, the administrator can mark the entire mount tree
as shareable.

mount --make-rshared /

A new process can clone off a new namespace. And mark some part
of its namespace as slave

mount --make-rslave /myprivatetree

Hence forth any mounts within the /myprivatetree done by the
process will not show up in any other namespace. However mounts
done in the parent namespace under /myprivatetree still shows
up in the process's namespace.

In summary, you will want your parent process to recursively slave mount /your/loc, so as not to pollute the inherited namespace, especially in cases where the mount points have shared propagation enabled (the default in Fedora).
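To see which propagation mode each mount currently has, you can read /proc/self/mountinfo, whose optional fields carry tags such as shared:N and master:N (see proc(5)). The parser below is a minimal sketch of that idea; the function name `mount_propagation` is hypothetical, and it treats a mount with no tags as private.

```python
def mount_propagation(mountinfo_text):
    """Map each mount point to its propagation tags from mountinfo text."""
    result = {}
    for line in mountinfo_text.splitlines():
        fields = line.split()
        try:
            # Optional fields start at index 6 and end at the "-" separator.
            separator = fields.index("-", 6)
        except ValueError:
            continue  # blank or malformed line
        # Field 5 (index 4) is the mount point; no tags means a private mount.
        result[fields[4]] = fields[6:separator] or ["private"]
    return result
```

Running `mount_propagation(open("/proc/self/mountinfo").read())` inside and outside the new namespace is a quick way to confirm that /your/loc shows a master:N tag in the child rather than shared:N.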


Testing

So I’ve created a simple test application which shows how to hide subprocess mount points.

For more details, please check out my tests repo.


Applications

  • Cleaning condor job mounts ;-)
  • Application security
  • User security

Override HTCondor Installation With Sudo

Background

As a developer, I often find myself wanting to iterate on a build and test in a sandboxed environment. Isolating the environment ensures that your changes work well in a controlled experiment, but it has one fundamental flaw: it’s not realistic. In order to truly test a complicated system, it’s best to put it into some production environment. This is great for testing, but it has been known to give admins a headache, because you are mucking with a known good installation. So in this post we will set about the task of overriding an installation of HTCondor using sudo while keeping the following requirements in mind:

Requirements

  • Don’t alter the existing system installation (binaries or config files).
  • Be able to reference any custom developer build.
  • Be able to easily change back to the known good installation with little/no effort.

Getting Started

Before you begin you will need a machine which already has HTCondor installed on it, and all the necessary development tools in order to create a custom build. You will also need to obtain the source tree for HTCondor, and follow the instructions on how to build condor. I typically ‘alias cmake’ in my environment with all the clever developer magic to make a sandbox’d installation out of the gate.

alias cmake='cmake -DCMAKE_INSTALL_PREFIX:PATH=${PWD}/release_dir \
-DBUILDID:STRING=tstclair_local -DWANT_CONTRIB:BOOL=TRUE \
-DWANT_FULL_DEPLOYMENT:BOOL=FALSE -D_VERBOSE:BOOL=TRUE \
-DWANT_MAN_PAGES:BOOL=TRUE -D_DEBUG:BOOL=TRUE'

cmake . && make install 

Lastly you will need an account which has sudo privs on the machine where you will be tinkering.


Setting up a Sandbox

Once you’ve created a build for the target machine that you would like to test, you will need to create a sandbox location which is also accessible by the ‘condor’ user. I typically use /tmp.

mkdir /tmp/mycondor
cp -r release_dir /tmp/mycondor 

Next you will want to drop 3 files into your sandbox directory.

The first file is a simple bash script which kicks off your sandbox’d condor ensuring that all the correct environment variables are passed through sudo so that HTCondor can properly execute out of your sandbox.

(sudo_condor.sh)
#!/bin/sh

foo=`pwd`

export BASE=${foo}
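# the trailing "|" (escaped for the shell) makes HTCondor run override.sh and read its output as config
export CONDOR_CONFIG=${BASE}/override.sh\|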
export CONDOR=${BASE}/release_dir
export PATH=${BASE}:${CONDOR}/lib:${CONDOR}/bin:${CONDOR}/sbin:${CONDOR}/libexec:${PATH}
export LD_LIBRARY_PATH=${CONDOR}/lib:${CONDOR}/libexec

#start up new condor
echo "Starting condor... @${CONDOR} with override ${CONDOR_CONFIG}"
sudo env PATH=$PATH CONDOR_CONFIG=$CONDOR_CONFIG LD_LIBRARY_PATH=$LD_LIBRARY_PATH BASE=${BASE} ${CONDOR}/sbin/condor_master

If you are testing your client tools, you will also want to munge your PATH in your testing shell as seen in the script.

The next file is a script which acts as a piped config script. In HTCondor, there is a feature which allows admins to generate/munge the parameters which are passed in on initialization and reconfig. It turns out this is useful in meeting our previously mentioned requirement of not mucking with the existing configs while still being able to customize, as seen below:

(override.sh)
#!/bin/sh

if [ -z "$BASE" ]; then
    echo "Using pwd"
    export BASE=${PWD}
fi

#take the original system config
cat /etc/condor/condor_config

# override the execution locations
echo "RELEASE_DIR=${BASE}/release_dir"
echo "LIB=${BASE}/release_dir/lib"
echo "INCLUDE=${BASE}/release_dir/include"
echo "LIBEXEC=${BASE}/release_dir/libexec"

#apply your custom setting overrides
cat ${BASE}/condor_config.local

The final file is an optional condor_config.local file, which you can create. This file is appended to the end of the existing config and allows the developers, or admins, to lay out any configuration that they desire or even override the system to behave the way they would like.

Finally you will need to adjust the access permission so that the ‘condor’ user can access the shared location.

sudo chown -R "root:root" /tmp/mycondor

Now let’s rock ’n’ roll!

cd /tmp/mycondor
./sudo_condor.sh

You’ve now taken over an existing machine in your pool using all the configuration settings that were there, while still allowing your own config magic. Take heed, though: if you are testing any changes to Condor’s internal files, you could corrupt what is already there. To avoid this, you can override the SPOOL and EXECUTE directories in your condor_config.local file.
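One way to set up that override is sketched below. The SANDBOX variable and the "local" subdirectory are illustrative choices, not something the original setup prescribes. Depending on who owns the sandbox you may need to run this before the chown step or with sudo, and the directories must end up accessible to the user the daemons run as.

```shell
# Assumed layout: everything lives under the sandbox from this post.
# SANDBOX and the "local" subdirectory are illustrative choices.
SANDBOX="${SANDBOX:-/tmp/mycondor}"
LOCAL="${SANDBOX}/local"

# Private spool and scratch directories for the sandbox'd daemons.
mkdir -p "${LOCAL}/spool" "${LOCAL}/execute"

# override.sh appends this file last, so these settings win.
cat > "${SANDBOX}/condor_config.local" <<EOF
# Keep the sandbox'd daemons away from the system spool and scratch space.
SPOOL   = ${LOCAL}/spool
EXECUTE = ${LOCAL}/execute
EOF
```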


Verify it’s correct

The easiest way to verify correctness is to check that the preamble in the logs (/var/log/condor) contains your BUILDID string; in this case it was ‘tstclair_local’.
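That check is easy to script. The sketch below assumes the log preamble prints the id after a "BuildID:" label; the has_buildid helper is a made-up name, and the pattern may need adjusting for your version.

```shell
# has_buildid LOGFILE BUILDID: succeed if LOGFILE mentions the given build id.
# Assumes the id follows a "BuildID:" label in the daemon log preamble.
has_buildid() {
    grep -q "BuildID: *$2" "$1"
}

# Usage (MasterLog is the condor_master log, in the directory from this post):
#   has_buildid /var/log/condor/MasterLog tstclair_local && echo "sandbox build is running"
```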


Potential Use Cases

It’s fairly obvious why a developer would want to do this, but it has many potential use cases outside of just development, which include:

  • Beta testing a new installation prior to a pool upgrade.
  • Testing version compatibility across releases.
  • Playing with shiny new condor features in an existing pool.

Elastic Grid With Condor and oVirt Integration


Background

Gone are the days when an IT administrator could procure a dedicated compute cluster for a single task. Admins are now often asked to do more with existing resources where possible, especially those which are underutilized. There are several existing solutions to oversubscription, but few that remain “general purpose” while adapting to the environment as the load within the cluster changes. Enter Condor, which is best known for its batch processing capabilities, but can also be leveraged in many ways as an IaaS tool when coupled with oVirt.

There have been numerous references in the past to using the two tools together, but in this post we will explore the idea of using Condor’s integration with oVirt to spin the resources directly from Condor, and briefly cover how administrators could use this capability to spin resources “on demand”.


Setup

Before you begin, you will need to configure oVirt and a deltacloud server such that your preconfigured images can be spun via the deltacloud api remotely. To verify, you can run a simple test program to ensure that it works from a remote machine, as if it were run from condor.

(deltacloud_test.c)
#include <stdio.h>
#include <stdlib.h>
#include <libdeltacloud/libdeltacloud.h>

int main()
{
        struct deltacloud_api api;
        struct deltacloud_instance instance;
        int ret = 2;

        // replace url and email with your domain info.
        if (deltacloud_initialize(&api, "http://ovirt.yourdomain.com:3002/api", "vdcadmin@ovirt.yourdomain.com", "123456") < 0)
        {
                fprintf(stderr, "Failed to initialize libdeltacloud: %s\n", deltacloud_get_last_error_string());
                return 1;
        }

        // get the instance directly by name, replace with your image name
        if ( deltacloud_get_instance_by_name(&api, "kvm_test_image_64", &instance) )
        {
                fprintf(stderr, "Failed to get deltacloud instances: %s\n", deltacloud_get_last_error_string());
                deltacloud_free(&api); // don't leak the api handle on the error path
                return ret;
        }

        printf("We did it!\n");
        printf("---Here are the details---\n");
        printf("href =%s\n",instance.href);
        printf("id =%s\n",instance.id);
        printf("name =%s\n",instance.name);
        printf("owner_id =%s\n",instance.owner_id);
        printf("image_id =%s\n",instance.image_id);
        printf("image_href =%s\n",instance.image_href);
        printf("realm_id =%s\n",instance.realm_id);
        printf("realm_href =%s\n",instance.realm_href);
        printf("state =%s\n",instance.state);
        printf("launch_time =%s\n",instance.launch_time);

        // from here we can start the instance itself.
        // deltacloud_instance_start(&api, &instance)

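        ret = 0;

        // release what libdeltacloud allocated for the instance and the api handle
        deltacloud_free_instance(&instance);
        deltacloud_free(&api);
        return ret;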
}

Once this is done, you will then need to install condor-deltacloud-gahp on the submit machines where you want to spin the resources.


Spinning an oVirt Instance with Condor

Provided you’ve set up all the pieces above, you should be able to just submit a grid universe job which references the images that you wish to start up.

universe = grid
grid_resource = deltacloud http://ovirt.yourdomain.com:3002/api
executable = ovirt_spin_test
deltacloud_username = vdcadmin@ovirt.yourdomain.com
deltacloud_password_file = user_pwd

# Just specify the rhevm instance name
deltacloud_instance_name = kvm_test_image_64

log = job_deltacloud_basic_$(cluster)_$(process).log
notification = NEVER
queue
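Submitting it might look like the sketch below. The submit file name ovirt.sub and the OVIRT_PASSWORD variable are made up for illustration, and 123456 is just the placeholder password from the test program above. Whether the GAHP tolerates a trailing newline in the password file is not something this sketch assumes, so none is written.

```shell
# Sketch only: user_pwd matches deltacloud_password_file in the submit file.
# OVIRT_PASSWORD is an assumed variable; 123456 is the placeholder from above.
(
    umask 077    # keep the password file readable by its owner only
    printf '%s' "${OVIRT_PASSWORD:-123456}" > user_pwd
)

if command -v condor_submit >/dev/null 2>&1; then
    condor_submit ovirt.sub    # hypothetical name for the submit file above
    condor_q                   # the grid job should now show up in the queue
else
    echo "condor_submit not found in PATH"
fi
```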

Potential Use Cases

Given the tight level of integration between Condor and oVirt via deltacloud, there is a litany of use cases that administrators could craft to enable the auto-spinning of images from Condor, including:

  • Using the JobRouter to configure for overflow
  • Using condor-cron to spin images based on time-based activities
  • Using DAGMan workflows along with a monitoring activity to spin and clean resources, based on availability

Dust Off Nuke It From Orbit


So, in an effort to single-source the numerous blogs and other random bitz which have been scattered across the internet, I’ve decided to start putting all my professional efforts into one location where other developers can easily find sources, blogs, etc.