Sunday, November 17, 2013

Configure IPTables with Ansible.

If you need to configure IPTables on the fly using Ansible, this is a really quick way to do it (and requires no extra dependencies). This mechanism relies on the lineinfile module, which allows you to idempotently add/verify/remove lines of text inside a file.  I then use with_items and list all of the protocols and ports I want available on the box.

* Note, this was validated on Ansible 1.4.0.

# This is an example Ansible playbook.
- hosts: all
  tasks:
    - name: Open the correct IPTables ports
      lineinfile: dest=/etc/sysconfig/iptables
                  regexp="^-A INPUT -p {{item.protocol}} -m {{item.protocol}} --dport {{item.port}} -j ACCEPT$"
                  line="-A INPUT -p {{item.protocol}} -m {{item.protocol}} --dport {{item.port}} -j ACCEPT"
                  insertafter="^:OUTPUT ACCEPT \[\d*:\d*\]$"
      with_items:
        - { protocol: tcp, port: 80 }
        - { protocol: tcp, port: 443 }
        - { protocol: tcp, port: 389 }
        - { protocol: tcp, port: 636 }
        - { protocol: tcp, port: 88 }
        - { protocol: tcp, port: 464 }
        - { protocol: tcp, port: 53 }
        - { protocol: udp, port: 88 }
        - { protocol: udp, port: 464 }
        - { protocol: udp, port: 53 }
        - { protocol: udp, port: 123 }
      notify:
        - restart iptables

  handlers:
    - name: restart iptables
      action: service name=iptables state=restarted

This is (admittedly) a very simple example, but you should be able to see the value in the approach and adapt it to more complex scenarios.
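If the idempotency of lineinfile feels abstract, here's a rough plain-shell sketch of the behavior it gives us for each item (illustrative only -- the file and rules are made up, and this is not the module's actual implementation):

```shell
# Append an iptables ACCEPT rule to a file only if it isn't already there.
RULES_FILE=$(mktemp)
printf '%s\n' ':INPUT ACCEPT [0:0]' ':OUTPUT ACCEPT [0:0]' > "$RULES_FILE"

add_rule() {
  rule="-A INPUT -p $1 -m $1 --dport $2 -j ACCEPT"
  # grep -qF: quiet, fixed-string match; only append when the rule is absent.
  grep -qF -- "$rule" "$RULES_FILE" || echo "$rule" >> "$RULES_FILE"
}

add_rule tcp 80
add_rule tcp 80   # running it again changes nothing -- that's the idempotency

grep -c -- 'dport 80' "$RULES_FILE"   # prints 1
```

Running add_rule twice with the same arguments leaves the file untouched the second time, which is exactly the property that makes the playbook safe to re-run.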

Good luck!

Thursday, October 31, 2013

Meetup @ CREL Tech Talk SDSU: YARN - Method of Sharing Clusters Beyond Hadoop

Omkar Joshi presented (virtually) YARN - Method of Sharing Clusters Beyond Hadoop last night at a tech talk hosted by UCSD's Center for Research in Entertainment and Learning (CREL).  The talk covered the YARN architecture and discussed the interactions between services like the ResourceManager, ApplicationMaster, and NodeManager.  The big takeaway was that YARN is ready to go; release 2+ is out and rolled into all major Hadoop distributions targeting Hadoop 2.0.

Omkar presented via Google Hangout.  There were about 20 attendees at the meet-up.

The audience asked a number of questions about YARN and its status.  Here's a summarized version of the Q&A (from what I can remember):

1.  How does YARN ensure MapReduce processes are run on the same machine as the data it's going to process (locality of data)?

This is not so much a function of YARN, as much as the implementation of the MapReduce YARN ApplicationMaster (i.e. the YARN app that has replaced the traditional MapReduce framework).  The MapReduce ApplicationMaster communicates with the NameNode to determine which DataNodes have the blocks that need to be processed, and then requests that MapReduce containers be launched specifically on those nodes (to process said blocks).  The YARN ResourceManager is therefore unaware of HDFS and the location of blocks; it's the MapReduce ApplicationMaster's job to figure that out.

2.  Is YARN rack-aware?

Yes, YARN is rack-aware, and you can request resources based on rack ID.

3.  Is the YARN API stable? -- given that the old Distributed Shell example used to be broken in the source repository...

In short, yes.  The API experienced a brief period of instability as they migrated from version 1 to version 2.  The example projects have been fixed and checked in.

4.  How does YARN fit into the new set of Apache Platform as a Service projects (Mesos, Stratos, etc.)?

YARN currently only has one implementation of the ResourceManager, but it would be possible to implement one on top of Mesos (but this would be non-trivial).

The subtext I got here is that there's a lot of overlap in the ecosystem and the efforts are not necessarily in sync.

5.  Is YARN production ready?

Yes.  Many organizations are already using it and some popular frameworks like Storm and Spark have been ported to the platform.

6.  Is there a commitment by the major vendors to support YARN?

Yes, Hortonworks and Cloudera are specifically supporting development and rolling features into their distributions.

Saturday, October 26, 2013

Node-Webkit-based Timer Application

I needed an app that I could use to set time limits on work periods, as well as ensure unruly kids met their time-out obligations, and was fairly underwhelmed with the offerings available.  Probably the best app for OSX was the Classic Timer application, which is $0.99 on the App Store.  Generally this is a pretty reasonable price for so little functionality and I didn't hesitate to buy it the first time.  I quickly outgrew the application when I needed to do simple things like "add 30 seconds" to an ongoing timer.  My wife also needed something similar, but is currently on a PC, and I sure as heck was not going to buy something that simple a second time.

So I endeavored to build my own timer.  Since I'm on a Node-Webkit kick, and it's cross-platform, I thought it would be the perfect framework for writing such an application.  So, in about 3 hours' worth of effort, here's what I was able to kick out:

The application was incredibly simple to write.  Note that, outside of the packaging provided by Node-Webkit, this application does not rely on any Node.js functionality.  It's basically a combination of a number of great open source frameworks:
  • jQuery - which is really only used to facilitate Bootstrap.
  • Bootstrap 3 - which provides the nice styling elements.
  • AngularJS - web framework for composing the application.
  • Buzz - a pretty simple and effective audio library that abstracts the HTML5 audio API.
The NW-Timer application demonstrates some pretty interesting things about the Node-Webkit application framework.

Window Options - as you can see, the toolbar is completely removed from the application.  It basically looks like a native application, minus the iconic Bootstrap style.  The frame (close, minimize, maximize buttons and title), as Node-Webkit refers to it, can also be removed.  I've also set the min, max, and default height and width of the application.  This is all done using the package.json manifest file:
{
    "name": "nw-timer",
    "version": "0.0.1",
    "main": "index.html",
    "description": "Timer application.",
    "window": {
        "title": "NW Timer",
        "width": 400,
        "height": 520,
        "min-width": 400,
        "max-width": 400,
        "min-height": 520,
        "max-height": 520,
        "toolbar": false,
        "frame": true
    }
}

Playing Audio - one gotcha I encountered about Node-Webkit is that it does not natively play all the audio and video formats you might be accustomed to in Safari or Chrome due to licensing issues (documented here).  I started by using an MP3 format which worked great when I ran the application using Chrome (remember I had no Node.js dependencies), but didn't play in the Node-Webkit container.  This was because the format was simply not supported.  I converted the MP3 to OGG, configured Buzz to include the alternate format, and it worked perfectly.

Here's an example of using the Buzz framework to play the alarm:
# Instantiate a reference to the sound:
alarmSound = new buzz.sound "sounds/alarm", { formats: [ "ogg", "mp3" ] }

# Play the sound:
playAlarm = ->
    alarmSound.play()
    alarmPlaying = true

# Stop the sound:
stopAlarm = ->
    alarmSound.stop()
    alarmPlaying = false

# Done!

Packaging the Application - finally, the only thing you have to do to convert this web application into a Node-Webkit application package is "zip" it up.  I created a tiny script to "build" the package (please note that I didn't care if unnecessary artifacts also make it into the package):
# Check to see if an older version of the package exists
if [ -f "Timer.nw" ]; then
  echo "Removing the old version of the timer application"
  rm Timer.nw
fi

# Zip the contents of the directory
zip -r Timer.nw *

# Done!

Final Thoughts

I find building GUI applications using Node-Webkit far more enjoyable than using traditional desktop application frameworks.  First, it's far easier, leveraging a skillset that many of us have cultivated over the last decade (web development).  More importantly, Node-Webkit only provides the "Window"; you are free to choose the development framework that best fits your needs (Ember, AngularJS, Backbone, Knockout, etc.).  Another thing that makes the framework appealing is access to all of those rocking HTML5 APIs, particularly the embedded databases provided natively with Webkit.  I see very little reason why this framework could not be used to build Rich Internet Applications for businesses.

You are welcome to do whatever you want with the NW-Timer application (even sell it) as long as you attribute the source you use back to me (it's ASL 2.0).  More importantly, I hope you are starting to learn the value of the Node-Webkit framework as an alternative to traditional, platform-specific, Desktop technologies.

Tuesday, October 22, 2013

Node-Webkit - an example of AngularJS using AMQP.

I just recently discovered Node-Webkit and it's pretty awesome.  In this post, I'm going to show you how I created an AngularJS application that communicates using AMQP.

That's right.  AngularJS using AMQP.  This isn't some gimmick where I'm calling to an Express backend via websockets and the server is communicating to RabbitMQ.  One of the Angular controllers is literally using AMQP.

Honestly, this isn't magic.  Node-Webkit provides the essential binding for Webkit (Chromium) to use the Node.js runtime.  All I need to do is wire up the application.

Node-Webkit has an application structure that's kind of a combination between a Node.js application and a web application.  You could probably even use MimosaJS without any changes to build and test the application if you wanted to.  Node-Webkit requires a package.json to exist in the root folder, with a property called main pointing at the HTML page that serves as the entry point of the application.
{
    "name": "nw-demo",
    "version": "0.0.1",
    "main": "index.html",
    "dependencies": {
        "amqp": "0.1.7",
        "uuid": "1.4.1"
    }
}
You can even, and should, specify your Node.js dependencies via NPM.  In our case, I need the Node-AMQP and Node-UUID libraries.

The entry point, index.html, is simply a webpage.  In a Node-Webkit application, you bootstrap your application like you would a web app.  Instead of loading dependent libraries via HTTP, they're accessed from the file system.  In my case, I'm using AngularJS, which has special semantics for wiring up an application.  The following is an abbreviated sample of my index.html:

Bootstrapping the Application
<!DOCTYPE html>
<html lang="en" ng-app="AmqpApp">
  <!-- ... -->
  <script type="text/javascript" src="vendor/angular.min.js"></script>
  <script type="text/javascript" src="lib/amqp.js"></script>
  <script type="text/javascript" src="lib/app.js"></script>
  <!-- ... -->
  <div ng-controller="MainCtrl" class="container">
Form controls for publishing.
      <h2>Publish a Message</h2>
      <div class="row">
        <div class="col-md-12">
          <strong>Message ID</strong>
        </div>
        <div class="col-md-12">
          <input type="text" class="form-control" ng-model="id_box" />
        </div>
      </div>
      <br />
      <div class="row">
        <div class="col-md-6">
          <strong>Headers (JSON)</strong>
          <textarea class="form-control" ng-model="header_box"></textarea>
        </div>
        <div class="col-md-6">
          <strong>Body (String or JSON)</strong>
          <textarea class="form-control" ng-model="body_box"></textarea>
        </div>
      </div>
      <br />
      <div class="row">
        <div class="col-md-6">
          <button type="button"
                  class="btn btn-default"
                  ng-click="useSample()">
            Use Sample
          </button>
        </div>
        <div class="col-md-6 text-right">
          <button type="button"
                  class="btn btn-default"
                  ng-click="publish()">
            <i class="glyphicon glyphicon-send"></i>
          </button>
          <button class="btn btn-danger" type="button"
                  ng-click="heartbeatOn()">
            <i class="glyphicon glyphicon-heart-empty"></i>
          </button>
          <button class="btn btn-danger" type="button"
                  ng-click="heartbeatOff()">
            <i class="glyphicon glyphicon-heart"></i>
          </button>
        </div>
      </div>
Form controls displaying received messages.
      <h2>Messages Received</h2>
      <div class="row">
        <div class="col-md-12" ng-show="messages.length > 0">
          <table class="table table-striped">
            <thead>
              <tr>
                <th>
                  <button type="button"
                      class="btn btn-default btn-xs"
                      ng-show="messages.length > 0"
                      ng-click="clearMessages()">
                    <i class="glyphicon glyphicon-remove"></i>
                    <span>Clear Messages</span>
                  </button>
                </th>
              </tr>
            </thead>
            <tbody>
              <tr ng-repeat="message in messages">
                <td>
                  <i class="glyphicon glyphicon-envelope"></i>
                  <div ng-repeat="(key, value) in message.headers">
                    <strong>{{key}}:</strong> {{value}}
                  </div>
                  <div ng-repeat="(key, value) in message.body">
                    <strong>{{key}}:</strong> {{value}}
                  </div>
                </td>
              </tr>
            </tbody>
          </table>
        </div>
        <div class="col-md-12" ng-show="messages.length == 0">
          <div class="jumbotron">
            <p class="lead"><strong>No messages in the queue.</strong><br />
              Try clicking on the
              <i class="glyphicon glyphicon-heart-empty"></i>
              button to start receiving messages.</p>
          </div>
        </div>
      </div>
  </div>
  <!-- ... -->
</html>
This is what the layout looks like.
If you noticed above, there are two JavaScript libraries I rely on to implement the controller functionality. These were actually written in CoffeeScript.  The controller code is pretty simple:
uuid = require("uuid")

Ex1Headers =
  principal: ""
  event_type: ""

Ex1Body =
  foo: "bar"
  bar: 123
  foobar: [ "foo", "bar", 123 ]

app = angular.module 'AmqpApp', []

app.controller 'MainCtrl', ($scope) ->

  $scope.id_box = uuid.v4()

  $scope.useSample = ->
    $scope.header_box = JSON.stringify Ex1Headers, undefined, 4
    $scope.body_box = JSON.stringify Ex1Body, undefined, 4


  $scope.messages = []

  message_handler = (message) ->
    $scope.$apply ->
      $scope.messages.push message

  amqpConnection = new AmqpConnection(message_handler)

  $scope.usingHeartbeat = false

  $scope.heartbeatOff = ->
    amqpConnection.stopHeartbeat()
    $scope.usingHeartbeat = false

  $scope.heartbeatOn = ->
    amqpConnection.startHeartbeat()
    $scope.usingHeartbeat = true

  $scope.clearMessages = ->
    $scope.messages.length = 0

  $scope.publish = ->
    headers = {}

    try
      headers = JSON.parse $scope.header_box
    catch e1
      alert("Invalid Headers input.")
      return

    # Include the generated message id in the headers.
    headers.id = $scope.id_box

    body = null

    try
      bjson = JSON.parse $scope.body_box
      body = bjson if bjson?
    catch e2
      body = { msg: $scope.body_box }

    try
      amqpConnection.publish(body, headers)
      $scope.id_box = uuid.v4()
    catch e3
      alert("Error publishing message: " + e3)

The AMQP Connection class is also not too difficult to follow:
amqp = require("amqp")
uuid = require("uuid")
gui  = require("nw.gui")

class AmqpConnection

  constructor: (@handler) ->
    @connection = amqp.createConnection {
         host: "localhost",
         login: "guest",
         password: "guest"
    }
    @connection.on "ready", =>
      @queue_name = uuid.v4()
      gui.Window.get().on "close", => @connection.end()
      @connection.queue @queue_name, (q) =>
        @q = q
        @q.bind "#"
        @q.subscribe @receive

  publish: (message, headers) =>
    headers = headers ? {}
    headers.sent = new Date().getTime()
    @connection.publish @queue_name, message, {
      headers: headers,
      contentType: "application/json"
    }

  receive: (msg, headers, deliveryInfo) =>
    headers.received = new Date().getTime()
    message = { headers: headers, body: msg }
    @handler message

  startHeartbeat: (interval) =>
    interval = interval ? 1500
    sendHeartbeat = =>
      @publish { ping: "pong!" }
    @heartbeat_handle = setInterval(sendHeartbeat, interval)

  stopHeartbeat: =>
    if @heartbeat_handle?
      clearInterval @heartbeat_handle
      @heartbeat_handle = null

window.AmqpConnection = AmqpConnection

One thing you've probably noticed is the use of the Node.js require directive for importing dependencies.  This will play some havoc with RequireJS or other AMD/CommonJS loaders, but the Node-Webkit site has some discussion on workarounds.

Ok, when the publish button is pressed, you will see something like this:
When the heartbeat is turned on, you will repeatedly get messages:
And if you browse the RabbitMQ Management Console, you will see that our application is connected:
That's it.

You can find all of the source code on Github:

Saturday, October 12, 2013

Install Ansible From Source (Github) on OSX 10.8 (Mountain Lion)

If you want to use Ansible on OSX, I recommend installing it from source; once you know how to do it, you'll always be able to stay on the most up to date version.  We're going to assume you have Git installed.  If you don't, please go install it first.

Here's how you do it.

1.  Install Pip, a Python package manager.
sudo easy_install pip
2.  Install Ansible's Python library dependencies.
sudo pip install paramiko PyYAML jinja2
3.  Clone the Ansible repository.
git clone git://
4.  Move into the ansible directory.
cd ansible
5.  Make and install.
sudo make install
6. Verify the install.
ansible --version

Have fun.


If you plan to use passwords with Ansible, you need to install sshpass. You can do this easily using Macports:
sudo port install sshpass

Friday, October 11, 2013

5 Simple Object Marshaling and Transformation Techniques in Cucumber-JVM

There aren't a lot of blog posts, or much formal documentation, on some of the cooler tricks you can use to marshal objects from DataTables or extracted strings in Gherkin expressions.  Since I'm working with Cucumber-JVM a lot these days, I thought I would share 5 simple marshaling/transformation techniques I've discovered that will help you clean up those Step Definitions.

1.  Implicit Type Conversion.

Cucumber can convert strings to integer, double, or boolean values simply by specifying primitive values in the function signature of a step definition.

Scenario: An example of implicit type conversion

  When I need numbers or booleans

  Then sometimes it's easier to let Cucumber convert values 1 or 1.1 or true instead of me

@Then("^sometimes it's easier to let Cucumber convert values (\\d+) "
       + "or ((?:\\d|\\.)+) or (true|false) instead of me$")
public void sometimes_it_s_easier_to_let_cucumber_convert_values(
        int integer, double decimal, boolean bool) {

  assertEquals(1, integer);
  assertEquals(1.1d, decimal, 0);
  assertTrue(bool);
}

2.  Implicit Conversion to a List<String>.

Cucumber can also convert comma separated strings into a List<String> by specifying a List<String> function argument in a step definition.

Scenario: An example of implicit conversion of lists

  When I need a bunch of items

  Then sometimes it's easier to deal with a list: apples, bananas, oranges

@Then("^sometimes it's easier to deal with a list: ((?:\\s|\\w|,)+)$")
public void sometimes_its_easier_to_deal_with_a_lists(List<String> list) {

  assertTrue(list.containsAll(
      Arrays.asList("apples", "oranges", "bananas")));
}

3.  Explicitly Convert a Date or Calendar object using a Formatter.

Some strings are a little more complex or arbitrary to parse, so you can help Cucumber by telling it the format of the string to parse.  By default, Cucumber supports Date and Calendar strings, but the JavaDoc alludes to other possibilities.

Scenario: Convert more complex values using special formats

  When I need to do something with dates

  Then I should be able to use 10/31/2013 or 10/31/2013 12:32:22 and get a Date object back

@Then("^I should be able to use ((?:\\d|\\/)+) or ((?:\\d|\\/|:|\\s)+) "
      + "and get a Date object back$")
public void I_should_be_able_to_use_or_AM_and_get_a_Date_object_back(
        @Format("MM/dd/yyyy") Calendar actualCalendar1,
        @Format("MM/dd/yyyy HH:mm:ss") Calendar actualCalendar2)  {

  Calendar expectedCalendar1 = Calendar.getInstance();
  expectedCalendar1.set(2013, 9, 31, 0, 0, 0);

  Calendar expectedCalendar2 = Calendar.getInstance();
  expectedCalendar2.set(2013, 9, 31, 12, 32, 22);

  // Compare at second granularity; Calendar.set() does not clear milliseconds.
  assertEquals(expectedCalendar1.getTimeInMillis() / 1000,
               actualCalendar1.getTimeInMillis() / 1000);
  assertEquals(expectedCalendar2.getTimeInMillis() / 1000,
               actualCalendar2.getTimeInMillis() / 1000);
}

4.  Explicitly Convert a String to a <T> (strongly-typed object).

Cucumber provides a method of using a Transformer object to translate a string to a strongly typed object.  One only has to extend the Transformer abstract class, and use the @Transform annotation in the step definition's signature.

Scenario: Transform something more complex using a custom transform

  When I need to work with IP Addresses or Phone Numbers

  Then I should be able to parse or 555-555-5555 and get custom objects back

@Then("^I should be able to parse (\\d+(?:[.]\\d+){3}) or" 
      + " (\\d+(?:-\\d+){2}) and get custom objects back$")
public void I_should_be_able_to_parse_or_and_get_custom_objects_back(
        @Transform(IPAddressTransformer.class) InetAddress ipAddress,
        @Transform(PhoneNumberTransformer.class) PhoneNumber phoneNumber) {

  assertNotNull(ipAddress);

  assertEquals(555, phoneNumber.getAreaCode());
  assertEquals(555, phoneNumber.getPrefix());
  assertEquals(5555, phoneNumber.getLineNumber());
}

5.  Convert a DataTable to a List<T>.

Finally, there's a nice mechanism for converting a DataTable into a List<T>, where <T> is a strongly-typed object represented by each table row.  The top row is considered a header row; the column names should be the name of a primitive property on your object type.  Cucumber will smartly handle the header, so you don't need to use camel-casing ("firstName" can be "First Name" as the column name).

Another nice feature Cucumber offers is an implicit conversion of the DataTable to the List<T> (specified in the function signature).  Alternatively, if you want access to the DataTable, you can accept it as an argument and call dataTable.asList(Type type) to manually perform the conversion.

Scenario: Transform a data table into a list of strongly typed objects

  When I need to specify a lot of data as a table

  Then I should be able to get a list of real objects back:
    | First Name | Last Name | Age | Is Male |
    | Obi-Wan    | Kenobi    | 55  | true    |
    | Han        | Solo      | 35  | true    |
    | Luke       | Skywalker | 24  | true    |
    | Leia       | Organa    | 24  | false   |

@Then("^I should be able to get a list of real objects back:$")
public void I_should_be_able_to_get_a_list_of_real_objects_back(List<Person> persons)  {

  assertEquals(4, persons.size());
}

Well, I hope this helped.  In a future post, I will talk about some of the lessons I've learned in using Cucumber-JVM, particularly around its effective use in a complex project.

You can find the source (in the example3 packages) on Github:

Wednesday, September 25, 2013

Register an RSA Public Key using an Ansible Playbook

In the previous post, I demonstrated a script that automates the adding of an RSA Public Key to a remote host.  By doing this, we get the ability to perform "password-less" logins.

My original purpose for doing this was so I could remotely manage those hosts via Ansible.  It turns out, Ansible has a much more effective way of performing this task.

Ansible has the notion of "playbooks" which are essentially scripts/configuration similar to a Chef Recipe or a Puppet script.

This is the Ansible playbook for adding an RSA Public Key, located at ~/.ssh/ on my local machine, to a remote host (call this file register-key.yml):

- hosts: '{{ host }}'
  remote_user: '{{ user }}'

  tasks:
    - name: Add RSA key to the remote host
      authorized_key: user='{{ user }}' key="{{ lookup('file', '~/.ssh/') }}"

The script basically says: for the following hosts, as the remote user (user), add the contents of ~/.ssh/ on my local system to the authorized_keys file for the remote user (user).

Note that user and host are both variables we will pass in.  The lookup keyword is literally a function to pull the contents of the 'file' specified as the second argument.

The next thing you need is an inventory file. This inventory file lets Ansible know the available resources for it to command. Here is a really simple inventory file called inventory.txt:

Your Ansible inventory file can be a full list of servers, referenced by hostname or IP Address.  The computers can be grouped into categories, and have special configuration applied individually.
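Though the actual inventory isn't reproduced here, a minimal grouped inventory might look like this (every hostname below is a placeholder, not a host from this post):

```ini
# inventory.txt -- hosts can be listed bare, or grouped into categories

[webservers]

[databases]
```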

We execute the Ansible playbook like this:

# Register key with the remote host
ansible-playbook register-key.yml -i inventory.txt \
--ask-pass --extra-vars "user=myremoteuser" 

The previous command basically says:
  1. Execute the  register-key.yml playbook.
  2. Use the inventory described in inventory.txt
  3. Ask me for the remote user's password.  We don't have the key installed yet, so we will need to log in as the remote user.
  4. Use the following values for variables defined in the playbook:
    1. user - myremoteuser
    2. host - (this was in the inventory, remember?)
Your output should look something like this:

Yes, it really is that simple.

Update (9/26/13): I discovered an interesting caveat when using this script on a brand-new VM.  If you have never logged into that VM via SSH before, the script will fail due to an authentication error.  At first, I thought the remote server (the VM) was throwing the error.  After some investigation, I realized that it was actually the local machine.  The reason is that Ansible does not automatically accept the remote machine's certificate when the very first connection is made.  Remember the preamble you get when you connect to a remote machine for the first time in SSH?  Your machine asks whether you accept the remote machine's certificate.  The same thing is happening here; it's just that Ansible is sensible and doesn't violate your trust by automagically accepting the certificate.

So, in short, you will need to log into the remote machine via SSH just once, if only to accept that machine's SSH certificate.
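Alternatively, recent versions of Ansible expose a configuration switch for this; assuming you accept the security trade-off, you can disable host key checking in ansible.cfg:

```ini
# ansible.cfg
[defaults]
host_key_checking = False
```

With that set, the first connection proceeds without the interactive host-key prompt; just be aware you're giving up the protection that prompt provides.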

BASH Script to Register RSA Public Key with Remote Host

I wanted to be able to script the installation of an RSA public key on a remote host so I could have "password-less" access to the host via SSH.  This is what I came up with.

Note:  I am a developer, not a BASH ninja.  If you have recommendations to improve this script, let me know.

Before I show the script, this is how you use it:

# The User and Host are together.  
# I decided to not make it any more complicated than it needed to be.
sh user@remotehost

You will be prompted for passwords as necessary.

Here's a look at the console output of the script:

And here's the BASH script:


# Variables
USER_AND_HOST=$1
line="============================================="   # separator used in the output

echo $line

echo "This script will prompt during the process since"
echo "you have not yet installed the RSA key with the"
echo "server.  Please be patient; this should be done"
echo "shortly."

echo $line

# Ensure the RSA Public Key exists.
if [ ! -e ~/.ssh/ ]
then
    echo "No public SSH key found..."

    echo "Generating SSH key.  Follow the prompts:"

    echo $line

    ssh-keygen -t rsa

    if [ $? -ne 0 ]
    then
        echo "Key Generation was not successful.  Exiting."

        exit 1
    fi

    echo $line
fi

echo $line

echo "Ensuring remote .ssh directory exists."
echo "You will need to enter the remote host's password."

ssh $USER_AND_HOST "mkdir -p ~/.ssh/"

echo $line

echo "Copying key to remote host."
echo "You will need to enter the remote host's password."

echo $line

scp ~/.ssh/ "$USER_AND_HOST:~"

echo $line

echo "Adding the key to the set of 'authorized_keys'..."
echo "You will need to enter the remote host's password."

ssh $USER_AND_HOST "cat >> .ssh/authorized_keys"

echo $line

echo "Cleaning up key and testing SSH access..."

ssh $USER_AND_HOST "rm ~/"
RESULT=$?

echo $line

if [ $RESULT -eq 0 ]
then
   echo "Your public key was successfully added to the host."
else
   echo "Epic fail mate!  Your key was NOT added to the host."
fi

echo $line

echo "Ensure you can access the server using the following command:"
echo "ssh $USER_AND_HOST \"ls -lah ~/.ssh\""

In the next post, I'm going to demonstrate how to reduce the number of operations considerably by performing this task using Ansible.

Sunday, February 10, 2013

Integrating Spring Security with Dropwizard

Dropwizard is a great framework for streamlining Java web apps (I'd argue any app) for deployment in a production environment.  It's an answer to the complexity of Java Web and Application Containers, which tend to be overkill for 90% of your use-cases.
As a framework, Dropwizard is still a little young (I won't say immature, because it's a rock-solid environment).  Dropwizard comes with two authenticators, Basic and OAuth, and its SSL-based Client Authentication is relatively undocumented.  We needed certificate-based client authentication, but didn't want to force users to do BasicAuth in addition to using a certificate just so we could inject a "principal" object (i.e. @Auth) into our JAX-RS controllers.
More importantly, we wanted the rich features of Spring Security, which not only includes support for other authentication mechanisms like SPNEGO+Kerberos, but also route-based and expression-based security.
With that said, we did the hard work of figuring out how to integrate Spring Security with Dropwizard, and this is the resulting work.  To use our set of extensions, follow the instructions below:

Configure Spring Security in your applicationContext.xml

Here is an example for certificate-based authentication.
<security:http>
  <security:intercept-url pattern="/*" access="ROLE_USER" />
  <security:intercept-url pattern="/admin/*" access="ROLE_ADMIN" />
  <security:x509 subject-principal-regex="CN=(.*?)," />
</security:http>

<security:authentication-manager>
  <security:authentication-provider>
    <security:user-service>
      <security:user name="Super Awesome Client" authorities="ROLE_USER" />
      <security:user name="The Boss" authorities="ROLE_USER, ROLE_ADMIN" />
    </security:user-service>
  </security:authentication-provider>
</security:authentication-manager>

Initialize your Spring Application Context.

Initialize your Spring Application Context in your Dropwizard Service class. We explicitly require the location of the applicationContext in our Dropwizard Configuration class.
ApplicationContext applicationContext = 
  new FileSystemXmlApplicationContext(

Register Spring Security with the Dropwizard Environment.

public void run(BlahBlahConfiguration configuration, Environment environment) throws Exception {

  ApplicationContext applicationContext = 
    new FileSystemXmlApplicationContext(

  new SpringSecurityAuthProvider(applicationContext).registerProvider(environment);
}

Use @Auth UserDetails userDetails in your JAX-RS controllers.

We're going to inject the Spring Security UserDetails context into your controllers (why invent a new User object?).
public ChittyChat getChittyChatOnTopic(@Auth UserDetails userDetails, @PathParam("topic") String topic){
   // ... get Chitty-Chat ...
}

Get to working on your app!

You're done. Spring Security is filtering requests prior to your resources being called.  It is also possible to use the Spring Security annotations if your resources are instantiated and managed by Spring or by using the "component-scan" capability.

Sunday, January 27, 2013

Top 10 Technologies I Want My Software Engineers to Know In 2013

Last year I blogged about the top ten technologies I wanted my engineers at Berico Technologies to learn in 2012.  The post was so popular, I've decided to make it a tradition.  In addition to providing my new top ten list, I want to provide a little retrospective on the technologies that made the list and those that fell off this year to explain why I've increased or decreased their importance.

Without further ado, I present:

Top 10 Technologies I want my Software Engineers to Know in 2013.

1.  The Modern Client-Side Web Development Stack
2.  Node.js
3.  Modern Messaging Frameworks
4.  Hadoop + HBase
5.  Clojure + Leiningen
6.  Twitter Storm
7.  Lucene/Solr
8.  Graph Databases
9.  A Platform as a Service Offering
10. Apache ZooKeeper


1.  The Modern Client-Side Web Development Stack:  Let's face it, the fastest-growing "sector" of the Enterprise stack is the Client-Side.  We have not only seen an explosion of new client-side Application/MVC frameworks, but also the adoption of a whole new process for composing and building web applications.  And we should emphasize the term "application"!  Modern websites are as complex as their desktop predecessors (perhaps even more so), as browsers continue to become more capable and user expectations grow.

Unfortunately, no single framework or library is worthy of being in the top ten on its own.  I will say, however, that the combination of these frameworks represents a fundamental shift in our community away from Server-side MVC and GWT-like frameworks towards hyper-capable clients.

There are a number of frameworks/libraries of note:


-  Application/MVC Frameworks:  Ember.js, Batman.js, Angular.js, and Knockout.js.

Please note that I've left Backbone and Sammy.js off the list, in part because I think their popularity is starting to wane as the newer breed of frameworks offers more capabilities.  Another framework generating a lot of excitement is Meteor.js, which attempts to provide a seamless stack (client-server-database) with a simple API.  I have only glanced at Meteor's documentation, but it looks promising.

-  Application Composition:  Require.js.

-  Build Tools:  Mimosa.js, Yeoman.

Mimosa.js is a new build/deployment framework developed by one of my friends, David Bashford.  While it hasn't gotten a lot of attention from the community, our company uses it for nearly all of its projects, and the number of people forking/starring it on GitHub seems to double each month.

-  Languages:  CoffeeScript, ClojureScript, TypeScript.

At this point, I don't think there is any dominant JavaScript alternative.  My opinion is that a team (not an individual) should adopt the one that best fits their collective personality.

-  Visualization:  D3.

D3 scored us a number of big wins in the visualization department last year.  Our company is also beginning to cultivate a number of D3 proficient engineers.  I see us continuing to spread the D3 goodness around our team indefinitely.

2.  Node.js:  In my mind, last year Node.js (well, Express and other web frameworks) proved it was a viable alternative to Ruby on Rails.  I believe this will be the year in which Node.js unseats Rails.  Numerous PaaS vendors began supporting the platform last year (OpenShift, Cloud Foundry, Heroku, Azure) and I'm sure many more will follow suit (where are you, Google App Engine?).

More importantly, Node.js IS NOT JUST A WEB FRAMEWORK!  Applications and libraries in a number of different domains are being written on top of the platform, and I can't wait to see what comes next (a NoSQL database?, a distributed processing framework?). 

3.  Modern Messaging Frameworks:  One of the top ten technologies last year was RabbitMQ.  Let me start by saying that my respect, admiration, and appreciation of RabbitMQ has only grown during 2012 (there is no other AMQP broker in my book).  This year, I'm broadening the scope by including "modern messaging frameworks" as one of the top ten competencies in 2013.

The shift to include more messaging technologies came from the realization that there is a need in modern cloud architectures to have multiple messaging platforms.  AMQP, and specifically RabbitMQ, is representative of the "reliable messaging" tier of brokers (like JMS, MSMQ, TIBCO).  However, there is a distinct need for "high-throughput" and "batching" message brokers that sacrifice security and reliability for performance.  The dominant brokers I'm looking at in each tier are:

Reliable Messaging:  AMQP/RabbitMQ
High-Throughput:  ZeroMQ
Batching:  Apache Kafka


4.  Hadoop + HBase:  If it isn't obvious that the Hadoop ecosystem is incredibly important right now, you are living under a rock.  Almost every RDBMS vendor is embracing Hadoop with tie-ins to their database.  Many, including Oracle and SQL Server, will allow you to ship SQL tables to Hadoop, execute a MapReduce job, and pull the results back into the database.  As for HBase, it remains critically important as an operationally-proven petabyte scale NoSQL database.

This year, I don't expect any revolutionary changes to occur on the Hadoop + HBase platform, though I think the software will continue to mature as vendors adopt the platform and create commercial extensions.  The thing to watch for is the frameworks built on top of Hadoop, like Impala (released by Cloudera last year).  There's also the potential for the start of an Apache project attempting to clone "Spanner", Google's "Globally-Distributed Database".


5.  Clojure + Leiningen:  This will probably be the most upsetting item on this list, particularly from the Scala crowd.  Last year I wanted my engineers to learn a JVM language.  This year, I only want them to learn one: Clojure.  The decision to buck all other languages from this list came from the collective frustration our engineers faced last year using both Scala and JRuby (and our great experiences with Clojure).

So what happened with Scala and JRuby?

I think the success of JavaScript last year may have deemphasized Ruby's importance, which in turn deemphasized JRuby.  I personally like the Ruby language, but I constantly found myself struggling to find a good use-case for its application.  Another problem I think Java developers have with Ruby is needing to learn Ruby Gems in addition to dealing with Java dependency management.  Frankly, not too many people wanted to learn JRuby, favoring Scala or Clojure.

Scala, on the other hand, is a language I think many of us learned to hate last year.  On the surface, it appears to be a decent language.  However, the more we learned its syntax, the more we realized how needlessly complex and obtuse it could be.  In the hands of a really great engineer, it can be very elegant.  In the hands of everyone else, it can be difficult to read and understand.  More importantly, we didn't particularly enjoy Scala's mechanisms for interoperability with Java (they seemed strange in many cases).

The big eye-opener was Clojure.  Once getting past all of those parentheses, I think many people realized how simple and elegant the language is.  Interop with Java is extremely clean.  I personally found myself up and running in a couple of hours, using all of my favorite Java libraries without any issues.  This year we will continue to evangelize the language, pushing people and projects toward the platform.

6.  Twitter Storm:  Storm is doing for real-time computing what Hadoop did for batch processing.  I think we are just now seeing Storm rise in popularity, and I expect it will become even more popular as developers start building frameworks on top of it.  Our company, Berico Technologies, already has plans for building a data ingestion framework and a real-time data manipulation framework on top of it.  I would imagine many other developers are actively doing the same as I write this.

7.  Lucene/Solr:  This entry was at risk of falling off the top ten list, but stayed on the list primarily because of the promise of Solr Cloud this year.  Search is no longer a feature, but rather a requirement for many applications, and Lucene-based indexes will remain the dominant implementation.

8.  Graph Databases:  As graphs become more mainstream, I think engineers are starting to realize the value of databases optimized for joins.  The clear leader in this market is Neo4j, but I think it will start to see some competition from highly-scalable, distributed implementations like Titan.

More importantly, there has been a trend towards polyglot architectures (combining a graph database [for joins] with a relational database [for queries]).  Frameworks like Spring Data Neo4j simplify the development of these systems by providing annotation-driven Data Access Layer functionality similar to Hibernate and JPA.

In terms of usability, a framework to highlight is Tinkerpop's Blueprints, an abstraction layer for graph databases.  In addition to Blueprints, Tinkerpop has also written a number of complementary frameworks (Pipes, Gremlin, Frames, Furnace, and Rexster) enhancing the usability of your graph.

9.  A Platform as a Service Offering:  Platforms like RedHat OpenShift, VMWare Cloud Foundry, Windows Azure, Heroku, and Google App Engine are the way of the future in my opinion.  Being able to compose an application and not worry about installing and maintaining the services it relies upon is quite liberating for an engineer.  There is a clear cost and time savings in employing such solutions.  More importantly, I want my engineers thinking about PaaS design and its implications for applications so they can build and/or employ them for our customers who don't have the luxury of using a commercial offering.

10.  Apache ZooKeeper:  This probably seems like an odd addition to the list.  ZooKeeper is a framework that enables reliable communication and coordination between nodes in a cluster.  The framework is used to perform centralized configuration, key-value storage, distributed queues, leader election, synchronization, failure detection and recovery, etc.  ZooKeeper is a key component in a number of important distributed platforms like Apache Hadoop, Apache Solr, Apache HBase, Twitter Storm, Neo4j HA, and Apache Kafka, to name a few.  Simply put, ZooKeeper is the underpinning of a number of important applications, and knowledge of it couldn't hurt.  More importantly, there's nothing that prevents you from building your own distributed application with ZooKeeper.

Favorable mention.

Finally, I wanted to give a "favorable mention" to a number of frameworks that didn't make the list:

- Spring Framework:  Still the trusted backbone of all of our Java applications, and I am just as dependent on it as I was three years ago.
- Redis: We're using it successfully on a couple of projects; the only downside is the lack of security, which prevents us from using it on all of our projects.
- MongoDB:  We use MongoDB on a couple of projects.  It's certainly proven itself to be the document database of choice for our company.
- Riak:  Incredibly interesting NoSQL offering.  We aren't using it at the moment, but a couple of our engineers have used it at other companies and we're genuinely fascinated by it.
- Datomic:  Another incredibly interesting database from the key contributors and creator of the Clojure language.  Datomic offers the ability to keep a temporal history of mutations to the records stored within, making it uniquely suitable for some of our clients' problem-sets.
- LMAX Disruptor:  A framework for performing multithreaded, lock-free operations on a ring buffer.  Developers at LMAX have optimized the framework to work with the underlying hardware, like ensuring variables are cached effectively in L1.

Sliding off the list from last year.

These are the frameworks that fell off the list this year, and why.

- Spring Framework:  Spring is still incredibly important, but it's at the point that we are taking it for granted.  Knowledge of the framework is practically mandatory in our company, so it's not as important a technology to be learned this year.
- Ruby on Rails:  We are no longer building on Rails.  Rails is still a great framework, but it's being overshadowed by the prospect of an end-to-end JavaScript web stack.  We've also had some issues with Ruby's thread model, making it incredibly difficult for us to integrate messaging into the Rails stack.
- Redis:  Redis is still a great key-value store, but its lack of security features makes it difficult for our company to use in our clients' architectures.  We still love it, however.
- CoffeeScript:  I still write in CoffeeScript all the time, but it's time to acknowledge that there are a number of great new compile-to-JavaScript languages out there.  For my .NET developers, I can't in good conscience recommend they learn CoffeeScript when they get great support for TypeScript in Visual Studio.
- OSGi:  OSGi became a great frustration last year for us.  The API is antiquated (i.e.: no use of Generics, registration of components is not straightforward), containers function inconsistently (we gave up on JBoss 7's OSGi support), and it's a real pain in the ass to have to bundle libraries you don't write.  I think the idea of a component architecture is awesome, but I think it needs to be a part of the core Java platform and not an external framework.
- RabbitMQ:  We still love and use RabbitMQ all the time.  In fact, I just wrote 8 posts on the topic!  RabbitMQ was not so much demoted as expanded to include ZeroMQ and Kafka.
- A new JVM Language:  I don't advocate staying with the Java language, but I want to warn you that you may be frustrated with your options.  As you've already seen, I'm advocating you learn Clojure above all other languages.  If you don't learn Clojure, give Scala a try.  We may even be surprised this year with a resurgence in Groovy's popularity as optimizations of the JVM make the language much faster.  Outside of those choices, I think you will find learning a language off the platform more rewarding.

RabbitMQ Configuration and Management Series

Lately, I've been working heavily with RabbitMQ and wanted to share many of the things I have learned in the process.  Specifically, I wanted to demonstrate how to configure and administer RabbitMQ from the point of installing the OS, to configuring the firewall, clustering the brokers, enabling SSL, and load-balancing the cluster.  Basically, what you would want in a cluster if you had to deliver a messaging solution in a hostile environment.

I will continue to write more articles about RabbitMQ, so please come back to this page to get the latest index of articles.  I hope you find these articles helpful.

I recommend reading the documentation in the following order:
  1. Installing RabbitMQ on CentOS 6.3 Minimal - demonstrates how to perform a complete install of RabbitMQ on CentOS 6.3 Minimal (headless, no unnecessary packages), including two ways to install Erlang.
  2. Enabling RabbitMQ Management Console - an essential component for managing RabbitMQ, this is probably the second thing you should install once you have a broker.
  3. Configuring iptables for a single instance of RabbitMQ - don't just turn off your firewall!  Configure iptables to allow clients to connect to the broker without exposing the rest of your system.
  4. Configuring iptables for a RabbitMQ Cluster - the process for allowing a cluster to communicate with the firewall enabled is a little more involved.  This post will show you how to do it.
  5. Clustering RabbitMQ - will show you how to cluster RabbitMQ brokers.
  6. Configuring SSL for RabbitMQ - lock your RabbitMQ instances down by configuring them to use SSL.  I'll show you how to do this, including the generation of certificates.
  7. Securing the RabbitMQ Management Console with SSL - What's the point of locking down the AMQP port if you don't lock down the Management Console?  This post will show you how to enable SSL-authentication for the Management Console.
  8. Binding non-SSL-capable AMQP Clients to SSL RabbitMQ - some clients do not support AMQP/SSL.  I'll show you a generic solution for using a non-SSL-capable client (Node.js in the example) with a RabbitMQ instance protected by SSL.

Binding non-SSL-capable AMQP Clients to SSL RabbitMQ

This is an article in the RabbitMQ Configuration and Management Series.  For more articles like this, please visit the series' index.

Many languages support AMQP, but not all support SSL. Probably the most popular of the languages not supporting SSL is Node.js. Fortunately, there is a very easy solution that does not involve you rewriting a client library, and it works for every language.
The solution is to utilize stunnel, a process that will initiate an SSL connection (via OpenSSL), wrapping the underlying TCP-based communication initiated by the client. More plainly, your application connects to a port on localhost using AMQP, stunnel connects to the broker via SSL, and then pipes your local request to the AMQP broker through the SSL tunnel.
stunnel can be installed on most major operating systems (Windows, UNIX, OSX, Linux), but we will talk primarily about how to do it on OSX and CentOS.
  1. Install stunnel.
    On OSX: sudo port install stunnel
    On CentOS: sudo yum install stunnel
  2. Next, we need to configure stunnel.
Create a local file and name it something like stunnel.conf.
vi stunnel.conf
Add the following to the stunnel.conf:
client = yes
foreground = yes
cert = {path to cert}/{client name}.keycert.pem

[amqp]
accept = {local port the client will communicate on}
connect = {broker IP address}:{broker SSL port}
Dissecting the file:
  • client: Indicates that we are a client contacting a server protected by SSL.
  • foreground:  Whether stunnel should run in the foreground (console) instead of in the background.
  • cert:  A file with both the certificate and private key.
  • [amqp]:  This is a header for a route registration (we are calling it "amqp").
  • accept:  The incoming port to accept communications.
  • connect:  The port, or optionally, host and port to establish an SSL connection with.
Using our previous example:
client = yes
foreground = yes
cert = app01.keycert.pem

[amqp]
accept = 5673
connect =
  3. Generate the client certificate.
    Using the certificate scripts in CMF-AMQP-Configuration:
    sh {client} {password}
    For example:
    sh app01 password123
    stunnel requires our key and certificate to be collocated in the same file. To do this, let's create a new file and append the contents of {client}.key.pem and {client}.cert.pem generated by

    cd {path/to/cert/dir}
    cat {client}.cert.pem >> {client}.keycert.pem
    cat {client}.key.pem >> {client}.keycert.pem

    For example:
    cd client/
    cat app01.cert.pem >> app01.keycert.pem
    cat app01.key.pem >> app01.keycert.pem
    The "keycert" file is now ready to be used with stunnel.
  4. Start stunnel.
    stunnel {path/to/configuration}/stunnel.conf
    For example, assuming we are in the directory of stunnel.conf:
    stunnel stunnel.conf
  5. Now, simply start your client, binding to the port on localhost you chose stunnel to accept connections on.

Verifying stunnel Works

The CMF-AMQP-Configuration repository contains a test client to demonstrate this capability.

Install Node.js if you don't already have it installed.

Clone the CMF-AMQP-Configuration repository on GitHub:
git clone

Change into the node-test-client directory:
cd node-test-client

Install the project's dependencies:
npm install

Edit the config.js file and add the correct stunnel and connection settings:
vi config.js
Using the settings from the example:
module.exports = {
  host: "localhost",
  port: 5673,
  vhost: "/",
  login: "guest",
  password: "guest",
  publishingInterval: 50
};
Now run the example:
node run_test.js
If everything works as prescribed, you should see the following output in the console: