Thursday, October 31, 2013

Meetup @ CREL Tech Talk SDSU: YARN - Method of Sharing Clusters Beyond Hadoop

Omkar Joshi presented (virtually) YARN - Method of Sharing Clusters Beyond Hadoop last night at a tech talk hosted by UCSD's Center for Research in Entertainment and Learning (CREL).  The talk covered the YARN architecture and discussed the interactions between services like the Resource, Application, and Node Managers.  The big takeaway was that YARN is ready to go; release 2+ is out and rolled into all major Hadoop distributions that target Hadoop 2.0.

Omkar presented via Google Hangout.  There were about 20 attendees at the meet-up.

The audience asked a number of questions about YARN and its status.  Here's a summarized version of the Q&A (from what I can remember):

1.  How does YARN ensure MapReduce processes run on the same machine as the data they are going to process (data locality)?

This is not so much a function of YARN as of the MapReduce ApplicationMaster implementation (i.e. the YARN app that has replaced the traditional MapReduce framework).  The MapReduce ApplicationMaster communicates with the NameNode to determine which DataNodes hold the blocks that need to be processed, and then requests that MapReduce containers be launched specifically on those nodes (to process said blocks).  In other words, the YARN ResourceManager is unaware of HDFS and the location of blocks; it's the MapReduce ApplicationMaster's job to figure that out.
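
To make the idea concrete, here is a minimal Java sketch of the locality step: take the replica locations a NameNode would report and group the blocks by host, so that one container can be requested per host.  The class and method names (LocalityPlanner, planByHost) are hypothetical and are not part of the YARN or HDFS APIs; a real ApplicationMaster would also consider every replica and fall back to rack-local or arbitrary nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; none of these names come from the YARN or HDFS APIs.
class LocalityPlanner {

    // Groups block IDs under the first host that holds a replica, giving one
    // container request per host. blockLocations maps block ID -> replica hosts.
    static Map<String, List<String>> planByHost(Map<String, List<String>> blockLocations) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : blockLocations.entrySet()) {
            List<String> hosts = entry.getValue();
            if (hosts.isEmpty()) {
                continue; // no replica location known, so no locality preference
            }
            plan.computeIfAbsent(hosts.get(0), host -> new ArrayList<>()).add(entry.getKey());
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, List<String>> locations = new LinkedHashMap<>();
        locations.put("block-1", Arrays.asList("node-a", "node-b"));
        locations.put("block-2", Arrays.asList("node-b"));
        locations.put("block-3", Arrays.asList("node-a"));
        // Each entry stands for "launch a container on this host for these blocks".
        System.out.println(planByHost(locations));
    }
}
```

The point of the sketch is only that the host preference is computed by the application, not by the scheduler; the scheduler just receives requests that name specific nodes.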

2.  Is YARN rack-aware?

Yes, YARN is rack-aware, and you can request resources based on rack id.

3.  Is the YARN API stable? -- given that the old Distributed Shell example used to be broken in the source repository...

In short, yes.  The API experienced a brief period of instability as they migrated from version 1 to version 2.  The example projects have been fixed and checked in.

4.  How does YARN fit into the new set of Apache Platform as a Service projects (Mesos, Stratos, etc.)?

YARN currently only has one implementation of the ResourceManager, but it would be possible to implement one on top of Mesos (but this would be non-trivial).

The subtext I got here is that there's a lot of overlap in the ecosystem and the efforts are not necessarily in sync.

5.  Is YARN production ready?

Yes.  Many organizations are already using it and some popular frameworks like Storm and Spark have been ported to the platform.

6.  Is there a commitment by the major vendors to support YARN?

Yes, Hortonworks and Cloudera are specifically supporting development and rolling features into their distributions.

Saturday, October 26, 2013

Node-Webkit-based Timer Application

I needed an app that I could use to set time limits on work periods, as well as ensure unruly kids met their time-out obligations, and I was fairly underwhelmed with the offerings available.  Probably the best app for OSX was the Classic Timer application, which is $0.99 on the App Store.  Generally this is a pretty reasonable price for so little functionality, and I didn't hesitate to buy it the first time.  I quickly outgrew the application when I needed to do simple things like "add 30 seconds" to an ongoing timer.  My wife also needed something similar, but is currently on a PC, and I sure as heck was not going to buy something that simple a second time.

So I endeavored to build my own timer.  Since I'm on a Node-Webkit kick, and it's cross-platform, I thought it would be the perfect framework for writing such an application.  So, in about 3 hours' worth of effort, here's what I was able to kick out:

The application was incredibly simple to write.  Note that outside of the packaging provided by Node-Webkit, this application does not rely on any Node.js functionality.  It's basically a combination of a number of great open source frameworks:
  • jQuery - which is really only used to facilitate Bootstrap.
  • Bootstrap 3 - which provides the nice styling elements.
  • AngularJS - web framework for composing the application.
  • Buzz - a pretty simple and effective audio library that abstracts the HTML5 audio API.
The NW-Timer application demonstrates some pretty interesting things about the Node-Webkit application framework.

Window Options - as you can see, the toolbar is completely removed from the application.  It basically looks like a native application, minus the iconic Bootstrap style.  The frame (close, minimize, maximize buttons and title), as Node-Webkit refers to it, can also be removed.  I've also set the min, max, and default height and width of the application.  This is all done using the package.json manifest file:
{
  "name": "nw-timer",
  "version": "0.0.1",
  "main": "index.html",
  "description": "Timer application.",
  "window": {
    "title": "NW Timer",
    "width": 400,
    "height": 520,
    "min-width": 400,
    "max-width": 400,
    "min-height": 520,
    "max-height": 520,
    "toolbar": false,
    "frame": true
  }
}

Playing Audio - one gotcha I encountered with Node-Webkit is that, due to licensing issues, it does not natively play all the audio and video formats you might be accustomed to in Safari or Chrome (documented here).  I started with an MP3 file, which worked great when I ran the application in Chrome (remember, I had no Node.js dependencies), but it didn't play in the Node-Webkit container because the format is simply not supported.  I converted the MP3 to OGG, configured Buzz to include the alternate format, and it worked perfectly.

Here's an example of using the Buzz framework to play the alarm:
# Instantiate a reference to the sound:
alarmSound = new buzz.sound "sounds/alarm", { formats: [ "ogg", "mp3" ] }

# Play the sound:
playAlarm = ->
    alarmSound.play()
    alarmPlaying = true

# Stop the sound:
stopAlarm = ->
    alarmSound.stop()
    alarmPlaying = false

# Done!

Packaging the Application - finally, the only thing you have to do to convert this web application into a Node-Webkit application package is "zip" it up.  I created a tiny script to "build" the package (please note that I didn't care if unnecessary artifacts also make it into the package):
# Check to see if an older version of the package exists
if [ -f "Timer.nw" ]; then
  echo "Removing the old version of the timer application"
  rm Timer.nw
fi

# Zip the contents of the directory
zip -r Timer.nw *

# Done!

Final Thoughts

I find building GUI applications using Node-Webkit far more enjoyable than using traditional Desktop application frameworks.  First, it is far easier and leverages a skillset that many of us have cultivated over the last decade (web development).  More importantly, Node-Webkit only provides the "Window"; you are free to choose the development framework that best fits your needs (Ember, AngularJS, Backbone, Knockout, etc.).  Another thing that makes the framework appealing is access to all of those rocking HTML5 APIs, particularly the embedded databases provided natively with Webkit.  I see very little reason why this framework could not be used to build Rich Internet Applications for businesses.

You are welcome to do whatever you want with the NW-Timer application (even sell it) as long as you attribute the source you use back to me (it's ASL 2.0).  More importantly, I hope you are starting to learn the value of the Node-Webkit framework as an alternative to traditional, platform-specific, Desktop technologies.

Tuesday, October 22, 2013

Node-Webkit - an example of AngularJS using AMQP.

I just recently discovered Node-Webkit and it's pretty awesome.  In this post, I'm going to show you how I created an AngularJS application that communicates using AMQP.

That's right.  AngularJS using AMQP.  This isn't some gimmick where I'm calling to an Express backend via websockets and the server is communicating to RabbitMQ.  One of the Angular controllers is literally using AMQP.

Honestly, this isn't magic.  Node-Webkit provides the essential binding for Webkit (Chromium) to use the Node.js runtime.  All I need to do is wire up the application.

Node-Webkit has an application structure that's kind of a combination between a Node.js application and a web application.  You could probably even use MimosaJS without any changes to build and test the application if you wanted to.  Node-Webkit requires a package.json to exist in the root folder, with a property called main that points at the HTML page serving as the entry point of the application.
{
    "name": "nw-demo",
    "version": "0.0.1",
    "main": "index.html",
    "dependencies": {
        "amqp": "0.1.7",
        "uuid": "1.4.1"
    }
}
You can even, and should, specify your Node.js dependencies via NPM.  In our case, I need the Node-AMQP and Node-UUID libraries.

The entry point, index.html, is simply a webpage.  In a Node-Webkit application, you bootstrap your application like you would a web app.  Instead of loading dependent libraries via HTTP, they're accessed from the file system.  In my case, I'm using AngularJS, which has special semantics for wiring up an application.  The following is an abbreviated sample of my index.html:

Bootstrapping the Application
<!DOCTYPE html>
<html lang="en" ng-app="AmqpApp">
  <!-- ... -->
  <script type="text/javascript" src="vendor/angular.min.js"></script>
  <script type="text/javascript" src="lib/amqp.js"></script>
  <script type="text/javascript" src="lib/app.js"></script>
  <!-- ... -->
  <div ng-controller="MainCtrl" class="container">
Form controls for publishing.
      <!-- The ng-model, ng-click and ng-show/ng-hide bindings in this sample
           are assumed from the names that MainCtrl defines. -->
      <h2>Publish a Message</h2>
      <div class="row">
        <div class="col-md-12">
          <strong>Message ID</strong>
        </div>
        <div class="col-md-12">
          <input type="text" class="form-control" ng-model="id_box" />
        </div>
      </div>
      <br />
      <div class="row">
        <div class="col-md-6">
          <strong>Headers (JSON)</strong>
          <textarea class="form-control" ng-model="header_box"></textarea>
        </div>
        <div class="col-md-6">
          <strong>Body (String or JSON)</strong>
          <textarea class="form-control" ng-model="body_box"></textarea>
        </div>
      </div>
      <br />
      <div class="row">
        <div class="col-md-6">
          <button type="button"
                  class="btn btn-default"
                  ng-click="useSample()">
            Use Sample
          </button>
        </div>
        <div class="col-md-6 text-right">
          <button type="button"
                  class="btn btn-default"
                  ng-click="publish()">
            <i class="glyphicon glyphicon-send"></i>
          </button>
          <button class="btn btn-danger" type="button"
                  ng-hide="usingHeartbeat" ng-click="heartbeatOn()">
            <i class="glyphicon glyphicon-heart-empty"></i>
          </button>
          <button class="btn btn-danger" type="button"
                  ng-show="usingHeartbeat" ng-click="heartbeatOff()">
            <i class="glyphicon glyphicon-heart"></i>
          </button>
        </div>
      </div>
Form controls displaying received messages.
      <h2>Messages Received</h2>
      <div class="row">
        <div class="col-md-12" ng-show="messages.length > 0">
          <table class="table table-striped">
            <thead>
              <tr>
                <th colspan="3">
                  <button type="button"
                      class="btn btn-default btn-xs"
                      ng-show="messages.length > 0"
                      ng-click="clearMessages()">
                    <i class="glyphicon glyphicon-remove"></i>
                    <span>Clear Messages</span>
                  </button>
                </th>
              </tr>
            </thead>
            <tbody>
              <tr ng-repeat="message in messages">
                <td>
                  <i class="glyphicon glyphicon-envelope"></i>
                </td>
                <td>
                  <div ng-repeat="(key, value) in message.headers">
                    <strong>{{key}}:</strong> {{value}}
                  </div>
                </td>
                <td>
                  <div ng-repeat="(key, value) in message.body">
                    <strong>{{key}}:</strong> {{value}}
                  </div>
                </td>
              </tr>
            </tbody>
          </table>
        </div>
        <div class="col-md-12" ng-show="messages.length == 0">
          <div class="jumbotron">
            <lead><strong>No messages in the queue.</strong><br />
              Try clicking on the 
              <i class="glyphicon glyphicon-heart-empty"></i>
              button to start receiving messages.</lead>
          </div>
        </div>
      </div>
  </div>
</html>
This is what the layout looks like.
If you noticed above, there are two JavaScript libraries I rely on to implement the controller functionality. These were actually written in CoffeeScript.  The controller code is pretty simple:
uuid = require("uuid")

Ex1Headers =
  principal: ""
  event_type: ""

Ex1Body =
  foo: "bar"
  bar: 123
  foobar: [ "foo", "bar", 123 ]

app = angular.module 'AmqpApp', []

app.controller 'MainCtrl', ($scope) ->

  $scope.id_box = uuid.v4()

  $scope.useSample = ->
    $scope.header_box = JSON.stringify Ex1Headers, undefined, 4
    $scope.body_box = JSON.stringify Ex1Body, undefined, 4


  $scope.messages = []

  message_handler = (message) ->
    $scope.$apply ->
      $scope.messages.push message

  amqpConnection = new AmqpConnection(message_handler)

  $scope.usingHeartbeat = false

  $scope.heartbeatOff = ->
    amqpConnection.stopHeartbeat()
    $scope.usingHeartbeat = false

  $scope.heartbeatOn = ->
    amqpConnection.startHeartbeat()
    $scope.usingHeartbeat = true

  $scope.clearMessages = ->
    $scope.messages.length = 0

  $scope.publish = ->
    headers = {}

    try
      headers = JSON.parse $scope.header_box
    catch e1
      alert("Invalid Headers input.")
      return

    # Assumed property name for the message ID.
    headers.id = $scope.id_box

    body = null

    try
      bjson = JSON.parse $scope.body_box
      body = bjson if bjson?
    catch e2
      body = { msg: $scope.body_box }

    try
      amqpConnection.publish(body, headers)
      $scope.id_box = uuid.v4()
    catch e3
      alert("Error publishing message: " + e3)

The AMQP Connection class is also not too difficult to follow:
amqp = require("amqp")
uuid = require("uuid")
gui  = require("nw.gui")

class AmqpConnection

  constructor: (@handler) ->
    @connection = amqp.createConnection { 
         host: "localhost", 
         login: "guest", 
         password: "guest" 
    }
    @connection.on "ready", =>
      @queue_name = uuid.v4()
      gui.Window.get().on "close", => @connection.end()
      @connection.queue @queue_name, (q) =>
        @q = q
        @q.bind "#"
        @q.subscribe @receive

  publish: (message, headers) =>
    headers = headers ? {}
    headers.sent = new Date().getTime()
    @connection.publish @queue_name, message, { 
      headers: headers, 
      contentType: "application/json" 
    }

  receive: (msg, headers, deliveryInfo) =>
    headers.received = new Date().getTime()
    message = { headers: headers, body: msg }
    @handler message

  startHeartbeat: (interval) =>
    interval = interval ? 1500
    sendHeartbeat = =>
      @publish { ping: "pong!" }
    @heartbeat_handle = setInterval(sendHeartbeat, interval)

  stopHeartbeat: =>
    if @heartbeat_handle?
      clearInterval @heartbeat_handle

window.AmqpConnection = AmqpConnection

One thing you've probably noticed is the use of the Node.js require directive for importing dependencies.  This will play some havoc with RequireJS or other AMD/CommonJS loaders, but the Node-Webkit site has some discussion on workarounds.

Ok, when the publish button is pressed, you will see something like this:
When the heartbeat is turned on, you will repeatedly get messages:
And if you browse the RabbitMQ Management Console, you will see that our application is connected:
That's it.

You can find all of the source code on Github:

Saturday, October 12, 2013

Install Ansible From Source (Github) on OSX 10.8 (Mountain Lion)

If you want to use Ansible on OSX, I recommend installing it from source; once you know how to do it, you'll always be able to stay on the most up-to-date version.  We're going to assume you have Git installed.  If you don't, please install it first.

Here's how you do it.

1.  Install pip, a Python package manager.
sudo easy_install pip
2.  Install Ansible's Python library dependencies.
sudo pip install paramiko PyYAML jinja2
3.  Clone the Ansible repository.
git clone git://
4.  Move into the ansible directory.
cd ansible
5.  Make and install.
sudo make install
6. Verify the install.
ansible --version

Have fun.


If you plan to use passwords with Ansible, you need to install sshpass.  You can do this easily using MacPorts:
sudo port install sshpass

Friday, October 11, 2013

5 Simple Object Marshaling and Transformation Techniques in Cucumber-JVM

There aren't a lot of blog posts, or much formal documentation, on some of the cooler tricks you can do to marshal objects from DataTables or extracted strings in Gherkin expressions.  Since I'm working with Cucumber-JVM a lot these days, I thought I would share 5 simple marshaling/transformation techniques I've discovered that will help you clean up those Step Definitions.

1.  Implicit Type Conversion.

Cucumber can convert strings to integer, double, or boolean values simply by specifying primitive values in the function signature of a step definition.

Scenario: An example of implicit type conversion

  When I need numbers or booleans

  Then sometimes it's easier to let Cucumber convert values 1 or 1.1 or true instead of me

@Then("^sometimes it's easier to let Cucumber convert values (\\d+) " 
       + "or ((?:\\d|\\.)+) or (true|false) instead of me$")
public void sometimes_it_s_easier_to_let_cucumber_convert_values(
        int integer, double decimal, boolean bool) {

  assertEquals(1, integer);
  assertEquals(1.1d, decimal, 0);
  assertTrue(bool);
}

2.  Implicit Conversion to a List<String>.

Cucumber can also convert comma separated strings into a List<String> by specifying a List<String> function argument in a step definition.

Scenario: An example of implicit conversion of lists

  When I need a bunch of items

  Then sometimes it's easier to deal with a list: apples, bananas, oranges

@Then("^sometimes it's easier to deal with a list: ((?:\\s|\\w|,)+)$")
public void sometimes_its_easier_to_deal_with_a_lists(List<String> list) {

  assertTrue(list.containsAll(
      Arrays.asList("apples", "oranges", "bananas")));
}
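
To show roughly what that conversion amounts to, here is a dependency-free Java sketch that splits a captured string on commas and trims each item.  ListSketch and toList are hypothetical names, and the splitting rule is an assumption for illustration; it is not Cucumber's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; Cucumber performs the real conversion internally.
class ListSketch {

    // Splits on commas, trims surrounding whitespace, and drops empty items.
    static List<String> toList(String text) {
        List<String> items = new ArrayList<>();
        for (String piece : text.split(",")) {
            String item = piece.trim();
            if (!item.isEmpty()) {
                items.add(item);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(toList("apples, bananas, oranges"));
    }
}
```

Because the resulting list keeps the order from the Gherkin text, the step definition above checks membership with containsAll rather than comparing against a list in a different order.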

3.  Explicitly Convert a Date or Calendar object using a Formatter.

Some strings are a little more complex or arbitrary to parse, so you can help Cucumber by telling it the format of the string to parse.  By default, Cucumber supports Date and Calendar strings, but the JavaDoc alludes to other possibilities.

Scenario: Convert more complex values using special formats

  When I need to do something with dates

  Then I should be able to use 10/31/2013 or 10/31/2013 12:32:22 and get a Date object back

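@Then("^I should be able to use ((?:\\d|\\/)+) or ((?:\\d|\\/|:|\\s)+)" 
      + "and get a Date object back$")
public void I_should_be_able_to_use_or_AM_and_get_a_Date_object_back(
        @Format("MM/dd/yyyy") Calendar actualCalendar1,
        @Format("MM/dd/yyyy HH:mm:ss") Calendar actualCalendar2)  {

  // Calendar months are zero-based, so 9 is October.
  Calendar expectedCalendar1 = Calendar.getInstance();
  expectedCalendar1.set(2013, 9, 31, 0, 0, 0);

  Calendar expectedCalendar2 = Calendar.getInstance();
  expectedCalendar2.set(2013, 9, 31, 12, 32, 22);

  // Assumed check: set(...) leaves the millisecond field untouched,
  // so the comparison is made at second precision.
  assertEquals(expectedCalendar1.getTimeInMillis() / 1000,
               actualCalendar1.getTimeInMillis() / 1000);
  assertEquals(expectedCalendar2.getTimeInMillis() / 1000,
               actualCalendar2.getTimeInMillis() / 1000);
}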
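
The format strings above look like java.text.SimpleDateFormat patterns.  Assuming that is how they are interpreted, this standalone sketch shows what each pattern yields when parsed into a Calendar.  FormatSketch and parse are hypothetical names used only for this example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

// Hypothetical illustration of parsing with a SimpleDateFormat pattern.
class FormatSketch {

    // Parses text using the given pattern and returns the result as a Calendar.
    static Calendar parse(String pattern, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // reject impossible dates instead of rolling them over
        try {
            Calendar calendar = Calendar.getInstance();
            calendar.setTime(format.parse(text));
            return calendar;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + text + "' as " + pattern, e);
        }
    }

    public static void main(String[] args) {
        Calendar parsed = parse("MM/dd/yyyy HH:mm:ss", "10/31/2013 12:32:22");
        // Calendar months are zero-based, so October is 9 (Calendar.OCTOBER).
        System.out.println(parsed.get(Calendar.MONTH) == Calendar.OCTOBER);
        System.out.println(parsed.get(Calendar.HOUR_OF_DAY));
    }
}
```

A parsed value has its millisecond field set to zero, whereas Calendar.getInstance() starts from the current time, which is worth keeping in mind when comparing the two.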



4.  Explicitly Convert a String to a <T> (strongly-typed object).

Cucumber provides a method of using a Transformer object to translate a string to a strongly typed object.  One only has to extend the Transformer abstract class, and use the @Transform annotation in the step definition's signature.

Scenario: Transform something more complex using a custom transform

  When I need to work with IP Addresses or Phone Numbers

  Then I should be able to parse or 555-555-5555 and get custom objects back

@Then("^I should be able to parse (\\d+(?:[.]\\d+){3}) or" 
      + " (\\d+(?:-\\d+){2}) and get custom objects back$")
public void I_should_be_able_to_parse_or_and_get_custom_objects_back(
        @Transform(IPAddressTransformer.class) InetAddress ipAddress,
        @Transform(PhoneNumberTransformer.class) PhoneNumber phoneNumber) {


  assertEquals(555, phoneNumber.getAreaCode());
  assertEquals(555, phoneNumber.getPrefix());
  assertEquals(5555, phoneNumber.getLineNumber());
}
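
The transformer classes themselves are not shown in this post, so here is a hedged, dependency-free sketch of the parsing a PhoneNumberTransformer might do.  The PhoneNumber class is a guess based on the getters used in the step definition, and PhoneNumberParser is a hypothetical name; in Cucumber-JVM the same logic would sit in the transform(String) method of a class extending Transformer<PhoneNumber>.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumed shape of the value object, inferred from the getters in the step definition.
final class PhoneNumber {
    private final int areaCode;
    private final int prefix;
    private final int lineNumber;

    PhoneNumber(int areaCode, int prefix, int lineNumber) {
        this.areaCode = areaCode;
        this.prefix = prefix;
        this.lineNumber = lineNumber;
    }

    int getAreaCode() { return areaCode; }
    int getPrefix() { return prefix; }
    int getLineNumber() { return lineNumber; }
}

// Hypothetical stand-in for the body of a Transformer<PhoneNumber>.
class PhoneNumberParser {
    private static final Pattern PATTERN = Pattern.compile("(\\d{3})-(\\d{3})-(\\d{4})");

    // Converts text such as "555-555-5555" into a PhoneNumber.
    static PhoneNumber transform(String value) {
        Matcher matcher = PATTERN.matcher(value.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a phone number: " + value);
        }
        return new PhoneNumber(
                Integer.parseInt(matcher.group(1)),
                Integer.parseInt(matcher.group(2)),
                Integer.parseInt(matcher.group(3)));
    }

    public static void main(String[] args) {
        PhoneNumber number = transform("555-555-5555");
        System.out.println(number.getAreaCode() + " " + number.getPrefix() + " " + number.getLineNumber());
    }
}
```

Throwing on malformed input keeps bad data out of the step definition, so the step body can assume it always receives a valid object.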

5.  Convert a DataTable to a List<T>.

Finally, there's a nice mechanism for converting a DataTable into a List<T>, where <T> is a strongly-typed object represented by each table row.  The top row is considered a header row; the column names should be the name of a primitive property on your object type.  Cucumber will smartly handle the header, so you don't need to use camel-casing ("firstName" can be "First Name" as the column name).

Another nice feature Cucumber offers is an implicit conversion of the DataTable to the List<T> (specified in the function signature).  Alternatively, if you want access to the DataTable, you can accept it as an argument and call dataTable.asList(Type type) to manually perform the conversion.

Scenario: Transform a data table into a list of strongly typed objects

  When I need to specify a lot of data as a table

  Then I should be able to get a list of real objects back:
    | First Name | Last Name | Age | Is Male |
    | Obi-Wan    | Kenobi    | 55  | true    |
    | Han        | Solo      | 35  | true    |
    | Luke       | Skywalker | 24  | true    |
    | Leia       | Organa    | 24  | false   |

@Then("^I should be able to get a list of real objects back:$")
public void I_should_be_able_to_get_a_list_of_real_objects_back(List<Person> persons)  {

  assertEquals(4, persons.size());
}
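
The Person class is not shown here, so the following is a minimal sketch of what it could look like, along with an illustration of how a header such as "First Name" lines up with a field such as firstName.  The field names are assumptions derived from the table headers, and HeaderNames.toFieldName is a hypothetical helper that only demonstrates the naming rule; it is not Cucumber's own code.

```java
import java.util.Locale;

// Assumed shape of the row object; field names follow the table headers.
class Person {
    String firstName;
    String lastName;
    int age;
    boolean isMale;
}

// Hypothetical helper showing how a column header maps to a camel-cased field name.
class HeaderNames {

    // "First Name" -> "firstName", "Is Male" -> "isMale", "Age" -> "age"
    static String toFieldName(String header) {
        String[] words = header.trim().split("\\s+");
        StringBuilder name = new StringBuilder(words[0].toLowerCase(Locale.ROOT));
        for (int i = 1; i < words.length; i++) {
            String word = words[i];
            name.append(Character.toUpperCase(word.charAt(0)))
                .append(word.substring(1).toLowerCase(Locale.ROOT));
        }
        return name.toString();
    }

    public static void main(String[] args) {
        for (String header : new String[] { "First Name", "Last Name", "Age", "Is Male" }) {
            System.out.println(header + " -> " + toFieldName(header));
        }
    }
}
```

Each header in the scenario's table resolves to one of the four fields on Person, which is what lets every row become one object.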

Well, I hope this helped.  In a future post, I will talk about some of the lessons I've learned in using Cucumber-JVM, particularly around its effective use in a complex project.

You can find the source (in the example3 packages) on Github: