Thursday, June 19, 2014

Moving to Silvrback

I will no longer be using Blogger as my platform.  There are some things to like about the platform, but more and more I find it suboptimal, particularly for developers who want to post code.  I won't be closing the site down, but I don't plan to write any more content within Blogger.

You can view my new blog at

Friday, June 13, 2014

Enable JMX on Amazon EC2

I ran into an issue this last week enabling JMX for a Java process on an Amazon EC2 instance.  After a couple of hours researching (and finding a couple of answers on StackOverflow), I wanted to share a complete answer to how this is done.

1.  Unblock RMI and JMX Ports.

First, you have to ensure both the JMX and RMI ports are available to the process that's going to make the JMX connection.  One issue many devs/admins are having within EC2 is that their JMX clients are unable to establish a connection to RMI.  Therefore, you need to make sure your instance's security policy, as well as your firewall (if using one), allow both ports to be accessed.  In this example, I'm specifying 9011 and 9010 for JMX and RMI respectively.
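As an illustration, opening the two ports from the AWS CLI might look like the following sketch.  The security-group name and CIDR range here are hypothetical placeholders (substitute your own), and the commands are echoed rather than executed so the sketch is safe to run anywhere; you can accomplish the same thing through the AWS console.

```shell
# Sketch: open the RMI (9010) and JMX (9011) ports on a security group.
# "my-app-sg" and the CIDR below are hypothetical; substitute your own.
# The aws commands are echoed, not executed, so this is safe to dry-run.
GROUP="my-app-sg"
for PORT in 9010 9011; do
  echo aws ec2 authorize-security-group-ingress \
    --group-name "$GROUP" --protocol tcp --port "$PORT" --cidr 203.0.113.0/24
done
```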

2.  Resolve the Public DNS Name of your EC2 Instance.

Another problem is that when your JMX client makes a connection, the JMX server returns the wrong connection information for the server's RMI port.  If you don't override java.rmi.server.hostname, the server will return the default hostname for the EC2 instance, which is not publicly resolvable.  To fix this, you can request the public DNS name of an instance from the "metadata server" within EC2.  A call made by an EC2 instance to http://169.254.169.254/latest/meta-data/public-hostname will return the public DNS name of that instance.  In the example below, you will see the call made using wget for this information.

3.  Prefer the IPv4 Stack.

Finally, the last issue we ran into was the JMX server binding to IPv6 addresses instead of IPv4.  You can instruct Java to prefer the IPv4 stack with the standard -Djava.net.preferIPv4Stack=true flag.

The rest of the configuration is pretty standard.  This is a fragment of a BASH script showing the composition of JAVA_OPTS you may use when starting your Java process:




RMI_HOST=`wget -q -O - http://169.254.169.254/latest/meta-data/public-hostname`

JAVA_OPTS="${JAVA_OPTS} -Djava.rmi.server.hostname=${RMI_HOST}"
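For reference, the surrounding flags might look like the sketch below.  This is my illustration rather than the script from the original post: it uses the 9011/9010 ports from step 1, and authentication and SSL are disabled purely for brevity (don't do that in production).  The localhost fallback for RMI_HOST is only there so the sketch runs outside EC2.

```shell
# Sketch of the full JMX flag set.  RMI_HOST is normally populated by the
# metadata call shown above; the localhost fallback is just for illustration.
RMI_HOST="${RMI_HOST:-localhost}"

JAVA_OPTS="${JAVA_OPTS} -Dcom.sun.management.jmxremote"
JAVA_OPTS="${JAVA_OPTS} -Dcom.sun.management.jmxremote.port=9011"
JAVA_OPTS="${JAVA_OPTS} -Dcom.sun.management.jmxremote.rmi.port=9010"
JAVA_OPTS="${JAVA_OPTS} -Dcom.sun.management.jmxremote.authenticate=false"
JAVA_OPTS="${JAVA_OPTS} -Dcom.sun.management.jmxremote.ssl=false"
JAVA_OPTS="${JAVA_OPTS} -Djava.rmi.server.hostname=${RMI_HOST}"
JAVA_OPTS="${JAVA_OPTS} -Djava.net.preferIPv4Stack=true"

echo "$JAVA_OPTS"
```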


I would like to thank the StackOverflow commenters for their help with this issue.  You can see the original posts on these threads:

Thursday, February 20, 2014

Install RabbitMQ Management Console in Travis CI.

If you are writing integration tests and need a local RabbitMQ instance to test against, you can actually spin up an instance in Travis CI (this is built into the platform).  One thing that does not come with Travis's RabbitMQ instance is the RabbitMQ Management Console.  The RabbitMQ Management Console is the friendly web UI for RabbitMQ, but it also provides RESTful access for configuring the server.

Adding the RabbitMQ Management Console is easy.  Simply add some install tasks to your .travis.yml:

    services:
      - rabbitmq

    install:
      - sudo rabbitmq-plugins enable rabbitmq_management
      - sudo rabbitmq-plugins enable rabbitmq_federation
      - sudo rabbitmq-plugins enable rabbitmq_federation_management
      - sudo service rabbitmq-server restart

The services property instructs Travis CI to start RabbitMQ.  I don't know whether this is strictly necessary, because I'm uncertain whether it instructs Travis CI to use a specific VM or simply starts the RabbitMQ service.  My guess is that it is unnecessary, but since it doesn't add much overhead (i.e. time) to the testing, I've left it in the configuration.

The install command allows us to execute command-line tasks on a CentOS box (the build machine).  The Travis CI guys have graciously given us access to RabbitMQ's sbin folder, so we can easily add new plugins.  Keep in mind these tasks have to be run with sudo.  If you don't, the build will fail due to permission issues.

Finally, for the purposes of my tests, I need two RabbitMQ instances alive so I can perform federation.  You can easily launch a second RabbitMQ instance with the following command in Travis CI:

    - sudo RABBITMQ_NODE_PORT=5673 RABBITMQ_SERVER_START_ARGS="-rabbitmq_management listener [{port,15673}]" RABBITMQ_NODENAME=hare rabbitmq-server -detached

You will need to ensure that the second RabbitMQ node's name does not conflict with the current one.  In the command above, you will notice I specified RABBITMQ_NODENAME=hare.  We need to make sure the local Travis CI build box responds to hare on the network.  You can add /etc/hosts aliases in Travis CI like this:

    addons:
      hosts:
        - rabbit
        - hare

And that's it!  You should have two RabbitMQ server instances, both with their own Management Console instances running in Travis CI.
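As a quick sanity check (my addition, not part of the original post), you can poll each console's REST API once the build is up.  The /api/overview endpoint and the guest/guest credentials are the standard RabbitMQ management defaults; 15672 is the default node's console, 15673 the one we gave hare.

```shell
# Probe both management consoles (15672 for the default node, 15673 for hare).
# Prints the HTTP status for each; 200 means the console is answering
# (you'll see 000 if nothing is listening on that port).
for PORT in 15672 15673; do
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' \
    -u guest:guest "http://localhost:${PORT}/api/overview")
  echo "management console on ${PORT}: HTTP ${STATUS}"
done
```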

For the complete .travis.yml head over to my open source Java bindings and utilities for RabbitMQ called Rabbid Management:   

Wednesday, February 12, 2014

Top 10 Technologies I Want My Software Engineers to Know In 2014

While a little late in coming this year, here are the top ten technologies/concepts I want my software engineers to know in 2014.

  1. Learn AngularJS and then Ember.
  2. Compile-to-JavaScript language (CoffeeScript, Dart, TypeScript, ClojureScript).
  3. HTTP/REST + AMQP + MQTT + Kafka (learn all).
  4. Scala or Clojure (not both).
  5. Git, Ansible, and Vagrant.
  6. Amazon Web Services.
  7. Concurrency and Distributed Programming.
  8. Python or Node.js.
  9. Know your NoSQL: Neo4j, Redis, MongoDB, HBase, and Cassandra.
  10. Lambda Architectures.

I think my choices this year reflect a lot of the lessons we (friends, my company, the industry) have learned over 2013.  In many respects, I think 2013 was a sobering year for computing, where hype met reality.  These lessons have made me reconsider what is a really important skill versus knowledge of a gimmick technology.  This year you will find fewer cutting-edge technologies and more of a focus on practice (DevOps, architecture, and old-fashioned engineering).

1.  Learn AngularJS and then Ember.

I think the dust is starting to settle on the Client-side MVC battles.  While there may be many viable frameworks out there, I think the communities around AngularJS and Ember have firmly placed them in the top spots.

If you are doing web development and haven't chosen a framework, I recommend you start with AngularJS.  AngularJS is pretty easy to pick up, and decently powerful.  The framework encourages a much more robust programming paradigm reminiscent of Java or C# with its use of dependency injection, singletons, and factories.  In general, I think the introduction of these programming patterns is one of Angular's biggest strengths, since the patterns lend themselves to better decoupling and testability.  Angular, however, is a little weak in the areas of in-page routing and complex multi-model/aggregate view construction.

This is where Ember shines.  Ember offers incredibly powerful constructs like nested views and controllers, which are deeply tied to routing.  Ember also offers a number of really excellent features like computed values and listeners on model objects, and a very opinionated persistence layer.  Ember's weakness is its learning curve.  It's an incredibly sophisticated technology that requires a large learning investment before you can be productive.  Ember is also extremely opinionated.  If you don't agree with some of those opinions, the rest of the framework will fight you (from the persistence layer to routing to views).  However, if you agree to follow Ember's rules, you will find yourself writing very little code and accomplishing amazing things.

2.  Compile-to-JavaScript language (CoffeeScript, Dart, TypeScript, ClojureScript).

There are a lot of good JavaScript alternatives out there, and choosing one really comes down to preference.  Many of these languages not only attempt to solve the inadequacies of JavaScript, but also some of the issues with the browser environment.  CoffeeScript is a great unopinionated alternative to JavaScript (the language does not add anything but language features).  Other languages like Dart attempt to solve issues like importing libraries (replacing the need for frameworks like RequireJS).  So look around and find the language that best fits your style.

3.  HTTP/REST + AMQP + MQTT + Kafka (learn all).

Learn to communicate!  Well, what I should say is that no app is an island.  Scalable engineering implies the ability to distribute tasks across machines, and this implies that you have a mechanism to communicate your intent between applications.  There are many protocols one can use to accomplish this task, but I want to emphasize that there is no single mechanism that solves every communication challenge.  One recurring lesson I saw many people learn in 2013 is that architectures need to support multiple protocols.

On the synchronous side, I don't think you get better than HTTP, particularly via a RESTful interface.  It's not that there aren't better (or faster) protocols; hell, I could have told you to go with Protobuf, Thrift, or Avro.  The thing about HTTP is that it's ubiquitous.  There are a ton of frameworks that do everything from securing communication, to simplifying development, to load-balancing/scaling the protocol.  You get this for free in pretty much every language!

On the asynchronous side, there are a lot of special-purposed technologies that you should be aware of.  While some technologies are faster, they come with some serious tradeoffs (security, reliability, features).  Learn the use cases of these technologies, and don't be surprised if you need to employ multiple technologies to meet your needs.

My recommendations are the following protocols/frameworks (along with their general purpose):

AMQP (Advanced Message Queuing Protocol) - Secure, reliable, feature-rich.  Generally, this should be your default choice when connecting multiple backend applications together in PubSub. RabbitMQ is a wonderful broker, which I highly recommend, but there are other decent options like Apache QPid and Apache Apollo (next generation of ActiveMQ).

MQTT - low power, simpler pubsub technology commonly used on devices.  The Facebook app uses MQTT to synchronize state between your phone and their site.  The nice thing about MQTT is that the protocol is now supported by many AMQP brokers like RabbitMQ, so you get it for free and can internally bridge AMQP apps with MQTT devices.

Kafka - a messaging framework created by LinkedIn designed for high-throughput messaging.  Kafka has a unique architecture that includes partitioning message streams and batch message transfers.  Kafka is a great tool, but it lacks some of the security and routing features a more traditional AMQP broker would offer, so I generally would not recommend starting with Kafka until you know you have a scalability problem.  It also offers fewer clients than technologies like AMQP, so you might find yourself in a bind if you're looking for an Erlang client for the latest version (0.8).

A final honorable mention goes to ZeroMQ.  ZeroMQ is a great technology (it's blindingly fast), but it does not have a lot of the features of a centralized broker (message persistence and filtering/routing).

4.  Scala or Clojure (not both).

Like many people, I've spent the last couple of years at the periphery of languages like Clojure and Scala, but I never moved beyond small projects written in either language.  Over the last couple of months, inspired by some Scala developers I met, I decided to dedicate myself full-time to learning Scala (to great reward).

My efforts to learn both Clojure and Scala led me to the realization that you should not actually try to learn both (at least not at the same time).  Both languages are uniquely complex.  Scala has a breadth of syntactic features that go beyond the simple addition of first-class functions to the Java language.  Clojure, while not syntactically complex, requires developers to change their traditional object-oriented mindset to one of functional constructs: side-effect-free functions, immutable data structures, recursion rather than iteration, etc.

From my own experience, I will say that I think Clojure is definitely the more elegant language.  But I think that a transition to Scala will be more natural for traditional Java, C#, and C++ developers.  Last year, I dissuaded people from learning Scala, so let me formally apologize.  Looking back, I realize that it's far more likely for a Java developer to migrate to Scala than to Clojure.

Regardless of which language you choose, my recommendation is to only focus on one.  Write a lot of code in that language; if you're still not satisfied, contemplate learning the other.

5.  Git, Ansible, and Vagrant.

DevOps is here to stay (and thank god!).  The era of "runs on my box" is over.  If you cannot check out a project from GitHub or Bitbucket, provision a development environment, and get to developing, you need to change your process.  In the past, too much time was spent migrating software to a development or deployment environment and manually installing and configuring it.  That practice encouraged a number of incredibly bad habits, like maintaining different development and deployment profiles, which leads to brittle software and bugs.

Vagrant is a virtual machine automation tool.  Using a configuration file, it is capable of acquiring prebuilt virtual machines (aka "boxes") from a central repository and applying provisioning scripts/tasks against that virtual machine to achieve a fully configured state (ready to develop!).  The Vagrant configuration also allows you to configure the underlying VM by specifying network settings, as well as indicating local shared directories (between the VM and the host machine).  Vagrant supports a number of provisioning technologies including BASH scripts, Puppet, Chef, and Ansible.

Ansible is an elegant provisioning and management technology.  It's like Puppet and Chef, but instead of writing provisioning tasks in Ruby, you declare tasks via YAML (you're not programming in some imperative or OOP style).  Instead of requiring software, typically in the form of a daemon, to be installed on the box, Ansible uses SSH to copy over Python scripts that execute your tasks on the local box.  Since most Linux distributions come with Python preinstalled, Ansible supports a wide variety of platforms out of the box.

Finally, Ansible scripts and Vagrant configuration can and should be checked into source control.  In fact, it's not uncommon for the files to be a part of your development project's repository.  Typically, the Vagrant configuration sits at the root of the repository, and your build folder is mounted as a directory on your VM.  This way, every time you recompile the project, the files are available on the VM.  This makes developing web applications incredibly easy without having to install the runtime on your development box.

6.  Amazon Web Services.

Last year I recommended learning a Platform as a Service.  Ironically, the one PaaS I left off was Amazon's.  While some of the platforms remain compelling (Heroku), I find myself drawn more and more to the Amazon platform.  While Amazon lacks some of the richer features found in Heroku, it also doesn't get as opinionated.  Your provisioning options are essentially limited to Elastic Beanstalk (full control of your application by Amazon) or creating your own EC2 instance.  While most developers would desire something more in the middle, I'm increasingly finding myself either wanting a platform to take care of all of my concerns, or virtually none of them.  While Amazon may not have some of the cool framework/database specific features Heroku may have, you still get a lot: load balancing, DNS, queuing, VM hosting, MapReduce, storage, monitoring, etc.  More importantly, the prices on AWS have gone down or stayed flat over the last couple of years!

7.  Concurrency and Distributed Programming.

This is not so much a technology as an area of study.  There are a lot of frameworks in this category, so may I recommend the most obvious: java.util.concurrent.  I've spent a lot of time over the last couple of months revisiting the topic of concurrency, and I have found a lot of gaps in my own knowledge (and I know many others have the same gaps).  While there are many frameworks that simplify concurrent programming, Akka and Guava in particular, knowing how they solve the problem is incredibly important.  You may find that they have made tradeoffs you don't agree with.  This is often pointed out by the emerging "mechanical sympathy" community led by Martin Thompson.  Martin's LMAX Disruptor framework is proving the importance of understanding the design of the processor when writing software.

Distributed programming is another topic that goes hand-in-hand with concurrency.  Typically, developers benefit from the hard work other people have invested in this topic in the form of NoSQL databases and distributed platforms like Hadoop.  But as architectures become more sophisticated, many are realizing that these frameworks don't meet all of their needs.  Last year I recommended Apache ZooKeeper as a critical technology to learn, and I continue to support that notion.  I would also like to recommend the companion Apache Curator project, which makes working with ZooKeeper easier.  Netflix's Hystrix framework is another excellent tool for writing software in environments that are likely to experience failure.

I recommend, if you haven't done it already, picking up the following books:
- Java Concurrency in Practice
- Java Threads

Unfortunately, I don't have any recommendations on distributed computing books, but would welcome any suggestions.

8.  Python or Node.js.

In addition to having Java/C#/C++ chops, I think it's increasingly important to learn a good scriptable platform/language.  Confidence in either language will allow you to write the installers or glue code needed to get your apps up and running in virtually any environment.  Personally, I don't think you can go wrong with either Python or Node.js, and I don't think you need to learn both.  While Node.js is the darling platform at the moment (and reuses your JavaScript knowledge), Python is nothing to scoff at, remaining one of the most important and widely used languages/platforms around.

9.  Know your NoSQL: Neo4j, Redis, MongoDB, HBase, and Cassandra.

There is now a plethora of NoSQL solutions available to developers, and none of them solves every use case you might have.  This is why so many companies are employing polyglot architectures.  So I think it's important that you know which problems the different NoSQL databases solve.  The list above contains solutions I recommend because of their relative maturity and openness.  There are many more viable solutions, and if I left off one that you love, I apologize.  Regardless, these are the databases I find particularly interesting, and I think you should check them out (maybe take a code test-drive).

10.  Lambda Architectures.

So what happened to Hadoop and Storm, which were both on the top ten list last year?  Well, they're simply getting merged together as Lambda Architectures!

This is definitely going to be the new buzzword in 2014, but it does have a ton of merit.  Lambda Architecture is the principle of combining batch processing and real-time analysis into a coherent architecture.  The idea is to provide accurate views of data, where those views are served from both the batch (historical data) and real-time (recent data) layers.  Nathan Marz's new book Big Data (Manning) discusses the implications of this type of architecture and, essentially, how to build one.  As Nathan points out, this type of architecture has existed for decades; it's just become a lot more complex because of the amount of data we now have to work with.  In recent years, we've seen open source frameworks developed to address aspects of this problem (Hadoop, Hive, Pig, Impala, Spark, Storm, Samza), but we have yet to see the combination of these technologies into an out-of-the-box Lambda Architecture.  There's some indication that this will be the next big play in the cloud technology landscape; we're already hearing rumblings of Lambda Architecture frameworks/platforms like Lambdoop.


As we come out of the hype cycle, I think we will find that 2014 is a year of addressing serious technology challenges.

Webpages continue to dominate the User Interface, with new technologies like Apache Cordova cross-compiling web apps to mobile.  Having skill in both JavaScript (better yet, a compile-to-JavaScript language) and Client-side MVC is essential to reaching end-users.

The JVM remains the dominant enterprise platform and probably will remain so for some time.  This doesn't mean you don't need to upgrade your language skills.  Engineers in other languages are writing twice as much code as you while you write a tedious amount of boilerplate code to do virtually nothing.  Give yourself the edge by adopting Scala or Clojure.

Knowing another JVM language is also not enough.  You need to understand core programming paradigms like concurrent and distributed computing if you're going to remain relevant.  The Java backend is increasingly shifting further away from the web tier (it's simply too easy to write a web app in Node.js compared to Java).  This means your skills are best served in moving and crunching data; so get savvy.

And part of being savvy is knowing your options, particularly in NoSQL.  I've recommended a number of mature NoSQL frameworks, but the list is not exhaustive (I've left plenty of interesting solutions off the list: Riak, TitanDB, LevelDB, RethinkDB, MarkLogic XML Server, Accumulo, CouchDB, etc). Read up on these solutions and attempt to apply them to your current use cases.

Many organizations are still struggling with things like continuous deployment, but the tools are getting better.  I see a lot of promising work coming out of the DevOps community, especially Ansible, Docker, and Vagrant.  Developers also need the skill to script installations when BASH or an automation framework becomes impractical.  Languages like Python and JavaScript/Node.js fill this role.

Once apps are ready to deploy, having an environment like AWS to deploy onto is essential.  EC2 still remains the ideal platform to deploy complex applications without sandbox limitations found in most PaaS's.  Simply point those Ansible playbooks you developed for your Vagrant environment to your remote EC2 instance and get your application up and running in minutes.

Lambda Architectures also represent a fundamental shift away from the concept that Hadoop can do everything, toward a need for layered, multi-pipeline analytic systems.  Hopefully we will see startups like Lambdoop, or old-timers like Hortonworks and Cloudera, begin to bridge the capability gaps between batch and real-time.

I hope this list has helped you.  If you have any questions or want to call BS on any of the points, I'd be happy to address your comments.