Wednesday, February 12, 2014

Top 10 Technologies I Want My Software Engineers to Know In 2014

While a little late in coming this year, here are the top ten technologies and concepts I want my software engineers to know in 2014.

  1. Learn AngularJS and then Ember.
  2. Compile-to-JavaScript language (CoffeeScript, Dart, TypeScript, ClojureScript).
  3. HTTP/REST + AMQP + MQTT + Kafka (learn all).
  4. Scala or Clojure (not both).
  5. Git, Ansible, and Vagrant.
  6. Amazon Web Services.
  7. Concurrency and Distributed Programming.
  8. Python or Node.js.
  9. Know your NoSQL: Neo4j, Redis, MongoDB, HBase, and Cassandra.
  10. Lambda Architectures.

I think my choices this year reflect a lot of the lessons we (friends, my company, the industry) learned over 2013.  In many respects, I think 2013 was a sobering year for computing, where hype met reality.  These lessons have made me reconsider what is a genuinely important skill versus knowledge of a gimmick technology.  This year you will find fewer cutting-edge technologies and more of a focus on practice (DevOps, architecture, and old-fashioned engineering).

1.  Learn AngularJS and then Ember.

I think the dust is starting to settle on the Client-side MVC battles.  While there may be many viable frameworks out there, I think the communities around AngularJS and Ember have firmly placed them in the top spots.

If you are doing web development and haven't chosen a framework, I recommend you start with AngularJS.  AngularJS is pretty easy to pick up, and decently powerful.  The framework encourages a much more robust programming paradigm reminiscent of Java or C# with its use of dependency injection, singletons, and factories.  In general, I think the introduction of these programming patterns is one of Angular's biggest strengths, since the patterns lend themselves to better decoupling and testability.  Angular, however, is a little weak in the areas of in-page routing and complex multi-model/aggregate view construction.
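To make the dependency-injection point concrete: AngularJS 1.x can resolve a component's dependencies by matching its parameter names against registered services. The Python sketch below imitates that idea with a hypothetical `Injector` class (it is neither Angular's API nor a Python library, and it does no cycle detection). Services are registered as factories, built once and cached as singletons, and injected by parameter name, so a test can simply register a fake under the same name in a fresh injector.

```python
import inspect
from typing import Any, Callable, Dict


class Injector:
    """Hypothetical name-based injector, loosely modeled on AngularJS 1.x DI."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[..., Any]] = {}
        self._instances: Dict[str, Any] = {}

    def register(self, name: str, factory: Callable[..., Any]) -> None:
        self._factories[name] = factory

    def get(self, name: str) -> Any:
        # Each service is built once and cached, so every consumer shares one instance
        if name not in self._instances:
            self._instances[name] = self.invoke(self._factories[name])
        return self._instances[name]

    def invoke(self, fn: Callable[..., Any]) -> Any:
        # Resolve each parameter name as a service and pass it in as a keyword argument
        deps = {param: self.get(param) for param in inspect.signature(fn).parameters}
        return fn(**deps)


class HttpClient:
    def __init__(self) -> None:
        self.base = "GET"

    def fetch(self, url: str) -> str:
        return f"{self.base} {url}"


class UserService:
    # The parameter name "http_client" is what the injector looks up
    def __init__(self, http_client: HttpClient) -> None:
        self.http_client = http_client

    def load(self, user_id: int) -> str:
        return self.http_client.fetch(f"/users/{user_id}")


injector = Injector()
injector.register("http_client", HttpClient)
injector.register("user_service", UserService)
print(injector.get("user_service").load(7))  # GET /users/7
```

Because `UserService` never constructs its own `HttpClient`, swapping the transport (for example, a stub that returns canned responses) only changes a registration, not the service code.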

Those areas are where Ember shines.  Ember offers incredibly powerful constructs like nested views and controllers, which are deeply tied to routing.  Ember also offers a number of really excellent features like computed values and listeners on model objects, and a very opinionated persistence layer.  Ember's weakness is its learning curve.  It's an incredibly sophisticated technology that requires a large learning investment before you can be productive.  Ember is also extremely opinionated.  If you don't agree with some of those opinions, the rest of the framework will fight you (from the persistence layer to routing to views).  However, if you agree to follow Ember's rules, you will find yourself writing very little code and accomplishing amazing things.
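Ember builds computed values and change listeners (observers) into its own object model. The plain-Python sketch below only illustrates the two ideas; the `Observable` and `Person` classes are hypothetical and are not Ember's API. `full_name` is derived from two other properties, and a listener fires whenever `last_name` is set. Unlike Ember, which caches computed values and invalidates them when their dependencies change, this sketch simply recomputes on every access.

```python
from typing import Any, Callable, List

Listener = Callable[[str, Any], None]


class Observable:
    """Hypothetical model base class: notifies listeners whenever an attribute is set."""

    def __init__(self) -> None:
        # Bypass our own __setattr__ while creating the listener registry
        object.__setattr__(self, "_listeners", {})

    def observe(self, name: str, listener: Listener) -> None:
        self._listeners.setdefault(name, []).append(listener)

    def __setattr__(self, name: str, value: Any) -> None:
        object.__setattr__(self, name, value)
        for listener in self._listeners.get(name, []):
            listener(name, value)


class Person(Observable):
    def __init__(self, first_name: str, last_name: str) -> None:
        super().__init__()
        self.first_name = first_name
        self.last_name = last_name

    @property
    def full_name(self) -> str:
        # Computed value: always derived from the current first and last name
        return f"{self.first_name} {self.last_name}"


changes: List[str] = []
person = Person("Jane", "Doe")
person.observe("last_name", lambda name, value: changes.append(f"{name} -> {value}"))
person.last_name = "Smith"
print(person.full_name)  # Jane Smith
print(changes)           # ['last_name -> Smith']
```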

2.  Compile-to-JavaScript language (CoffeeScript, Dart, TypeScript, ClojureScript).

There are a lot of good JavaScript alternatives out there, and the choice really comes down to preference.  Many of these languages attempt to solve not only the inadequacies of JavaScript, but also some of the issues with the browser environment.  CoffeeScript is a great unopinionated alternative to JavaScript (it adds nothing but language features).  Other languages like Dart attempt to solve issues like importing libraries (replacing the need for frameworks like RequireJS).  So, look around and find the language that best fits your style.

3.  HTTP/REST + AMQP + MQTT + Kafka (learn all).

Learn to communicate!  Well, what I should say is that no app is an island.  Scalable engineering implies the ability to distribute tasks across machines, and this implies that you have a mechanism to communicate your intent between applications.  There are many protocols one can use to accomplish this task, but I want to emphasize that there is no single mechanism that solves every communication challenge.  One recurring lesson I saw many people learn in 2013 is that architectures need to support multiple protocols.

On the synchronous side, I don't think you get better than HTTP, particularly via a RESTful interface.  It's not that there aren't better (or faster) protocols; hell, I could have told you to go with Protobuf or Thrift or Avro.  The thing with HTTP is that it's ubiquitous.  There are tons of frameworks that do everything from securing communication, to simplifying development, to load-balancing and scaling the protocol.  You get this for free in pretty much every language!

On the asynchronous side, there are a lot of special-purposed technologies that you should be aware of.  While some technologies are faster, they come with some serious tradeoffs (security, reliability, features).  Learn the use cases of these technologies, and don't be surprised if you need to employ multiple technologies to meet your needs.

My recommendations are the following protocols/frameworks (along with their general purpose):

AMQP (Advanced Message Queuing Protocol) - Secure, reliable, feature-rich.  Generally, this should be your default choice when connecting multiple backend applications together via PubSub.  RabbitMQ is a wonderful broker, which I highly recommend, but there are other decent options like Apache Qpid and Apache Apollo (the next generation of ActiveMQ).
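Much of the routing power of an AMQP broker like RabbitMQ comes from topic exchanges: each queue binds with a pattern, and the broker delivers a message to every queue whose pattern matches the message's dot-separated routing key, where `*` matches exactly one word and `#` matches zero or more words. The in-memory Python sketch below reproduces only that matching rule (no broker, no network, no persistence); the queue names and routing keys are made up for illustration.

```python
from typing import Dict, List


def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if an AMQP-style topic binding pattern matches a routing key."""

    def match(p: List[str], k: List[str]) -> bool:
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' either absorbs zero words, or absorbs one word and tries again
            return match(rest, k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        # '*' matches any single word; any other word must match exactly
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


def route(bindings: Dict[str, str], routing_key: str) -> List[str]:
    """bindings maps queue name -> binding pattern; return the queues that receive the message."""
    return sorted(q for q, pattern in bindings.items() if topic_matches(pattern, routing_key))


bindings = {
    "audit": "#",
    "billing": "orders.*.created",
    "eu_orders": "orders.eu.#",
}
print(route(bindings, "orders.eu.created"))  # ['audit', 'billing', 'eu_orders']
print(route(bindings, "orders.us.shipped"))  # ['audit']
```

The same message fans out to several consumers with different interests, which is the PubSub behavior that makes a broker a good default for connecting backend applications.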

MQTT - A low-power, simpler PubSub technology commonly used on devices.  The Facebook app uses MQTT to synchronize state between your phone and their site.  The nice thing about MQTT is that the protocol is now supported by many AMQP brokers like RabbitMQ, so you get it for free and can internally bridge AMQP apps with MQTT devices.

Kafka - A messaging framework created by LinkedIn and designed for high-throughput messaging.  Kafka has a unique architecture that includes partitioning message streams and batch message transfers.  Kafka is a great tool, but it lacks some of the security and routing features a more traditional AMQP broker would offer, so I generally would not recommend starting with Kafka until you know you have a scalability problem.  It also has fewer client libraries than technologies like AMQP, so you might find yourself in a bind if you're looking for an Erlang client for the latest version (0.8).
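Partitioning is what lets Kafka spread a topic across brokers while still preserving order where it matters: messages with the same key are assigned to the same partition, and Kafka only guarantees ordering within a partition. The Python sketch below imitates that key-to-partition assignment. It uses `zlib.crc32` purely as a stable stand-in hash (an assumption for illustration; Kafka's clients use their own hash functions), so the partition numbers will not match what a real producer computes.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition; the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(messages: List[Tuple[str, str]], num_partitions: int) -> Dict[int, List[str]]:
    """Append each (key, value) message to its partition's log, in send order."""
    partitions: Dict[int, List[str]] = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append(f"{key}:{value}")
    return dict(partitions)


messages = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add-to-cart"),
    ("user-1", "checkout"),
]
for partition, log in sorted(assign(messages, num_partitions=4).items()):
    print(partition, log)
```

All of `user-1`'s events land in one partition in the order they were sent, so a consumer of that partition sees login, then add-to-cart, then checkout; events for different keys may be spread across partitions and processed in parallel.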

A final favorable mention goes to ZeroMQ.  ZeroMQ is a great technology (it's blindingly fast), but it does not have a lot of the features of a centralized broker (message persistence and filtering/routing).

4.  Scala or Clojure (not both).

Like many people, I've spent the last couple of years at the periphery of languages like Clojure and Scala, but I never grew out of small projects written in either language.  Over the last couple of months, inspired by some Scala developers I met, I decided to dedicate myself full-time to learning Scala (to great reward).

My efforts to learn both Clojure and Scala led me to the realization that you should not actually try to learn both (at least not at the same time).  Both languages are uniquely complex.  Scala has a breadth of syntactic features that go well beyond simply adding first-class functions to the Java language.  Clojure, while not syntactically complex, requires developers to change their traditional object-oriented mindset to one of functional constructs like side-effect-free functions, immutable data structures, recursion (instead of iteration), etc.

From my own experience, I will say that I think Clojure is definitely the more elegant language.  But I think that a transition to Scala will be more natural for traditional Java, C#, and C++ developers.  Last year, I dissuaded people from learning Scala, so let me formally apologize.  Looking back, I realize that it's far more likely for a Java developer to migrate to Scala than to Clojure.

Regardless of which language you choose, my recommendation is to only focus on one.  Write a lot of code in that language; if you're still not satisfied, contemplate learning the other.

5.  Git, Ansible, and Vagrant.

DevOps is here to stay (and thank god!).  The era of "runs on my box" is over.  If you cannot check out a project on GitHub or Bitbucket, provision a development environment, and get to developing, you need to change your process.  In the past, too much time was spent migrating software over to a development or deployment environment and manually installing/configuring it.  The practice encouraged a number of incredibly bad habits like having different development and deployment profiles, which leads to brittle software and bugs.

Vagrant is a virtual machine automation tool.  Using a configuration file, it is capable of acquiring prebuilt virtual machines (aka "boxes") from a central repository and applying provisioning scripts/tasks against that virtual machine to achieve a fully configured state (ready to develop!).  The Vagrant configuration also allows you to configure the underlying VM by specifying network settings and local directories shared between the VM and the host machine.  Vagrant supports a number of provisioning technologies including BASH scripts, Puppet, Chef, and Ansible.

Ansible is an elegant provisioning and management technology.  It's like Puppet and Chef, but instead of writing provisioning tasks in Ruby you declare tasks via YAML (you're not programming in some imperative or OOP style).  Instead of requiring software, typically in the form of a daemon, to be installed on the box, Ansible uses SSH to copy over Python scripts, which execute your tasks locally on the target box.  Since most Linux distributions come with Python preinstalled, Ansible supports a wide variety of platforms out of the box.

Finally, Ansible scripts and Vagrant configuration can and should be checked into source control.  In fact, it's not uncommon for the files to be part of your development project's repository.  Typically, the Vagrant configuration sits at the root of the repository, and your build folder is mounted as a directory on your VM.  This way, every time you recompile the project, the files are available on the VM.  This makes developing web applications incredibly easy without having to install the runtime on your development box.

6.  Amazon Web Services.

Last year I recommended learning a Platform as a Service.  Ironically, the one PaaS I left off was Amazon's.  While some of the platforms remain compelling (Heroku), I find myself drawn more and more to the Amazon platform.  While Amazon lacks some of the richer features found in Heroku, it also doesn't get as opinionated.  Your provisioning options are essentially limited to Elastic Beanstalk (full control of your application by Amazon) or creating your own EC2 instance.  While most developers would desire something more in the middle, I'm increasingly finding myself either wanting a platform to take care of all of my concerns, or virtually none of them.  While Amazon may not have some of the cool framework/database specific features Heroku may have, you still get a lot: load balancing, DNS, queuing, VM hosting, MapReduce, storage, monitoring, etc.  More importantly, the prices on AWS have gone down or stayed flat over the last couple of years!

7.  Concurrency and Distributed Programming.

This is not so much a technology, but an area of study.  There are a lot of frameworks in this category, so may I recommend the most obvious: "java.util.concurrent".  I've spent a lot of time over the last couple of months revisiting the topic of concurrency and I have found a lot of gaps in my own knowledge (and I know many others have the same gaps).  While there are many frameworks that simplify concurrent programming, Akka and Guava in particular, knowing how they solve the problem is incredibly important.  You may find that they have made tradeoffs that you don't agree with.  This is often pointed out by the emerging "mechanical sympathy" community led by Martin Thompson.  Martin's LMAX Disruptor framework is proving the importance of understanding the design of the processor when writing software.
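The Java classes are the real recommendation here; to keep the examples in this post in one language, the sketch below uses a rough Python analogue of an executor plus a lock (`concurrent.futures.ThreadPoolExecutor` and `threading.Lock`). It illustrates the classic lost-update problem these libraries help you reason about: `+=` on shared state is a read-modify-write, and the counter is only guaranteed to be correct because every increment happens while holding the lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Counter:
    """A shared counter whose increments are guarded by a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def increment(self) -> None:
        # Without the lock, two threads can read the same old value
        # and one of the two updates is lost.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def worker(counter: Counter, times: int) -> None:
    for _ in range(times):
        counter.increment()


counter = Counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    # submit() returns a Future; result() waits and re-raises any worker exception
    futures = [pool.submit(worker, counter, 10_000) for _ in range(8)]
    for future in futures:
        future.result()
print(counter.value)  # 80000
```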

Distributed programming is another topic that goes hand-in-hand with concurrency.  Typically, developers benefit from the hard work other people have invested in this topic in the form of NoSQL databases and distributed platforms like Hadoop.  But as architectures become more sophisticated, many are realizing that these frameworks don't meet all of their needs.  Last year I recommended Apache ZooKeeper as a critical technology to learn, and I continue to support that notion.  I would also like to recommend the companion Apache Curator project that makes working with ZooKeeper easier.  Netflix's Hystrix framework is another excellent tool for writing software in environments that are likely to experience failure.
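One pattern at the heart of Hystrix is the circuit breaker: after repeated failures, stop calling a struggling dependency for a while and serve a fallback instead, so one failure doesn't cascade through the system. The Python sketch below is a minimal, hypothetical `CircuitBreaker` that illustrates the pattern only; it is not Hystrix's API, it is not thread-safe, and the threshold and cool-down values are arbitrary. An injectable clock makes the behavior easy to demonstrate.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical breaker: closed -> open after N failures -> half-open after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, action: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast without touching the struggling dependency
                return fallback()
            # Cool-down elapsed (half-open): let this one trial call through
        try:
            result = action()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Trip the breaker, or re-trip it if the half-open trial failed
                self.opened_at = self.clock()
            return fallback()
        # Any success closes the circuit and resets the failure count
        self.failures = 0
        self.opened_at = None
        return result


now = [0.0]  # fake clock for the demo
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])
calls = []


def flaky() -> str:
    calls.append("call")
    raise ConnectionError("service down")


results = [breaker.call(flaky, lambda: "cached") for _ in range(4)]
print(results, len(calls))  # ['cached', 'cached', 'cached', 'cached'] 2

now[0] = 11.0  # after the cool-down, a healthy call closes the circuit again
print(breaker.call(lambda: "fresh", lambda: "cached"))  # fresh
```

Only the first two calls reach `flaky()`; once the breaker opens, the remaining calls get the fallback immediately, which is exactly the protection you want when a downstream service is already overloaded.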

I recommend, if you haven't done it already, picking up the following books:
- Java Concurrency in Practice
- Java Threads

Unfortunately, I don't have any recommendations on distributed computing books, but would welcome any suggestions.

8.  Python or Node.js.

In addition to having Java/C#/C++ chops, I think it's increasingly important to learn a good scriptable platform/language.  Confidence in either language will allow you to write the installers or glue code needed to get your apps up and running on virtually any environment.  Personally, I don't think you could go wrong with either Python or Node.js, and I don't think you need to learn both.  While Node.js is the darling platform at the moment (and reuses your JavaScript knowledge), Python is nothing to scoff at, remaining one of the most important and widely used languages/platforms around.
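As an illustration of the kind of glue code meant here, the Python sketch below renders an environment-specific configuration file from a template using only the standard library (`string.Template`). The template fields, file name, hosts, and database names are all hypothetical.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical application config; $-placeholders are filled in per environment
CONFIG_TEMPLATE = Template(
    "server.port=$port\n"
    "db.url=jdbc:postgresql://$db_host:5432/$db_name\n"
)

ENVIRONMENTS = {
    "dev": {"port": 8080, "db_host": "localhost", "db_name": "app_dev"},
    "prod": {"port": 80, "db_host": "db.internal", "db_name": "app"},
}


def render_config(env: str, out_dir: Path) -> Path:
    """Write app.properties for the given environment and return its path."""
    # substitute() raises KeyError if a placeholder has no value, catching typos early
    text = CONFIG_TEMPLATE.substitute(ENVIRONMENTS[env])
    path = out_dir / "app.properties"
    path.write_text(text, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    written = render_config("dev", Path(tmp))
    print(written.read_text(encoding="utf-8"))
```

The same few lines work unchanged on a laptop, a Vagrant VM, or an EC2 instance, which is the portability argument for keeping a scripting language in your toolbox.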

9.  Know your NoSQL: Neo4j, Redis, MongoDB, HBase, and Cassandra.

There is now a plethora of NoSQL solutions available to developers, and none of them solves every use case you might have.  This is why so many companies are employing polyglot architectures.  So, I think it's important that you know which problems the different NoSQL databases solve.  The databases listed above are ones I recommend because of their relative maturity and openness.  There are many more viable solutions, and if I left one off the list that you love, I apologize.  Regardless of your feelings, these are the databases I find particularly interesting and think you should check out (maybe take one for a code test-drive).

10.  Lambda Architectures.

So what happened to Hadoop and Storm, which were both on the top ten list last year?  Well, they're simply getting merged together as Lambda Architectures!

This is definitely going to be the new buzzword in 2014, but it does have a ton of merit.  Lambda Architecture is the principle of combining batch processing and real-time analysis into a coherent architecture.  The idea is to provide accurate views of data, where those views are served from both the batch (historical data) and real-time (recent data) layers.  Nathan Marz's new book Big Data (Manning) discusses the implications of this type of architecture and essentially how to build one.  As Nathan points out, this type of architecture has existed for decades; it has just become a lot more complex because of the amount of data we now have to work with.  In recent years, we've seen open source frameworks developed to address aspects of this problem (Hadoop, Hive, Pig, Impala, Spark, Storm, Samza), but we have yet to see these technologies combined into an out-of-the-box Lambda Architecture.  There's some indication that this will be the next big play in the cloud technology landscape; we're already hearing rumblings of Lambda Architecture frameworks/platforms like Lambdoop (http://www.lambdoop.com/).
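The core idea fits in a few lines: a batch layer periodically recomputes a view from the full, immutable master dataset; a speed layer incrementally maintains a view over only the events that arrived since the last batch run; and queries merge the two. The Python sketch below does this for simple page-view counts. It is a toy, in-memory illustration with hypothetical names (not an API from Marz's book or from any of the frameworks above), and it ignores the timing subtleties of exactly when a batch run starts.

```python
from collections import Counter
from typing import Iterable, List


def batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute counts from scratch over all historical events."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by the batch view."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def on_event(self, page: str) -> None:
        self.view[page] += 1

    def reset(self) -> None:
        # Called once a new batch view has absorbed these events
        self.view.clear()


def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the historical (batch) and recent (real-time) views."""
    return batch[page] + speed.view[page]


master: List[str] = ["/home", "/about", "/home"]
batch = batch_view(master)
speed = SpeedLayer()

for page in ["/home", "/pricing"]:  # events arriving after the last batch run
    master.append(page)             # every event also lands in the master dataset
    speed.on_event(page)

print(query("/home", batch, speed))     # 3
print(query("/pricing", batch, speed))  # 1

# Next batch run: recompute from the master dataset and discard the speed view
batch = batch_view(master)
speed.reset()
print(query("/home", batch, speed))     # 3
```

The batch view is always recomputable from raw data, so mistakes in it can be fixed by rerunning the job, while the speed layer only has to be correct for the short window the batch layer has not yet caught up on.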

Conclusion

As we come out of the hype cycle, I think we will find that 2014 is a year of addressing serious technology challenges.

Webpages continue to dominate the User Interface, with new technologies like Apache Cordova cross-compiling web apps to mobile.  Having skill in both JavaScript (better yet, a compile-to-JavaScript language) and Client-side MVC is essential to reaching end-users.

The JVM remains the dominant enterprise platform and probably will remain so for some time.  This doesn't mean you don't need to upgrade your language skills.  Engineers working in other languages are getting twice as much done while you write tedious amounts of boilerplate code to do virtually nothing.  Give yourself the edge by adopting Scala or Clojure.

Knowing another JVM language is also not enough.  You need to understand core programming paradigms like concurrent and distributed computing if you're going to remain relevant.  The Java backend is increasingly shifting further away from the web tier (it's simply too easy to write a web app in Node.js compared to Java).  This means your skills are best served in moving and crunching data; so get savvy.

And part of being savvy is knowing your options, particularly in NoSQL.  I've recommended a number of mature NoSQL frameworks, but the list is not exhaustive (I've left plenty of interesting solutions off the list: Riak, TitanDB, LevelDB, RethinkDB, MarkLogic XML Server, Accumulo, CouchDB, etc). Read up on these solutions and attempt to apply them to your current use cases.

Many organizations are still struggling with things like continuous deployment, but tools are getting better.  I see a lot of promising work coming out of the DevOps community, especially Ansible, Docker, and Vagrant.  Developers also need to have the skill to script installations when using BASH or an automation framework becomes impractical.  Languages like Python and JavaScript/Node.js fill this role.

Once apps are ready to deploy, having an environment like AWS to deploy onto is essential.  EC2 remains the ideal platform to deploy complex applications without the sandbox limitations found in most PaaS offerings.  Simply point those Ansible playbooks you developed for your Vagrant environment at your remote EC2 instance and get your application up and running in minutes.

Lambda Architectures also represent a fundamental shift away from the concept that Hadoop can do everything, to a need for layered multi-pipeline analytic systems.  Hopefully we will see startups like Lambdoop or old-timers like Hortonworks and Cloudera begin to bridge the capability gaps between batch and real-time.

I hope this list has helped you.  If you have any questions or want to call BS on any of the points, I'd be happy to address your comments.

5 comments:

  1. Another great post!

    I'm keen to catch up with the technologies the real world is using, as opposed to the standard Java technology bubble I appear to have found myself in for several years now. These lists go a long way toward pointing me in a direction as to what is worth focusing my own and my team's learning on. My reading into these, so far, has already highlighted several eye-opening technologies that could be really good ways for us to go in the near future. And I've only just scratched the surface!

    1. Thanks Ian. I'm glad you found it useful.

  2. Great list! Couldn't agree more. I especially agree that knowing the basics of concurrent and distributed programming and queuing mechanisms is essential.

    I'm a Java Enterprise & Oracle Developer, but I never got friendly with JavaScript and general web development. Could you recommend a good book for learning JavaScript?

    1. Michael, thank you for the comment (and sorry for the slow reply). I started developing in JavaScript in 1999, so I haven't really kept up with JavaScript literature. I've heard good things about "Eloquent JavaScript" and "Secrets of the JavaScript Ninja". The latter book is by John Resig, creator of jQuery and considered one of the thought leaders of the JavaScript community.
