Thursday, October 31, 2013

Meetup @ CREL Tech Talk SDSU: YARN - Method of Sharing Clusters Beyond Hadoop

Omkar Joshi presented (virtually) YARN - Method of Sharing Clusters Beyond Hadoop last night at a tech talk hosted by UCSD's Center for Research in Entertainment and Learning (CREL).  The talk covered the YARN architecture and discussed the interactions between services like the Resource Manager, ApplicationMaster, and Node Manager.  The big takeaway was that YARN is ready to go; release 2+ is out and rolled into all major Hadoop distributions targeting Hadoop 2.0.

Omkar presented via Google Hangout.  There were about 20 attendees at the meet-up.

The audience asked a number of questions about YARN and its status.  Here's a summarized version of the Q&A (from what I can remember):

1.  How does YARN ensure MapReduce processes are run on the same machine as the data they're going to process (locality of data)?

This is not so much a function of YARN as of the implementation of the MapReduce YARN ApplicationMaster (i.e., the YARN app that has replaced the traditional MapReduce framework).  The MapReduce ApplicationMaster communicates with the NameNode to determine which DataNodes have the blocks that need to be processed and then requests that MapReduce containers be launched specifically on those nodes (to process said blocks).  In other words, the YARN Resource Manager is unaware of HDFS and the location of blocks; it's the MapReduce ApplicationMaster's job to figure that out.
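To make that division of labor concrete, here's a minimal sketch in plain Python.  It is not the real YARN or HDFS API: the `ContainerRequest` class, the `build_requests` function, and the block and node names are all hypothetical stand-ins I made up for illustration.  The idea is simply that the ApplicationMaster turns "which DataNodes hold this block" into "please give me a container on one of these nodes".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContainerRequest:
    """Hypothetical stand-in for a container request sent to the Resource Manager."""
    block_id: str
    preferred_nodes: List[str] = field(default_factory=list)


def build_requests(block_locations: Dict[str, List[str]]) -> List[ContainerRequest]:
    """Create one request per block, preferring the nodes that hold a replica."""
    return [
        ContainerRequest(block_id=block, preferred_nodes=sorted(hosts))
        for block, hosts in sorted(block_locations.items())
    ]


# Made-up NameNode answer: block id -> DataNodes holding a replica of that block.
block_locations = {
    "blk_1": ["node-a", "node-b", "node-c"],
    "blk_2": ["node-b", "node-d", "node-e"],
}

requests = build_requests(block_locations)
for request in requests:
    print(request.block_id, "->", request.preferred_nodes)
```

Note that nothing in the request mentions HDFS; the Resource Manager only sees a list of preferred nodes.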

2.  Is YARN rack-aware?

Yes, YARN is rack-aware, and you can request resources based on rack id.
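As a rough illustration of what a rack-based request buys you, here's a toy placement function in plain Python.  Again, this is not YARN code: the `place` function, the topology map, and the rack ids are assumptions of mine, and the node-then-rack-then-anywhere fallback order is my simplification of how a scheduler might relax locality, not something spelled out in the talk.

```python
from typing import Dict, List, Optional, Tuple


def place(preferred_nodes: List[str],
          preferred_racks: List[str],
          free_nodes: List[str],
          topology: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Pick a free node, trying node-local, then rack-local, then any node."""
    # First choice: a free node that was explicitly requested.
    for node in free_nodes:
        if node in preferred_nodes:
            return node, "node-local"
    # Second choice: a free node sitting in one of the requested racks.
    for node in free_nodes:
        if topology.get(node) in preferred_racks:
            return node, "rack-local"
    # Last resort: whatever is free, if anything.
    if free_nodes:
        return free_nodes[0], "any"
    return None


# Hypothetical cluster layout: node name -> rack id.
topology = {"node-a": "/rack1", "node-b": "/rack1", "node-c": "/rack2"}

# node-a is busy, so the request falls back to another node in /rack1.
print(place(["node-a"], ["/rack1"], ["node-b", "node-c"], topology))
```

In this example the requested node is not free, so the container lands on `node-b`, which shares a rack with the data.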

3.  Is the YARN API stable? -- given that the old Distributed Shell example used to be broken in the source repository...

In short, yes.  The API experienced a brief period of instability as they migrated from version 1 to version 2.  The example projects have been fixed and checked in.

4.  How does YARN fit into the new set of Apache Platform as a Service projects (Mesos, Stratos, etc.)?

YARN currently only has one implementation of the ResourceManager, but it would be possible to implement one on top of Mesos (but this would be non-trivial).

The subtext I got here is that there's a lot of overlap in the ecosystem and the efforts are not necessarily in sync.

5.  Is YARN production ready?

Yes.  Many organizations are already using it and some popular frameworks like Storm and Spark have been ported to the platform.

6.  Is there a commitment by the major vendors to support YARN?

Yes, Hortonworks and Cloudera are specifically supporting development and rolling features into their distributions.
