Friday, October 12, 2012

Convert Lat, Lon to MGRS (and vice versa) in Java

Problem


Need to convert between MGRS and Latitude, Longitude pairs in Java. 

There are a couple of libraries out there, but they have unfriendly open source licenses (no redistribution unless your project uses the same license, a commercial license is required in production, etc.).

Potential


NASA converted the old NGA GDAL libraries written in C to Java and embedded them in its World Wind application.  Unfortunately, that library is huge and littered with dependencies on AWT, graphics libraries, etc.  World Wind also has its own logging facade embedded in the implementation.

On the other hand, the World Wind project has a very nice license:  NASA Open Source Agreement v1.3.

Solution


1.  Rip all of the unnecessary code out of World Wind.
2.  Attribute the code.
3.  Provide the changes to the world:  https://github.com/Berico-Technologies/Geo-Coordinate-Conversion-Java
4.  Move on and do more important things.

Example


package com.berico.coords;

public class Example 
{
  public static void main( String[] args )
  {  
      String mgrs = Coordinates.mgrsFromLatLon(37.10, -112.12);
      
      System.out.println(mgrs);
      
      double[] latLon = Coordinates.latLonFromMgrs(mgrs);
      
      System.out.println(
        String.format("%s, %s", latLon[0], latLon[1]));
  }
}

Which prints...
12SVG 00476 06553
37.10000369375159, -112.1199968717142
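
Since the five-digit MGRS string above resolves to one-meter precision, the round trip is close but not exact (you can see the drift in the printed latitude and longitude).  Here is a quick sketch that measures that drift, using only the two calls demonstrated above; the tolerance is an arbitrary choice for illustration:

double lat = 37.10, lon = -112.12;

double[] roundTrip = Coordinates.latLonFromMgrs(
  Coordinates.mgrsFromLatLon(lat, lon));

// The worst-case drift, in degrees, across both axes.
double drift = Math.max(
  Math.abs(roundTrip[0] - lat),
  Math.abs(roundTrip[1] - lon));

// Roughly 10 meters of latitude; an arbitrary sanity threshold.
if (drift > 0.0001) {
  throw new IllegalStateException("Round trip drifted: " + drift);
}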

Get it on GitHub: https://github.com/Berico-Technologies/Geo-Coordinate-Conversion-Java




Sunday, September 30, 2012

Introducing "CLAVIN" (Cartographic Location And Vicinity INdexer)

What is a Geotagger?


If your work involves finding meaning in unstructured text, you may have, at one time or another, worked with semantic technologies like entity extraction.  An Entity Extractor promotes words in text to concepts; this is typically realized in the form of entity tagging, where an ontology is associated with a word or phrase (e.g. PERSON, PLACE, TIME, ORGANIZATION).  Once entities have been "tagged", the next step is to "resolve" them to a global concept or entity (entity resolution).  For instance, we not only want to know that "Barack Obama" is a PERSON, we also want any reference to Barack Obama to point to one "identifier".  This way, we can associate an entity across all of the documents in which it occurs, allowing us to do things like build a global graph of concepts or perform faceted searches against those concepts.

Perhaps one of the most important forms of entity resolution is associating locations with geographic coordinates, commonly known as Geotagging (which may also encompass the entity resolution step).  For instance, we not only want to know that New York City is a LOCATION, but also that its center latitude and longitude is 40.7142° N, 74.0064° W, that the location is in the "New York State" administrative district, and that it is in the country of the United States of America.  More sophisticated resolution techniques might even include the polygonal boundaries of the location and an association to a semantic graph of concepts related to the city and its history.

The Problem


For years the Geotagging market has been dominated by a very small number of commercial products; many entity extractors can identify locations, but few actually resolve that location to a fixed point in space.  As far as I'm aware, the most used is MetaCarta (http://www.metacarta.com/), one I have personally used on a number of projects.  MetaCarta is really good in terms of accuracy and features; in fact, most systems I've seen MetaCarta deployed in only use about 25% of its features.  The problem with MetaCarta is that it is expensive (I don't have figures; I've only seen my customers cringe when talking about price).

Yahoo also has a Geotagger in the form of a web service offering called Placemaker (http://developer.yahoo.com/geo/placemaker/).  For us, Placemaker has never been a viable solution since it can't be deployed on an internal network and doesn't fit well with our architectural use cases.  Placemaker also doesn't seem to be well tuned to our corpora, meaning it has lower-than-ideal precision in extraction.

Outside of the commercial space, there are no viable open source alternatives.  A quick Google search for open source Geotaggers will probably return Geodict (https://github.com/petewarden/geodict), a GitHub project by Pete Warden.  Pete combines a Gazetteer (a geospatial dictionary) with some simple rules for locating potential places in a sentence (the presence of key words like "in" or "near") in a brute-force approach to solving the problem.  Unfortunately, Geodict's approach doesn't take the semantic meaning of the sentence into account when locating potential "place" words, and it doesn't perform the resolution step (differentiating locations by context: the "Springfield" problem).

Introducing CLAVIN


Necessity is the mother of invention.  - Unknown

Early this year, our company found itself desperately in need of a Geotagger.  Our enterprise search application, built around geospatial faceting, lacked the geospatial entities we needed to do the faceting.  We had documents, just no geospatial tagging.  Architecturally, an upstream component in the ETL pipeline was supposed to provide this capability (using MetaCarta), but for one reason or another, the team producing that capability was not going to be able to meet the delivery timeframe (I should add that this was no fault of the MetaCarta product).

One of Berico Technologies' data scientists, Charlie Greenbacker, was working on another, unrelated problem involving the resolution of country names across a collection of structured datasets (Excel and CSV documents) that we were "mashing together" so we could do analysis across datasets.  Recognizing that both problems had a similar solution, Charlie began work on a homegrown geotagger that eventually became CLAVIN: Cartographic Location And Vicinity INdexer (http://clavin.bericotechnologies.com/).

What is CLAVIN?


It's a geotagger (and resolver).  Architecturally, CLAVIN is extremely simple.  CLAVIN is written in Java, but it can be bundled into a Java web application and exposed as a web service, allowing any application to access it (as seen in our CLAVIN-Web demonstration).

CLAVIN has a simple workflow.  An EntityTagger finds unresolved (string) PERSON, ORGANIZATION, and LOCATION entities in raw text (multiline, complex quotations, etc.).  Once those entities are extracted from the text, they are passed to a LocationResolver that returns the most confident match (ResolvedLocation) for each location in the set.

The default EntityTagger implementation in CLAVIN uses the Apache OpenNLP framework.  Apache OpenNLP is the most license-friendly framework we could bundle with CLAVIN; the most accurate EntityTagger we have implemented is one utilizing the Stanford NER, which we don't provide outside of service contracts since it's GPL.
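
For the curious, this is roughly what location tagging looks like with raw Apache OpenNLP (the framework behind the default tagger).  Note that this is the plain OpenNLP 1.5 API rather than CLAVIN's wrapper; "en-ner-location.bin" is OpenNLP's pre-trained location model, and the sentence is just an example:

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.util.Span;

public class OpenNlpLocationExample {

  public static void main(String[] args) throws Exception {

    // Load OpenNLP's pre-trained location name-finder model.
    InputStream modelIn = new FileInputStream("en-ner-location.bin");
    TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
    modelIn.close();

    NameFinderME locationFinder = new NameFinderME(model);

    // Tokenize a sentence and find the LOCATION spans within it.
    String[] tokens = SimpleTokenizer.INSTANCE.tokenize(
      "We drove east through Indiana to West Virginia.");

    for (Span span : locationFinder.find(tokens)) {

      StringBuilder location = new StringBuilder();

      for (int i = span.getStart(); i < span.getEnd(); i++) {
        location.append(tokens[i]).append(' ');
      }

      System.out.println(location.toString().trim());
    }
  }
}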

Our default LocationResolver uses a custom Apache Lucene index of the GeoNames Gazetteer (http://www.geonames.org/).  The LocationResolver includes tunable algorithms for performing fuzzy and probabilistic matching of locations.  Since you sacrifice performance for accuracy and vice versa, the LocationResolver is a great abstraction for a number of strategies you may need to employ in your system.  Another benefit of CLAVIN is that we maintain the resolver index (one less thing you need to worry about).
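
I won't reproduce the resolver internals here, but to give a feel for the fuzzy matching step, here is a minimal Lucene sketch (Lucene 3.x API; the "name" field and index path are hypothetical, not CLAVIN's actual index schema):

import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class FuzzyGazetteerExample {

  public static void main(String[] args) throws Exception {

    // Open the (hypothetical) gazetteer index.
    IndexSearcher searcher = new IndexSearcher(
      IndexReader.open(FSDirectory.open(new File("IndexDirectory"))));

    // Fuzzy-match a misspelled place name against the "name" field;
    // 0.8f is the minimum similarity.
    TopDocs hits = searcher.search(
      new FuzzyQuery(new Term("name", "springfeild"), 0.8f), 5);

    for (ScoreDoc hit : hits.scoreDocs) {
      System.out.println(searcher.doc(hit.doc).get("name"));
    }

    searcher.close();
  }
}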

Code Example


This is how simple it is to use CLAVIN under the current API.  Keep in mind there will be some changes before its official release in mid-October.
// Location of the initializer for Stanford NER
String classifierModelPath = 
  "/location/of/classifier/all.3class.distsim.crf.ser.gz";
    
// Needed by Stanford NER Implementation
SequenceClassifierProvider classifierProvider 
  = new ExternalSequenceClassifierProvider(classifierModelPath);

// Initialize the Tagger (sorry, but I'm demonstrating the Stanford
// NER tagger at the moment; will update with OpenNLP ASAP).
EntityTagger entityTagger = new NerdEntityTagger(classifierProvider);

// Location of the Location Resolver Index
String locationResolverIndexPath = "/location/of/index/IndexDirectory";

// Instantiate the Location Resolver
LocationResolver locationResolver 
  = new LocationResolver(
    new File(locationResolverIndexPath), 3, 5);

// Nothing magic here, just a couple of sentences.
String text = getText();

// Tag the text
TaggedDocument taggedDocument = entityTagger.tagDocument(text);

System.out.println(String.format("%s locations found",
  taggedDocument.getLocations().size()));

// Resolve the locations from the extracted locations
List<ResolvedLocation> resolvedLocations = 
    locationResolver.resolveLocations(taggedDocument.getLocations());

for(ResolvedLocation resolvedLocation : resolvedLocations){
  
  System.out.println(
    String.format("%s (%s, %s)", 
      resolvedLocation.matchedName, 
      resolvedLocation.geoname.latitude, 
      resolvedLocation.geoname.longitude));
}

The getText() method simply returns the following string:


I visited the Sears Tower in Chicago only to find out there were exciting attractions in Springfield.  After Springfield, Chuck and I drove east through Indiana to West Virginia, stopping in Harper's Ferry.  We finally made it to our destination in Washington, DC on Tuesday.


And we get the following results on the console:


7 locations found
Chicago (41.85003, -87.65005)
Springfield (39.80172, -89.64371)
Springfield (39.80172, -89.64371)
Indiana (40.00032, -86.25027)
West Virginia (38.50038, -80.50009)
Washington (38.89511, -77.03637)
DC (38.91706, -77.00025)



More Information


If you want to know more about CLAVIN, Charlie will be speaking at GEOINT 2012 in Orlando, FL (October 8-11) and hopefully at Strata Santa Clara next year.



If you have any other questions, just leave me a comment.

Batch Processing Movies with Ruby and the HandBrake Command Line Interface

A project I was recently working on required a lot of video transformation work, 98% of which could be automated with the right tools.  More importantly, I was constantly tweaking video settings (aspect ratio, quality), which required me to reprocess the whole batch of videos.  After developing a serious case of carpal tunnel from repetitively clicking the same commands in HandBrake, I decided I needed to find a better strategy.

Back in the day, I used to be pretty savvy with encoding tools, but ever since I discovered HandBrake (http://handbrake.fr/), I've never really had the need to keep up.  After looking at FFmpeg and a couple of other CLI tools, I discovered HandBrake had a command-line interface (http://handbrake.fr/downloads2.php) that was, like the GUI, ridiculously easy to use.

For example, my task included shrinking a movie to 560x416 pixels and stripping the audio:
HandBrakeCLI -i input.mp4 -o output.mp4 -e x264 -2 -O -q 20 \
  -a none -w 560 -l 416 --modulus 16 --loose-anamorphic
This is a breakdown of the task's parameters:
  • -i = input file
  • -o = output file
  • -e = encoder
  • -2 = two pass encoding
  • -O = optimize for HTTP streaming
  • -q = quality
  • -a = audio
  • -w = width
  • -l = height
  • --modulus = the multiple that the output dimensions must be divisible by
  • --loose-anamorphic = ensures dimensions are resized cleanly by the modulus
The CLI is pretty simple, and the documentation is uncharacteristically (for open source) complete.  To see a more complete list of arguments for the CLI, please see: https://trac.handbrake.fr/wiki/CLIGuide.

Now all we need to do is throw in some automation.  This could be done in a number of languages (BASH, Python, Perl, NodeJS), but for tasks like this, I prefer Ruby.

The following is an easy little script for looping over all of the mp4s in a directory and executing HandBrake encodings (in my case, two per video):
def get_base_name(video)
  video[0, video.index(".mp4")]
end

Dir.glob("*.mp4") do |input_video|

  base_name = get_base_name input_video

  output_video_med = "../#{base_name}_560x416.mp4"
  output_video_sml = "../#{base_name}_224x160.mp4"

  settings = "-e x264 -2 -O -q 20 -a none"

  picture_med = "-w 560 -l 416 --modulus 16 --loose-anamorphic"
  picture_sml = "-w 224 -l 160 --modulus 16 --loose-anamorphic"

  command_med = "HandBrakeCLI -i #{input_video} -o #{output_video_med} #{settings} #{picture_med}"
  command_sml = "HandBrakeCLI -i #{input_video} -o #{output_video_sml} #{settings} #{picture_sml}"

  puts `#{command_med}`
  puts `#{command_sml}`
end
If you aren't familiar with Ruby, a statement encapsulated in back-ticks (`statement`) will be executed on the command line, and its output returned.

As you can see, the process is pretty simple and is likely to save you a "boat load" of time if you have to perform repetitive video processing tasks.  I can also see this process being integrated into a solution that involves automatically transcoding videos for a rich media site that allows users to upload videos.


Monday, September 24, 2012

Rules Engine Patterns - Part 4: Result Compilation

Instead of directly reacting to rule conditions, or directly mutating the model, in this pattern you will record and collect outcomes of rule conditions.  When the rule session is complete, the application will collect the results and do something with them (e.g.: save them to a database or forward them to different services).

Benefits:
  • Keep a history of the rule outcomes without needing to mutate the model objects inserted into the session.
Disadvantages:
  • A lot more scaffolding to employ.  You will need to create a model around the "plausible outcomes" for your rules, as well as the scaffolding to collect those outcomes and react to the results.
Example:  Rich's Parcel and Post Service needs to route packages based on a number of rules governed by a package's weight and distance from the company's collection center.  If the package is less than 200 lbs. and within 500 miles of the collection center, the package will be delivered locally (dropped in a mail box!).  If the package is less than 200 lbs. but more than 500 miles away, it will be delivered via Air Mail.  Finally, if the package is 200 lbs. or more, it will be delivered by train.

The implementation of the routing system will be performed using decorators, which allow us to easily wrap the original parcel with new functionality (although we will really only use them to distinguish the class type).

Parcel.java

A simple interface for Parcels in our package handling system.
package com.berico.rc;

public interface Parcel {

 public abstract double getWeight();

 public abstract int getDestinationZipCode();

}

BaseParcel.java

A simple implementation of the Parcel interface.  This class will be used initially for all packages prior to some routing determination being performed by the rules engine.
package com.berico.rc;

public class BaseParcel implements Parcel {

 private double weight = -1;
 
 private int destinationZipCode = -1;

 public BaseParcel(double weight, int destinationZipCode) {
  this.weight = weight;
  this.destinationZipCode = destinationZipCode;
 }

 @Override
 public double getWeight() {
  return weight;
 }

 @Override
 public int getDestinationZipCode() {
  return destinationZipCode;
 }
}

LocalDeliveryParcel.java

A decorator for Parcel, this class represents a package delivered via post office or some other local carrier.
package com.berico.rc;

public class LocalDeliveryParcel implements Parcel {

 private Parcel originalParcel = null;

 public LocalDeliveryParcel(Parcel originalParcel) {

  this.originalParcel = originalParcel;
 }

 public Parcel getOriginalParcel() {
  return originalParcel;
 }

 @Override
 public double getWeight() {
  
  return originalParcel.getWeight();
 }

 @Override
 public int getDestinationZipCode() {
  
  return originalParcel.getDestinationZipCode();
 }
 
 public void routeToPostOffice(){
  
  System.out.println(
   "Dropping package off in a mail box.");
 }
}


TrainParcel.java

A decorator for Parcel, this class represents a package delivered via freight train.
package com.berico.rc;

public class TrainParcel implements Parcel {
 
 private Parcel originalParcel = null;

 public TrainParcel(Parcel originalParcel) {

  this.originalParcel = originalParcel;
 }

 public Parcel getOriginalParcel() {
  return originalParcel;
 }

 @Override
 public double getWeight() {
  
  return originalParcel.getWeight();
 }

 @Override
 public int getDestinationZipCode() {
  
  return originalParcel.getDestinationZipCode();
 }
 
 public void routeToFreightCar(){
  
  System.out.println(
   "Giving package to hobo, destination: Akron, OH.");
 }
}


AirMailParcel.java

A decorator for Parcel, this class represents a package delivered via cargo plane.

package com.berico.rc;

public class AirMailParcel implements Parcel {

 private Parcel originalParcel = null;

 public AirMailParcel(Parcel originalParcel) {

  this.originalParcel = originalParcel;
 }

 public Parcel getOriginalParcel() {
  return originalParcel;
 }

 @Override
 public double getWeight() {
  
  return originalParcel.getWeight();
 }

 @Override
 public int getDestinationZipCode() {
  
  return originalParcel.getDestinationZipCode();
 }
 
 public void routeToPlane(){
  
  System.out.println(
   "Attaching package to underside of biplane.");
 }
}



ResultCompilation.drl

The rules that govern the package routing system.  You will notice the use of a "function" defined within the ruleset to calculate the distance between the collection center and a zip code.  This is, of course, a pretend calculation.

package com.berico.rc

// If only it were this simple.
function double distance(int zipcode){
 return zipcode * 0.01;
}

rule "Local Delivery routing"

  when
    parcel : BaseParcel( 
     weight < 200.0, distance(destinationZipCode) < 500) 
  then
    insert( new LocalDeliveryParcel(parcel) );
end

rule "Train routing"

  when
    parcel : BaseParcel( weight >= 200.0) 
  then
    insert( new TrainParcel(parcel) );
end

rule "Air Mail routing"

  when
    parcel : BaseParcel( 
     weight < 200.0, distance(destinationZipCode) > 500) 
  then
    insert( new AirMailParcel(parcel) );
end

ResultCompilationApp.java

The application that drives the rule session.  It inserts a batch of parcels, fires all rules, and then pulls the routed (decorated) parcels out of working memory by type.

package com.berico.rc;

import java.util.Collection;

import org.drools.runtime.ObjectFilter;

import com.berico.BaseApp;

public class ResultCompilationApp extends BaseApp {

 @Override
 protected String getRuleFile() {
  
  return "ResultCompilation.drl";
 }

 
 public ResultCompilationApp() {
  super();
  
  // Create a bunch of parcels that need
  // to be routed.
  Parcel[] parcels = new Parcel[]{
   new BaseParcel(100, 90210),
   new BaseParcel(200, 90210),
   new BaseParcel(100, 20110),
   new BaseParcel(500, 87234),
   new BaseParcel(1000, 51234)
  };
  
  // Iterate over the parcels...
  for(Parcel parcel : parcels){
   
   // Inserting each parcel into the
   // rule session.
   getSession().insert(parcel);
  }
  
  // Apply all rules against the parcels
  getSession().fireAllRules();
  
  System.out.println("LOCAL DELIVERY PARCELS........");
  
  // Print the parcels that require Local Delivery
  printParcelInfo(LocalDeliveryParcel.class);
  
  System.out.println("TRAIN PARCELS........");
  
  // Print the parcels that require Train delivery
  printParcelInfo(TrainParcel.class);
  
  System.out.println("AIR MAIL PARCELS........");
  
  // Print the parcels that require Air Mail
  printParcelInfo(AirMailParcel.class);
  
  // Kill the session
  getSession().dispose();
 }
 
 /**
  * Simple predicate to get all objects that match the
  * supplied object type.
  */
 public class ByClassTypeFilter implements ObjectFilter {

  private Class<?> targetClass = null;
  
  public ByClassTypeFilter(Class<?> targetClass){
   this.targetClass = targetClass;
  }
  
  @Override
  public boolean accept(Object object) {
   return object.getClass().equals(targetClass);
  }
 }
 
 /**
  * Get the objects of the particular parcel type from
  * the "Working Memory" of Drools and print their weight
  * and destination zip code to the console.
  * @param parcelClass Type of Parcel to retrieve
  */
 protected void printParcelInfo(Class<? extends Parcel> parcelClass){
  
  // Pull the objects from the rule session,
  // by using a custom predicate that looks
  // for a specific Parcel implementation type.
  Collection<Object> oParcels 
   = getSession().getObjects(
    new ByClassTypeFilter(parcelClass));
 
  // Iterate over the matching objects...
  for(Object oParcel : oParcels){
   
   // Cast the object to the interface.
   Parcel parcel = (Parcel)oParcel;
   
   // Print the parcel information
   System.out.println(
    String.format("Weight: %s, Zip Code: %s", 
     parcel.getWeight(), 
     parcel.getDestinationZipCode()));
  }
 }
 
 public static void main(String[] args){
  
  new ResultCompilationApp();
 }
}

On the console, you should see the following message:
LOCAL DELIVERY PARCELS........
Weight: 100.0, Zip Code: 20110
TRAIN PARCELS........
Weight: 500.0, Zip Code: 87234
Weight: 1000.0, Zip Code: 51234
Weight: 200.0, Zip Code: 90210
AIR MAIL PARCELS........
Weight: 100.0, Zip Code: 90210

Rules Engine Patterns - Part 3: Request Adjudication

In this pattern, the Rules Engine will mutate (adjudicate) model objects (requests), supplying the outcome of the rules evaluated during the session.

Benefits:
  • Decisions made by the rules engine are recorded directly on the model objects inserted into the rules session.
  • Knowledge of what to do based on the results (perhaps an infrastructural issue) can be handled outside of the rules engine.
Disadvantages:
  • There is a direct mutation of the model, which may not be palatable.
  • Request/adjudication concepts get blended into the model, which may be a concern outside of this domain.
Example:  The state of Virginia has decided to automate Traffic Court using a set of empirically proven rules to evaluate the guilt of traffic violators.  In this example, SpeedingTicketCases will be adjudicated by the rules engine.  If the rule set finds the defendant guilty, it will simply set a guilty verdict (the boolean "isGuilty") on the SpeedingTicketCase (using the "adjudicate" method).

SpeedingTicketCase.java

Represents the case being evaluated by the rules engine (not the speeding ticket itself).  If the defendant is guilty, it will be marked on the case itself.

package com.berico.ra;

public class SpeedingTicketCase {

 private boolean isGuilty;
 
 private String vehicleMake = null;
 
 private int mphOverSpeedLimit = -1;

 public SpeedingTicketCase( 
   String vehicleMake,
   int mphOverSpeedLimit) {

  this.vehicleMake = vehicleMake;
  this.mphOverSpeedLimit = mphOverSpeedLimit;
 }

 public boolean isGuilty() {
  return isGuilty;
 }
 
 public void adjudicate(boolean isGuilty){
  this.isGuilty = isGuilty;
 }

 public String getVehicleMake() {
  return vehicleMake;
 }

 public int getMphOverSpeedLimit() {
  return mphOverSpeedLimit;
 }
}

RequestAdjudication.drl

This is the rule set that will be used to determine whether a defendant is guilty.  In our case, regardless of how fast the car was going over the speed limit, Audis will always be guilty.  No one will believe, however, that a Honda can go more than 30 mph past the speed limit.  In such cases, the defendant will be cleared of the charge.

package com.berico.ra

rule "Automatically guilty if Vehicle is Audi"
  when
    speedingTicketCase : SpeedingTicketCase( vehicleMake == "Audi" )
        
  then
    speedingTicketCase.adjudicate(true);
end

rule "If 30 mph over speed limit and Honda, impossible!"
  when
    speedingTicketCase : SpeedingTicketCase( 
      vehicleMake == "Honda", mphOverSpeedLimit > 30 )
         
  then
    speedingTicketCase.adjudicate(false);
end

RequestAdjudicationApp.java

This application will drive the adjudication session.  We begin by creating some cases and inserting them into the rule session.  After calling "fireAllRules", we will evaluate the result of each case by examining the model objects directly.

package com.berico.ra;

import com.berico.BaseApp;

public class RequestAdjudicationApp extends BaseApp {

 @Override
 protected String getRuleFile() {
  
  return "RequestAdjudication.drl";
 }

 public RequestAdjudicationApp(){
  super();
  
  // Create a case that should be deemed "guilty".
  SpeedingTicketCase maryCase = new SpeedingTicketCase("Audi", 1);
  
  // Insert object into session.
  getSession().insert(maryCase);
  
  // Create a case that should be deemed "innocent".
  SpeedingTicketCase richardCase = new SpeedingTicketCase("Honda", 35);
  
  // Insert object into session.
  getSession().insert(richardCase);
  
  // Evaluate the rules against our objects.
  getSession().fireAllRules();
  
  // Dispose the rule session.
  getSession().dispose();
  
  // Print the results of the verdict for Mary
  System.out.println(
   String.format("Is Mary guilty?: %s", maryCase.isGuilty()));
  
  // Print the results of the verdict for Richard
  System.out.println(
   String.format("Is Richard guilty?: %s", richardCase.isGuilty()));
 }
 
 public static void main(String[] args){
  
  new RequestAdjudicationApp();
 }
}

On the console, you should see the following message:
Is Mary guilty?: true
Is Richard guilty?: false
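
If the direct mutation of the model bothers you (the first disadvantage above), a variation on this pattern is to adjudicate a wrapper rather than the domain object itself.  A minimal sketch of such a wrapper (hypothetical; not part of the example above):

package com.berico.ra;

public class Adjudication<T> {

 private final T request;

 private boolean guilty = false;

 public Adjudication(T request) {
  this.request = request;
 }

 public T getRequest() {
  return request;
 }

 public boolean isGuilty() {
  return guilty;
 }

 public void adjudicate(boolean isGuilty) {
  this.guilty = isGuilty;
 }
}

The rules would then match on Adjudication and call adjudicate on the wrapper, leaving SpeedingTicketCase untouched.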

Rules Engine Patterns - Part 2: Direct-Action


In this pattern, the Rules Engine will react to the state of objects by directly acting on those objects utilizing services injected into the rules context.

Benefits:
  • Very little scaffolding necessary to implement pattern.

Disadvantages:
  • Does not record which rules fired and why.
  • May not scale well unless the underlying service reacting to objects is a proxy to some remote service (e.g. a web service, message bus).

Example:  Bill Lumbergh, VP of Initech, needs to know when one of his engineers has failed to submit a coversheet with his or her TPS Reports.  He has requested the automated system send him an email when this event occurs.

Our model will consist of a TPSReport object and an EmailService (interface).  We will simply print the email to the console when the rule fires.

TPSReport.java

Simple POJO we will use to model the TPS report.

package com.berico.da;

public class TPSReport {

 protected boolean hasCoverSheet = false;
 
 protected String author = null;

 public TPSReport(
  boolean hasCoverSheet, String author) {

  this.hasCoverSheet = hasCoverSheet;
  this.author = author;
 }

 public boolean isHasCoverSheet() {
  return hasCoverSheet;
 }

 public String getAuthor() {
  return author;
 }
}

EmailService.java

Service interface describing the functionality of our email service.  This is what we will reference inside the rule file (instead of a concrete implementation!).


package com.berico.da;

public interface EmailService {

 void email(
  String to, String from, String subject, String message);
}

EmailPrinter.java

Instead of sending out an email, we'll print the email we were going to send to the console.


package com.berico.da;

public class EmailPrinter implements EmailService {

 @Override
 public void email(
   String to, String from, String subject, String message) {
  
  System.out.println(
   String.format(
    "To: %s \nFrom: %s \nSubject: %s\n----------------------\n%s", 
    to, from, subject, message));
 }
}

DirectAction.drl

This is a Drools Rule Language file describing the actions to take when objects appear in the knowledge session.  In our case, if we see a TPSReport with no coversheet, we email Bill.


package com.berico.da

global EmailService emailService;

rule "Email Bill Lumbergh when no coversheet"
 when
  tpsReport : TPSReport( hasCoverSheet == false )
 
 then 
   emailService.email(
    "bill@initech.com", 
    "rules@initech.com",
    "No Coversheet!!!!!",
    tpsReport.getAuthor() + 
    " failed to supply a coversheet!");
end

DirectActionApp.java

This is the application that will drive the session.  You will notice we instantiate the EmailService and register it in the knowledge session.  We then create some TPSReport objects and "insert" them into the session.  Finally, we call "fireAllRules" on the session to have the rules evaluated.


package com.berico.da;

import com.berico.BaseApp;

public class DirectActionApp extends BaseApp {

 @Override
 protected String getRuleFile() {
  
  return "DirectAction.drl";
 }

 public DirectActionApp(){
  super();
  
  // Instantiate the email service.
  EmailService emailService = new EmailPrinter();
  
  // Register the service on the session as a global.
  getSession().setGlobal("emailService", emailService);
  
  // Create a report that should not fire rule.
  TPSReport michaelTpsReport 
   = new TPSReport(true, "Michael Bolton");
  
  // Insert object into session.
  getSession().insert(michaelTpsReport);
  
  // Create a report that should fire rule.
  TPSReport peterTpsReport 
   = new TPSReport(false, "Peter Gibbons");
  
  // Insert object into session.
  getSession().insert(peterTpsReport);
  
  // Evaluate the rules against our objects.
  getSession().fireAllRules();
  
  // Dispose the rule session.
  getSession().dispose();
 }
 
 public static void main(String[] args){
  
  new DirectActionApp();
 }
}

On the console, you should see the following message:
To: bill@initech.com 
From: rules@initech.com 
Subject: No Coversheet!!!!!
----------------------
Peter Gibbons failed to supply a coversheet!
There's nothing else to do in this pattern, since the service registered with the Rules Engine directly handles the event.
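
One way to soften the scaling disadvantage noted at the top of this post is to make the registered EmailService a proxy that merely enqueues the work.  A minimal sketch (hypothetical; a production version would hand the message off to a message bus or an SMTP worker):

package com.berico.da;

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueingEmailService implements EmailService {

 private final BlockingQueue<String[]> outbox
   = new LinkedBlockingQueue<String[]>();

 @Override
 public void email(
   String to, String from, String subject, String message) {

  // Enqueue the email instead of sending it inline, so the rule
  // session is never blocked by slow I/O.
  outbox.offer(new String[]{ to, from, subject, message });
 }

 public BlockingQueue<String[]> getOutbox() {
  return outbox;
 }
}

A background worker would then drain the outbox and do the actual sending, keeping the rule session fast.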

Rules Engine Patterns - Part 1: How to employ a Rules Engine

While many engineers may understand (at least conceptually) what a Rules Engine is, I feel like Rules Engines are used so infrequently because people don't know how to employ them.  In the next few posts, I'm going to discuss design patterns for using a Rules Engine, as well as deployment strategies for integrating the technology into your architecture.

In this post, I want to lay the groundwork for setting up your Java development environment to use a Rules Engine.  I will also introduce some basic features of the Drools API.  We will conclude with a brief discussion about the strategies (patterns) you will use for tying your business model to the rules engine.


General Development Environment


To effectively use the JBoss Rules framework, you should at least have the following tools installed in your Java development environment:
  • A recent JDK
  • Apache Maven (the examples use a Maven pom.xml for dependencies)
  • Eclipse with the Drools Eclipse Plugin (for editing rule files)

Setting up a Drools Project


1.  Add the JBoss Rules (Drools) dependencies to the Maven pom.xml.
<dependency>
 <groupId>org.drools</groupId>
 <artifactId>drools-core</artifactId>
 <version>5.5.0-SNAPSHOT</version>
</dependency>
<dependency>
 <groupId>org.drools</groupId>
 <artifactId>drools-compiler</artifactId>
 <version>5.5.0-SNAPSHOT</version>
</dependency>


2.  Instantiate a Rules Session (provided a rule file). This class will serve as my base class that the other demonstrations will inherit from to initialize their rule sessions ("knowledge sessions" in Drools terminology).

This implementation will limit you to using one rule file, but it abstracts the common scaffolding we will rely on in the examples in the next few posts.  There are much better ways to set up a Drools environment, like using Guvnor for a dynamic rules repository and wiring up the session using Spring.  We will discuss some of this in the final (5th) post on the topic.

BaseApp.java
package com.berico;

import org.drools.KnowledgeBase;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.Resource;
import org.drools.io.ResourceFactory;
import org.drools.logger.KnowledgeRuntimeLoggerFactory;
import org.drools.runtime.StatefulKnowledgeSession;

/**
 * A simple class that will initialize a Drools
 * knowledge session for deriving classes.  This
 * class forces deriving classes to supply a rule
 * file as a requirement of building the session.
 * 
 * @author Richard Clayton (Berico Technologies)
 */
public abstract class BaseApp 
{
 /**
  * A preinitialized knowledge session derived
  * classes will use to perform their tasks.
  */
 private StatefulKnowledgeSession session = null;
 
 /**
  * Force derived classes to use a getter just in 
  * case we decide to do something special on access
  * in the future.
  * @return initialized knowledge session.
  */
 protected StatefulKnowledgeSession getSession(){
  return this.session;
 }
 
 /**
  * Deriving classes must supply a rule file
  * that will be used by the knowledge builder.
  * @return Name of the rule file
  */
 protected abstract String getRuleFile();

 /**
  * Instantiate the application, initializing
  * the knowledge session.
  */
 public BaseApp(){
  
  initializeSession();
 }
 
 /**
  * Initialize the Rules Engine Session.
  */
 private void initializeSession(){
  
  // The knowledge builder is used to compile rule and
  // workflow (BPM) resources into executable code.
  // If types are declared in the rule file, they
  // will also be compiled as Java classes.
  KnowledgeBuilder kbuilder = 
    KnowledgeBuilderFactory.newKnowledgeBuilder();
  
  // Get the rule file supplied by deriving classes.
  // This file will be pulled from the class path.
  Resource ruleFile = ResourceFactory.newClassPathResource(
    this.getRuleFile());
  
  // Add the rule file to the knowledge builder.
  kbuilder.add(ruleFile, ResourceType.DRL);
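
  // Fail fast if the rule file fails to compile, rather than
  // letting the error surface later in a less obvious form.
  if (kbuilder.hasErrors()) {
   throw new IllegalStateException(
     kbuilder.getErrors().toString());
  }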
  
  // Initialize a knowledge base from the knowledge builder.
  // The knowledge base is a container for the known logic 
  // of the rules engine.
  KnowledgeBase knowledgeBase = kbuilder.newKnowledgeBase();
  
  // Initialize a rules/workflow session from the knowledge
  // base.  This is the construct we will use to insert
  // "facts" in the rules engine, apply/evaluate
  // rules, and then react to the results.
  this.session = knowledgeBase.newStatefulKnowledgeSession();
  
  // Log actions occurring in the session to the console.
  KnowledgeRuntimeLoggerFactory.newConsoleLogger(session);
  
  // Log actions occurring in the session to a file.
  KnowledgeRuntimeLoggerFactory.newFileLogger(session, 
    String.format(
     "rule_session_%s.xml", 
     System.currentTimeMillis()));
 }
}

Using the Scaffold

To use the scaffolding we wrote above, simply extend the class, providing the name of a rules file located somewhere on your classpath.

A rules file is simply a text file with rules defined in the Drools Expert syntax.  You can find the Drools Expert syntax documentation here: http://docs.jboss.org/drools/release/5.4.0.Final/drools-expert-docs/html_single/.

Perhaps the easiest way to create the rules file is to use the Drools Eclipse Plugin mentioned above.  The plugin provides syntax highlighting and a diagram of the Rete tree for rule files.

I use the standard Maven project structure, storing rule files in the "src/main/resources" directory.  If you do the same, you should be able to refer to the rule file by filename when using the scaffold.

Example Using the Scaffold


To demonstrate how this works, we need a simple POJO that notionally represents our model.  In this case, I'm going to use the obligatory "User" object.

User.java

package com.berico.scaffold;

public class User {

 private String firstName;
 private String lastName;
 private int age;
 
 public User(
  String firstName, String lastName, int age) {
  
  this.firstName = firstName;
  this.lastName = lastName;
  this.age = age;
 }

 public String getFirstName() {
  return firstName;
 }

 public String getLastName() {
  return lastName;
 }

 public int getAge() {
  return age;
 }
}


ScaffoldExample.drl
package com.berico.scaffold

rule "User seen"
  when
    user: User()
  then
    System.out.println("Hello " + user.getFirstName());
end

ScaffoldExampleApp.java
package com.berico.scaffold;

import com.berico.BaseApp;

public class ScaffoldExampleApp extends BaseApp {

 @Override
 protected String getRuleFile() {
  
  return "ScaffoldExample.drl";
 }

 public ScaffoldExampleApp() {
  super();
  
  User richard = new User("Richard", "Clayton", 31);
  
  getSession().insert(richard);
  
  getSession().fireAllRules();
  
  getSession().dispose();
 }
 
 public static void main(String[] args){
  
  new ScaffoldExampleApp();
 }
}

Understanding the Drools API


The extremely simplistic example above demonstrates the general flow of how the Drools engine is used.  Developers instantiate or collect (say, from a queue or a web service) model objects and insert those objects into a Drools Knowledge Session.  Once the objects have been inserted into the session, you instruct the engine to fireAllRules, evaluating its rule set against the objects.

In the demonstration, we simply print to the console when one of our rules successfully evaluates.  Clearly, we need to be able to do more.  So how would you react to a rule being evaluated, especially outside of the context of the rules engine (i.e., getting the results out)?

Patterns for Rule Engine Employment


I have found that you typically use one of three patterns when solving a problem with a Rules Engine.  Each pattern has its own benefits and disadvantages, and I will emphasize that there isn't one pattern generally considered better than the others.  More importantly, you might mix and match these patterns depending on the problem you are trying to solve.

Each pattern is distinguished by how they handle the "end state" of a rules session.  On one extreme, you can choose to not keep any state and simply react to conditions that occur when rules are met (e.g.: email admin on error).  On the other extreme, you can choose to record which rule conditions are met and react to the results of those conditions outside of the rules engine (e.g.: validation scenario).

I will detail the following three patterns in the next few posts:
  • Direct-Action
  • Request Adjudication
  • Result Compilation

Conclusion


This admittedly is not a very exciting post, but it should serve as the basis for understanding the posts that follow.  In the next post (which I wrote at the same time as this post!), we will discuss the Direct-Action pattern for utilizing a Rules Engine in your middle tier.

Sunday, June 10, 2012

Securing Node.js and Express with SSL Client-Authentication

In the course of my work using Node.js, I did some research on securing Node.js.  There are a couple of decent articles on the subject, and of course a number of frameworks that perform this work, but nothing I read or evaluated really quite fit my needs:
  • Use strong authentication.
    In my case, I preferred certificate-based (SSL).
  • Remain as unobtrusive as possible.  
    I didn't want security code in my routers or generally entangled in the rest of my application.
  • Be Flexible.
    Let me determine how and when users could see content or perform actions.
I know it's sacrilege to say it, but I really was looking for a capability similar to Spring Security or J2EE container security.  I wanted to constrain access by route, apply arbitrary access rules, and utilize ACLs that could be used to restrict content or constrain access to data when queries are performed against external data sources.

In this post, I will demonstrate how to setup SSL authentication in Node.js.  

In following posts, I will show you how to write Connect "middleware" components to constrain access by route and enhance user identity with roles (enabling ACLs). 

Enabling SSL Authentication in Node.js

The first step to enabling SSL authentication is to have the necessary Public-Key Infrastructure (PKI) to generate both server and client certificates, as well as to establish trust between those certificates by deriving them from a Certificate Authority.  I discuss this topic in detail in the previous post, and will even reference the certificate paths used in that discussion.

Once you have the certificates in place, the code to enable SSL authentication in Node.js is actually pretty easy:
var express = require('express')
  , routes = require('./routes')
  , fs = require('fs')

// MAGIC HAPPENS HERE!
var opts = {
  
  // Specify the key file for the server
  key: fs.readFileSync('ssl/server/keys/server1.key'),
  
  // Specify the certificate file
  cert: fs.readFileSync('ssl/server/certificates/server1.crt'),
  
  // Specify the Certificate Authority certificate
  ca: fs.readFileSync('ssl/ca/ca.crt'),
  
  // This is where the magic happens in Node.  All previous
  // steps simply setup SSL (except the CA).  By requesting
  // the client provide a certificate, we are essentially
  // authenticating the user.
  requestCert: true,
  
  // If specified as "true", no unauthenticated traffic
  // will make it to the route specified.
  rejectUnauthorized: true
};

var app = module.exports = express.createServer(opts);

// Configuration 

app.configure(function(){
  app.set('views', __dirname + '/views');
  app.set('view engine', 'jade');
  app.use(express.bodyParser());
  app.use(express.methodOverride());
  app.use(app.router);
  app.use(express.static(__dirname + '/public'));
});

app.configure('development', function(){
  app.use(express.errorHandler(
    { dumpExceptions: true, showStack: true })); 
});

app.configure('production', function(){
  app.use(express.errorHandler()); 
});

// Routes
app.get('/', routes.index);

app.listen(8443);

console.log(
   "Express server listening on port %d in %s mode", 
    8443, 
    app.settings.env);
With the exception of the "options" object containing the SSL configuration, the application is just an Express.js application.

Let's start up the application...

When you run the application, you will get the "Enter PEM pass phrase:" prompt before the web server will start.  The PEM pass phrase is the password you used when creating the server's key.  This is actually a pretty nice feature when you think about it, since you don't have to supply the password when launching the application (which would make it visible via the ps command on POSIX systems).

If you are really not that concerned about security, you can specify the passphrase in the Server options:

var opts = {
  key: fs.readFileSync('ssl/server/keys/server1.key'),
  cert: fs.readFileSync('ssl/server/certificates/server1.crt'),
  ca: fs.readFileSync('ssl/ca/ca.crt'),
  requestCert: true,
  rejectUnauthorized: true,
  
  // This is the password used when generating the server's key
  // that was used to create the server's certificate.
  // And no, I do not use "password" as a password.
  passphrase: "password"
};
Now when you start the server, it will not harass you for a password.

Now when a user comes to the website without a certificate, they will get a browser SSL error (a "nasty gram") instead of the original route they intended to hit (in this case, the "index" route).

Pretty awesome, huh?

If you don't need to deal with users of varying levels of access (administrators, moderators, etc.), and don't have any anonymous users, you're done.  However, if you want to show a nice "Unauthorized" web page, we will need to change our configuration a little bit.


Allowing Anonymous Access with Custom Unauthorized Pages


One of the settings in our configuration is preventing the unauthenticated browser (Firefox) from reaching our web server.  This may be okay in certain circumstances, but in general, it may be smarter to let the user know the web server is functioning correctly; it's just that they aren't privileged enough to view your content.

To allow unauthenticated browsers through, simply set the "rejectUnauthorized" property in the options object to "false":
var opts = {
  key: fs.readFileSync('ssl/server/keys/server1.key'),
  cert: fs.readFileSync('ssl/server/certificates/server1.crt'),
  ca: fs.readFileSync('ssl/ca/ca.crt'),
  requestCert: true,

  // Now every one can get through to the web server.
  rejectUnauthorized: false,
};
Now all users can hit the website.  Authentication still occurs, but unauthenticated users are given access to everything an authenticated user gets.  This is obviously not what we want, but it's necessary if we want to at least provide an "Unauthorized Access" web page to the unauthenticated user.  Since we turned off Node's native way of restricting access, it's our responsibility to control that access in our application.

The next snippet of code demonstrates how to determine whether a user was authenticated and what you can potentially do with this knowledge.  In this example, I will render different content based on this condition.  If the user is unauthenticated, I will render the "Unauthorized" template.  If the user is authorized, I will render an "Authorized" template displaying the user's certificate information.
app.get('/', function(req, res){

  // AUTHORIZED 
  if(req.client.authorized){

    var subject = req.connection
      .getPeerCertificate().subject;
    
    // Render the authorized template, providing
    // the user information found on the certificate
    res.render('authorized', 
      { title:        'Authorized!',
        user:         subject.CN,
        email:        subject.emailAddress,
        organization: subject.O,
        unit:         subject.OU,
        location:     subject.L,
        state:        subject.ST,
        country:      subject.C
      });

  // NOT AUTHORIZED
  } else {

    // Render the unauthorized template.
    res.render('unauthorized', 
      { title: 'Unauthorized!' });
  }
});
OK, I've written a little security logic in a route (which I already mentioned was evil) to demonstrate how you can gain access to the authenticated user, as well as what you can do based on the authorization status.

I won't show you the Jade templates I used (they're really simple; if you want a copy, leave me a comment), but I will show you the results:

Unauthorized Template Rendered
Authorized Template Rendered

That's it for Part 1.  In the next post, I will show you how to create Connect middleware to better separate this security logic from your routes.

Until next time, good luck.

Saturday, June 9, 2012

Automating the Creation of a Certificate Authority and Issuing Certificates with OpenSSL

I spent a little time over the weekend working on securing a Node.js server with SSL client-authentication.  I needed to create a certificate authority from which I could create and sign both server and client certificates for use by my application.

While the process is not terribly difficult, there is a lot to know (and remember), so I decided to automate it a little bit and thought others might be interested.  I should mention up front that this website (full link below) was especially helpful in understanding the process.

I will start by explaining the process of creating a CA and issuing certificates.  Once you understand the process, I will describe how you can customize some of that process using OpenSSL config files.  I will conclude with describing some automation steps that can be performed to make this process easier.

I've created a public Github repo with the scripts mentioned in this post:  https://github.com/berico-rclayton/certificate-automation/.

Creating a Certificate Authority


Each step will require some input (the commands are interactive).  You will set the CA's password in the first step, and this password will be used in many of the other steps (including signing certificates for servers and clients).  Needless to say, remember this password (and keep it safe -- it is quite literally the keys to the castle).

1.  Create a key for the certificate authority.
openssl genrsa -des3 -out ca.key 1024
2.  Create a certificate request for the authority:
openssl req -new -key ca.key -out ca.csr
3.  Create a certificate for the CA by self-signing the certificate with the CA's own key:
openssl x509 -req -days 365 -in ca.csr \
    -out ca.crt -signkey ca.key

Create a Server Certificate


1.  Create key for the server.
openssl genrsa -des3 -out server.key 1024
2.  Create a request for certificate.
openssl req -new -key server.key -out server.csr
3.  Sign (CA) and issue the certificate.
openssl x509 -req -in server.csr -out server.crt \
    -CA ca.crt -CAkey ca.key -CAcreateserial -days 365

Create a Client Certificate


1.  Create key for the client.
openssl genrsa -des3 -out user.key 1024
2.  Create a request for certificate.
openssl req -new -key user.key -out user.csr
3.  Sign (CA) and issue the certificate.
openssl x509 -req -in user.csr -out user.crt \
     -CA ca.crt -CAkey ca.key -CAcreateserial -days 365
4.  Create a PKCS#12 formatted certificate for use in browsers like Firefox.
openssl pkcs12 -export -clcerts -in user.crt \
    -inkey user.key -out user.p12

Customizing OpenSSL Configuration


If you execute these commands, you will soon discover how annoyingly interactive the process is (lots of stuff to input just to create a certificate).  Fortunately, the CA process is a one-time deal (well, as long as you issued your own CA certificate).  Generating a server certificate will also probably be a relatively infrequent process.  It's the issuing of client certificates that becomes a real pain in the ass, which is why I automated this process.

There is an out-of-the-box way to automate a little bit of this work using custom OpenSSL configuration files.

1.  In the directory of your choice, create a text file called "openssl.cnf".
2.  Add configuration values to the newly created file.

Copy and paste from here: https://github.com/berico-rclayton/certificate-automation/blob/master/conf/server_openssl.cnf
 
# ... truncated ...

[ req_distinguished_name ]
0.organizationName = Organization Name (company)
organizationalUnitName  = Organizational Unit Name (department, division)
emailAddress  = Email Address
emailAddress_max = 40
localityName  = Locality Name (city, district)
stateOrProvinceName = State or Province Name (full name)
countryName  = Country Name (2 letter code)
countryName_min  = 2
countryName_max  = 2
commonName  = Common Name (Computer Name, Server, or Username)
commonName_max  = 64
 
0.organizationName_default = Berico Technologies
organizationalUnitName_default  = Engineering BU
localityName_default  = Reston
stateOrProvinceName_default = Virginia
countryName_default  = US
emailAddress_default  = server@bericotechnologies.com
commonName_default  = server
 
# ... truncated ...
3.  Edit the values (the right-hand side of the "=" sign) of the variables suffixed with "_default" to change the default values presented by the OpenSSL prompts.  There are also a number of other important variables, like the default "key length" (we've been using 1024 bits in this example), that you can set.

4.  Ensure OpenSSL uses your new config file.
export OPENSSL_CONF=openssl.cnf

Automating the Process


We've gotten as far as we're going to get with configuration, so let's use a little BASH magic to automate some of this process.  The first thing we will do is organize our directory structure instead of writing all files to the current working directory as I have done in the previous examples.  I'm going to use a directory structure almost identical to the one suggested by the website listed above (and below):
  • ca/ - holds certificate authority's certificate, certificate request, and key.
  • server/keys/ - holds all keys generated for servers.
  • server/requests/ - holds all certificate creation requests to the CA for servers.
  • server/certificates/ - holds all certificates for servers.
  • user/keys/ - holds all keys generated for users.
  • user/requests/ - holds all certificate creation requests to the CA for users.
  • user/certificates/ - holds all certificates for users.
  • user/p12/ - holds all PKCS#12 formatted certificates for users.
By the way, make sure you protect these folders judiciously (owned and accessed by root only).

I've created three BASH scripts for automating this process:

1.  setup_certauth.sh - creates the directory structure and the Certificate Authority's key and self-signed certificate.  This will probably only be executed once for an organization.
sh setup_certauth.sh
2.  make_server_cert.sh - creates a certificate for a server.  You will need to specify the server name, which will be used as the file name for all artifacts.
sh make_server_cert.sh {server-name}
3.  new_client_cert.sh - creates a certificate for a user, along with the PKCS#12 certificate, which is used by some browsers.  You will need to specify the client's name, which is used as the file name for all artifacts.
sh new_client_cert.sh {client-name}

Screenshot of the client certificate creation process.


Voila.  I should mention that I am not a BASH ninja, so if you have some suggestions, I would love to hear them.  The scripts in the Github repo will use the directory structure to gather the CA certificate, as well as store the keys, requests, and certs.

Hope this was useful; if not, let me know.

References:


Special Thanks to the Author of this Guide:  http://www.cafesoft.com/products/cams/ps/docs30/admin/ConfiguringApache2ForSSLTLSMutualAuthentication.html#Creating_a_Certificate_Authority_using_OpenSSL

Thursday, April 19, 2012

Practical Apache Pig Recipes - Round 1


Apache Pig is an incredibly productive framework for manipulating datasets on the Hadoop platform.  Coupled with a couple of third-party libraries (PiggyBank and DataFu), Pig has helped me break the habit of writing custom Java apps to do what should be trivial tasks with structured data.  In this post, I'm going to provide a couple of useful recipes that will demonstrate the power of the platform.

Installing and Running Pig without a Hadoop Cluster.


These instructions are for a *NIX based Operating System with Java installed.

1.  Download Pig:  http://pig.apache.org/releases.html, place it in some useful directory.
2.  Extract the archive:
# If this is a later release, please change the version number.
tar xzvf pig-0.9.2.tar.gz
3.  Create an alias to Pig (in local mode; i.e.: no Hadoop Cluster):
# Assuming you are still in the same directory you extracted pig into.
# If you use Pig often, you should set this in your BASH profile.
alias pig="`pwd`/pig-0.9.2/bin/pig -x local"
4a.  Start "Grunt", the Pig Shell:
pig
4b.  Execute a Pig Script
pig {scriptname}.pig

Stripping, Adding and Reordering Columns in a CSV File


Sometimes, you will have to deal with datasets that have columns you don't care about, or that have a column out of order.  This demonstration will show how you can strip, add, and reorder columns in Pig with only three lines of Pig Latin.

Princeton University has an awesome website with sample CSV datasets that include the latitude and longitude of chain stores (McDonalds, Starbucks, Walmarts, etc.) in America.

The schema of the CSV file is:  longitude, latitude, store name, store address.

I have an external application that requires the coordinates to be in the form latitude, longitude instead of how they are now (longitude, latitude).  The store name is also a little verbose (usually the name of the chain followed by a number); since I don't need the individual store name, but still need a normalized identifier, I'm going to drop the store name column and add a constant field with the normalized value.
Download "strip_add_remove.pig"


The console produces the following output at the end of the job:

...Truncated...
(41.5667,-70.6227,Starbucks,"28 Davis Straights/ Route 28; Falmouth)
(41.61734,-70.49054,Starbucks,"6 Market Street; Mashpee)
(43.50562,-70.43833,Starbucks,"509 Main Street; Saco)
(43.63285,-70.3631,Starbucks,"200 Running Hill Road; South Portland)
(43.63416,-70.33794,Starbucks,"364 Maine Mall Road N135; South Portland)
(43.65202,-70.3097,Starbucks,"1001 Westbrook St; Portland)
(43.65412,-70.26322,Starbucks,"594 Congress Street; Portland)
(43.65762,-70.25521,Starbucks,"176 Middle St.; Portland)
(44.1207,-70.23047,Starbucks,"35 Mount Auburn Avenue; Auburn)
(43.85869,-70.1018,Starbucks,"49 Main Street; Freeport)
(43.9374,-69.9809,Starbucks,"125 Topsham Fair Mall Rd; Topsham)
(43.90637,-69.91558,Starbucks,"10 Gurnet Drive; Brunswick)
(44.5629,-69.64249,Starbucks,"2 Waterville Commons Drive; Waterville)
(44.83537,-68.74344,Starbucks,"38 Bangor Mall Blvd; Bangor)
(44.83959,-68.7426,Starbucks,"60 Longview Dr; Bangor)

The magic is in the FOREACH statement; all we did was create a new "projection" of the data by reordering lat and lon, providing the normalized field 'Starbucks', and omitting the original name field.


Creating a Bounding Box Filter


A bounding box is really simple when you think about it.  It's simply the minimum and maximum latitudes (the bottom and top edges) and the minimum and maximum longitudes (the left and right edges).  Here, we will use Pig to filter store locations to those in Northern Virginia (latitudes 38 and 39, longitudes -77 and -78), store those locations, and count the results.
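
Expressed in plain Java for illustration (the script performs the same test in Pig Latin), the predicate behind the FILTER amounts to this:

public class BoundingBoxFilter {

  // Keep a record only if it falls inside the Northern Virginia box.
  public static boolean inBoundingBox(double lat, double lon) {
    return lat >= 38.0 && lat <= 39.0
        && lon >= -78.0 && lon <= -77.0;
  }

  public static void main(String[] args) {
    System.out.println(inBoundingBox(38.8997, -77.0262));  // true
  }
}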
Download "bounding_box.pig"

Here are the results from the Pig job:

...Truncated...
(38.8997,-77.0262,Starbucks,"1000 H St NW; Washington)
(38.96875,-77.0261,Starbucks,"6500 Piney Branch Rd NW; Washington)
(38.99644,-77.02586,Starbucks,"915 Ellsworth Drive C-19; Silver Spring)
(38.89848,-77.02389,Starbucks,"701 9th Street NW; Washington)
(38.90243,-77.02389,Starbucks,"999 9th St NW; Washington)
(38.89402,-77.02191,Starbucks,"325 Seventh Street NW Suite 100; Washington)
(38.89977,-77.02191,Starbucks,"800 7th Street NW Suite 305; Washington)
(38.91223,-77.02191,Starbucks,"443-C 7th Street)
(38.91921,-77.02191,Starbucks,"2225 Georgia Avenue)
(38.72444,-77.01967,Starbucks,"952 E Swan Creek Rd; Fort Washington)
(38.9094,-77.01791,Starbucks,"1 Avaition Cir; Washington)
(38.8969,-77.00672,Starbucks,"40 Massachusetts Avenue Amtrak Baggage Area; Washington)
(38.88736,-77.00297,Starbucks,"237 Pennsylvania Ave SE; Washington)

...Hadoop output omitted...

(158)

The FILTER command will create a new projection of the dataset keeping only those records within the bounding box we specified (pretty simple, huh?).


Calculating the Distance Between Locations


In this recipe, we will calculate the distance between a target location (latitude and longitude supplied via the command line) and every location in the dataset.  To perform this calculation, we will use a User Defined Function from DataFu, LinkedIn's Pig UDF library.
Download "distance.pig"

In order to run this example, you will need to supply the appropriate center point (to set the $USERLAT and $USERLON variables):
# 38.8977, -77.0366 is the White House, Washington, D.C.
pig -p USERLAT=38.8977 -p USERLON=-77.0366  distance.pig
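For reference, here's a sketch of what distance.pig might contain (the DataFu jar name and relation names are my own placeholders; datafu.pig.geo.HaversineDistInMiles is DataFu's great-circle distance UDF, and the ORDER statement is my guess given that the output below is sorted by distance):

-- Register DataFu and alias the distance UDF
REGISTER 'datafu-0.0.4.jar';
DEFINE DistanceInMiles datafu.pig.geo.HaversineDistInMiles();
stores = LOAD 'starbucks.csv' USING PigStorage(',')
    AS (lat:double, lon:double, name:chararray, address:chararray);
-- $USERLAT and $USERLON are substituted from the -p command-line parameters
with_distance = FOREACH stores GENERATE
    lat, lon, name,
    DistanceInMiles($USERLAT, $USERLON, lat, lon) AS distance,
    address;
-- Keep locations within 10 miles, nearest first
nearby = FILTER with_distance BY distance <= 10.0;
sorted = ORDER nearby BY distance ASC;
DUMP sorted;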

The results of our distance calculation and subsequent filter (locations within 10 miles):
...Truncated...
(38.99644,-77.02586,Starbucks,6.8466316029936705,"915 Ellsworth Drive C-19; Silver Spring)
(38.98825,-77.09609,Starbucks,7.02586063143866,"7700 Norfolk Ave; Bethesda)
(38.80397,-76.9843,Starbucks,7.061133827376486,"6171-A Oxon Hill Road; Oxon Hill)
(38.9953,-77.07719,Starbucks,7.0874660476019535,"8542 Connecticut Avenue; Chevy Chase)
(38.83643,-77.15642,Starbucks,7.71170369522629,"6365 Columbia Pike; Falls Church)
(38.93301,-77.17824,Starbucks,7.995811205809961,"1438 Chain Bridge Road; McLean)
(38.89369,-77.18887,Starbucks,8.192941429466497,"1218 West Broad Street; Falls Church)
(39.02039,-77.0126,Starbucks,8.574554264635223,"10103 Colesville Rd; Silver Spring)
(38.92895,-77.19746,Starbucks,8.913496579837988,"1961 Chain Bridge Rd; McLean)
(38.90366,-77.20421,Starbucks,9.02192730760362,"7501 H Leesburg Pike; Falls Church)
(38.7715,-77.08137,Starbucks,9.046367728152426,"6754 Richmond Hwy Unit 4; Alexandria)
(39.02905,-77.00707,Starbucks,9.213012865407007,"10731 Colesville Road; Silver Spring)
(38.99768,-76.90967,Starbucks,9.707743952951674,"7541 Greenbelt Rd. Space 16; Greenbelt)
(39.03881,-77.05704,Starbucks,9.811380298967933,"2800 W. University Blvd. E; Wheaton)
(39.03927,-77.05418,Starbucks,9.827010981193256,"11160 Veirs Mill Road 139; Wheaton)
(39.01629,-76.93112,Starbucks,9.962700766542332,"4750 Cherry Hill Road; College Park)
(38.83159,-77.20134,Starbucks,9.970542690585502,"7414 Little River Turnpike; Annandale)

This recipe demonstrates two important features of Pig.  First, we are registering a JAR and User Defined Function with Pig so it can be called in one of the projections.  The second is the use of command-line arguments to supply dynamic values to the script.


Tuesday, January 24, 2012

Whirlwind Tour of OSGi: Using Maven for Bundle Packaging, Activators, Service Registration, Configuration, and Custom Shell Commands

OSGi, without the proper assistance, can be a pretty complex technology for engineers to learn.  There are a lot of things to know in order to be productive, and worse yet, most of the process is a departure from what Java developers are accustomed to.  I'm going to attempt to demonstrate how to do some very basic things to become immediately productive in OSGi, including:
  1. Setting up an OSGi bundle project in Maven.
  2. Importing the project into Eclipse.
  3. Configuring Maven and Adding Dependencies.
  4. Writing an OSGi Activator.
  5. Compiling/Creating the Bundle.
  6. Registering a Service in OSGi.
  7. Consuming a Service in OSGi.
  8. Consuming Configuration Updates.
  9. Writing Menu Extensions for the Apache Felix Gogo shell.
  10. Dynamically configuring a Service using the Gogo shell.

I’m not going to spend any time discussing what OSGi is or the importance of the technology.  If you would like to understand the impetus for the framework, I suggest reading the Wikipedia article.  Before starting this tutorial, you should understand what a "bundle" is and the general lifecycle of bundles within an OSGi container.

1.  Setting up an OSGi bundle project in Maven.

Let's start by setting up the project structure in Maven.  I'm going to walk the reader through the correct steps of setting up Maven to do all of the "bundle" work.

Start by creating a project using the "quickstart" archetype.  (I ran this command in the root folder of my Eclipse workspace).
mvn archetype:generate \
 -DgroupId=com.berico.time \
 -DartifactId=timeprovider \
 -Dpackage=com.berico.timeprovider \
 -Dversion=0.0.1-SNAPSHOT
After executing this command, Maven's going to dump 534 different archetypes to choose from.  The default archetype is the Maven "quickstart" archetype (# 169 at the time of this writing).  You can enter 169, or just hit enter.  In all following questions, just use the default (by pressing enter).

2.  Importing the project into Eclipse.

We need to "Eclipsify" your project.  This will let you import the project into Eclipse.  This command also helps sometimes when Eclipse starts messing up the Maven dependency management process (I find that my Eclipse instance stops refreshing the Maven dependencies and this is the only fix).
cd timeprovider
mvn eclipse:eclipse
Now we need to import the project through the IDE.  First start by going to File and selecting Import...


Then we select the Existing Projects into Workspace option and press the Next > button.


When the next window pops up, press the Browse button and select the directory created by Maven and then press Open.


Eclipse should accept the directory as an Eclipse project.  If you did not run mvn eclipse:eclipse, Eclipse will not allow you to import the directory as a project.  Select Finish to continue.


In the Project Explorer pane to the left (in the Java Perspective), you should see your Maven-generated project with the source folders and the pom.xml file.



The last thing we want to do is allow Maven to control the dependencies within Eclipse.  If you are using the SpringSource Tool Suite, the Maven plugin should be installed.  Otherwise, you will need to go through the process of installing it (I think it's called "m2eclipse").

Right-click the project, navigate to the Maven submenu and select Enable Dependency Management.


Ok, we've set up Eclipse and are now ready to configure Maven.

3.  Configuring Maven and Adding Dependencies.

Now that we have the project imported into Eclipse, let's set up Maven to support the creation of OSGi bundles.  The Apache Felix project includes a plugin for Maven that makes developing OSGi bundles a lot easier.  We are going to modify the Maven pom.xml file to include this plugin, as well as add a couple of dependencies we will need for our module.

First, let's add the dependencies.  For the purpose of this project, we want the following libraries:

  • org.osgi.core - this is the model necessary to write OSGi applications.  We will get this from the Apache Felix project.
  • org.apache.felix.configadmin - we will use this package later on in the application for configuring our Time Provider service.
  • org.apache.felix.gogo.runtime - Gogo is the shell service used to manage Apache Felix (bundled with Felix).  We will use Gogo to create menu options for the OSGi container.  Gogo is compliant with the newest OSGi standards, and can actually be used as a replacement shell for other OSGi runtimes like Knopflerfish and Eclipse Equinox.  For the purpose of this tutorial, we will only use Apache Felix.
  • joda-time - the only usable Date and Time library in Java!  We will use Joda to output time from different time zones in the application.
  • junit - Unit testing library.  Please note that Maven defaults to JUnit 3, so we are changing the only dependency present in the "quickstart" archetype to the newer version of the library.

Your pom.xml's dependencies element should now look like this:
<dependencies>
  <dependency>
    <groupId>org.apache.felix</groupId>
    <artifactId>org.osgi.core</artifactId>
    <version>1.0.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.felix</groupId>
    <artifactId>org.apache.felix.configadmin</artifactId>
    <version>1.2.8</version>
  </dependency>
  <dependency>
    <groupId>org.apache.felix</groupId>
    <artifactId>org.apache.felix.gogo.runtime</artifactId>
    <version>0.10.0</version>
  </dependency>
  <dependency>
    <groupId>joda-time</groupId>
    <artifactId>joda-time</artifactId>
    <version>2.0</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.10</version>
    <scope>test</scope>
  </dependency>
</dependencies>
Next, we need to add a plugin to Maven to allow it to produce OSGi bundles.  Add the following plugins section to the pom.xml's root element:
<build>
  <plugins>
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-compiler-plugin</artifactId>
     <version>2.3.2</version>
     <configuration>
       <target>1.5</target>
       <source>1.5</source>
     </configuration>
   </plugin>
  <plugin> 
    <groupId>org.apache.felix</groupId>
    <artifactId>maven-bundle-plugin</artifactId>
    <version>2.3.6</version>
    <extensions>true</extensions>
    <configuration>
      <instructions>
        <Export-Package>
          com.berico.timeprovider.api
        </Export-Package>
        <Private-Package>
          com.berico.timeprovider.internal.*
        </Private-Package>
        <Bundle-Activator>
          com.berico.timeprovider.internal.Activator
        </Bundle-Activator>
      </instructions>
    </configuration>
   </plugin> 
  </plugins>
</build>
The maven-bundle-plugin provides the facilities that allow bundle creation, extending Maven's packaging process.  The maven-compiler-plugin adds support for annotations by forcing the compiler to target Java 5 (or later) language features.

For the most part, you will probably cut and paste this every time you create a bundle.  You do want to pay attention to the plugin -> configuration -> instructions element of this section.  Within this element, you will need to provide bundle-specific information.  In this case, I've specified that the com.berico.timeprovider.api package should be exported within the OSGi runtime.  This is where I will provide the interfaces and model that I want consumers to work with.  Conversely, I don't want developers messing with the com.berico.timeprovider.internal package (and subpackages), so I instruct the OSGi container that these are sealed.  Finally, I've specified that the com.berico.timeprovider.internal.Activator class is my bundle activator (think "main method" of the package).

The information contained within this section will be translated into the appropriate bundle metadata required by OSGi, stored in the MANIFEST.MF file.  The observant eye will notice that I did not specify the Import-Package information, required for our bundle to operate (literally having those libraries exposed by OSGi to our bundle's classloader).  One of the really cool things about the Maven plugin is the auto-generation of this metadata based on our dependency declaration in the pom.xml.  Another cool feature of the plugin is the implicit addition of the bundle's classes to the Import-Package, a convention imposed by the OSGi framework.

The last thing we need to do is modify some of the basic information included by default in the pom.xml.  We are going to add a couple of properties about the bundle itself (the project.build.sourceEncoding property was added by Maven already).  The bundle.symbolicName property maps directly to Bundle-SymbolicName, and bundle.namespace (though I'm not certain) to the package namespace we want to use in Import-Package.
<properties>
  <bundle.symbolicName>com.berico.timeprovider</bundle.symbolicName>
  <bundle.namespace>com.berico.timeprovider</bundle.namespace>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
Now let's change the name of the project to reflect the bundle's symbolic name and namespace; this will appear as the bundle's name (literally the OSGi Bundle-Name property) within the Apache Felix Gogo shell:
<name>Berico Time Provider</name>
And the last change we want to make is to ensure the way Maven packages the project is in the form of a bundle and not a jar (which is the current setting):
<packaging>bundle</packaging>
With these changes, we are now ready to write some code, compile it, and package the results in a properly formatted OSGi bundle.

4.  Writing an OSGi Activator.


Now that we can produce a bundle, it's time to write an Activator to demonstrate how this bundle will work in an OSGi container.  An Activator is the entry point of a bundle (it's literally a way to execute code when the bundle is started or stopped by an OSGi container).  The purpose of the Activator is to initialize your service (or tear it down); both the start and stop methods are given access to the BundleContext of the OSGi container.  Using the BundleContext, you can produce or consume services, register listeners for life cycle events with the container, among other things.

The first thing I'm going to do is delete the App class created by Maven in the com.berico.timeprovider package, and create the following classes and subpackages:
  • com.berico.timeprovider.internal.Activator - class that we are referencing as the bundle activator in our Maven pom.xml.
  • com.berico.timeprovider.api.TimeProvider - interface that we are going to offer consumers who want to request the time.
  • com.berico.timeprovider.internal.TimeProviderImpl - this is the implementation of the TimeProvider interface that we will provide consumers.
I've implemented the TimeProvider interface and TimeProviderImpl class completely, but I'm going to leave the Activator inert for the moment.  All we are going to do with the Activator is print a message to the screen when the bundle starts and stops.

TimeProvider.java
package com.berico.timeprovider.api;

public interface TimeProvider {

  String getTime();
 
  String getTime(String timezone) throws Exception;
 
  void setDefaultTimeZone(String timezone) throws Exception;
}
TimeProviderImpl.java
package com.berico.timeprovider.internal;

import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

import com.berico.timeprovider.api.TimeProvider;

public class TimeProviderImpl implements TimeProvider {
  
  // Default Time Zone is Zulu/GMT (0)
  private DateTimeZone defaultTimeZone 
        = DateTimeZone.forOffsetHours(0);
  
  public String getTime() {
    
    return getTime(this.defaultTimeZone);
  }

  public String getTime(String timezone) throws Exception {
  
    return getTime(DateTimeZone.forID(timezone));
  }
  
  public void setDefaultTimeZone(String timezone) throws Exception {
    
    DateTimeZone dtz = getTimeZone(timezone);
    
    this.defaultTimeZone = dtz;
  }

  private String getTime(DateTimeZone dtz){
    
    DateTime dt = new DateTime(dtz);
    
    return dt.toString();
  }
  
  private DateTimeZone getTimeZone(String timezone) throws Exception {
    
    DateTimeZone dtz = DateTimeZone.forID(timezone);
    
    if(dtz == null){
      
      throw new Exception(
        String.format(
          "Could not parse timezone [%s].", timezone));
    }
    
    return dtz;
  }
}
Activator.java
package com.berico.timeprovider.internal;

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;

public class Activator implements BundleActivator {

  public void start(BundleContext bundleContext) throws Exception {
    
    System.out.println("com.berico.timeprovider Started");
  }

  public void stop(BundleContext bundleContext) throws Exception {
    
    System.out.println("com.berico.timeprovider Stopped");
  }
}
We will build out the Activator in a little bit.  Before we go any further, let's create our bundle, download an OSGi container, and test to see if the bundle runs correctly.

5.  Compiling/Creating the Bundle.

Now that we have some code, let's compile the bundle.  We're going to perform these steps from the command line instead of using Eclipse.

First, let's ensure our code compiles:
mvn compile
If everything was written correctly, you should see a "BUILD SUCCESS" when Maven finishes executing the build.  Now let's produce the bundle:
mvn bundle:bundle
Once again you should see a "BUILD SUCCESS" at the end of the process.  If you go to the target directory, located in the root of the project, you should see a jar file called timeprovider-0.0.1-SNAPSHOT.jar.


I'm going to change the file extension to zip and extract the contents to show the directory structure.


Finally, let's inspect the MANIFEST.MF file to see what the Maven Bundle Plugin did:
Manifest-Version: 1.0
Bnd-LastModified: 1327273047610
Build-Jdk: 1.6.0_29
Built-By: rclayton
Bundle-Activator: com.berico.timeprovider.internal.Activator
Bundle-ManifestVersion: 2
Bundle-Name: Berico Time Provider
Bundle-SymbolicName: com.berico.time.provider
Bundle-Version: 0.0.1.SNAPSHOT
Created-By: Apache Maven Bundle Plugin
Export-Package: com.berico.timeprovider.api;version="0.0.1.SNAPSHOT"
Import-Package: com.berico.timeprovider.api;version="[0.0,1)",org.joda.t
 ime;version="[2.0,3)",org.osgi.framework;version="[1.3,2)"
Tool: Bnd-1.50.0
Awesome.  Now let's try to get this bundle to install and start in an OSGi container.

In terms of OSGi containers, you have a number of options to choose from.  The entire Eclipse platform is built on the Eclipse Equinox OSGi container.  For our purposes, I'm going to stick with Apache Felix since I am most familiar with its environment.

Download the Apache Felix from here:  http://felix.apache.org/site/downloads.cgi.

We will also need to ensure our project's dependencies are satisfied; the org.osgi.core dependency is satisfied by the container.  The other dependencies are a part of the Apache Felix project; I will show you a nifty way to retrieve those from Felix's console.  The one library that we will have to manually retrieve is the Joda Time library.

 Download the Joda Time library from here:  http://sourceforge.net/projects/joda-time/files/joda-time/2.0/joda-time-2.0-dist.zip/download

Extract the Apache Felix archive somewhere meaningful.  Next, extract the Joda Time archive, and place the joda-time-2.0.jar in the bundle directory located in the root folder of Apache Felix.  Any OSGi bundle placed in this directory will automatically be installed by the container; for this reason, this folder is often referred to as the auto-deploy folder of an OSGi container.
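For example (directory names here are placeholders; adjust to wherever you extracted the archives):
# Copy the Joda Time jar into Felix's auto-deploy directory
cp joda-time-2.0/joda-time-2.0.jar felix-framework-4.0.2/bundle/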


Hey, look at that!  The org.apache.felix.gogo.runtime bundle is already there!  That's one less dependency we will need to load for this example.

Let's start Felix and install the final dependency, the org.apache.felix.configadmin bundle.

To launch Felix, you use the -jar switch of the java command.  From the root directory of the Felix installation, you would execute the following command:
java -jar bin/felix.jar
I prefer to set up an alias to this command since it's so long and dependent on being run from the root of that directory:
alias osgi="java -jar `pwd`/bin/felix.jar"
Now we can start the Felix container using the osgi command:


Ok, first let's look at the bundles installed within the container by calling the lb command:


Great!  We have the Joda-Time bundle installed, now we only need to get the org.apache.felix.configadmin bundle installed before we can install our newly created bundle.  Felix includes a bundle called the Apache Felix Bundle Repository that is capable of downloading commonly used OSGi bundles and automatically installing them in the current OSGi container instance.

To see the list of available bundles that can be downloaded by the Felix Bundle Repository, simply type obr:list in the Gogo shell:
g! obr:list
And you should see...
Apache Felix Bundle Repository (1.6.6, ...)
Apache Felix Configuration Admin Service (1.2.4, ...)
Apache Felix Declarative Services (1.6.0, ...)
Apache Felix EventAdmin (1.0.0)
Apache Felix File Install (3.0.2, ...)
Apache Felix Gogo Command (0.10.0, ...)
Apache Felix Gogo Runtime (0.10.0, ...)
Apache Felix Gogo Shell (0.10.0, ...)
Apache Felix Gogo Shell Commands (0.2.0)
Apache Felix Gogo Shell Console (0.2.0)
Apache Felix Gogo Shell Launcher (0.2.0)
Apache Felix Gogo Shell Runtime (0.2.0)
Apache Felix Http Api (2.0.4)
Apache Felix Http Base (2.0.4)
Apache Felix Http Bridge (2.0.4)
Apache Felix Http Bundle (2.0.4)
Apache Felix Http Jetty (2.0.4)
Apache Felix Http Proxy (2.0.4)
Apache Felix Http Samples - Filter (2.0.4)
Apache Felix Http Samples - Whiteboard (2.0.4)
Apache Felix HTTP Service Jetty (1.0.1, ...)
Apache Felix Http Whiteboard (2.0.4)
Apache Felix iPOJO (1.8.0, ...)
Apache Felix iPOJO (0.8.0)
Apache Felix iPOJO API (1.6.0, ...)
Apache Felix iPOJO Arch Command (1.6.0, ...)
Apache Felix iPOJO Composite (1.8.0, ...)
Apache Felix iPOJO Composite (1.0.0, ...)
Apache Felix iPOJO Event Admin Handler (1.8.0, ...)
Apache Felix iPOJO Extender Pattern Handler (1.4.0, ...)
Apache Felix iPOJO Extender Pattern Handler (1.0.0, ...)
Apache Felix iPOJO Gogo Command (1.0.1, ...)
Apache Felix iPOJO JMX Handler (1.4.0, ...)
Apache Felix iPOJO Temporal Service Dependency Handler (1.6.0, ...)
Apache Felix iPOJO URL Handler (1.6.0, ...)
Apache Felix iPOJO WebConsole Plugins (1.6.0, ...)
Apache Felix iPOJO White Board Pattern Handler (1.2.0, ...)
Apache Felix iPOJO White Board Pattern Handler (1.6.0, ...)
Apache Felix Log Service (1.0.0)
Apache Felix Metatype Service (1.0.2, ...)
Apache Felix Prefrences Service (1.0.2)
Apache Felix Remote Shell (1.0.4, ...)
Apache Felix Remote Shell (1.1.2, ...)
Apache Felix Shell Service (1.4.2, ...)
Apache Felix Shell TUI (1.4.1, ...)
Apache Felix UPnP Base Driver (0.8.0)
Apache Felix UPnP Extra (0.4.0)
Apache Felix UPnP Tester (0.4.0)
Apache Felix Web Console Event Plugin (1.0.2)
Apache Felix Web Console Memory Usage Plugin (1.0.0)
Apache Felix Web Console Memory Usage Plugin (1.0.2)
Apache Felix Web Console UPnP Plugin (1.0.0)
Apache Felix Web Management Console (3.1.2, ...)
Apache Felix Web Management Console (3.1.2, ...)
OSGi OBR Service API (1.0.0)
OSGi R4 Compendium Bundle (4.0.0)
Servlet 2.1 API (1.0.0)
The Apache Felix Configuration Admin Service is the final dependency we need to install.  We can install the dependency using the obr:deploy command, supplying the argument org.apache.felix.configadmin as the bundle we want deployed:
g! obr:deploy org.apache.felix.configadmin
Which displays...
Target resource(s):
-------------------
   Apache Felix Configuration Admin Service (1.2.4)

Optional resource(s):
---------------------
   Apache Felix Log Service (1.0.0)

Deploying...
done.
Very nice.  Remember, you can check to see if the bundle is installed by using the lb command (lb = "list bundles").  Now let's stop the container.  At the time of this post, I have not yet found an elegant way to terminate the container.  Felix used to have a command called "shutdown", but I cannot find it anymore.  One can always press control-c if the process is running in a terminal, but I found a cleaner (if still ugly) way.  If you execute lb, you will see that the first bundle (0, or zero) is the System Bundle.  All other bundles require the system bundle.  So, if you tell the framework to stop the System Bundle, it will shut down the container and all other bundles dependent on it:
g! stop 0
Your container should yield control back to BASH (or DOS for the poor Windows users).

Even though people will tell you using the auto-deploy feature is not the best way to install a bundle, let's live dangerously and drop our timeprovider bundle into Felix's bundle directory.  Your bundle directory should look like this now:


Let's start Felix up again and see what happens...
osgi
Results in...
com.berico.timeprovider Started
____________________________
Welcome to Apache Felix Gogo

g! 
Wicked! As you saw in the Activator class, when the bundle started (literally the start method was called), we printed the following message out to the console: com.berico.timeprovider Started.

So, if the bundle started successfully, and we saw our start message, we should see the stop message when we tell the container to stop the bundle.  Let's try it out.

Find the id of the bundle using the lb command:
g! lb
START LEVEL 1
   ID|State      |Level|Name
    0|Active     |    0|System Bundle (4.0.2)
    1|Active     |    1|Joda-Time (2.0.0)
    2|Active     |    1|Apache Felix Bundle Repository (1.6.6)
    3|Active     |    1|Apache Felix Gogo Command (0.12.0)
    4|Active     |    1|Apache Felix Gogo Runtime (0.10.0)
    5|Active     |    1|Apache Felix Gogo Shell (0.10.0)
    6|Installed  |    1|Apache Felix Configuration Admin Service (1.2.4)
    7|Installed  |    1|Apache Felix Log Service (1.0.0)
    8|Active     |    1|Berico Time Provider (0.0.1.SNAPSHOT)
Now tell the container to stop the bundle supplying the timeprovider's bundle id (in this case, number 8):
g! stop 8
com.berico.timeprovider Stopped
Voila! Our stop message is displayed as expected.

If you've gotten this far (assuming you are doing this yourself and not cheating by simply reading this post), congratulate yourself.  You've created an OSGi bundle using Maven and deployed it to an OSGi container.  Furthermore, you've done some cool stuff at the Gogo command line, like installing a bundle using the Felix Bundle Repository.

6.  Registering a Service in OSGi.

Now it's time to start playing with services, the core reason you probably want to learn OSGi.  We created our TimeProvider interface for a reason!  We want to expose this interface as an OSGi service that another bundle can consume.  So how do we do this?  Well, it's actually quite easy, as you will see.

The only class that needs to change in order to provide an instance of TimeProvider as a service for other bundles is the Activator class.  These changes are reflected below:
package com.berico.timeprovider.internal;

import java.util.Dictionary;
import java.util.Hashtable;

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceRegistration;

import com.berico.timeprovider.api.TimeProvider;

public class Activator implements BundleActivator {
  
  //Save a handle to the service we registered
  private static 
     ServiceRegistration timeProviderRegistration = null;
  
  public void start(BundleContext bundleContext) throws Exception {
    
    TimeProvider timeProvider = new TimeProviderImpl();
    
    // Ignore this ugliness
    Dictionary properties = new Hashtable();
    
    timeProviderRegistration = 
      bundleContext.registerService(
        TimeProvider.class.getName(), timeProvider, properties);  
    
    System.out.println("com.berico.timeprovider Started");
  }

  public void stop(BundleContext bundleContext) throws Exception {
    
    if(timeProviderRegistration != null){
      
      // Remove the service registration we created in start(); note that
      // unregister() (not ungetService) is the call for a service publisher
      timeProviderRegistration.unregister();
    }
    
    System.out.println("com.berico.timeprovider Stopped");
  }
}
A couple of things to note about the changes we've made to the Activator.  We interact with the OSGi environment through the BundleContext.  This includes registering and unregistering services.  To register a service, you simply need to call the registerService method on the BundleContext.  The method takes three arguments.  The first is the String name of the service.  The second parameter is the service itself (implementation).  Finally, we supply metadata to the OSGi framework about the service we are registering.  This metadata is important for certain aspects of OSGi, including bundle queries, where you can search for bundles by the metadata they provide.  In our case, we have nothing useful to supply, so we give it an empty (but initialized) Dictionary collection.

If you've done any serious work in Java, you are probably wondering why we are not using Java Generics.  The OSGi framework was originally created to support mobile phone applications before Generics were introduced on the Java platform.  Since the specification targets platforms pre-Java 5, we are stuck "boxing" and "unboxing" objects and using non-generic collections.  From what I understand, the newest specification for OSGi will include an API supporting generics, cleaning up this ugliness.

Now that we've learned how to register a service with the OSGi container, let's create a new bundle that will consume the service.

7.  Consuming a Service in OSGi.

Without going through the steps all over again, I'm going to create a new, separate bundle called timeconsumer.  In the new timeconsumer project, I'm only going to create one BundleActivator class (and nothing else), which I will unimaginatively call Activator:
package com.berico.timeconsumer;

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceReference;

import com.berico.timeprovider.api.TimeProvider;

public class Activator implements BundleActivator {

  private static ServiceReference timeProviderReference = null;
  
  public void start(BundleContext bundleContext) throws Exception {
    
    timeProviderReference
      = bundleContext.getServiceReference(
        TimeProvider.class.getName());
    
    if(timeProviderReference != null){
    
      TimeProvider timeProvider = 
        (TimeProvider)bundleContext.getService(timeProviderReference);
      
      System.out.println("com.berico.timeconsumer Started");

      System.out.println(
        String.format("Current Time is %s", timeProvider.getTime()));
    }
    else {
      
      System.out.println("Could not find a valid Time Provider");
    }
  }

  public void stop(BundleContext bundleContext) throws Exception {
    
    if(timeProviderReference != null){
      
      bundleContext.ungetService(timeProviderReference);
    }
    
    System.out.println("com.berico.timeconsumer Stopped");
  }

}
The Activator class is first grabbing a ServiceReference to the service we want to consume.  The ServiceReference contains the service id along with information about the bundle the service was registered from.  We next ensure the reference is not null.  If it is null, we print a message to the console saying we couldn't find the time provider.  If the reference is set, we go back to the BundleContext grabbing the service.  Remember that OSGi doesn't currently support generics, so we have to cast the object that is returned to the appropriate type.  Finally, we print the time to the console, provided by the time provider instance.

The hard part, believe it or not, is the pom.xml and not the Activator class.  There are some little tricks we have to do in order to get our timeconsumer to use the timeprovider as a dependency.  The first thing we need to do is install the timeprovider as an artifact in the local Maven repository.  This is a simple command that you can execute within the root directory of the timeprovider project:
mvn install
With the timeprovider bundle installed, we need to add a dependency on the timeprovider project in our timeconsumer's pom.xml:
<dependency>
  <groupId>com.berico.time</groupId>
  <artifactId>timeprovider</artifactId>
  <version>0.0.1-SNAPSHOT</version>
</dependency>
Adding the dependency is only the first step.  Since we depend on the timeprovider bundle to start and register the TimeProvider implementation with the OSGi container before we attempt to consume the service in the timeconsumer bundle, we will need to add a special element to the plugin -> configuration -> instructions section of the Maven Bundle Plugin configuration:
<Require-Bundle>com.berico.time.provider</Require-Bundle>
The Require-Bundle attribute essentially tells the OSGi container not to start the bundle until the bundles referenced in the value of the property are started.  The strange thing you might notice is the name of the bundle: com.berico.time.provider.  The groupId in our timeprovider Maven configuration is com.berico.time and our artifactId is timeprovider.  The bundle.symbolicName property in the pom.xml is com.berico.timeprovider.  However, when the bundle is created by Maven, the MANIFEST.MF is written with the Bundle-SymbolicName as com.berico.time.provider.  I suspect there is an error in the Maven Bundle Plugin causing it to improperly output the symbolic name.  If you are ever unsure what to use, the simple way to find out the symbolic name of a bundle is to unzip the bundle and open the MANIFEST.MF in a text editor.  If the bundle in question is installed in the OSGi container, you can also execute the headers command of the Gogo shell, followed by the id of the bundle you want to inspect:
g! headers 19
Which produces:
Berico Time Provider (19)
-------------------------
Bnd-LastModified = 1327295763667
Build-Jdk = 1.6.0_29
Built-By = rclayton
Bundle-Activator = com.berico.timeprovider.internal.Activator
Bundle-ManifestVersion = 2
Bundle-Name = Berico Time Provider
Bundle-SymbolicName = com.berico.time.provider
Bundle-Version = 0.0.1.SNAPSHOT
Created-By = Apache Maven Bundle Plugin
Export-Package = com.berico.timeprovider.api;version="0.0.1.SNAPSHOT"
Import-Package = com.berico.timeprovider.api;version="[0.0,1)",org.joda.time;version="[2.0,3)",org.osgi.framework;version="[1.3,2)"
Manifest-Version = 1.0
Tool = Bnd-1.50.0
g! 
Now let's drop the timeconsumer bundle in the auto-deploy directory of Felix.
Start up the OSGi container:
osgi
And you should see...
com.berico.timeprovider Started
Current Time: 2012-01-24T02:48:58.374Z
com.berico.timeconsumer Started
Current Time is 2012-01-24T02:48:58.395Z
____________________________
Welcome to Apache Felix Gogo

g! 
Success!  We have now not only produced a service in OSGi, but also consumed one.  Next, we will learn how to use the Configuration Admin specification of OSGi to dynamically receive configuration updates.

8.  Consuming Configuration Updates.

Once you've learned how to create bundles and register and consume services, I find the next thing most engineers want to learn is how to pass configuration to their services.  Who can argue with that?  I mean, what's the point of creating highly modular systems if you hard-code the configuration those modules rely on?

There are a number of ways to configure an OSGi application; here are a couple:

Embed your settings in a property file.

This is somewhat idiomatic for Java.  I mean, it's incredibly simple to call System.getProperty to pull the desired configuration value from the environment.  The Felix container even offers a global properties file you can use to bundle all of this configuration data (conf/config.properties).  In many cases this works if you know what your configuration is going to be before you start the container.
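For illustration, reading such a setting is a one-liner (the property key here is hypothetical):
// Hypothetical key; could live in conf/config.properties or be passed
// to the JVM as -Dtimeprovider.timezone=UTC
String timezone = System.getProperty("timeprovider.timezone", "UTC");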

Create and register a configuration service with the OSGi container.

Another option is to create a bundle and some sort of API around configuration.  For instance, you could register a Config object with the container, which is consumed by the service/bundle you need configured.  This adds an interesting benefit because you could prevent the bundle from activating until the Config object becomes available.  You can also change the configuration by recompiling the bundle containing the Config object and updating the bundle, which should in turn update the dependent service.
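As a sketch of the idea (this Config interface is entirely hypothetical, not part of any OSGi API):
// A hypothetical configuration contract a bundle could register as a service;
// consumers would retrieve it through the BundleContext like any other service.
public interface Config {

  String get(String key);
}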

Use the OSGi Specification for Configuration Admin.

The Configuration Admin specification for OSGi formalizes the previous notion (services dedicated to configuration), providing a nice model for handling configuration updates, along with a persistence mechanism that ensures updates remain in place once they are set.  More importantly, the Configuration Admin service provides hooks that allow configuration updates without needing to create special configuration bundles.  A service that wants to use the Configuration Admin service merely needs to implement the ManagedService interface.  This interface is made available to us from the org.apache.felix.configadmin artifact we referenced as a dependency in our Maven pom.xml file earlier in the post.

I've taken the liberty of implementing the ManagedService interface and its corresponding updated method:
package com.berico.timeprovider.internal;

import java.util.Dictionary;
import java.util.Hashtable;

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.Constants;
import org.osgi.framework.ServiceRegistration;
import org.osgi.service.cm.ConfigurationException;
import org.osgi.service.cm.ManagedService;

import com.berico.timeprovider.api.TimeProvider;

public class Activator implements BundleActivator, ManagedService {

  private static ServiceRegistration timeProviderRegistration = null;
  private static ServiceRegistration configRegistration = null;
  private static TimeProvider timeProvider = new TimeProviderImpl();
  
  public void start(BundleContext bundleContext) throws Exception {
    
    // Properties for the TimeProvider registration
    Dictionary registrationProperties = new Hashtable();
    
    timeProviderRegistration = 
      bundleContext.registerService(
        TimeProvider.class.getName(), 
        timeProvider, 
        registrationProperties);  
    
    //Properties for the Config Admin Registration
    Dictionary configProperties = new Hashtable();
    configProperties.put(Constants.SERVICE_PID, "timeprovider.pid");
    
    configRegistration = 
      bundleContext.registerService(
        ManagedService.class.getName(), 
        this, configProperties);
    
    System.out.println("com.berico.timeprovider Started");
    
    printTime();
  }

  public void stop(BundleContext bundleContext) throws Exception {
    
    unregister(timeProviderRegistration);
    
    unregister(configRegistration);
    
    System.out.println("com.berico.timeprovider Stopped");
  }
  
  public void updated(Dictionary configuration) throws ConfigurationException {
    
    System.out.println("Updating Time Provider Configuration.");
    
    if(configuration != null){
      
      String timezone = configuration.get("timezone").toString();
      
      System.out.println(
          String.format("Changing Time Zone to: %s", timezone));
      
      try {
        
        timeProvider.setDefaultTimeZone(timezone);
      
        printTime();
        
      } catch (Exception e) {
        
        System.out.println(
          String.format(
            "Could not set the provided timezone [%s]; reason: %s", 
            timezone, 
            e.getMessage()));
      }
      
    }
    else {
      
      System.out.println("Time Service: Configuration Null");
    }
  }
  
  private static void unregister(ServiceRegistration registration){
    
    if(registration != null){
      
      // Remove the service registration we created in start()
      registration.unregister();
    }
  }
  
  private static void printTime(){
    
    System.out.println(
      String.format("Current Time: %s", timeProvider.getTime()));
  }
  
}
Now that you've seen how we can consume configuration updates, I'm going to demonstrate how to produce an update.  Instead of creating a whole new bundle to make this happen, I'm going to show how to extend the Apache Felix Gogo Shell to serve as a mechanism for updating our timeprovider service.

9.  Writing Menu Extensions for the Apache Felix Gogo shell.

Writing shell extensions for the Felix Gogo shell is surprisingly easy.  Shell commands are actually methods on a POJO, and Gogo does some clever Reflection work to turn those commands into executable actions.

Let's create a simple menu extension that prints the time on command and sets the default timezone:
package com.berico.timeprovider.internal;

import java.io.IOException;
import java.util.Dictionary;
import java.util.Hashtable;

import org.apache.felix.service.command.Descriptor;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceReference;
import org.osgi.service.cm.Configuration;
import org.osgi.service.cm.ConfigurationAdmin;

import com.berico.timeprovider.api.TimeProvider;

public class ShellCommands {

  private final BundleContext bundleContext;
  private ConfigurationAdmin configAdmin;
  
  ShellCommands(BundleContext bundleContext){
    
    this.bundleContext = bundleContext;
    
    getConfigAdmin();
  }
  
  @Descriptor("Show the current time.")
  public void show(){
    
    ServiceReference timeProviderReference = 
      bundleContext.getServiceReference(
        TimeProvider.class.getName());
    
    TimeProvider timeProvider = 
      (TimeProvider)bundleContext.getService(
        timeProviderReference);
    
    System.out.println(timeProvider.getTime());
  }
  
  @Descriptor("Set the Default Time Zone for the Time Provider")
  public void timezone(
      @Descriptor("Valid Time Zone") 
        String timezone){
    
    System.out.println(
      String.format("Setting Time Zone: %s", timezone));
    
    // Lazily resolve the ConfigurationAdmin service in case the
    // constructor ran before it was available
    getConfigAdmin();
    
    Configuration config = null;
    
    try {
    
      config = configAdmin.getConfiguration(
              "timeprovider.pid");
      
      Dictionary props = config.getProperties();
      
      if (props == null) {
          props = new Hashtable();
      }

      props.put("timezone", timezone);

      config.update(props);
      
    } catch (IOException e) {
      
      e.printStackTrace();
    }
    
  }
  
  private void getConfigAdmin(){
    
    if(this.configAdmin == null){
      
      ServiceReference configurationAdminReference = 
        this.bundleContext.getServiceReference(
          ConfigurationAdmin.class.getName());
      
      if (configurationAdminReference != null) {
        
        this.configAdmin = 
          (ConfigurationAdmin) bundleContext.getService(
            configurationAdminReference);
      }
    }
  }
}
In this example, the methods show and timezone will serve as the "executables" the Gogo shell calls when a command is typed.  In the show method, we grab the TimeProvider instance from the OSGi context and print the current time to the shell.  In the timezone method, instead of retrieving the service directly from the OSGi context, we instead grab a reference to the ConfigurationAdmin service, create a new set of properties for the timeprovider service (we refer to the service using the "pid" of the timeprovider, which I've simply called "timeprovider.pid"), and tell the ConfigurationAdmin service to update the properties for that service.  This will call the updated method on our Activator class, which will in turn set the default time zone.

The last thing to mention is the @Descriptor annotation.  This is used by the Gogo shell to provide contextual help for the command.  If the annotation is placed over the method name, the description applies to the command.  If the annotation is on a function argument, it applies to one of the command's parameters.

In order to demonstrate the difference in time zone when we make changes to the default time zone stored within the TimeProvider, I'm going to make a small modification to the TimeProviderImpl.getTime method:
private String getTime(DateTimeZone dtz){
  
  DateTime dt = DateTime.now(dtz);
  
  return dt.toString("ZZZ - yyyy-MM-dd HH:mm:ss");
}
We register the ShellCommands class like any other service in OSGi.  You might be wondering: how does the Felix Gogo Shell know to "pick up" the service after it's registered with the container?  This is another example of where the properties (metadata) provided to the container are useful for service consumption.  Let's modify the timeprovider's Activator class to register the ShellCommands service with the container:
//Properties for the Gogo Shell
Dictionary shellCommandsProperties = new Hashtable();
    
shellCommandsProperties.put("osgi.command.scope", "time");
    
shellCommandsProperties.put("osgi.command.function", 
  new String[] {"show", "timezone"});
    
bundleContext.registerService(
  ShellCommands.class.getName(), 
  new ShellCommands(bundleContext), 
  shellCommandsProperties);
Adding the osgi.command.scope and osgi.command.function properties to the service registration will provide the appropriate information to the Felix Gogo plugin to detect and install the ShellCommands instance as a viable addition to the menu.  Using some nifty Reflection, the Felix Gogo service can pull the description of the commands and their parameters from the annotations.  Reading the signature of the method, the Felix Gogo service can determine the number and types of parameters required to execute the function, mapping those parameters onto the particular method instance.

10.  Dynamically configuring a Service using the Gogo shell.

Now we are ready to demonstrate how to dynamically configure a service using the Gogo shell.  Before we go any further, it would be wise to uninstall the timeprovider and timeconsumer bundles from the OSGi container, recompile and package them, and place the new versions in the auto-deploy directory of Apache Felix.
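The steps might look something like this (the bundle ids and the Felix path are examples; use lb to find the ids in your container):
# In the Gogo shell: remove the old bundles, then shut down the container
g! uninstall 8
g! uninstall 9
g! stop 0
# In each project directory: rebuild the bundle and redeploy it
mvn compile
mvn bundle:bundle
cp target/*.jar <felix-root>/bundle/
When you restart Felix, you should see: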
Updating Time Provider Configuration.
Time Service: Configuration Null
com.berico.timeprovider Started
Current Time: UTC - 2012-01-24 23:07:37
com.berico.timeconsumer Started
Current Time is UTC - 2012-01-24 23:07:37
____________________________
Welcome to Apache Felix Gogo

g!
Let's first start by seeing if our commands were loaded into the Gogo shell.  We can see a list of commands by typing the "help" command:
g! help
The Felix Gogo Shell will dump a list of commands:
felix:bundlelevel
felix:cd
felix:frameworklevel
felix:headers
felix:help
felix:inspect
felix:install
felix:lb
felix:log
felix:ls
felix:refresh
felix:resolve
felix:start
felix:stop
felix:uninstall
felix:update
felix:which
gogo:cat
gogo:each
gogo:echo
gogo:format
gogo:getopt
gogo:gosh
gogo:grep
gogo:not
gogo:set
gogo:sh
gogo:source
gogo:tac
gogo:telnetd
gogo:type
gogo:until
obr:deploy
obr:info
obr:javadoc
obr:list
obr:repos
obr:source
time:show
time:timezone
g! 
At the bottom you will notice our two commands prefixed by time.  If you recall, time was the osgi.command.scope (think namespace).  Our two methods, registered as an array of Strings with the osgi.command.function key, appear after the scope.

We can inspect the contextual help message of any command by calling help again and supplying the name of the command:
g! help show
Which displays:
show - Show the current time.
   scope: time
And the help message for the timezone command:
g! help timezone
Which displays:
timezone - Set the Default Time Zone for the Time Provider
   scope: time
   parameters:
      String   Valid Time Zone
Keep in mind that you can always prefix a command with its scope (e.g., time:timezone); if there isn't another command with the same name, you can remain lazy like me and simply use the command name.

Let's execute the timezone command, supplying America/Phoenix (a valid Joda-Time time zone ID) as our new timezone:
g! timezone America/Phoenix
We get a response first from inside our ShellCommands.timezone method notifying us that the timezone is being set, followed by notifications from within the Activator.updated method indicating it has received new configuration:
Setting Time Zone: America/Phoenix
Updating Time Provider Configuration.
Changing Time Zone to: America/Phoenix
Current Time: America/Phoenix - 2012-01-24 16:11:37
We can also display the current time whenever we like by calling the show command:
g! time:show
Which displays:
America/Phoenix - 2012-01-24 17:49:03

Conclusion


Although this was a very long post, my intent was to provide a complete introduction to creating an OSGi bundle in Maven, registering and consuming services, using the ConfigurationAdmin service, and extending the Apache Felix Gogo shell.  In this tutorial, I didn't spend a whole lot of time talking about the theory and mechanics of OSGi.  Thanks for your time and patience!

For an in depth look at OSGi, I recommend the following books:

OSGi in Depth
http://www.manning.com/alves/
OSGi in Action
http://www.manning.com/hall/