Sunday, September 30, 2012

Introducing "CLAVIN" (Cartographic Location And Vicinity INdexer)

What is a Geotagger?


If your work involves finding meaning in unstructured text, you may have, at one time or another, worked with semantic technologies like entity extraction.  An Entity Extractor promotes words in text to concepts; this is typically realized in the form of entity tagging, where a type from an ontology is associated with a word or phrase (e.g. PERSON, PLACE, TIME, ORGANIZATION).  Once entities have been "tagged", the next step is to "resolve" them to a global concept or entity (entity resolution).  For instance, we not only want to know that "Barack Obama" is a PERSON, we also want any reference to Barack Obama to point to one "identifier".  This way, we can associate an entity across all of the documents it has occurred in, allowing us to do things like build a global graph of concepts or perform faceted searches against those concepts.
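
To make the "one identifier" idea concrete, here is a minimal sketch of resolution by alias lookup.  Everything in it (the class name, the alias table, the identifier format) is hypothetical and only for illustration; real resolvers use far richer context than a normalized string match.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration: none of these names come from CLAVIN or any real library.
class EntityResolutionSketch {

    // Maps a normalized surface form (what appears in text) to one canonical identifier.
    private final Map<String, String> aliasToId = new HashMap<>();

    // Register a surface form that should point at the given identifier.
    void addAlias(String surfaceForm, String canonicalId) {
        aliasToId.put(normalize(surfaceForm), canonicalId);
    }

    // Resolve a tagged mention to its canonical identifier, if one is known.
    Optional<String> resolve(String mention) {
        return Optional.ofNullable(aliasToId.get(normalize(mention)));
    }

    // Trim and lowercase so trivially different spellings share a key.
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        EntityResolutionSketch resolver = new EntityResolutionSketch();
        resolver.addAlias("Barack Obama", "person:barack-obama");
        resolver.addAlias("President Obama", "person:barack-obama");

        // Two different mentions end up at the same identifier.
        System.out.println(resolver.resolve("Barack Obama").orElse("unresolved"));
        System.out.println(resolver.resolve("president obama").orElse("unresolved"));
    }
}
```

Once every mention carries the same identifier, grouping documents by entity is just a lookup on that identifier.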

Perhaps one of the most important forms of entity resolution is associating locations with geographic coordinates, commonly known as Geotagging (which may also encompass the entity resolution step).  For instance, we not only want to know that New York City is a LOCATION, but also that its center latitude and longitude is 40.7142° N, 74.0064° W, that the location is in the "New York State" administrative district, and that it is in the country of the United States of America.  More sophisticated resolution techniques might even include the polygonal boundaries of the location, and an association to a semantic graph of concepts related to the city and its history.

The Problem


For years the Geotagging market has been dominated by a very small number of commercial products; many entity extractors can identify locations, but few actually resolve that location to a fixed point in space.   As far as I'm aware, the most used is MetaCarta (http://www.metacarta.com/), one I have personally used on a number of projects.  MetaCarta is really good in terms of accuracy and features; in fact, most systems I've seen MetaCarta deployed in only use about 25% of its features.  The problem with MetaCarta is that it is expensive (I don't have figures, I've only seen my customers cringe when talking about price).  

Yahoo also has a Geotagger in the form of a web service offering called Placemaker (http://developer.yahoo.com/geo/placemaker/).   For us, Placemaker has never been a viable solution since it can't be deployed on an internal network, and doesn't fit well with our architectural use cases.  Placemaker also doesn't seem to be well tuned to our corpora, meaning it has lower than ideal precision in extraction. 

Outside of the commercial space, there are no viable open source alternatives.  A quick Google Search for open source Geotaggers will probably return Geodict (https://github.com/petewarden/geodict), a GitHub project by Pete Warden.  Pete combines a Gazetteer (geospatial dictionary) with some simple rules for locating potential places in a sentence (presence of key words like "in" or "near") in a brute force approach to solving the problem.  Unfortunately, Geodict's approach doesn't take semantic meaning of the sentence into account when locating potential "place" words, and it doesn't perform the resolution step (differentiating locations by context: the "Springfield" problem).
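
To show what "differentiating locations by context" can look like, here is a minimal sketch of one naive strategy: among several candidates that share a name, pick the one geographically closest to another location already resolved in the same text.  This is my own illustration, not how Geodict or CLAVIN works.  The class and method names are hypothetical, and the Missouri and Massachusetts coordinates are rounded approximations used only for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of context-based disambiguation (the "Springfield" problem).
class ContextDisambiguationSketch {

    static final class Candidate {
        final String label;
        final double latitude;
        final double longitude;

        Candidate(String label, double latitude, double longitude) {
            this.label = label;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // Great-circle distance in kilometers using the haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * 6371.0 * Math.asin(Math.sqrt(a));
    }

    // Pick the candidate closest to a context point (e.g. another place in the same text).
    static Candidate nearestTo(List<Candidate> candidates, double contextLat, double contextLon) {
        Candidate best = null;
        double bestKm = Double.MAX_VALUE;
        for (Candidate c : candidates) {
            double km = haversineKm(contextLat, contextLon, c.latitude, c.longitude);
            if (km < bestKm) {
                best = c;
                bestKm = km;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Approximate coordinates, for illustration only.
        List<Candidate> springfields = Arrays.asList(
                new Candidate("Springfield, IL", 39.80, -89.64),
                new Candidate("Springfield, MO", 37.22, -93.30),
                new Candidate("Springfield, MA", 42.10, -72.59));

        // Context: the text also mentions Chicago (roughly 41.85, -87.65).
        Candidate pick = nearestTo(springfields, 41.85, -87.65);
        System.out.println(pick.label);
    }
}
```

Proximity is only one possible signal; a real resolver would likely weigh others as well, such as population or administrative containment.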

Introducing CLAVIN


Necessity is the mother of invention.  - Unknown

Early this year, our company found itself desperately in need of a Geotagger.  Our enterprise search application, built around geospatial faceting, lacked the geospatial entities we needed to do the faceting.  We had documents, just no geospatial tagging.  Architecturally, an upstream component in the ETL pipeline was supposed to provide this capability (using MetaCarta), but for one reason or another, the team producing that capability was not going to be able to make the delivery timeframe (I should add that this was no fault of the MetaCarta product).

One of Berico Technologies' Data Scientists, Charlie Greenbacker, was working on another, unrelated problem involving the resolution of country names across a collection of structured datasets (Excel and CSV documents) that we were "mashing together" so we could do analysis across datasets. Recognizing that both problems had a similar solution, Charlie began work on a homegrown geotagger that eventually became CLAVIN: Cartographic Location And Vicinity INdexer (http://clavin.bericotechnologies.com/).

What is CLAVIN?


It's a geotagger (and resolver).  Architecturally, CLAVIN is extremely simple.  CLAVIN was written in Java, but can be bundled in a Java Web Application as a web service allowing any application to access it (as seen in our CLAVIN-Web demonstration).  

CLAVIN has a simple workflow.  An EntityTagger is used to find unresolved (string) PEOPLE, ORGANIZATIONS, and LOCATIONS in a string (multiline, complex quotation, etc.).  Once those entities are extracted from the text, the locations are passed to a LocationResolver that returns the most confident match (ResolvedLocation) for each location in the set.

The default EntityTagger implementation in CLAVIN is the Apache OpenNLP framework.  Apache OpenNLP is the most license friendly framework we could bundle with CLAVIN; the most accurate EntityTagger we have implemented is one utilizing the Stanford NER, which we don't provide outside of service contracts since it's GPL.

Our default LocationResolver uses a custom Apache Lucene index of the GeoNames Gazetteer (http://www.geonames.org/).  The LocationResolver includes tunable algorithms for performing fuzzy and probabilistic matching of locations.  Since you sacrifice performance for accuracy and vice versa, the LocationResolver is a great abstraction for a number of strategies you may need to employ in your system.  Another benefit of CLAVIN is that we maintain the resolver index (one less thing you need to worry about).
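
To give a feel for what "fuzzy" matching against a gazetteer means, here is a minimal sketch that matches a possibly misspelled place name against a tiny in-memory list using edit distance.  It is not CLAVIN's resolver (which queries a Lucene index); the class, the three-entry gazetteer, and the maxEdits parameter are all assumptions made for illustration.  The coordinates are taken from the example output in this post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration of fuzzy name matching; not CLAVIN's algorithm.
class FuzzyGazetteerSketch {

    static final class Place {
        final String name;
        final double latitude;
        final double longitude;

        Place(String name, double latitude, double longitude) {
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // Tiny stand-in for a real gazetteer.
    static final List<Place> GAZETTEER = Arrays.asList(
            new Place("Chicago", 41.85003, -87.65005),
            new Place("Springfield", 39.80172, -89.64371),
            new Place("Indiana", 40.00032, -86.25027));

    // Classic Levenshtein distance, keeping only two rows of the table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Return the closest gazetteer entry within maxEdits, or empty if nothing is close enough.
    static Optional<Place> match(String mention, int maxEdits) {
        String needle = mention.trim().toLowerCase(Locale.ROOT);
        Place best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Place place : GAZETTEER) {
            int d = editDistance(needle, place.name.toLowerCase(Locale.ROOT));
            if (d <= maxEdits && d < bestDistance) {
                best = place;
                bestDistance = d;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        System.out.println(match("Chicgo", 1).map(p -> p.name).orElse("no match"));
        System.out.println(match("Springfeild", 2).map(p -> p.name).orElse("no match"));
        System.out.println(match("Paris", 2).map(p -> p.name).orElse("no match"));
    }
}
```

The maxEdits knob is the kind of tuning I have in mind: a larger tolerance catches more misspellings but costs more comparisons and risks matching the wrong place.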

Code Example


This is how simple it is to use CLAVIN under the current API.  Keep in mind there will be some changes before its official release in mid-October.
// Location of the initializer for Stanford NER
String classifierModelPath = 
  "/location/of/classifier/all.3class.distsim.crf.ser.gz";
    
// Needed by Stanford NER Implementation
SequenceClassifierProvider classifierProvider 
  = new ExternalSequenceClassiferProvider(classifierModelPath);

// Initialize the Tagger (Sorry, but I'm demonstrating the Stanford NER)
// Tagger at the moment.  Will update with OpenNLP ASAP.
EntityTagger entityTagger = new NerdEntityTagger(classifierProvider);

// Location of the Location Resolver Index
String locationResolverIndexPath = "/location/of/index/IndexDirectory";

// Instantiate the Location Resolver
LocationResolver locationResolver 
  = new LocationResolver(
    new File(locationResolverIndexPath), 3, 5);

// Nothing magic here, just a couple of sentences.
String text = getText();

// Tag the text
TaggedDocument taggedDocument = entityTagger.tagDocument(text);

System.out.println(String.format("%s locations found",
  taggedDocument.getLocations().size()));

// Resolve the locations from the extracted locations
List<ResolvedLocation> resolvedLocations = 
    locationResolver.resolveLocations(taggedDocument.getLocations());

for(ResolvedLocation resolvedLocation : resolvedLocations){
  
  System.out.println(
    String.format("%s (%s, %s)", 
      resolvedLocation.matchedName, 
      resolvedLocation.geoname.latitude, 
      resolvedLocation.geoname.longitude));
}

The getText() method simply returns the following string:


I visited the Sears Tower in Chicago only to find out there were exciting attractions in Springfield.  After Springfield, Chuck and I drove east through Indiana to West Virginia, stopping in Harper's Ferry.  We finally made it to our destination in Washington, DC on Tuesday.


And we get the following results on the console:


7 locations found
Chicago (41.85003, -87.65005)
Springfield (39.80172, -89.64371)
Springfield (39.80172, -89.64371)
Indiana (40.00032, -86.25027)
West Virginia (38.50038, -80.50009)
Washington (38.89511, -77.03637)
DC (38.91706, -77.00025)



More Information


If you want to know more about CLAVIN, Charlie will be speaking at GEOINT 2012 in Orlando, FL  (October 8-11) and hopefully at Strata Santa Clara next year (YouTube proposal below):



If you have any other questions, just leave me a comment.  

Batch Processing Movies with Ruby and the HandBrake Command Line Interface

A project I was recently working on required a lot of video transformation work, of which, 98% could be automated with the right tools.  More importantly, I was constantly tweaking video settings (aspect ratio, quality), which required me to reprocess the whole batch of videos.  After a serious case of carpal tunnel from repetitively clicking the same commands in HandBrake, I decided I needed to find a better strategy.

Back in the day, I used to be pretty savvy with encoding tools, but ever since I discovered HandBrake (http://handbrake.fr/), I've never really had the need to keep up.  After looking at FFMPEG and a couple of other CLI tools, I discovered HandBrake had a CLI (http://handbrake.fr/downloads2.php) that was, like the GUI, ridiculously easy to use.

For example, my task included shrinking a movie to 560x416 pixels and stripping the audio:
HandBrakeCLI -i input.mp4 -o output.mp4 -e x264 -2 -O -q 20 \
  -a none -w 560 -l 416 --modulus 16 --loose-anamorphic
This is a breakdown of the task's parameters:
  • -i = input file
  • -o = output file
  • -e = encoder
  • -2 = two pass encoding
  • -O = optimize for HTTP streaming
  • -q = quality
  • -a = audio
  • -w = width
  • -l = height
  • --modulus = number the output dimensions must divide evenly by
  • --loose-anamorphic = ensures dimensions are resized cleanly by the modulus
The CLI is pretty simple, and the documentation is uncharacteristically (for open source)  complete.  To see a more complete list of arguments for the CLI, please see:  https://trac.handbrake.fr/wiki/CLIGuide.

Now all we need to do is throw in some automation.  This could be done in a number of languages (BASH, Python, Perl, NodeJS), but for tasks like this, I prefer Ruby.

The following is an easy little script for looping over all of the mp4's in a directory and executing a HandBrake encoding (in my case two):
def get_base_name(video)

  video[0, video.index(".mp4")]
end

Dir.glob("*.mp4") do |input_video|

  base_name = get_base_name input_video

  output_video_med = "../#{base_name}_560x416.mp4"

  output_video_sml = "../#{base_name}_224x160.mp4"

  settings = "-e x264 -2 -O -q 20 -a none"

  picture_med = "-w 560 -l 416 --modulus 16 --loose-anamorphic"

  picture_sml = "-w 224 -l 160 --modulus 16 --loose-anamorphic"

  command_med = "HandBrakeCLI -i #{input_video} -o #{output_video_med} #{settings} #{picture_med}"

  command_sml = "HandBrakeCLI -i #{input_video} -o #{output_video_sml} #{settings} #{picture_sml}"

  puts `#{command_med}`

  puts `#{command_sml}`

end
If you aren't familiar with Ruby, a statement encapsulated with back-ticks (`statement`) will be executed on the command line.

As you can see, the process is pretty simple and is likely to save you a "boat load" of time if you have to perform repetitive video processing tasks.  I can also see this process being integrated into a solution that involves automatically transcoding videos for a rich media site that allows users to upload videos.
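
If you would rather drive HandBrakeCLI from Java, the same loop can be sketched with ProcessBuilder.  One thing this version does differently: it passes the arguments as a list instead of a single interpolated string, so file names containing spaces are not split by a shell.  The class and method names are mine, and the sketch only prints the commands unless you pass "--run" (an argument I made up for this example).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: builds the same HandBrakeCLI invocations as the Ruby script.
class HandBrakeCommandSketch {

    // Assemble the argument list for one encode (same flags as the Ruby script above).
    static List<String> buildCommand(String input, String output, int width, int height) {
        return new ArrayList<>(Arrays.asList(
                "HandBrakeCLI", "-i", input, "-o", output,
                "-e", "x264", "-2", "-O", "-q", "20", "-a", "none",
                "-w", String.valueOf(width), "-l", String.valueOf(height),
                "--modulus", "16", "--loose-anamorphic"));
    }

    // Strip the extension from a file name ("clip.mp4" becomes "clip").
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    public static void main(String[] args) throws Exception {
        // Dry run by default; pass --run to actually invoke HandBrakeCLI.
        boolean run = args.length > 0 && "--run".equals(args[0]);

        File[] videos = new File(".").listFiles((dir, name) -> name.endsWith(".mp4"));
        if (videos == null) {
            return;
        }

        for (File video : videos) {
            String base = baseName(video.getName());
            List<List<String>> commands = Arrays.asList(
                    buildCommand(video.getPath(), "../" + base + "_560x416.mp4", 560, 416),
                    buildCommand(video.getPath(), "../" + base + "_224x160.mp4", 224, 160));

            for (List<String> command : commands) {
                System.out.println(String.join(" ", command));
                if (run) {
                    // inheritIO() streams HandBrake's progress to this console.
                    new ProcessBuilder(command).inheritIO().start().waitFor();
                }
            }
        }
    }
}
```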


Monday, September 24, 2012

Rules Engine Patterns - Part 4: Result Compilation

Instead of directly reacting to rule conditions, or directly mutating the model, in this pattern you will record and collect outcomes of rule conditions.  When the rule session is complete, the application will collect the results and do something with them (e.g.: save them to a database or forward them to different services).

Benefits:
  • Keep a history of the rule outcomes without needing to mutate the model objects inserted into the session.
Disadvantages:
  • A lot more scaffolding to employ.  You will need to create a model around the "plausible outcomes" for your rules, as well as the scaffolding to collect those outcomes and react to the results.
Example:  Rich's Parcel and Post Service needs to route packages based on a number of rules governed by a package's weight and distance from the company's collection center.  If the package is less than 200 lbs. and within 500 miles of the collection center, the package will be delivered locally (dropped in a mail box!).  If the package is less than 200 lbs. but more than 500 miles away, it will be delivered via Air Mail.  Finally, if the package is 200 lbs. or more, it will be delivered by train.

The implementation of the routing system will be performed using decorators, which will allow us to easily wrap the original parcel with new functionality (although, we will really only use it to distinguish the class type).

Parcel.java

A simple interface for Parcels in our package handling system.
package com.berico.rc;

public interface Parcel {

 public abstract double getWeight();

 public abstract int getDestinationZipCode();

}

BaseParcel.java

A simple implementation of the Parcel interface.  This class will be used initially for all packages prior to some routing determination being performed by the rules engine.
package com.berico.rc;

public class BaseParcel implements Parcel {

 private double weight = -1;
 
 private int destinationZipCode = -1;

 public BaseParcel(double weight, int destinationZipCode) {
  this.weight = weight;
  this.destinationZipCode = destinationZipCode;
 }

 @Override
 public double getWeight() {
  return weight;
 }

 @Override
 public int getDestinationZipCode() {
  return destinationZipCode;
 }
}

LocalDeliveryParcel.java

A decorator for Parcel, this class represents a package delivered via post office or some other local carrier.
package com.berico.rc;

public class LocalDeliveryParcel implements Parcel {

 private Parcel originalParcel = null;

 public LocalDeliveryParcel(Parcel originalParcel) {

  this.originalParcel = originalParcel;
 }

 public Parcel getOriginalParcel() {
  return originalParcel;
 }

 @Override
 public double getWeight() {
  
  return originalParcel.getWeight();
 }

 @Override
 public int getDestinationZipCode() {
  
  return originalParcel.getDestinationZipCode();
 }
 
 public void routeToPostOffice(){
  
  System.out.println(
   "Dropping package off in a mail box.");
 }
}


TrainParcel.java

A decorator for Parcel, this class represents a package delivered via freight train.
package com.berico.rc;

public class TrainParcel implements Parcel {
 
 private Parcel originalParcel = null;

 public TrainParcel(Parcel originalParcel) {

  this.originalParcel = originalParcel;
 }

 public Parcel getOriginalParcel() {
  return originalParcel;
 }

 @Override
 public double getWeight() {
  
  return originalParcel.getWeight();
 }

 @Override
 public int getDestinationZipCode() {
  
  return originalParcel.getDestinationZipCode();
 }
 
 public void routeToFreightCar(){
  
  System.out.println(
   "Giving package to hobo, destination: Akron, OH.");
 }
}


AirMailParcel.java

A decorator for Parcel, this class represents a package delivered via cargo plane.

package com.berico.rc;

public class AirMailParcel implements Parcel {

 private Parcel originalParcel = null;

 public AirMailParcel(Parcel originalParcel) {

  this.originalParcel = originalParcel;
 }

 public Parcel getOriginalParcel() {
  return originalParcel;
 }

 @Override
 public double getWeight() {
  
  return originalParcel.getWeight();
 }

 @Override
 public int getDestinationZipCode() {
  
  return originalParcel.getDestinationZipCode();
 }
 
 public void routeToPlane(){
  
  System.out.println(
   "Attaching package to underside of biplane.");
 }
}



ResultCompilation.drl

The rules that govern the package routing system.  You will notice the use of a "function" defined within the ruleset to calculate the distance between the collection center and a zip code.  This is, of course, a pretend calculation.

package com.berico.rc

// If only it were this simple.
function double distance(int zipcode){
 return zipcode * 0.01;
}

rule "Local Delivery routing"

  when
    parcel : BaseParcel( 
     weight < 200.0, distance(destinationZipCode) < 500) 
  then
    insert( new LocalDeliveryParcel(parcel) );
end

rule "Train routing"

  when
    parcel : BaseParcel( weight >= 200.0) 
  then
    insert( new TrainParcel(parcel) );
end

rule "Air Mail routing"

  when
    parcel : BaseParcel( 
     weight < 200.0, distance(destinationZipCode) > 500) 
  then
    insert( new AirMailParcel(parcel) );
end

ResultCompilationApp.java

The application that drives the rule session.  It inserts a handful of BaseParcel objects, fires the rules, and then pulls the decorated parcels back out of the session by class type to print them.

package com.berico.rc;

import java.util.Collection;

import org.drools.runtime.ObjectFilter;

import com.berico.BaseApp;

public class ResultCompilationApp extends BaseApp {

 @Override
 protected String getRuleFile() {
  
  return "ResultCompilation.drl";
 }

 
 public ResultCompilationApp() {
  super();
  
  // Create a bunch of parcels that need
  // to be routed.
  Parcel[] parcels = new Parcel[]{
   new BaseParcel(100, 90210),
   new BaseParcel(200, 90210),
   new BaseParcel(100, 20110),
   new BaseParcel(500, 87234),
   new BaseParcel(1000, 51234)
  };
  
  // Iterate over the parcels...
  for(Parcel parcel : parcels){
   
   // Inserting each parcel into the
   // rule session.
   getSession().insert(parcel);
  }
  
  // Apply all rules against the parcels
  getSession().fireAllRules();
  
  System.out.println("LOCAL DELIVERY PARCELS........");
  
  // Print the parcels that require Local Delivery
  printParcelInfo(LocalDeliveryParcel.class);
  
  System.out.println("TRAIN PARCELS........");
  
  // Print the parcels that require Train delivery
  printParcelInfo(TrainParcel.class);
  
  System.out.println("AIR MAIL PARCELS........");
  
  // Print the parcels that require Air Mail
  printParcelInfo(AirMailParcel.class);
  
  // Kill the session
  getSession().dispose();
 }
 
 /**
  * Simple predicate to get all objects that match the
  * supplied object type.
  */
 public class ByClassTypeFilter implements ObjectFilter {

  private Class<?> targetClass = null;
  
  public ByClassTypeFilter(Class<?> targetClass){
   this.targetClass = targetClass;
  }
  
  @Override
  public boolean accept(Object object) {
   return object.getClass().equals(targetClass);
  }
 }
 
 /**
  * Get the objects of the particular parcel type from
  * the "Working Memory" of Drools and print their weight
  * and destination zip code to the console.
  * @param parcelClass Type of Parcel to retrieve
  */
 protected void printParcelInfo(Class<? extends Parcel> parcelClass){
  
  // Pull the objects from the rule session,
  // by using a custom predicate that looks
  // for a specific Parcel implementation type.
  Collection<Object> oParcels 
   = getSession().getObjects(
    new ByClassTypeFilter(parcelClass));
 
  // Iterate over the matching objects...
  for(Object oParcel : oParcels){
   
   // Cast the object to the interface.
   Parcel parcel = (Parcel)oParcel;
   
   // Print the parcel information
   System.out.println(
    String.format("Weight: %s, Zip Code: %s", 
     parcel.getWeight(), 
     parcel.getDestinationZipCode()));
  }
 }
 
 public static void main(String[] args){
  
  new ResultCompilationApp();
 }
}

On the console, you should see the following message:
LOCAL DELIVERY PARCELS........
Weight: 100.0, Zip Code: 20110
TRAIN PARCELS........
Weight: 500.0, Zip Code: 87234
Weight: 1000.0, Zip Code: 51234
Weight: 200.0, Zip Code: 90210
AIR MAIL PARCELS........
Weight: 100.0, Zip Code: 90210
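
If you want to check why each parcel lands where it does, the three rule conditions can be replayed in plain Java without Drools.  This is only a sanity-check sketch: the class and enum names are hypothetical, and the order of evaluation here is mine (a rules engine does not evaluate rules top to bottom like an if-chain).

```java
// Hypothetical plain-Java replay of the three routing conditions; no Drools involved.
class RoutingReplaySketch {

    enum Route { LOCAL_DELIVERY, TRAIN, AIR_MAIL, UNROUTED }

    // Same pretend distance calculation as the rule file.
    static double distance(int zipCode) {
        return zipCode * 0.01;
    }

    // Mirrors the conditions of the three rules.
    static Route route(double weight, int destinationZipCode) {
        if (weight >= 200.0) {
            return Route.TRAIN;
        }
        double miles = distance(destinationZipCode);
        if (miles < 500) {
            return Route.LOCAL_DELIVERY;
        }
        if (miles > 500) {
            return Route.AIR_MAIL;
        }
        // A light parcel at exactly 500 miles matches none of the conditions.
        return Route.UNROUTED;
    }

    public static void main(String[] args) {
        // The same five parcels the application inserts into the session.
        double[][] parcels = {
                {100, 90210}, {200, 90210}, {100, 20110}, {500, 87234}, {1000, 51234}
        };
        for (double[] parcel : parcels) {
            int zip = (int) parcel[1];
            System.out.println(String.format("Weight: %s, Zip Code: %s -> %s",
                    parcel[0], zip, route(parcel[0], zip)));
        }
    }
}
```

One thing the replay makes visible: with these conditions, a parcel under 200 lbs. at exactly 500 miles would not be decorated by any rule, so it would show up in none of the three printed groups.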

Rules Engine Patterns - Part 3: Request Adjudication

In this pattern, the Rules Engine will mutate (adjudicate) model objects (requests) supplying the outcome of rules evaluated during the session.

Benefits:
  • Decisions made by rules engine are directly made on the model inserted into the rules session.
  • Knowledge of what to do based on the results (perhaps an infrastructural issue) can be handled outside of the rules engine.
Disadvantages:
  • There is a direct mutation of the model, which may not be palatable.
  • Blends request/adjudication concepts with the model, which may be a concern outside of the domain.
Example:  The state of Virginia has decided to automate Traffic Court using a set of empirically proven rules to evaluate the guilt of traffic violators.  In this example, SpeedingTicketCases will be adjudicated by the rules engine.  If the rule sets find the defendant guilty, it will simply set a guilty verdict (boolean "isGuilty") on the SpeedingTicketCase (using the "adjudicate" method).

SpeedingTicketCase.java

Represents the case being evaluated by the rules engine (not the speeding ticket itself).  If the defendant is guilty, it will be marked on the case itself.

package com.berico.ra;

public class SpeedingTicketCase {

 private boolean isGuilty;
 
 private String vehicleMake = null;
 
 private int mphOverSpeedLimit = -1;

 public SpeedingTicketCase( 
   String vehicleMake,
   int mphOverSpeedLimit) {

  this.vehicleMake = vehicleMake;
  this.mphOverSpeedLimit = mphOverSpeedLimit;
 }

 public boolean isGuilty() {
  return isGuilty;
 }
 
 public void adjudicate(boolean isGuilty){
  this.isGuilty = isGuilty;
 }

 public String getVehicleMake() {
  return vehicleMake;
 }

 public int getMphOverSpeedLimit() {
  return mphOverSpeedLimit;
 }
}

RequestAdjudication.drl

This is the rule set that will be used to determine whether a defendant is guilty.  In our case, regardless of how fast the car was going over the speed limit, Audis will always be guilty.  No one will believe, however, that a Honda can go more than 30 mph past the speed limit.  In such cases, the defendant will be cleared of the charge.

package com.berico.ra

rule "Automatically guilty if Vehicle is Audi"
  when
    speedingTicketCase : SpeedingTicketCase( vehicleMake == "Audi" )
        
  then
    speedingTicketCase.adjudicate(true);
end

rule "If 30 mph over speed limit and Honda, impossible!"
  when
    speedingTicketCase : SpeedingTicketCase( 
      vehicleMake == "Honda", mphOverSpeedLimit > 30 )
         
  then
    speedingTicketCase.adjudicate(false);
end

RequestAdjudicationApp.java

This application will drive the adjudication session.  We begin by creating some cases and inserting them into the rule session.  After calling "fireAllRules", we will evaluate the result of each case by examining the model objects directly.

package com.berico.ra;

import com.berico.BaseApp;

public class RequestAdjudicationApp extends BaseApp {

 @Override
 protected String getRuleFile() {
  
  return "RequestAdjudication.drl";
 }

 public RequestAdjudicationApp(){
  super();
  
  // Create a case that should be deemed "guilty".
  SpeedingTicketCase maryCase = new SpeedingTicketCase("Audi", 1);
  
  // Insert object into session.
  getSession().insert(maryCase);
  
  // Create a case that should be deemed "innocent".
  SpeedingTicketCase richardCase = new SpeedingTicketCase("Honda", 35);
  
  // Insert object into session.
  getSession().insert(richardCase);
  
  // Evaluate the rules against our objects.
  getSession().fireAllRules();
  
  // Dispose the rule session.
  getSession().dispose();
  
  // Print the results of the verdict for Mary
  System.out.println(
   String.format("Is Mary guilty?: %s", maryCase.isGuilty()));
  
  // Print the results of the verdict for Richard
  System.out.println(
   String.format("Is Richard guilty?: %s", richardCase.isGuilty()));
 }
 
 public static void main(String[] args){
  
  new RequestAdjudicationApp();
 }
}

On the console, you should see the following message:
Is Mary guilty?: true
Is Richard guilty?: false
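
One consequence of adjudicating with a boolean is worth spelling out: a case that matches no rule (say, a Honda going 10 mph over) keeps the default value of false, which looks exactly like an acquittal.  Here is a plain-Java sketch of one way to make "not yet adjudicated" visible.  The Verdict enum and the class name are hypothetical and are not part of the example above.

```java
// Hypothetical sketch: a three-state verdict instead of a boolean flag.
class AdjudicationDefaultSketch {

    enum Verdict { PENDING, GUILTY, NOT_GUILTY }

    // Plain-Java stand-in for the two rules in RequestAdjudication.drl.
    static Verdict adjudicate(String vehicleMake, int mphOverSpeedLimit) {
        Verdict verdict = Verdict.PENDING;

        // "Automatically guilty if Vehicle is Audi"
        if ("Audi".equals(vehicleMake)) {
            verdict = Verdict.GUILTY;
        }

        // "If 30 mph over speed limit and Honda, impossible!"
        if ("Honda".equals(vehicleMake) && mphOverSpeedLimit > 30) {
            verdict = Verdict.NOT_GUILTY;
        }

        return verdict;
    }

    public static void main(String[] args) {
        System.out.println("Audi, 1 over:   " + adjudicate("Audi", 1));
        System.out.println("Honda, 35 over: " + adjudicate("Honda", 35));
        System.out.println("Honda, 10 over: " + adjudicate("Honda", 10));
    }
}
```

Whether the extra state is worth it depends on whether your application needs to tell "cleared" apart from "no rule applied".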

Rules Engine Patterns - Part 2: Direct-Action


In this pattern, the Rules Engine will react to the state of objects by directly acting on those objects utilizing services injected into the rules context.

Benefits:
  • Very little scaffolding necessary to implement pattern.

Disadvantages:
  • Does not record which rules fired and why.
  • May not scale well unless the underlying service reacting to objects is a proxy to some remote service (e.g. a web service, message bus).

Example:  Bill Lumbergh, VP of Initech, needs to know when one of his engineers has failed to submit a coversheet with his or her TPS Reports.  He has requested the automated system send him an email when this event occurs.

Our model will consist of a TPSReport object and an EmailService (interface).  We will simply print the email to the console when the rule fires.

TPSReport.java

Simple POJO we will use to model the TPS report.

package com.berico.da;

public class TPSReport {

 protected boolean hasCoverSheet = false;
 
 protected String author = null;

 public TPSReport(
  boolean hasCoverSheet, String author) {

  this.hasCoverSheet = hasCoverSheet;
  this.author = author;
 }

 public boolean isHasCoverSheet() {
  return hasCoverSheet;
 }

 public String getAuthor() {
  return author;
 }
}

EmailService.java

Service interface describing the functionality of our email service.  This is what we will reference inside the rule file (instead of a concrete implementation!).


package com.berico.da;

public interface EmailService {

 void email(
  String to, String from, String subject, String message);
}

EmailPrinter.java

Instead of sending out an email, we'll print the email we were going to send to the console.


package com.berico.da;

public class EmailPrinter implements EmailService {

 @Override
 public void email(
   String to, String from, String subject, String message) {
  
  System.out.println(
   String.format(
    "To: %s \nFrom: %s \nSubject: %s\n----------------------\n%s", 
    to, from, subject, message));
 }
}

DirectAction.drl

This is a Drools Rule Language file describing the actions to take when objects appear in the knowledge session.  In our case, if we see a TPSReport with no coversheet, we email Bill.


package com.berico.da

global EmailService emailService;

rule "Email Bill Lumbergh when no coversheet"
 when
  tpsReport : TPSReport( hasCoverSheet == false )
 
 then 
   emailService.email(
    "bill@initech.com", 
    "rules@initech.com",
    "No Coversheet!!!!!",
    tpsReport.getAuthor() + 
    " failed to supply a coversheet!");
end

DirectActionApp.java

This is the application that will drive the session.  You will notice we instantiate the EmailService and register it in the knowledge session.  We then create some TPSReport objects and "insert" them into the session.  Finally, we call "fireAllRules" on the session to have the rules evaluated.


package com.berico.da;

import com.berico.BaseApp;

public class DirectActionApp extends BaseApp {

 @Override
 protected String getRuleFile() {
  
  return "DirectAction.drl";
 }

 public DirectActionApp(){
  super();
  
  // Instantiate the email service.
  EmailService emailService = new EmailPrinter();
  
  // Register the service on the session as a global.
  getSession().setGlobal("emailService", emailService);
  
  // Create a report that should not fire rule.
  TPSReport michaelTpsReport 
   = new TPSReport(true, "Michael Bolton");
  
  // Insert object into session.
  getSession().insert(michaelTpsReport);
  
  // Create a report that should fire rule.
  TPSReport peterTpsReport 
   = new TPSReport(false, "Peter Gibbons");
  
  // Insert object into session.
  getSession().insert(peterTpsReport);
  
  // Evaluate the rules against our objects.
  getSession().fireAllRules();
  
  // Dispose the rule session.
  getSession().dispose();
 }
 
 public static void main(String[] args){
  
  new DirectActionApp();
 }
}

On the console, you should see the following message:
To: bill@initech.com 
From: rules@initech.com 
Subject: No Coversheet!!!!!
----------------------
Peter Gibbons failed to supply a coversheet!
There's nothing else to do in this pattern, since the service registered with the Rules Engine directly handles the event.
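
Because the rule file only references the EmailService interface, you can swap in an implementation that records what it was asked to send.  That is handy for tests, and it partially offsets the "does not record" disadvantage listed above.  The sketch below is plain Java with no Drools: RecordingEmailService and the checkReport method (a stand-in for the rule) are hypothetical names, and the interface is repeated only so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Repeated from the post so this sketch is self-contained.
interface EmailService {
    void email(String to, String from, String subject, String message);
}

// Hypothetical implementation that keeps emails in memory instead of sending them.
class RecordingEmailService implements EmailService {

    private final List<String> sent = new ArrayList<>();

    @Override
    public void email(String to, String from, String subject, String message) {
        sent.add(String.format("%s | %s | %s", to, subject, message));
    }

    List<String> getSent() {
        return sent;
    }
}

class DirectActionRecordingSketch {

    // Plain-Java stand-in for the "Email Bill Lumbergh when no coversheet" rule.
    static void checkReport(boolean hasCoverSheet, String author, EmailService emailService) {
        if (!hasCoverSheet) {
            emailService.email(
                    "bill@initech.com",
                    "rules@initech.com",
                    "No Coversheet!!!!!",
                    author + " failed to supply a coversheet!");
        }
    }

    public static void main(String[] args) {
        RecordingEmailService recorder = new RecordingEmailService();

        checkReport(true, "Michael Bolton", recorder);
        checkReport(false, "Peter Gibbons", recorder);

        // Inspect what would have been sent.
        for (String email : recorder.getSent()) {
            System.out.println(email);
        }
    }
}
```

In the real application you would register the recorder with setGlobal("emailService", ...) exactly as DirectActionApp registers EmailPrinter.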

Rules Engine Patterns - Part 1: How to employ a Rules Engine

While many engineers may understand (at least conceptually) what a Rules Engine is, I feel like Rules Engines are used so infrequently because people don't know how to employ them.  In the next few posts, I'm going to discuss design patterns for using a Rules Engine, as well as deployment strategies for integrating the technology into your architecture.

In this post, I want to lay the groundwork for setting up your Java development environment for using a Rules Engine.  I will also introduce some basic features of the Drools API.  We will conclude with a brief discussion about the strategies (patterns) you will use for tying your business model to the rules engine.


General Development Environment


To effectively use the JBoss Rules framework, you should at least have the following tools installed in your Java Development environment.

Setting up a Drools Project


1.  Add the dependencies to JBoss Rules in the Maven pom.xml.
<dependency>
 <groupId>org.drools</groupId>
 <artifactId>drools-core</artifactId>
 <version>5.5.0-SNAPSHOT</version>
</dependency>
<dependency>
 <groupId>org.drools</groupId>
 <artifactId>drools-compiler</artifactId>
 <version>5.5.0-SNAPSHOT</version>
</dependency>


2.  Instantiate a Rules Session (provided a rule file). This class will serve as my base class that the other demonstrations will inherit from to initialize their rule sessions ("knowledge sessions" in Drools terminology).

This implementation will limit you to using one rule file, but abstracts the common scaffolding we will rely on in examples in the next few posts.  There are much better ways to set up a Drools environment, like using Guvnor for a dynamic rules repository and wiring up the session using Spring.  We will discuss some of this in the final (5th) post on the topic.

BaseApp.java
package com.berico;

import org.drools.KnowledgeBase;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.Resource;
import org.drools.io.ResourceFactory;
import org.drools.logger.KnowledgeRuntimeLoggerFactory;
import org.drools.runtime.StatefulKnowledgeSession;

/**
 * A simple class that will initialize a Drools
 * knowledge session for deriving classes.  This
 * class forces deriving classes to supply a rule
 * file as a requirement of building the session.
 * 
 * @author Richard Clayton (Berico Technologies)
 */
public abstract class BaseApp 
{
 /**
  * A preinitialized knowledge session derived
  * classes will use to perform their tasks.
  */
 private StatefulKnowledgeSession session = null;
 
 /**
  * Force derived classes to use a getter just in 
  * case we decide to do something special on access
  * in the future.
  * @return initialized knowledge session.
  */
 protected StatefulKnowledgeSession getSession(){
  return this.session;
 }
 
 /**
  * Deriving classes must supply a rule file
  * that will be used by the knowledge builder.
  * @return Name of the rule file
  */
 protected abstract String getRuleFile();

 /**
  * Instantiate the application, initializing
  * the knowledge session.
  */
 public BaseApp(){
  
  initializeSession();
 }
 
 /**
  * Initialize the Rules Engine Session.
  */
 private void initializeSession(){
  
  // The knowledge builder is used to compile rule and
  // workflow (BPM) resources into executable code.
  // If types are declared in the rule file, they
  // will also be compiled as Java classes.
  KnowledgeBuilder kbuilder = 
    KnowledgeBuilderFactory.newKnowledgeBuilder();
  
  // Get the rule file supplied by deriving classes.
  // This file will be pulled from the class path.
  Resource ruleFile = ResourceFactory.newClassPathResource(
    this.getRuleFile());
  
  // Add the rule file to the knowledge builder.
  kbuilder.add(ruleFile, ResourceType.DRL);
  
  // Initialize a knowledge base from the knowledge builder.
  // The knowledge base is a container for the known logic 
  // of the rules engine.
  KnowledgeBase knowledgeBase = kbuilder.newKnowledgeBase();
  
  // Initialize a rules/workflow session from the knowledge
  // base.  This is the construct we will use to insert
  // "facts" in the rules engine, apply/evaluate
  // rules, and then react to the results.
  this.session = knowledgeBase.newStatefulKnowledgeSession();
  
  // Log actions occurring in the session to the console.
  KnowledgeRuntimeLoggerFactory.newConsoleLogger(session);
  
  // Log actions occurring in the session to a file.
  KnowledgeRuntimeLoggerFactory.newFileLogger(session, 
    String.format(
     "rule_session_%s.xml", 
     System.currentTimeMillis()));
 }
}

Using the Scaffold

To use the scaffolding we wrote above, simply extend the class, providing the name of a rules file located somewhere on your classpath.

A rules file is simply a text file with rules defined in the Drools Expert syntax.  You can find the Drools Expert syntax documentation here: http://docs.jboss.org/drools/release/5.4.0.Final/drools-expert-docs/html_single/.

Perhaps the easiest way to create the rules file is to use the Drools Eclipse Plugin mentioned above.  The plugin provides syntax highlighting and a diagram of the Rete tree for rule files.



I use the standard Maven project structure, storing rule files in the "src/main/resources" directory.  If you do the same, the scaffold should be able to find the file by its filename alone.
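If the scaffold cannot find your rule file, it can help to check what the classpath actually contains before involving Drools at all.  The sketch below is a hypothetical helper (RuleFileCheck is my name, not part of the scaffold or of Drools) that uses only the standard Java class loader.  It assumes the rule file name is resolved relative to the classpath root, which is what the comment in BaseApp suggests.

```java
import java.net.URL;

// Hypothetical helper, not part of the original scaffold: reports whether
// a rule file name can be resolved from the root of the classpath.
class RuleFileCheck {

    // Returns the URL of the resource, or null when it is not on the classpath.
    static URL locate(String ruleFile) {
        return RuleFileCheck.class.getClassLoader().getResource(ruleFile);
    }

    public static void main(String[] args) {
        // Default to the rule file name used in the example in this post.
        String ruleFile = args.length > 0 ? args[0] : "ScaffoldExample.drl";
        URL location = locate(ruleFile);
        if (location == null) {
            System.out.println(ruleFile + " was NOT found on the classpath");
        } else {
            System.out.println(ruleFile + " resolved to " + location);
        }
    }
}
```

With the Maven layout, anything under "src/main/resources" is copied to the root of the build output, so a file stored directly in that directory should resolve by its bare filename.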

Example Using the Scaffold


To demonstrate how this works, we need a simple POJO that notionally represents our model.  In this case, I'm going to use the obligatory "User" object.

User.java

package com.berico.scaffold;

public class User {

 private String firstName;
 private String lastName;
 private int age;
 
 public User(
  String firstName, String lastName, int age) {
  
  this.firstName = firstName;
  this.lastName = lastName;
  this.age = age;
 }

 public String getFirstName() {
  return firstName;
 }

 public String getLastName() {
  return lastName;
 }

 public int getAge() {
  return age;
 }
}


ScaffoldExample.drl
package com.berico.scaffold

rule "User seen"
  when
    user: User()
  then
    System.out.println("Hello " + user.getFirstName());
end

ScaffoldExampleApp.java
package com.berico.scaffold;

import com.berico.BaseApp;

public class ScaffoldExampleApp extends BaseApp {

 @Override
 protected String getRuleFile() {
  
  return "ScaffoldExample.drl";
 }

 public ScaffoldExampleApp() {
  super();
  
  User richard = new User("Richard", "Clayton", 31);
  
  getSession().insert(richard);
  
  getSession().fireAllRules();
  
  getSession().dispose();
 }
 
 public static void main(String[] args){
  
  new ScaffoldExampleApp();
 }
}

Understanding the Drools API


The extremely simplistic example above demonstrates the general flow of how the Drools engine is used.  Developers will instantiate or collect (say from a queue, or web service) model objects and insert those objects into a Drools Knowledge Session.  Once the objects have been inserted into the session, you instruct the engine to fireAllRules, which evaluates its ruleset against them.
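To make the insert-then-fire sequence concrete without the Drools dependency, here is a toy stand-in for a session.  This is emphatically not the Drools API: ToySession and its Rule class are hypothetical names, there is no Rete network, and a rule is just a condition paired with a consequence.  The only point is the order of operations, where inserting stores a fact and nothing is evaluated until you fire the rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Toy stand-in for a knowledge session. This is NOT the Drools API; it only
// mimics the insert -> fireAllRules -> dispose sequence for illustration.
class ToySession {

    // A rule pairs a condition (the "when") with a consequence (the "then").
    static final class Rule {
        final String name;
        final Predicate<Object> when;
        final Consumer<Object> then;

        Rule(String name, Predicate<Object> when, Consumer<Object> then) {
            this.name = name;
            this.when = when;
            this.then = then;
        }
    }

    private final List<Rule> rules = new ArrayList<Rule>();
    private final List<Object> facts = new ArrayList<Object>();

    void addRule(Rule rule) {
        rules.add(rule);
    }

    // Inserting a fact only stores it; nothing is evaluated yet.
    void insert(Object fact) {
        facts.add(fact);
    }

    // Evaluate every rule against every inserted fact and run the
    // consequence of each match. Returns the number of matches fired.
    int fireAllRules() {
        int fired = 0;
        for (Rule rule : rules) {
            for (Object fact : facts) {
                if (rule.when.test(fact)) {
                    rule.then.accept(fact);
                    fired++;
                }
            }
        }
        return fired;
    }

    // Release the stored facts once the session is no longer needed.
    void dispose() {
        facts.clear();
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        session.addRule(new Rule("Name seen",
            fact -> fact instanceof String,
            fact -> System.out.println("Hello " + fact)));

        session.insert("Richard");  // stored, no output yet
        session.insert(31);         // not a String, so the rule ignores it
        int fired = session.fireAllRules();
        System.out.println(fired + " rule(s) fired");
        session.dispose();
    }
}
```

The real engine does far more than this nested loop (pattern matching across many facts, conflict resolution, and so on), but the calling code follows the same three steps you saw in ScaffoldExampleApp.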

In the demonstration, we simply print to the console when one of our rules successfully evaluates.  Clearly, we need to be able to do more.  So how would you react to a rule being evaluated, especially outside of the context of the rules engine (that is, how do you get the results out)?

Patterns for Rule Engine Employment


I have found that you typically use one of three patterns when solving a problem with a Rules Engine.  Each pattern has its own benefits and disadvantages, and I will emphasize that no one pattern is generally considered better than the others.  More importantly, you might mix and match these patterns depending on the problem you are trying to solve.

Each pattern is distinguished by how it handles the "end state" of a rules session.  On one extreme, you can choose to not keep any state and simply react to conditions that occur when rules are met (e.g.: email admin on error).  On the other extreme, you can choose to record which rule conditions are met and react to the results of those conditions outside of the rules engine (e.g.: validation scenario).
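The difference between those two extremes is easier to see in code.  The sketch below uses plain Java with no rules engine involved; the class, the method names, and the specific checks (a negative age, a missing first name) are all hypothetical and chosen only to mirror the "email admin" and "validation" examples.  Treat it as an illustration of the idea, not as the way any of the patterns is implemented in Drools.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java illustration of the two extremes; no rules engine involved.
// The checks and all names here are hypothetical.
class EndStateExtremes {

    // Extreme 1: keep no state. When the condition is met, the reaction
    // (for example, emailing an admin) happens on the spot.
    static void reactImmediately(int age, Consumer<String> emailAdmin) {
        if (age < 0) {
            emailAdmin.accept("negative age " + age);
        }
    }

    // Extreme 2: record which conditions were met and hand them back,
    // leaving the reaction to code outside the "rules".
    static List<String> recordMatches(String firstName, int age) {
        List<String> matched = new ArrayList<String>();
        if (firstName == null || firstName.isEmpty()) {
            matched.add("first name missing");
        }
        if (age < 0) {
            matched.add("age negative");
        }
        return matched;
    }

    public static void main(String[] args) {
        // Extreme 1: the reaction is supplied up front and runs inside the check.
        reactImmediately(-1,
            message -> System.out.println("EMAIL ADMIN: " + message));

        // Extreme 2: the caller inspects what matched and decides what to do.
        List<String> matched = recordMatches("", -1);
        if (matched.isEmpty()) {
            System.out.println("User accepted");
        } else {
            System.out.println("User rejected: " + matched);
        }
    }
}
```

In the first method nothing survives the call, so the caller never learns what happened.  In the second, the list of matched conditions is the "end state" the caller works with.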

I will detail the following three patterns in the next few posts:
  • Direct-Action
  • Request Adjudication
  • Result Compilation

Conclusion


This admittedly is not a very exciting post, but it should serve as the basis for understanding the following posts.  In the next post (which I wrote at the same time as this post!), we will discuss the Direct-Action pattern for utilizing a Rules Engine in your middle tier.