Clojure Command Line Applications
written on 03 Feb 2012

I originally wrote this post some time in 2011, but have recently updated it to reflect current versions etc.

Creating command line applications in Clojure is easy. To illustrate this, I will create a simple application that will read a known log file format from STDIN and display a variety of information from it in a tabular form to STDOUT.

All the code for this example can be found on github.

Project Setup

I will be using Leiningen to generate the project and handle dependencies. See the github pages for installation instructions and general usage information.

Create a new project named logreader:

$ lein new logreader
$ cd !$

Using your favourite editor (emacs), edit your project.clj file to look like this:

(defproject logreader "0.0.1-SNAPSHOT"
  :description "Simple log reader"
  :dependencies [[org.clojure/clojure   "1.3.0"]
                 [org.clojure/tools.cli "0.2.1"]
                 [cheshire              "2.1.0"]
                 [clj-time              "0.3.5"]]
  :main logreader.main)

We have added dependencies on tools.cli, cheshire and clj-time as we will be needing those libraries. tools.cli provides command line argument support, cheshire is a JSON parser/generator, and clj-time is a wrapper around the Joda-Time date and time library for Java. We have also added a :main option, which tells Leiningen which namespace to use as the entry point to the application.

Next get the dependencies downloaded to your lib folder:

$ lein deps

After Maven has downloaded the internet for you, we are ready to go. Let's take a look at some example lines from our fictitious log file:

2010-12-20T14:00:00,000 [main] INFO - {"foos":1,"bars":10,"widgets":5,"whosits":3 }
2010-12-20T14:05:00,000 [main] INFO - {"foos":2,"bars":20,"widgets":10,"whosits":6 }
2010-12-20T14:10:00,000 [main] INFO - {"foos":3,"bars":30,"widgets":15,"whosits":9 }
2010-12-20T14:15:00,000 [main] INFO - {"foos":4,"bars":40,"widgets":20,"whosits":12 }
... etc

Looks like a pretty standard style of log output, but we are logging some structured data (JSON) as the messages. Good for us. Let's imagine this is a log of interesting statistics for a running server. And you’re going to have to imagine, because I gave them all stupid names.
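To make the shape of a line concrete, here is a minimal sketch that splits one log line into its timestamp and its JSON payload using only a regular expression. The helper name split-line is hypothetical and is not part of the application; it only assumes the line format shown above.

```clojure
;; Sketch only: `split-line` is a hypothetical helper, not part of logreader.
;; It captures the leading timestamp and everything from the first "{" onwards.
(defn split-line
  [line]
  (let [[_ timestamp payload]
        (re-find #"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}).*?(\{.*)$" line)]
    {:timestamp timestamp
     :payload   payload}))

(split-line "2010-12-20T14:00:00,000 [main] INFO - {\"foos\":1,\"bars\":10,\"widgets\":5,\"whosits\":3 }")
;; => {:timestamp "2010-12-20T14:00:00",
;;     :payload "{\"foos\":1,\"bars\":10,\"widgets\":5,\"whosits\":3 }"}
```

A line that does not match yields nil for both keys, so malformed lines are easy to detect.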

I used the following ruby script to generate a bunch of random log messages to play with, saved to a file called generate-log.rb:

#!/usr/bin/env ruby
require 'time'

# Start at midnight today (local time).
now = Time.now
time = Time.local(now.year, now.month, now.day)
foos, bars, widgets, whosits = 1, 10, 5, 3
# Emit one line per second for 24 hours.
(60 * 60 * 24).times do |i|
  puts "#{time.strftime("%Y-%m-%dT%H:%M:%S")} [main] INFO - {\"foos\":#{foos},\"bars\":#{bars},\"widgets\":#{widgets},\"whosits\":#{whosits} }"
  time, foos, bars, widgets, whosits = time + 1, foos + 1, bars + 10, widgets + 5, whosits + 3
end

The script generates a line for every second from midnight today to midnight tomorrow and writes them to STDOUT. You can either pre-generate a log file:

$ ruby generate-log.rb > example.log

Or just pipe the output to our Clojure application every time.

Back to the application. We want to run a command specifying which statistic(s) we want to see (between a start and end time) as a comma-delimited list, and have them written to STDOUT in a tabular format, with the date-time portion in the first column. Something like:

$ ruby generate-log.rb | some_command --statistics foos,bars --start 2012-02-05T14:00:00 --end 2012-02-05T14:20:00 
2012-02-05T14:00:00 1 10
2012-02-05T14:05:00 2 20
2012-02-05T14:10:00 3 30
2012-02-05T14:15:00 4 40
... etc

Leaving off --start and --end should default to a time range matching all of the current day. The --statistics flag should be required.
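As a rough sketch of how the comma-delimited --statistics value could be turned into lookup keys, the snippet below splits the string and converts each name to a keyword. The name parse-statistics is hypothetical, and treating a missing value as an empty list is an assumption made for this illustration.

```clojure
(require '[clojure.string :as string])

;; Sketch only: `parse-statistics` is a hypothetical helper name.
;; A nil or empty argument yields an empty sequence rather than an error.
(defn parse-statistics
  [s]
  (->> (string/split (or s "") #",")
       (remove empty?)
       (map keyword)))

(parse-statistics "foos,bars")
;; => (:foos :bars)
```

Keywords are convenient here because they can be used directly to look up values in the decoded JSON map.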

As the functionality is pretty trivial, I’m going to implement it all in one go (you will need to rename the core.clj file that Leiningen generated to main.clj):

(ns logreader.main
  (:use [clojure.tools.cli :only (cli)])
  (:require [cheshire.core   :as json]
            [clj-time.core   :as time]
            [clj-time.format :as time-format])
  (:gen-class))

(def dhms (time-format/formatters :date-hour-minute-second))

(defn parse-line
  [line]
  (let [[_ t json] (re-find #"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}).*?(\{.*)" line)]
    [(time-format/parse dhms t) (json/decode json true)]))

(defn in-range?
  [line options]
  (let [[t _]  line
        {:keys [start end]} options]
    (time/within? (time/interval start end) t)))

(defn statistics-for
  [line options]
  (let [[t json]   line
        statistics (options :statistics)]
    (apply vector (time-format/unparse dhms t) (map json statistics))))

(defn all-lines
  []
  (lazy-seq
   (when-let [line (read-line)]
     (cons line (all-lines)))))

(defn -main
  [& args]
  (let [[opts extra banner]
        (cli args
             ["-h" "--help" "Halp" :flag true :default false]
             ["--statistics" "The statistics to display" :parse-fn #(map keyword (.split (or % "") ","))]
             ["--start" "Time to start capturing" :default (time/today-at-midnight)
                                                  :parse-fn #(time-format/parse dhms %)]
             ["--end" "Time to end capturing" :default (time/plus (time/today-at-midnight)
                                                                  (time/days 1))
                                              :parse-fn #(time-format/parse dhms %)])]
    (when (:help opts)
      (println banner)
      (System/exit 0))
    (when-not (:statistics opts)
      (println "Statistics is a required argument")
      (println banner)
      (System/exit 1))
    (let [stats (->> (all-lines)
                     (map parse-line)
                     (filter #(in-range? % opts))
                     (map #(statistics-for % opts)))]
      (doseq [s stats]
        (apply println s)))))

Hopefully the code itself is fairly straightforward. We are turning STDIN into a lazy-seq of lines, parsing each of them into a pair of the time and the json, filtering out any lines not in our range, and then pulling out the statistics that have been requested. The cheshire library takes care of the JSON conversion for us. The clj-time library takes care of the date and time comparisons, and the tools.cli library takes care of parsing the command line arguments and generating the help messages for us.
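The parse, filter and project steps can also be sketched without any external libraries. In the snippet below, already-parsed lines are modelled as [timestamp stats-map] pairs and timestamps are compared as plain strings, which is an assumption that only holds because the format is fixed-width. The names parsed-lines, in-window?, row-for and report are hypothetical, and the half-open window (start inclusive, end exclusive) is a choice made for this sketch.

```clojure
;; Sketch only: all names here are hypothetical stand-ins for the real code.
(def parsed-lines
  [["2012-02-05T13:55:00" {:foos 0 :bars 0}]
   ["2012-02-05T14:00:00" {:foos 1 :bars 10}]
   ["2012-02-05T14:05:00" {:foos 2 :bars 20}]
   ["2012-02-05T14:20:00" {:foos 5 :bars 50}]])

(defn in-window?
  "True when start <= t < end, comparing timestamps as strings."
  [[t _] start end]
  (and (<= (compare start t) 0)
       (neg? (compare t end))))

(defn row-for
  "Builds one output row: the timestamp followed by the requested values."
  [[t stats] wanted]
  (apply vector t (map stats wanted)))

(defn report
  [lines wanted start end]
  (->> lines
       (filter #(in-window? % start end))
       (map #(row-for % wanted))))

(report parsed-lines [:foos :bars] "2012-02-05T14:00:00" "2012-02-05T14:20:00")
;; => (["2012-02-05T14:00:00" 1 10] ["2012-02-05T14:05:00" 2 20])
```

Because filter and map are lazy, no row is computed until something consumes the result.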

In order to run this code we need to compile it into a JAR file (we could also use the lein-run plugin to avoid this step). Note the :gen-class inside the ns declaration. This tells Clojure to generate a class file for this namespace, which we need in order to run the JAR.

Compile the app:

$ lein jar

At this point you should have a SNAPSHOT jar in the project root, and be able to do:

$ java -cp lib/*:logreader-0.0.1-SNAPSHOT.jar logreader.main -h

 Switches               Default                   Desc                      
 --------               -------                   ----                      
 -h, --no-help, --help  false                     Halp                      
 --statistics                                     The statistics to display 
 --start                2012-02-05T00:00:00.000Z  Time to start capturing   
 --end                  2012-02-06T00:00:00.000Z  Time to end capturing 

You get this help documentation for free with tools.cli.

You can go ahead and give the app a try:

$ ruby generate-log.rb | java -cp lib/*:logreader-0.0.1-SNAPSHOT.jar logreader.main --statistics foos,whosits

By treating the lines on STDIN as a lazy-seq, we ensure that we can process a log file of any size without running out of heap, by addressing each line in turn (rather than loading the whole file into memory and processing it).
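As an alternative sketch of the same idea, clojure.core's line-seq can produce a lazy sequence of lines from a reader wrapped around *in*. The names stdin-lines and count-matching are hypothetical, and with-in-str is used here only to simulate piped input.

```clojure
;; Sketch only: `stdin-lines` and `count-matching` are hypothetical names.
(defn stdin-lines
  "Returns a lazy sequence of the lines available on *in*."
  []
  (line-seq (java.io.BufferedReader. *in*)))

(defn count-matching
  "Counts the input lines containing substring, one line at a time."
  [substring]
  (count (filter #(.contains ^String % substring) (stdin-lines))))

;; Simulate piped input with a string bound to *in*.
(with-in-str "a INFO\nb WARN\nc INFO\n"
  (count-matching "INFO"))
;; => 2
```

Since nothing holds on to the head of the sequence, lines that have already been counted can be garbage collected.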

That’s it - a straightforward Clojure command line application reading from STDIN, writing to STDOUT, with a simple command line interface and help documentation generated for you. It doesn’t do much, but it illustrates the point quite nicely.

As noted at the beginning of the post, you can find the source code for the application on github.
