There is a saying that necessity is the mother of invention. Such necessity happened last week and forced me to try out the Groovy language.

The trigger was need for creating good data set for testing changes in a full text search. I had to locate few hundred of obsolete technical documents to be used as test data, dump the metadata as well as the BLOB data to disc, create INSERT sql scripts and loader that would from Ant pre-load the database with the test set and insert the BLOB's. It is fairly simple task, the issue were boundary conditions:

Database in question is Oracle. I have no OCI8 installed on my MBP and refuse to install Oracle on OS-X - it was little fun getting it up and running on Linux. Usually I run Oracle in Windows or Linux VM and with lack of OCI8, only option how to communicate with it is Level 4 JDBC driver.

Because of missing OCI8, Ruby was out of question (the gems for Oracle access needs OCI). So was PL/SQL, for the reason of administator priviledges. I am not exactly a PL/SQL programmer, but know enough to write stored proc that loads or unloads BLOB, assumed that UTL_FILE and DBMS_LOB packages are available and accessible. Which was not the case.

Before falling to default and writing it all in Java (which would work, but take certainly longer than I was willing to invest), I decided try out this Groovy thing :-). My exposure to the language was minimal - I never programmed in it (not counting 2 hours spent browsing Ruby in Action), but I hoped that similarity with Java and almighty Google will help me out ...

Connecting to database in Groovy is simple and elegant:

[sourcecode language='java']

import groovy.sql.Sql

db = Sql.newInstance(
'jdbc:oracle:thin:@myoraclehost:1521:XE',
'USER', 'PWD', 'oracle.jdbc.driver.OracleDriver')

def sql = """
select d1.document_id, d1.version, d1.filename from document d1
where d1.document_id in ...

.... REST OF SQL DELETED ....
"""
list = db.rows(sql)

[/sourcecode]

The result is hash-like structure that can be used to drive both generating the DELETE and INSERT SQL statements into text file as well as retrieving the BLOB's and saving them to HDD:

[sourcecode language='java']

def writeBlobToFile(db, docid, version, filename, reallyWrite=false) {
def sqlString = "SELECT d.DOCUMENT_BODY FROM document d where d.document_id = '$docid' and version = $version"
def row = db.firstRow(sqlString)
def blob = (oracle.sql.BLOB)row[0]
def byte_stream = blob.getBinaryStream()
if( byte_stream == null ) { println "Error for ${docid} : ${version}" }

int total = blob.length();

// Write to a file
if (reallyWrite) {
byte[] byte_array = new byte[total]
int bytes_read = byte_stream.read(byte_array)
def fullname = "/Users/miro/tmp/BLOBS/$filename"
def fos= new FileOutputStream(fullname)
fos.write(byte_array);
fos.close()
}
println "Document $docid:$version, file: $filename, size $total"
return total
}

[/sourcecode]

Loading the documents and metadata from disk to database is done by Ant script

[sourcecode language='xml']

[/sourcecode]

The Groovy task must be defined before used (with proper classpath). The groovy
task shows passing arguments and executing script. The most important part of
the script (loading binary file and inserting the BLOB) is here:

[sourcecode language='java']

// each line in METADATA.TXT looks like this:
reg = /document_id: \'(\w+)\', version: (\d+), filename: \'(.+)\', size: (\d+)/

db = Sql.newInstance(
'jdbc:oracle:thin:@myoraclehost:1521:XE',
'USER', 'PWD', 'oracle.jdbc.driver.OracleDriver')
counter = 0
new File(args[1]).eachLine { line ->
line.eachMatch(reg) { match ->
try {
str = "${args[0]}/${match[3]}"
println "Opening file: ${str}"
FileInputStream fis = new FileInputStream(str)
int size = fis.available()
byte[] data = new byte[size]
fis.read(data)

// get the BLOB
row = db.firstRow("select document_body from document where document_id = ? and version = ? for update",
[match[1], match[2]])
my_blob = (oracle.sql.BLOB)row[0]
if( my_blob == null ) println "my_blob is null!"
outstream = my_blob.getBinaryOutputStream();
outstream.write(data);
outstream.close()
fis.close()
counter = counter+1
} catch (Exception e) {
print "Exception: ${e}\n"
}
}
println "Processed files: ${counter}"
[/sourcecode]

The loader is controlled by the file METADATA.TXT that contains information linking document metadata in database (inserted by SQL statement) with file on disk. This indirect way allows easily "inject" document body with required search phrases, reload database and run tests.

The whole experience was quite pleasant and considering how little I knew about Groovy, it took very little time to create something useful. I wish I could have done some thing in Ruby (which I still like better) - but I was amazed how powerful the combination of Groovy + Ant can be.

The most valuable feature is no need to add anything really new or exotic to Java environment - all you need is Jar file and Eclipse plugin, no new/different library packaging scheme - Jars are still jars, not gems.

The only minor hickup and weird thing I found was passing arguments between Ant and Groovy script. It looks like that the groovy script should be a script and not use classes + static main method (which would be natural instinct for Java developer). Thanks to Christopher Judd for the hint.