Thursday, May 16, 2013

Running my WebApp locally with HSQLDB while my production target is ORACLE

Introduction

In the company I work for, the standard JEE stack is quite heavyweight: typically WebSphere Application Server as the webapp container and ORACLE as the database.

I currently develop on an HP Mobile Workstation laptop running an Intel i5 chip with 4 GB of RAM under Win7, and my everyday IDE is Eclipse (currently the Indigo flavour) with a bunch of plugins, including m2e (as we work intensively with Maven).

The company provides a bundle with Eclipse and JBoss, but I had many recurring troubles with the wtp-jboss plugin, which randomly published only part of the war content, so I keep using a small Tomcat 6 install which does the job perfectly.

But for the database, we used to connect to a single shared ORACLE development platform.
It has the obvious advantage of being the real target in terms of database, but it has many drawbacks, including:

  1. All developers share the same instance, so conflicts and overlaps are frequent
  2. Some undisciplined developers run data changes directly in the database (with SQL Developer, for example) without first writing them in a script file stored in our SCM (currently SVN); sharing the same database does not help them realize that such changes are (or should be) short-lived.
  3. Due to the company size, all servers are hosted in a datacenter quite far from our office; that's not a big deal for manual testing, but it makes a big difference when we want to run automated integration tests.
  4. When part of the team works overseas, especially through a VPN, SQL*Net lags a lot.
I quickly thought that giving each developer a dedicated instance would help greatly, so I first did it by creating as many ORACLE schemas as needed. That answered #1 and partly #2 (each developer having his own schema, it's easier to grasp that changes are local and won't be magically propagated to others).

To answer #3 and #4, I thought the best solution would be a local database, on the PC itself.
I first tried Oracle Express, but soon dropped it. Too heavy, not very reliable...

And I was thinking that for the automated functional tests, a lightweight in-memory database would be perfect. Which led me to HSQLDB.

It is a single jar you can pull from Maven in a snap, which is very easy to start and stop in a second from a Java launcher in Eclipse, as well as to start in-memory from Java test code.
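
To give an idea, a server-mode launcher can be as small as this (a minimal sketch assuming HSQLDB 2.2.x; the class name, database name and file path are mine, for illustration):

import java.sql.Connection;
import java.sql.DriverManager;

import org.hsqldb.server.Server;

public class LocalDbLauncher {
  public static void main(String[] args) throws Exception {
    // Serves jdbc:hsqldb:hsql://localhost/myLocalDB on the default port (9001)
    Server server = new Server();
    server.setDatabaseName(0, "myLocalDB");
    server.setDatabasePath(0, "file:target/myLocalDB");
    server.start();

    // And from test code, a throwaway in-memory database is just a JDBC URL away
    Connection conn = DriverManager.getConnection("jdbc:hsqldb:mem:testdb", "SA", "");
    conn.close();

    // server.stop() shuts the server down cleanly when needed
  }
}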

Of course, several ORACLE features could prevent you from effectively substituting HSQLDB for it: triggers, PL/SQL functions and procedures, materialized views, and many others.

Fortunately my current project has so far needed only plain tables, PKs, FKs and sequences, so it was worth a try (well, actually sequences are handled rather differently by HSQLDB and ORACLE, which brings some fun... more on that below).

One last constraint: for various reasons we design the physical database in UML with Enterprise Architect, using it to generate the ORACLE scripts. These ORACLE scripts are thus our "reference script set" for building the schema, from which I intended to set up the HSQLDB instance.

Let's extract the gems from the ORACLE coalmine!

Transposing tables and their PKs and FKs

I first tried to generate HSQLDB-compatible DDL scripts from the ones targeting ORACLE, but it was not reliable.

Then I gave ddlutils a try.

Though it has some caveats and does not seem to have been actively maintained for years, I succeeded in making it inspect our Oracle database and generate an XML file containing an agnostic definition of all the tables and their PKs and FKs.

I then fed ddlutils with this agnostic definition, and it built up the tables in the local HSQLDB database perfectly.

Unfortunately, I failed to make ddlutils extract either the sequences (they seem not to be supported) or the data inside the tables (it failed with a complaint about some ORACLE system tables it couldn't handle, and I didn't manage to exclude them from its processing).

Transposing sequences

The simplest way to handle the sequences was actually to run the genuine Oracle SQL sequence creation scripts against HSQLDB.
I just had to remove a few Oracle-specific sequence parameters (which were not necessary for the application anyway).
This gave sequence creation scripts that work on both ORACLE and HSQLDB:
create sequence MYSEQ increment by 1 start with 1 minvalue 1;

Transposing data

To cope with the data, I switched to dbUnit, and succeeded in making it inspect our Oracle database and generate an XML file containing an agnostic representation of all the data we needed.
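
(The Ant task I actually used for this is shown below; for the curious, the equivalent through dbUnit's Java API looks roughly like the following sketch, with the connection settings and schema name being placeholders to adapt:)

import java.io.FileOutputStream;
import java.sql.Connection;
import java.sql.DriverManager;

import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSet;

public class OracleDataExporter {
  public static void main(String[] args) throws Exception {
    // Placeholder connection settings, to adapt to your environment
    Connection jdbc = DriverManager.getConnection(
        "jdbc:oracle:thin:@...:1521:...", "myUser", "myPass");
    IDatabaseConnection connection = new DatabaseConnection(jdbc, "MYSCHEMA");

    // Dump every table of the schema as a flat XML dataset
    IDataSet fullDataSet = connection.createDataSet();
    FlatXmlDataSet.write(fullDataSet,
        new FileOutputStream("src/main/resources/ddlutils/data.xml"));
    jdbc.close();
  }
}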

Show me the code, dude

Okay let's stop teasing ;)

First of all, here are the dependencies I put in my pom.xml to pull in HSQLDB, ddlutils and dbUnit:

<dependency>
  <groupId>org.hsqldb</groupId>
  <artifactId>hsqldb</artifactId>
  <version>2.2.9</version>
</dependency>

<dependency>
  <groupId>org.hsqldb</groupId>
  <artifactId>sqltool</artifactId>
  <version>2.2.9</version>
</dependency>

<dependency>
  <groupId>org.apache.ddlutils</groupId>
  <artifactId>ddlutils</artifactId>
  <version>1.0</version>
</dependency>

<dependency>
  <groupId>org.dbunit</groupId>
  <artifactId>dbunit</artifactId>
  <version>2.4.9</version>
</dependency>
As both ddlutils and dbUnit provide several Ant tasks, it came naturally to me to write the ORACLE-to-HSQLDB "translator" with Ant.

As we work with Maven, I could hand the proper classpath over to Ant, so the task declarations were done without relying on ugly hardcoded paths.

In my pom.xml I added the following profile:

<profile>
  <id>antrun</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-antrun-plugin</artifactId>
        <executions>
          <execution>
            <id>antrun</id>
            <phase>validate</phase>
            <configuration>
              <target>
                <property name="classpath" refid="maven.compile.classpath" />
                <ant antfile="${basedir}/shared/${scriptName}.xml" target="${targetName}"/>
              </target>
            </configuration>
            <goals>
              <goal>run</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
So I could easily run any Ant script target with a simple Maven command line, like:
mvn.bat validate -Pantrun -DscriptName=YYY -DtargetName=XXX
Note how I passed the compile classpath computed by Maven as a property surprisingly named "classpath"...

Then the Ant script was pretty straightforward:

<?xml version="1.0" encoding="UTF-8"?>
<project name="myProject-dao" default="ora2local" basedir=".">

  <property name="local.driver" value="org.hsqldb.jdbc.JDBCDriver" />
  <property name="local.url" value="jdbc:hsqldb:hsql://localhost/myLocalDB" />
  <property name="local.user" value="SA" />
  <property name="local.pass" value="" />

  <property name="ora.driver" value="oracle.jdbc.driver.OracleDriver" />
  <property name="ora.url" value="jdbc:oracle:thin:@...:1521:..." />
  <property name="ora.user" value="..." />
  <property name="ora.pass" value="..." />
  <property name="modelName" value="..." />
[...]
  <taskdef name="databaseToDdl" classpath="${classpath}"
           classname="org.apache.ddlutils.task.DatabaseToDdlTask" />
  <taskdef name="ddlToDatabase" classpath="${classpath}"
           classname="org.apache.ddlutils.task.DdlToDatabaseTask" />
  <taskdef name="dbunit" classpath="${classpath}"
           classname="org.dbunit.ant.DbUnitTask" />
[...]
  <target name="ora2xmlCommon">
    <echo message="ora.user=${ora.user}" />
    <echo message="===== Dumping Oracle schema for ${ora.user}" />
    <databaseToDdl modelname="${modelName}" schemapattern="${modelName}" verbosity="DEBUG">
      <database driverClassName="${ora.driver}" url="${ora.url}"
                 username="${ora.user}" password="${ora.pass}" />
      <writeschematofile outputfile="src/main/resources/ddlutils/schema.xml" />
    </databaseToDdl>

    <echo message="===== Dumping Oracle data" />
    <dbunit driver="${ora.driver}" url="${ora.url}"
             userid="${ora.user}" password="${ora.pass}" schema="${modelName}">
      <export dest="src/main/resources/ddlutils/data.xml" />
    </dbunit>
  </target>
At this stage, we have two XML files: schema.xml, an agnostic representation of the tables, PKs and FKs of the schema, and data.xml, an agnostic representation of their data. Now let's look at the other side of the job: injecting all this stuff into our HSQLDB instance.

Let's put the gems on the HSQLDB ring!

First of all, we may already have stuff in our current HSQLDB instance, so let's drop it:
<target name="xml2local">
  <echo message="===== Dropping Local Schema" />
  <sql driver="${local.driver}" url="${local.url}"
       userid="${local.user}" password="${local.pass}"
       onerror="continue" print="true" classpath="${classpath}"
       showheaders="false" showtrailers="false">
    DROP SCHEMA PUBLIC CASCADE
  </sql>
[...]
Then we build up the schema:
  [...]
  <echo message="===== Injecting Oracle Schema in Local database" />
  <ddlToDatabase verbosity="ERROR">
    <database driverClassName="${local.driver}" url="${local.url}"
              username="${local.user}" password="${local.pass}" />
    <fileset dir="src/main/resources/ddlutils">
      <include name="schema.xml" />
    </fileset>
    <writeschematodatabase alterdatabase="true" failonerror="false" />
  </ddlToDatabase>
  [...]
Afterwards we manually create the sequences:
  <echo message="===== Injecting Sequences in Local database" />
  <sql driver="${local.driver}" url="${local.url}"
       userid="${local.user}" password="${local.pass}"
       onerror="continue" print="true" classpath="${classpath}"
       showheaders="false" showtrailers="false">
    <transaction src="${sequencesPath}/mySeq.sql" />
  </sql>
And we finish with the data:
  <echo message="===== Injecting data in local database" />
  <dbunit driver="${local.driver}" url="${local.url}"
          userid="${local.user}" password="${local.pass}">
    <operation type="INSERT" src="src/main/resources/ddlutils/data.xml" />
  </dbunit>
Unfortunately, the last step failed miserably on an integrity violation... WHAT THE HECK??

OMG... as it turns out, one of my tables has an FK to itself, implementing a tree of nodes...
Which means that some records in the table depend on the existence of other records in the same table...
Requiring the records to be inserted in the "appropriate" order...
Which is quite impossible to explain to dbUnit.

If we can't comply with the constraints, let's remove them! (temporarily of course :)
Just before the insert step, we disable all the constraints, and afterwards we enable them back, and that's it.

Moreover, it's very straightforward with HSQLDB:

  <sql driver="${local.driver}" url="${local.url}"
       userid="${local.user}" password="${local.pass}"
       onerror="continue" print="true" classpath="${classpath}"
       showheaders="false" showtrailers="false">
    SET DATABASE REFERENTIAL INTEGRITY FALSE
  </sql>
and
  <sql driver="${local.driver}" url="${local.url}"
       userid="${local.user}" password="${local.pass}"
       onerror="continue" print="true" classpath="${classpath}"
       showheaders="false" showtrailers="false">
    SET DATABASE REFERENTIAL INTEGRITY TRUE
  </sql>
At this stage, I was able to start my webapp locally, targeting my local HSQLDB, and everything went smoothly in my first tests... until I tried to create a new record. BANG, constraint violation. WHAT THE HECK (again)??

Got an idea? No?

OK, here's a hint:
when the data was inserted in ORACLE, the sequences on which some PKs rely got incremented, right?

You've got it:
the sequences in HSQLDB should be set to the same values as those in ORACLE, and the above process does not handle that.

I could have fetched the current value of each sequence from ORACLE, but as I knew which PK was bound to which sequence, I found it better to compute the appropriate value from the actual dataset, as the following snippet demonstrates (thanks to http://stackoverflow.com/questions/1062887/how-can-i-read-output-of-an-sql-query-into-an-ant-property which helped me a lot):

  <sql driver="${local.driver}" url="${local.url}"
       userid="${local.user}" password="${local.pass}"
       onerror="continue" print="yes" classpath="${classpath}"
       showheaders="false" showtrailers="false"
       output="temp.properties">
    select 'mySeq=' || (max(id_ofMyTable)+1) from myTable;
  </sql>
  <property file="temp.properties" />
  <delete file="temp.properties" />
  <echo message="got mySeq=${mySeq}" />
  <sql driver="${local.driver}" url="${local.url}"
       userid="${local.user}" password="${local.pass}"
       onerror="continue" print="yes" classpath="${classpath}"
       showheaders="false" showtrailers="false"
       expandProperties="true">
    alter sequence mySeq restart with ${mySeq};
  </sql>
</target>  
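
One more goodie before closing: the same pair of XML files can also feed the lightweight in-memory HSQLDB I mentioned for the automated functional tests, this time straight from Java through the ddlutils and dbUnit APIs. Here is a minimal sketch (the data source setup is illustrative, and the sequences would still have to be created and restarted with the SQL seen above):

import java.io.File;
import java.sql.Connection;
import java.sql.Statement;

import org.apache.commons.dbcp.BasicDataSource;
import org.apache.ddlutils.Platform;
import org.apache.ddlutils.PlatformFactory;
import org.apache.ddlutils.io.DatabaseIO;
import org.apache.ddlutils.model.Database;
import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.dbunit.operation.DatabaseOperation;

public class InMemoryDbLoader {
  public static void main(String[] args) throws Exception {
    // An in-memory HSQLDB behind a DataSource (commons-dbcp comes with ddlutils)
    BasicDataSource ds = new BasicDataSource();
    ds.setDriverClassName("org.hsqldb.jdbc.JDBCDriver");
    ds.setUrl("jdbc:hsqldb:mem:testdb");
    ds.setUsername("SA");
    ds.setPassword("");

    // ddlutils: read the agnostic model and create the tables
    Database model = new DatabaseIO().read("src/main/resources/ddlutils/schema.xml");
    Platform platform = PlatformFactory.createNewPlatformInstance(ds);
    platform.createTables(model, false, true);

    // dbUnit: insert the data, dodging the self-referencing FK issue seen above
    Connection jdbc = ds.getConnection();
    Statement stmt = jdbc.createStatement();
    stmt.execute("SET DATABASE REFERENTIAL INTEGRITY FALSE");
    IDatabaseConnection dbunitConn = new DatabaseConnection(jdbc);
    IDataSet dataSet = new FlatXmlDataSetBuilder()
        .build(new File("src/main/resources/ddlutils/data.xml"));
    DatabaseOperation.INSERT.execute(dbunitConn, dataSet);
    stmt.execute("SET DATABASE REFERENTIAL INTEGRITY TRUE");
    jdbc.close();
  }
}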
This time we're really done. Have fun!

Sunday, May 5, 2013

Windows 7 free backup/restore solutions, continued

In my previous post I shortlisted Redo Backup and Macrium Reflect Free, but before trying them out, I had to do some serious cleaning of my C: partition.

Step 1: basic cleanup

First I grabbed the most recent version of BleachBit, an open-source tool to clean most of the junk (temp files and such) on a PC.
Got back about 5 GB.

Step 2: scanning the biggest folders for further investigation

Then I used TreeSize Free to find the biggest folders, and discovered some serious junk from two more sources: iTunes and The Sims (my two daughters, 12 and 14, are complete fans of this game, and I often have to battle with them to get my hands on the family desktop :)).
After some googling I found out which subfolders could be safely deleted for both.
Got back about 7 more GB.

There was a third huge folder, filled with software and various stuff downloaded from the internet, some of it recent, some not so recent.
After sorting out what could be dropped and what was worth keeping (the latter moved to my D: partition), I got back another 10 GB.

Step 3: disk defragmentation and optimization

I first used the Windows 7 defragmentation tool, before trying to shrink my C: partition from its current 580 GB (of which only 90 GB were used) to something more appropriate, say about 200 GB.
I wasn't delighted when the Windows 7 partition shrinker only offered to shrink it down to... 380 GB.

So I turned to UltraDefrag, which is now able to hook into the boot sequence, to see whether I could get better results.
I enabled the boot time scan feature (shortcut F11), editing its script (shortcut F12) to make it quick-optimize the disk:
udefrag -q %SystemDrive%
And rebooted my desktop.

Indeed, the command-line udefrag kicked in during boot and did its work perfectly.
Once Windows 7 was back up, I retried the partition shrinker, which this time offered to shrink down to 168 GB. Agreed!

I reopened UltraDefrag to disable the boot time scan feature, so that subsequent boots won't be slowed down by useless repeated optimizations.

Next step: trying the backup itself with Redo Backup, and searching for an old disk drive that can be used as the target of a restore... stay tuned!