Rolling back transactions

Pentaho Data Integration (Kettle) was never a real transactional database engine, and never pretended to be one. It was designed to handle large data volumes and slam a commit in between every couple of thousand rows to prevent the databases from choking on their transaction logs.

However, more and more people are using Kettle transformations in a transactional way. They want to have the option to roll back any change that happened to a database during the execution of a transformation in case anything goes wrong.

Well, we have worked on that in the past, but never quite got it right… until today, actually. As part of bug report 724, I lifted the decision to commit or roll back all databases to the transformation level.

Take for example a look at this transformation:

What happens is that the first two steps always finish executing before a single row hits the Abort step. That means that all rows from the “CSV file input” step are inserted into the database table before the transformation fails. In the past, even if you enabled “Unique connections”, those rows would have remained in the table. Now they are rolled back along with everything else.
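For readers who want to see the difference in miniature, here is a rough sketch in Python with SQLite. It is not Kettle code: the function name `run_transformation`, its parameters, and the `target` table are all made up for illustration, and it assumes the old behavior amounts to the output step committing its own work as soon as it finishes.

```python
import sqlite3


def run_transformation(rows, commit_size, transactional, fail):
    """Hypothetical model of a load followed by an optional abort.

    Returns the number of rows left in the target table afterwards.
    """
    # isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE target (id INTEGER)")

    # "Table output" step: insert every incoming row.
    conn.execute("BEGIN")
    for n, row in enumerate(rows, start=1):
        conn.execute("INSERT INTO target (id) VALUES (?)", (row,))
        if not transactional and n % commit_size == 0:
            # Assumed old behavior: commit every commit_size rows.
            conn.execute("COMMIT")
            conn.execute("BEGIN")
    if not transactional:
        # Assumed old behavior: the step commits when it finishes,
        # before any downstream failure is known.
        conn.execute("COMMIT")

    # "Abort" step: fires only after the output step is done.
    if transactional:
        # The decision is taken once, for the whole run.
        conn.execute("ROLLBACK" if fail else "COMMIT")

    count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    data = range(10)
    print("per-step commits, failed run:", run_transformation(data, 4, False, True))
    print("single decision, failed run: ", run_transformation(data, 4, True, True))
    print("single decision, clean run:  ", run_transformation(data, 4, True, False))
```

In this model the failed run with per-step commits leaves all ten rows behind, while the run that defers the decision to the end leaves none. The price is one long transaction, which is exactly what the periodic commits were meant to avoid on large loads.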

To test it yourself, build from revision 6587 in trunk or download a nightly build tomorrow.

With a little luck (further tests and then more tests) we can back-port this fix to version 3.0.2 this week, ready for the 3.0.2GA release at the end of next week.

I’m hoping to extend this same principle to jobs as well in the (more distant) future.

Until next time,
Matt
