A new debugger for Kettle

We are preparing Pentaho Data Integration for feature freeze in a few weeks, when we’ll release 3.0.0-RC1. Before we do, I wanted to write a (simple) debugger. It’s important to get at least the API in place so that we can keep building on it in the 3.x update releases.

How does it work? Well, suppose you have a simple transformation like this one:

A simple transformation

We just generate empty rows and add an id from 1 to 1000. Now suppose we want to pause the transformation and see the content of the row where id=387.

Well, that is what we made possible. Simply click on the debug icon in the toolbar:

Debug icon in the toolbar

That will open up the debug dialog:

The new debug window

As you can see, we can specify a condition on which the transformation is paused. We can also ask to keep in memory the last N rows that passed through before the condition was met. Pressing OK and launching the transformation in the execution dialog will then show the requested rows:

Previewing rows

As you can see, for your convenience, the order of the rows is reversed (most recent first). If you try this yourself, you will note in the transformation log tab that the transformation you are debugging is paused. That means that you can now hit the resume button and the transformation will simply continue to run. If the condition is met again, the transformation is paused again and another preview dialog is presented.
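To make the idea concrete, here is a minimal sketch of the bookkeeping described above: a pause condition plus a rolling buffer of the last N rows, listed most recent first. This is not the actual Kettle debugger API. The class `BreakPointBuffer` and its methods are hypothetical names invented for illustration, and I am assuming that the matching row itself counts as one of the N rows kept.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration only; not a class from the Kettle code base. */
final class BreakPointBuffer<R> {
    private final Predicate<R> condition;  // when this matches, the caller should pause
    private final int rowsToKeep;          // the "last N rows" setting from the dialog
    private final Deque<R> recent = new ArrayDeque<>();

    BreakPointBuffer(Predicate<R> condition, int rowsToKeep) {
        if (rowsToKeep < 1) {
            throw new IllegalArgumentException("rowsToKeep must be at least 1");
        }
        this.condition = condition;
        this.rowsToKeep = rowsToKeep;
    }

    /** Records a row and returns true when the pause condition is met. */
    boolean offer(R row) {
        recent.addFirst(row);              // newest row goes to the head
        if (recent.size() > rowsToKeep) {
            recent.removeLast();           // forget the oldest row
        }
        return condition.test(row);
    }

    /** Snapshot for the preview dialog, most recent row first. */
    List<R> preview() {
        return new ArrayList<>(recent);
    }
}
```

With the example from this post (ids 1 to 1000, condition id=387, keep 5 rows), a step would call `offer(id)` for every row and pause as soon as it returns true. At that point `preview()` holds the ids 387, 386, 385, 384 and 383, with the matching row on top. Keeping the buffer as a small fixed-size deque is one plausible reason the overhead can stay low: each row costs one insert, at most one removal, and one condition check.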

The old-style preview has also been converted to the new pause/resume capabilities.

One interesting observation is that the performance hit while running in debugging or preview mode has been kept very low. The slowdown obviously depends on the number of conditions and the buffer sizes, but in typical use I don’t think you will notice any performance drop at all.

The Pentaho Data Integration development team and I really hope that these new capabilities will shorten the time you spend hunting down problems in complex transformations.

Until next time,

Matt

4 comments

  • Hello Matt,

    A new crazy and useful function … as usual! This will save us from having to save a step’s results into text files in order to look for some condition or event that occurred in the data flow.

    No other comment than “thank you” for this new, long-awaited branch of Kettle. I know the run-up to freezing a version is such a fascinating period.

    Regards,
    Patrick

  • Pingback: Matt Casters on Data Integration » Kettle 3 RC1

  • Paul

    Hi Matt,

    My understanding is that the debug preview will list entire rows and not necessarily the condition(s) that cause the break/pause.

    Does the debugger in Kettle support watches which can be used to inspect variables when a breakpoint has been reached/triggered?

    Thank you
    Paul

  • Paul,

    Variables are static as far as a single transformation is concerned, so I don’t think it makes sense to debug those.
    The debugger will pause a step (plus the transformation) and display the last X rows that passed through the step at the moment your debugging condition was met. The line on top is the row that matches the debugging condition.

    Matt