PDI 3.0: first milestone available

UPDATE: version 3.0 has been released!

Dear Kettle fan,

While this first milestone release of Kettle version 3 is absolutely NOT YET READY FOR PRODUCTION, it’s a nice way to see the speed of our new architecture for yourself.
Version 3.0 is a complete refactoring of the entire Kettle code base, and as such it will take a while for things to settle down again.
That being said, we have a number of tests that tell us this might be a good time to tell the world we’re still very much alive.

As noted above, this release focuses on performance. Version 3.0 was reworked to completely separate data and metadata, which has led to significant performance gains across the board. At the same time, we expect all your old transformations to run unchanged. (If not, it’s a bug.)
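To give a rough idea of what separating data from metadata means, here is a minimal sketch. The class names `RowLayout` and `DataMetadataDemo` are made up for illustration and are not the Kettle API. The assumption is simply that field names and types are described once in a shared object, while each row is a bare `Object[]` that carries no description of its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not the Kettle API): field descriptions live in one shared object. */
final class RowLayout {
    private final List<String> names = new ArrayList<>();
    private final List<Class<?>> types = new ArrayList<>();

    /** Describe one more field; returns this so calls can be chained. */
    RowLayout add(String name, Class<?> type) {
        names.add(name);
        types.add(type);
        return this;
    }

    /** Position of a field in every row that uses this layout, or -1 if unknown. */
    int indexOf(String name) {
        return names.indexOf(name);
    }

    /** Read a value from a bare row, checking it against the declared type. */
    Object get(Object[] row, String name) {
        int i = indexOf(name);
        if (i < 0) {
            throw new IllegalArgumentException("Unknown field: " + name);
        }
        return types.get(i).cast(row[i]);
    }
}

final class DataMetadataDemo {
    public static void main(String[] args) {
        // The layout is built once and shared by all rows.
        RowLayout layout = new RowLayout().add("id", Long.class).add("name", String.class);

        // Rows are plain arrays: no per-value name or type information is stored.
        Object[][] rows = { {1L, "alpha"}, {2L, "beta"} };

        for (Object[] row : rows) {
            System.out.println(layout.get(row, "id") + " -> " + layout.get(row, "name"));
        }
    }
}
```

The point of the design is that the per-row cost shrinks to a single array, because everything that is identical for all rows is stored only once.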

Get your new software fix over here: http://s3.amazonaws.com/kettle3/Kettle-3.0.0-M1.zip

IMPORTANT: from this version on, Kettle requires Java 5 or higher to run!

Here are the most important changes versus v2.5:

  1. New architecture: separating data from metadata significantly increased performance
  2. More customization options: the Spoon GUI is now easier to modify and extend
  3. New plug-in architecture: easier to plug in new steps and job entries (XML, Java 5 annotations, etc.)
  4. Support for “lazy conversion”: delaying data conversions until they are really needed (sometimes they are not)
  5. New “CSV Input” step to read delimited files faster using NIO, including support for lazy conversion
  6. New “Fixed Input” step to read fixed-width files faster using NIO, including support for lazy conversion and parallel reading (across both step copies and slave servers)
  7. New partitioning algorithm allowing data to be (re-)partitioned in a clustered environment
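The lazy conversion idea from the list above can be illustrated with a small sketch. The `LazyField` class is hypothetical and is not how Kettle implements it. The assumption here is that a field keeps the raw bytes read from the file and only parses them into a number the first time a step actually asks for the value.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of lazy conversion: parse the raw bytes only on first use. */
final class LazyField {
    private final byte[] raw;   // bytes exactly as they were read from the file
    private Long converted;     // cached result, null until first requested
    private int conversions;    // counts how often we really had to parse

    LazyField(byte[] raw) {
        this.raw = raw;
    }

    /** Pass-through access: a step that only copies the field never triggers parsing. */
    byte[] rawBytes() {
        return raw;
    }

    /** Converts on first call, then returns the cached value. */
    long asLong() {
        if (converted == null) {
            converted = Long.parseLong(new String(raw, StandardCharsets.US_ASCII).trim());
            conversions++;
        }
        return converted;
    }

    int conversions() {
        return conversions;
    }
}

final class LazyConversionDemo {
    public static void main(String[] args) {
        LazyField field = new LazyField("  42 ".getBytes(StandardCharsets.US_ASCII));

        // Copying the field to an output needs no conversion at all.
        System.out.println("raw length: " + field.rawBytes().length);
        System.out.println("conversions so far: " + field.conversions());

        // Only a step that needs the numeric value pays for the parsing, and only once.
        long doubled = field.asLong() * 2;
        field.asLong();
        System.out.println("doubled: " + doubled + ", conversions: " + field.conversions());
    }
}
```

When a file is simply read and written back out, or when only a few columns are used in calculations, most fields never need to be converted.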

Besides that, the code base was also reviewed and a lot of algorithms were improved.

Please note that six steps are not yet ported from version 2.5: Database Join, XBase Input, Add XML, Access Output, Web Service and Formula. The same goes for the deprecated Aggregate Rows step. All of these will be available in the next milestone drop or through free plug-in downloads.

Feel free to let us know how this development milestone release works out for you and how it doesn’t.
A lot of people have worked really hard to get here: let us know your success stories!

Again: DO NOT USE THIS FIRST MILESTONE RELEASE IN PRODUCTION!

All the best,

Matt

P.S. As usual: file as many bugs as you like. Thanks in advance!

8 comments

  • Ryu

    Hi,

    What is the release date of PDI 3.0 GA?

  • Manel

    Hi Matt:

    I’ve imported a repository from version 2.3.1 to the new 3.0.0 to test it.

    It seems that the ExecSQL and Javascript steps don’t import correctly: when I try to open transformations that contain those steps in Kettle, the following error is raised:

    An error occured reading a transformation from the repository

    Unable to load class for step/plugin with id [null
    ].Check if the plugin is available in the plugins subdirectory of the Kettle distribution.

    In the r_step table those steps have the Id_step_type set to -1.

    Are those steps not ported yet????

    Thanks.
    Manel Gimeno

  • Hi Manel,

    Thanks for the feedback. As mentioned above, a few steps still need to be ported.
    They should all be done & tested in a few weeks.

    All the best,

    Matt

  • I have been using Kettle in my organization for almost a year.
    We use Kettle 2.3.0.
    Because this version does not yet fit our enterprise environment, we decided this quarter to buy an enterprise ETL tool, and the decision came down to Informatica and IBM Information Server.

    With the news of PDI 3.0, we are extremely shocked, because all the features we need are provided in PDI 3.0. Take gzip and Oracle bulk load, for example: awesome.
    The ETL tools are in the POC stage, but after this news, we think we need to rethink buying those two :)

    we are very excited and can wait for the GA version.

    I have a lot of features to request, but I’ll take them to the forum.

    Kudos to the whole PDI team!!

  • Hi,

    While GA is still some months away, please note that 3.0 is all about ironing out architectural annoyances and performance increases. Feature-wise, most is already in 2.5.0 and the soon to be released 2.5.1 point update.

    HTH,

    Matt

  • Ryu

    >While GA is still some months away, please note that 3.0 is all about ironing out architectural annoyances and performance increases. Feature-wise, most is already in 2.5.0 and the soon to be released 2.5.1 point update.
    What does “some months” mean? 3 months, 6 months, 12 months?
    We need to choose an open source ETL tool. Performance is very important to me. So far I have tried Talend Open Studio ( http://www.talend.com ) and cloverETL (http://cloveretl.org/), but I want to compare them with your PDI 3.0 (because it has better performance than PDI 2.5).
    I can delay my choice until the end of the year (but a 10-month delay is not possible).
    Thanks a lot and good luck with the PDI 3.0 release ;)
    Ryu

  • We will very likely have a GA version later this year.

  • Ryu

    Thank you for your quick answer! If PDI 3.0 is available in December or next January, that will be great!