January 30th 2012 04:57 pm
Big Kettle News
Dear Kettle fans,
Today I’m excited to announce a few really important changes to the Pentaho Data Integration landscape. To me, these changes compare favorably to reaching Kettle version 1.0 some 9 years ago, reaching version 2.0 with plugin support, or even open sourcing Kettle itself…
First of all…
Pentaho is once again open sourcing an important piece of software. Today we’re releasing all of our big data software as open source. This includes all currently available capabilities to access HDFS, MongoDB, Cassandra and HBase, the VFS drivers we created, as well as the ability to execute work inside Hadoop (MapReduce), Amazon EMR, Pig and so on.
This is important to you because it means that you can now use Kettle to integrate a multitude of technologies, ranging from files and relational databases to big data and NoSQL. And you can do all of this without writing any code. Take a look at how easy it is to program for Hadoop MapReduce.
In other words, this part of today’s news lets you use the best tool for the job, whatever that tool is. You can now combine Kettle’s large set of steps and job entries with all the available data sources to integrate everything. For Hadoop especially, the time it takes to implement a MapReduce job is very short, which takes the sting out of long and costly training and testing cycles.
But that’s not all…
Pentaho Data Integration and the new big data plugins are now available under the Apache License 2.0. This means that it’s now very easy to integrate Kettle or the plugins into third-party software. In fact, for Hadoop, all major distributions are already supported, including Amazon Elastic MapReduce, Apache Hadoop, Cloudera’s Distribution including Apache Hadoop (CDH), Cloudera Enterprise, EMC Greenplum HD, Hortonworks Data Platform powered by Apache Hadoop, and MapR’s M3 Free and M5 Edition.
Kettle’s move from the LGPL to the Apache License 2.0 was broadly supported by our community and acts as an open invitation for other projects (and companies) to integrate Kettle. I hope that more NoSQL, Big Data and Big Search communities will reach out to us so that we can work together to broaden our portfolio even further. The way I see it, the Kettle community just got a whole lot bigger!
Where are the goodies?
The main landing page for the Big Data community is hosted on our wiki to emphasize our intention to work closely with the various communities to make Pentaho Big Data a success. You’ll find all the information there, including a set of videos, the PDI 4.3.0 preview download (including the Big Data plugins), Hadoop installation instructions, PRD configuration information and much more.
Thanks for taking the time to read this, and thanks for using Pentaho software!
Matt
6 Comments



Jens Bleuel about Kettle aka Pentaho Data Integration (PDI) & Pentaho BI: Pentaho Kettle for Big Data on 30 Jan 2012 at 17:06
[…] Download, how-to docs, videos and more at http://community.pentaho.com/BigData, see also Matt’s blog at http://www.ibridge.be/?p=207 […]
Big Kettle News « Pentaho Business Analytics Blog on 30 Jan 2012 at 21:26
[…] blog originally appeared on matt casters on data integration […]
Sean on 31 Jan 2012 at 23:33
This is great news! I think PDI is going in the right direction. Since last year, PDI has been my go-to ETL software. Before that I used Talend, which is going in the opposite direction. I don’t even have it installed anymore.
Thank you for bringing these features into open source.
Sean
Vishwesh on 01 Feb 2012 at 3:43
Great news. This will definitely help business people evaluate Kettle’s capabilities with Hadoop/HDFS.
A one-month free subscription is not enough to evaluate Kettle’s Big Data capabilities. This will also help improve the performance of Kettle’s Big Data capabilities.
Thank you.
Vishwesh
Fabrice on 02 Feb 2012 at 16:00
Talend is going in the opposite direction?
That’s not true. Talend has never closed any open source feature, and it continues to rapidly extend the capabilities of all its open source offerings.
Some Talend solutions are already available under an Apache license (ESB for example), and we’re still far from the end!
Stay tuned…
Btw, congrats Matt!
Fabrice
Kettle goes Big Data | techscouting through the java news on 02 Feb 2012 at 16:52
[…] Casters, the lead developer of the open source data integration tool Kettle, announced that Pentaho is going to open source all Kettle plugins related to big data today. You can now […]