July 3rd 2008 04:58 pm
Kettle / PDI
During the last couple of years, Pentaho Data Integration (PDI) a.k.a. Kettle has become one of the leading ETL tools. Here are a few useful or memorable links to things I wrote on my blog about Kettle…
- Getting started with Kettle : The birth of the Getting started wiki page for PDI
- Making the case for Kettle : on why & how in Kettle, ELT vs ETL, Freedom
- Key-value madness : overly opinionated piece about how to handle key-value pairs
- Simpler reporting : make your data richer : on solving complex reporting problems with ETL
- Just say no… : … to XML!
- A spoon to mix it all up : Spoon 2.4.0 flash demo & the vanishing act of Chef
- Going virtual : introducing Apache VFS file handling in Kettle
- Handling errors : introducing step error handling
- A nice chat : Q&A chat log dump about how Kettle works internally
- Mind the gap : a rant about requirements & data warehousing
- Data vs Metadata : Kettle 3.0 : explaining why we did the version 3 redesign
- 10,000 posts : reached on June 28th 2007
- Kettle 2.5.0 installer : a Windows installer was born
- Being lazy : on the new lazy conversion system in Kettle for faster text file handling
- Clustering & partitioning : explaining the basics
- A new debugger for Kettle : making examining tricky situations a bit easier
- Back to basics : (improvised) flash demo on how to transfer data in a text file to a database
- Kettle at Talend : reporting on the enjoyable visit to Talend HQ in Paris
- Test case : fast parallel flat file reading : detailed analysis of parallel text file reading performance
- Opinion: commercial BI : my opinion on how the BI market will evolve in the future
- About i18n : how to translate Kettle with our easy-to-use GUI
- Announcing Pentaho Data Integration 3.0.0 GA : a big milestone for us
- The problem with major releases : a rant about versions & releases in open source
- Rolling back transactions : on how to make a transformation respect database transactions on failure and success
- EC2 : scaling large files with S3 and Kettle
- Revamping Spoon : first results of the redesign, now that the UI engineers have finally decided to get involved
- 5000 forum threads : and 22,000 posts reached on May 26th 2008
Here are a few additional interesting links:
- The Kettle homepage (Downloads, documentation, road map, etc)
- The Kettle wiki pages
- The Kettle JIRA bug & requests tracker
- My Kettle lightning talk at FOSDEM 2008, the presentation (PDF) and the video (OGG)
Here are a number of things I found interesting lately:
- ETL-tools.info explaining various ETL concepts, with sample transformations
- A review of Kettle 3.0 by an old friend
If you have other interesting Kettle/PDI related links, feel free to leave a comment.
12 Comments »




yxskkk on 15 Jul 2008 at 2:45 #
Hi Matt, first, thanks for replying to my thread on forums.pentaho.org.
I want to talk with you. May I?
I’m Chinese and would like to learn Kettle from you ^-^
Open Source Metrics and Benchmarks « Gobán Saor on 30 Oct 2008 at 14:32 #
[…] expected each tool has their own strengths and weaknesses, but one thing stands out, the venerable Kettle ETL aka PDI 3.0 is now a serious contender for handling very large datasets. Obviously all the work […]
Virendra Rathore on 25 Dec 2008 at 16:11 #
Hello,
I’ve been using Kettle 2.2 until now, along with some plugins developed by others.
I’ve downloaded PDI 3.0.4, but most of the existing transformations don’t work in PDI 3.
Is there a way to use those plugins in PDI 3.0.4?
How can I migrate them to PDI 3?
Please help, thanks.
Virendra
Feris Thia on 09 Jan 2009 at 8:10 #
Hi Matt,
I have also set up a Kettle wiki for the Pentaho community in Indonesia. You can check it at http://pentaho.phi-integration.com/kettle.
Regards,
Feris
bambam on 06 Feb 2009 at 8:17 #
Hi Matt,
We’ve been trying to access our proprietary database with Kettle, but we can’t find it in the list of supported databases (http://www.kjube.be/tnenopxe/index.php?section=69). Our vendor told us they are using a PICK database, but we don’t know anything about it. Are you familiar with the database used by TigerLogic Corporation? Their website is http://rainingdata.com/.
Thanks for this useful website.
bambam
Matt Casters on 06 Feb 2009 at 9:37 #
Hi Bambam,
Unfortunately I’ve never heard of that database. That doesn’t mean much, though: new databases pop up all the time, usually clones of PostgreSQL, MySQL, etc., but others as well.
Ask your vendor for a JDBC or ODBC driver, and we can work together to add support for it in Kettle. File a feature request at http://jira.pentaho.org/browse/PDI
Matt
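Before asking for Kettle support, it can help to confirm that the vendor’s JDBC driver works at all with plain JDBC. This is a minimal sketch using only `java.sql`, assuming the vendor ships a JDBC driver jar that you put on the classpath. The driver class `com.example.vendor.Driver` and the URL `jdbc:example://dbhost:1234/mydb` are hypothetical placeholders, not the real values for the PICK/TigerLogic database. If this connects, the same driver class and URL are what you would try in Kettle’s generic database connection type, assuming your Kettle version offers it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class JdbcDriverCheck {

    /**
     * Returns true if the driver class can be loaded from the classpath.
     * Loading the class also runs its static initializer, which is how
     * pre-JDBC-4 drivers register themselves with DriverManager.
     */
    static boolean isDriverOnClasspath(String driverClassName) {
        try {
            Class.forName(driverClassName);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Opens and closes a connection; returns null on success, else the error message. */
    static String tryConnect(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return null;
        } catch (SQLException e) {
            // e.g. "No suitable driver found" when no registered driver accepts the URL
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical values: take the real ones from the vendor's driver documentation.
        String driverClass = "com.example.vendor.Driver";
        String url = "jdbc:example://dbhost:1234/mydb";

        System.out.println("Driver on classpath: " + isDriverOnClasspath(driverClass));
        String error = tryConnect(url, "user", "secret");
        System.out.println(error == null ? "Connection OK" : "Connection failed: " + error);
    }
}
```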
YeXiangJie on 28 Oct 2011 at 10:05 #
Hi Matt,
I have translated parts of Kettle into Chinese, but I don’t know how to submit this work to your project. Can you tell me how to do it?
Matt Casters on 28 Oct 2011 at 13:56 #
Hi YeXiangJie,
Create a JIRA case describing the improvements and contributions you made, and we’ll make sure they find the right place.
http://jira.pentaho.com
Thanks in advance,
Matt
Joe-1 on 09 Nov 2011 at 8:11 #
Hello guys,
Can anyone tell me where I can find the release notes for PDI 4.2 stable? Thanks in advance.
Matt Casters on 16 Nov 2011 at 13:37 #
The release notes are in JIRA:
http://jira.pentaho.com/secure/IssueNavigator.jspa?reset=true&mode=hide&jqlQuery=project+%3D+PDI+AND+fixVersion+%3D+%224.2.0+GA+%284.0.0+GA+Suite+Release%29%22
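That release-notes link is a JIRA issue search: its `jqlQuery` parameter is the form-encoded JQL `project = PDI AND fixVersion = "4.2.0 GA (4.0.0 GA Suite Release)"`. A small sketch of how to build the same kind of link for another version; the class and helper names are hypothetical, the base URL is copied from the link above, and that server may no longer serve this page. The version string must match the fix version name exactly as it appears in JIRA.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ReleaseNotesLink {

    // Base URL taken from the link in the comment above.
    static final String ISSUE_NAVIGATOR =
        "http://jira.pentaho.com/secure/IssueNavigator.jspa?reset=true&mode=hide&jqlQuery=";

    /** JQL selecting every PDI issue fixed in the given version. */
    static String jqlForVersion(String fixVersion) {
        return "project = PDI AND fixVersion = \"" + fixVersion + "\"";
    }

    /** Form-encodes the JQL (spaces become '+') and appends it to the search URL. */
    static String releaseNotesUrl(String fixVersion) {
        return ISSUE_NAVIGATOR + URLEncoder.encode(jqlForVersion(fixVersion), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(releaseNotesUrl("4.2.0 GA (4.0.0 GA Suite Release)"));
    }
}
```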
YeXiangJie on 21 Nov 2011 at 8:31 #
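Hi Matt,
Our project needs Kettle, and I have integrated it into our project. I use the following code to run a job, but I don’t know how to stop the job. Can you tell me how to stop it?
This is the code I use to run the job:
// Imports, assuming the Kettle 3.x package layout
import org.pentaho.di.core.database.DatabaseMeta;
import org.pentaho.di.core.logging.LogWriter;
import org.pentaho.di.core.util.EnvUtil;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobEntryLoader;
import org.pentaho.di.job.JobMeta;
import org.pentaho.di.repository.Repository;
import org.pentaho.di.repository.RepositoryDirectory;
import org.pentaho.di.repository.RepositoryMeta;
import org.pentaho.di.repository.UserInfo;
import org.pentaho.di.trans.StepLoader;

// Initialize the Kettle environment and load the job entry and step plugins
EnvUtil.environmentInit();
JobEntryLoader.init();
StepLoader.init();
// Log to KettleTest.log at the detailed level
LogWriter log = LogWriter.getInstance("KettleTest.log", true, LogWriter.LOG_LEVEL_DETAILED);
// Repository user
UserInfo userInfo = new UserInfo();
userInfo.setLogin("admin");
userInfo.setName("admin");
// Database connection that holds the repository
DatabaseMeta connection = new DatabaseMeta("10.207.6.109-sspa", "Oracle", "Native", "10.207.6.109", "orcl", "1521", "sspa", "sspa");
// Repository definition
RepositoryMeta repinfo = new RepositoryMeta();
repinfo.setConnection(connection);
// Connect to the repository
Repository rep = new Repository(log, repinfo, userInfo);
rep.connect("");
// Repository directory to load the job from
RepositoryDirectory dir = new RepositoryDirectory(rep);
StepLoader steploader = StepLoader.getInstance();
// Load the definition of "ceshiJob" from the repository
JobMeta jobMeta = new JobMeta(log, rep, "ceshiJob", dir);
// Create the job
Job job = new Job(log, steploader, rep, jobMeta);
// Job extends Thread: start() runs it in its own thread, whereas run() would block this one
job.start();
// Block until the job has finished
job.waitUntilFinished();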
YeXiangJie on 23 Nov 2011 at 9:54 #
Hi Matt,
I don’t know how to stop Kettle jobs using the Kettle API. Could you tell me how to do this?
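In Kettle 3.x and 4.x, `Job` extends `Thread` and has a `stopAll()` method that asks a running job to stop. That only helps if the job was launched with `job.start()`: `job.run()` executes it in the calling thread, which then cannot issue the stop until the job is already done. The self-contained sketch below does not use the Kettle API. `StoppableJob` is a hypothetical class that imitates the same pattern under that assumption: the job runs in its own thread, checks a volatile stop flag between entries, and the controlling thread calls `stopAll()` and then waits for it to finish.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an embedded job that can be stopped from another thread. */
public class StoppableJob extends Thread {

    private final int totalEntries;
    private final long entryMillis;
    private volatile boolean stopped = false;              // set by the controlling thread
    private final AtomicInteger completed = new AtomicInteger();

    StoppableJob(int totalEntries, long entryMillis) {
        this.totalEntries = totalEntries;
        this.entryMillis = entryMillis;
    }

    @Override
    public void run() {
        // Check the stop flag before each entry: a cooperative stop, no Thread.stop().
        for (int i = 0; i < totalEntries && !stopped; i++) {
            try {
                Thread.sleep(entryMillis);                 // simulate the work of one job entry
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            completed.incrementAndGet();
        }
    }

    /** Requests a stop; the job finishes its current entry and then exits. */
    void stopAll() { stopped = true; }

    boolean isStopped() { return stopped; }

    int completedEntries() { return completed.get(); }

    /** Blocks the caller until the job thread has ended. */
    void waitUntilFinished() {
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableJob job = new StoppableJob(1000, 10);
        job.start();               // start(), not run(): the job needs its own thread
        Thread.sleep(50);          // let a few entries run
        job.stopAll();             // request the stop from the controlling thread
        job.waitUntilFinished();
        System.out.println("Stopped after " + job.completedEntries() + " of 1000 entries");
    }
}
```

With the real API, the equivalent would be keeping a reference to the started `Job` and calling `stopAll()` on it from whatever thread handles the user’s stop request.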