About Matt

I have been an independent BI consultant for many years and have implemented numerous data warehouses and BI solutions for large companies. For the last 10 years I have been very busy writing an ETL tool called Kettle. This tool was open sourced in December 2005 and acquired by Pentaho Open Source BI early in 2006. As such, I'm now Chief Data Integration for Pentaho, mainly doing lead development for Kettle, a.k.a. Pentaho Data Integration.

[Photo: In our garden]

93 comments

  • Alain.Debecker

    Hello Matt,
    I just wanted to say hello. And thank you for the good time we spent in Berlin.

    And then… I found this blog.
    Just tell me: what kind of software do you use? I want the same.

    AlainD

  • It’s called WordPress; there is a link directly to your right.

  • Hi Matt.

    Thanks for the info on my blog. I got the opportunity to see some of the Pentaho stack at ODTUG this week. Very impressive. From the Oracle BI stack, I am most familiar with OWB so I will be starting with Kettle. The interface actually looks a lot like OWB but I guess there are only so many ways to visualize a transformation or mapping. ;-)

    Thanks,

    LewisC

  • Hi Lewis,
    To a certain extent, Kettle was written out of frustration with OWB. As such I tried to do exactly the opposite of what OWB was doing :-)

    Be careful, you might even like Kettle. You wouldn’t be the first ACE to defect…

    All the best,

    Matt

  • Hey Matt, you’re a legend!

    Just getting started on Kettle, and it’s really, really good.

    Well done, all your hard work is well and truly appreciated.

  • Hi Matt,
    Congratulations. Kettle is a great tool for ETL; it is very easy to use and gets things done without too much of a learning curve. Thank you. Keep up the good work.
    Regards,
    Ali Akkas
    Oxford, UK.

  • Gerson Reis

    Hi Matt,
    I am Brazilian and new to Kettle. I’m trying to create a transformation in Spoon, export it to XML and run that transformation from a Java class, but I’m not having any success. I don’t know if this can be done, and if it can, how to do it.
    I have read about this on the internet and on the Pentaho forum, but it’s not clear to me. I need something more detailed, like a step-by-step guide.

    I have a lot of difficulty finding help about the Kettle Java API on the internet. Perhaps you can point me to something.

    Thanks a Lot
    Gerson Luiz dos Reis

  • Hi Gerson,

    I understand the temptation to ask questions on this blog, but it would be much better if you could turn to our forum for this: http://forums.pentaho.org/forumdisplay.php?f=69

    Thank you for your understanding!

    Matt
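
    For readers who land here later with the same question: running an exported .ktr from Java is possible with the Kettle API. Below is a minimal sketch, assuming a PDI 4.x-style API (the same TransMeta class that appears in later comments in this thread); treat it as a starting point rather than official documentation.

        // Minimal sketch: run a transformation exported from Spoon as a .ktr file.
        // Assumes the Kettle core/engine jars (and any used plugins) are on the classpath.
        import org.pentaho.di.core.KettleEnvironment;
        import org.pentaho.di.trans.Trans;
        import org.pentaho.di.trans.TransMeta;

        public class RunKtr {
          public static void main(String[] args) throws Exception {
            // Initialize the Kettle environment first; this registers the step plugins.
            KettleEnvironment.init();

            // Load the transformation definition from the exported XML file.
            TransMeta transMeta = new TransMeta("/path/to/transformation.ktr");

            // Execute it and wait until all steps have finished.
            Trans trans = new Trans(transMeta);
            trans.execute(null); // no command-line arguments
            trans.waitUntilFinished();

            if (trans.getErrors() > 0) {
              throw new RuntimeException("Transformation finished with errors.");
            }
          }
        }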

  • Gerson Reis

    I’m sorry for this post, it was not a happy moment for me hehehe.

    I tried to post in the forum but I couldn’t at that time.
    I will post the question there, OK?

    Thanks and sorry again.

    Gerson Luiz dos Reis

  • Edward Gibbons

    Thanks Matt,

    Your Kettle Integration software is truly spectacular. I was attempting to learn OWB until I found Pentaho Data Integration.

    I have since created many successful transformations and uninstalled OWB from our machines.

    If only all enterprise software were this easy.

    Regards,

    Edward Gibbons
    Southern California

  • Hello Matt,
    I just wanted to say hello. And thank you for the good time we spent in China.

    scott

  • Hi Matt,

    I have been using Kettle since its first open source release. I have implemented it at many customers. Thanks!!!

    I have had this problem in the latest release. I do daily updates from an ODBC source database. After the process has run for a month or so, reading from the database really slows down (from 2000 rps to 150 rps). The only thing that helps is to reboot the server OR delete the previous ODBC config and re-configure. Any ideas?

    Thanks again. BTW, I am presenting Pentaho at the largest BI conference in Africa tomorrow.

    Regards,

    Van Zyl Kruger
    South Africa

  • There is probably a memory leak in the ODBC driver somewhere (some DLL). If you can, try to use a direct JDBC connection.
    In the future, please post problems like this on our forum or file a case in our bug tracker.
    Good luck with the conference!

    Matt

  • Salah

    I would guess that Kettle is one of the greatest open source data integration tools.
    I searched the internet for a comparison between Business Objects Data Integrator and Kettle but couldn’t find one.
    I need such a comparison for a study for my company, which is planning to implement a data integration tool.

    Any Help?

    Thanks.

  • Salah, why don’t you download both tools and see for yourself?
    What’s that you say? You can’t download BODI? I guess that’s a big difference already, isn’t it?

    Matt

  • Fred

    Matt,
    My company is in the beginning stages of implementing OWB. Could you give us some reasons why Kettle is better? Any information you could give might save us a lot of time and effort with OWB.

    Thanks
    Fred

  • Ouch Fred, do yourself a favor and read this blog entry: http://www.ibridge.be/?p=65
    If, against my advice, you still go for OWB, make sure you have someone with very good in-depth Oracle knowledge on your team.

    Matt

  • Lilia Muñoz

    Hello Matt, thank you for your blog, it is very helpful. I am looking for the metamodel of the Kettle tool; I need help.

  • yxskkk

    Hi Matt, I am Chinese. I want to learn Kettle and make friends with you, may I?
    My MSN is: yxskkk@263.net ~ hehe

  • Matt, fantastic work…!!! My name is Nicolas Nakasone, a BI consultant too. At this moment people here have not yet heard about open source, but with your innovation and joint effort, open source BI will conquer the whole world.

    Best Regards from Lima, Perú.

  • Babs

    Hi Matt,

    The blog is a great read, thanks!

    We are currently evaluating buying an ETL tool. What would you say to those (vendors) who say:

    1. Open source tools such as Pentaho, Talend, etc. are for small ETL jobs managing small volumes, not for enterprise-class ETL?

    2. If you have other tools from them in your environment, for example BOBJ BI and BOBJ Data Services, or IBM DataStage and Cognos, then the consolidation and integration of metadata from these tools allows for easier management, resulting in reduced workload and better data quality.

    3. How does a tool like Kettle address this issue when an organization has a different BI platform?

    Thanks,
    Babs

  • Hi Babs,

    At Pentaho we have been selling professional support and services for more than 2 years. In that period of time we’ve gathered a nice collection of customers. Every now and then we announce this over at Pentaho.com, so go there if you want to have a look at a few customer cases.

    Answer 1. Now that Pentaho Data Integration offers performance equal to or better than the commercial vendors (you should try for yourself!!), the only defense left for the vendors is FUD (Fear, Uncertainty and Doubt). We just had a customer of ours do their own benchmark against BODI, and they couldn’t find a situation where BODI was faster than PDI (PDI was at least 20% faster). In the customer references we have a testimony from a company that said they replaced OWB and saw performance go way up as well. Mind you, in a lot of these cases these companies would still have selected Pentaho Data Integration even if it had been 20-50% slower!

    Answer 2. Ironically, it’s not BOBJ, COGN, IBM nor INFA that are open in their specifications and metadata. Ask yourself this question: how is the inclusion of more closed software in your stacks going to improve transparency? Obviously, if you have a lot of money and don’t mind the perpetual vendor lock-in, it doesn’t matter. For a lot of organizations, completely open systems are the way forward.

    Answer 3. In large corporations with large deployments of proprietary data integration tools, the purchase and maintenance cost of the software is substantial. However, the invested cost in terms of time (work) is usually a lot more. As such, it becomes a huge vendor lock-in. For example, I’ve heard of a company that had their people work in shifts because they couldn’t afford any more DataStage workstation licenses. In the end, what it comes down to is that companies usually budget costs per project, and Kettle is then deployed for one small separate project (usually to see how well it works), and then another, and another. In these configurations, it works alongside the proprietary tools. The fact that Pentaho Data Integration is very easy to set up, configure and manage has something to do with it, I guess. The overhead of maintaining Kettle as an extra tool is far, far less than the purchase cost of additional licenses or paying more maintenance costs. Additionally, in the long run (4 years or more) it gives these organizations hope for better times.

    Take care,
    Matt

  • taoufiq

    Hello Matt,

    I just want to say bravo for this beautiful tool, Kettle; I really love it.
    I am in the process of convincing my superiors to opt for the open source Kettle solution instead of buying the other one.

    Just one question, if you can help me: what are the key points I absolutely need to raise with them to convince them?

    Thank you very much,

    Taoufiq

  • Hello Taoufiq,

    You can always send me an e-mail, or you can use our forum:

    http://forums.pentaho.org/forumdisplay.php?f=135

    Cheers,

    Matt

  • Jihong Liu

    Hi Matt,
    Does Pentaho Data Integration support transformation-level transactions now?
    I could not find this feature in version 3.1.0.

    Thanks
    Jihong

  • Sure it does, Jihong (hint: the “Unique connections” option in the transformation settings). However, please post your questions to our forum.
    Thank you for your understanding.

    Matt

  • Terry

    Hi Matt,

    I came across Kettle recently and realized that it is a great tool.
    Currently I am working in the field of distributed computing.
    My interest is in making distributed computing easy for users who are unfamiliar with it.
    I think Kettle is a good solution for that purpose.
    I’m trying to develop Hadoop (http://hadoop.apache.org/) components for Kettle, because Hadoop is one of the most famous distributed computing platforms.
    I will contact you again with some sample plugins.
    Any advice and comments are welcome.

    Thanks
    Terry

  • Hi Terry, for samples of plug-ins, you can visit the PDI Plugins page:

    http://wiki.pentaho.com/display/EAI/List+of+Available+Pentaho+Data+Integration+Plug-Ins

    Feel free to post more questions on our forum.

    All the best,
    Matt

  • taoufiq

    Hello Matt,

    I wonder whether there are any certifications for Kettle users.

    As a PDI integrator I am often asked whether I am Kettle-certified, since the market speaks the language of certification rather than competence.

    Cheers

  • Hi Matt,
    Did you have a blog post that said, basically “I’m a former OWB expert who was sick of OWB, so I created Kettle.”?

    I remember reading that, but I haven’t been able to find it anywhere.

    Thanks

  • Bernie, check the “Making the case for Kettle” post.
    I didn’t put it quite as colorfully as that, but it was indeed very much like that.
    During the first 4 months of the last project I did with OWB (9i), we got 6 (six) serious bugs *accepted* by Oracle. If you know how hard it is to get bugs accepted by Oracle, you know what I’m talking about :-)

  • ????????????????Matt? ??????? ?? wonderful tools, Matt, you are sooo cool!

  • Oh, so sorry, your blog has some problems with Chinese.

  • Shaheed Fazal

    Hi Matt,

    I was wondering whether Kettle can be used for the scenario below:

    – I have a master list of products
    – I want to match another list of products against it (the formats and codes are not the same)

    I was thinking of using some sort of fuzzy matching, but I want humans to verify each match, because I am dealing with drug names and if one character is out it can cause huge problems. Is this possible?

    Also, I get these lists daily, so will Pentaho store the accepted matches in some sort of index?

    Looking forward to a favorable response.

    Shaheed

  • Shaheed, you could use the “fuzzy match” step of PDI 4 in combination with some web logic, perhaps using the Pentaho BI server.

    Good luck,
    Matt

  • Razane

    Hi Matt,

    I need your help. I have to extract the content of a binary object, and I don’t know how to do it. I tried with Talend but I failed.

    Thank You,

    Razane.

  • Razane

    My question is: how do I extract a BLOB from my Oracle database?

    Thanks

  • Razane, you can ask your questions on the Kettle forum over here:

    http://forums.pentaho.org/forumdisplay.php?f=135

    Good luck,
    Matt

  • Leonardo Müller

    Hello Matt!

    I have worked with Kettle for a year and a half on a large project in Brazil. We’re developing for the highest courts of the country, with vast amounts of data and very complicated architecture and business rules. Caio Moreno Junior from Sao Paulo, who knows you, told me that perhaps it would interest Pentaho to use our project as a case study, due to its complexity. If you want more details, write to my e-mail and keep in touch!

    Leo

  • Mihai Manea

    Hello Matt
    I did an internship in one of Sybase’s branches, where I developed a Pentaho ETL plugin for bulk loading data into the Sybase IQ database. Now I am back at university, and because I liked Pentaho I would like to join the Pentaho community as a Java developer.
    Could you please give me some info about how I can contribute and which current community developers I could work with?

    Best regards
    Mihai Manea

  • Hi Mihai,

    Anyone can be a contributor. Simply create a JIRA case with your source code attached.
    If you want to contribute regularly then send me an email and we’ll set you up with write access to our repository.

    Thanks in advance for your help!

    Regards,
    Matt

  • Mihai Manea

    Hello Matt
    Regarding your proposal about contributing regularly: that would be great!
    Could you please give me a contact email where I can reach you?

    Best regards
    Mihai Manea

  • Krzysztof Radecki

    Hello Matt,

    I believe I found an error in your book (“Pentaho Kettle Solutions”). On page 234, under ‘General Information’, the following can be found: “Note that the Commit size is only applicable in the lookup mode”. Shouldn’t it say “…in the insert mode”?

    Best Regards,
    Krzysztof Radecki

    BTW, the book is really great.

  • Idan Koch

    Hi Matt,

    I hope you can help. I’m trying to run a transformation (a .ktr file) on JBoss 5 + Kettle 4.
    When trying to invoke the transformation on the server using:
    transMeta = new TransMeta(filename);

    I get a VFS file-not-found error:
    org.pentaho.di.core.exception.KettleXMLException:
    Error opening/validating the XML file ‘/eventsFileToDB.ktr’!

    Unable to get VFS File object for filename ‘/eventsFileToDB.ktr’ : Could not find file with URI “D:\eventsFileToDB.ktr” because it is a relative path, and no base URI was provided.

    What do I need to do to configure this?

    H-E-L-P
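
    For readers hitting the same VFS error: the message says the filename was resolved as a relative path with no base URI, so the usual fix is to pass a fully qualified file URI rather than a bare filename. A minimal sketch:

        // Sketch: use a fully qualified VFS URI instead of a bare filename.
        TransMeta transMeta = new TransMeta("file:///D:/eventsFileToDB.ktr");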

  • Hi Krzysztof,

    You are right, of course. Thank you for your correction & compliments!

    Matt

  • Hi Matt,
    thanks for having conceived Kettle, an excellent tool that significantly reduces the development time of a BI solution. I started using Kettle and the Pentaho suite a few weeks ago myself. Your book “Pentaho Kettle Solutions” is excellent.

    My Blog: http://musarra.wordpress.com

    Bye,
    Antonio.

  • Hi Matt

    I am looking for someone with the following skills in the New York, New Jersey, Pennsylvania area. Please let me know if you know anyone.

    Education: Bachelors in Engineering or Equivalent

    Essentials: Excellent communication skills

    Years of relevant experience: 6 or more

    Technical skills:

    1. Strong fundamentals in ETL concepts

    2. Strong in database concepts

    3. Working knowledge of Pentaho Business Intelligence Platform

    4. Working experience with ETL tools, with experience in Pentaho Kettle / Pentaho Data Integration.

    5. Should have prior experience in data modeling, data mapping, and data conversion/migration

    6. Working experience using Pentaho features like Reporting, Analysis, Dashboards

    7. Proven experience in creating, editing, updating, and publishing OLAP schema definitions

    8. Experience in portal integration, metadata management, integration of security

    Nice to have:

    1. Experience in architecting and designing data warehouses.

    2. Strong expertise on Data Modeling and Data Analysis

    3. Knowledge in integrated report bursting

    4. Knowledge in role based security and mapping users with user groups

    5. Experience in administration and configuring / maintenance of multiple Pentaho Environments (Dev/QA/Prod)

    6. Onsite / Offshore management

    7. Knowledge of off-the-shelf products like BO, Cognos, etc.

  • Hi Anil, feel free to post job offers on the Pentaho Data Integration forum.
    Thanks,
    Matt

  • Hello Matt,
    I just started using Kettle and love it. I’m still a rookie at it but getting much better. I wanted to get your opinion on the best approach to loading lots of files in parallel. I have thousands of files that need to get loaded daily into an Oracle database. I’d like to do everything in Kettle (as a long-time Oracle DBA I know I could use parallel direct path loading), but I like the management and ease of use of Kettle.

    How would I get the best performance with a clustered solution? Do I need to split the files across all of my servers before loading? (I.e., can the load be done in parallel if the files only exist on one server, or do I need to spread them across all the nodes in my cluster?)

    Thanks again, I appreciate everything you’re doing for Kettle. It makes things much easier for me.

  • Karthick

    Hi Matt,

    Kettle is great, but I find the documentation to be very sparse. I don’t see any document on how to use steps like JSON Input and JSON Output. I ran into an issue with this too.

    Could you help me solve the issue I have posted at http://forums.pentaho.com/showthread.php?82056-JSON-Input-step-for-Mongodb-MySQL&p=256669&posted=1#post256669

  • Dear Kettle Master Matt,

    I am facing a problem which has troubled me for a long time and is beyond my ability. It would be very kind of you to solve it, and the problem is as below:

    1. I configured the DSN on my computer, using the Webtrends ODBC driver (I need to export the data from Webtrends, which provides the ODBC export driver).
    2. I created an “ODBC” connection and chose “Generic database”. The connection tested successfully.
    3. There is an error when I run the SQL SELECT statement:
    Unable to retrieve database information because of an error Error occured while trying to connect to the database Error connecting to database: (using class sun.jdbc.odbc.JdbcOdbcDriver) [WebTrends ODBC Driver] Failed to connect to Database using this connection string Reason: CoInitialize has not been called

    In addition, the version of Kettle is PDI-CE-4.1.0-Stable.

    Sincerely
    Allen

  • Folks, please give it up. This is my blog, not a support forum.

  • Sorry! And thank you all the same for what you share on your blog.

    Allen

  • Hi Matt,

    There are some guys in our company working with Kettle, and I now have a question about Kettle 4.0: why did you move all valid uses of BaseStep into StepInterface and stop extending Thread? Could you please describe the advantages and disadvantages? Thanks, thanks!

  • Hi Matt –
    I am certainly a long-time user of Kettle (PDI) and I rarely get the opportunity (time) to post to blogs, but I just wanted to say that I have truly been blessed by your contribution of Kettle. Hands down, Kettle was the absolute catalyst for our success here at Loma Linda. The rest of the Pentaho suite was just the icing on the cake. It literally saved our organization’s project from falling off the budget bandwagon. When I saw that HL7 had been incorporated into the next Kettle release, I almost fell back in my chair. You and the rest of your talented cohorts ROCK!!!! Thanks for your never-ending passion to make Kettle the absolute BEST ETL tool available.

    Cheers,

    Darrin

  • HT

    Hi Matt

    Hope you’re not tired of compliments: PDI and your blog rock!

    I just saved HOURS of work.

    Thanks!!

    HT

  • lemon

    Hi, you look so young. How old are you, if I may ask? You did a great job, thank you.

  • Hello Matt
    In my company we have PDI 3.2, for which we created several plugins.
    Now we wish to make our plugins compatible with the latest version, 4.2, and afterwards give these plugins to the community for free.

    The reason I am contacting you is that the arrangement of Java classes in the packages is different in 4.2 compared to 3.2.
    Due to this fact, we encounter problems re-mapping our plugin classes to the PDI core classes of PDI 4.2!

    Could you please tell me where I can find info about how the classes are arranged in 4.2 compared to PDI 3?

    Thank you and keep up on doing a good job.
    Best regards
    Mihai Manea

  • Macin

    Hey Matt,

    Macin here again!
    Does PDI 4.2 have the possibility of passing parameters and outputting data in a browser? (Like you can already do with transformations.)

  • Anything is possible with JavaScript, Macin. You can write anything you like to your browser using the approach in the “Kettle data in a browser” post: http://www.ibridge.be/?p=199

  • Chris

    Matt, I’m working through the “Pentaho® Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration” book. I’m looking for the Sakila DWH files, but they don’t seem to be on the Wiley website. Am I missing something?

  • Hi Chris,

    They’re in the download offered by Wiley over here: http://is.gd/mEii13
    Look in the ZIP file under chapter 4, you’ll see the Sakila DWH file archive(s).

    Matt

  • Chris

    Thanks! Found them.

    Chris

  • Jim

    Did you ever get a satisfactory answer for PDI-325 (“Parameter 1 is the return clause of the stored procedure call, it can only be registered as an integer type”)? I am getting that in Pentaho, calling a stored procedure on z/OS. It seems similar to other driver-related issues I have searched for. The call worked over a year ago, but not now.

  • manoj

    Hi Matt,

    We are trying to run a Pentaho transformation from Java. The transformation has an HBase Input step (reading data from HBase) and stores the data in a file after sorting it in ascending order.
    But because of the HBase input it shows the following error:

    INFO 25-04 18:53:29,411 – Using “/tmp/vfs_cache” as temporary files store.
    org.pentaho.di.core.exception.KettleXMLException:
    Error reading object from XML file

    Unable to load step info from XML step nodeorg.pentaho.di.core.exception.KettleStepLoaderException:
    Unable to load class for step/plugin with id [HBaseInput]. Check if the plugin is available in the plugins subdirectory of the Kettle distribution.

    Unable to load class for step/plugin with id [HBaseInput]. Check if the plugin is available in the plugins subdirectory of the Kettle distribution.

    at org.pentaho.di.trans.TransMeta.loadXML(TransMeta.java:3422)
    at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2959)
    at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2924)
    at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2911)
    at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2888)
    at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2863)
    at PentahoTest_pkr.main(PentahoTest_pkr.java:72)
    Caused by: org.pentaho.di.core.exception.KettleXMLException:
    Unable to load step info from XML step nodeorg.pentaho.di.core.exception.KettleStepLoaderException:
    Unable to load class for step/plugin with id [HBaseInput]. Check if the plugin is available in the plugins subdirectory of the Kettle distribution.

    Unable to load class for step/plugin with id [HBaseInput]. Check if the plugin is available in the plugins subdirectory of the Kettle distribution.

    The versions of HBase and Hadoop used are as follows:
    HBase: hbase-0.90.4-cdh3u2
    Hadoop: hadoop-0.20.2-cdh3u2
    Eclipse: version 3.3.2

    Please let me know of any solution as soon as possible.

  • You either didn’t initialize Kettle properly in Java (KettleEnvironment.init()) or, more likely, you are including the Kettle plugins on the Hadoop nodes while also using the new distributed cache at the same time.

    Why don’t you post your question on the Pentaho Big Data forum : http://forums.pentaho.com/forumdisplay.php?301-Big-Data

  • Madhan

    Hi Matt

    I am very new to Pentaho and am trying to write transformations for some of my business needs. I was able to move data from source tables to other tables that reside in SQL Server.

    However, when I try to move data into a table which has an identity column, I get the error “IDENTITY_INSERT is set to OFF”, which I totally understand. I introduced a new Execute SQL step in between to set identity insert ON, but it still gave me the same error.

    Can you please tell me what is wrong here? And also point me in some direction regarding a workaround.

    Thanks,
    Madhan

  • Hi Madhan,

    Can I ask you again to please post support questions either on the forum or to the Pentaho support team?

    Thanks for your understanding. You can find the Kettle forum here: http://forums.pentaho.com/forumdisplay.php?135-Pentaho-Data-Integration-Kettle

    Best of luck,

    Matt

  • Sta

    Matt,

    I’m looking for trivia to run on the screen during a presentation about the use of PDI at our company, and I was trying to find in your book the reason for the use of the ‘kitchen’ terminology for so many of the names of things in PDI. Can you refresh my memory as to why you chose that terminology?

  • Hi Matt,

    It’s good to meet someone keen in this whole area. I worked with Intalio for a stretch in 2003/2004 doing consulting, tech pre-sales and training in Australia, New Zealand, Singapore. I have sat on an expert panel on business process improvement at Queensland University of Technology in Australia too.

    I am working (consulting) at a wonderful not-for-profit in Brisbane, Australia now. They help heaps of individuals and families Australia wide and further afield. They have over 540 limited range radio stations, devotional publications and an on-line book shop. They are UCB Australia. http://www.ucb.com.au. I’ve really come to appreciate these guys the past 5 months and would dearly like to help them well… which brings me to the purpose of contacting you…

    We have a requirement for systems integration. We have a .NET app in the cloud (OrderManagementSystems), a J2EE app on our server (KonaKart), a CMS we are choosing (or trying to!), MPX Donations/CRM (Win client/server) and some other odds and ends. I have looked at Talend, Jitterbit, Pentaho, Centerprise and more. I’m struggling to find an equitable way forward that will provide us with the ability to integrate DB & web services (and possibly message queues) across these platforms. I saw it alluded to that Kettle has a web service plugin, but I am having trouble getting clarity on the non-commercial stack here. If vanilla Kettle has extensions available, I sure would like to know!

    Being a not-for-profit, these guys are a bit cash constrained. We need an integration engine for the project I am on, and furthermore it’s probable my next project will look at business processes and enterprise integration. I’d really like to get Pentaho’s community versions working well for UCB and see where it goes from there.

    Are you able to clarify my path in using Pentaho, please? I have installed Pentaho on my Ubuntu Linux 64 box at my home office, and the browser is coming up OK. My guess is I try to start the GUI from within the software tree somewhere now that the services are up? Is Kettle the open-source version, or is Pentaho open-source different? Where should I go for downloads, forums etc.? Where are the best tutorials?

    I talked to Zachary Zuess the other day in AU, but he is of course focussed on commercial stuff, and is not into the open-source side of things. I’d really appreciate some brief hand-holding to assist these guys if possible.

    Yours sincerely,
    Steve Barnes
    Business Geeks Alliance (Australia)

  • Mike

    Hey Matt,

    Just wanted to reach out and express my gratitude for such an amazing product, benevolently released to the world. I have saved ungodly amounts of time using it in just a couple of months, and done other things I never would have been able to accomplish otherwise. I know you’ve been at this for a while, but I think Kettle will ultimately be as ubiquitous as Excel is now. I know you guys at Pentaho must be making a killing on large enterprise sales, but I honestly think bundling this up as a stand-alone app for, say, $299 could easily make up for lost unit revenue with massive volume.

    All the best,

    Mike

  • Stefan Badenhorst

    Hi Matt.

    I have a quick question on PDI.
    We are not using PDI at the moment, and I would like to know if the following is possible and whether you would recommend it:
    As the first input step, we want to create a plugin that is also a UDP listen server.
    We have devices that send UDP messages. We want to listen for these messages, then process them through various steps and finally output them into a database table.

    Thanks for taking the time to answer my question.

  • manisha k

    How should I configure a dynamic database connection in Pentaho? As we are migrating from Postgres to MSSQL, this should work for connecting to both Postgres and MSSQL.

    This dynamic configuration should also allow me to change between dev/QA/prod databases at runtime.

    How do I maintain the two database configurations?
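
    For readers with the same question: Kettle connection settings accept ${VARIABLE} placeholders, and such variables can be defined per environment in the kettle.properties file in the .kettle directory. A sketch of the idea, with made-up variable names:

        # ~/.kettle/kettle.properties on the DEV machine (names are examples)
        DB_HOST=dev-db.example.com
        DB_PORT=5432
        DB_NAME=appdb
        DB_USER=etl
        DB_PASSWORD=secret

    In the database connection dialog you would then enter ${DB_HOST}, ${DB_PORT}, and so on instead of literal values; switching between dev/QA/prod becomes a matter of changing the properties file (or setting the variables at runtime). Note that switching the database *type* (Postgres vs. MSSQL) still requires two connection definitions, since the driver differs.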

  • Tim D

    Hi Matt,

    Any estimate on when there might be a “continent_code” in your MaxMindGeoIPLookup Java API?

    https://github.com/mattcasters/MaxMindGeoIPLookup/blob/master/README.txt

    Thanks.

  • As always, create a JIRA case for it and we’ll do our best. Since there is no case for it, the answer to your question is simply: no. :-)

  • Hi Matt,

    Are you aware of anyone who has interfaced with Hypertable? We’d like to use Data Integration with our Hypertable-resident data. Thanks,

  • Hi,

    How do I stream Twitter data using Pentaho? I tried some other resources but it doesn’t work. I am new to Pentaho; please help me, step by step, to stream real-time Twitter data.

    Regards,
    gopal

Dejan Prokić

    Hi Matt,

    I am working with Kettle and have a problem with Impala integration. I saw there are bugs open for their JDBC driver. It seems that you already have a solution; I have seen that you attached a patch to the task (https://issues.apache.org/jira/browse/HIVE-4806). I would like to know whether you have a fix for the Impala JDBC driver and whether there is a way I can fix it myself, maybe by building the driver with the patch you supplied. I see that the latest version of Kettle (4.4.2-GA) does not work with Impala. Can you tell me whether version 5.0 will work with Impala and when it is planned to be released?

    Best regards and thank you in advance,
    Dejan

Dejan Prokić

    I managed to merge your patch with the main branch from the Apache repo on GitHub, and it works :)

  • Karthik

    Hi Matt,

    I would like to know a few things.
    How can you expose a PDI/Kettle job as a web service, so that it can be called through REST or SOAP to execute the job? And if it is called as a web service, how would you provide parameters? An example could be loading some data into a DB when an external application calls the web service.

    BR,
    Thanks
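
    For readers with the same question: Kettle ships with Carte, a lightweight web server for remote execution, which exposes HTTP service endpoints. A sketch of triggering it from Java follows; the endpoint and parameter names below are assumptions for illustration, so check the Carte documentation for your version:

        // Sketch: trigger a job on a Carte server over HTTP (endpoint names assumed).
        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import java.net.HttpURLConnection;
        import java.net.URL;

        public class CarteCall {
          public static void main(String[] args) throws Exception {
            // Hypothetical endpoint and parameters; values are usually passed on the URL.
            URL url = new URL("http://carte-host:8081/kettle/executeJob/"
                + "?job=/jobs/load_data.kjb&PARAM_DATE=2013-01-01");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();

            // Carte is protected with HTTP basic authentication.
            String auth = javax.xml.bind.DatatypeConverter
                .printBase64Binary("cluster:cluster".getBytes("UTF-8"));
            conn.setRequestProperty("Authorization", "Basic " + auth);

            // Carte replies with an XML status document.
            BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
              System.out.println(line);
            }
            in.close();
          }
        }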

  • Steven Gimenez

    Any security patches dealing with Insecure Temporary Files, CRLF Injection, or Information Exposure Through Sent Data? I have been trying to do a PoC with the Kettle product and it has been flagged in our security audit.

    Thank you,
    Steve

  • Yes, they went into 4.4.1 and 4.4.2, and into the patch levels delivered to our customers as well (4.4.1.1, 4.4.1.2 and so on). The upcoming 5.0.0 will also contain the apparently required ESAPI integration for all service points.

    That being said, I’m convinced most of these automatic scanners are full of it. In the case of ETL you either trust what’s being executed or you don’t. It’s as simple as that.

  • Thanks, Matt, for your work on ETL and Kettle. I am from China, where many people are starting to use Kettle in their BI solutions. I am currently on one right now; it’s a project for a bank to produce regulation reports.
    I am reading your book, and I wonder whether any Chinese version is already available or under construction. If by any chance I can get permission from you (and your co-authors) and the publisher, I would like to attempt the translation and eventually bring it to all Chinese readers.
    I sincerely hope I’ll receive your reply.

    A million thanks!

    • Hello Huan Yang, thank you for your kind comments. Translation of Pentaho Kettle Solutions into Chinese has come up before but until now we haven’t been able to get approval from Wiley.
      Best of luck with your Kettle projects!

      Matt

  • mahesh

    I am trying to access data from Cassandra…

    An unexpected error occurred in Spoon:
    Could not initialize class org.apache.thrift.transport.TSocket
    java.lang.NoClassDefFoundError: Could not initialize class org.apache.thrift.transport.TSocket
    at org.pentaho.cassandra.legacy.CassandraConnection.openConnection(CassandraConnection.java:234)
    at org.pentaho.cassandra.legacy.CassandraConnection.checkOpen(CassandraConnection.java:151)
    at org.pentaho.cassandra.legacy.CassandraConnection.setKeyspace(CassandraConnection.java:174)
    at org.pentaho.cassandra.legacy.LegacyKeyspace.setKeyspace(LegacyKeyspace.java:93)
    at org.pentaho.cassandra.legacy.CassandraConnection.getKeyspace(CassandraConnection.java:277)
    at org.pentaho.di.trans.steps.cassandrainput.CassandraInputDialog.popupSchemaInfo(CassandraInputDialog.java:926)
    at org.pentaho.di.trans.steps.cassandrainput.CassandraInputDialog$12.widgetSelected(CassandraInputDialog.java:518)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1227)
    at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7368)
    at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:8673)
    at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.pentaho.commons.launcher.Launcher.main(Launcher.java:134)

    Please, can anyone solve this problem? I am not getting a solution on the Pentaho forums either…

  • Alan George

    Sir,

    I have to do a presentation on Pentaho at college. I would be very grateful if you could just tell me how it evolved into a big database.

    I hope you acknowledge my request.

  • Alan, Pentaho is not a Big Data database, but it is capable of working with Big Data stores.
    As always, please ask your questions on the Kettle forum, folks –> http://forums.pentaho.com/forumdisplay.php?135

  • Thomas

    Hey Matt,

    I have watched your YouTube videos on the wonderful “checkpoint” restartability feature for PDI. As mentioned in the Kettle wiki, it is integrated into PDI 5.0. Unfortunately, I can’t find the context menus and options mentioned on the wiki page in PDI 5.0 CE. Is this feature an enterprise-edition-only thing? Perhaps I am simply too stupid to find it ;-)

    Thanks in advance

    Thomas

  • Lise

    Hi Matt,

    I’m looking into how to convert PDI v3 plugins to v5. I only found the v3-to-v4 doc at http://wiki.pentaho.com/display/EAI/Converting+your+PDI+v3+plugins+to+v4

    Could you help me?

    Thanks in advance
    Lise

  • Version 4 plugins are compatible with v5.

  • Veera

    Hi Matt,

    I am evaluating this tool for some of our projects. As part of this PoC, I have seen excellent performance compared with the previous tool.

    My scenario is this: my input file format is not fixed (even the column names, data types, number of columns, etc. vary). How can I make my schema dynamic? I have seen examples of creating dynamic transformations using the Java API. Is the Java API the only option for creating n transformations at run time based on the input file format?

    Thanks
    Veera
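
    For readers wondering what “creating dynamic transformations using the Java API” looks like: a transformation can be assembled in code instead of loaded from a .ktr file. A minimal sketch follows; the Dummy steps only show the wiring, and a real solution would configure input/output step metadata to match the file layout detected at run time.

        // Sketch: assemble a transformation programmatically.
        import org.pentaho.di.core.KettleEnvironment;
        import org.pentaho.di.trans.Trans;
        import org.pentaho.di.trans.TransHopMeta;
        import org.pentaho.di.trans.TransMeta;
        import org.pentaho.di.trans.step.StepMeta;
        import org.pentaho.di.trans.steps.dummytrans.DummyTransMeta;

        public class DynamicTrans {
          public static void main(String[] args) throws Exception {
            KettleEnvironment.init();

            TransMeta transMeta = new TransMeta();
            transMeta.setName("generated-at-runtime");

            // In a real scenario these would be, e.g., a file input step configured
            // from the detected columns and a table output step; Dummy steps are
            // used here only to show how steps and hops are wired together.
            StepMeta input = new StepMeta("input", new DummyTransMeta());
            StepMeta output = new StepMeta("output", new DummyTransMeta());
            transMeta.addStep(input);
            transMeta.addStep(output);
            transMeta.addTransHop(new TransHopMeta(input, output));

            Trans trans = new Trans(transMeta);
            trans.execute(null);
            trans.waitUntilFinished();
          }
        }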

  • Eliana

    Hi Matt,
    I am from Peru, and I am also new to BI topics. I have worked as a Java developer for a long time and I want to get started in the BI world, so I will begin by reading your articles first. :)
