Archive for the 'Open Source' Category

May 6th 2010

Book Review : Pentaho 3.2 Data Integration

Dear Kettle fans,

A few weeks ago, when I was stuck in the US after the MySQL User Conference, a new book was published by Packt Publishing.

That all by itself is not too remarkable.  However, this time it’s a book about my brainchild Kettle. That makes this book very special to me. The full title is Pentaho 3.2 Data Integration : Beginner’s Guide (Amazon, Packt).  The title all by itself explains the purpose of this book: to give the reader a quick start when it comes to Pentaho Data Integration (Kettle).

The author María Carina Roldán (blog, twitter) is a seasoned BI consultant and a valued member of the Kettle community. Besides her frequent appearances on our forum, she is appreciated by many for the time she spent on the Kettle Tutorial.

I’m not going to go over the detailed table of contents.  Since I wrote the foreword of the book, I’m sure you’ll agree I’m somewhat biased. However, in all objectivity, the book covers what it claims to cover: it does help the PDI/Kettle beginner tremendously.  It covers all you need to get started and then some: the installation of PDI, the typical “Hello World” setup of PDI, reading text files, calculating, scripting, databases, repositories, etc.  As the title indicates, this book covers the current 3.2 stable release of Kettle, not the upcoming 4.0 release. However, as far as 99% of the topics covered are concerned, that shouldn’t make too much of a difference.

So obviously I can recommend this book very much. It’s a time-saver for those that are starting with PDI.  For those that have dabbled with Kettle before I must say that María packed the book with nice tips and tricks so I’m sure you’ll be able to learn a thing or two.

Until next time,

Matt

6 Comments »

December 8th 2009

Open Source BI : Pentaho Rules!

Dear Pentaho friends,

I just wanted to share some good news with you regarding a new study that got published today.

Mark Madsen, an independent business intelligence analyst, together with the BeyeNETWORK took an in-depth look at what is going on in companies with respect to open source BI.**

You can find the complete report over at the BeyeNETWORK, but here is the graph that particularly interested me:

My congratulations to the Pentaho community for pulling this off.

Until next time,
Matt

** This study was sponsored by JasperSoft, KickFire, Talend and Pentaho

4 Comments »

October 30th 2009

Book Review : Pentaho Reporting 3.5 for Java Developers

Hi Pentaho fans,

These are exciting times for Pentaho for sure.  These are also extremely busy times.  However, that doesn’t mean we can’t look around once in a while.  Today we’ll take a quick look at a new book that arrived on my doorstep a few weeks ago.  It’s titled

Pentaho Reporting 3.5 for Java Developers

I’m very pleased to be able to review this book as it is written by one of the smartest but more importantly also one of the nicest people at Pentaho: Will Gorman.  Not only that, he apparently had help from KC (Kurtis Cruzada) and Jem (Matzan) completing the dream team for this book.

And what a great book it turned out to be.  It covers pretty much everything from basic reporting, over mobile reporting, calculations and formulas, sub-reporting, cross-tabs, charting down to the Java API.

Obviously, this book has been reviewed many times before by various people and websites. (Yes, it’s that popular)   To me that means that I can’t just do a quick review, I’m going to have to actually use and read the book.  And that’s what we’ll do today for this review.

We’re going to create a report in the form of a PDF.  The data for the report comes from a Kettle transformation.  We’re going to do it with my favorite programming language (Java) and a complete stack of Open Source Software…

I began by creating a new Eclipse project called KettleBook; you can download the source over here.
To make sure I didn’t miss any library dependencies, I used the complete “lib” folder of Pentaho Report Designer 3.5 as my class path. (not included in the download)

First, I went to Chapter 10 in the book and started reading the paragraph titled “Building a report using Pentaho Reporting’s API” as that seems to fit the bill. (page 266)

That part explains plain and simple how to create a new Master Report and how data sources work.  But wait, I don’t want a DefaultTableModel, I want to read from Kettle!  Well, a few page flips later we find ourselves on page 143 reading about the KettleDataFactory.  That got me quite far actually as the sample is quite descriptive.

So then I created a small transformation to read from a sample customer file using Pentaho Data Integration 3.2.  This is it:

It reads 100 rows of sample customer data and filters out the people from California, Florida and New York state.  That gives us 91 records.  We’re going to read from the RESULT step placeholder.
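For readers who don’t have Kettle at hand, the filter logic can be sketched in plain Java.  This is not how the transformation itself is built (that happens graphically in PDI); the class name, the column layout and the way the states are spelled are all assumptions made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StateFilterSketch {

    // Assumed spelling; the real sample file may abbreviate or spell states differently.
    static final Set<String> EXCLUDED_STATES =
            new HashSet<String>(Arrays.asList("California", "Florida", "New York"));

    /** Returns the rows whose state column is NOT one of the excluded states. */
    static List<String[]> filterOutStates(List<String[]> rows, int stateColumn) {
        List<String[]> kept = new ArrayList<String[]>();
        for (String[] row : rows) {
            if (!EXCLUDED_STATES.contains(row[stateColumn])) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up rows (name, state), not the actual sample data.
        List<String[]> rows = Arrays.asList(
                new String[] {"Alice", "Texas"},
                new String[] {"Bob", "Florida"},
                new String[] {"Carol", "Ohio"},
                new String[] {"Dave", "New York"});

        // Only the rows that survive the filter are printed.
        for (String[] row : filterOutStates(rows, 1)) {
            System.out.println(row[0] + " (" + row[1] + ")");
        }
    }
}
```

In the real transformation the surviving rows simply flow on to the RESULT step, which is where the report picks them up.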

The part on page 147 I needed was this block:

KettleTransFromFileProducer producer = new KettleTransFromFileProducer("Customer data", transFile, stepName, "", "", new String[0], new ParameterMapping[0]);
KettleDataFactory factory = new KettleDataFactory();
factory.setQuery("default", producer);

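This block describes the data source to the reporting engine: the producer points at the transformation file and the step to read from, and the data factory makes it available under the query name “default”.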

I then proceeded on page 269 and put a document header and footer on the report and an item band.  Then I put 4 columns on the page and the report was written.  This took me all of about 30 minutes. The nice folks at Pentaho Orlando will have to forgive me, reporting is not my specialty. Personally I was quite pleased that it was that easy to do.

So, with the report definition ready, I now wanted to create an actual PDF out of that.  More reading revealed that we needed a PDF output processor (to generate the actual file) and a pageable report processor to paginate and process the report definition.  This is how it looks in my case:

FileOutputStream fos = new FileOutputStream("files/output.pdf");
DefaultConfiguration configuration = new DefaultConfiguration();
PdfOutputProcessor processor = new PdfOutputProcessor(configuration, fos);
PageableReportProcessor reportProcessor = new PageableReportProcessor(report, processor);
reportProcessor.processReport();

5 lines of code to generate a PDF! Suffice it to say I was very happy.

In total I spent a little over an hour to produce this document:

It’s quite simple: if it weren’t for the book I would have had a really hard time figuring out where to begin.  I probably would have had to talk to Thomas Morgner, the brains behind Pentaho Reporting.  Nice fellow as he is, communicating with him is not for the faint-hearted. (Fortunately he recently moved to Ireland so things will get better soon)

All joking aside, if you are planning to create reports using the Java API, do yourself a favor and buy this book right away.  Even if you’re not going to use the API, Pentaho Reporting principles and concepts are explained in great detail.

Many thanks to Packt Publishing for sending me the book to review and congratulations to Will Gorman and the reviewers for an excellent job.  Congratulations to Thomas and his community too for making Pentaho Reporting 3.5 a smash hit.

Until next time,
Matt

P.S. Obviously, I’ll be covering more of this Java API sample at the upcoming Devoxx conference in Antwerp.

1 Comment »

October 24th 2009

My new netbook…

Dear Linux fans,

Last weekend I saw an ad for a netbook in a Carrefour superstore leaflet that I guess was just too good to refuse.

Unlike other netbooks, this one was priced really low: €199.00 (including taxes, which makes it cost my company €164.46 or about 200 USD).  For me, that’s the price point where a netbook makes sense, not the €400-500 you see all over the place.
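For the curious, the tax-free price can be reproduced with a few lines of Java.  This is only a sketch: the 21% VAT rate is an assumption that happens to match the two amounts above, and the class and method names are invented for the example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class VatSketch {

    /** Strips VAT from a tax-inclusive price and rounds the result to cents. */
    static BigDecimal priceExcludingVat(BigDecimal priceInclVat, BigDecimal vatRate) {
        BigDecimal divisor = BigDecimal.ONE.add(vatRate); // e.g. 1.21 for 21% VAT
        return priceInclVat.divide(divisor, 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // Assumed VAT rate of 21%; the shelf price is the one from the leaflet.
        BigDecimal net = priceExcludingVat(new BigDecimal("199.00"), new BigDecimal("0.21"));
        System.out.println(net); // prints 164.46
    }
}
```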

Now, for that low price, you get the following machine:

  • 1.6GHz VIA C7-M CPU
  • 512MB RAM (DDR2 667, shared with video, 384 available)
  • 120GB hard disk (2.5″, 7200rpm)
  • 1024×600 LCD screen (pretty good quality actually)
  • Webcam
  • WIFI b/g
  • 2xUSB 2.0
  • VGA port
  • a multi-format card reader (SD, SDHC, MMC)
  • Microphone
  • Sound in/out
  • Mandriva Linux 2009.1

It was very interesting to see that “Windows 7 Home Premium” was priced at exactly the same price.  Talk about a total waste of money on the Microsoft side.

OK, back to the netbook.  The memory issue is not a problem.  I already ordered a 2GB DDR2 RAM module for the machine at €39.

UPDATE 10/27 : the RAM arrived, was installed in 5 minutes and all works fine now.  With 1.9GB available the machine is a lot snappier too.

Performance is obviously not stellar but I didn’t expect that either.  I paid less for it than for my current cell phone.  However, it plays full screen AVI without a glitch.

The only real problem the box has is that it comes with … Mandriva Linux.  Maybe I’m spoiled by years of Ubuntu use, but this distribution really sucks.  Can I please just install some software, customize the UI a bit?  Please?  I don’t recall the last time I couldn’t install a piece of software on Ubuntu because a package couldn’t be downloaded.  WTF?  And charge €28 just to get a couple of codecs to play audio/video? I can legally use these codecs in Europe without a problem.

Don’t get me wrong, all hardware is supported and works fine, including audio, the webcam, skype, flash, etc.

Anyway, I tried to put Ubuntu Netbook Remix 9.04 on it by booting from a USB stick.  Unfortunately, either the image or the stick has an issue since it freezes upon installer boot.  The live system boots but has a nasty video problem.  So I’m going to retry later next week.  Heck, maybe it’s better to just wait until Kubuntu 9.10 Netbook Remix comes out next week.

Feel free to leave advice on what distro to pick and how to best handle the install.  Also feel free to leave tips on how to explain to the kids that this is not a toy.

Thanks in advance!

Cheers,

Matt

8 Comments »

July 20th 2009

The kindness of strangers

Dear Kettle fans,

There isn’t a week that goes by where I don’t find myself amazed by the number of contributions and help that the Pentaho Data Integration project receives in all kinds of forms.  There are people contributing anything from small patches to complete steps, folks helping out others on the forum, writing documentation, writing books, translating PDI, etc.  Without any question, this has been a truly amazing experience, not just for me but for the whole Kettle project.

It’s because of that overwhelmingly positive experience that I’ve always tried to be accessible and in contact with my community in all sorts of possible ways.  And because of that positive vibe I have refrained from commenting on the negative flip side to that story for the longest time.

The problem is really that lately things have been changing.  It’s probably caused in general by an increasing attention to open source and specifically by an increase in popularity of Kettle.  In any case, certain types of people do the following:

  • Send me personal email
  • IM me on skype/Yahoo!/MSN/AIM/…
  • Send me all sorts of messages and questions through the forums
  • Ask questions on this blog

Usually it’s a combination of any of the above.  Any time now I expect folks to be sending me direct twitter messages.  The questions are always the same:

I have an urgent Pentaho problem.  I am incapable of using the forum for some stupid reason and so you have to help me, preferably now or within the next 15 minutes!!!!

This way, the meaning of “The kindness of strangers” becomes more and more like the one from the Nick Cave song.

I’ve just finished reading Linus’ book “Just for fun” (Thanks again Domingo!) and his approach to the problem of staying in reach for people to contribute code and at the same time allowing yourself to have a life and a job is simple : if it ain’t fun, don’t do it.  Well, the barrage of these sorts of questions stopped being fun for me a long time ago.

As such, I’m going to try this approach: any question that could or should be asked on the forum is from now on silently ignored and deleted from my mailbox.  Any person that is not part of my “community” and that needlessly contacts me over IM gets blocked indefinitely.  And yes, that goes for twitter as well.  Off-topic questions on this blog go to the spam folder as well.  I will simply refuse to spend time on non-interesting topics.

I thought about creating a standard response e-mail, but any sort of replying is simply an encouragement to certain types of people and will only make matters worse. (been there, done that)

I’m sure everyone understands that this is the only way to free up time to work on the real problems at hand.  Thank you for your understanding in any case.

Until next time,

Matt

8 Comments »
