Archive for the 'Databases' Category

May 28th 2009

Google Goodies and Lego

Dear Kettle friends,

Will Gorman and Mike D’Amour, Senior Developers at Pentaho, are presenting Pentaho’s Google integration work at the Google I/O Developer Conference. (at the Sandbox area to be specific)   Yesterday, Pentaho announced that much.

Here are a few of the integration points:

  • Google maps dashboard (available in the Pentaho BI server you can download)
  • A new Google Docs step was created for Pentaho Data Integration Enterprise Edition
  • Running (AVI, 30MB) the Pentaho BI server on Android
  • A new Google Analytics step was created for Pentaho Data Integration Enterprise Edition
  • Since version 2.0, the Pentaho BI server depends heavily on Google Web Toolkit (GWT)

To top that off, Will twittered about this new Lego bar-chart + logo they created for the conference:

UPDATE: now with building instructions and action video!

We are all soooo proud of them!

Until next time,

Matt

No Comments yet »

December 11th 2008

Kettle at the MySQL UC 2009

Hello Kettle fans,

Like Roland I got confirmation earlier this week that I could present my talk on “MySQL and Pentaho Data Integration in a cloud computing setting”, at the next MySQL user conference.

I’m very excited about the work we’ve done on the subject and it’s going to be great talking about it in April.

See you there!
Matt

4 Comments »

July 18th 2008

Pentaho changes

I’m back at my favorite spot at the Orlando airport:

This week has gone bye so fast it’s kinda scary.  I got dragged into one meeting after another design session after another knowledge transfer opportunity for 5 days in a row.  After our long working days, the discussions and talks just continued over dinner and beers.

It was great to meet everyone and as always we had a good time around the office and at the Ale House.  I even managed to stay sober this time around.  Well at least most of the time.

As always, the thing that struck me the most was how fast Pentaho changes.  It’s almost like visiting a different company every time I drop in.  Since I don’t see the day-to-day changes around the office, the difference between the first time I visited (15 people) and now (70+) is striking.  The office space occupied more than doubled for example.

Well, let me tell you, it’s an honour to be able to witness this great growth process in person.  I’m looking forward to things to come!

Until next time,
Matt

2 Comments »

April 23rd 2008

Give MySQL a break please

In a unique display of mass hysteria, one blogger after the other and even slashdot (no, I’m not going to link) managed to take the completely innocent message that certain new enterprise features might get released as closed source only and turn it into an ongoing bad press onslaught about “MySQL closing down source code”.

Why don’t you all give MySQL a break here please?  The rule is always the same for everybody: the one that writes the code gets to pick the license.  Listen, I 100% believe in open source and I consider myself to be a big advocate, but commercial open source companies like MySQL (and Pentaho) are commercial entities.  At lease try to put yourself in their position for a second.  For example, if a customer asks you to NOT to release a piece of software they paid for, you don’t release it, it’s that simple.

In the end, what MySQL is doing is simple: they are experimenting with a commercial open source  (COS) model.  Why are they experimenting?  Because the concept of COS is very new and there are no clear guidelines.  It simply hasn’t been done before.  How do you keep growing?  How do you keep paying more open source developers?  How do you pay for the millions of web hits each day?  How do you pay for the millions of downloads, the Tera bytes of internet traffic?  How do you guarantee your long term survival?  How do you strike a balance between commercial success and widespread open source adoption?  How do you keep your investors happy as well as your community?

I guess we learned one thing the past week : it’s easier to spout criticism than to give answers to these tough questions.

Matt

7 Comments »

March 14th 2008

Describing outer joins in metadata

Dear Metadata fans,

It’s been a while that I blogged about Pentaho Metadata. This is undeservedly so because the last couple of months, a lot of things have been moving. Most of that is not really visible to the end-user. The GUI part of the metadata suite was attacked last year and doesn’t really need all that much work. What we have been doing is extending the underlaying architecture by making it more flexible, more robust and easier to program from an API viewpoint. Most of that work has been in the capable hands of Pentaho rock star Will Gorman. The work he did last year for example was building in support for libformula (Open Office formula) by Pentaho reporting wizard Thomas Morgner.

Lately, Will and I started on the next big thing: adding support for outer joins in Pentaho Metadata models. We knew going in that it was Pandora’s box that we were opening. It’s so much easier to say: people, just write a sane star schema, but in the end it can make sense to just throw a few models over an ODS or source system, for example for prototyping and evaluation reasons.

The problem with outer joins…

The problem with outer joins is that you need to know exactly what you are doing if you want to get good reliable results out of your metadata model. That is because the order at which you execute the various outer joins determines the eventual outcome of the data.

Take a look at the Wikipedia entry on joining and outer joining to know what we’re talking about. The situation with 2 tables can be explained quite easily: Table A has 5 records, you do a left outer join with Table B but only 3 records match. Well, you still have 5 records.

However, if you start to get 3, 4 or heaven forbid even more (outer) joins in the same query, the order in which you are going to execute the joins is going to have a strong impact on the actual result of the query. (See also: this scenario) Suppose you have a Table C with 2 records. B is left outer joined with C. If you do ( A Left Outer Join B ) and then Left outer Join that result to C, you have 5 records. If you do B left outer join with C (3 records) and then right outer join with A, you get 3 records.

In the distant past I’ve had to deal with those situations a few times in closed source BI applications and I swore I would never mess with outer joins again because of this join order problem. In an ad-hoc reporting situation, this is a real issue since the users don’t care and don’t want to know about this.

How did we solve it?

Well, if the join execution order determines the result, that order in itself has to become part of the metadata, and so that’s what we did add to the model.

We think that the order in outer join situations is as important as the relationships themselves and as such, the selected join order key that is entered is also displayed in the graphical model view.

You can enter this join order key in the relationship dialog:

As you can see, we also added the 0:0 relationship type that corresponds to the “Full Outer” join type.

What does the SQL look like?

If we were to select columns from all databases, apply a condition to the second table, add a condition to an aggregate and set a sort order, we get an SQL query like this:

SELECT

BT_TABLE1_TABLE1.PRIMARYKEY AS COL0
,BT_TABLE2_TABLE2.PRIMARYKEY AS COL1
,SUM(BT_TABLE1_TABLE1.PRIMARYKEY) AS COL2
,BT_TABLE4_TABLE4.PRIMARYKEY AS COL3

FROM TABLE4 BT_TABLE4_TABLE4 LEFT OUTER JOIN

(

TABLE1 BT_TABLE1_TABLE1 FULL OUTER JOIN TABLE2 BT_TABLE2_TABLE2
ON ( BT_TABLE1_TABLE1.PRIMARYKEY = BT_TABLE2_TABLE2.FOREIGNKEY AND ( BT_TABLE2_TABLE2.FOREIGNKEY = 2 ) )

)
ON ( BT_TABLE2_TABLE2.FOREIGNKEY = BT_TABLE4_TABLE4.PRIMARYKEY )

GROUP BY

BT_TABLE1_TABLE1.PRIMARYKEY
,BT_TABLE2_TABLE2.PRIMARYKEY
,BT_TABLE4_TABLE4.PRIMARYKEY

HAVING

(
SUM(BT_TABLE1_TABLE1.PRIMARYKEY) = 2
)

ORDER BY

COL0

Note that we use a nested join syntax to make sure that the requested join order is followed. What we also try to do is place the conditions as close to the join as possible for performance reasons.

What else did we change?

As you can see from the sample query above, we abandoned the aliasing of business tables with their name. Because most if not all names contain spaces, it uses quoting all over the place and it runs into some limitations if your names are too long on Oracle for example. As such, we replaced it by the business table ID. The end user never sees these queries anyway.

Another thing we modified is the way that formulas in business columns are processed. You can now include business columns from any columns in the business model in a single expression. This is obviously something that is also important when you do reporting on a 3NF model. Almost all objects you want to calculate with are going to be in a different business table. See this wiki page on Pentaho Metadata Formulas for more information.

What’s next?

The next things on the agenda are undoubtedly things like row level security and a special section for predefined conditions and calculated members on the business model level. There are also rumors that the long awaited Formula editor is nearing completion so we’ll be able to use that in the Pentaho Metadata Editor in the future as well. As always, you can play an important role in the evolution of Pentaho Metadata by letting us know what you love or hate about Pentaho Metadata. In the end, it is feedback from our customers and our community that drives all our software development!

Until next time,

Matt

No Comments yet »

Next »

Pentaho world image