4.3 million rows per second

Earlier today I was building a test case in which I wanted to put a lot of Unicode data into a database table. The problem, of course, is that I don't have a lot of data, just a small Excel input file.

So I made a Cartesian product with a couple of empty row generators:

[Screenshot: the 4M rows per second transformation]
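To make the trick concrete: joining a stream against row generators without a join condition is a Cartesian product, so the row count multiplies at every join. Here is a rough sketch of that arithmetic in Python; the generator sizes are made up (picked so the product matches the 717 million rows below), and this is of course not the actual Kettle engine code:

    from itertools import product

    # Hypothetical sizes, chosen so the product matches the 717 million rows
    # mentioned below; the real generator sizes aren't given in this post.
    excel_rows = 100      # small Excel input
    gen_a_rows = 1000     # first empty-row generator
    gen_b_rows = 7170     # second empty-row generator

    total = excel_rows * gen_a_rows * gen_b_rows
    print(f"{total:,} rows")                    # 717,000,000 rows
    print(f"{total / 165:,.0f} rows/second")    # ~4,345,455 rows/second over 165 s

    # The expansion itself, on a tiny example: 3 input rows x 2 x 2 = 12 rows.
    for row, a, b in product(["a", "b", "c"], range(2), range(2)):
        print(row, a, b)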

It was interesting to see how fast the second join step was generating rows:

[Screenshot: the 4M rows per second step metrics log]

Yes, you are reading that correctly: 717 million rows processed in 165 seconds = 4.3 million rows per second.

For those of you who would love to try this on your own machine: here is an exclusive present for the readers of this blog in the form of a 3.0.0-RC2 preview of 2007/10/12 (88MB zip file). We've been fixing bugs like crazy, so it's pretty stable for us, but it's still a few weeks until we release RC2. Don't do anything crazy with this drop! This is purely a present for the impatient ones. If you find a bug, please file it! (Give us a present back :-))

Until next time,

Matt

2 comments

  • Matt,

    this is absolutely awesome…

    Is this performance the result of lazy conversion?

  • Hi Roland,

    Lazy conversion is typically used only for file handling. That's where it really shines (reading, writing, sorting, that sort of thing). It doesn't really come into play here. The results in that department are also impressive; I'm sure more details will be released later about the I/O benchmarks we've been doing. I can't tell you a lot about it, but I can tell you that we're reading files in the gigabytes-per-second range now. (That's not on this laptop though ;-))
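    To give a rough idea of what lazy conversion is about, here is a small sketch of the concept (just an illustration in Python, not what the actual Kettle engine does in Java): the raw bytes from the file are kept as-is and only converted to a real value when a step actually asks for it.

        import io

        class LazyField:
            """Illustration only: hold raw bytes, convert on demand."""

            def __init__(self, raw, encoding="utf-8"):
                self.raw = raw            # bytes exactly as read from the file
                self.encoding = encoding
                self._value = None        # decoded value, filled in on demand

            def as_string(self):
                if self._value is None:   # convert lazily, at most once
                    self._value = self.raw.decode(self.encoding)
                return self._value

            def write_to(self, out):
                out.write(self.raw)       # pass-through: no decode/encode cost

        # A field that is only copied from input to output is never decoded.
        field = LazyField("Ünïcödé data".encode("utf-8"))
        sink = io.BytesIO()
        field.write_to(sink)              # fast path, raw bytes only
        print(field.as_string())          # conversion happens only when asked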

    This result by itself doesn't mean that much, besides bringing new evidence that the new engine is pretty cool.
    But it sure did get everyone's attention ;-)