March 2nd 2011 06:17 pm
Reading from MongoDB
Hi Folks,
Now that we’re blogging again I thought I might as well continue to do so.
Today we’re reading data from MongoDB with Pentaho Data Integration. We haven’t had a lot of requests for MongoDB support so there is no step to read from it yet. However, it is surprisingly simple to do with the “User Defined Java Class” step.
For the following sample to work you need to be on a recent 4.2.0-M1 build. Get it from here.
Then download mongo-2.4.jar and put it in the libext/ folder of your PDI/Kettle distribution.
Then you can read from a collection with the following “User Defined Java Class” code:
import com.mongodb.Mongo;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;

private Mongo m;
private DB db;
private DBCollection coll;
private int outputRowSize = 0;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    // Query the whole collection; no filter is applied.
    DBCursor cur = coll.find();

    if (first) {
        first = false;
        outputRowSize = data.outputRowMeta.size();
    }

    while (cur.hasNext() && !isStopped()) {
        // Each document becomes one row holding a single JSON string.
        String json = cur.next().toString();
        Object[] row = createOutputRow(new Object[0], outputRowSize);
        int index = 0;
        row[index++] = json;

        // putRow will send the row on to the default output hop.
        //
        putRow(data.outputRowMeta, row);
    }

    // Everything is read in a single call, so signal that no more rows follow.
    setOutputDone();
    return false;
}

public boolean init(StepMetaInterface stepMetaInterface, StepDataInterface stepDataInterface)
{
    try {
        // Change host, port, database and collection to suit your setup.
        m = new Mongo("127.0.0.1", 27017);
        db = m.getDB("test");
        coll = db.getCollection("testCollection");
        return parent.initImpl(stepMetaInterface, stepDataInterface);
    } catch (Exception e) {
        logError("Error connecting to MongoDB: ", e);
        return false;
    }
}
You can simply paste this code into a new UDJC step dialog. Change the parts in the init() method to serve your needs. This code reads all the data from a collection in a Mongo database. The output of this step is a set of rows, each containing one JSON string, so make sure to specify one JSON String field as the output of your step. These JSON structures can be parsed with the new “JSON Input” step, and then you can do whatever you want with them.
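To make the row layout a bit more concrete, here is a minimal stand-alone sketch of the same loop in plain Java. It is not Kettle or MongoDB code: the class name JsonRowSketch and the method readAll are made up for illustration, a java.util.Iterator stands in for the DBCursor, and a plain array and list stand in for createOutputRow() and putRow().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-alone sketch of the row layout described above.
// No Kettle or MongoDB classes are used; the Iterator stands in for DBCursor.
class JsonRowSketch {

    // Builds one output row per document; the JSON string goes into field 0.
    static List<Object[]> readAll(Iterator<?> cursor, int outputRowSize) {
        List<Object[]> rows = new ArrayList<Object[]>();
        while (cursor.hasNext()) {
            String json = cursor.next().toString();
            Object[] row = new Object[outputRowSize]; // stand-in for createOutputRow()
            row[0] = json;
            rows.add(row);                            // stand-in for putRow()
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two fake "documents", already rendered as JSON text.
        List<String> docs = Arrays.asList("{ \"name\" : \"a\" }", "{ \"name\" : \"b\" }");
        for (Object[] row : readAll(docs.iterator(), 1)) {
            System.out.println(row[0]);
        }
    }
}
```

In this sketch an outputRowSize of 0 makes the assignment to row[0] fail, which mirrors why the step needs exactly one String output field defined.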
Please let us know what you think of this and whether or not you would like to see support for writing to MongoDB and/or dedicated steps for it. I’m sorry to say I have no idea of the popularity of these new NoSQL databases.
Until next time,
Matt
UPDATE: The functionality described in this UDJC code is available in a new “MongoDB Input” step in 4.2.0-M1 or later.
UPDATE2: We also added authentication for MongoDB in PDI-6137
P.S. To install and run MongoDB on your Ubuntu 10.10 machine, do this:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
sudo apt-get update
sudo apt-get install mongodb
19 Comments »



Pedro Alves on 02 Mar 2011 at 20:05 #
please do.
we’re finishing elastic search support for writes, would be great to have mongo and couch too
Caio Moreno de Souza on 18 Mar 2011 at 19:59 #
Hi Matt,
Really nice post. Thanks man.
Garima on 31 Mar 2011 at 10:41 #
Hi Matt
Thanks for the post. We are planning to move from MySQL to MongoDB very soon. It will be great if PDI can provide full support for MongoDB in Spoon. We need that badly.
Thanks!
Matt Casters on 31 Mar 2011 at 11:12 #
Hi Garima,
I’ll create a few steps in a couple of weeks for sure. There seems to be a fair amount of interest in MongoDB so we’ll support it.
Regards,
Matt
Kaushal Sheth on 06 Apr 2011 at 22:19 #
This is great news. I am just starting a project which will require pulling data from MongoDB into a central data mart, and I was starting to dread asking for java resources to help build a custom plugin. I’m looking forward to the new steps you plan to create.
Thanks,
Kaushal
Greg Banbury on 13 May 2011 at 17:12 #
Hi Matt,
Firstly can I say you are an absolute star for working on the Mongo stuff!
Couldn’t be better timing for us. Any update on the new steps? Would be ideal if they are ready soon.
Cheers,
Greg
Matt Casters on 13 May 2011 at 18:16 #
The reader step is already in 4.2.0-M1.
Greg Banbury on 19 May 2011 at 12:45 #
Matt,
The reader step works perfectly. Is there any documentation on the JSON input step you can point me to?
Cheers,
Greg
Shannon Hardt on 19 May 2011 at 21:23 #
Hi Matt,
Thanks for adding a MongoDB step! However, I can’t seem to find where to download the 4.2+ version. Will you point me in the right direction?
Thanks,
Shannon
Matt Casters on 19 May 2011 at 21:55 #
Hi Shannon,
You can download bleeding edge builds from our CI server over here: http://ci.pentaho.com/job/Kettle/
Or you can download 4.2.0-M1 over here : http://sourceforge.net/projects/pentaho/files/Data%20Integration/4.2.0-M1/
Good luck!
Matt
David Forrest on 08 Jul 2011 at 23:29 #
Hi Matt,
Could we get a Pentaho component like the MongoDB one that reads from CouchDB? Since they are similar, it shouldn’t take too much effort to develop. There will be tremendous call for this as CouchDB becomes widely used…
Regards,
David
David Forrest on 08 Jul 2011 at 23:30 #
FYI… it seems the path mapping for a large nested json schema is unpredictable with Kettle….
Thx
Joel on 23 Sep 2011 at 11:18 #
Is there a plugin to write to MongoDB? We would like to import a large amount of raw data into MongoDB after transformation.
Matt Casters on 23 Sep 2011 at 11:33 #
There’s no “MongoDB Output” step but if MongoDB has an API to do bulk loading I’m willing to write it. In that case please create a feature request (http://jira.pentaho.com) with details on how you would like to see this step work.
If you need it faster you could use the “User Defined Java Class” step to drive the MongoDB Java API.
Thanks in advance,
Matt
izek greenfield on 22 Apr 2012 at 9:25 #
Hi Matt,
How do I filter by a date field with the MongoDB Input step?
thanks
Izek
Matt Casters on 22 Apr 2012 at 15:30 #
http://wiki.pentaho.com/display/EAI/MongoDB+Input
David on 10 Aug 2012 at 13:34 #
And what about getting the database and/or collection as parameters? Any plans to do that?
Thanks.
Dan on 06 Nov 2012 at 11:28 #
Just in case anyone should come across this blog now - it’s worth noting that there is now a MongoDB Output step in PDI too.