Reading from MongoDB

Hi Folks,

Now that we’re blogging again I thought I might as well continue to do so.

Today we’re reading data from MongoDB with Pentaho Data Integration.  We haven’t had a lot of requests for MongoDB support so there is no step to read from it yet.  However, it is surprisingly simple to do with the “User Defined Java Class” step.

For the following sample to work you need to be on a recent 4.2.0-M1 build.  Get it from here.

Then download mongo-2.4.jar and put it in the libext/ folder of your PDI/Kettle distribution.

Then you can read from a collection with the following “User Defined Java Class” code:

import java.math.*;
import java.util.*;
import java.util.Map.Entry;
import com.mongodb.Mongo;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import com.mongodb.DBCursor;

private Mongo m;
private DB db;
private DBCollection coll;

private int outputRowSize = 0;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
	// Query the whole collection; the loop below drains it in this single call.
	DBCursor cur = coll.find();

	if (first) {
		first = false;
		outputRowSize = data.outputRowMeta.size();
	}

	while (cur.hasNext() && !isStopped()) {
		// Each document becomes one row holding its JSON text in the first field.
		String json = cur.next().toString();
		Object[] row = createOutputRow(new Object[0], outputRowSize);
		int index = 0;
		row[index++] = json;

		// putRow will send the row on to the default output hop.
		//
		putRow(data.outputRowMeta, row);
	}

	// No more rows will follow; returning false ends processing for this step.
	setOutputDone();

	return false;
}

public boolean init(StepMetaInterface stepMetaInterface, StepDataInterface stepDataInterface)
{
	try {
		// Adjust the host, port, database and collection names to your setup.
		m = new Mongo("127.0.0.1", 27017);
		db = m.getDB("test");
		coll = db.getCollection("testCollection");

		return parent.initImpl(stepMetaInterface, stepDataInterface);
	} catch (Exception e) {
		logError("Error connecting to MongoDB: ", e);
		return false;
	}
}

You can simply paste this code into a new UDJC step dialog. Change the parts in the init() method to serve your needs. This code reads all the data from a collection in a Mongo database.  The output of this step is a set of rows, each containing one JSON string, so make sure to specify one String field for the JSON as the output of your step.  These JSON structures can be parsed with the new “JSON Input” step and then you can do whatever you want with them.
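If you want to see the row-building loop in isolation, here is a minimal standalone sketch that runs without Kettle or the Mongo driver. It is not the UDJC code itself: a plain Iterator stands in for the DBCursor, a List stands in for putRow(), and the class and method names (CursorDrainSketch, drain) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Standalone sketch: a plain Iterator stands in for DBCursor and a List
// stands in for putRow(). All names here are hypothetical.
class CursorDrainSketch {

    // Drains the iterator into rows of rowSize fields (rowSize >= 1),
    // placing the document's JSON text in field 0.
    static List<Object[]> drain(Iterator<?> cursor, int rowSize) {
        List<Object[]> rows = new ArrayList<>();
        while (cursor.hasNext()) {
            Object[] row = new Object[rowSize];   // like createOutputRow(new Object[0], rowSize)
            row[0] = cursor.next().toString();    // one document per row, as a JSON string
            rows.add(row);                        // like putRow(data.outputRowMeta, row)
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("{ \"name\" : \"a\" }", "{ \"name\" : \"b\" }");
        for (Object[] row : drain(docs.iterator(), 1)) {
            System.out.println(row[0]);
        }
    }
}
```

With one output field defined on the step, rowSize is 1 and every row carries just the JSON text, which is what the “JSON Input” step would then pick apart.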

Please let us know what you think of this and whether or not you would like to see support for writing to MongoDB and/or dedicated steps for it.  I’m sorry to say I have no idea of the popularity of these new NoSQL databases.

Until next time,

Matt

UPDATE: The functionality described in this UDJC code is available in a new “MongoDB Input” step in 4.2.0-M1 or later.

UPDATE2: We also added authentication for MongoDB in PDI-6137.

P.S. To install and run MongoDB on your Ubuntu 10.10 machine, do this:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
sudo apt-get update
sudo apt-get install mongodb
