MongoDB pipeline for Scrapy

January 07, 2013

I just released a MongoDB pipeline for Scrapy, called scrapy-mongodb. The module supports both single-instance MongoDB deployments and replica sets. Items are written to MongoDB as soon as they are processed, so you can see them in the database while the crawl is still running. This post will show you how to use it in your Scrapy project.

See the scrapy-mongodb GitHub page for source code and additional documentation.

Installing scrapy-mongodb

The installation is straightforward. You simply install scrapy-mongodb using pip:

pip install scrapy-mongodb

Note that you might need to run pip as administrator.

Option 1: Configuring scrapy-mongodb for single MongoDB instances

scrapy-mongodb needs to know some details about the MongoDB database that you want to store your items in. Update your Scrapy settings with the following:

MONGODB_HOST = 'localhost'
MONGODB_DATABASE = 'myDatabaseName'
MONGODB_COLLECTION = 'myCollectionName'
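To make the role of these three settings concrete, here is a minimal sketch of how a host, a database name, and a collection name resolve to a single collection. This is not scrapy-mongodb's actual code: `resolve_collection` is a hypothetical helper, and the client is only assumed to support `client[database][collection]` lookups, as pymongo's `MongoClient` does.

```python
# Hypothetical helper for illustration; not part of scrapy-mongodb.
SETTINGS = {
    "MONGODB_HOST": "localhost",
    "MONGODB_DATABASE": "myDatabaseName",
    "MONGODB_COLLECTION": "myCollectionName",
}


def resolve_collection(client, settings):
    """Return the collection named by the settings.

    `client` is assumed to support client[db][collection] lookups,
    the way pymongo's MongoClient does.
    """
    database = client[settings["MONGODB_DATABASE"]]
    return database[settings["MONGODB_COLLECTION"]]
```

With pymongo installed, passing `pymongo.MongoClient(SETTINGS["MONGODB_HOST"])` as the client would return the real collection. MongoDB creates the database and collection lazily on the first write, so neither has to exist beforehand.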

If you want us to create and use a unique key for your items, please add the following setting as well:


scrapy-mongodb will automatically ensure an index on that key.
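As a rough illustration of what a unique key buys you, the sketch below creates a unique index and then upserts items by that key, so re-crawling the same page updates the stored document instead of adding a duplicate. The function names and the `unique_key` parameter are hypothetical, and whether scrapy-mongodb uses exactly these calls internally is an assumption. `create_index` and `update_one` are methods of a pymongo `Collection`.

```python
def ensure_unique_index(collection, unique_key):
    """Create a unique index so MongoDB rejects duplicate key values."""
    # `collection` is assumed to be a pymongo Collection (or a look-alike).
    # Creating an index that already exists with the same options is a no-op.
    return collection.create_index(unique_key, unique=True)


def upsert_item(collection, item, unique_key):
    """Insert the item, or update the stored document with the same key."""
    query = {unique_key: item[unique_key]}
    # upsert=True inserts a new document when the query matches nothing.
    return collection.update_one(query, {"$set": dict(item)}, upsert=True)
```

For example, with `unique_key` set to `'url'`, two items scraped from the same URL end up as one document holding the most recent field values.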

Then we need to tell Scrapy to use the new pipeline. Add the following to your Scrapy settings file:


Additional configuration options are described in the scrapy-mongodb documentation on GitHub.

Option 2: Configuring scrapy-mongodb for MongoDB replica sets

If you are logging the items to a MongoDB replica set, you will need to configure scrapy-mongodb to be replica-set aware. Update your Scrapy settings with the following:

MONGODB_REPLICA_SET = 'replicaSetName'
MONGODB_DATABASE = 'myDatabaseName'
MONGODB_COLLECTION = 'myCollectionName'
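For background on what "replica-set aware" means at the driver level, the sketch below builds a standard MongoDB connection string that lists several seed hosts and names the replica set through the `replicaSet` option. The helper name and the host names are placeholders, and whether scrapy-mongodb builds such a URI internally is an assumption. Drivers such as pymongo accept this format.

```python
def build_replica_set_uri(hosts, replica_set):
    """Build a MongoDB connection string for a replica set.

    hosts: iterable of "host:port" strings used as the seed list.
    replica_set: name of the replica set, e.g. 'replicaSetName'.
    """
    seed_list = ",".join(hosts)
    return "mongodb://{}/?replicaSet={}".format(seed_list, replica_set)


uri = build_replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017"],  # placeholder hosts
    "replicaSetName",
)
print(uri)
```

The driver uses the seed list only to discover the set; it then finds the current primary on its own, which is why writes keep working after a failover.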

If you want us to create and use a unique key for your items, please add the following setting as well:


scrapy-mongodb will automatically ensure an index on that key.

Then we need to tell Scrapy to use the new pipeline. Add the following to your Scrapy settings file:


Additional configuration options are described in the scrapy-mongodb documentation on GitHub.


Done! Now start your spider as usual and have a look in MongoDB for your items. They will show up as soon as the spider has found and processed them, so you can follow the progress as the spider crawls :).
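If you are curious why items appear while the crawl is still running, here is a minimal sketch of a Scrapy item pipeline that writes each item the moment it is processed. It is a hypothetical stand-in, not scrapy-mongodb's actual class. It relies only on Scrapy's `process_item(item, spider)` pipeline hook and on an object with an `insert_one()` method, such as a pymongo `Collection`.

```python
class ImmediateMongoPipeline:
    """Hypothetical pipeline for illustration; not scrapy-mongodb's class."""

    def __init__(self, collection):
        # Anything with an insert_one() method, e.g. a pymongo Collection.
        self.collection = collection

    def process_item(self, item, spider):
        # Scrapy calls this once per scraped item while the crawl is running,
        # so each document is stored right away rather than at the end.
        self.collection.insert_one(dict(item))
        # Returning the item hands it on to any later pipeline.
        return item
```

A real pipeline would typically build its collection from the Scrapy settings instead of taking it as a constructor argument; that part is left out here to keep the sketch short.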