In order to send CouchDB data for Business Intelligence cubes of one of my costumers, I searched the Internet for Foreign Data Wrappers who perform this access in a simple and fast way.
I’ve quickly found only these options:
- couchdb_fdw [https://github.com/ZhengYang/couchdb_fdw]
- couch-to-postgres [https://github.com/sysadminmike/couch-to-postgres]
The first one, is a FDW written natively in C (which I thought a bit complicated and long time without maintenance)
And the second is not a FDW, it’s just a solution that replicates data (I think I don’t need my data duplicated, so I’ve discarded this option).
Without any valid options, i remembered a year ago, I had contact with the excellent Multicorn, which abstracts the creation of Foreign Data Wrappers using python.
So I decided to write one that fits to my needs, then I started the couch-fdw r [https://github.com/mdaparte/couch-fdw] repository with a Foreign Data Wrapper written in python using the Multicorn.
Now, to perform SQL queries against CouchDB views (and therefore integrating with traditional ETL tools) was very simple as described below: (copied from the github repo)
- git clone https://github.com/mdaparte/couch-fdw.git
- python setup.py install
- Defining the Foreign Data Wrapper extension
Once in the postgresql you have to reference the extension and the foreign data wrapper like the code above
create extension if not exists multicorn; drop server if exists multicorn_couchdb cascade; -- THIS LINE CREATES THE COUCHDB FOREIGN DATA WRAPPER IN THE DATABASE create server multicorn_couchdb foreign data wrapper multicorn options ( wrapper 'couchfdw.CouchDBForeignDataWrapper' );
- Creating the foreign tables
With the extension referenced, the next step is to build the foreign tables, this way (to reference the whole database, not a specific view):
create foreign table couch_all ( "_id" character varying , "_rev" character varying , "_doc" json, "_runtime_error" character varying ) server multicorn_couchdb options( host 'http://localhost:5984/', database 'databasename', target_view 'all' );
Or this way (to reference a mapreduce view in particular):
create foreign table couch_mapreduceview ( "key" character varying , "value" character varying, "_runtime_error" character varying, "p_startkey" character varying, "p_endkey" character varying, "p_group" character varying, "p_group_level" integer, "p_reduce" character varying, "key_0_key_name" integer, "key_1_key_name" integer, "key_2_key_name" integer ) server multicorn_couchdb options( host 'http://localhost:5984/', database 'databasename', target_view 'view_container.mapreduceview' );
NOTE : The p_, key_N and _runtime_error columns are special.
- Querying the foreign tables
You can query the foreign tables following some basic rules described below: the parameters that will be passed to the CouchDB are the columns (defined in the table) with the pattern starting with p_ (like p_group, p_reduce, p_grouplevel and so on);
The startkey and endkey can be filled manually or inferred from the key_N columns;
When querying and using the WHERE clause, the special key_N columns will be converted to the startkey and endkey properties, so you yave to fill the key columns following the order key_0_… key_1_…, otherwise, the query will not run and the wrapper will fill the _runtime_error column with the description of the error.
select * from couch_mapreduceview where key_0_key_name = 2 and key_1_key_name = 63 and key_2_key_name = 27 and p_group = 'True' and p_reduce = 'True' and p_group_level = 3
So, I hope this project can be useful for someone that passed for the same problem