swarmforest

coding stuff

Simple way to access CouchDB views using SQL

facebooktwittergoogle_plusredditpinterestlinkedinmail

In order to send CouchDB data for Business Intelligence cubes of one of my costumers, I searched the Internet for Foreign Data Wrappers who perform this access in a simple and fast way.

I’ve quickly found only these options:

  1. couchdb_fdw [https://github.com/ZhengYang/couchdb_fdw]
  2. couch-to-postgres [https://github.com/sysadminmike/couch-to-postgres]

The first one, is a FDW written natively in C (which I thought a bit complicated and long time without maintenance)

And the second is not a FDW, it’s just a solution that replicates data (I think I don’t need my data duplicated, so I’ve discarded this option).

Without any valid options, i remembered a year ago, I had contact with the excellent Multicorn, which abstracts the creation of Foreign Data Wrappers using python.

So I decided to write one that fits to my needs, then I started the couch-fdw r [https://github.com/mdaparte/couch-fdw] repository  with a Foreign Data Wrapper written in python using the Multicorn.

Now, to perform SQL queries against CouchDB views (and therefore integrating with traditional ETL tools) was very simple as described below: (copied from the github repo)

Installation

  • git clone https://github.com/mdaparte/couch-fdw.git
  • python setup.py install

Usage

  • Defining the Foreign Data Wrapper extension

Once in the postgresql you have to reference the extension and the foreign data wrapper like the code above

create extension if not exists multicorn;
drop server if exists multicorn_couchdb cascade;

-- THIS LINE CREATES THE COUCHDB FOREIGN DATA WRAPPER IN THE DATABASE
create server multicorn_couchdb foreign data wrapper multicorn
options (
  wrapper 'couchfdw.CouchDBForeignDataWrapper'
);

 

  • Creating the foreign tables

With the extension referenced, the next step is to build the foreign tables, this way (to reference the whole database, not a specific view):

create foreign table couch_all (
    "_id" character varying ,
    "_rev" character varying ,
    "_doc" json,
    "_runtime_error" character varying

) server multicorn_couchdb options(
    host 'http://localhost:5984/',
    database   'databasename',
    target_view 'all'
);

Or this way (to reference a mapreduce view in particular):

create foreign table couch_mapreduceview (
    "key" character varying ,
    "value" character varying,
    "_runtime_error" character varying,
    "p_startkey" character varying,
    "p_endkey" character varying,
    "p_group" character varying,
    "p_group_level" integer,
    "p_reduce" character varying,
    "key_0_key_name" integer,
    "key_1_key_name" integer,
    "key_2_key_name" integer
) server multicorn_couchdb options(
    host 'http://localhost:5984/',
    database   'databasename',
    target_view 'view_container.mapreduceview'
);

NOTE : The p_, key_N and _runtime_error columns are special.

  • Querying the foreign tables

You can query the foreign tables following some basic rules described below: the parameters that will be passed to the CouchDB are the columns (defined in the table) with the pattern starting with p_ (like p_group, p_reduce, p_grouplevel and so on);
The startkey and endkey can be filled manually or inferred from the key_N columns;
When querying and using the WHERE clause, the special key_N columns will be converted to the startkey and endkey properties, so you yave to fill the key columns following the order key_0_… key_1_…, otherwise, the query will not run and the wrapper will fill the _runtime_error column with the description of the error.

select * from couch_mapreduceview 
where
key_0_key_name = 2 and key_1_key_name = 63 and key_2_key_name = 27
and p_group = 'True' and p_reduce = 'True' and p_group_level = 3

 

So, I hope this project can be useful for someone that passed for the same problem

facebooktwittergoogle_plusredditpinterestlinkedinmail

Previous

How to use js libs (like underscorejs) in your CouchDB views

1 Comment

  1. Mike

    Hi,

    It is possible to use couch views in postgres by just installing the pgsql-http module (https://github.com/pramsey/pgsql-http)

    Eg:

    view in couch:
    {
    “_id”: “_design/mw_views”,
    “language”: “javascript”,
    “views”: {
    “by_feedName”: {
    “map”: “function(doc) { emit(doc.feedName,null); }”,
    “reduce”: “_count”
    },
    “by_tags”: {
    “map”: “function(doc) { for(var i in doc.tags) { emit (doc.tags[i],null); } }”,
    “reduce”: “_count”
    }
    }
    }

    To access view in postgres:

    WITH by_feedname_reduced AS (
    SELECT * FROM json_to_recordset(
    (
    SELECT (content::json->>’rows’)::json
    FROM http_get(‘http://192.168.3.23:5984/articles/_design/mw_views/_view/by_feedName?group=true’))
    ) AS x (key text, value text)
    )

    SELECT * FROM by_feedname_reduced WHERE value::numeric > 6000 ORDER BY key

Leave a Reply

Powered by WordPress & Theme by Anders Norén