Superset ❤ ️TreasureData

3 min readApr 5, 2018

Superset is a famous OSS BI tool. In this blog, I introduce a temporary way to connect to TreasureData from Superset, as of 2018/04/05.

Ref. https://medium.com/airbnb-engineering/superset-scaling-data-access-and-visual-insights-at-airbnb-3ce3e9b88a7f

Background

I think redash / Superset / Metabase is popular as OSS BI tools in Japan. I already introduced how to connect to Treasure Data from redash and metabase at the following documentations;

However, I couldn’t figure out why superset was not able to access to Treasure Data by using the Treasure Data Presto API endpoint. I tried it several times and gave up. (Because I’m not familiar with the SQL Alchemy which is used for connection to DB with Superset)

But, finally, I figured out the way to connect to Treasure Data from Superset. If PyHive, which is a library to manage Presto API, merge the following PR, you can connect to Treasure Data from Superset without any difficulties.

https://github.com/dropbox/PyHive/pull/200

Moreover, the Treasure Data Presto API endpoint called Prestobase” provides a mechanism of authentication. In Prestobase, authentication is performed based on TreasureData’s APIKEY which is included in the X-Presto-User header. However, PyHive doesn’t add X-Presto-User into the header when making a GET request. This is the root cause why Superset is currently unable to connect to Treasure Data.

Until the PR is merged, the following is a workaround to use the change in your Superset.

How to Connect?

Apply a patch to PyHive

In order to access TreasureData from Superset, it is necessary to fetch the above PR first. Let’s apply the following patch (At the time of patch creation, the version of PyHive is v 0.5.1.)

https://gist.github.com/toru-takahashi/69c96165645b8b47cc50b23198d19d0e

On the installed superset server, move to the PyHive installation directory and apply the patch.

$ patch -u presto.py < pyhive.patch

If you have presto.pyc in the library directory, delete it and start superset.

Setup a connection to Treasure Data

SQLAlchemy URI setting

presto://api-presto.treasuredata.com/td-presto/<default_db_name>

Extra setting

{
    "metadata_params": {},
    "engine_params":{"connect_args":{"username":"TD_APIKEY","port":443,"protocol":"https"}}
}

The connection setting is completed. After that, do the table setting, etc… as same as the other connection of superset.

Recommended table setting

Since I’m not familiar with Superset itself, I’m not sure whether this is the best, but please consider the following setting.

A table in Treasure Data is partitioned on an hourly basis based on the data of the time column (unixtime). Superset treats it as bigint data as default, so it can not be handled as timestamp/date type. Therefore, I think it would be better to set the following, because you can handle the time column as as a timestamp type in Superset.

Type: TIMESTAMP
Expression: CAST(TD_TIME_FORMAT(time, ‘yyyy-MM-dd HH:mm:ss’) AS TIMESTAMP)
Is temporal: TRUE

Enjoy Treasure Data with Superset!