Superset is a famous OSS BI tool. In this blog, I introduce a temporary way to connect to TreasureData from Superset, as of 2018/04/05.
I think redash / Superset / Metabase is popular as OSS BI tools in Japan. I already introduced how to connect to Treasure Data from redash and metabase at the following documentations;
However, I couldn’t figure out why superset was not able to access to Treasure Data by using the Treasure Data Presto API endpoint. I tried it several times and gave up. (Because I’m not familiar with the SQL Alchemy which is used for connection to DB with Superset)
But, finally, I figured out the way to connect to Treasure Data from Superset. If PyHive, which is a library to manage Presto API, merge the following PR, you can connect to Treasure Data from Superset without any difficulties.
Moreover, the Treasure Data Presto API endpoint called Prestobase” provides a mechanism of authentication. In Prestobase, authentication is performed based on TreasureData’s APIKEY which is included in the X-Presto-User header. However, PyHive doesn’t add X-Presto-User into the header when making a GET request. This is the root cause why Superset is currently unable to connect to Treasure Data.
Until the PR is merged, the following is a workaround to use the change in your Superset.
How to Connect?
Apply a patch to PyHive
In order to access TreasureData from Superset, it is necessary to fetch the above PR first. Let’s apply the following patch (At the time of patch creation, the version of PyHive is v 0.5.1.)
On the installed superset server, move to the PyHive installation directory and apply the patch.
$ patch -u presto.py < pyhive.patch
If you have presto.pyc in the library directory, delete it and start superset.
Setup a connection to Treasure Data
- SQLAlchemy URI setting
- Extra setting
The connection setting is completed. After that, do the table setting, etc… as same as the other connection of superset.
Recommended table setting
Since I’m not familiar with Superset itself, I’m not sure whether this is the best, but please consider the following setting.
A table in Treasure Data is partitioned on an hourly basis based on the data of the time column (unixtime). Superset treats it as bigint data as default, so it can not be handled as timestamp/date type. Therefore, I think it would be better to set the following, because you can handle the time column as as a timestamp type in Superset.
- Type: TIMESTAMP
- Expression: CAST(TD_TIME_FORMAT(time, ‘yyyy-MM-dd HH:mm:ss’) AS TIMESTAMP)
- Is temporal: TRUE
Enjoy Treasure Data with Superset!