Cumul.io is a great tool to build visualizations on top of your data, but how do we connect our raw data to the platform? While there are out-of-the-box connectors for you to use, you might:
Plugins are an answer to all these desires. They are data connectors that work as a bridge between Cumul.io and external data sources, like Facebook, Stripe, Athena, etc. They are a quick and convenient way to connect any data to Cumul.io.
In this post, we’ll write a basic plugin step by step, and we will get familiar with the concept of plugins. Since we will be creating a plugin for an open data source, there is no need to add authorization to our plugin here. In a future post, you’ll learn how to build more complicated plugins with authorization (OAuth or key/token).
We’ll connect the open API from citybik.es as an example, which contains data on bike sharing across the world. It has data on the availability of bikes per bike sharing station at any point of time. This API is freely available (so you can easily follow the tutorial) at https://api.citybik.es/v2/
Once we’ve written the plugin and added it to Cumul.io, it will appear as a new data source in your Cumul.io account, from which users can pick datasets.
Below is an overview of what a user with plugin access will see.
Before we start building, it is important to realize that a plugin is in essence an API which fulfills a contract defined by the Cumul.io team. The contract defines that there are 4 endpoints in the API. Since we are not yet using authorization in this plugin, in this tutorial we will only use 2 of them.
Since we will be building an API, we need a webserver.
To get started, you only need 2 things:
Verify in a terminal that node and npm (node’s package manager) are installed and that your node version is greater than 8.x.x.
Create a project folder (in our case it’s called citybikes) and initialize your node project. For programmers new to nodejs: Initialization means that we create a package.json which contains some basic configuration of our project. This will be used to store the package versions that are used in this project. We can also use npm to create a package.json interactively in the terminal using the following code.
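The setup in a terminal could look like this (a sketch: `--yes` accepts all defaults, omit it to answer the prompts interactively):

```shell
mkdir citybikes && cd citybikes
npm init --yes   # creates package.json; drop --yes to fill it in interactively
```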
Aside from that, we will need an index.js (where we will program the endpoints) and a webserver.js (which will host our webserver). So, the folder where we will be coding the plugin should look like this.
Alrighty then, let’s start programming the plugin.
We will be using several libraries to build our plugin:
Install them all in your terminal using npm.
(--save makes sure they appear in the package.json, so the next person can install all dependencies with npm install)
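The exact package list isn't shown here, but based on the libraries used in the rest of this post (app.use-style middleware, presumably express, plus compression, JSON body parsing, the request library and dotenv), the install command would look roughly like:

```shell
# Package names inferred from the libraries used later in this post
npm install --save express compression body-parser request dotenv
```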
Open the webserver.js, and write the import statements for the required libraries.
Immediately after that, we’ll write a function that will be exported. This function will contain the configuration of our webserver. It can then be imported, and thus be reused to build other plugins.
Note that we use a module where we will export our server, in order to reuse it in multiple plugins. That means you’ll only have to write this once! Then, we’ll add the code to configure the server to that function.
We will not go into detail on the webserver itself, since it is beyond the scope of this tutorial. The code configures our webserver to parse request bodies as JSON and use compression, sets a few headers, and makes the server run on port 3030. To understand the code, it is useful to think of app.use(...middlewaremethod...) as a way to chain middleware methods that will be called on each request. Now that we have written the webserver, we can reuse it in any other plugin we make.
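To make the chaining idea concrete, here is a dependency-free sketch (illustrative names, not our actual webserver code) of how `app.use(...)` registers middleware functions that run in order on each request:

```javascript
// Illustrative sketch of middleware chaining, as done by app.use(...).
// Each middleware receives the request, the response and a next() callback;
// calling next() hands control to the following middleware in the chain.
function makeApp() {
  const chain = [];
  return {
    use(middleware) { chain.push(middleware); },
    handle(req, res) {
      let i = 0;
      (function next() {
        const middleware = chain[i++];
        if (middleware) middleware(req, res, next);
      })();
    }
  };
}

// Two toy middlewares, standing in for e.g. JSON body parsing and compression
const app = makeApp();
const order = [];
app.use((req, res, next) => { order.push('parse-json'); next(); });
app.use((req, res, next) => { order.push('compress'); next(); });
app.handle({}, {});
// order is now ['parse-json', 'compress']
```

Each middleware either handles the request or passes it on, which is exactly what happens inside our webserver configuration.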
Our basic plugin only needs two endpoints, since it is an open plugin. The skeleton of our plugin looks as follows.
In this snippet, we instantiate the webserver and define that we will be writing a GET /datasets endpoint and a POST /query endpoint.
The /datasets endpoint is responsible for giving Cumul.io information about the structure of the data. Let’s start simple by returning static data. This will give you a clear example of the required return structure.
This example defines a burrito plugin that provides only one dataset, called ‘Burritos’. The code defines that the dataset contains three columns, one of each available type (hierarchy, numeric, datetime). Regarding these data types, Cumul.io keeps things simple and only supports these 3.
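As an illustration, a static /datasets response for such a burrito plugin could look roughly like this (the column ids and display names are invented for the example; check the Cumul.io plugin docs for the exact response shape):

```javascript
// Hypothetical static response for GET /datasets: one dataset with one
// column of each Cumul.io data type (hierarchy, numeric, datetime).
const datasets = [
  {
    id: 'burritos',
    name: { en: 'Burritos' },
    columns: [
      { id: 'type',       name: { en: 'Type' },       type: 'hierarchy' },
      { id: 'spiciness',  name: { en: 'Spiciness' },  type: 'numeric' },
      { id: 'last_eaten', name: { en: 'Last eaten' }, type: 'datetime' }
    ]
  }
];

// In the endpoint handler we would then simply return it (sketch):
// app.get('/datasets', (req, res) => res.status(200).json(datasets));
```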
We have only written about 10 lines of code for a temporary plugin, but now is a good time to see how to test it.
Start your server: run node index.js in your terminal.
To test the results, we can check out the result in our browser by going to localhost:3030/datasets.
Tip: the other calls will use the POST method; use Postman or something similar to test those.
Optional: try adding the plugin. It is more interesting to see what these few lines of code look like once you add your plugin to Cumul.io. In order to do so, you will need to make sure it is accessible from outside your PC, and that it runs on https. One way to achieve this is to use ngrok or something similar. Running it in a separate terminal will give you an https url that looks like this: https://dc0c717f.ngrok.io. Alternatively, you can put your plugin online directly, or postpone this step until the plugin is hosted.
Once your plugin is reachable through an https address, you can register it in the platform and use it. Note that we can already see the datasets we defined when we try to use the plugin (by adding a dataset).
Note that when you create the plugin, an app secret will be generated. You can retrieve this through Profile > Plugins.
This is a secret to ensure that only Cumul.io can call the plugin. We will add it to our .env file that will be loaded by the dotenv library.
Calling require('dotenv').config() in the script will export the contents of the .env file as environment variables, so we can use them in our code. You can of course export your environment variables manually as well if you prefer.
Using this secret, we can ensure that only Cumul.io can call your plugin. Even for an open plugin, it is useful to prevent people from using the plugin differently than intended, which could cause unnecessary traffic.
We can do this by adding the following lines of code to each of the endpoints we will write.
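A minimal sketch of such a check (assuming Cumul.io passes the secret in the request body under a `secret` key; verify the exact field name in the plugin docs):

```javascript
// Hedged sketch: reject any request whose secret does not match the app
// secret we received when registering the plugin. The 'secret' field name
// is an assumption for this example.
function isAuthorized(body, expectedSecret) {
  return Boolean(body && expectedSecret && body.secret === expectedSecret);
}

// In each endpoint handler, one might then write (sketch):
// if (!isAuthorized(req.body, process.env.CUMULIO_SECRET)) {
//   return res.status(403).end('Forbidden');
// }
```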
Until now, we used a static definition of burritos in /datasets as an example. Static definitions are perfectly fine (especially for databases or APIs without a defined schema). However, the citybik.es API obviously is not about Burritos. In the case of citybik.es, the names of the datasets can be derived from the API. The citybik.es API exposes several ‘networks’ which we will map to datasets. Each network (or dataset) contains the same information, so we can easily map this to 6 columns: station_name, last_update, latitude, longitude, free_bikes and empty_slots.
To write the actual datasets endpoint for citybik.es, we use the request library to call the citybik.es API and retrieve the networks.
Then, we transform this result to a dataset with Cumul.io columns. Here, we use the ids and names of the original networks as the dataset id and name.
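That transformation can be sketched as a pure function. The column list matches the six columns named above; the `id` and `name` fields per network follow the citybik.es response at https://api.citybik.es/v2/networks, and the example network below is only illustrative:

```javascript
// Sketch: map the citybik.es networks to Cumul.io dataset descriptions.
// Every dataset shares the same six columns.
const columns = [
  { id: 'station_name', name: { en: 'Station name' }, type: 'hierarchy' },
  { id: 'last_update',  name: { en: 'Last update' },  type: 'datetime' },
  { id: 'latitude',     name: { en: 'Latitude' },     type: 'numeric' },
  { id: 'longitude',    name: { en: 'Longitude' },    type: 'numeric' },
  { id: 'free_bikes',   name: { en: 'Free bikes' },   type: 'numeric' },
  { id: 'empty_slots',  name: { en: 'Empty slots' },  type: 'numeric' }
];

function networksToDatasets(networks) {
  return networks.map((network) => ({
    id: network.id,          // network id becomes the dataset id
    name: { en: network.name }, // network name becomes the dataset name
    columns
  }));
}

// Example with a single network object:
const datasets = networksToDatasets([{ id: 'velo-antwerpen', name: 'Velo' }]);
// datasets[0].id is 'velo-antwerpen' and it exposes all six columns
```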
The result is that we now have a plugin that exposes the datasets of our citybik.es API. Users retrieve a full list of available datasets when they access the plugin.
Of course, when your users add a dataset at this point, no data will come in. That is what we need the /query endpoint for. Whenever a dataset of the plugin is used in a chart, or is viewed using the data table, a call will be launched to the plugin’s /query endpoint.
Before we proceed, know that there are two types of plugins: pushdown enabled plugins and regular plugins. In this tutorial, we’re writing a regular plugin. This means that we will not handle aggregations (sum/avg/etc) in the plugin or in the underlying API/database. An example of a pushdown enabled plugin will follow in a future post. In the meantime, you can find more information about plugins in our developer docs.
In a regular plugin, the /query call returns an array of arrays, which follows the exact structure defined by the /datasets endpoint. In other words, when a query is done for a specific dataset, each row of the result will have exactly as many values as there are columns defined for that dataset. In the case of citybik.es, all datasets have the same structure, which makes our implementation straightforward.
We will start off by checking whether the secret is provided in the query call.
To be clear on the data structure, we will start with a static example again. Note that we return an array of rows, in which we follow the same order of columns as we defined in our /datasets endpoint. Also note that dates have to be sent in ISO format (or you can use javascript dates, which will be serialized to ISO formatted strings).
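A static /query response for the burrito example could look like this (the values are invented; note the column order, hierarchy then numeric then datetime, and the ISO dates):

```javascript
// Hypothetical static /query result: an array of rows, one value per
// column, in the same order as defined in /datasets.
const rows = [
  ['sweet', 126, '2019-03-01T10:00:00.000Z'],
  ['spicy', 317, new Date('2019-03-02T12:30:00.000Z')] // Dates serialize to ISO strings
];

// res.status(200).json(rows) would serialize both date forms to ISO strings
```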
In the case of /query, static data is of course not very useful (you’d better upload a CSV then). So, let’s retrieve the data from the citybik.es API and transform it to the correct format.
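The per-station transformation can be sketched as a pure function. The field names (`timestamp`, `free_bikes`, `empty_slots`, etc.) follow the citybik.es station format; check a live response under https://api.citybik.es/v2/ to confirm them, and note that the example station below is made up:

```javascript
// Sketch: turn citybik.es stations into Cumul.io rows, following the same
// column order as our /datasets definition. The station's timestamp is
// already an ISO formatted string in the citybik.es API.
function stationsToRows(stations) {
  return stations.map((station) => [
    station.name,
    station.timestamp,
    station.latitude,
    station.longitude,
    station.free_bikes,
    station.empty_slots
  ]);
}

// Example with a single (made-up) station:
const rows = stationsToRows([{
  name: 'Central Station',
  timestamp: '2019-03-01T10:00:00.000Z',
  latitude: 51.2172,
  longitude: 4.4212,
  free_bikes: 7,
  empty_slots: 13
}]);
// rows[0] has exactly six values, one per column
```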
The plugin is finished, and the complete code for the plugin is only 56 lines long!
We can test the plugin in a similar way as before. However, since we now have a POST call, let’s use Postman or anything else that can send POST requests.
First of all, make sure you set the Content-Type header this time, since we will provide a body.
In this body, we will send the id of the dataset that needs to be retrieved. We can see the result coming in below.
You can also test the local plugin using ngrok again, and see the results.
When we consider our plugin ready, we need to host it somewhere. This can be your own server or a cloud platform. Although we mainly use AWS, we will deploy this example plugin on Heroku, since it is easy to get started with and offers quite a bit of functionality for free.
Create a free account on Heroku and choose your app name and region.
Heroku uses a specific file, called a ‘Procfile’ (this is also the filename), in which you define how Heroku should run your app. If you place a file named ‘Procfile’ in the root of your repository, Heroku will find it and know how to run your code. Ours looks like this.
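For a plugin started with node index.js, the Procfile can be as small as a single line (a sketch, assuming index.js is your entry point):

```
web: node index.js
```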
A new Heroku app provides a remote git repository; deploying means pushing your code to this repository. Heroku will detect changes to the repository and prepare your deploy. To execute these commands, you will have to install the Heroku CLI first. After creating your app, Heroku will present you with the instructions to deploy your code. In our case, we hadn’t created a local repository yet, so our deploy commands initialize the git repository first. Note: ‘git add .’ will stage all files. It is generally not a good idea to push secrets such as the .env file, so use a .gitignore file to prevent that.
Finally, when you add the plugin to the platform (a new url requires you to add a new plugin), do not forget that you will receive a secret, which you’ll need to add to the environment variables of your API. In the case of Heroku, you can add environment variables in your app’s settings (Config Vars).
You should now have your API running on Heroku, and you can add the plugin to the platform.
Since we have now built a regular, open plugin, this post might give rise to some questions.
This will become clearer in future posts in this plugin series. If you’d like to delve further into our plugins, stay tuned for the next tutorials! For now we can briefly explain the possibilities of our powerful plugin API.
Authorization is possible by implementing the authorize endpoint for key & token authorization, and both the authorize and exchange-token endpoints for OAuth-based authorization. These endpoints receive information from the user, which allows you to verify in your own database or authorization system whether these users should have access. Information about the user is also provided to the query endpoint, which allows you to filter the data that this specific user can see.
The calculation of aggregations (for example sums or averages for a bar chart) is done by Cumul.io in this example. However, the query endpoint receives a query-like json structure which contains the information of the specific query. In case you want a more efficient query, you can calculate these aggregations in your database instead. This is what we call a pushdown enabled plugin, which we will cover in a future post.
All of this and much more… is coming!
PS: don't hesitate to share your feedback and get in touch with any questions!