A Real Connection
by Koby Frank
Data Data Everywhere. Ideally, analysts should spend their time doing two things: (1) manipulating raw data, and (2) gaining insight from it. Both steps are interesting and nontrivial. Done well, they drive good policy-making, good engineering, a healthy supply chain, or sound purchase order decisions. There is, however, a silent step zero to this process: getting the raw data in the first place. This step is not particularly profound, and it should not take up much of anyone’s time. In practice, it often takes up most of it.
Data can live in many places. In fact, the modern data ecosystem is quite fractured. There are “data warehouses” like Snowflake and Amazon Redshift, where structured, tabular data is stored in the cloud and indexed for fast retrieval and querying. There are “data lakes” provided by Amazon, Google, Databricks, and other vendors, where information of any sort can be tossed into storage without being filed, allowing for more flexibility and fewer formatting constraints. There is data living locally, spread around on employee computers, and data held by vendors of all sorts: Google Sheets, Shopify, SAP, Salesforce, bank transaction records, Instagram Business, and so on.
Extracting data from one of these sources and moving it to a place where you can manipulate and gain insight from it can be particularly cumbersome. In general, it’s a manual process. Maybe your data is not in the correct format to be read into its destination. Maybe it’s updated weekly or daily or hourly, and each time it’s updated someone has to export the data all over again, an exercise that cannot be particularly enjoyable.
What’s a Data Connector? Hence data connectors. Through our data connectors, users have access to over 130 sources of data from within the Ikigai platform. These connectors incrementally sync with the source, giving the user a live link between the source of the data and Ikigai, where they can manipulate, visualize, and export the data with a human in the loop. Ikigai combines existing libraries of data sources (e.g., Fivetran and Plaid) with internally built connections that vendors less commonly provide (e.g., Snowflake) to give users an easy interface for pulling in their data. The process usually involves a single step: entering a few fields that describe the path to the specific data and, if necessary, your credentials. After that, Ikigai handles the rest. We keep this connection up to date, always grabbing the most recent snapshot of your data from the source, and handle all of your permissions to the data in the backend.
Of course, there are more than 130 sources of data in the world. For sources we do not yet support, more advanced users can create what we call a custom connector: with custom Python code or our web-scraping capabilities, users can extract data from anywhere on the internet. Just like with our out-of-the-box connectors, Ikigai handles the incremental loading and syncing of this data to keep it up to date.
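To make the idea of incremental loading concrete, here is a minimal sketch of how a sync might pull only the rows that changed since the last run. It is an illustration under stated assumptions, not Ikigai's connector API: the `IncrementalConnector` class, the `fetch_since` callback, the `updated_at` field, and the in-memory `SOURCE` list are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Iterable


@dataclass
class IncrementalConnector:
    """Hypothetical connector that pulls only rows newer than a saved cursor."""

    # User-supplied function: given a cursor, return rows updated after it.
    fetch_since: Callable[[datetime], Iterable[dict]]
    cursor: datetime = datetime.min
    rows: list = field(default_factory=list)

    def sync(self) -> int:
        """Pull new rows, store them, advance the cursor, return the row count."""
        new_rows = sorted(self.fetch_since(self.cursor),
                          key=lambda row: row["updated_at"])
        if new_rows:
            self.rows.extend(new_rows)
            self.cursor = new_rows[-1]["updated_at"]
        return len(new_rows)


# Stand-in for a remote source; a real connector would call an API or scrape a page.
SOURCE = [
    {"id": 1, "updated_at": datetime(2023, 1, 1)},
    {"id": 2, "updated_at": datetime(2023, 1, 2)},
]


def fetch_since(cursor: datetime) -> list:
    return [row for row in SOURCE if row["updated_at"] > cursor]


connector = IncrementalConnector(fetch_since=fetch_since)
first = connector.sync()   # first run pulls every existing row
SOURCE.append({"id": 3, "updated_at": datetime(2023, 1, 3)})
second = connector.sync()  # later runs pull only what changed
```

Tracking a cursor means each sync transfers only new rows, which is what keeps a frequently updated source cheap to mirror.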
Open Authorization. Many of Ikigai’s internally built-out connectors use Open Authorization, or OAuth. OAuth is a protocol that lets applications connect with one another without passing around actual user credentials. For instance, Plaid allows Ikigai to securely get transaction data from thousands of banks. Users authenticate from within Plaid or on their own bank’s website. Ikigai never sees those credentials. Instead, it receives tokens that it presents to the bank to prove it is authorized. These tokens are tied to our own client account, meaning nobody but us can use them to authorize themselves with the bank.
For the user, it is as easy as choosing your bank and entering your credentials. Ikigai then uses Plaid to get a full record of the transactions made in that bank account, including the amount, date, location, and category of each transaction, along with other relevant fields. With this data inside Ikigai, users can move on to gaining insight from it: tracking a budget, visualizing how much is spent in each category, and so on.
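As a small illustration of the "how much is spent on each category" step, the sketch below totals transaction amounts per category. The records are invented sample data, and the field names (`date`, `amount`, `category`) are assumptions chosen to mirror the fields mentioned above, not the actual schema Plaid or Ikigai returns.

```python
from collections import defaultdict
from decimal import Decimal

# Invented sample transactions; field names are illustrative assumptions.
transactions = [
    {"date": "2023-03-01", "amount": Decimal("42.50"), "category": "Groceries"},
    {"date": "2023-03-02", "amount": Decimal("2.75"), "category": "Transit"},
    {"date": "2023-03-05", "amount": Decimal("17.50"), "category": "Groceries"},
]


def spend_by_category(records):
    """Sum transaction amounts per category."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for record in records:
        totals[record["category"]] += record["amount"]
    return dict(totals)


totals = spend_by_category(transactions)
```

`Decimal` is used instead of `float` so that currency amounts add up without binary rounding error.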
Another application we connect with using OAuth is Snowflake. Snowflake is perhaps the most popular and user-friendly data warehouse on the market. However, many tools either do not offer a Snowflake connection at all or require users to download and configure database drivers on their local machines. Ikigai connects to Snowflake without this hassle. As with Plaid, we use OAuth instead of handling any user credentials.
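The token handoff described above can be sketched with the generic OAuth 2.0 authorization code grant. This is a simplified illustration, not the specific flow Ikigai, Plaid, or Snowflake implements: the endpoint URLs, client ID, client secret, and function names are all hypothetical, and no network request is made. Only the parameter names (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`, `grant_type`, `code`, `client_secret`) come from the OAuth 2.0 specification.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scope):
    """Return (url, state). The user is sent to this URL to log in at the provider."""
    state = secrets.token_urlsafe(16)  # random value used to reject forged callbacks
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


def handle_callback(callback_url, expected_state):
    """Extract the one-time code from the redirect after checking the state value."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: callback did not come from our request")
    return params["code"][0]


def build_token_request(code, client_id, client_secret, redirect_uri):
    """Form fields the application would POST to the provider's token endpoint."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,  # ties the resulting tokens to this client
    }


# Hypothetical values for illustration only.
REDIRECT = "https://app.example/callback"
url, state = build_authorization_url(
    "https://provider.example/oauth/authorize", "demo-client", REDIRECT, "read")
# The provider redirects the user back with a one-time code and the same state.
code = handle_callback(f"{REDIRECT}?code=abc123&state={state}", state)
token_request = build_token_request(code, "demo-client", "demo-secret", REDIRECT)
```

The user's password appears nowhere in this exchange. The application only ever handles the one-time code and the tokens issued for it, and because the token request carries the client's own secret, a code intercepted by a third party cannot be redeemed without it.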
The following table is a summary of how Ikigai achieves OAuth connection to Plaid and Snowflake. While there is a lot going on in the backend, for the user it can be a sub-sixty-second process.
About the Author
Koby Frank (Backend Software Engineer)
Koby Frank works on building out data connector capabilities, among other features, at Ikigai. He is a recent graduate from the University of Pennsylvania, where he received a Bachelor’s degree in Mathematics and Computer Science.