dwh - Ruby Gem for Analytics
Open source gem for connecting to popular data warehouses
The Ruby ecosystem has never been strong in data engineering and analytics. Python dominates here, and there is nothing wrong with that. I believe in using the best tool for the job. However, when we build our Ruby apps, whether Rails or otherwise, we need to be able to connect to popular data warehouses for certain use cases.
Introducing dwh, a lightweight adapter providing a unified interface across popular data warehouses and databases for Ruby. It is not an ORM, and it is designed to be simple enough that new adapters can be added quickly.
Why didn’t we just contribute to Sequel?
Libraries like Sequel are amazing and comprehensive. However, that broad coverage also makes it more laborious to add new databases. There has been an explosion of new databases recently, and a lightweight interface lets us integrate them faster.
The adapter has only 5 core methods (6 including the connection method). A YAML settings file controls how it interacts with a particular database, so adding a new one is relatively fast. See the Druid implementation as an example, along with the corresponding YAML settings file for Druid.
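To make the settings-file idea concrete, here is a minimal sketch of how a YAML file might drive SQL generation for one database. The setting keys (`quote_identifier`, `row_count_sql`), the `@exp` and `@table` placeholders, and the `SettingsSketch` class are all hypothetical, invented for illustration. They are not taken from dwh's actual settings files or internals.

```ruby
require 'yaml'

# Hypothetical settings for an imaginary database. These keys are
# illustrative assumptions, not dwh's real setting names.
SETTINGS_YAML = <<~YAML
  quote_identifier: '"@exp"'
  row_count_sql: 'SELECT COUNT(*) FROM @table'
YAML

# Tiny illustrative class: database-specific behavior comes from the
# settings hash instead of being hard-coded in Ruby.
class SettingsSketch
  def initialize(yaml_text)
    @settings = YAML.safe_load(yaml_text)
  end

  # Wrap an identifier using the database's quoting template.
  def quote(identifier)
    @settings.fetch('quote_identifier').sub('@exp', identifier)
  end

  # Build the row-count query for a table from its template.
  def row_count_sql(table)
    @settings.fetch('row_count_sql').sub('@table', quote(table))
  end
end

sketch = SettingsSketch.new(SETTINGS_YAML)
puts sketch.row_count_sql('customers')
```

Under this kind of design, supporting a database with different quoting rules would mostly mean supplying a different YAML file rather than writing new Ruby code.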
Core Methods
Six core methods round out the main functionality we need when dealing with data warehouses. There is no direct attempt to typecast results into Ruby types. However, some of the adapters use existing gems like pg and mysql2, which provide typecast result sets. You can get back native results like so:
adapter.execute('select * from customers', format: :native)
connection - Creates a reusable connection based on config hash passed in
tables(schema: nil, catalog: nil) - returns a list of tables from the default connection or from the specified schema and catalog
metadata(table_name, schema: nil, catalog: nil) - provides metadata about a table
stats(table_name, date_column: nil) - provides table row count and date range
execute(sql, format: :array, retries: 0) - runs a query and returns the results in the given format
execute_stream(sql, io, stats: nil) - runs a query and streams the results as CSV into the given IO
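As a rough illustration of how these methods compose, the sketch below profiles every table in a schema and dumps one query to a CSV file. It relies only on the signatures listed above. The helper names (`profile_tables`, `export_csv`) are hypothetical, and because the exact shape of what `tables` and `stats` return is not described here, the code assumes `tables` yields an enumerable of table names and passes each `stats` result through untouched.

```ruby
# Collect stats for each table in a schema.
# Assumption: adapter.tables returns an enumerable of table names.
def profile_tables(adapter, schema: nil, date_column: nil)
  adapter.tables(schema: schema).each_with_object({}) do |table_name, profile|
    profile[table_name] = adapter.stats(table_name, date_column: date_column)
  end
end

# Stream a query result as CSV into a file and return the file path.
def export_csv(adapter, sql, path)
  File.open(path, 'w') do |io|
    adapter.execute_stream(sql, io)
  end
  path
end
```

Both helpers take any object that responds to the core methods, so the same code should work regardless of which warehouse the adapter was created for.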
Tutorials and Guides
Quick Start
# Install gem
gem install dwh

# Create an adapter instance
require 'dwh'
postgres = DWH.create(:postgres, {
host: 'localhost',
database: 'mydb',
username: 'user',
password: 'password'
})
# Execute a query and return the results as rows of hashes
postgres.execute('select * from customers', format: :object)

