Self Service Analytics using Dremio

Self Service Analytics using Dremio

Overview

Dremio, a self-service data platform, helps data analysts and data scientists to determine, organize, accelerate, and share any data at any time irrespective of volume, velocity, location, or structure. Dremio allows business users to access data from a variety of sources and prevents them from relying on developers.

In this blog, let us discuss about data transformation and data analysis using Dremio and data visualization using Tableau.

Pre-requisites

Download and install Dremio from the following link:
https://www.dremio.com/download/

Data Description

Online retail data with different product types, product prices, and quantities sold from December, 2010 to December, 2011 is used as a data source.

Sample Data Source

sample_data_source1

Synopsis

  • Connect different data sources with Dremio
  • Perform data transformation
  • Create virtual datasets in Dremio
  • Connect virtual datasets with BI tools
  • Visualize results in Tableau

Connecting Different Data Sources with Dremio

Different types of data sources available for performing data transformation activities are shown in the below screenshot:

connecting_different_data_sources_with_dremio

To connect Amazon S3 data sources with Dremio, perform the following:

  • In Data Source Types page, select Amazon S3 data source.
  • Connect to Amazon S3 location as shown in the below screenshot:

connecting_different_data_sources_with_dremio11

  • Connect to MySQL connection and provide required credentials as shown in the below screenshot:

connecting_different_data_sources_with_dremio22

  • Connect to Network Attached Storage (NAS) as shown in the below screenshot:

connecting_different_data_sources_with_dremio33

Performing Data Transformation

To transform data, perform the following:

  • Use UNION function to merge data from 3 different data sources such as S3, MySQL, & NAS and load data as virtual dataset as shown in the below screenshot:

performing_data_transformation

As price values are based on single quantity, total price needs to be calculated based on quantity.

  • Add “Total_Price” as a new field.
  • Calculate total price based on number of quantity as shown in the below diagram:

performing_data_transformation1

  • Perform aggregation with stock quantity and stock price based on the products in the source data as shown in the below diagram:

performing_data_transformation2

  • Round off the total price values to 2 decimal digits as shown in the below diagram:

performing_data_transformation3

Creating Virtual Datasets in Dremio

On successfully transforming data, create virtual datasets (view) on Dremio spaces to store the data based on source.

The virtual dataset for purchases done by each customer is as shown below:

creating_virtual_datasets_in_dremio

The virtual dataset for most quantity sold based on the product is as shown in the below diagram:

creating_virtual_datasets_in_dremio1

Connecting Virtual Datasets with BI Tools

To connect the virtual datasets with BI tools, export virtual dataset in .tds format to be used with BI tools such as Tableau, Qlik Sense, and Power BI as shown in the below diagram:

connecting_virtual_datasets_with_bi_tools

 

connecting_virtual_datasets_with_bi_tools1

Visualizing Results in Tableau

On clicking .tds file in Tableau, you will be redirected to Tableau for visualizing the data.

Most Purchases by Customers

most_purchases_by_customers

Maximum Number of Products Sold

maximum_number_of_products_sold

References

Dremio: https://www.dremio.com/

216 Views 1 Views Today