Kylo Setup for Data Lake Management



Kylo is a feature-rich data lake platform built on Apache Hadoop and Apache Spark. It provides a data lake solution enabling self-service data ingest, data preparation, and data discovery, and it integrates best practices around metadata capture, security, and data quality. It also contains many special-purpose routines for data lake operations that leverage Apache Spark and Apache Hive.

Furthermore, it provides a flexible data processing framework (leveraging Apache NiFi) for building batch or streaming pipeline templates and for enabling self-service features without compromising governance requirements. Its integrated metadata server is currently compatible with databases such as MySQL and Postgres, and it can be integrated with Apache Ranger or Apache Sentry for security and with Cloudera Navigator or Apache Ambari for cluster monitoring.

Kylo’s web application layer offers features oriented to business users, including data analysts, data stewards, data scientists, and IT operations personnel. It uses Apache NiFi as its scheduler and orchestration engine, providing an integrated framework for designing new types of pipelines with some 200 processors (data connectors and transforms).


  • Install MySQL (password: hadoop).
    Optional: change the “bind_ip” setting in the /etc/mysql/my.cnf file and restart MySQL to enable access from outside the server.
  • Ensure that “/opt/” has root privileges.
  • Download Java 8 and extract it to /opt/java8.
    Source: wget --no-check-certificate --no-cookies --header “Cookie: oraclelicense=accept-securebackup-cookie” -P /opt/

Note: Ensure that JDK 1.8.0_92 or above is configured; otherwise, the module “kylo-alerts-default” will not compile.

  • Download Scala and extract it into /opt/scala2.
    Source: wget -P /opt/
    wget -P /opt/
  • Download Spark 2 and extract it into /opt/spark2.
    Source: wget -P /opt/
  • Download Maven 3 (binary distribution) and extract it into /opt/maven3.
    Source: wget -P /opt/



  • Install Alien on Ubuntu so the Maven install module can build the RPM package; the build will then produce both RPM & deb packages.
  • Set environment variables in ~/.bashrc and in “/etc/profile” (for all users):
    • JAVA_HOME=/opt/java8
    • JRE_HOME=/opt/java8/jre
    • SCALA_HOME=/opt/scala2
    • SPARK_HOME=/opt/spark2
    • MAVEN_HOME=/opt/maven3
    • M2_HOME=/opt/maven3
  • Open a new session in PuTTY, or source the profile file, to load the added environment variables.
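For reference, the variables listed above can be added to ~/.bashrc (and /etc/profile) as a block like the following; the paths match the install locations used in this guide:

```shell
# Build prerequisites for Kylo, matching the install locations above
export JAVA_HOME=/opt/java8
export JRE_HOME=/opt/java8/jre
export SCALA_HOME=/opt/scala2
export SPARK_HOME=/opt/spark2
export MAVEN_HOME=/opt/maven3
export M2_HOME=/opt/maven3

# Make the tool binaries reachable from any shell
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$MAVEN_HOME/bin
```

After saving, run `source ~/.bashrc` (or open a new PuTTY session) so the variables take effect.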

Test Configuration

  • Check whether Java, Scala, and Maven are properly configured.
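Standard version flags confirm that each tool is on the PATH and picking up the right JAVA_HOME:

```shell
# Each command should print the version installed under /opt/
java -version      # expect 1.8.0_92 or above
scala -version
mvn -version       # also prints the JAVA_HOME that Maven detected
```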


  • Check whether Spark is properly configured.
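Spark ships its own version flag; assuming the extraction path used above:

```shell
# Prints the Spark version banner along with the Scala version it was built for
/opt/spark2/bin/spark-shell --version
```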


  • Note: Move all the downloaded tar files into another directory called “tar_files”.

Building, Installing, and Setting up Kylo using Deb Package in Linux Ubuntu Machine

Downloading Kylo from GitHub

  • Download Kylo from the GitHub location provided in the Reference section.
  • Extract the zip file using unzip.


Executing Maven to Create Deb

  • Create the deb using the below command:
    It will take around 10-20 minutes to download packages.
  • Clean and compile all class files, and package all modules (core, UI, service, setup) into RPM & deb packages using the below command:
  • Skip unit testing for faster Maven builds using the below command:
  • If you have already downloaded the packages, run Maven in offline mode using the below command:
    Note: “mvn clean install” will create both RPM & deb packages. To build only one package, go to the install module (/opt/kylo-master/install/) and execute the below command after building all other modules:
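Assuming the source tree sits at /opt/kylo-master, the Maven invocations described above look like this (all flags are standard Maven options):

```shell
cd /opt/kylo-master

# Clean, compile, and package all modules into RPM & deb packages
mvn clean install

# Same build, skipping unit tests for speed
mvn clean install -DskipTests

# Offline mode, if all dependencies are already in the local repository
mvn -o clean install -DskipTests
```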

Copying Deb

Copy deb from “/opt/kylo-master/install/target/deb/kylo-x.x.x-SNAPSHOT.deb” to “/opt/kylo/setup” using the below command:
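A minimal sketch of the copy step; the version segment of the filename depends on the branch you built, so a wildcard is used here:

```shell
# Copy the freshly built deb into the Kylo setup directory
cp /opt/kylo-master/install/target/deb/kylo-*-SNAPSHOT.deb /opt/kylo/setup/
```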

Creating Users and Their Groups

  • Create the following users:
    useradd -r -m -s /bin/bash nifi
    useradd -r -m -s /bin/bash kylo
    useradd -r -m -s /bin/bash activemq
    useradd -r -m -s /bin/bash elasticsearch


  • Check whether groups were created for the above users in “/etc/group”.


  • If not, create groups for the users by executing the below command:
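One way to check and repair the groups, shown for the four service users; groupadd and usermod are standard Linux tools:

```shell
# List any groups already present for the service users
grep -E '^(nifi|kylo|activemq|elasticsearch):' /etc/group

# Create a missing group and attach the matching user to it
# (kylo shown as an example; repeat for each user whose group is absent)
groupadd kylo
usermod -a -G kylo kylo
```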

Installing kylo.deb

Install kylo.deb, which packages the whole setup, using the below command:
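Assuming the deb was copied to /opt/kylo/setup in the previous step:

```shell
# Install the packaged Kylo setup with dpkg
dpkg -i /opt/kylo/setup/kylo-*-SNAPSHOT.deb
```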


Downloading Binary Files

To download all required binary files (JDK, Elasticsearch, ActiveMQ, Apache NiFi) locally, run the below script:


These files will be added to the below directories with different user privileges.

  • Directory: /opt/kylo/


  • Directory: /opt/kylo/setup


Setting up Binary Files

  • Run the below script to set up JDK, Elasticsearch, ActiveMQ, and NiFi.
  • To perform the setup offline, run the below script:


    • Before executing the above script, ensure that SPARK_HOME is set up.
    • During setup, perform the following:
      • Choose MySQL and carefully provide connection details (host: localhost, username: root, password: hadoop).
      • Give “y” 3 times to install Elasticsearch, ActiveMQ, and NiFi.
      • Choose Java option [3] and provide home “/opt/java8”.
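The wizard script itself is not named in the text; in a standard Kylo install it lives under /opt/kylo/setup, and the offline flag below is an assumption based on Kylo's offline-install support:

```shell
cd /opt/kylo/setup

# Interactive setup: installs JDK, Elasticsearch, ActiveMQ, and NiFi
./setup-wizard.sh

# Offline variant, if the binaries were downloaded beforehand
./setup-wizard.sh -o
```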



Once the setup wizard is completed, the below services will be added:

      • ActiveMQ
      • Elasticsearch
      • kylo-service
      • kylo-spark-shell
      • kylo-ui
      • NiFi


Note: If any of the above services is not installed, install and start it manually.
For example, NiFi: cd /opt/nifi/current/bin/; ./nifi.sh start

Optional Step: Run the below SQL scripts to create all the needed tables and their data.


  • Check whether the tables are created in MySQL databases. If not, execute the below command:
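A quick check from the command line; the database name kylo is an assumption, so adjust it to the schema chosen during setup:

```shell
# List the Kylo tables, using the root password configured earlier ("hadoop")
mysql -u root -phadoop -e "SHOW TABLES IN kylo;"
```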

Starting Server

To start the server (kylo-ui, kylo-services, kylo-spark-shell), execute the below script:
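The start script is not named in the text; in a standard Kylo install it is placed directly under /opt/kylo (the script name here is an assumption):

```shell
# Starts kylo-ui, kylo-services, and kylo-spark-shell together
/opt/kylo/start-kylo-apps.sh
```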


Checking Service Status

  • Check status of all services using the below script:
  • Run the below script to check all Kylo services:



  • Run the below script to check the NiFi service:
  • Run the below script to check the ActiveMQ service:


  • Run the below script to check Elasticsearch service:
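On Ubuntu the services registered by the wizard can be queried through their init scripts; the exact service names below are assumptions matching the service list above:

```shell
service kylo-services status
service kylo-ui status
service kylo-spark-shell status
service nifi status
service activemq status
service elasticsearch status
```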


Accessing UI

  • Open Kylo UI by accessing the URL: http://{IP}:8400/
  • Provide login credentials as “Username / password: dladmin / thinkbig”




Troubleshooting

ActiveMQ is not Running

Problem: ActiveMQ is not running and shows the below error:


The problem occurs because ActiveMQ reads JAVA_HOME from whichever of the below files it finds first, even if the variable is defined in /etc/environment.

  • /etc/default/activemq
  • $HOME/.activemqrc
  • $INSTALLDIR/apache-activemq-/bin/env

Solution: Add “JAVA_HOME=/opt/java8” in the first line of the file “/etc/default/activemq” and start it.
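The fix can be applied with a single sed invocation (GNU sed's `1i` inserts before the first line):

```shell
# Prepend JAVA_HOME to ActiveMQ's defaults file, then start the service
sed -i '1i JAVA_HOME=/opt/java8' /etc/default/activemq
service activemq start
```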


Elasticsearch is not Running

Problem: Elasticsearch is not running and shows the below error when trying to start:


This problem occurs because JAVA_HOME is set up only in the “root” user's environment, but Elasticsearch runs as the “elasticsearch” user.

Solution: Add “JAVA_HOME=/opt/java8” in the first line of the file “/etc/init.d/elasticsearch” and restart it.


Alternatively, install Java using apt-get. On Ubuntu or Debian, the packaged Java is OpenJDK (rather than Oracle JDK) due to licensing restrictions. To fix the Java path problem, run the below command:
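On Ubuntu 16.x the OpenJDK 8 package can be installed and selected as follows (the package name assumes the standard Ubuntu repositories):

```shell
# Install OpenJDK 8 and pick it as the system default Java
apt-get update
apt-get install -y openjdk-8-jdk
update-alternatives --config java
```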

Kylo-spark-shell is not Running

Problem: Kylo-spark-shell is not running and shows the below error in the file “/var/log/kylo-services/kylo-spark-shell.err”.


Solution: Add environment variables for Java, Spark, and Scala (ideally all of them) in “/etc/profile” and make sure the variables are set for all users.


Kylo-alerts-default is not Compiling

Problem: Kylo-alerts-default is not compiling and throws the below error:


Solution: Make sure that JDK 1.8.0_92 or above is configured; otherwise, the module “kylo-alerts-default” will not compile.

Integrating with Hortonworks

  • Log in to the namenode server and execute the below commands to add users into HDFS:
    • For kylo-service node
    • For namenode / masternode
  • Change the metastore configuration in the property file under “/opt/kylo/kylo-services/conf/”:
    • hive.datasource.url=jdbc:hive2://xxxxxxxx:10000/default
    • hive.datasource.username=hive
    • hive.datasource.password=hive
    • nifi.service.hive_thrift_service.database_connection_url=jdbc:hive2://xxxxxxxx:10000/default
  • Restart server.
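The HDFS user-provisioning commands are not spelled out above; a sketch using standard Hadoop tooling, with the kylo and nifi users as examples, might look like:

```shell
# On the namenode: create the service users locally
useradd kylo
useradd nifi

# Create HDFS home directories for them and hand over ownership
sudo -u hdfs hdfs dfs -mkdir -p /user/kylo /user/nifi
sudo -u hdfs hdfs dfs -chown kylo:kylo /user/kylo
sudo -u hdfs hdfs dfs -chown nifi:nifi /user/nifi
```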


Kylo is a feature-rich data lake platform built on Apache Hadoop and Apache Spark. You can now successfully set up Kylo.

In the upcoming blog, we will discuss changing the NiFi components’ configurations (for example, HiveThriftConnection) of an existing template and creating new templates.


  • M Baig

    Very informative.
    Can Kylo be installed on Ubuntu 14.04

    • Treselle Systems Blog

      Thanks!!! We have installed Kylo on AWS Linux Ubuntu 16.x version. Hope, it should work on Ubuntu 14.04 too.

  • Future

    I have followed the above steps and able to launch the kylo UI but unable to login in it by the passwd given here as “Username / password: dladmin / thinkbig”. Can you please suggest me the password through which i can login or where I can get it from mysql DB??

    Thanks in Advance!!!

  • Rohit Kumar

    is it necessary to run HDP or clouderra sandbox in my system or it can work with standalone spark application as well


    Unit elasticsearch.service could not be found.
    Timeout reached
    elasticsearch service did not start within a reasonable time. Please check and start it. Then, execute this script manually before starting Kylo.
    This script will create Kylo indexes in Elasticsearch.
    NOTE: If they already exist, an index_already_exists_exception will be reported. This is OK.
    /opt/kylo/setup/../bin/ localhost 9200 1 1
    Elasticsearch index creation complete