Apache Falcon Data Pipeline with Apache Atlas Lineage

In this blog article, Apache Falcon is used to centrally define data pipelines, and these definitions are used to auto-generate workflows in Apache Oozie. Because Apache Falcon dataflows are synced with Apache Atlas through Kafka topics, Atlas can manage Falcon metadata: it provides Falcon feed lineage and shows each table together with its source tables.
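
Falcon entities (clusters, feeds, and processes) are defined as XML files and submitted to the Falcon server, which exposes a REST API for this. The hypothetical Java sketch below submits a process definition over HTTP; the host, default port (15000), file name, and user are assumptions for illustration, not details from the article.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FalconSubmitSketch {
    public static void main(String[] args) throws Exception {
        // Assumed: a Falcon server on its default port and a local process
        // definition file; adjust both for a real deployment.
        byte[] processXml = Files.readAllBytes(Paths.get("pipeline-process.xml"));
        URL url = new URL("http://localhost:15000/api/entities/submit/process?user.name=falcon");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml");
        conn.setDoOutput(true);

        // Send the XML definition; once the entity is scheduled, Falcon
        // generates the corresponding Oozie workflows behind the scenes.
        try (OutputStream out = conn.getOutputStream()) {
            out.write(processXml);
        }
        System.out.println("Falcon responded with HTTP " + conn.getResponseCode());
    }
}
```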

read more

MongoDB Shard – Part I

Sharding is a method for distributing data across multiple machines. It supports deployments with very large data sets, high-throughput operations, and horizontal scaling. MongoDB shards data at the collection level and distributes each collection's data across the shards in the cluster. For aggregation operations that run on multiple shards, if the operations do not need to run on the database's primary shard, the results can be routed to any shard for merging, which avoids overloading the primary shard for that database. Sharding divides the data across multiple servers and reduces the amount of data each server has to store. A sharded cluster can contain both sharded and unsharded collections without issue.
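
To make that concrete, sharding a collection boils down to enabling sharding on the database and declaring a shard key for the collection. Below is a minimal sketch using the MongoDB Java driver; the connection string, database, collection, and shard key field are hypothetical examples, not taken from the article.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class ShardCollectionSketch {
    public static void main(String[] args) {
        // Assumed: a mongos router on localhost; in a sharded cluster, admin
        // commands like these must go through mongos, not a shard directly.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase admin = client.getDatabase("admin");

            // Enable sharding for a hypothetical "sales" database.
            admin.runCommand(new Document("enableSharding", "sales"));

            // Shard the "orders" collection on a hashed customerId key so
            // documents are spread evenly across the shards in the cluster.
            admin.runCommand(new Document("shardCollection", "sales.orders")
                    .append("key", new Document("customerId", "hashed")));
        }
    }
}
```

A hashed key like this favors even write distribution; a ranged key is the alternative when queries target contiguous ranges of the key.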

read more

Dynamic Jasper Reports Automated in Talend

Dynamic Jasper is a great tool for designing and creating simple or complex dynamic reports. Talend is not only the most widely used tool for data transformation; it can also generate dynamic Jasper reports through the tJasperInput component.

Automation with context parameters is the most important value add. It resolves many of the challenges involved in dynamic report creation, such as on-the-fly changes to column names, report headers, dates, and so on, and it saves developers report generation time.
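
As a rough illustration of the idea, the hypothetical DynamicJasper sketch below takes the report title and a column header as plain method parameters, standing in for the Talend context variables described above; the field names and row beans are assumptions for illustration.

```java
import ar.com.fdvs.dj.core.DynamicJasperHelper;
import ar.com.fdvs.dj.core.layout.ClassicLayoutManager;
import ar.com.fdvs.dj.domain.DynamicReport;
import ar.com.fdvs.dj.domain.builders.FastReportBuilder;
import net.sf.jasperreports.engine.JRDataSource;
import net.sf.jasperreports.engine.JasperPrint;
import net.sf.jasperreports.engine.data.JRBeanCollectionDataSource;
import java.util.List;

public class DynamicReportSketch {

    // Title and column header arrive as parameters, playing the role of
    // Talend context variables so the same job can render differently per run.
    public static JasperPrint buildReport(String reportTitle,
                                          String amountHeader,
                                          List<?> rows) throws Exception {
        DynamicReport report = new FastReportBuilder()
                .addColumn("Customer", "customer", String.class.getName(), 120)
                .addColumn(amountHeader, "amount", Double.class.getName(), 80)
                .setTitle(reportTitle)
                .setUseFullPageWidth(true)
                .build();

        // Wrap the row beans and let DynamicJasper fill the report.
        JRDataSource ds = new JRBeanCollectionDataSource(rows);
        return DynamicJasperHelper.generateJasperPrint(
                report, new ClassicLayoutManager(), ds);
    }
}
```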

read more

Hive Streaming with Kafka and Storm with Atlas

With the release of Hive 0.13.1 and HCatalog, a new Streaming API was introduced to support continuous data ingestion into Hive tables. This API is intended to let streaming clients such as Flume or Storm store data in Hive, whose storage has traditionally been batch oriented.

In our use case, we use Kafka with Storm to load streaming data into a bucketed Hive table. Multiple Kafka topics produce data to Storm, which ingests it into a transactional Hive table. Data committed in a transaction is immediately available to Hive queries from other Hive clients. Apache Atlas tracks the lineage across the Kafka topics, the Storm topology (spouts and bolts), and the transactional Hive table, which helps us understand how data is ingested into the Hive table.
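
For reference, the core of the Hive Streaming API described above looks roughly like the sketch below; this is the kind of work a Storm bolt performs per batch. The metastore URI, table, and column names are assumptions, and the target table must be bucketed, stored as ORC, and have transactions enabled.

```java
import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class HiveStreamingSketch {
    public static void main(String[] args) throws Exception {
        // Assumed endpoint: a transactional, bucketed ORC table
        // "default.events" behind a local metastore.
        HiveEndPoint endPoint = new HiveEndPoint(
                "thrift://localhost:9083", "default", "events", null);
        StreamingConnection conn = endPoint.newConnection(true);

        String[] fieldNames = {"id", "msg"};
        DelimitedInputWriter writer =
                new DelimitedInputWriter(fieldNames, ",", endPoint);

        // Fetch a batch of transactions and write records into one of them;
        // on commit, the rows become visible to other Hive clients.
        TransactionBatch txnBatch = conn.fetchTransactionBatch(10, writer);
        txnBatch.beginNextTransaction();
        txnBatch.write("1,hello".getBytes());
        txnBatch.write("2,world".getBytes());
        txnBatch.commit();

        txnBatch.close();
        conn.close();
    }
}
```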

read more

Microservices – Rules of Engagement

Rules of Engagement is a set of principles and best practices that are valuable to follow to ensure an efficient microservices environment. Treselle Systems captured these rules while going through the book "Microservices From Day One" by Cloves Carneiro Jr. and Tim Schmelmer. Guiding principles like these help the development team spend less time thinking about high-level architecture and more time writing business-related code that provides value to stakeholders. Check out how one of our microservice architecture implementations aligns with these Rules of Engagement.

read more
