Text Normalization with Spark – Part 1

Many methods exist for analyzing unstructured data, including text mining, Natural Language Processing (NLP), and information retrieval. With unstructured data growing rapidly across all kinds of businesses, scalable solutions have become a necessity. Apache Spark ships with out-of-the-box algorithms for text analytics and also supports custom development of algorithms that are not available by default. In this blog post, our main goal is to perform basic text normalization with Apache Spark using a simple regular-expression technique; the next post deciphers the resulting Spark stages, jobs, and DAGs.
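As a rough illustration of what regex-based text normalization can look like, here is a minimal sketch in plain Python. The `normalize` function, its two patterns, and the sample strings are assumptions made for illustration and are not taken from the post itself; the specific rules (lowercasing, replacing punctuation, collapsing whitespace) are one plausible choice among many.

```python
import re

# Hypothetical patterns chosen for illustration only.
_NON_ALNUM = re.compile(r"[^a-z0-9\s]")   # anything that is not a letter, digit, or whitespace
_WHITESPACE = re.compile(r"\s+")          # runs of one or more whitespace characters


def normalize(line: str) -> str:
    """Lowercase the text, replace punctuation with spaces, collapse whitespace."""
    lowered = line.lower()
    no_punct = _NON_ALNUM.sub(" ", lowered)
    return _WHITESPACE.sub(" ", no_punct).strip()


# Made-up sample input, not data from the post.
sample = ["Hello,   World!", "Apache Spark -- text analytics."]
normalized = [normalize(s) for s in sample]
print(normalized)
```

Because `normalize` is a pure function of a single string, the same function could in principle be handed to Spark, for example through `map` on an RDD of lines, so that the cleaning runs in parallel across partitions.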

Big Data Pipeline Architectures

Before embarking on a big data project, it is important to ensure that all the architecture components required to analyze every aspect of the data set are in place. A high-level view of this reference architecture provides good background on Big Data and how it complements existing analytics, BI, databases, and systems. Treselle has solved interesting Big Data problems, each requiring the type of architecture best suited to the particular business use case.

Data Matching – Entity Identification, Resolution & Linkage

Data matching is the task of identifying, matching, and merging records that correspond to the same entities across several source systems. The entities under consideration are most commonly people, places, publications or citations, consumer products, or businesses. The task is also widely known as record linkage or data linkage, entity resolution, object identification, or field matching.
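To make the idea of matching records across source systems concrete, the following is a minimal sketch that compares business names from two lists using string similarity. The helper names, the 0.85 threshold, and the sample records are all hypothetical. It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, which is not necessarily the technique the post describes.

```python
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def match_records(left, right, threshold=0.85):
    """Return (left, right, score) for every pair at or above the threshold.

    The threshold is an illustrative assumption; real systems tune it on data.
    """
    pairs = []
    for l in left:
        for r in right:
            score = similarity(l, r)
            if score >= threshold:
                pairs.append((l, r, score))
    return pairs


# Made-up records standing in for two source systems.
system_a = ["Acme Corp.", "Globex Inc"]
system_b = ["ACME corp", "Initech LLC"]
pairs = match_records(system_a, system_b)
print(pairs)
```

Comparing every pair is quadratic in the number of records, so this approach only suits small inputs; larger data sets usually need a way to limit which pairs are compared.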
