Before embarking on a big data initiative, it is important to ensure that all the architecture components needed to analyze every aspect of the data set are in place. A high-level view of this reference architecture provides useful background on big data and on how it complements existing analytics, BI, databases, and systems. The architecture is not a fixed, one-size-fits-all approach: each component has several alternatives, each with its own advantages and disadvantages for a particular use case.
An organization's ability to realize business value from big data depends on how easily and quickly it can:
- Identify the right source of data
- Define the analytics required to extract value and insights
- Bring the data into an analytics environment for advanced analytics and data science activity
- Curate the data so that it is ready for analysis
- Design and architect the infrastructure required to support the analytics in accordance with the desired performance and throughput requirements
- Execute the analytic models against the curated data to derive business insights
- Deliver the analytic results in an actionable manner to the business
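The steps above can be read as stages of a pipeline. The following is a minimal, illustrative sketch in Python of how ingestion, curation, analysis, and delivery might chain together. It is not Treselle's implementation: the sample records in `RAW_EVENTS`, the stage function names, and the toy "average spend per user" analytic are all hypothetical, and a real pipeline would replace each stage with tools chosen for the use case.

```python
from statistics import mean

# Hypothetical raw records standing in for an identified data source.
RAW_EVENTS = [
    {"user": "a", "amount": "12.5"},
    {"user": "b", "amount": None},   # incomplete record
    {"user": "a", "amount": "7.5"},
    {"user": "c", "amount": "30"},
]


def ingest(source):
    """Bring the data into the analytics environment (here, an in-memory copy)."""
    return [dict(record) for record in source]


def curate(records):
    """Drop incomplete records and cast fields so the data is ready for analysis."""
    return [
        {"user": r["user"], "amount": float(r["amount"])}
        for r in records
        if r.get("amount") is not None
    ]


def analyze(records):
    """Run a toy analytic model: average amount per user."""
    amounts_by_user = {}
    for r in records:
        amounts_by_user.setdefault(r["user"], []).append(r["amount"])
    return {user: mean(amounts) for user, amounts in amounts_by_user.items()}


def deliver(results):
    """Render the results as short, readable lines for the business."""
    return [
        f"{user}: average spend {avg:.2f}"
        for user, avg in sorted(results.items())
    ]


def run_pipeline(source):
    """Chain the stages: ingest -> curate -> analyze -> deliver."""
    return deliver(analyze(curate(ingest(source))))


if __name__ == "__main__":
    for line in run_pipeline(RAW_EVENTS):
        print(line)
```

Keeping each stage as a separate function mirrors the point that every component has alternatives: any one stage can be swapped out without touching the others.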
Architecting a big data pipeline should start from business needs and data strategy, so that appropriate tools and technologies can be chosen rather than forcing the pipeline onto a particular ecosystem. Treselle has applied its expertise in big data pipeline architecture to a range of big data problems and use cases. Some of the architecture proposals and implementations by Treselle are listed below.