Database

Overview

For years, the relational model has been the de facto choice for data problems big and small. We don’t expect relational databases to fade away; they will continue to play a major role in serving business needs. However, enterprises are emerging from the relational fog to discover alternatives that offer schema-less or alternative data structures, simple replication, high availability, horizontal scaling, and new query methods. Most data challenges can be solved with one type of database or a combination of several. The question is which database, or combination of databases, best suits your problem space and usage patterns.

We have expertise across the spectrum of databases, including relational, columnar, graph, search, document, and in-memory, and we have executed data management projects for clients across a diverse set of industries and situations. This is a pivotal time in the database world, and it is important to match the right database to the right data challenge. We can help. Contact us.

RESULTS

Retail Client Goes Geospatial with MongoDB

One of our clients had legacy geospatial data in MySQL. This setup could not support the new features they needed, because a lot of code had to be written to retrieve the geo data and convert it to the GeoJSON standard.

Treselle’s engineering team thoroughly evaluated various geospatial frameworks, including ESRI, Google Places, QGIS, PostGIS, Maptitude, and a few others. We chose MongoDB for its geospatial capabilities because it supports the open GeoJSON specification and has good community support. It also offers scalability and high availability and is cost-effective. Our client’s requirements did not warrant the complex geospatial functionality found in commercial packages, so MongoDB fit the bill. This choice enabled them to store richer data and index GeoJSON types such as LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection, which are very helpful for plotting proximities, trade areas, intersections, inclusions, and more.
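
As a rough illustration of the kinds of queries this enables, the sketch below builds GeoJSON documents and MongoDB query filters as plain Python dictionaries, so it runs without a database server. The field names (location, boundary) and helper functions are hypothetical and are not taken from the client project; the operators ($geoWithin, $geoIntersects, $centerSphere) are standard MongoDB geospatial operators.

```python
# Sketch only: field names and helpers are assumptions made for illustration.
# The filters are plain dicts; with PyMongo they would be passed to
# collection.find(...) on a collection that has a "2dsphere" index.

EARTH_RADIUS_MILES = 3963.2  # converts miles to radians for $centerSphere


def store_document(name, lon, lat):
    """Build a store record; GeoJSON coordinate order is [longitude, latitude]."""
    return {"name": name, "location": {"type": "Point", "coordinates": [lon, lat]}}


def inclusion_filter(state_polygon):
    """Inclusion: stores whose location lies inside a state's polygon."""
    return {"location": {"$geoWithin": {"$geometry": state_polygon}}}


def intersection_filter(state_polygon):
    """Intersection: states whose boundary touches or overlaps the polygon."""
    return {"boundary": {"$geoIntersects": {"$geometry": state_polygon}}}


def proximity_filter(lon, lat, miles):
    """Proximity: stores within `miles` of a point, as a spherical circle."""
    radius_in_radians = miles / EARTH_RADIUS_MILES
    return {
        "location": {
            "$geoWithin": {"$centerSphere": [[lon, lat], radius_in_radians]}
        }
    }


# A GeoJSON Polygon ring must be closed: first and last positions are equal.
example_state = {
    "type": "Polygon",
    "coordinates": [[
        [-105.0, 39.0], [-104.0, 39.0], [-104.0, 40.0],
        [-105.0, 40.0], [-105.0, 39.0],
    ]],
}
```

Because the documents are already stored as GeoJSON, results can be handed to mapping code without the conversion layer that the MySQL setup required.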

Business benefits delivered:

  • Savings of approximately $25,000 annually by avoiding a commercial framework.
  • Adherence to the open GeoJSON specification.
  • Ability to scale to large geospatial needs thanks to MongoDB’s scaling capabilities.
  • No special skills or training needed, as the team was already well versed in MongoDB and JSON.
  • Quick turnaround time to go to market.
  • Feature-rich capabilities, including inclusion (all retail locations in a state), intersection (all states neighboring a particular state), and proximity (all retail locations within a given mile radius).

Investment Firm Chooses Cassandra for Time-Series Data

One of our clients managed stock and other financial data in MySQL and found statistical calculations difficult: the SQL queries were becoming complex, and a lot of tables had to be created to duplicate the data along different dimensions, which consumed a lot of disk space. The client was scaling this database vertically by adding more disk space and memory to accommodate these datasets. The queries and stored procedures were becoming slow and complex, and a lot of data had to be filtered out and thrown away to keep everything fast. The client finally gave up and looked for a better solution.

Treselle’s engineering team analyzed the problem and implemented these datasets in Cassandra, whose high write throughput allows the same data to be written to different column families for different needs. We also modeled the keyspace with a variety of design patterns, including composite partition keys, clustering columns, counter columns, daily/monthly/quarterly/half-yearly/annual roll-ups, dynamic column families, and expiring columns, along with other techniques for partitioning the data properly across the cluster so that reads are fast for different time intervals.
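
To make a few of these patterns concrete, here is a minimal sketch that assumes a hypothetical table of price ticks. The CQL schema is included as a reference string only and is not executed. The Python functions show how a composite partition key of symbol and day buckets the data, and how a daily roll-up could be pre-computed at write time. The table name, column names, and 90-day expiry are assumptions, not details from the client project.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical schema, for reference only (this string is never executed).
# (symbol, day) is the composite partition key, ts is the clustering column,
# and default_time_to_live expires rows after 90 days (7,776,000 seconds).
SCHEMA_CQL = """
CREATE TABLE IF NOT EXISTS ticks_by_day (
    symbol text,
    day    text,
    ts     timestamp,
    price  double,
    PRIMARY KEY ((symbol, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC)
  AND default_time_to_live = 7776000;
"""


def partition_key(symbol, ts):
    """One partition per symbol per day; timestamps are assumed to be UTC."""
    return (symbol, ts.strftime("%Y-%m-%d"))


def daily_rollup(ticks):
    """Pre-compute per-partition aggregates from (symbol, ts, price) tuples."""
    buckets = defaultdict(list)
    for symbol, ts, price in ticks:
        buckets[partition_key(symbol, ts)].append((ts, price))

    rollup = {}
    for key, rows in buckets.items():
        rows.sort()  # order by timestamp, as the clustering column would
        prices = [price for _, price in rows]
        rollup[key] = {
            "open": prices[0],
            "close": prices[-1],
            "high": max(prices),
            "low": min(prices),
            "count": len(prices),
        }
    return rollup


example_ticks = [
    ("ACME", datetime(2024, 1, 2, 15, 0, tzinfo=timezone.utc), 10.5),
    ("ACME", datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc), 10.0),
    ("ACME", datetime(2024, 1, 3, 9, 30, tzinfo=timezone.utc), 11.0),
]
example_rollup = daily_rollup(example_ticks)
```

Writing the roll-up alongside the raw ticks trades extra writes for cheap reads, which suits a store with high write throughput.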

Business benefits delivered:

  • Reduced the time to insert 2 million rows daily, including all pre-computations for materialized views, from several hours to 15 minutes.
  • Improved data retrieval speed by roughly 50x, with simple CQL queries replacing complex SQL queries.
  • Created the ability to grow rows as wide as needed, thanks to Cassandra’s wide-column storage.
  • Reduced administrative cost by 75% with Cassandra’s automatic clustering.
  • Reduced lines of code from 13,000 to 2,500, lowering code maintenance cost.

US Client Saves $880,000 on Oracle License Cost

One of our clients wanted an active-active high-availability setup for their infrastructure but did not have the budget for an Oracle RAC license. The Oracle applications running on top of the Oracle database did not support any high-availability option other than Oracle RAC. Treselle’s engineering team developed a custom failover mechanism using Oracle Streams to sync multiple databases in active-active mode and made sure the applications on top of them could distribute queries across these active databases. Multiple proactive measures and monitoring capabilities were put in place to make sure the infrastructure was operating in high-availability mode, and a separate standby database ensured backup and recovery.
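
The application-side half of such a setup, distributing queries across active databases and skipping any that fail a health check, can be sketched as follows. This is an illustrative round-robin router only. It does not represent Oracle Streams or the mechanism actually built for the client, and the class and node names are hypothetical.

```python
import itertools


class ActiveActiveRouter:
    """Round-robin over active database nodes, skipping unhealthy ones."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._cycle = itertools.cycle(self.nodes)

    def mark_down(self, node):
        """Called by monitoring when a node fails its health check."""
        self.healthy.discard(node)

    def mark_up(self, node):
        """Called by monitoring when a known node has recovered."""
        if node in self.nodes:
            self.healthy.add(node)

    def next_node(self):
        """Return the next healthy node, trying each node at most once."""
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy database nodes available")


router = ActiveActiveRouter(["db-a", "db-b"])
first_three = [router.next_node() for _ in range(3)]
router.mark_down("db-b")
after_failure = [router.next_node() for _ in range(2)]
```

In this sketch a failed node simply drops out of rotation until monitoring marks it healthy again, so the remaining nodes absorb its share of the queries.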

Business benefits delivered:

  • Saved the client $880,000 in Oracle RAC license costs by choosing Oracle Streams.
  • Enabled high availability across all layers of the infrastructure, which increased revenue opportunities.
  • Introduced the capability to add more database servers to the syncing process to distribute the load during peak season.
  • Enabled the client to run databases in read-only mode, so complex reporting applications could process real-time data without impacting the performance of the online databases.

Global Client Migrates from Hadoop/Pig to MySQL

Treselle’s engineering team was challenged to migrate a multi-node Hadoop/Pig cluster, which loaded and processed one of the client’s datasets, to MySQL in order to save infrastructure cost and close the skill gaps in maintaining and managing the system. The client and the Treselle team evaluated the setup and the existing dataset processing and decided to move away from Hadoop, as it was not ideal for a workload that processed only around 40 GB of data. The project involved migrating around 80 Pig scripts and a dozen UDFs into MySQL stored procedures running on a single instance. The result took less time to process the data than the Hadoop ecosystem did.

Business benefits delivered:

  • Saved thousands of dollars annually on Amazon EC2 instances by shutting down the multi-node cluster and moving to MySQL.
  • Enabled the client to find talent to manage MySQL more easily and economically than for the Hadoop ecosystem.
  • Enabled the client’s business analyst team to get the needed data directly from the database instead of waiting for a Hadoop/Pig specialist to provide reports.
  • Reduced dataset analysis processing time by 50% compared to Hadoop/Pig execution time.

TECHNOLOGY EXPERTISE

  • Relational Databases: Oracle, MySQL, Sybase
  • Columnar Databases: Cassandra, HBase
  • Graph Databases: Neo4j
  • Search Databases: Solr, Elasticsearch
  • Document Databases: MongoDB
  • In-Memory Databases: Hazelcast, Memcached, Redis, VoltDB

Capabilities

  • Physical and logical database design
  • Infrastructure setup
  • Database migration
  • Database cluster implementation
  • Database replication/mirroring setup
  • Database administration & monitoring
  • Relational database modeling
  • NoSQL database modeling
  • Database PL/SQL programming
  • NoSQL query language
  • Database consolidation
  • Indexing, sharding, & partitioning

TALK TO US

How can we help you accomplish your goals?