From monolithic databases through distributed systems to real-time data: stakes, opportunities, use cases and technologies.



In our data-driven age, the ability to ingest all types of data in real time is now critical to unlocking valuable, actionable insights from data sets.

As you may already know, the IT industry is moving from monolithic databases to distributed systems.

For those unfamiliar, Connectikpeople.co recalls that a monolithic database is the primary place where people store and process their most interesting data. Because it is also the primary place where new features accumulate, the database grows ever more complicated, and it gets harder to add new features while still maintaining all the legacy ones.

Distributed systems overcome these limitations by providing, among other things, a distributed file system (HDFS) for storing data and a computation engine (MapReduce) for processing it in batches.

By using HDFS, companies can now afford to collect additional data sets that are valuable but too expensive to store in databases. By using MapReduce, people can generate reports and perform analytics over those data sets.
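To make the batch-processing idea concrete, here is a toy sketch of the MapReduce pattern in plain Python. This is not the Hadoop API; the function names and the word-count task are illustrative assumptions, chosen only to show the map, shuffle, and reduce phases.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs from each document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group intermediate pairs by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data moves fast", "data moves in batches"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts["data"] == 2, counts["moves"] == 2, counts["big"] == 1
```

In a real Hadoop cluster, the map and reduce phases run in parallel across many machines and the shuffle happens over the network; the structure of the computation, however, is the same.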

But one problem remains: ingesting all types of data in real time.
Apache Kafka comes into play with the following features:
  • It can store high volumes of data on commodity hardware;
  • It is a multi-subscription system: the same published data set can be consumed multiple times;
  • It can deliver messages to both real-time and batch consumers at the same time without performance degradation;
  • It can provide the reliability needed for mission-critical data, and more.
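The multi-subscription point above is worth illustrating. Below is a toy sketch, in plain Python rather than the real Kafka client API, of Kafka's core model: an append-only log where each consumer tracks its own offset, so a real-time consumer and a batch consumer can read the same published data independently. All class and variable names here are illustrative assumptions.

```python
class Log:
    """An append-only message log (stand-in for a Kafka topic)."""
    def __init__(self):
        self.messages = []

    def publish(self, message):
        self.messages.append(message)

class Consumer:
    """Reads from the log at its own pace via a private offset."""
    def __init__(self, log):
        self.log = log
        self.offset = 0  # this consumer's position in the log

    def poll(self):
        """Return every message published since the last poll."""
        batch = self.log.messages[self.offset:]
        self.offset = len(self.log.messages)
        return batch

log = Log()
realtime = Consumer(log)   # e.g. a stream processor polling frequently
nightly = Consumer(log)    # e.g. a batch report job polling once

log.publish("click-1")
log.publish("click-2")
first = realtime.poll()    # ["click-1", "click-2"]
log.publish("click-3")
second = realtime.poll()   # ["click-3"]
everything = nightly.poll()  # ["click-1", "click-2", "click-3"]
```

Because offsets belong to consumers rather than to the log, the same data set is consumed multiple times without the publisher doing anything special, which is exactly why Kafka can serve real-time and batch consumers side by side.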
Uber, Twitter, Netflix, LinkedIn, Yahoo, Cisco, and Goldman Sachs use Kafka as a central place to ingest all types of data in real time. 

Connectikpeople.co recalls that a data platform powered by distributed pub/sub systems like Kafka will play an important role in the Big Data ecosystem as more companies move toward real-time processing.

The following specialized systems enable companies to derive new insights and build streamlined applications:
  • Key/value stores: Cassandra, MongoDB, HBase, etc.
  • Search: Elasticsearch, Solr, etc.
  • Stream processing: Storm, Spark Streaming, Samza, etc.
  • Graph: GraphLab, FlockDB, etc.
  • Time series: OpenTSDB, etc.
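To give a feel for what a stream-processing system in this list does, here is a toy sketch in plain Python (not the Storm, Spark Streaming, or Samza API) of one of the most common streaming operations: counting events per tumbling time window. The function name, window size, and event data are illustrative assumptions.

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds=10):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) tuples.
    Returns {window_start: Counter of keys in that window}.
    """
    windows = {}
    for timestamp, key in events:
        # Align the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return windows

events = [(1, "page_view"), (4, "click"), (12, "page_view"), (15, "page_view")]
result = tumbling_window_counts(events)
# result[0]  -> {"page_view": 1, "click": 1}
# result[10] -> {"page_view": 2}
```

A real stream processor applies the same windowing logic continuously to an unbounded feed (often read from Kafka), distributing the work across machines and handling failures, but the core computation is this simple.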
