Introducing the Azure Cosmos DB Change Feed Processor Library




Azure Cosmos DB is a fast and flexible globally replicated database service for storing high-volume transactional and operational data with predictable millisecond latency for both reads and writes. To help you build powerful applications on top of Cosmos DB, we built change feed support, which provides a sorted list of the documents within a collection in the order in which they were modified. Now, to address scalability while preserving simplicity of use, we are introducing the Cosmos DB Change Feed Processor Library. In this post, we look at when and how you should use the Change Feed Processor Library.
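The ordering the change feed provides can be pictured with a small in-memory model. This is a sketch, not the Cosmos DB API: the class `ToyCollection` and its `upsert` and `read_change_feed` methods are hypothetical. Every write stamps the document with an increasing sequence number, and a read returns the documents modified after a caller-held position, oldest first, together with the position to resume from. In this toy, a document modified twice appears once, at the position of its latest write.

```python
import itertools
from typing import Any, Dict, List, Tuple


class ToyCollection:
    """Hypothetical in-memory stand-in for a collection with a change feed."""

    def __init__(self) -> None:
        # id -> (sequence number of the latest write, document body)
        self._docs: Dict[str, Tuple[int, Dict[str, Any]]] = {}
        self._seq = itertools.count(1)  # monotonically increasing write stamp

    def upsert(self, doc: Dict[str, Any]) -> None:
        # Each write gets a fresh sequence number and replaces the older version.
        self._docs[doc["id"]] = (next(self._seq), dict(doc))

    def read_change_feed(self, after: int = 0) -> Tuple[List[Dict[str, Any]], int]:
        """Return documents last modified after `after`, oldest first, plus the new position."""
        changed = sorted(
            (entry for entry in self._docs.values() if entry[0] > after),
            key=lambda entry: entry[0],
        )
        position = changed[-1][0] if changed else after
        return [doc for _, doc in changed], position


coll = ToyCollection()
coll.upsert({"id": "a", "city": "Seattle"})
coll.upsert({"id": "b", "city": "Paris"})
coll.upsert({"id": "a", "city": "Seattle", "headline": "updated"})

docs, pos = coll.read_change_feed()
print([d["id"] for d in docs])  # ['b', 'a']: "a" moved behind "b" after its update
later, pos = coll.read_change_feed(after=pos)
print(later)  # []: nothing changed since the last read
```

Keeping the position on the reader's side is what lets a consumer stop and later resume without re-reading changes it has already handled.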

Change feed: Event Sourcing with Cosmos DB

Storing your data is just the beginning of the adventure. With change feed support, you can integrate with many different services, depending on what you need to do once changes appear.

Example #1: You are building an online shopping website and need to trigger an email notification once a customer completes a purchase. Whether you prefer Azure Functions, Azure App Service, Azure Notification Hubs, or your own custom-built microservices, the change feed allows seamless integration by surfacing changes in the order in which they occur.

Example #2: You are storing data from an autonomous vehicle and need to detect abnormalities in the incoming sensor data. As new entries are stored in Cosmos DB, the changes appear on the change feed, where they can be processed directly by Azure HDInsight, Apache Storm, or Apache Spark. With change feed support, you can apply intelligent processing in real time as data is stored in Cosmos DB.

Example #3: Because of architecture changes, you need to change the partition key of your Cosmos DB collection. The change feed lets you move your data to a new collection while continuing to process incoming changes. The result is zero downtime while you move data from anywhere into Cosmos DB.
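The migration pattern in Example #3 can be sketched as a loop that copies everything after its last checkpoint and is re-run to pick up writes that land during the copy. This is a minimal in-memory sketch under stated assumptions: the `(sequence, document)` log, `changes_after`, and `Migrator` are hypothetical stand-ins, the old partition key is `city`, and the new key is assumed to be `country` purely for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Change = Tuple[int, Dict[str, Any]]  # (sequence number, document)


def changes_after(log: List[Change], checkpoint: int) -> List[Change]:
    """Hypothetical change feed read: latest version of each id, oldest first."""
    latest: Dict[str, Change] = {}
    for seq, doc in log:
        latest[doc["id"]] = (seq, doc)
    return sorted((c for c in latest.values() if c[0] > checkpoint), key=lambda c: c[0])


class Migrator:
    """Copies documents into a target store partitioned by a new key."""

    def __init__(self, new_key: str) -> None:
        self.new_key = new_key
        self.checkpoint = 0
        self.target: Dict[Any, Dict[str, Dict[str, Any]]] = defaultdict(dict)
        self._location: Dict[str, Any] = {}  # doc id -> partition it lives in now

    def run_once(self, log: List[Change]) -> int:
        moved = 0
        for seq, doc in changes_after(log, self.checkpoint):
            new_pk = doc[self.new_key]
            old_pk = self._location.get(doc["id"])
            if old_pk is not None and old_pk != new_pk:
                # The new-key value itself changed: drop the stale copy.
                del self.target[old_pk][doc["id"]]
            self.target[new_pk][doc["id"]] = doc
            self._location[doc["id"]] = new_pk
            self.checkpoint = seq  # advance only after the write is applied
            moved += 1
        return moved


source: List[Change] = [
    (1, {"id": "n1", "city": "Seattle", "country": "US"}),
    (2, {"id": "n2", "city": "Lyon", "country": "FR"}),
]
migrator = Migrator(new_key="country")
first_pass = migrator.run_once(source)   # initial bulk copy: 2 documents
source.append((3, {"id": "n3", "city": "Paris", "country": "FR"}))  # write during migration
second_pass = migrator.run_once(source)  # only the new change: 1 document
print(first_pass, second_pass, sorted(migrator.target["FR"]))  # 2 1 ['n2', 'n3']
```

Because the checkpoint advances only after a document is written to the target, re-running `run_once` repeats at most the change that was in flight. One reasonable point to switch readers over is after a pass that returns no new changes.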

What about working with larger data stores that span multiple partitions?

As your data storage needs grow, it’s likely that you will use multiple partitions to store your data. Although it is possible to read changes from each partition manually, the Change Feed Processor Library makes this easier by abstracting the change feed API. It facilitates reading across partitions and distributes change feed event processing across multiple consumers. The library provides a thread-safe, multi-process, safe runtime environment with checkpoint and partition lease management for change feed operations. The Change Feed Processor Library is available as a NuGet package for .NET development.
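To see what the library abstracts away, here is a minimal sketch of the manual approach, assuming a hypothetical in-memory collection whose changes are already split into per-partition lists. The function `read_all_partitions` and the partition ids are illustrative, not part of the library: the reader must keep a separate checkpoint for every partition and visit each one in turn.

```python
from typing import Dict, List


def read_all_partitions(partitions: Dict[str, List[str]], checkpoints: Dict[str, int]) -> List[str]:
    """Collect unseen changes from every partition, advancing each checkpoint.

    `partitions` maps a hypothetical partition id to its ordered list of changed
    document ids; `checkpoints` records how many entries of each were consumed.
    """
    new_changes: List[str] = []
    for pid in sorted(partitions):
        start = checkpoints.get(pid, 0)
        batch = partitions[pid][start:]
        new_changes.extend(batch)
        checkpoints[pid] = start + len(batch)  # progress is tracked per partition
    return new_changes


partitions = {"p0": ["a", "b"], "p1": ["c"]}
checkpoints: Dict[str, int] = {}
first = read_all_partitions(partitions, checkpoints)   # ['a', 'b', 'c']
partitions["p1"].append("d")                           # a new write lands in p1
second = read_all_partitions(partitions, checkpoints)  # ['d']
print(first, second, checkpoints)
```

This single loop reads partitions one after another and keeps its checkpoints only in memory; spreading partitions across consumers and persisting progress is the bookkeeping the library is described as handling.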

When to use the Change Feed Processor Library:

  • Pulling updates from the change feed when data is stored across multiple partitions.
  • Moving or replicating data from one collection to another.
  • Parallel execution of actions triggered by updates to data and the change feed.

Getting started with the Change Feed Processor Library is simple and lightweight. In the following example, we have a collection of documents containing news events associated with different cities. We use “city” as the partition key. In just a few steps, we can print out all the changes made to any document in any partition.

To set this up, install the Change Feed Processor Library NuGet package and create a lease collection. The lease collection must be created in an account close to the write region. This collection keeps track of change feed reading progress per partition, along with host information.
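A minimal sketch of the kind of record the lease collection holds, assuming a hypothetical schema (one lease per partition, with the owning host and the last checkpointed continuation); the library's actual lease document format may differ. `Lease`, `LeaseStore`, and `checkpoint` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Lease:
    partition_id: str
    owner: Optional[str] = None         # host currently processing the partition
    continuation: Optional[str] = None  # last checkpointed position in the feed


class LeaseStore:
    """Hypothetical in-memory stand-in for the lease collection."""

    def __init__(self, partition_ids: Iterable[str]) -> None:
        self.leases: Dict[str, Lease] = {p: Lease(p) for p in partition_ids}

    def checkpoint(self, partition_id: str, host: str, continuation: str) -> None:
        lease = self.leases[partition_id]
        if lease.owner != host:
            # Only the owning host may record progress for a partition.
            raise PermissionError(f"{host} does not own partition {partition_id}")
        lease.continuation = continuation


store = LeaseStore(["0", "1"])
store.leases["0"].owner = "host-a"
store.checkpoint("0", "host-a", "token-42")
try:
    store.checkpoint("0", "host-b", "token-43")
    rejected = False
except PermissionError:
    rejected = True
print(store.leases["0"], rejected)
```

In this sketch, a host that restarts would resume each partition from its stored `continuation` rather than from the beginning of the feed.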

To define the logic performed when new changes surface, edit the ProcessChangesAsync function. Here, we simply print the ID of each new or updated document. You can modify this function to perform other tasks.
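The observer pattern behind ProcessChangesAsync can be sketched in Python. The classes below are a hypothetical analog of the .NET `IChangeFeedObserver` interface, not a Python version of the library: the real method is asynchronous and works with the library's own types, while this sketch is synchronous and takes a plain partition id and a list of dictionaries.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ChangeFeedObserver(ABC):
    """Hypothetical Python analog of the .NET IChangeFeedObserver interface."""

    @abstractmethod
    def process_changes(self, partition_id: str, docs: List[Dict[str, Any]]) -> None:
        """Called with each batch of new or updated documents from one partition."""


class DocumentFeedObserver(ChangeFeedObserver):
    def __init__(self) -> None:
        self.seen: List[str] = []

    def process_changes(self, partition_id: str, docs: List[Dict[str, Any]]) -> None:
        for doc in docs:
            # As in the example: print the id of every new or updated document.
            print(f"Change feed: partition {partition_id}, document id {doc['id']}")
            self.seen.append(doc["id"])


observer = DocumentFeedObserver()
observer.process_changes("0", [{"id": "news-1", "city": "Seattle"}, {"id": "news-2", "city": "Seattle"}])
```

Swapping the `print` call for an email, a queue message, or a write to another collection is how the same observer would serve the other scenarios described earlier.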

Next, to start the Change Feed Processor, instantiate ChangeFeedProcessorHost, providing the appropriate parameters for your Azure Cosmos DB collections. Then, call RegisterObserverAsync to register your IChangeFeedObserver implementation (DocumentFeedObserver in this example) with the runtime. At this point, the host attempts to acquire a lease on every partition key range in the Azure Cosmos DB collection using a “greedy” algorithm.
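A minimal sketch of greedy lease acquisition, assuming a hypothetical lease table that maps each partition key range id to its owner (`None` when free); `acquire_greedily` is an illustrative name, and the library's exact algorithm may differ. A starting host simply claims every range nobody owns.

```python
from typing import Dict, List, Optional


def acquire_greedily(leases: Dict[str, Optional[str]], host: str) -> List[str]:
    """Claim every partition key range that no host currently owns."""
    acquired: List[str] = []
    for range_id in sorted(leases):
        if leases[range_id] is None:
            leases[range_id] = host
            acquired.append(range_id)
    return acquired


leases: Dict[str, Optional[str]] = {r: None for r in ["0", "1", "2", "3"]}
got_a = acquire_greedily(leases, "host-a")  # ['0', '1', '2', '3']: the first host takes everything
got_b = acquire_greedily(leases, "host-b")  # []: nothing is left for a second host
print(got_a, got_b)
```

Taken alone, greedy acquisition lets the first host to start claim every range, which is why leases also need to expire and be rebalanced as more hosts join.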

These leases last for a given timeframe and must then be renewed. As new nodes (in this case, worker instances) come online, they place lease reservations. Over time, the load may shift between nodes as each host attempts to acquire more leases.
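The renew-and-rebalance behavior can be simulated in a few lines. This is a sketch, not the library's algorithm: the 10-second `LEASE_SECONDS`, the fair-share rule (the number of ranges divided by the number of live hosts, rounded up), and stealing one lease at a time from the busiest host are all assumptions chosen for illustration. Time is passed in explicitly as `now` so the run is deterministic.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional

LEASE_SECONDS = 10.0  # hypothetical lease duration


@dataclass
class Lease:
    owner: Optional[str] = None
    expires_at: float = 0.0


def live_counts(leases: Dict[str, Lease], now: float) -> Dict[str, int]:
    """Number of unexpired leases held by each host."""
    counts: Dict[str, int] = {}
    for lease in leases.values():
        if lease.owner is not None and lease.expires_at > now:
            counts[lease.owner] = counts.get(lease.owner, 0) + 1
    return counts


def renew(leases: Dict[str, Lease], host: str, now: float) -> None:
    # A healthy host extends every lease it still holds before it expires.
    for lease in leases.values():
        if lease.owner == host and lease.expires_at > now:
            lease.expires_at = now + LEASE_SECONDS


def balance(leases: Dict[str, Lease], host: str, now: float) -> None:
    """Let `host` take free or expired leases, then steal up to a fair share."""
    live_hosts = set(live_counts(leases, now)) | {host}
    fair_share = math.ceil(len(leases) / len(live_hosts))
    # 1. Unowned or expired leases can be taken outright.
    for lease in leases.values():
        if live_counts(leases, now).get(host, 0) >= fair_share:
            return
        if lease.owner is None or lease.expires_at <= now:
            lease.owner, lease.expires_at = host, now + LEASE_SECONDS
    # 2. Move one lease at a time from the busiest host that is over its share.
    while True:
        counts = live_counts(leases, now)
        others = {h: n for h, n in counts.items() if h != host}
        if counts.get(host, 0) >= fair_share or not others:
            return
        busiest = max(others, key=lambda h: others[h])
        if others[busiest] <= fair_share:
            return
        stolen = next(
            lease for lease in leases.values()
            if lease.owner == busiest and lease.expires_at > now
        )
        stolen.owner, stolen.expires_at = host, now + LEASE_SECONDS


leases = {r: Lease() for r in ["0", "1", "2", "3"]}
balance(leases, "host-a", now=0.0)   # alone, host-a takes all four ranges
balance(leases, "host-b", now=1.0)   # host-b comes online and takes two
after_join = live_counts(leases, 1.0)
renew(leases, "host-b", now=5.0)     # host-a has stopped renewing (e.g. it crashed)
balance(leases, "host-b", now=12.0)  # host-a's leases expired at 10.0
after_failover = live_counts(leases, 12.0)
print(after_join, after_failover)
```

In this run, host-b first takes two of host-a's four ranges when it joins, then picks up the remaining two once host-a stops renewing and its leases expire.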