The Event Queue is the mechanism by which the servers in a Sitecore cluster communicate with each other. An event generated on one server is relayed to the other servers through the Event Queue, so that they can replay the same event. For instance, if an item is saved on one CM server, the other servers are notified of this event so that they can take whatever steps they need to take, such as updating their cache or updating the index. In a nutshell, the Event Queue is the heart of a scalable Sitecore solution. If you are a newbie on this topic, I would highly recommend reading some of the below blog posts:
On a Sitecore solution with a high volume of activity (content authoring, publishing, indexing, etc.), the number of records written to the EventQueue table can get very high. If you talk to some folks at Sitecore, they would say that you should ideally not have more than a couple of thousand records in that table. However, that is very far from reality. It is quite normal for the number of records in this table to go beyond 100,000 in a single day. I have seen it go up to 400,000+ records in a single day (mostly during peak hours) in our Sitecore implementation!
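If you want to see how your own environment compares, a query along the following lines gives a rough hourly picture. This is only a sketch: it assumes the standard dbo.EventQueue table with its [Created] column, and it can only count rows that the cleanup agent has not yet removed, so the real write volume may be higher than what it reports.

```sql
-- Rough hourly volume of the rows currently sitting in the Event Queue.
-- Assumes the standard Sitecore schema: dbo.EventQueue with a [Created] datetime column.
-- Run it against the database whose queue you want to inspect (Core, Master or Web).
SELECT
    CAST([Created] AS date)   AS CreatedDate,
    DATEPART(HOUR, [Created]) AS CreatedHour,
    COUNT(*)                  AS EventCount
FROM dbo.EventQueue WITH (NOLOCK)   -- dirty read: counts are approximate, but no shared locks are taken
GROUP BY CAST([Created] AS date), DATEPART(HOUR, [Created])
ORDER BY CreatedDate, CreatedHour;
```

The NOLOCK hint is a deliberate trade-off here: on a table that is already suffering from lock contention, an approximate count is usually preferable to adding one more reader that waits on locks.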
A couple of challenges that we have seen with such a high volume of writes to this table are:
- Frequent SQL lock escalations when multiple threads attempt to read, insert and/or delete from this table. There have been scenarios where we had to kill the SQL process holding the lock for event processing to resume.
- Some events not being processed, which in turn results in environment inconsistency – cache not updated, index not updated, cache not getting cleared after a publish.
In this post, I would like to describe a few approaches that you can consider adopting to reduce pressure on the Event Queue table, and thereby bring about better environment consistency.
Remove unwanted entries from Event Queue table
We noticed that about 70% of the records that go into the Event Queue table are PropertyChangedRemoteEvent records, which are generated whenever a property in the Properties table changes. Sitecore Content Search stores metadata about the indexes on each server in this table, so if you have frequent index updates, the number of PropertyChangedRemoteEvent records in the Event Queue will grow pretty quickly.
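To check what the split looks like in your own environment, you could group the queue by event type. The sketch below assumes the standard dbo.EventQueue schema, where the [EventType] column holds the type name of the remote event; the percentages only cover rows that are still in the table at the time you run it.

```sql
-- Breakdown of the current Event Queue contents by event type.
-- Assumes the standard Sitecore schema: dbo.EventQueue with an [EventType] column.
SELECT TOP (20)
    [EventType],
    COUNT(*) AS EventCount,
    -- SUM(COUNT(*)) OVER () is the total row count across all event types
    CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS decimal(5, 2)) AS PercentOfTotal
FROM dbo.EventQueue WITH (NOLOCK)   -- dirty read: approximate, but does not wait on locks
GROUP BY [EventType]
ORDER BY EventCount DESC;
```

If PropertyChangedRemoteEvent dominates the result the way it did for us, the patches described next are likely to give you the biggest reduction.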
Since the server-specific index metadata need not be propagated to other servers, writing this data to the Properties table can be completely avoided. Sitecore provides a patch (# 417664) to alleviate this problem. Details of this patch can be found at https://kb.sitecore.net/articles/930920. Two solutions are provided, viz.
- Reduce the number of PropertyChangedRemoteEvent records written to Event Queue table. This solution is in turn dependent on another patch (# 420602) – https://kb.sitecore.net/articles/623004
- Write the PropertyChangedRemoteEvent records to local file system instead of Event Queue table
You can choose either of the two solutions, but I would recommend going with solution #1, as it appears to me to be the more elegant one. If you decide to go with solution #2, remember to clean up older files from the file system using the Sitecore.Tasks.CleanupAgent job.
Run Event Queue cleanup at one place
If you have a multi-server environment, there is no need for the Sitecore Event Queue cleanup agent (Sitecore.Support.Tasks.CleanupEventQueue) to run on all the machines. Multiple servers issuing DELETE statements against this table at the same time can result in SQL lock escalations. So you can configure it to run on just one server, maybe your dedicated publishing instance, if you have one!
Granular cleanup of Event Queue records
The Event Queue cleanup agent that ships with Sitecore by default only lets you remove records older than “x” days, i.e., the smallest retention period you can specify is one day.
<agent type="Sitecore.Tasks.CleanupEventQueue, Sitecore.Kernel" method="Run" interval="04:00:00">
  <DaysToKeep>1</DaysToKeep>
</agent>
This means that you will have at least one day’s worth of data in this table. On a high volume Sitecore application, this can result in performance issues when working with the Event Queue table. In addition, the cleanup agent has to do a lot more work on each run, and the DELETE statement it executes can result in lock escalations.
In our case, Sitecore Support provided us a patch (# 392673) that enables you to have granular control over the Event Queue retention period. With the patch, we were able to configure the Cleanup Agent as follows:
<agent type="Sitecore.Support.Tasks.CleanupEventQueue, Sitecore.Support.392673" method="Run" interval="01:00:00">
  <param desc="MinutesToKeep">60</param>
</agent>
Cleanup Event Queue records through a SQL job
The third approach for cleaning up Event Queue records is to move the cleanup to a SQL job and disable the agents in Sitecore. With this, you get better control over the SQL statements that you execute. For instance, you can tune your DELETE statement to reduce lock escalations, as described at https://support.microsoft.com/en-us/kb/323630. Below is the approach that we used, wherein you limit the number of records deleted at a time to a small value and thereby reduce the chances of lock escalation.
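DECLARE @DeletionDate DATETIME
-- Keep the last hour of events; everything older is eligible for deletion.
-- SET also leaves @@ROWCOUNT at 1, which is what lets the loop below start.
SET @DeletionDate = DATEADD(HOUR, -1, GETDATE())

-- Delete in small batches so that no single DELETE holds enough locks to escalate.
WHILE @@ROWCOUNT <> 0
BEGIN
    DELETE TOP(500) FROM dbo.EventQueue WHERE [Created] < @DeletionDate
END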
Replicate only INSERT operations
If your deployment topology has isolated CM and CD environments with the Core database being replicated between them, then all Event Queue transactions that happen in your CM Core database get replicated to the CD Core database. While this is fine for inserts into this table, there is no point in replicating deletes (or even updates) to the CD side, as this takes a toll on the replication throughput. Since the only way deletes happen on the Event Queue table is through the cleanup job, you can consider running the cleanup job separately on the CD Core database and turning off replication of DELETE statements in the SQL Server replication configuration.
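As a rough illustration, assuming you are using SQL Server transactional replication, the delete behaviour is controlled per article through the @del_cmd parameter of sp_addarticle. The publication name below is a hypothetical placeholder, and the call is a sketch of the idea rather than a complete replication setup.

```sql
-- Sketch only: add the EventQueue table as an article whose DELETEs are not replicated.
-- 'CorePublication' is a hypothetical placeholder; use the name of your own publication.
-- Run at the Publisher, in the publication (CM Core) database.
EXEC sp_addarticle
    @publication   = N'CorePublication',
    @article       = N'EventQueue',
    @source_owner  = N'dbo',
    @source_object = N'EventQueue',
    @del_cmd       = N'NONE';   -- NONE: take no action for DELETE statements at the subscriber
```

For an article that is already published, the same setting is exposed in the article properties in SQL Server Management Studio; check the replication documentation before changing it on a live publication, since it may affect existing subscriptions.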
Increase the Event Queue processing interval
And finally, you can also consider increasing the event queue processing interval so as to give Sitecore sufficient time to process any pending events, maybe to 5 or 10 seconds from the default 2 seconds.
<eventQueue>
  <processingInterval>00:00:05</processingInterval>
</eventQueue>
Do keep in mind that a longer interval also means it takes longer for the servers in the cluster to get in sync.
Hope this post helps you handle any Event Queue related issues during high load, and thereby achieve a more consistent Sitecore cluster!