Current limitations with Sitecore PaaS on Azure

We are currently evaluating Sitecore PaaS (Platform-as-a-Service) on Azure for one of our customers. It is in fact a migration of a large multi-site Sitecore implementation – a scaled-out CM tier, a scaled-out CD tier, dedicated publishing and indexing servers, SolrCloud, and ~50 websites per environment. It is still early days in our evaluation, but I thought I would share my experience so far and the limitations that I see with Sitecore PaaS.

Note: the limitations mentioned here are in the context of our Sitecore application and its architecture. I do not necessarily consider them Sitecore limitations, but rather capabilities I hope will be supported in upcoming versions.

No Support for Multiple Publishing Targets

By default, only one publishing target is supported, i.e., the Web database. If your application uses multiple publishing targets (e.g., Staging and Live), there is no provision to spin up an environment with such a deployment configuration. This mirrors the default on-premise Sitecore installation, which also ships with only the Web database as a target.
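For context, on premise an additional publishing target typically involves three pieces – a connection string, a database definition cloned from the Web database, and a target item in the content tree. A minimal sketch (all names and credentials below are illustrative, not from an actual deployment):

```xml
<!-- 1. ConnectionStrings.config: a connection for the extra target database -->
<add name="staging"
     connectionString="user id=sitecore;password=example;Data Source=(local);Database=Sitecore_Staging" />

<!-- 2. Include patch: duplicate the <database id="web"> definition under
     <sitecore>/<databases>, changing its id to "staging" -->

<!-- 3. Content: create an item under /sitecore/system/Publishing targets
     with its "Target database" field set to "staging" -->
```

It is this set of steps that currently has no equivalent provisioning support in the Sitecore PaaS ARM templates.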

Scalability Challenges in CM Tier

All Sitecore PaaS deployment configurations (xM1 through xM5) create only a single CM server in the Basic (B2) service plan. As you may know, the Basic plan does not provide auto-scaling or load balancing. So there is no way for you to scale up or scale out the CM tier automatically. You will need to manually change the service plan.

Also, a B2 instance provides only 2 cores, 3.5 GB RAM and 10 GB of storage. This will definitely not be enough for large Sitecore installations such as ours.

No Support for Dedicated Publishing Instance

Due to the large volume of content authoring and publishing in our Sitecore application, we have a dedicated publishing instance for better content authoring performance. However, Sitecore PaaS does not support this today, i.e., designating one of the CM servers for publishing through a separate set of config files. As mentioned above, it does not support multiple CM servers anyway.
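For comparison, on premise a dedicated publishing instance is designated by pointing all CM instances at the publishing server via the Publishing.PublishingInstance setting. A minimal include patch, assuming the publishing server's instance name is “CM-PUBLISHING” (an illustrative name):

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- All instances route publish operations to the named instance -->
      <setting name="Publishing.PublishingInstance" value="CM-PUBLISHING" />
    </settings>
  </sitecore>
</configuration>
```

It is this kind of per-instance configuration that the PaaS deployment topologies do not currently expose.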

No Support for Dedicated Indexing Instance

We use SolrCloud today as our indexing solution, and in such setups you would want one of your servers acting as the dedicated indexing instance responsible for sending indexing requests to SolrCloud. This is again typically done through a separate set of config files on one of the CM servers to designate it for indexing. This is also not supported by Sitecore PaaS today – and again, it does not support multiple CM servers anyway.

And yes, we intend to run SolrCloud in IaaS mode rather than use Azure Search.

There may be other limitations that we uncover as we progress. But I do believe that all of these can be overcome by customizing the ARM templates used by Sitecore PaaS for your deployment topology. We intend to do this, and I will share our approach and experience in an upcoming post.

Securing Solr and authenticating from Sitecore

It is a common requirement that your Solr endpoints and the Admin UI should not be open for anyone to access; you would want them secured. As you may already know, you can secure access to Solr by configuring one of the available authentication plugins, such as the Basic Authentication plugin or the Kerberos Authentication plugin. However, the challenge is that Sitecore does not have any built-in capability to connect to a secured Solr (at least as far as I am aware).

In this post, I will walk you through the steps required for enabling basic authentication in Solr – but I will take a different approach, i.e., configure basic authentication with the built-in Jetty HTTP server. I will then walk through the steps for configuring Sitecore’s search provider to use this basic authentication. The steps mentioned here were tried with Solr 5.2.1 and Sitecore 8.2 Update 1.

Enable Basic Authentication in Solr (Jetty)

Configure endpoints, auth type, security realm and role

Open “<SolrRoot>/server/etc/webdefault.xml” file and make the following changes towards the end of the file:

  • Configure the endpoints that need to be secured along with which security role(s) can have access to those endpoints
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Authenticated Solr</web-resource-name>
    <!-- Define the endpoint URL pattern that needs to be secured. In the case below, all endpoints, including the Admin site, are secured. -->
    <url-pattern>/</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <!-- Define the security role that can access the secured endpoints -->
    <role-name>my-solr-user</role-name>
  </auth-constraint>
</security-constraint>
  • Configure the authentication type to be used for the application as well as the security realm being used. A realm is a repository where users, passwords and roles are stored.
<login-config>
  <!-- This is where the authentication method is defined; Basic in this case -->
  <auth-method>BASIC</auth-method>

  <!-- Name of the security realm where users and roles are going to be stored -->
  <realm-name>My Solr Realm</realm-name>
</login-config>

Configure security realm for Jetty

Open “<SolrRoot>/server/etc/jetty.xml” file and make the following changes towards the end of the file. This configures the Login service to be used by Jetty, the security realm and the path to the file that contains user credentials and their roles.

<Call name="addBean">
  <Arg>
    <New class="org.eclipse.jetty.security.HashLoginService">
      <!-- Name of the security realm defined earlier -->
      <Set name="name">My Solr Realm</Set>
      <Set name="config">
        <!-- Path to the properties file that stores user, roles and passwords -->
        <SystemProperty name="jetty.home" default="."/>/etc/realm.properties
      </Set>
      <Set name="refreshInterval">0</Set>
    </New>
  </Arg>
</Call>

Define Users and their Roles

Create a new file “<SolrRoot>/server/etc/realm.properties” at the path mentioned earlier and add user details as shown in the example below.

username1:password1,my-solr-user
username2:password2,my-solr-user
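If storing plain-text passwords in this file is a concern, Jetty’s HashLoginService also accepts hashed credentials. For example (the hash below is simply the MD5 of the word “password”):

```
username1:MD5:5f4dcc3b5aa765d61d8327deb882cf99,my-solr-user
```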

Configure Sitecore’s Search Provider

With version 8.2, Sitecore provides the ability to configure the HTTP web request factory used by SolrNet when communicating with Solr; this is configured in the “sitecore/contentSearch/indexConfigurations” section. By default, the factory used is HttpWebRequestFactory. This can be replaced with BasicAuthHttpWebRequestFactory to configure basic authentication credentials, as shown in the example below.

Note: there was a bug in Sitecore 8.2 Update 1 that prevented basic authentication from working. We had to install a Sitecore support patch (Sitecore.Support.141324) to get it working.

<configuration xmlns:set="http://www.sitecore.net/xmlconfig/set/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <solrHttpWebRequestFactory set:type="HttpWebAdapters.BasicAuthHttpWebRequestFactory, SolrNet">
          <param hint="username">username1</param>
          <param hint="password">password1</param>
        </solrHttpWebRequestFactory>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

Voila! You now have Sitecore communicating with Solr using Basic Authentication.

Sitecore Publish Exclusions Module

This blog post serves as the installation, configuration and usage manual for my Publish Exclusions module.

Overview

The Publish Exclusions module provides the capability to exclude configured items and their descendants from a publish. This is different from Sitecore’s Publishing Restrictions capability, wherein already published items get un-published; this module does not un-publish an item if it was already published.

The module supports both the new Publishing Service as well as the legacy publishing approach.

More details about this capability are available in the following blog posts:

Installation and Configuration

Module Installation

  • Install the module (Sitecore.PublishExclusions-2.0.0.zip) as you would install any Sitecore module, i.e., through Development Tools ⇒ Installation Wizard.

Configuring for Publishing Service

  • Extract the content of Sitecore.PublishExclusions.PublishingService.zip to the root folder where you have Publishing Service installed and running. Ensure that DLLs and Configs go into their respective folders.
  • Run the DBScripts\Scripts.sql against your Master database
  • Restart the publishing service

Usage Instructions

    1. Once the module has been installed to your Sitecore instance, you will find it available at “/sitecore/system/Modules/Publish Exclusions”.
    2. Create new “Publish Exclusion” items under the folder “/sitecore/system/Modules/Publish Exclusions/Exclusions Repository”.
    3. Configure the “Publish Exclusion” item by specifying the items and their descendants to be excluded from publish for specific publishing targets and publish modes.
      • Publishing Target ⇒ specify the publishing target for which this publish exclusion is applicable. If this needs to be applicable for multiple targets, create multiple “Publish Exclusion” items.
      • Publish Modes ⇒ specify the publish modes for which this exclusion is applicable.
      • Excluded Nodes ⇒ this is where you choose the items and their descendants that need to be excluded from publish for the current publishing target and publish modes.
      • Exclusion Overrides ⇒ choose descendants of excluded nodes that need to be published even though their ancestor is excluded. This is useful when a complete node, except for specific portions of the tree, has to be excluded from publish.

      Since Sitecore requires “/sitecore/system/Languages” to be publishable for each publish, this is never excluded even if you explicitly configure it for exclusion.

    4. In addition, the module comes with a global configuration item (“/sitecore/system/Modules/Publish Exclusions/Global Configuration”) to control its behavior.
      • Return Items to Publish Queue ⇒ if enabled, all items skipped during an incremental publish are returned to the Publish Queue, so that they are considered again during the next incremental publish. Be aware that this setting can cause your Publish Queue to grow over time.
      • Show Content Editor Warnings ⇒ if enabled, a warning is shown in the Content Editor if the item is excluded from publish for any publishing target or publish mode.

Extensibility

The module has been designed for extensibility if you need to customize the implementation logic – whether you are using the default out-of-box Sitecore publishing approach or you are using the new Publishing Service.

Default Publishing

If you are using Sitecore’s default publishing option, the module comes with a Controller that houses all the business logic for this functionality and a Repository that retrieves Publish Exclusion rules from Sitecore. Key methods in both the Controller and the Repository are defined as “virtual”, so you can override them with your custom implementation. Your custom Controller and Repository can also be configured in the include config file, so that all pipeline processors provided by this module will work with your classes as well – as long as they implement the IPublishExclusionsController and IPublishExclusionsRepository interfaces.

Publishing Service

If you are using the new Publishing Service, you can replace any of the module’s components with your custom ones and then update the configuration file. All components used by the module are configured in the sc.publishing.services.xml file present in the module zip file, and the key ones are:

  • PublishExclusionsProvider, implementing the IPublishExclusionsProvider interface
  • PublishExclusionsRepository, implementing the IPublishExclusionsRepository interface
  • Publish handlers TreePublishHandler and IncrementalPublishHandler, inheriting from BaseHandler
  • The stored procedure Publishing_Delete_ExcludedItemsFromManifest

Sitecore Publishing Service – Database Design

In this post, I would like to describe the database design adopted in Sitecore’s new Publishing Service.

Note: the content of this post, especially the purpose of the database tables explained below, is based on what I found when playing with Publishing Service. This is not officially documented or validated by Sitecore.

Database Schema Changes

Once Publishing Service is installed and configured, the first thing you will notice is that it has made changes to all the Sitecore databases. The Master database will have the following tables and stored procedures created.

(Master DB Tables)

(Master DB stored procedures)

Likewise, the Core database as well as each publishing target database (like Web) will also be updated with new stored procedures. These new DB objects are used by the Publishing Service.

Purpose of Database Objects

Let me make an attempt at explaining the purpose of the various database objects created when installing the Publishing Service.

Database Tables

  • Publishing_PublisherOperation ⇒ this new table acts as the publish queue instead of the PublishQueue table. Any content changes are recorded in this table, which is used by the Publishing Service when doing a Site Publish.
  • Publishing_Data_Params_FieldIds ⇒ this table stores the IDs of the fields of a content item that need to be considered when deciding whether to publish the item or not. These are fields such as workflow state, publishing restrictions, etc. They are frequently used fields, and this table is used to speed up publishing. It is a static table with about 13 rows, one row for each field.
  • Publishing_JobQueue ⇒ when a publish job is initiated from Sitecore, it is first queued in this table, after which the Publishing Service picks it up and processes it. This table stores the status of the publish job, the start and end times, as well as the publish metadata in XML format (type of publish, root item, languages, related items, child items, etc.). Data from this table is what gets displayed in the publishing dashboard.
  • Publishing_JobMetadata ⇒ the publish metadata (type of publish, root item, languages, related items, child items, etc.) associated with a publish job is stored in this table.
  • Publishing_Data_Params_Languages ⇒ stores the list of languages for which the current publish job is running.
  • Publishing_JobManifest ⇒ this table stores the IDs of the manifests associated with a publish job. There is one manifest created for each publishing target in a publish. For those who are new to Publishing Service, a manifest is a collection of items that can be published or promoted to the publishing target.
  • Publishing_JobManifestStatus ⇒ the current status of a publish manifest is stored in this table. When the manifest is being processed, the status is “Running”, and once processed, the status is “Complete”.
  • Publishing_ManifestStep ⇒ this table stores the individual items in a manifest that need to be promoted to a publishing target. Every item also has an action that needs to be performed during the publish, viz:
    • PromoteItemVariant – when the current version of the item variant has to be published from the source to the target. This happens when a new item is created, or an existing item is modified or renamed and is ready to be published (workflow approved, no publish restrictions, etc.).
    • DeleteItem – when an item has been deleted from the source and needs to be deleted from the target.
  • Publishing_ManifestOperationResult ⇒ for each item in the publish manifest, this table stores the result of promoting the item to the publishing target, along with details of the change made, including previous and new values. The content of this table is made available to the publishEndResultBatch pipeline.

Stored Procedures

Before talking about stored procedures, it is important to understand the data access design pattern followed within Publishing Service. Every data entity type – such as Items, Nodes, Media Items, Workflow States, Event Queue, Publish Job Queue, etc. – has its own Repository and Provider combination within the Publishing Service. The individual components within the Publishing Service, such as the publish handlers, manifest service and promoters, leverage the Repository to communicate with the database. The Repository in turn delegates the responsibility of executing specific SQL statements and stored procedures against the database to the corresponding Provider.

(Data access design)

So the stored procedures that get created in the database with Publishing Service are called by the individual providers. Below is a snapshot of the stored procedures that get created across the different databases.

(Master DB SPs)

(Web DB SPs)

(Core DB SPs)

Hope this post has been useful to get some insights into the database design of Publishing Service!

Sitecore Publishing Service – Publish Types

Ever since Sitecore 8.2 was released, I have been planning to look at the new Publishing Service. One of the primary reasons was for me to find out the impact it has on my Publish Exclusions module. And I finally managed to get the time this weekend to look into the Publishing Service. Setting up the service has been well documented in Sitecore Developer Portal and is also nicely explained by Jonathan in his post here.

Over the next few posts, I will try to share what I have learned about Publishing Service. Hopefully some of you will find it useful. In this post specifically, I would like to discuss the different Publish Types available with the Publishing Service.

Legacy Publish Modes

To set the context, let us revisit the publish modes available with the legacy publishing approach. In fact, I should not be calling it “legacy”, as this is still the default publishing option when you install Sitecore.

  • Incremental ⇒ publishes any changes that have happened since the last publish, and thereby makes it the fastest publishing option. This approach makes use of the PublishQueue table for tracking content changes and the Properties table to track the last publish time.
  • Smart ⇒ publishes differences between the source and target database by comparing the Revision field of the items between the databases.
  • Full ⇒ does a complete republish irrespective of whether items have changed or not, and thereby is the slowest publish option.
  • Single Item ⇒ this approach publishes a content item along with the option to publish its children and any related items. The single item publish can be done by either using a Smart Publish or using a Republish option.

New Publish Types

When you have installed and configured the new Publishing Service, you will notice that the Publishing Dialogs and the publish modes in them are no longer the same – you do not see Incremental or Smart publish option. This section should throw some light on the new publish types available and how they function.

1. Single Item

This is used when you publish a single item, with the option to publish its children (sub-items) as well as related items. This publish approach compares the Revision field of the item across the two databases to determine whether the item has changed in the authoring database, and then publishes accordingly. So it is essentially the same as the Smart Publish option we had with the legacy approach.

The Single Item publish is invoked from the Content Editor or Experience Editor.

(Single Item – Experience-Editor)

(Single Item – Content-Editor)

2. Full Publish

In this case, all content in the Sitecore instance is published using a Smart Publish approach, i.e., based on Revision field comparison. So this is effectively a Single Item publish with the “publish subitems” option and “/sitecore” as the item.

This publishing option is invoked from the publishing dashboard by administrators.

(Full Publish)

3. Full Re-publish

All content in the Sitecore instance is completely re-published irrespective of whether items have changed or not. Unlike the above publish types, this option does not use the Smart Publish approach.

This publishing option is also invoked from the publishing dashboard by administrators.

(Full Re-Publish)

Note: apparently you can configure the Sitecore roles that need access to full publish and re-publish in Sitecore.Publishing.Service.config, but I have not been able to get this working for non-administrators even with the configuration.

<api>
 <services>
  <allowedFullPublishRoles>
   <!--This is a list of the roles that are able to perform a Full Re-publish-->
   <!-- <role>sitecore\Sitecore Client Publishing</role> -->
  </allowedFullPublishRoles>
 </services>
</api>

4. Site Publish

This approach does an incremental publish of the entire Sitecore instance since the last publish, leveraging the changes recorded in the publishing queue. Do note that the publish queue used here is not the same as the one the legacy approach uses; it is a completely different table (Publishing_PublisherOperation). I will talk a bit more about the database changes in a future post.

The Site Publish is invoked from the Content Editor.

(Site Publish – Start Menu)

(Site Publish – Ribbon)

In conclusion, the new Publish Types are quite similar to the legacy Publish Modes in terms of how they function, but they are implemented and invoked differently. At least that has been my understanding based on the exploration I have done so far… 🙂

Key Takeaways from Sitecore Symposium 2016 – Part 2

In Part 1, I covered key highlights from the recently concluded Symposium around Sitecore Experience Accelerator (SXA), Sitecore Azure, Publishing Service and xConnect. In this concluding part, let me talk about takeaways from a few other areas.

Data Services

  • Content as a Service (CaaS) through OData compliant micro services and token based authentication
  • This is Sitecore’s attempt to enter into the “Headless CMS” space
  • Based on .NET Core

Data Exchange Framework

  • A new framework to support exchanging data between Sitecore and any external systems
    • Ex: importing product catalog from a PIM to Sitecore as items, importing CRM contact data into xDB (or even other way around)
  • The intent is to have everyone build integrations in a consistent / standardized way

Sitecore Commerce

  • Completely ground up development of commerce capabilities in Sitecore
  • No more “powered by” Commerce Server or Dynamics AX
  • Continues to use the Commerce Connect API
  • Built using .NET Core
  • Plan to be released with Sitecore 8.2.1 – maybe by end of the year or early next year

Helix and Habitat

There is better clarity from Sitecore now on what Habitat and Helix are, to clear the confusion within the community.

Helix
  • Principles, conventions, design patterns and recommended best practices for modular and multi-tenant Sitecore implementations
  • Sitecore advises you to follow Helix principles, but they are obviously not mandated
  • Sitecore is also trying to internalize this, as is evident in the SXA implementation, which uses Helix principles
Habitat
  • This is just a reference implementation to demonstrate the use of Helix principles
  • Not meant as an accelerator; for that, consider SXA

Sitecore NuGet Feed

  • At last, Sitecore libraries are now available through a public NuGet repository
  • Feeds for Sitecore versions 7.2 and above are available, including a few add-on modules
  • This is only for Sitecore assemblies and not for Sitecore items

.NET Core Strategy

  • Sitecore appears to be aligning any of its ground-up development with .NET Core
  • Data Services, Publishing Service, upcoming Sitecore Commerce and xConnect (?) are a few examples
  • Do not expect core Sitecore product to be converted to .NET Core anytime soon; it would be a huge undertaking 🙂

Hope these two posts were useful in getting an overview of where Sitecore is headed…

Key Takeaways from Sitecore Symposium 2016 – Part 1

I had the privilege to attend the Sitecore MVP Summit and the Sitecore Symposium last week at New Orleans. It was my first experience attending both these events, even though I have attended the Sitecore User Group Conference in 2015.

In this post, I would like to share some of the key takeaways for me from the Symposium. Hope you find it useful, especially those who could not attend it. Some of these were already known to the Sitecore community.

Sitecore Experience Accelerator (SXA)

  • An add-on module providing ability to accelerate your website development
  • Comes with a pre-built set of templates and UI components (about 70 of them)
  • Support for multi-tenancy / multi-site implementation
  • Drag-and-drop capability in Experience Editor; more business user friendly
  • Currently tied to the 960 Grid system
  • Plans to support other CSS frameworks in future through a plug-in based architecture
  • Developers can build custom components adhering to the SXA architecture and design
  • The module has been architected, designed and implemented with adherence to Helix principles
  • Needs a separate license :(; I do hope Sitecore can include it with the main product

Sitecore on Azure

  • First class support for Sitecore on Azure coming in the next few months
  • Azure Resource Manager (ARM) templates are being made available for the following Sitecore deployment configurations:
    • Session State – Azure Redis Cache (Yay !!)
    • Search – Azure Search
    • Database – SQL Azure
    • Logging and Telemetry – Application Insights
    • Support for various scalable deployment topologies

Publishing Service

  • A new high-speed publishing service available as an add-on module / service
  • Provides significant performance improvements for large scale implementations
  • Moving away from single item publish to bulk publish
  • Since there is no more single item publish, existing publish pipelines would no longer be used
  • Can be run in IIS or as a Console app, with ability to run as a Windows service coming shortly
  • Ground up development using .NET Core
  • Warm standby option for Publishing Service would come in future

xConnect

  • Unified API to interact with Contacts and their Interactions; so any interaction with xDB will be using xConnect
  • Better support for capturing interactions and events across online and offline touchpoints

I will cover the remaining highlights in my next post (hopefully soon) !

Solr Indexes with High Availability in Sitecore

I have been recently working on a project that involves migrating from Lucene to Solr as the search technology for one of our Sitecore implementations. In this post, I would like to discuss how high availability of Solr indexes can be achieved in a Sitecore implementation.

Default Setup with SolrSearchIndex

The default index configuration in Sitecore when using Solr is as below:

<index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
  <param desc="name">$(id)</param>
  <param desc="core">$(id)</param>
  <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
  <configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration" />
  <strategies hint="list:AddStrategy">
    <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
  </strategies>
  <locations hint="list:AddCrawler">
    <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
      <Database>web</Database>
      <Root>/sitecore</Root>
    </crawler>
  </locations>
</index>

The “core” parameter specifies the Solr collection to which the index documents are written and from which they are queried. The issue here is that when you rebuild your Sitecore index, Sitecore’s Solr provider first deletes all documents in the Solr collection before rebuilding the index again one by one. So, depending on the size of your index, there is a period of time during which the collection has partial documents, and any query from your application can return incorrect results.

Here comes SwitchOnRebuildSolrSearchIndex!

The solution to this problem is to use Sitecore’s SwitchOnRebuildSolrSearchIndex instead of the default SolrSearchIndex configured earlier. With this, two collections are maintained for each index – one is the primary, used for index updates and reads, and the second is used for index rebuilds. After a rebuild, Sitecore’s Solr provider swaps the two. Solr aliases play a key role in realizing this functionality. This is explained through the diagrams below.

(SolrSearchIndex)

As you can see here, there are two aliases – main alias and rebuild alias – each pointing to a different Solr collection / core. Sitecore’s Solr Provider always issues index update and query requests against the main alias.

(SwitchOnRebuildSolrSearchIndex)

When a Sitecore index is rebuilt, the rebuild request is issued against the rebuild alias and hence the second collection gets updated. During this time, the first collection continues to serve query requests. Once the rebuild is completed, SwitchOnRebuildSolrSearchIndex swaps the main alias and rebuild alias pointers. And this process continues with each rebuild.
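For completeness, the two aliases can be created up front through Solr’s Collections API (this requires SolrCloud mode, as aliases are a SolrCloud feature). The host and port below are illustrative; the alias and collection names match the configuration used in this post:

```
http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=Web_MainAlias&collections=Web_Collection1
http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=Web_RebuildAlias&collections=Web_Collection2
```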

The SwitchOnRebuildSolrSearchIndex is configured as shown below.

<index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
  <param desc="name">$(id)</param>
  <param desc="mainalias">Web_MainAlias</param>
  <param desc="rebuildalias">Web_RebuildAlias</param>
  <param desc="collection">Web_Collection1</param>
  <param desc="rebuildcollection">Web_Collection2</param>
  <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
  <configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration" />
  <strategies hint="list:AddStrategy">
    <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
  </strategies>
  <locations hint="list:AddCrawler">
    <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
      <Database>web</Database>
      <Root>/sitecore</Root>
    </crawler>
  </locations>
</index>

Hope this post was useful in knowing how to configure Sitecore for high availability of Solr indexes for query requests !!

Note: The mapping between main alias / rebuild alias and their corresponding current collections are maintained in the Properties table of the Core database.

Dedicated Indexing Instance with Sitecore and Solr

As you are probably aware, if Lucene is used as the search engine in your Sitecore implementation, then all instances in your cluster will perform the role of indexing and will maintain a local copy of the index on that server. Whenever an indexing related event occurs, such as content change or publish complete, the event is distributed to all instances in the Sitecore cluster and each instance will update its local copy of the index.

But Solr is an enterprise-grade, distributed, scalable and fault-tolerant search engine with Lucene being used internally. So if you move your Sitecore implementation to Solr, there is no need for each instance to update the Solr index – just one of the Sitecore instances can take up the responsibility of sending index update requests to Solr.

This post describes a couple of approaches to setting up a dedicated indexing instance in Sitecore when using Solr.

1. Configure Index Update Strategy by Instance Role

[Diagram: dedicated indexing server – approach 1]

The strategy used with this approach is:

  • Have indexes configured on the respective instance roles as required – master index on CM instances and web index on CD instances. Even though these instances may not be the ones dedicated to updating indexes, you still need the indexes defined for querying purposes.
  • Configure the index update strategy as manual on all CM and CD instances to ensure that automatic indexing on content change or on completion of a publish does not happen from these instances, i.e., they are in “index read-only” mode.
<index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
  <param desc="name">$(id)</param>
  <param desc="core">$(id)</param>
  ...
  ...
  <strategies hint="list:AddStrategy">
    <strategy ref="contentSearch/indexUpdateStrategies/manual" />
  </strategies>
  ...
  ...
</index>
  • Configure an automatic index update strategy (sync, intervalAsyncMaster or onPublishEndAsync, as the case may be) only on the dedicated indexing instance.
<index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
  <param desc="name">$(id)</param>
  <param desc="core">$(id)</param>
  ...
  ...
  <strategies hint="list:AddStrategy">
    <strategy ref="contentSearch/indexUpdateStrategies/onPublishEndAsync" />
  </strategies>
  ...
  ...
</index>

2. Configure Indexing Instance Name

(read spoiler alert below)

[Diagram: dedicated indexing server – approach 2]

The strategy used in this approach is:

  • Define indexes on respective instance roles along with appropriate index update strategies as required:
    • CM instance ⇒ master index with sync or intervalAsyncMaster update strategy
    • CD instances ⇒ web index with onPublishEndAsync update strategy
  • Designate one of the instances as the dedicated indexing instance by configuring the following settings. This is similar to how you would set up a dedicated publishing instance in Sitecore.
<!--
INSTANCE NAME Unique name for Sitecore instance. Default value: (machine name and IIS site name)
-->
<setting name="InstanceName" value="IDX_SERVER" />

<!--
INDEXING INSTANCE Assigns the instance name of dedicated Sitecore installation for indexing operations.
When empty, all indexing operations are triggered on the local installation of Sitecore. Default value: (empty)
-->
<setting name="ContentSearch.Solr.IndexingInstance" value="IDX_SERVER" />
  • Have all the other instances in the cluster configure the same indexing instance name.
<!--
INDEXING INSTANCE Assigns the instance name of dedicated Sitecore installation for indexing operations.
When empty, all indexing operations are triggered on the local installation of Sitecore. Default value: (empty)
-->
<setting name="ContentSearch.Solr.IndexingInstance" value="IDX_SERVER" />
  • With this approach, Sitecore automatically converts all instances, apart from the indexing instance, to “index read-only” mode. Basically, the index update strategy is converted to manual based on the following setting.
<!--
READ ONLY STRATEGY NAME Specifies strategy that is used for read only indexes.
Default value: contentSearch/indexUpdateStrategies/manual
-->
<setting name="ContentSearch.Solr.ReadOnlyStrategy" value="contentSearch/indexUpdateStrategies/manual" />
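The effect of these settings can be summarized in a few lines of pseudologic. This is a Python illustration of the decision, not Sitecore's actual implementation; the instance names match the config snippets above.

```python
# Illustration (not Sitecore's actual code) of the decision behind
# approach 2: only the designated indexing instance keeps its configured
# update strategies; every other instance falls back to the read-only
# strategy (manual by default, per ContentSearch.Solr.ReadOnlyStrategy).

def effective_strategy(local_instance, indexing_instance, configured_strategy):
    # An empty IndexingInstance setting means "index locally", which is
    # the default behaviour on a standalone Sitecore installation.
    if not indexing_instance or local_instance == indexing_instance:
        return configured_strategy
    return "manual"

print(effective_strategy("IDX_SERVER", "IDX_SERVER", "onPublishEndAsync"))  # onPublishEndAsync
print(effective_strategy("CD_01", "IDX_SERVER", "onPublishEndAsync"))       # manual
```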

This approach has the clear advantage of keeping the same index configuration files across all Sitecore instances, with the dedicated indexing instance designated through just a couple of settings.

Note: and here is the disappointing part 😦 – Approach 2 is not available in Sitecore today; we worked with Sitecore Support to get this capability into our implementation.

Update (28 Oct 2016): the solution described in Approach 2 has now made its way into newer releases of Sitecore!! 🙂

SolrCloud and ZooKeeper Reliability Issues

  • Does it take a long time for your Solr instance to start?
  • Do individual shard replicas show that they are not in Active state, and instead are perpetually in Recovering or Recovery Failed or Down state?
  • When one replica goes down, does it take a long time for ZooKeeper to elect a new leader?
  • Have you run into issues where no leader has been elected for a collection even if there are active shard replicas for that collection?

If you have run into any of the above issues, continue reading. All of these are indications that your SolrCloud is not in a healthy state; basically, it is unstable. There could be different reasons for this, and I would like to cover our learnings here.

Scenario

We have a multi-tenant Sitecore implementation running about 200 websites. Initially this implementation used Lucene as the indexing technology. From an isolation standpoint, we had each site maintain its own index, as explained in one of my earlier posts.

  • Number of sites = 200
  • Number of publishing targets = 2 (staging, internet)
  • Total number of databases = 3 (master + one for each publishing target)
  • Number of indexes per site = 3 * 2 = 6 (one index per database, doubled to include the secondary index required by the SwitchOnRebuild strategy)
  • Total number of indexes = 200 * 6 = 1200
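The arithmetic above can be double-checked with a few lines. This is purely illustrative; the numbers are the ones from our setup.

```python
# Sanity check of the collection count that drove our SolrCloud troubles.
sites = 200
publishing_targets = ["staging", "internet"]

# One database per publishing target, plus master.
databases = 1 + len(publishing_targets)       # 3

# SwitchOnRebuild needs a primary and a secondary (rebuild) index
# per database, so each site contributes databases * 2 indexes.
indexes_per_site = databases * 2              # 6

total_collections = sites * indexes_per_site
print(total_collections)  # 1200
```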

Given the growing volume of content and the scalability needs, we decided to migrate from Lucene to Solr. In line with our indexing strategy in Lucene, we went ahead and created a separate collection in Solr for each index defined in Sitecore – 1200 collections in all. The primary reason for following this approach was to ensure that the same code written for the Lucene-based site-specific indexing strategy also worked with Solr.

<index id="sitecore_web_index_site1" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
  <param desc="name">$(id)</param>
  <param desc="core">$(id)</param> <!-- Same as index id -->
  ...
  ...
</index>
<index id="sitecore_web_index_site2" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
  <param desc="name">$(id)</param>
  <param desc="core">$(id)</param> <!-- Same as index id -->
  ...
  ...
</index>

Problems

And this is where the fun started!! Apparently SolrCloud does not work very well when you have a large number of collections. While there is no hard limit, many folks have run into the issues I mentioned above once the number of collections goes beyond 1000.

Everything works fine as long as all nodes are running in a healthy state. However, all hell breaks loose when you restart the Solr cluster. On restart, Solr reads all collections and starts sending messages to the Overseer queue in ZooKeeper, and it appears that the Overseer queue and the clusterstate.json file become a bottleneck with a large number of collections.

You can check the status of the Overseer queue using the Collections API below:

/solr/admin/collections?action=OVERSEERSTATUS

The response you get will indicate the queue size. If the queue size is quite high and not reducing quickly, then you have an issue.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">764</int>
  </lst>
  <str name="leader">11.222.333.44:8983_solr</str>
  <int name="overseer_queue_size">2345</int>
  ...
</response>
There are ways to manually clear the Overseer queue, and you can find references on the web for doing this. One way is to connect with the ZooKeeper client and remove the queue entries by running the command:

rmr /overseer/queue

Note: We were running Solr 5.2.1; not sure if these issues are handled better in newer versions of Solr.

Solutions

So if you are running into similar SolrCloud issues, you should definitely consider keeping the number of collections down to a number that ZooKeeper can comfortably manage. There are a couple of approaches we could have adopted in our implementation.

Approach 1 – Shared Sitecore Index writing to a Shared Collection

In this approach, all sites / tenants would share the same index definition and thereby write to the same Solr core / collection.

Approach 2 – Individual Sitecore Indexes writing to a Shared Collection

In this approach, each site continues to maintain its own index definition on the Sitecore side, but all of them (or a group of them) update the same Solr core / collection. This approach is configured as shown below:

<index id="sitecore_web_index_site1" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
 <param desc="name">$(id)</param>
 <param desc="core">Web_Collection1</param>
 ...
 ...
</index>
<index id="sitecore_web_index_site2" type="Sitecore.ContentSearch.SolrProvider.SolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
 <param desc="name">$(id)</param>
 <param desc="core">Web_Collection1</param>
 ...
 ...
</index>

In our case, we definitely did not want to go with Approach 1, as that would have meant code changes to our implementation. Hence we went with Approach 2.

Note: With Approach 2, if you are using SwitchOnRebuildSolrSearchIndex to provide high availability for querying and you rebuild one index from Sitecore, then the documents of all other indexes sharing that Solr core will also get deleted. This definitely is an issue. We worked with Sitecore Support to address this limitation, and I will cover that in my next post.
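To illustrate the problem with a sketch (not Sitecore's actual delete logic): Sitecore stamps every Solr document with an `_indexname` field, so in principle a delete can be scoped to one index's documents instead of wiping the whole shared collection.

```python
# Illustration of the shared-collection rebuild problem. Sitecore adds an
# _indexname field to every document it writes to Solr, which makes a
# scoped delete possible; the functions below just build the two kinds of
# Solr delete-by-query payloads for comparison.

def delete_all_query():
    # What a naive full rebuild issues: removes EVERY document in the
    # collection, including those of sibling indexes sharing it.
    return "<delete><query>*:*</query></delete>"

def scoped_delete_query(index_name):
    # A delete restricted to one Sitecore index's documents.
    return "<delete><query>_indexname:%s</query></delete>" % index_name

print(scoped_delete_query("sitecore_web_index_site1"))
```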