Optimizing Sitecore Indexing Performance – Part 1

Part 1 | Part 2 | Part 3

One of the areas you can tune to improve the performance of your Sitecore solution is its Lucene indexes. Indexes are updated by many of the activities you perform in Sitecore, such as content authoring, content deletion, package installation, publishing and full index rebuilds. I assume they are also updated by some maintenance activities such as database cleanup, though I have not verified this myself. Lucene is not a centralized indexing / search solution in Sitecore, i.e., Lucene indexes are updated on each machine in your Sitecore installation, which makes it even more important to optimize this piece of the solution.

In a series of posts, I will cover some of the strategies you can adopt to tune your Lucene indexes and thereby improve the overall performance of your Sitecore application. Do keep in mind that you need to weigh each of the options below and adopt only the ones that apply to your implementation.

Index only required templates and fields

Index only what you intend to search / query. By default, Sitecore indexes all templates and fields. In most cases this is not required, as you would hardly ever search all the content in your instance. You can exclude templates and fields from being indexed, not just your custom templates and fields but also the ones coming from Sitecore's own templates (workflow, help text, validation rules, security, etc.).

Template exclusions can be configured in the Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config file as shown below, though I would recommend patching them in through a separate config file.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/"> 
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultLuceneIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <exclude hint="list:ExcludeTemplate">
            <!-- Specify templates that need to be excluded - ID is the template ID and node name can be anything -->
            <MyTemplate>{71AB14E3-AE77-46C1-B219-9E507918AAF0}</MyTemplate>
          </exclude>
        </defaultLuceneIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

Field exclusions can be configured likewise.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/"> 
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultLuceneIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <exclude hint="list:ExcludeField">
            <!-- Specify fields that need to be excluded - ID is the field ID and node name can be anything -->
            <__is_bucket>{D312103C-B36C-4CA5-864A-C85F9ABDA503}</__is_bucket>
            <MyField>{C9283D9E-7C29-4419-9C28-5A5C8FF53E84}</MyField>
          </exclude>
        </defaultLuceneIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

An even better option is to turn off indexing of all fields using <indexAllFields>false</indexAllFields>, and include only the fields you need.
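Turning off all-field indexing means setting indexAllFields to false and then listing the fields you do want. Here is a minimal sketch of such a patch, assuming a Sitecore version where indexAllFields and an include list with hint="list:IncludeField" sit directly under defaultLuceneIndexConfiguration, mirroring the exclude lists above (in other versions these settings may live elsewhere). MyField and its ID are placeholders reused from the exclusion example.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultLuceneIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <!-- Stop indexing every field by default -->
          <indexAllFields>false</indexAllFields>
          <!-- Assumed include list: only the fields named here get indexed -->
          <include hint="list:IncludeField">
            <!-- Placeholder field: node name can be anything, ID is the field ID -->
            <MyField>{C9283D9E-7C29-4419-9C28-5A5C8FF53E84}</MyField>
          </include>
        </defaultLuceneIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```

Check the merged result in /sitecore/admin/showconfig.aspx and rebuild the index afterwards, since any field you forget to include will silently disappear from search results.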

You may want everything indexed in your authoring environment so that it is searchable in the Content Editor, but index only what your site functionality needs in your delivery environment. Or, even within your authoring environment, you may want only the master index to include all content and the web index to include only what you need. Both can be achieved by defining different indexConfigurations and having each index refer to the appropriate one.
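As a rough sketch of the per-index idea, the patch below defines a second configuration and points the web index at it. Several things here are assumptions: webLuceneIndexConfiguration is a hypothetical name, the sketch assumes a Sitecore version in which each index references its configuration through a <configuration ref="..." /> element, and the new configuration has to be a complete one (typically a copy of the default with your exclusions applied), which is only hinted at here.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <!-- Hypothetical name: a slimmed-down configuration for the web index -->
        <webLuceneIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <exclude hint="list:ExcludeTemplate">
            <!-- Placeholder template ID reused from the example above -->
            <MyTemplate>{71AB14E3-AE77-46C1-B219-9E507918AAF0}</MyTemplate>
          </exclude>
          <!-- Remaining settings copied from defaultLuceneIndexConfiguration go here -->
        </webLuceneIndexConfiguration>
      </indexConfigurations>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes>
          <index id="sitecore_web_index">
            <!-- Assumption: the index already has a <configuration ref="..." /> child;
                 repoint it at the new configuration instead of adding a second one -->
            <configuration>
              <patch:attribute name="ref">contentSearch/indexConfigurations/webLuceneIndexConfiguration</patch:attribute>
            </configuration>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```

The master index keeps using the default configuration, so authors can still search everything while the web index stays small.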

This strategy alone can considerably reduce the size of your index and the time taken to index. In one of my implementations, I saw a ~65% improvement in indexing performance. And with a smaller index, your queries against it will be faster as well!

Index only required content versions

By default, all versions of an item are stored in the index. Over time you will accumulate many versions of your content, and this can impact your indexing performance. Obviously this applies only to your master indexes. You may want to consider Marketplace modules such as Version Manager or Version Pruner to keep the number of versions under control and thereby reduce the size of your index.

If it does not impact any of your functionality, you can also consider storing only the latest version of an item in your index. This can be achieved with a processor in the inbound indexing filter pipeline, as shown below.

namespace SitecoreZone.Indexing
{
    // Inbound filter that keeps only the latest version of each item in the index.
    public class InboundIndexVersionFilter : Sitecore.ContentSearch.Pipelines.IndexingFilters.InboundIndexFilterProcessor
    {
        public override void Process(Sitecore.ContentSearch.Pipelines.IndexingFilters.InboundIndexFilterArgs args)
        {
            var item = args.IndexableToIndex as Sitecore.ContentSearch.SitecoreIndexableItem;

            // Exclude any version that is not the latest one.
            if (item != null && item.Item != null &&
                !item.Item.Versions.IsLatestVersion())
            {
                args.IsExcluded = true;
            }
        }
    }
}

The processor is then registered in the indexing.filterIndex.inbound pipeline:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <indexing.filterIndex.inbound>
        <processor type="SitecoreZone.Indexing.InboundIndexVersionFilter, SitecoreZone.Indexing" />
      </indexing.filterIndex.inbound>
    </pipelines>
  </sitecore>
</configuration>

Index only required content locations

If you never intend to search for content within a specific location of your content tree (such as templates, layouts or maybe even media), you can configure your index crawlers to crawl and index only the portions you need. Do keep in mind that if you adopt this approach with your master indexes, the search functionality in some Sitecore authoring scenarios will not show the excluded content: the Search tab, Quick Search, the Search option in the Media Browser dialog, the Copy Presentation Details dialog, etc.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/"> 
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes>
          <index id="sitecore_master_index">
            <locations>
              <crawler name="content" type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>master</Database>
                <Root>/sitecore/content</Root>
              </crawler>
              <crawler name="media" type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>master</Database>
                <Root>/sitecore/media library</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>

I will cover a few other approaches in my next post in this series. Stay tuned!!
