
Indexing Custom Fields for Sitecore 7

First of all, this post applies to Sitecore 7; the approach will not work in earlier versions. It covers both how to include a custom field in your index and how to exclude certain content from within that field. If you only need to exclude content from the index and are using a default index field, you can simply add the ID of the content you want excluded to Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config and be done with it. This post is intended to help developers who need to build a custom computed index field and omit certain content from it.
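For that simple config-only case, the exclusion is an entry in the index configuration file. The fragment below is a rough sketch based on the Sitecore 7.0 default configuration, where field exclusions sit in an `exclude` element with a `list:ExcludeField` hint; element and hint names changed in later releases, so compare against your own copy. `ImageCredit` and the all-zero GUID are placeholders.

```xml
<!-- Sketch based on the Sitecore 7.0 default index configuration; verify against your copy. -->
<!-- ImageCredit and the GUID are placeholders: use your own field name and field ID. -->
<exclude hint="list:ExcludeField">
  <ImageCredit>{00000000-0000-0000-0000-000000000000}</ImageCredit>
</exclude>
```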

Recently, I launched a site that used a custom search written against the Lucene index. After a lot of fine-tuning, the search mostly worked well, but we ran into two major issues. First, some content that needed to be indexed was not being indexed. Once that was resolved, content we did not want indexed started showing up in the results. We needed a way to exclude certain content from the index: in this case, image credit information.

First, creating a custom indexing field for Lucene

My goal here was to have Lucene index the text that appears on a rendered page. First, I created a class called ParsedPageContent. As you can see in the code below, it gets the Sitecore item being indexed, verifies that the item has a layout, and passes the item into two different methods:

  1. GetItemContent returns a concatenated string that contains all of the relevant content within the Sitecore Item itself.
  2. GetDataSourceContent returns a concatenated string that contains all of the relevant content for every Datasource that is associated with the Sitecore Item.

Finally, I combine these strings, strip any HTML tags, and return the result to be included in the index.
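The final combine step can be illustrated without any Sitecore dependencies. This is a minimal sketch: `PageContentComposer` is a hypothetical name, and the regular expression is a simplified stand-in for Sitecore's `StringUtil.RemoveTags`, not its actual implementation (a regex is not a full HTML parser).

```csharp
using System.Text.RegularExpressions;

// Hypothetical stand-in for the last step of ComputeFieldValue:
// join item content and datasource content with a space, then strip markup.
public static class PageContentComposer
{
    // Simplified tag matcher; assumed stand-in for StringUtil.RemoveTags.
    private static readonly Regex Tags = new Regex("<[^>]*>", RegexOptions.Compiled);

    public static string Compose(string itemContent, string dataSourceContent)
    {
        // The space keeps the last word of one part from merging with the next.
        string combined = string.Format("{0} {1}", itemContent, dataSourceContent);
        return Tags.Replace(combined, string.Empty);
    }
}
```

Calling `PageContentComposer.Compose("<p>Hello</p>", "<b>World</b>")` returns `Hello World`.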

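public class ParsedPageContent : IComputedIndexField
{
        public object ComputeFieldValue(Sitecore.ContentSearch.IIndexable indexable)
        {
            // only Sitecore items can be handled by this computed field
            var indexableItem = indexable as SitecoreIndexableItem;
            if (indexableItem == null)
                return null;

            Item item = indexableItem.Item;
            // skip items without a layout; they are never rendered as pages
            if (item == null || item.Visualization.Layout == null)
                return null;

            string itemContent = GetItemContent(item);
            string dataSourceContent = GetDataSourceContent(item);
            // combine both strings and strip HTML tags before indexing
            return StringUtil.RemoveTags("{0} {1}".FormatWith(itemContent, dataSourceContent));
        }

        // FieldName, ReturnType and the helper methods appear in the full class at the end of this post
}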

Before we get into the methods that fetch the relevant data, I want to mention a third method: IsDataField(). This is what gives the whole approach its power, because it decides what is and is not relevant information for the index. In this case, as mentioned above, I wanted to keep image credits out of the index.
IsDataField returns a bool indicating whether a field should be included in the index (true if it is relevant data, false if not). In my case, I check whether the field ID matches the ID of the image credit field; if it does, I return false and the field is never added to the index.
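To make the exclusion logic concrete, here is a Sitecore-free sketch of the IsDataField decision. `FieldFilter`, the placeholder GUIDs, and the `isTemplateDataField` flag (standing in for `TemplateManager.IsDataField`) are hypothetical. One detail worth getting right: with two excluded IDs, a test written as `idA != fieldId || idB != fieldId` is true for every field, because no field can match both IDs, so nothing would ever be excluded. A set-membership check, equivalent to `idA != fieldId && idB != fieldId`, behaves as intended.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, Sitecore-free model of the IsDataField decision.
public static class FieldFilter
{
    // Placeholder IDs: substitute the IDs of the fields to keep out of the index
    // (for example, the image credit field).
    private static readonly HashSet<Guid> ExcludedFieldIds = new HashSet<Guid>
    {
        new Guid("11111111-1111-1111-1111-111111111111"),
        new Guid("22222222-2222-2222-2222-222222222222"),
    };

    // True only when the field is NOT on the exclusion list AND is a data field;
    // isTemplateDataField stands in for TemplateManager.IsDataField(...).
    public static bool IsDataField(Guid fieldId, bool isTemplateDataField)
    {
        if (ExcludedFieldIds.Contains(fieldId))
            return false; // excluded fields never reach the index
        return isTemplateDataField;
    }
}
```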
With that out of the way, we can focus on the two methods that produce our custom data.

Getting Item Content

This is the content that lives within the item being indexed. To start, we can simply do this:

IEnumerable<string> fieldValues = item.Fields.Where(field => IsDataField(field)).Select(f => GetFieldValue(f));

This collects the value of every field on the item that passes IsDataField, as returned by GetFieldValue. You can do a lot more with this; for example, you can also append the data fields of the item's descendants.
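The idea of extending an item's own field values with those of its descendants can be sketched without Sitecore. `ContentNode` and `ItemContentCollector` are hypothetical types: `FieldValues` plays the role of the values returned by `GetFieldValue` for fields that pass `IsDataField`, and `Descendants()` is similar in spirit to `Item.Axes.GetDescendants()`.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, Sitecore-free model of an item: field values plus child items.
public sealed class ContentNode
{
    public List<string> FieldValues { get; } = new List<string>();
    public List<ContentNode> Children { get; } = new List<ContentNode>();

    // Depth-first (pre-order) walk over every descendant.
    public IEnumerable<ContentNode> Descendants()
    {
        foreach (ContentNode child in Children)
        {
            yield return child;
            foreach (ContentNode grandChild in child.Descendants())
                yield return grandChild;
        }
    }
}

public static class ItemContentCollector
{
    // Gathers the item's own values and, optionally, its descendants' values,
    // drops empty strings, and joins with spaces so words stay separate.
    public static string Collect(ContentNode item, bool includeDescendants)
    {
        IEnumerable<string> values = item.FieldValues;
        if (includeDescendants)
            values = values.Concat(item.Descendants().SelectMany(d => d.FieldValues));
        return string.Join(" ", values.Where(v => !string.IsNullOrWhiteSpace(v)));
    }
}
```

Joining with a space rather than an empty string keeps text from adjacent fields as separate words in the indexed value.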

Getting the Datasource Content

This takes a few more steps than getting the item content. Before grabbing any of the datasource items, I made two checks. First, I resolved the device (Default, Print, Mobile, etc.) and made sure it was valid; next, I verified that the item has renderings or sublayouts for that device (via item.Visualization.GetRenderings()). Then, for each rendering, I fetched the datasource item and again used IsDataField to keep only the fields I wanted in my index. Finally, I checked whether the datasource item has children whose content should also be included in the index.
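The datasource walk can also be modeled in plain C#. In this sketch, `DataSourceItem` and `DataSourceContentCollector` are hypothetical; each rendering is reduced to its datasource path, and a dictionary stands in for `Database.GetItem`. The point it illustrates is that a rendering with no datasource, or one that cannot be resolved, is skipped with `continue` rather than ending the whole method, so one missing datasource does not discard content already collected from other renderings.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, Sitecore-free model of a datasource item and its children.
public sealed class DataSourceItem
{
    public List<string> FieldValues { get; } = new List<string>();
    public List<DataSourceItem> Children { get; } = new List<DataSourceItem>();
}

public static class DataSourceContentCollector
{
    // renderingDataSources: one datasource path per rendering (may be empty).
    // database: stands in for Database.GetItem(path).
    public static string Collect(IEnumerable<string> renderingDataSources,
                                 IDictionary<string, DataSourceItem> database)
    {
        var allValues = new List<string>();
        foreach (string dataSource in renderingDataSources)
        {
            // Many renderings have no datasource; skip them instead of aborting.
            if (string.IsNullOrEmpty(dataSource))
                continue;

            DataSourceItem dsItem;
            if (!database.TryGetValue(dataSource, out dsItem))
                continue; // unresolved datasource: skip this rendering only

            allValues.AddRange(dsItem.FieldValues);
            // Include content stored on the datasource's descendants.
            allValues.AddRange(Descendants(dsItem).SelectMany(d => d.FieldValues));
        }
        return string.Join(" ", allValues);
    }

    private static IEnumerable<DataSourceItem> Descendants(DataSourceItem item)
    {
        foreach (DataSourceItem child in item.Children)
        {
            yield return child;
            foreach (DataSourceItem descendant in Descendants(child))
                yield return descendant;
        }
    }
}
```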

Here is how my code looks for getting the Datasource content.

        private string GetDataSourceContent(Item item)
        {
            try
            {
                DeviceItem device = DeviceItem.ResolveDevice(item.Database);
                if (device == null)
                    return string.Empty;
                //if it has sublayouts
                List<RenderingReference> renderings = item.Visualization.GetRenderings(device, false).ToList();
                if (!renderings.Any())
                    return string.Empty;
                List<string> allFieldValues = new List<string>();
                foreach (RenderingReference rendering in renderings)
                {
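                    // skip renderings without a datasource instead of aborting the whole loop
                    if (string.IsNullOrEmpty(rendering.Settings.DataSource))
                        continue;

                    //get the datasource item
                    Item dsItem = item.Database.GetItem(rendering.Settings.DataSource);
                    if (dsItem == null)
                        continue;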

                    // get data from each field: IsDataField drops excluded and non-data fields, GetFieldValue keeps only text-based values
                    IEnumerable<string> fieldValues = dsItem.Fields.Where(field => IsDataField(field)).Select(f => GetFieldValue(f));
                    allFieldValues.AddRange(fieldValues);
                    // if the data source item has children then index fields from all the children.
                    // this is useful in situations where the actual content comes from the children of the data source item. You can add logic to prohibit this
                    if (dsItem.HasChildren)
                    {
                        IEnumerable<string> fieldValues1 = dsItem.Axes.GetDescendants().SelectMany(childItem => childItem.Fields).Where(field => IsDataField(field)).Select(f => GetFieldValue(f));
                        allFieldValues.AddRange(fieldValues1);
                    }                    
                }
                // join the string and add it to a field in lucene document
                if (allFieldValues.Any())
                    return string.Join(" ", allFieldValues);
            }
            catch (Exception ex)
            {
                Log.Error("could not index datasource", ex, this);
            }

            return string.Empty;

        }

Updating Lucene to include your class
Now that our class defines exactly which content to include in (and exclude from) the index, we need to tell Lucene where to find it. For this, look inside your App_Config/Include directory. Open Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config and add the field definition and the computed field entry shown below. This is what I added for my example; tailor yours accordingly.
Definition of the field:

              <fieldType fieldName="parsedpagecontent"    storageType="Yes"  indexType="TOKENIZED"  vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
                <Analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
              </fieldType>

Adding the computed field to the index:

SitecoreExt.ContentSearch.ComputedFields.ParsedPageContent,SitecoreExt
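On its own, the type string above does not say where it goes. In the Sitecore 7 default configuration, computed fields are registered inside a `fields` element with the `raw:AddComputedIndexField` hint, with the field name as an attribute and the type and assembly as the element text. A sketch of that registration, assuming the `parsedpagecontent` field name from the definition above and that the class lives in an assembly named SitecoreExt (as the type string suggests), might look like this; check the element names against your own copy of the config.

```xml
<fields hint="raw:AddComputedIndexField">
  <!-- fieldName should match the fieldName used in the field definition -->
  <field fieldName="parsedpagecontent">SitecoreExt.ContentSearch.ComputedFields.ParsedPageContent,SitecoreExt</field>
</fields>
```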

The complete ParsedPageContent class:

using Sitecore;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Items;
using Sitecore.Data.Managers;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Sitecore.StringExtensions;
using Sitecore.Layouts;
using Sitecore.ContentSearch.Utilities;
using Sitecore.Diagnostics;

namespace SitecoreExt.ContentSearch.ComputedFields
{
    public class ParsedPageContent : IComputedIndexField
    {
        public object ComputeFieldValue(Sitecore.ContentSearch.IIndexable indexable)
        {
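            // only Sitecore items can be handled by this computed field
            var indexableItem = indexable as SitecoreIndexableItem;
            if (indexableItem == null)
                return null;

            Item item = indexableItem.Item;
            // skip items without a layout; they are never rendered as pages
            if (item == null || item.Visualization.Layout == null)
                return null;

            string itemContent = GetItemContent(item);
            string dataSourceContent = GetDataSourceContent(item);
            // combine both strings and strip HTML tags before indexing
            return StringUtil.RemoveTags("{0} {1}".FormatWith(itemContent, dataSourceContent));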
        }

        public string FieldName
        {
            get;
            set;
        }

        public string ReturnType
        {
            get;
            set;
        }


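        // Placeholder template IDs: replace ID.Null with the template IDs from your own solution
        private static readonly Sitecore.Data.ID NovelContentTemplateId = Sitecore.Data.ID.Null;
        private static readonly Sitecore.Data.ID ExpertContentTemplateId = Sitecore.Data.ID.Null;

        private string GetItemContent(Item item)
        {
            List<string> allValues = new List<string>();
            IEnumerable<string> fieldValues = item.Fields.Where(field => IsDataField(field)).Select(f => GetFieldValue(f));
            allValues.AddRange(fieldValues);

            // IsValidType and GetDescendantsByTemplateWithFallback are project-specific extension
            // methods, not part of the core Sitecore API; substitute your own equivalents
            if (item.IsValidType())
            {
                IEnumerable<Item> novelContentItems = item.GetDescendantsByTemplateWithFallback(NovelContentTemplateId);
                IEnumerable<string> novelContent = novelContentItems.SelectMany(itm => itm.Fields).Where(field => IsDataField(field)).Select(f => GetFieldValue(f));
                allValues.AddRange(novelContent);
            }

            if (item.IsValidType())
            {
                IEnumerable<Item> expertContentItems = item.GetDescendantsByTemplateWithFallback(ExpertContentTemplateId);
                IEnumerable<string> expertContent = expertContentItems.SelectMany(itm => itm.Fields).Where(field => IsDataField(field)).Select(f => GetFieldValue(f));
                allValues.AddRange(expertContent);
            }

            // join with a space so words from adjacent fields are not glued together
            return string.Join(" ", allValues);
        }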

        private string GetDataSourceContent(Item item)
        {
            try
            {
                DeviceItem device = DeviceItem.ResolveDevice(item.Database);
                if (device == null)
                    return string.Empty;
                List<RenderingReference> renderings = item.Visualization.GetRenderings(device, false).ToList();
                if (!renderings.Any())
                    return string.Empty;
                List<string> allFieldValues = new List<string>();
                foreach (RenderingReference rendering in renderings)
                {
                    // skip renderings without a datasource instead of aborting the whole loop
                    if (string.IsNullOrEmpty(rendering.Settings.DataSource))
                        continue;

                    Item dsItem = item.Database.GetItem(rendering.Settings.DataSource);
                    if (dsItem == null)
                        continue;

                    // get data from each field
                    IEnumerable<string> fieldValues = dsItem.Fields.Where(field => IsDataField(field)).Select(f => GetFieldValue(f));
                    allFieldValues.AddRange(fieldValues);
                    // if the data source item has children then index fields from all the children.
                    if (dsItem.HasChildren)
                    {
                        IEnumerable<string> fieldValues1 = dsItem.Axes.GetDescendants().SelectMany(childItem => childItem.Fields).Where(field => IsDataField(field)).Select(f => GetFieldValue(f));
                        allFieldValues.AddRange(fieldValues1);
                    }
                    
                }
                // join the string and add it to a field in lucene document
                if (allFieldValues.Any())
                    return string.Join(" ", allFieldValues);
            }
            catch (Exception ex)
            {
                Log.Error("could not index datasource", ex, this);
            }

            return string.Empty;

        }

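        // Placeholder field IDs: replace ID.Null with the IDs of the fields to exclude (e.g. the image credit field)
        private static readonly Sitecore.Data.ID[] ExcludedFieldIds = { Sitecore.Data.ID.Null, Sitecore.Data.ID.Null };

        private bool IsDataField(Sitecore.Data.Fields.Field field)
        {
            // returning false excludes the field from the index;
            // a field is excluded if its ID matches ANY entry in ExcludedFieldIds
            if (ExcludedFieldIds.Contains(field.ID))
                return false;

            Sitecore.Data.Templates.TemplateField templateField = field.GetTemplateField();
            return templateField != null && TemplateManager.IsDataField(templateField);
        }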

        private string GetFieldValue(Sitecore.Data.Fields.Field field)
        {
            try
            {
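                Sitecore.Data.Fields.FieldType fieldType = Sitecore.Data.Fields.FieldTypeManager.GetFieldType(field.Type);
                if (fieldType == null || fieldType.Type == null)
                    return string.Empty;
                // only text-based field types contribute content to the index
                switch (fieldType.Type.FullName)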
                {
                    case "Sitecore.Data.Fields.TextField":
                    case "Sitecore.Data.Fields.HtmlField":
                    case "Sitecore.Data.Fields.ValueLookupField":
                        return StringUtil.GetString(field.Value);
                }
            }
            catch (Exception ex)
            {
                Log.Error("could not parse field", ex, this);
            }

            return string.Empty;
        }
    }
}