Sitecore DMS, Lucene and Website Search


If you are designing a solution that takes advantage of everything DMS has to offer, one of the first things you will probably do is give your Sublayouts data sources and move the content into data source item(s). This definitely makes the page DMS friendly, but it also makes website search results inaccurate and unreliable. The default Database Crawler indexes the fields defined on the item itself, so content that now lives in a data source item is no longer indexed as part of the page that displays it.

An easy solution to the problem is to buy a third-party tool that can crawl the website. But that would cost money and would make this blog post irrelevant. So, here is one way of solving the problem.

1) Extend the default database crawler
2) Add checks to ensure that only items with layouts defined are being indexed
3) Get all the renderings associated with the item and index the fields of the data source item
4) Update search query to also query the newly indexed custom fields

Code

To implement this, create a custom crawler class that extends “Sitecore.Search.Crawlers.DatabaseCrawler” and add method(s) that let us add fields to the Lucene Document.

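    using System.Collections.Generic;
    using System.Linq;
    using Lucene.Net.Documents;
    using Sitecore;
    using Sitecore.Data.Items;
    using Sitecore.Data.Managers;
    using Sitecore.Layouts;
    using Sitecore.Search.Crawlers;

    public class WebsiteCrawler : DatabaseCrawler
    {
        // Convenience overload: no term vector, default boost
        public void AddField(Document doc, string fieldKey, string fieldValue, Field.Store storage, Field.Index index)
        {
            AddField(doc, fieldKey, fieldValue, storage, index, Field.TermVector.NO, 1f);
        }

        // Creates a Lucene field with the given settings and adds it to the document
        public void AddField(Document doc, string fieldKey, string fieldValue, Field.Store storage, Field.Index index, Field.TermVector vector, float boost)
        {
            Field field = new Field(fieldKey, fieldValue, storage, index, vector);
            field.SetBoost(boost);
            doc.Add(field);
        }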

Next, create a method that can read data sources assigned to renderings.

        private void AddRenderingDataSource(Document document, Item item, bool versionSpecific)
        {
            // Resolve the device whose renderings should be indexed
            DeviceItem device = DeviceItem.ResolveDevice(item.Database);
            if (device == null)
            {
                return;
            }

            List<RenderingReference> renderings = item.Visualization.GetRenderings(device, false).ToList();
            List<string> allFieldValues = new List<string>();

            // For each rendering, get the data source item
            foreach (RenderingReference rendering in renderings)
            {
                string dataSource = rendering.Settings.DataSource;
                if (string.IsNullOrEmpty(dataSource))
                {
                    continue; // this rendering has no data source assigned
                }

                Item dsItem = item.Database.GetItem(dataSource);
                if (dsItem == null)
                {
                    continue;
                }

                // Get data from each data field of the data source item (not of the page item)
                IEnumerable<string> fieldValues = dsItem.Fields
                    .Where(field => IsDataField(field))
                    .Select(f => StringUtil.GetString(f.Value));
                allFieldValues.AddRange(fieldValues);

                // If the data source item has children, index the data fields of all its descendants too.
                // This is useful when the actual content comes from the children of the data source item.
                // You can add logic to prohibit this.
                if (dsItem.HasChildren)
                {
                    IEnumerable<string> descendantValues = dsItem.Axes.GetDescendants()
                        .SelectMany(childItem => childItem.Fields)
                        .Where(field => IsDataField(field))
                        .Select(f => StringUtil.GetString(f.Value));
                    allFieldValues.AddRange(descendantValues);
                }
            }

            // Join the values and add them to a single field in the Lucene document
            if (allFieldValues.Any())
            {
                AddField(document, "DS_Content", string.Join(" ", allFieldValues), Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO, 1f);
            }
        }

Override the “AddAllFields” method and call our “AddRenderingDataSource” method to index the data source content. The “IsDataField” helper is the one used above to skip standard (system) fields.

        protected override void AddAllFields(Document document, Item item, bool versionSpecific)
        {
            base.AddAllFields(document, item, versionSpecific);
            this.AddRenderingDataSource(document, item, versionSpecific);
        }

        private bool IsDataField(Sitecore.Data.Fields.Field field)
        {
            return TemplateManager.IsDataField(field.GetTemplateField());
        }
    } // end of WebsiteCrawler
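Step 2 of the plan (only index items that have a layout defined) is not shown in the snippets above. Here is a minimal sketch of one way to check it, assuming the raw value of the item's layout field is the usual XML with one `<d>` element per device and the layout ID in its `l` attribute. `LayoutCheck` and `HasLayout` are hypothetical names, not part of the Sitecore API.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of Sitecore.
public static class LayoutCheck
{
    // Returns true when the raw layout XML assigns a layout to at least one device.
    // Assumes the shape <r><d id="{device}" l="{layout}">...</d></r>.
    public static bool HasLayout(string layoutXml)
    {
        if (string.IsNullOrEmpty(layoutXml))
        {
            return false;
        }

        try
        {
            XElement root = XElement.Parse(layoutXml);
            return root.Elements("d")
                       .Any(d => !string.IsNullOrEmpty((string)d.Attribute("l")));
        }
        catch (XmlException)
        {
            // Anything that is not well-formed XML is treated as "no layout"
            return false;
        }
    }
}
```

Inside the crawler this could probably be called as `LayoutCheck.HasLayout(item[FieldIDs.LayoutField])` at the top of “AddRenderingDataSource”, returning early for items that have no layout. Treat that call as an assumption and verify it against your Sitecore version.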

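Finally, add a new index that uses this crawler, rebuild the search index, and include “DS_Content:” in your website search query. Lucene field names are case-sensitive, so the name in the query must match the one used in the crawler exactly.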

     
        
          
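      <!-- Goes under the <search> section of the Sitecore configuration.
           The index id, the crawler element name, and the crawler's namespace and assembly are placeholders; substitute your own. -->
      <configuration>
        <indexes>
          <index id="website_search" type="Sitecore.Search.Index, Sitecore.Kernel">
            <param desc="name">$(id)</param>
            <param desc="folder">__website</param>
            <Analyzer ref="search/analyzer" />
            <locations hint="list:AddCrawler">
              <web type="MyNamespace.WebsiteCrawler, MyAssembly">
                <Database>web</Database>
                <Root>/sitecore/content/Home</Root>
                <IndexAllFields>true</IndexAllFields>
                <Tags>web search content</Tags>
              </web>
            </locations>
          </index>
        </indexes>
      </configuration>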
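For the last step, the search query has to look in the new field as well as the regular content field. Below is a minimal sketch that builds a Lucene query string covering both, assuming the page content is in Sitecore's built-in “_content” field and the data source content is in “DS_Content” as written by the crawler. `WebsiteSearchQuery`, `Escape`, and `Build` are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of Sitecore or Lucene.Net.
public static class WebsiteSearchQuery
{
    // Characters that have special meaning in Lucene's query parser syntax
    private const string SpecialChars = "+-&|!(){}[]^\"~*?:\\";

    // Prefixes every special character with a backslash
    public static string Escape(string term)
    {
        var sb = new StringBuilder();
        foreach (char c in term)
        {
            if (SpecialChars.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Builds a query that matches the terms in either field;
    // returns an empty string when the input contains no terms
    public static string Build(string userInput)
    {
        string[] terms = (userInput ?? string.Empty)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(Escape)
            .ToArray();

        if (terms.Length == 0)
        {
            return string.Empty;
        }

        string joined = string.Join(" ", terms);
        return string.Format("(_content:({0}) OR DS_Content:({0}))", joined);
    }
}
```

For example, `WebsiteSearchQuery.Build("sitecore dms")` returns `(_content:(sitecore dms) OR DS_Content:(sitecore dms))`. The resulting string is meant to be handed to a Lucene query parser that uses the same analyzer as the index, since “DS_Content” is stored as a tokenized field.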
              
            
          
        
      