CH Downstream Integrations part 3: Example Azure Functions to Read Content Hub Entities
Intro
An Azure Logic App to crawl and index entities of Content Hub with Apache Solr.
Introduction: Using Azure Logic Apps in Integration Flows
This post is the 3rd part in a 3-piece series, describing an integration approach that allows connecting Content Hub with pretty much any external system via APIs. In this case, Content Hub is the source of truth, from which content gets published to other systems, so this is an example of downstream integration.
I'm leveraging Azure Functions as integration building blocks and Azure Logic Apps to compose them into integration flows, where data is extracted from Content Hub via its APIs, transformed or processed, and pushed to the external system. I chose the Content Hub Web Client SDK, a .NET abstraction on top of the Sitecore CH REST API, because it helps deal with CH API throttling and does a few more helpful things.
I believe Logic Apps is a good way to visually orchestrate various building blocks (Azure Functions) with little to no code required: easy to build and easy to change. Of course, this isn't the only way. I'm sharing all source code here so others can use it to build custom integration solutions for Sitecore Content Hub.
All posts in this collection:
- Part 1. Example Azure Functions: Describes example functions for reading content entities and their IDs from Content Hub
- Part 2. Solr Indexer Logic App: Describes example Logic App to publish content changes from Sitecore Content Hub to an external system. In this case, I am extracting, transforming, and then saving Content Hub content into Solr, so this Logic App is effectively a Solr Indexer.
- Part 3 (this post). A Content Crawler for Full Index Rebuild: Another Logic App, which reads IDs of entities in Content Hub and pushes them to an Azure Service Bus Queue, so they would be picked up and processed by the Logic App from part 2.
It's worth noting that Sitecore recently announced Sitecore Connect along with other great new products, so consider using Sitecore Connect before implementing your custom solution.
Useful Information
Relations in Sitecore Content Hub
Querying and Reading Content Hub Data with Web SDK Client
Azure Logic App
This example Azure Logic App will index all Assets in a given Content Hub instance. The approach is quite simple:
- Retrieve IDs of all M.Asset entities using my custom TODO function.
- Run a foreach loop to send each ID into the Service Bus Queue for further processing.
- Let all items in the Service Bus queue be processed by the Solr Indexer, another Logic App described in part 2 of this series.
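The Logic App itself needs no code, but the flow above can be sketched in a few lines of Python to make the moving parts explicit. This is only an illustration: `crawl`, `fake_get_entity_ids`, and the in-memory `queue` are hypothetical stand-ins for the custom Azure function and the Service Bus queue.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


def crawl(
    get_entity_ids: Callable[[str], Iterable[int]],
    send_to_queue: Callable[[int], None],
    definition_name: str = "M.Asset",
) -> int:
    """Fetch all IDs for a definition and enqueue each one; return the count."""
    count = 0
    for entity_id in get_entity_ids(definition_name):
        send_to_queue(entity_id)
        count += 1
    return count


# Hypothetical stand-in so the sketch runs without Content Hub.
def fake_get_entity_ids(definition_name: str) -> List[int]:
    return [72193, 72194, 72195] if definition_name == "M.Asset" else []


# In-memory stand-in for the Service Bus queue.
queue: Deque[int] = deque()
sent = crawl(fake_get_entity_ids, queue.append)
print(sent, list(queue))
```

The third step, indexing, is deliberately absent here: the crawler only fills the queue, and the Solr Indexer consumes it independently.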
Details on the Crawler Logic App
Add Logic App Parameters
I'm adding the Content Hub URL and authentication credentials as Logic App parameters, again for simplicity. In real-life scenarios, Azure Key Vault is a much better choice for storing secrets.
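Whichever store is used, the idea is the same: connection settings are injected at runtime rather than hard-coded. A minimal Python sketch of that idea follows, assuming the settings arrive as environment variables. The variable names (`CH_URL`, `CH_CLIENT_ID`, `CH_CLIENT_SECRET`) and the `ContentHubSettings` type are hypothetical and not taken from the Logic App.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ContentHubSettings:
    url: str
    client_id: str
    client_secret: str


def load_settings(env: Mapping[str, str] = os.environ) -> ContentHubSettings:
    """Read connection settings from the environment and fail fast if any are missing."""
    # Hypothetical variable names, chosen for this sketch only.
    required = ("CH_URL", "CH_CLIENT_ID", "CH_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")
    return ContentHubSettings(
        url=env["CH_URL"].rstrip("/"),  # normalize a trailing slash
        client_id=env["CH_CLIENT_ID"],
        client_secret=env["CH_CLIENT_SECRET"],
    )
```

Failing fast on a missing value keeps a misconfigured crawler from starting a long run that can only fail later.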
High-level view
The following is the high-level view of the Logic App, which implements the steps described above: reading all Asset IDs from Content Hub and then sending them to the Service Bus queue for further processing.
When a HTTP request is received
Nothing special about this step; it's just a trigger to get the whole process started. Of course, it needs to be secured so that only authorized users can kick off the index rebuild.
Get IDs of all Assets
This is a call to my custom Azure function, which takes definitionName as a parameter and returns the IDs of all entities with that definition name. Notice how the CH URL and OAuth app settings are passed in headers.
An OAuth app needs to be configured in Content Hub for clients to be able to communicate with its APIs. Here is the link to instructions for adding an OAuth client in Content Hub.
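For readers who want to call such a function outside of a Logic App, here is a rough Python sketch of the request shape. The header names (`X-CH-Url`, `X-CH-ClientId`, `X-CH-ClientSecret`), the function URL, and the choice to pass definitionName in the query string are all assumptions made for illustration; the real function may expect something different.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get_ids_request(
    function_url: str,
    definition_name: str,
    ch_url: str,
    client_id: str,
    client_secret: str,
) -> Request:
    """Build (but do not send) a GET request for all entity IDs of a definition."""
    query = urlencode({"definitionName": definition_name})
    # Hypothetical header names: connection settings travel in headers,
    # the search criterion travels in the query string.
    headers = {
        "X-CH-Url": ch_url,
        "X-CH-ClientId": client_id,
        "X-CH-ClientSecret": client_secret,
    }
    return Request(f"{function_url}?{query}", headers=headers, method="GET")


request = build_get_ids_request(
    "https://functions.example.invalid/api/GetEntityIds",  # placeholder URL
    "M.Asset",
    "https://ch.example.invalid",
    "my-client-id",
    "my-client-secret",
)
print(request.get_method(), request.full_url)
```

Keeping the connection settings in headers means the same function can serve several Content Hub instances without redeployment.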
Parse IDs Array
This is Azure's OOTB Parse JSON action. In this case, I'm parsing the output of the previous "Get IDs of all Assets" step, an array of IDs, in order to pass it to the following foreach loop.
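In code, the equivalent of this step is parsing the response body and checking that it really is an array of IDs before looping over it. A small Python sketch, assuming the function returns a bare JSON array of integers such as `[72193, 72194]` (the actual response shape is not shown in this post, and `parse_ids` is a hypothetical helper):

```python
import json
from typing import List


def parse_ids(body: str) -> List[int]:
    """Parse a JSON array of integer IDs, rejecting any other shape."""
    data = json.loads(body)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of IDs")
    ids: List[int] = []
    for item in data:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(item, bool) or not isinstance(item, int):
            raise ValueError(f"unexpected ID value: {item!r}")
        ids.append(item)
    return ids


print(parse_ids("[72193, 72194]"))
```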
For each Asset ID - Send message to Service Bus
A foreach loop sends all Asset IDs received in the previous call to the Service Bus for further processing. Note how the format of the outgoing message mimics the format Content Hub itself uses in its Service Bus integration hook. The JSON below is an example of what the full message from Content Hub looks like; I'm only sending what's meaningful for my own Solr Indexer Logic App, which picks up these messages. This way, both real-time changes from Content Hub and messages from this crawler app have a similar structure and can all be processed by the same Solr Indexer app.
{
  "saveEntityMessage": {
    "EventType": "EntityUpdated",
    "TimeStamp": "2020-11-26T21:29:03.719Z",
    "IsNew": false,
    "TargetDefinition": "M.Asset",
    "TargetId": 72193,
    "TargetIdentifier": "jQ_9Xryr6UihojWlrwrd0g",
    "CreatedOn": "2021-11-26T01:35:19.9869541Z",
    "UserId": 11743,
    "Version": 7,
    "ChangeSet": {
      "PropertyChanges": [
        {
          "Culture": "(Default)",
          "Property": "Title",
          "Type": "System.String",
          "OriginalValue": " Example Asset Title. \n",
          "NewValue": " Asset Title \n"
        }
      ]
    }
  }
}
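A trimmed-down crawler message could be built along these lines. This Python sketch assumes the indexer only needs the event type, the definition name, and the entity ID. That subset is a guess for illustration, not the exact payload the Logic App sends, and `build_crawler_message` is a hypothetical helper.

```python
import json


def build_crawler_message(entity_id: int, definition: str = "M.Asset") -> str:
    """Serialize a minimal message that mirrors Content Hub's saveEntityMessage envelope."""
    message = {
        "saveEntityMessage": {
            # Assumed subset of fields; the full Content Hub message carries more.
            "EventType": "EntityUpdated",
            "TargetDefinition": definition,
            "TargetId": entity_id,
        }
    }
    return json.dumps(message)


print(build_crawler_message(72193))
```

Reusing the same envelope and field names is what lets a single consumer handle both sources without branching on the sender.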
Messages in Service Bus Queue
As the crawler is running, messages start to show up in the Service Bus queue and then disappear as they are picked up and processed by the Solr Indexer Logic App, one by one.
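The consuming side can be pictured with a short Python sketch: one handler pulls the entity ID out of the shared envelope, regardless of whether the message came from Content Hub or from the crawler. The in-memory `queue`, `extract_target_id`, and `drain` are hypothetical stand-ins for the Service Bus queue and the Solr Indexer Logic App.

```python
import json
from collections import deque
from typing import Callable, Deque, List, Optional


def extract_target_id(raw_message: str) -> Optional[int]:
    """Return TargetId from a saveEntityMessage payload, or None if absent or malformed."""
    try:
        payload = json.loads(raw_message)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    save_message = payload.get("saveEntityMessage")
    if not isinstance(save_message, dict):
        return None
    target_id = save_message.get("TargetId")
    if isinstance(target_id, bool) or not isinstance(target_id, int):
        return None
    return target_id


def drain(queue: Deque[str], index_entity: Callable[[int], None]) -> int:
    """Process messages one by one until the queue is empty; return how many were indexed."""
    processed = 0
    while queue:
        target_id = extract_target_id(queue.popleft())
        if target_id is not None:
            index_entity(target_id)
            processed += 1
    return processed


# In-memory stand-in for the Service Bus queue.
queue: Deque[str] = deque([
    # Shaped like a real-time Content Hub change message.
    json.dumps({"saveEntityMessage": {"EventType": "EntityUpdated", "TargetDefinition": "M.Asset", "TargetId": 72193, "Version": 7}}),
    # Shaped like a trimmed-down crawler message.
    json.dumps({"saveEntityMessage": {"TargetDefinition": "M.Asset", "TargetId": 72194}}),
])
indexed: List[int] = []
processed = drain(queue, indexed.append)
print(processed, indexed)
```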