Is using AI for Content and Data Migration in XM Cloud Creepy or Annoying?

Craig
Architect - Sitecore

In my personal experience, using Artificial Intelligence has generally fallen into one of two emotional buckets. In the first bucket, AI has done what I hoped it would and has sometimes exceeded my expectations. While pleasing, these circumstances also leave me finding AI a bit creepy, because it's a little too human-like for comfort. In bucket number two, AI has behaved more like a petulant child than a valued colleague; this is the annoying bucket.

The year is 2024 and AI is increasing its influence on my daily work. While there is no going back, I thought I would share my experiences around where AI is excelling, where it can be detrimental, and the reasons for this.

The Creepy

At XCentium we have developed a JSON file format (called XCIF) to exchange data and content between different systems. These files can be used as intermediaries so that content can be collected by scraping websites, or via a direct connection to Sitecore XP, XM Cloud, Content Stack, Order Cloud, Kontent.ai, etc. These files can then be ingested to create content in the various DXPs. However, for each of these implementations we needed to create files in this format.
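To make this a little more concrete, below is a purely hypothetical sketch (written as PowerShell for consistency with the rest of this post) of what emitting one of these intermediary product files might look like. It is not the real XCIF schema; the field names are simply the ones referenced in the migration rules later in this post, and the values are made up.

```powershell
# Purely hypothetical sketch of an XCIF-style product entry -- not the real XCIF schema.
# Field names are borrowed from the migration rules quoted later in this post.
New-Item -ItemType Directory -Force -Path ".\OutputData\Products" | Out-Null

$exampleProduct = [ordered]@{
    NewPageName    = "Acme-Widget"
    ParentLocation = "Hardware"
    Title          = "Acme Widget"
    Description    = "A general-purpose widget."
    Content        = "A general-purpose widget."
    Image          = "https://example.com/images/acme-widget.jpg"
}

$exampleProduct | ConvertTo-Json | Set-Content -Path ".\OutputData\Products\Acme-Widget.json"
```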

For Order Cloud we can export information into JSON files, and since the destination files were also JSON, this was the easiest migration to perform via AI. I simply went to https://chatgpt.com/ and gave it a list of instructions to translate a set of JSON files from one format to another. After a bit of trial and error I arrived at the following list of instructions:

Create a powershell script that will read in all json files in a directory. For each file it will create several other files:

The format of each of these input files is:

**Here I did a cut and paste from an example OCForward exported file**

The format of the product file is:

** Here I did a cut and paste from an example XCIF file**


With the following rules:

  1. For each Facet Property in the Meta Array create a new Facet file
  2. The name of the facet files and the NewPageName should be the facet Name, but in the sub directory called OutputData/Facets
  3. The Title field in the facet file should be the same as the Name field in the first file
  4. For each value that is in the array of Values in the Facet Property in the Meta Array create a new FacetValue file
  5. The name of the facetValue files and the NewPageName should be the value Name, but in the sub directory called OutputData/FacetValues
  6. The Title field in the facetValue file should be the same as the value of the FacetValue
  7. The ParentLocation field in the facetValue file should be the same as the Name field in the facet file
  8. For each Item in the Items Array in the first file create a new Product file and a new ProductVariant file
  9. The product file and the NewPageName for the product file should have the value of the Brand field in the xp object property in the Items array
  10. The product files should be created in the OutputData/Products folder
  11. The Description field in the Product file should be the same as the Description field in the first file
  12. The Title field in the Product file should be the same as the Name field in the first file
  13. The Content field in the Product file should have the same value as the Description
  14. The Image field in the Product file should have the value of the URL field for the first entry in the xp.Images array, if the first entry in the xp.Images array exists
  15. The ParentLocation of the Product file should be the value of the Category field
  16. The productVariant file and the NewPageName and Name field should have the name of the Name property in the Items array
  17. The productVariant files should be created in the OutputData/ProductVariants folder
  18. The SKU field in the ProductVariant file should be the same as the ID field in the first file
  19. The Price field in the ProductVariant file should be the same as the Price field in the xp object in the first file
  20. The Category field in the ProductVariant file should be the same as the Category field in the xp object in the first file
  21. The Brand field in the ProductVariant file should be the same as the Brand field in the xp object in the first file
  22. The ProductVariant file should have a ParentLocation field with the value of the Category, then the / character, then the first word of the Name property in the Items array, with "/Product Variants" appended to the end
  23. In the Meta Array, for each facet where the name of the facet is Category, for each value create a new ProductCategory file
  24. The productCategory files have the value of the value property in the Items array
  25. The productCategory files should be created in the OutputData/Categories folder
  26. For any file name, replace the double quote character with the word inch
  27. For any file name, replace 1/4 with the word quarter
  28. For any file name, replace 1/2 with the word half
  29. For any file name, replace the single quote character with the word foot
  30. For the product, product variant or productCategory file name, replace the / character with the - character
  31. For the product, product variant or productCategory file name, replace the , character with an empty string
  32. The input files location is C:\Work\Content-Migration\InputData\
  33. The output file location is C:\Work\Content-Migration\OutputData

 

And that was almost enough for it to create a PowerShell script that would translate from one format to another, as well as do a little bit of text massaging. In the end it created about 150 lines of PowerShell that would have taken me much longer to create by hand. In this task it seemed to understand exactly what the objective was and how to accomplish it. It scores 8 out of 10 on the creepy scale. Why not a 10? Well, as Pablo Picasso once said, “Computers are useless. They can only give you answers”. I would have given it higher marks if it had questioned the need to do this or had suggested an alternative, but instead, for now, it just did what it was told.
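To give a flavour of the output, here is a heavily trimmed sketch of the kind of script it generated. This is not the actual 150-line script; it covers only the facet rules and the file-name clean-up rules, and it assumes a particular shape for the Meta array that the real input files may not have.

```powershell
# A minimal sketch of the kind of script ChatGPT generated -- NOT the actual 150-line script.
# Only the facet rules and the file-name clean-up rules are shown; the shape of the
# input JSON (the Meta array and its Facets) is assumed for illustration.

$inputPath  = "C:\Work\Content-Migration\InputData"
$outputPath = "C:\Work\Content-Migration\OutputData"

# Apply the file-name clean-up rules (double quote -> inch, 1/4 -> quarter, and so on).
function Get-SafeFileName {
    param([string]$Name)
    $Name = $Name -replace '"', 'inch'
    $Name = $Name -replace '1/4', 'quarter'
    $Name = $Name -replace '1/2', 'half'
    $Name = $Name -replace "'", 'foot'
    return $Name
}

New-Item -ItemType Directory -Force -Path "$outputPath\Facets" | Out-Null

foreach ($file in Get-ChildItem -Path $inputPath -Filter *.json) {
    $data = Get-Content -Path $file.FullName -Raw | ConvertFrom-Json

    # Rule 1: create one facet file per facet property in the Meta array (structure assumed).
    foreach ($facet in $data.Meta.Facets) {
        $facetName = Get-SafeFileName $facet.Name
        $facetFile = [ordered]@{
            NewPageName = $facetName     # Rule 2: file and page named after the facet
            Title       = $data.Name     # Rule 3: Title taken from the Name field of the input file
        }
        $facetFile | ConvertTo-Json | Set-Content -Path "$outputPath\Facets\$facetName.json"
    }
}
```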

The Annoying

Another content migration task required me to scrape a website and import the scraped content into XCIF files.

While it created the PowerShell, the script made references to functions that it assumed would be there but were not. They weren't missing because of an overlooked reference, but because they didn't exist on planet Earth yet. The AI just assumed there SHOULD be a function that didn't exist, and that this mythical function would perform what it wanted it to. I have encountered this same type of issue with a few different AI products.
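Here is a hypothetical reconstruction of what that looks like in practice. The cmdlet name is invented purely for illustration; the point is that the generated script leaned on a function that was never defined anywhere.

```powershell
# A hypothetical reconstruction of the problem -- not the actual generated code.
# The cmdlet name below is invented for illustration: everything reads plausibly
# until you realise ConvertTo-XcifDocument is not defined anywhere, in any module.

$page = Invoke-WebRequest -Uri "https://example.com/products" -UseBasicParsing

# The mythical function: the AI simply assumed something like this SHOULD exist.
$xcif = ConvertTo-XcifDocument -Html $page.Content

$xcif | ConvertTo-Json -Depth 5 | Set-Content -Path ".\OutputData\scraped-page.json"
```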

The knowledge the AI passes on is statistically inferred. It is not necessarily correct. It is not proven. It is just the most common. As Noam Chomsky put it, AI just “infers brute correlations”. For instance, when working with Copilot, it really thinks I should be including the LINQ assembly for almost everything I do. Why? Because that is the average thing to do. It's not the smartest thing, or even a useful one. But simply based on the sample set that Copilot uses, its recommendation is to include a reference to this unused library.

No matter how many times I delete references to unused assemblies, as soon as I make changes to a file it will often put them back in. I have mused that maybe Copilot thinks of the LINQ assembly as a long-lost relative and is just trying to show it some love.

Another interesting thing about AI, which is both a bit creepy and a little bit annoying, is that it often does not produce the same results when given the exact same set of inputs. This makes it unpredictable.

The Future

AI has moved very quickly over the past few years. The progress has been amazing. I do wonder if this progress is sustainable, the reason being the amount of data that it can be trained on. Take, for instance, Copilot. As part of its training, it used 159 GB of Python code sourced from 54 million public GitHub repositories. Creating that much meaningful code represents a huge amount of human endeavour. Now, in order for AI to get “smarter” and more refined, it is not probable that even with access to double that amount of information it would be twice as refined, because this works on a logarithmic scale. Furthermore, there is a risk of a negative feedback loop, whereby AI is to a degree responsible for checking in code which it then uses as a reference. This has the potential to cause the quality of results to actually decrease over time.

In Summary

For the type of work that I do, when AI is good and creepy it is good and creepy; when it is bad it is pretty bad. Imagine, if you will, that you had an infinitely large army of junior developers who have never met anyone or been outside, and you asked them to come up with a solution to a problem. Their instant recommendation, as a collective, is not the best answer but the most common one. On the other hand, it is instant, and it does get you part of the way there. This can save you time IF you are very succinct in what you communicate. So, for us developers, an emerging skill set is how to go about instructing AI in the most effective way possible.

As the old saying goes, “may you live in interesting times”.  Well, the road ahead will be interesting and potentially increasingly creepy and annoying.
