Recently I was talking to a colleague, Devin Prevost, about a project where he is developing a solution for a client on the Power Platform. One of the challenges he was trying to solve was how to ingest data from the client’s data warehouse into the Power App solution so that it could do a lookup and pre-fill fields with default data. Due to cost, the client wanted to stay within the free licenses for Power Apps and Power Automate, so no premium features, and the data had to live in SharePoint. In doing so, we started looking at how we could feed data into a SharePoint list via a CSV file.
The scenario we will be using starts with a CSV file being created and uploaded to a library in SharePoint Online by a 3rd-party application. From there we will:
- Fetch the data from the SP List and parse the contents of the CSV (~30,000 rows each)
- Compare the data in the SP List to the data in the CSV
- Delete items from the SP List where they do not exist in the CSV or have been updated
- Create items that appear in the CSV but not the SP List or have been updated
Now ideally we would handle items that need to be updated separately and avoid deleting and recreating them. However, due to performance concerns at large volumes of data we opted to stick with the create and delete scenarios as the MVP candidate of this solution.
At a very high level our flow is composed of a trigger and 4 main scopes:
- Fetch Data – Where we will get the data from the SP List as well as format the data from the CSV file
- Comparison – Here we will be formatting the data to get it ready for a comparison check; ultimately the output is 2 arrays: 1 of all items that need to be deleted and another of items that need to be created (Note: I have a separate blog post on comparing items.)
- Process Items to Delete – Here we will loop through the items to delete in batches of 1000 and use the SP batch API to update them
- Process Items to Create – Same as the delete but process the items to create
Our trigger for this Power Automate Flow is going to be a SharePoint one called When a file is created in a folder. Our folder will be the root of a super creative document library we created for testing called "Update SharePoint List from CSV."
To ensure that our Flow only runs on CSV files we are going to put a trigger condition on the trigger that looks at the Content-Type property in the header and ensures it is “text/csv” (something like `@equals(triggerOutputs()?['headers']?['Content-Type'], 'text/csv')`). That way if someone accidentally uploads a file to this document library it doesn’t trigger our flow to run, unless it is a CSV of course.
In addition to the trigger setup we will be initializing all the variables we will use in our Flow. To save you from all those screenshots here is a list of them:
- siteUrl (Makes it easy to update the URL for when switching from dev to prod site collections)
- Type: String
- Value: Url of the site that houses the SP list with items
- listGuid (Makes it easy to update the list for when switching from dev list to prod list)
- Type: String
- Value: GUID to the SP list with items
- terminateCall (Used to break our Do until loop when fetching SP items)
- Type: Boolean
- Value: False
- endPoint (Used to store the REST endpoint to call; updated with the nextLink value on each loop)
- Type: String
- Value: Initial REST API to fetch items from SP List
- spItems (Used to store the current items in the SP list)
- Type: Array
- Value: 
- createItems (Used to store the items that need to be created in SP)
- Type: Array
- Value: 
- deleteItems (Used to store the items that need to be deleted from SP)
- Type: Array
- Value: 
- remainingItems (Used to store remaining items to loop through in the Delete / Create items scopes)
- Type: Array
- Value: 
- arrayChunkSize (Used to break up the createItems / deleteItems arrays in batches for the batch API)
- Type: Integer
- Value: 1000
Now that the Flow is set up we will move into our first scope, Fetch Data.
Our fetch data scope has 2 main objectives:
- Fetch all items from the designated SP List
- Extract the CSV data from the triggerBody and filter out any blank rows
Our Do until – getSPItems loop runs until the terminateCall variable is true. The first action in the loop is our call to SharePoint; notice that we are using odata=nometadata in the Accept header to help limit what is sent back from SharePoint.
You can see that we are referencing variables defined in the setup here as well. The reason for the endPoint variable is that we will be updating its value with the value passed back in the nextLink provided by SharePoint.
Additionally we are fetching data in this manner for 2 reasons:
- We are limited to 5000 items with the Get Items action in Flow
- Performance: using this method we were able to pull back 30,000 items in a minute.
To help with performance, make sure you use the select query parameter as well as top on your REST API endpoint.
When SharePoint returns the first call we are going to combine the items in the spItems array with the items that were just returned using the union function and a compose action, as pictured below.
Then we will update the spItems variable with the Outputs of our Compose action that combined both arrays together.
Next we need to check the body of the SharePoint call to see if an odata.nextLink property is present. If it is not present then it means there are no more items in the list and we can update the terminateCall variable to be true to break out of our Do until loop. If the property is present we update our endPoint variable with the value of the property and run through the loop again.
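The do-until loop described above can be sketched in Python. Here `fetch_page` is a hypothetical stand-in for the Send an HTTP request to SharePoint action, and the `odata.nextLink` key mirrors what SharePoint returns when using odata=nometadata:

```python
# Sketch of the do-until paging pattern (fetch_page is a hypothetical
# stand-in for the "Send an HTTP request to SharePoint" action).
def fetch_all_items(fetch_page, initial_endpoint):
    """Keep calling SharePoint until no odata.nextLink is returned."""
    sp_items = []
    end_point = initial_endpoint
    terminate_call = False
    while not terminate_call:
        body = fetch_page(end_point)          # returns the parsed JSON body
        sp_items = sp_items + body["value"]   # union of previous + new page
        next_link = body.get("odata.nextLink")
        if next_link:
            end_point = next_link             # follow the continuation link
        else:
            terminate_call = True             # no more pages; break the loop
    return sp_items
```

The same three pieces appear in the Flow: the endPoint variable, the spItems accumulator, and the terminateCall flag that breaks the Do until.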
Once we are done with fetching all the items from the SharePoint List we move onto formatting the file contents in the triggerBody and removing any blank rows from the CSV. Paul Murana did an awesome blog post on parsing CSV content that we used, Power Automate: How to parse a CSV File to create a JSON array.
At a high level we are using the split function on the content within the body, splitting on the \r\n characters present in the CSV. Then we are skipping the first item in the array because that is our headers.
Then we are using the Filter array action to remove all the empty rows from the file.
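In Python terms, the split / skip / filter sequence looks roughly like this (sample logic only, not the actual Flow expressions):

```python
def parse_csv_rows(file_content):
    """Split raw CSV text on \r\n, drop the header row, filter blank rows."""
    rows = file_content.split("\r\n")
    data_rows = rows[1:]                       # skip(..., 1): drop the headers
    return [r for r in data_rows if r != ""]   # Filter array: non-empty only
```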
The bulk of the logic in our comparison scope is covered in my post on how to Find the differences between two arrays of objects. The main concept is that we will use the intersection function to find the common items between the arrays, then filter out those common elements. The driving force behind this approach is performance. One catch with the intersection function is that when you compare objects, the schema or properties of the objects in both arrays need to match, so we start off by formatting both the SP List items and the CSV items into the same shape.
After we format both arrays using the Select action we can perform the intersection on their outputs. This will return us an array of common items between the two.
Now that we have an array of all the common elements we need to filter them out. Before we can do this we need to make a small adjustment to our array of common elements: select only the Title property. This property is the primary key between both datasets, so it can be used to correctly identify the items that need to be removed. Also, as you may have guessed, doing so makes the Filter array action more performant because it doesn’t need to check every property on each item.
After we run the Select action we add a parallel branch to increase performance and run our Filter array actions at the same time: one to filter common items out of spItems and one to filter them out of the CSV items.
We use a simple contains function to see if the Title exists in our array of common Titles between both and if it does we filter it out.
We then add the results to their respective variables. The unique spItems will be the ones that we need to delete and the unique CSV items are the ones we need to create. Before we add those items to the createItems variable though we format the results into the object needed by our REST endpoint AddValidateUpdateItemUsingPath().
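The whole comparison can be sketched in Python (illustrative only; `diff_items` and the sample fields are made-up names). Note how an updated item falls out of the intersection on both sides, so it lands in both the delete and create lists, which is exactly the delete-and-recreate behaviour we settled on:

```python
def diff_items(sp_items, csv_items):
    """Find common items, then filter them out of each side.
    Items are dicts with identical schemas; Title is the primary key."""
    common = [i for i in sp_items if i in csv_items]   # intersection()
    common_titles = {i["Title"] for i in common}       # Select: Title only
    delete_items = [i for i in sp_items if i["Title"] not in common_titles]
    create_items = [i for i in csv_items if i["Title"] not in common_titles]
    return delete_items, create_items
```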
Process Items to Delete
We start off this scope by copying the items in the deleteItems variable to the remainingItems variable. We do this because Power Automate doesn’t allow you to reference a variable while you are updating that same variable. After setting the variable we check that it is not empty, to avoid making a call to SharePoint and running through additional actions when it isn’t necessary. This check is needed because a Do until loop will run through the actions in the loop at least once before performing the condition check.
We are using a Do until action because SharePoint batch calls can only process 1000 commands per call, and since our array could theoretically have more than 1000 we loop through the array in chunks of 1000. The setup for the actions may look familiar as it was modelled after how John Liu calls the batch API (see shout-outs below for a link); we used it because of how elegant and simple it is. We start by creating two GUIDs, one for the batch and one for the changeSet, using the guid function.
After that we set up the body of the changeSet that we will use to make our Delete calls. Formatting is key here; make sure you:
- Use spaces, not carriage returns, between the method and the call, and between the call and the protocol
- For the DELETE method, ensure you have 2 carriage returns after the If-Match: *
Notice how the variables help to simplify the call, and the – Outputs reference at the top points to the changeSet guid. Additionally, take note of the ID text in our call, as we will be replacing it with the ID of the item we need to delete in the next action.
Next we use the Select action to loop through either all the items in the remainingItems array (when there are fewer than 1000) or the first 1000 items in the array. The output is the changeSet chunk text with ID replaced by the ID of the item in the loop. It looks like this:
replace(outputs('Compose_-_changeSetChunk'), 'ID', string(item()?['Id']))
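In Python terms, the Select – replace step works roughly like this. The changeset GUID and the site URL below are placeholders, not the real values, and the template mirrors the formatting rules above: single spaces within the request line and a blank line (two CRLFs) after `If-Match: *`:

```python
# Illustrative changeSet chunk template; the GUID and URL are placeholders.
CHANGESET = "changeset_0a1b2c3d"
CHUNK_TEMPLATE = (
    "--" + CHANGESET + "\r\n"
    "Content-Type: application/http\r\n"
    "Content-Transfer-Encoding: binary\r\n"
    "\r\n"
    # Spaces (not CRLFs) between method, URL, and protocol:
    "DELETE https://contoso.sharepoint.com/sites/dev/_api/web/lists(guid'xxxx')/items(ID) HTTP/1.1\r\n"
    "If-Match: *\r\n"
    "\r\n"   # DELETE needs two carriage returns after If-Match: *
)

def build_delete_commands(items):
    """Select – replace: swap the ID placeholder for each item's real Id."""
    return [CHUNK_TEMPLATE.replace("ID", str(item["Id"])) for item in items]
```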
Now that we have our individual calls scaffolded up we need to configure our call to SharePoint’s batch API endpoint.
Take note of where we reference the batch guid versus the changeSet guid to ensure that your call is formatted correctly. We then simply use the join function to take the output of the Select action above and turn it into a string.
After we make our call we update the remainingItems variable using the skip function, skipping over the items in the deleteItems array that we have already processed. To ensure it scales properly we multiply the 1000 by the current iteration index + 1, since the index is 0 based; this means on the second call we will skip 2000 items. If we skip more items than are in the array we simply get back an empty array.
The Do until loop will then check to see if the length of the remainingItems array is 0 and if not it will run through the actions again.
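The chunking loop above can be sketched in Python (`process_in_batches` and `send_batch` are hypothetical names; the real Flow uses skip and mul expressions on the deleteItems array):

```python
def process_in_batches(all_items, send_batch, chunk_size=1000):
    """Do until: send the first chunk, then skip past everything processed."""
    remaining = list(all_items)
    iteration = 0
    while True:
        chunk = remaining[:chunk_size]        # first 1000 (or fewer) items
        send_batch(chunk)                     # POST the chunk to $batch
        iteration += 1
        # skip(deleteItems, mul(1000, iteration)): skip all items sent so far
        remaining = all_items[chunk_size * iteration:]
        if len(remaining) == 0:               # Do until condition
            break
```

Note that, like a Do until, the body runs at least once even for an empty input, which is why the Flow checks that the array is not empty before entering the loop.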
Process Items to Create
The Process items to create scope and actions are nearly identical to the steps in the Delete scope; the only differences, besides setting the remainingItems variable to createItems, are in the changeSetChunk and the Select – replace actions. The changeSetChunk is modified to support creating items and the Select simply updates the replace string from ID to INSERT ITEM.
Above you will see the format for the create items call. Notice our method is now POST and we have updated our call as well. Additionally, notice that the body for this particular call will be dynamically inserted by the Select action, replacing the INSERT ITEM text.
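As a rough sketch of the JSON that replaces the INSERT ITEM placeholder, assuming the formValues shape of FieldName/FieldValue pairs that AddValidateUpdateItemUsingPath() expects (the field names here are made up):

```python
import json

def build_form_values(csv_item):
    """Build the AddValidateUpdateItemUsingPath body for one CSV row.
    Field values are passed as strings in the formValues array."""
    return json.dumps({
        "formValues": [
            {"FieldName": name, "FieldValue": str(value)}
            for name, value in csv_item.items()
        ]
    })
```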
Before we talk about performance it is important to let you know the size of the data set we are working with, as the larger the data set the longer it will take certain actions to complete. For demo and testing purposes our SharePoint list and our CSV each contained 30,000 items. Each item consisted of 7 properties, and for the calls to fetch items from SharePoint we requested the ID column as well. With these numbers in mind we can talk about performance:
- Our fetch items from SharePoint consistently took around 1 minute to complete
- The intersection was extremely fast and would complete in about 2 seconds (main reason we used it)
- The filter array actions would consistently take about 1 minute each and on occasion 2 minutes
Now the delete and create scopes would vary based upon how many items needed to be deleted and how many needed to be created. We tested it 2 ways:
- 1000 items
- Deleting: ~1 minute
- Creating: ~50 seconds
- 10000 items
- Deleting: ~12 minutes
- Creating: ~9 minutes
The average total run of the Flow when 1000 items needed to be deleted and created was under 5 minutes, whereas a larger 10,000-item run took around 24 minutes to complete.
Gotchas & Takeaways
There were some gotchas and takeaways we discovered along the way:
- Be aware of the default limits on the Do until loop. The timeout is set to 1 hour with a max of 60 iterations; these may need to be increased based on the amount of data you need to process.
- If your call size is too large you will receive an error message and will need to back down from 1000 commands per batch. This all depends on the number of columns in your data set and the number of items that need to be created.
- Be careful when copying and pasting into the Select action, as sometimes Flow will stringify your array, which ultimately messes up the format for the batch calls. You won’t see the action fail but you will notice no items being created.
In an ideal world we would have had a Per-Flow Plan license that we could have used for this solution, but since we needed to keep it free, here are the limitations that we had to contend with:
- 5000 item limit for the Apply to each action (doesn’t bode well when you need to loop through ~30,000 items)
- Max 5000 item limit on the Get All Items SharePoint action (Paginated Items limit)
- Size of the body being passed to the batch endpoint for SharePoint (Since we knew there was a lot of data to move we wanted to limit the amount of calls going to and from SharePoint to make the Flow as performant as possible.)
- 1000 commands per batch request
Like anything else in life, where would we be without Google and others in the community blogging about different solutions and scenarios? We drew on the following in crafting our solution. In no particular order:
- John Liu – 5 Design Keys to make Flows run insanely fast – John Liu – Great resource that we leveraged for how we were going to make our batch calls to SharePoint
- William BR – Batch delete items in SharePoint with Power Automate – Crucial in getting that syntax just right for the batch statement.
- Paul Murana – Power Automate: How to parse a CSV File to create a JSON array – Great walk-through on how to parse a CSV file
- SharePointCass – Writing batch SharePoint API calls in Power Automate – Awesome post with great gotchas to watch out for in formatting your batch statement
Co-Developer Spotlight – Devin Prevost
Devin Prevost has been a consultant for 6+ years, primarily in the Microsoft 365 and Application Development space. Devin’s passion for solving complicated business problems with technical solutions that are maintainable by his clients has recently led him to architecting and developing solutions within the Power Platform. Connect with Devin on LinkedIn.