{"id":1261,"date":"2020-09-03T20:11:03","date_gmt":"2020-09-04T03:11:03","guid":{"rendered":"http:\/\/blog.nillsf.com\/?p=1261"},"modified":"2020-09-04T10:50:35","modified_gmt":"2020-09-04T17:50:35","slug":"using-key-vault-managed-storage-accounts-and-sas-tokens-in-azure-data-factory","status":"publish","type":"post","link":"https:\/\/blog.nillsf.com\/index.php\/2020\/09\/03\/using-key-vault-managed-storage-accounts-and-sas-tokens-in-azure-data-factory\/","title":{"rendered":"Using Key Vault managed storage accounts and SAS tokens in Azure Data Factory"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">I am working with a customer right now that is doing a lot of work with Azure Data Factory (ADF). ADF is a powerful cloud-based data integration tool that lets you move data from a multitude of sources, process that data and store it in a target data store. You can think of it as cloud-native ETL.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In working with this customer, a requirement came up to securely transfer data between blob storage accounts with the least permissions possible. There are a couple of options to achieve this:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Encode storage connection string with the storage account <em>master<\/em> key in ADF.<\/li><li>Store the storage connection string with <em>master<\/em> key in Key Vault, and authenticate ADF to Key Vault.<\/li><li>Encode a static SAS token in ADF.<\/li><li>Store a SAS token in Key Vault, and use Key Vault to get the SAS token.<\/li><li>Have Key Vault manage your storage accounts, and get a dynamically created SAS token.<\/li><li>Use the managed identity of ADF to authenticate to Azure blob storage.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">We decided to pursue the 5th option. This means we will have Key Vault manage the Azure storage accounts and frequently rotate the keys, and have Key Vault use those keys to generate SAS tokens for us. 
ADF comes out of the box with a managed identity now. We will then use this managed identity of ADF to authenticate towards Key Vault to get the SAS tokens. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>For those of you not familiar with <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/storage\/common\/storage-sas-overview\">SAS tokens<\/a>: SAS tokens are a way to give access to Azure storage (works with blob, file, queue and table) with a limited set of permissions. Permissions can be scoped to which service, can be time bound, can be limited by IP addresses etc. A lot safer than leveraging the storage account master key. <\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The purpose of this post is to explain the mechanism and show you how this would work in ADF.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Overview of the demo we&#8217;ll build in this blog post<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">I built a small demo environment for this demo. The demo environment consists of:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>2 storage accounts. One will be read by ADF, another will be written to.<\/li><li>A key vault. 
This key vault will manage both storage accounts and generate SAS tokens.<\/li><li>An Azure data factory, which will read data from storage account 1 and write it to storage account 2.<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"502\" height=\"314\" src=\"\/wp-content\/uploads\/2020\/09\/ADF-SAS.png\" alt=\"\" class=\"wp-image-1262\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/ADF-SAS.png 502w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/ADF-SAS-300x188.png 300w\" sizes=\"auto, (max-width: 502px) 100vw, 502px\" \/><figcaption>The demo we&#8217;ll be building today.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">So, let&#8217;s start at the beginning, creating the two storage accounts, the key vault and configuring the key vault for managing the storage accounts.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Setting up storage accounts and key vault<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To set up the storage accounts and have key vault manage them, I decided to use Azure PowerShell. If you want to execute this all-in-one, <a href=\"https:\/\/github.com\/NillsF\/blog\/blob\/master\/adf-kv-sas\/setup-storage.ps1\">you can find the script on GitHub<\/a>. I&#8217;ll walk you through the steps here.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">First up, we&#8217;ll set up a number of variables. We&#8217;ll use those throughout the setup.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Set up a couple of variables\n$staccname = \"nfadfkvread\"\n$staccname2 = \"nfadfkvwrite\"\n$rgname = \"kv-adf\"\n$location = \"westus2\"\n$kvname = \"kv-nf-adf-sas\"\n# Application ID of the Key Vault service principal (Azure public cloud)\n$keyVaultSpAppId = \"cfa8b339-82a2-471a-a3c9-0fc0be7a4093\"\n$storageAccountKey = \"key1\"\n$SASDefinitionName = \"readFromAccount1\"\n$SASDefinitionName2 = \"writeToAccount2\"<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Next up, we&#8217;ll log in to Azure. 
Then we&#8217;ll create the actual resources. Meaning resource group, storage accounts and key vault:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Login\nConnect-AzAccount  \n\n# Create all resources\nWrite-Output \"Create all resources\"\n\nNew-AzResourceGroup -Name $rgname -Location $location\n$stacc = New-AzStorageAccount -ResourceGroupName $rgname -Location $location -Name $staccname -SkuName Standard_LRS\n$stacc2 = New-AzStorageAccount -ResourceGroupName $rgname -Location $location -Name $staccname2 -SkuName Standard_LRS\n$kv = New-AzKeyVault -VaultName $kvname -ResourceGroupName $rgname -Location $location<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Then, we&#8217;ll do some role assignments. First, we&#8217;ll give key vault permission to rotate the keys in the storage account. Then we&#8217;ll give my user account permissions in the key vault itself (<em>FYI: Even if you are owner of a Key Vault that doesn&#8217;t give you access to the objects in the vault. The control (Azure API) and data plane (Key Vault itself) are configured independently).<\/em><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Give KV permissions on Storage to rotate keys\nWrite-Output \"Give KV permissions on Storage to rotate keys\"\n\nNew-AzRoleAssignment -ApplicationId $keyVaultSpAppId -RoleDefinitionName 'Storage Account Key Operator Service Role' -Scope $stacc.Id\nNew-AzRoleAssignment -ApplicationId $keyVaultSpAppId -RoleDefinitionName 'Storage Account Key Operator Service Role' -Scope $stacc2.Id\n\n# Give my user access to KV storage permissions\nWrite-Output \"Give my user access to KV storage permissions\"\n\n$userId = (Get-AzContext).Account.Id\nSet-AzKeyVaultAccessPolicy -VaultName $kvname -UserPrincipalName $userId -PermissionsToStorage get, list, delete, set, update, regeneratekey, getsas, listsas, deletesas, setsas, recover, backup, restore, purge<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">When that is done, we&#8217;ll need to wait a couple of 
seconds for the role assignments to propagate fully in Azure. The role assignment that is the most critical here is the permission of key vault over the storage accounts. I have a 30 second sleep in the script itself. After that sleep, we can add the storage accounts to key vault.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Add storage accounts to key vault\n$regenPeriod = [System.Timespan]::FromDays(2)\nWrite-Output \"Sleeping 30 seconds to have role assignments propagate and catch up\"\nStart-Sleep -Seconds 30\nWrite-Output \"Done sleeping. Add storage accounts to key vault\"\n\nAdd-AzKeyVaultManagedStorageAccount -VaultName $kvname -AccountName $staccname -AccountResourceId $stacc.Id -ActiveKeyName $storageAccountKey -RegenerationPeriod $regenPeriod\nAdd-AzKeyVaultManagedStorageAccount -VaultName $kvname -AccountName $staccname2 -AccountResourceId $stacc2.Id -ActiveKeyName $storageAccountKey -RegenerationPeriod $regenPeriod<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">With that done, we can onboard the first storage account. What we need to do here is configure a SAS definition in Key Vault. For this first storage account, we&#8217;ll configure very fine permissions: only read and list allowed for the blob service. 
This will protect the account in case the SAS token would potentially leak.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Onboard first account with list\/read permissions only\nWrite-Output \"Onboard first account with list\/read permissions only\"\n\n$storageContext = New-AzStorageContext -StorageAccountName $staccname -Protocol Https -StorageAccountKey Key1 \n$start = [System.DateTime]::Now.AddDays(-1)\n$end = [System.DateTime]::Now.AddMonths(1)\n\n$sasToken = New-AzStorageAccountSasToken -Service blob -ResourceType Container,Object -Permission \"rl\" -Protocol HttpsOnly -StartTime $start -ExpiryTime $end -Context $storageContext\n\nSet-AzKeyVaultManagedStorageSasDefinition -AccountName $staccname -VaultName $kvname `\n-Name $SASDefinitionName -TemplateUri $sasToken -SasType 'account' -ValidityPeriod ([System.Timespan]::FromDays(1))\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Then we do the same for the second storage account. For the second one, we will configure a SAS policy that will only allow write and list operations. (<em>I&#8217;m going to say it here already, having no read access on the destination account will not allow ADF to do integrity validation of the data that is written. This can be fine if you don&#8217;t want integrity validation, but if you want this, you&#8217;ll also want to add read permissions<\/em>. 
<em>Later on in the demo you&#8217;ll see my first pipeline run fail because I turn on integrity validation, but don&#8217;t have read permissions.<\/em>)<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Onboard second account with write\/list permissions only\nWrite-Output \"Onboard second account with write\/list permissions only\"\n\n$storageContext = New-AzStorageContext -StorageAccountName $staccname2 -Protocol Https -StorageAccountKey Key1 \n$start = [System.DateTime]::Now.AddDays(-1)\n$end = [System.DateTime]::Now.AddMonths(1)\n\n$sasToken = New-AzStorageAccountSasToken -Service blob -ResourceType Container,Object -Permission \"wl\" -Protocol HttpsOnly -StartTime $start -ExpiryTime $end -Context $storageContext\n\nSet-AzKeyVaultManagedStorageSasDefinition -AccountName $staccname2 -VaultName $kvname `\n-Name $SASDefinitionName2 -TemplateUri $sasToken -SasType 'account' -ValidityPeriod ([System.Timespan]::FromDays(1))<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, we can check that the secrets work correctly by getting a SAS token for each account. The name of the secrets is important here. The secrets don&#8217;t show up in the Azure portal. The secrets have the naming pattern <code>storageAccountName-SASDefinitionName<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Getting secrets to verify everything works\nWrite-Host \"Getting secrets to verify things work.\"\n\n$secret = Get-AzKeyVaultSecret -VaultName $kvname -Name \"$staccname-$SASDefinitionName\"\n$secret.SecretValueText\n$secret = Get-AzKeyVaultSecret -VaultName $kvname -Name \"$staccname2-$SASDefinitionName2\"\n$secret.SecretValueText<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">And this concludes setting up the storage accounts and the key vault. 
We can now use both in Azure Data Factory.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Setting up the Azure Data Factory<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Since the ADF piece is the interesting piece I wanted to dive into, I&#8217;ll do this work via the portal. To start, we&#8217;ll create the actual data factory. Look for Azure Data Factory in either the Azure search bar or in the marketplace. This will open the creation wizard for a new ADF:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"805\" height=\"563\" src=\"\/wp-content\/uploads\/2020\/09\/image.png\" alt=\"\" class=\"wp-image-1264\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image.png 805w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-300x210.png 300w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-768x537.png 768w\" sizes=\"auto, (max-width: 805px) 100vw, 805px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">I just filled in the basics, and skipped the git integration for now. When the ADF is created, the first thing we&#8217;ll do is give this ADF permissions to Key Vault. We could do this later on in the pipeline creation wizard, but I&#8217;d like to show this manually here. What we need here is the managed identity object ID, and then give that permissions in key vault. 
To start, get this object ID from the properties:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"748\" src=\"\/wp-content\/uploads\/2020\/09\/image-1-1024x748.png\" alt=\"\" class=\"wp-image-1265\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-1-1024x748.png 1024w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-1-300x219.png 300w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-1-768x561.png 768w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-1.png 1068w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Then open your key vault, and add an access policy. Look for the managed identity of your ADF by using the object ID, and give it secret list and get permissions.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1025\" height=\"698\" src=\"\/wp-content\/uploads\/2020\/09\/image-2.png\" alt=\"\" class=\"wp-image-1266\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-2.png 1025w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-2-300x204.png 300w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-2-768x523.png 768w\" sizes=\"auto, (max-width: 1025px) 100vw, 1025px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Important gotcha now, once you add the access policy, you can&#8217;t forget to actually save the access policy. 
Hit the save button before you move forward.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"786\" height=\"672\" src=\"\/wp-content\/uploads\/2020\/09\/image-3.png\" alt=\"\" class=\"wp-image-1267\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-3.png 786w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-3-300x256.png 300w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-3-768x657.png 768w\" sizes=\"auto, (max-width: 786px) 100vw, 786px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">With that set up, we can open up the ADF editor. Go back to your ADF, and hit the Author and Monitor button:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"371\" src=\"\/wp-content\/uploads\/2020\/09\/image-4-1024x371.png\" alt=\"\" class=\"wp-image-1268\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-4-1024x371.png 1024w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-4-300x109.png 300w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-4-768x278.png 768w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-4.png 1102w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">In this window, we&#8217;ll start off by adding the linked services we need. We need our key vault and both storage accounts. 
To add a linked service, start by clicking the manage button, go to linked services, hit the add button and look for Azure Key Vault:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"473\" src=\"\/wp-content\/uploads\/2020\/09\/image-5-1024x473.png\" alt=\"\" class=\"wp-image-1269\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-5-1024x473.png 1024w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-5-300x139.png 300w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-5-768x355.png 768w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-5.png 1044w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Then look for the Azure Key Vault we created earlier. As you can see, we could provide permissions to key vault for the identity of ADF here as well, but I wanted to actually show this manually so you know where to look for this.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"622\" height=\"590\" src=\"\/wp-content\/uploads\/2020\/09\/image-6.png\" alt=\"\" class=\"wp-image-1270\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-6.png 622w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-6-300x285.png 300w\" sizes=\"auto, (max-width: 622px) 100vw, 622px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Next up, add another linked service and look for Azure blob storage. In there, provide the following details:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Authentication method: SAS URI<\/li><li>Select SAS URI<\/li><li>Provide the URL of the container in blob you want to monitor<\/li><li>Select Key Vault<\/li><li>Select the key vault we configured before<\/li><li>Provide the secret name as  <code>storageAccountName-SASDefinitionName<\/code>. 
<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">You can test this connection, and it should work.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"611\" height=\"854\" src=\"\/wp-content\/uploads\/2020\/09\/image-7.png\" alt=\"\" class=\"wp-image-1271\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-7.png 611w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-7-215x300.png 215w\" sizes=\"auto, (max-width: 611px) 100vw, 611px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Do the same for the second account, changing the account name and the secret name.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"614\" height=\"930\" src=\"\/wp-content\/uploads\/2020\/09\/image-8.png\" alt=\"\" class=\"wp-image-1272\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-8.png 614w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-8-198x300.png 198w\" sizes=\"auto, (max-width: 614px) 100vw, 614px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">That&#8217;s the setup of the data factory. 
Next step is to actually build the copy activity:<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Building the copy activity<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To start building the copy activity, select the copy data wizard in the ADF wizard getting started page.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"685\" src=\"\/wp-content\/uploads\/2020\/09\/image-9-1024x685.png\" alt=\"\" class=\"wp-image-1273\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-9-1024x685.png 1024w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-9-300x201.png 300w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-9-768x514.png 768w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-9.png 1144w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">This is a guided wizard that will walk us through the copy activity. It&#8217;s pretty straightforward and well explained. Let me walk you through it:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Step one is to provide the metadata of the copy activity. I configured mine to run once every 15 minutes.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"690\" height=\"581\" src=\"\/wp-content\/uploads\/2020\/09\/image-10.png\" alt=\"\" class=\"wp-image-1274\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-10.png 690w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-10-300x253.png 300w\" sizes=\"auto, (max-width: 690px) 100vw, 690px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Next, you&#8217;ll select the source. 
Select the Read blob connection we created earlier:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"983\" height=\"495\" src=\"\/wp-content\/uploads\/2020\/09\/image-11.png\" alt=\"\" class=\"wp-image-1275\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-11.png 983w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-11-300x151.png 300w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-11-768x387.png 768w\" sizes=\"auto, (max-width: 983px) 100vw, 983px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Then, we&#8217;ll provide additional details for the read blob connection. I configured the file loading behavior to be based on the <code>LastModifiedDate<\/code> and to do an actual binary copy without compression.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"759\" height=\"528\" src=\"\/wp-content\/uploads\/2020\/09\/image-12.png\" alt=\"\" class=\"wp-image-1276\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-12.png 759w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-12-300x209.png 300w\" sizes=\"auto, (max-width: 759px) 100vw, 759px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Next up, we&#8217;ll select the write blob connection we configured earlier as the destination of this copy activity.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"997\" height=\"511\" src=\"\/wp-content\/uploads\/2020\/09\/image-13.png\" alt=\"\" class=\"wp-image-1277\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-13.png 997w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-13-300x154.png 300w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-13-768x394.png 768w\" sizes=\"auto, (max-width: 997px) 100vw, 997px\" 
\/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">This also asks you for additional configuration information. In my case, I only provided the target container to store the data into.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"426\" src=\"\/wp-content\/uploads\/2020\/09\/image-14-1024x426.png\" alt=\"\" class=\"wp-image-1278\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-14-1024x426.png 1024w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-14-300x125.png 300w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-14-768x319.png 768w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-14.png 1210w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Then we&#8217;ll provide additional settings for the copy activity. I configured additional data consistency verification and provided a container in my writing storage account to write logs to. <em>(If you followed along earlier in the blog, you&#8217;ll remember that the data consistency verification will actually cause my pipeline to fail because I don&#8217;t have read permissions in the target storage account. 
We&#8217;ll change this later on).<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1018\" height=\"546\" src=\"\/wp-content\/uploads\/2020\/09\/image-15.png\" alt=\"\" class=\"wp-image-1279\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-15.png 1018w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-15-300x161.png 300w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-15-768x412.png 768w\" sizes=\"auto, (max-width: 1018px) 100vw, 1018px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, we&#8217;ll get a summary (no screenshot) and we can deploy the copy operation. This will create a pipeline in ADF.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"435\" src=\"\/wp-content\/uploads\/2020\/09\/image-16-1024x435.png\" alt=\"\" class=\"wp-image-1280\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-16-1024x435.png 1024w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-16-300x127.png 300w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-16-768x326.png 768w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-16.png 1290w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">To have a file to copy, I uploaded a file to blob storage that will be picked up by ADF in 15 minutes when it runs. 
<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"610\" height=\"378\" src=\"\/wp-content\/uploads\/2020\/09\/image-17.png\" alt=\"\" class=\"wp-image-1281\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-17.png 610w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-17-300x186.png 300w\" sizes=\"auto, (max-width: 610px) 100vw, 610px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">After waiting a couple minutes, the ADF was triggered, and I could see the file appear in my destination store. (which is good) <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"703\" height=\"408\" src=\"\/wp-content\/uploads\/2020\/09\/image-18.png\" alt=\"\" class=\"wp-image-1282\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-18.png 703w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-18-300x174.png 300w\" sizes=\"auto, (max-width: 703px) 100vw, 703px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">However, checking the ADF logs showed an error:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"680\" height=\"215\" src=\"\/wp-content\/uploads\/2020\/09\/image-19.png\" alt=\"\" class=\"wp-image-1283\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-19.png 680w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-19-300x95.png 300w\" sizes=\"auto, (max-width: 680px) 100vw, 680px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">This shows that ADF couldn&#8217;t retrieve the sink file, meaning the file in the target store. 
Let&#8217;s have a look at solving the error:<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Solving the data consistency error (can&#8217;t read from sink)<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">There are two ways to solve this error:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Add read permissions to the SAS token.<\/li><li>Disable consistency checks.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">In my case, I actually went ahead and added read permissions to the SAS token. I did this using PowerShell (this is part of the comments below in the GitHub script btw.)<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Change SAS token to include read for second account\nRemove-AzKeyVaultManagedStorageSasDefinition -AccountName $staccname2 -VaultName $kvname `\n-Name $SASDefinitionName2\n\n\n$storageContext = New-AzStorageContext -StorageAccountName $staccname2 -Protocol Https -StorageAccountKey Key1 \n$start = [System.DateTime]::Now.AddDays(-1)\n$end = [System.DateTime]::Now.AddMonths(1)\n\n$sasToken = New-AzStorageAccountSasToken -Service blob -ResourceType Container,Object -Permission \"wlr\" -Protocol HttpsOnly -StartTime $start -ExpiryTime $end -Context $storageContext\n\n# Need to give this a new name due to default key-vault soft delete behavior\n$newSasName = $SASDefinitionName2 + \"bis\"\nSet-AzKeyVaultManagedStorageSasDefinition -AccountName $staccname2 -VaultName $kvname `\n-Name $newSasName -TemplateUri $sasToken -SasType 'account' -ValidityPeriod ([System.Timespan]::FromDays(1))\n\n$secret = Get-AzKeyVaultSecret -VaultName $kvname -Name \"$staccname2-$newSasName\"\n$secret.SecretValueText<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">As you can see in the script above, I actually needed to change the name of the SASDefinition. I believe this is due to default Key Vault soft delete behavior. Because I was a little lazy, I just gave it a new name. (<em>in all honesty, I wasn&#8217;t 100% lazy. 
I tried to hard delete the soft-deleted SAS definition, but it appears Az PowerShell doesn&#8217;t support this yet. I plan to open a GitHub issue once this blog is posted).<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">With the new secret created, we need to change the storage account 2 definition. We&#8217;ll do that in the linked services. There we&#8217;ll provide the updated secret name:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"612\" height=\"926\" src=\"\/wp-content\/uploads\/2020\/09\/image-20.png\" alt=\"\" class=\"wp-image-1284\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-20.png 612w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-20-198x300.png 198w\" sizes=\"auto, (max-width: 612px) 100vw, 612px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">And with that out of the way, let&#8217;s test our pipeline again. I uploaded a second file to our read container:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"519\" height=\"374\" src=\"\/wp-content\/uploads\/2020\/09\/image-21.png\" alt=\"\" class=\"wp-image-1285\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-21.png 519w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-21-300x216.png 300w\" sizes=\"auto, (max-width: 519px) 100vw, 519px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">This time, I don&#8217;t want to wait 15 minutes for the pipeline to trigger. 
Instead, I&#8217;ll head over to the designer, select our pipeline and hit the debug button:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"548\" src=\"\/wp-content\/uploads\/2020\/09\/image-22-1024x548.png\" alt=\"\" class=\"wp-image-1286\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-22-1024x548.png 1024w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-22-300x161.png 300w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-22-768x411.png 768w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-22.png 1153w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">And this time, the copy job succeeded:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"630\" height=\"203\" src=\"\/wp-content\/uploads\/2020\/09\/image-23.png\" alt=\"\" class=\"wp-image-1287\" srcset=\"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-23.png 630w, https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-23-300x97.png 300w\" sizes=\"auto, (max-width: 630px) 100vw, 630px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In this post we explored how we can use SAS tokens provided by Key Vault to move data between storage accounts. We configured Key Vault to manage 2 storage accounts, and configured fine-grained control in the SAS token on which permissions were allowed on which account. We hit a small issue with the data consistency check, but were able to solve this by editing the write SAS policy.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I am working with a customer right now that is doing a lot of work with Azure Data Factory (ADF). 
ADF is a powerful cloud-based data integration tool that lets you move data from a multitude of sources, process that data and store it in a target data store. You can think of it [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1280,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[2],"tags":[8,148,75,74,149,67],"class_list":["post-1261","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure","tag-azure","tag-azure-data-factory","tag-blob-storage","tag-data-engineering","tag-etl","tag-storage"],"jetpack_featured_media_url":"https:\/\/nillsfblog.blob.core.windows.net\/media\/2020\/09\/image-16.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/posts\/1261","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/comments?post=1261"}],"version-history":[{"count":3,"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/posts\/1261\/revisions"}],"predecessor-version":[{"id":1290,"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/posts\/1261\/revisions\/1290"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/media\/1280"}],"wp:attachment":[{"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/media?parent=1261"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/categories?post=1261"},{"taxonomy":"post_tag","embeddable":true,
"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/tags?post=1261"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}