Thursday 29 October 2020

Azure Data Factory - Expressions for Strings, Dates and More

 

Summary


This blog lists some useful expressions to be used in Azure Data Factory such as manipulating and retrieving details from Strings, dates or passing and using outputs in Activities

Solution


Handling Strings 


 

Scenario and Expression

1

Expression to Get the first name from Full Name

FullName: Vaibhav Chaudhari
Output: Vaibhav

@split(pipeline().parameters.FullName, ' ') [0]

2

Expression to Get the last name from Full Name

FullName: Vaibhav Chaudhari
Output: Chaudhari

@split(pipeline().parameters.FullName, ' ') [1]

3

Expression to Get the string between second and third slash

FolderPath: Sales/Products/2019/12/10
Output: 2019

@split(pipeline().parameters.FolderPath, '/') [2]

4


Expression to Remove Spaces from a String

Title: The First King
Output: TheFirstKing

@replace(pipeline().parameters.Title, ' ', '')

5


Expression to Pass Pipeline details in WHERE clause of a SQL Query

SELECT PipelineName, Column1, Column2
FROM MyTable
WHERE PipelineName = '@{pipeline().Pipeline}'
 

6


Expression to Append current timestamp to file name

FilePath: Sales/Products/2019/12/10/product.txt
Output:  Sales/Products/2019/12/10/product_2019-12-16.txt

@concat(replace(pipeline().parameters.FilePath, '.txt', ''), 
'_', formatDateTime(utcnow(), 'yyyy-MM-dd'), '.txt')

 7


Expression to remove file extension from a file name


FilePath: product_2019-12-16.txt
Output:  product_2019-12-16.

@substring(pipeline().parameters.FileName, 0, 
lastindexof(pipeline().parameters.FileName, '.'))


DateTime Related


 

Scenario & Expression

1


Expression to Find first Day of current Month:

@startOfMonth(utcnow(), 'yyyy-MM-dd')

2


Expression to Find last day of Current Month:

@adddays(startOfMonth(addToTime(utcnow(), 1, 'Month')), - 1,
'yyyy-MM-dd')

3

Expression to Find first Day of Next Month

@startOfMonth(addToTime(utcnow(), 1, 'Month'), 'yyyy-MM-dd')

4


Expression to Find last day of Next Month

@adddays(addToTime(startOfMonth(utcnow()), 2, 'Month'), - 1,
'yyyy-MM-dd')

5

Expression to Find first Day of Previous Month

@startOfMonth(addToTime(utcnow(), - 1, 'Month'), 'yyyy-MM-dd')

6

Expression to Find last day of Previous Month:

@adddays(startOfMonth(utcnow()), - 1, 'yyyy-MM-dd')


Tuesday 27 October 2020

Get Azure Data Lake Gen2 Container, Folder Size


Problem


How to get Azure data lake gen 2 container or folder size 

Solution


PowerShell AZ Module has to be installed to run below commands

Run below code to connect to your Azure account

Connect-AzAccount

To get the container total size, specify the container (file system) name in the variable at first line and run the code

$myfilesystem = "adventureworks"

$ctx = New-AzStorageContext -StorageAccountName "vcadlsgen2" -UseConnectedAccount

$Files = Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $myfilesystem -Recurse `

            | Where-Object IsDirectory -eq $false

$Total = $Files | Measure-Object -Property Length -Sum

$Total | Select-Object @{Name = "SizeInBytes"; Expression={$_.Sum}},

                       @{Name = "SizeInKB"; Expression={$_.Sum/1KB}}  


Output - 



To get the folder size, specify the container (file system) name as well as folder path in the variable

$myfilesystem = "adventureworks"

$mypath = "Address"

$ctx = New-AzStorageContext -StorageAccountName "vcadlsgen2" -UseConnectedAccount

$Files = Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $myfilesystem -Path $mypath -Recurse | Where-Object IsDirectory -eq $false

$Total = $Files | Measure-Object -Property Length -Sum

$Total | Select-Object @{Name = "SizeInBytes"; Expression={$_.Sum}},

                       @{Name = "SizeInKB"; Expression={$_.Sum/1KB}} 


Output -






Monday 1 June 2020

Get Azure Data Lake Gen1 Folder Size and files count


Problem


How to get Azure data lake gen 1 folder size and files count in it

Solution


Option 1 - Azure Storage explorer

1. Install Azure storage explorer, log in to your Azure subscription and open the target folder.

2.  Click on Folder Statistics button on top and you should receive the number of files, subfolders and total folder size in bytes



Option 2 - PowerShell

1. Install Az.Datalakestore Module in order to run Azure data lake gen1 related PowerShell commands using below script

Install-Module -Name Az.Datalakestore -AllowClobber -Force

2. Edit below script to enter your Service Principal Id & Secret and run

$TenantId = "<<YourTenantId>>"
$ServicePrincipalId = "<<YourServicePrincipalId>>"
$ServicePrincipalKey = "<<ServicePrincipalKey>>"
$SecurePassword = ConvertTo-SecureString $ServicePrincipalKey -AsPlainText -Force
$Credential = New-Object System.Management.Automation.PSCredential ( $ServicePrincipalId, $SecurePassword)

Login-AzAccount -ServicePrincipal  -TenantId $TenantId -Credential $Credential

$dataLakeStore = "YourADLSGen1AccountName"
$inputPath = "/YourFolder"

Get-AzDataLakeStoreChildItemSummary -Account $dataLakeStore -Path $inputPath

3. It should return details like below