Azure Data Factory – Copy Data – SQL to Azure Data Lake

Learn how to use the Azure Data Factory, Copy Data activity, for copying SQL into an Azure Data Lake CSV..

Setup SQL Data

I’ve created a table with the following schema in Azure SQL:

CREATE TABLE Building (

Building_Id VARCHAR(5) PRIMARY KEY,
Location VARCHAR(50),
Height_meters INT,
Usage VARCHAR(50)
);

Afterwards, the following entries were added:

INSERT INTO Building (Building_Id, Location, Height_meters, Usage)

VALUES
('B001', 'New York City', 381, 'Office'),
('B002', 'Tokyo', 634, 'Residential'),
('B003', 'London', 310, 'Hotel'),
('B004', 'Dubai', 828, 'Commercial');

The Maintenance Logs from the previous post, https://blog.westmorr.com/2024/01/08/azure-data-factory-copy-data-multiple-files/, were also added. The following schema was used for the Maintenance Logs table.

CREATE TABLE Maintenance_Logs (

Maintenance_Log_Id INT PRIMARY KEY IDENTITY(1,1),
Building_Id VARCHAR(5),
Log_Date DATE,
Maintenance_Type VARCHAR(50),
Cost DECIMAL(10, 2),
CONSTRAINT fk_Building_Id
FOREIGN KEY (Building_Id)
REFERENCES Building (Building_Id)
);

Together, the data I need from Building and Maintenance_Logs tables are as follows:

The final step was to create a SPROC which lets me pass in an optional Building_Id into that LEFT JOIN as well as a view which mirrors the LEFT JOIN above (without the ORDER BY, of course).

Setup the Azure Data Factory Pipeline

Create the SQL Dataset

The first step is to add a linked service to SQL. This post assumes the linked service to Azure Data Lake are already created. With that done, the Copy Data can be added with a new Dataset.

Note that there is no ‘Table name’ specified in the data set. The Source settings for the Data Copy activity will be used to specify a SQL statement, and later on, a SPROC with a dynamic parameter.

Specify Copy Data Activity Source

The Source uses the new dataset, and the Query option is enabled and configured to use the new view:

Use the “Preview data” feature to validate:

There is already a dataset for the Azure Data Lake output container from previous setup, and that is used for the Sink setting. The ‘File Extension’ field on the Sink tab is set to “.csv”.

After successfully running the Copy data activity, the output is as expected:

Conclusion: Azure SQL Data Factory – Copy Data – SQL into Azure Data Lake

The configuration for the Copy Data activity is nearly the same whether copying from Azure Blob Storage or Azure SQL into Azure Data Lake. Effort was taken in advance to setup Views and SPROCs in SQL for this Copy Data activity and future parameterized pipeline runs, but otherwise the key setting for this solution is to use a Query instead of a pre-configured table name.



Categories: Azure Data Factory

Tags: , ,

Leave a Reply

Discover more from Westmorr Consulting

Subscribe now to keep reading and get access to the full archive.

Continue reading