Beginner’s Bliss: Effortless Export of Delta Tables to Azure Data Lake Storage (ADLS) with PySpark
To export a Delta table to ADLS (Azure Data Lake Storage) using PySpark in Databricks, you read the Delta table into a DataFrame with spark.read.format("delta").load(...), and then write it out to the ADLS path with df.write.format("delta").save(...).
Here’s a step-by-step guide on how to export a Delta table to ADLS using PySpark and Databricks:
- First, make sure you have the necessary credentials and access permissions to read the Delta table and write to ADLS.
- Import the necessary library:
from pyspark.sql import SparkSession
- Create a Spark session (in a Databricks notebook, a SparkSession named spark is already provided, so you can skip this step there):
spark = SparkSession.builder \
.appName("Export Delta Table to ADLS") \
.getOrCreate()
- Read the Delta table:
delta_table = spark.read.format("delta").load("path/to/delta_table")
Replace “path/to/delta_table” with the path to your Delta table.
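Alternatively, if the Delta table is registered in the metastore, you can load it by name instead of by path (my_db.my_table below is just a hypothetical table name):
delta_table = spark.read.table("my_db.my_table")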
- Define the destination path in ADLS where you want to save the data:
adls_path = "adl://<YOUR_ACCOUNT_NAME>.azuredatalakestore.net/path/to/destination"
Replace <YOUR_ACCOUNT_NAME> with your ADLS account name, and specify the desired destination path.
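Note that the adl:// scheme above targets ADLS Gen1. If your storage account is ADLS Gen2, the destination path uses the abfss:// scheme instead (the container name below is a placeholder):
adls_path = "abfss://<YOUR_CONTAINER>@<YOUR_ACCOUNT_NAME>.dfs.core.windows.net/path/to/destination"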
- Save the Delta table to ADLS:
delta_table.write.format("delta").save(adls_path)
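By default the write will fail if the destination path already contains data. If you want to overwrite it, or partition the output, you can add a save mode and partition columns (partition_col below is a hypothetical column from your table):
delta_table.write.format("delta") \
    .mode("overwrite") \
    .partitionBy("partition_col") \
    .save(adls_path)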
Depending on your ADLS setup, you may also need to provide credentials and other configurations. You can set those with the spark.conf.set method or by passing options to the write. For example, for ADLS Gen1 (the adl:// scheme) with a service principal using OAuth 2.0 client credentials, the configuration keys look like this:
spark.conf.set("fs.adl.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("fs.adl.oauth2.client.id", "YOUR_CLIENT_ID")
spark.conf.set("fs.adl.oauth2.credential", "YOUR_CLIENT_SECRET")
spark.conf.set("fs.adl.oauth2.refresh.url", "https://login.microsoftonline.com/<YOUR_TENANT_ID>/oauth2/token")
Remember to replace "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", and <YOUR_TENANT_ID> with your actual service principal credentials and Azure AD tenant ID.
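If you are writing to ADLS Gen2 over abfss:// with a service principal instead, the Hadoop configuration keys are different. Here's a minimal sketch, again assuming OAuth 2.0 client credentials (account name, tenant ID, and secrets are placeholders):
storage_account = "<YOUR_ACCOUNT_NAME>"
# Authenticate against the Gen2 (dfs) endpoint with a service principal
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", "YOUR_CLIENT_ID")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", "YOUR_CLIENT_SECRET")
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<YOUR_TENANT_ID>/oauth2/token")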
That’s it! The Delta table should now be exported to the specified ADLS path.
Liked it? Follow for more updates, and share your thoughts and questions in the comments. Happy data engineering!