PySpark Date and Time Functions Cheat Sheet for Beginners 🚀

Dhruv Singhal
3 min read · Dec 22, 2023

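Dive into the world of PySpark with these date and time functions! Whether you’re a newbie or improving your skills, this cheat sheet will be your trusty guide. The examples assume an active SparkSession and an existing DataFrame named df (a single row, unless noted).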

1. current_date()

Get the current date in PySpark.

Example:

from pyspark.sql.functions import current_date

df.select(current_date().alias("current_date")).show()

Output:

+------------+
|current_date|
+------------+
|  2023-12-22|
+------------+

2. current_timestamp()

Fetch the current timestamp in PySpark.

Example:

from pyspark.sql.functions import current_timestamp

df.select(current_timestamp().alias("current_timestamp")).show()

Output:

+-------------------+
|  current_timestamp|
+-------------------+
|2023-12-22 12:34:56|
+-------------------+

3. date_add(date, days)

Add days to a given date in PySpark.

Example:

from pyspark.sql.functions import date_add, lit

# lit() makes the string a literal value; a bare string would be read as a column name.
df.select(date_add(lit('2023-12-22'), 5).alias("added_date")).show()

Output:

+----------+
|added_date|
+----------+
|2023-12-27|
+----------+

4. date_sub(date, days)

Subtract days from a given date in PySpark.

Example:

from pyspark.sql.functions import date_sub, lit

# Wrap the date string in lit() so it is not resolved as a column name.
df.select(date_sub(lit('2023-12-22'), 3).alias("subtracted_date")).show()

Output:

+---------------+
|subtracted_date|
+---------------+
|     2023-12-19|
+---------------+

5. datediff(end_date, start_date)

Calculate the difference in days between two dates in PySpark.

Example:

from pyspark.sql.functions import datediff, lit

# The end date comes first, then the start date; both are literals here.
df.select(datediff(lit('2023-12-31'), lit('2023-12-22')).alias("date_difference")).show()

Output:

+---------------+
|date_difference|
+---------------+
|              9|
+---------------+
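
For a quick sanity check without a Spark session, the day arithmetic behind date_add, date_sub, and datediff can be mirrored with Python's standard datetime module. This is only a plain-Python sketch of the same calendar arithmetic, not how Spark computes it, and the variable names are illustrative.

```python
from datetime import date, timedelta

base = date(2023, 12, 22)

# Same calendar arithmetic as date_add(..., 5) and date_sub(..., 3)
added = base + timedelta(days=5)
subtracted = base - timedelta(days=3)

# datediff(end, start) corresponds to (end - start).days
difference = (date(2023, 12, 31) - base).days

print(added, subtracted, difference)  # 2023-12-27 2023-12-19 9
```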

6. months_between(date1, date2)

Calculate the number of months between two dates in PySpark. When the two days of the month differ, the result is fractional and based on a 31-day month: here 9 / 31 ≈ 0.29.

Example:

from pyspark.sql.functions import months_between, lit

df.select(months_between(lit('2023-12-31'), lit('2023-12-22')).alias("months_between")).show()

Output:

+--------------+
|months_between|
+--------------+
|    0.29032258|
+--------------+
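
To see where the fractional part of months_between comes from, here is a plain-Python sketch of the rule as I understand it from the Spark documentation: the result is a whole number when both dates fall on the same day of the month or both are month ends, and otherwise the day gap is counted against a 31-day month and rounded to 8 digits. The function name months_between_sketch is hypothetical, and the sketch ignores time of day.

```python
from calendar import monthrange
from datetime import date

def months_between_sketch(d1: date, d2: date) -> float:
    """Approximate Spark's months_between for plain dates (no time of day)."""
    month_diff = (d1.year - d2.year) * 12 + (d1.month - d2.month)
    last1 = d1.day == monthrange(d1.year, d1.month)[1]
    last2 = d2.day == monthrange(d2.year, d2.month)[1]
    # Same day of month, or both month ends: a whole number of months
    if d1.day == d2.day or (last1 and last2):
        return float(month_diff)
    # Otherwise the day gap counts as a fraction of a 31-day month
    return round(month_diff + (d1.day - d2.day) / 31, 8)

print(months_between_sketch(date(2023, 12, 31), date(2023, 12, 22)))  # 0.29032258
print(months_between_sketch(date(2024, 2, 29), date(2023, 12, 31)))   # 2.0
```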

7. trunc(date, format)

Truncate a date in PySpark to the unit given by format, for example 'MONTH' (first day of the month) or 'YEAR' (first day of the year).

Example:

from pyspark.sql.functions import trunc, lit

df.select(trunc(lit('2023-12-22'), 'MONTH').alias("truncated_date")).show()

Output:

+--------------+
|truncated_date|
+--------------+
|    2023-12-01|
+--------------+
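
As a rough mental model, truncating to a month or year just resets the smaller date fields to their first value. The snippet below is a plain-Python sketch of that idea using datetime.date.replace; it is an analogy, not Spark code.

```python
from datetime import date

d = date(2023, 12, 22)

month_start = d.replace(day=1)           # like trunc(..., 'MONTH')
year_start = d.replace(month=1, day=1)   # like trunc(..., 'YEAR')

print(month_start, year_start)  # 2023-12-01 2023-01-01
```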

8. add_months(start_date, num_months)

Add months to a given date in PySpark.

Example:

from pyspark.sql.functions import add_months, lit

df.select(add_months(lit('2023-12-22'), 2).alias("added_months")).show()

Output:

+------------+
|added_months|
+------------+
|  2024-02-22|
+------------+
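
One edge case worth knowing: when the target month is shorter than the starting day of the month, the result is clamped to the last day of that month. The plain-Python sketch below models that behavior as I understand it for Spark 3.x; add_months_sketch is a hypothetical helper, not a Spark function.

```python
from calendar import monthrange
from datetime import date

def add_months_sketch(d: date, months: int) -> date:
    # Count months from year 0 so the division handles year rollover
    total = d.year * 12 + (d.month - 1) + months
    year, month = divmod(total, 12)
    month += 1
    # Clamp the day to the last valid day of the target month
    day = min(d.day, monthrange(year, month)[1])
    return date(year, month, day)

print(add_months_sketch(date(2023, 12, 22), 2))  # 2024-02-22
print(add_months_sketch(date(2023, 12, 31), 2))  # 2024-02-29
```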

9. year(date)

Extract the year from a date in PySpark.

Example:

from pyspark.sql.functions import year, lit

df.select(year(lit('2023-12-22')).alias("extracted_year")).show()

Output:

+--------------+
|extracted_year|
+--------------+
|          2023|
+--------------+

10. quarter(date)

Extract the quarter from a date in PySpark.

Example:

from pyspark.sql.functions import quarter, lit

df.select(quarter(lit('2023-12-22')).alias("extracted_quarter")).show()

Output:

+-----------------+
|extracted_quarter|
+-----------------+
|                4|
+-----------------+

11. dayofmonth(date)

Extract the day of the month from a date in PySpark.

Example:

from pyspark.sql.functions import dayofmonth, lit

df.select(dayofmonth(lit('2023-12-22')).alias("day_of_month")).show()

Output:

+------------+
|day_of_month|
+------------+
|          22|
+------------+

12. dayofweek(date)

Extract the day of the week from a date in PySpark. The result is a number from 1 (Sunday) to 7 (Saturday); 2023-12-22 is a Friday, so it returns 6.

Example:

from pyspark.sql.functions import dayofweek, lit

df.select(dayofweek(lit('2023-12-22')).alias("day_of_week")).show()

Output:

+-----------+
|day_of_week|
+-----------+
|          6|
+-----------+
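
Spark's dayofweek counts from Sunday = 1 through Saturday = 7, which differs from Python's isoweekday (Monday = 1 through Sunday = 7). A small plain-Python sketch of the mapping, with a hypothetical helper name, can help when cross-checking results:

```python
from datetime import date

def spark_style_dayofweek(d: date) -> int:
    # isoweekday(): Monday=1 ... Sunday=7; shift so that Sunday=1 ... Saturday=7
    return d.isoweekday() % 7 + 1

d = date(2023, 12, 22)  # a Friday
print(spark_style_dayofweek(d))  # 6
```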

13. dayofyear(date)

Extract the day of the year from a date in PySpark.

Example:

from pyspark.sql.functions import dayofyear, lit

df.select(dayofyear(lit('2023-12-22')).alias("day_of_year")).show()

Output:

+-----------+
|day_of_year|
+-----------+
|        356|
+-----------+

14. date_format(date, format)

Format a date in PySpark according to a specified format.

Example:

from pyspark.sql.functions import date_format, lit

df.select(date_format(lit('2023-12-22'), 'MM/dd/yyyy').alias("formatted_date")).show()

Output:

+--------------+
|formatted_date|
+--------------+
|    12/22/2023|
+--------------+
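
If you are used to Python's strftime, note that date_format uses Java-style pattern letters instead of percent codes. The plain-Python sketch below shows the strftime pattern that roughly corresponds to 'MM/dd/yyyy'; it is only a comparison aid, not Spark code.

```python
from datetime import date

d = date(2023, 12, 22)

# Spark pattern 'MM/dd/yyyy' roughly corresponds to strftime '%m/%d/%Y'
formatted = d.strftime("%m/%d/%Y")
print(formatted)  # 12/22/2023
```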

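15. to_utc_timestamp(timestamp, tz)

Interprets a timestamp as local time in the given time zone and converts it to UTC (Coordinated Universal Time). Passing "GMT" as the zone would leave the values unchanged, so this example uses "America/Los_Angeles", which is 8 hours behind UTC in winter and 7 hours behind during daylight saving time.

Example:

from pyspark.sql.functions import to_utc_timestamp

# Assuming 'timestamp_col' is the timestamp column in the DataFrame,
# holding local times in the America/Los_Angeles zone
df = df.withColumn("utc_timestamp_col", to_utc_timestamp("timestamp_col", "America/Los_Angeles"))
df.show()

Output:

+-------------------+-------------------+
|      timestamp_col|  utc_timestamp_col|
+-------------------+-------------------+
|2023-01-01 12:00:00|2023-01-01 20:00:00|
|2023-02-15 18:30:00|2023-02-16 02:30:00|
|2023-04-30 08:45:00|2023-04-30 15:45:00|
+-------------------+-------------------+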
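
Conceptually, to_utc_timestamp treats a zone-less timestamp as local time in the given zone and returns the matching UTC wall-clock time. The plain-Python sketch below illustrates that with two hand-picked fixed offsets, standing in for a zone that is 8 hours behind UTC in winter and 7 hours behind in summer. The names WINTER, SUMMER, and to_utc_sketch are hypothetical; a real zone lookup would choose the offset for you.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets standing in for a real zone's winter and summer time
WINTER = timezone(timedelta(hours=-8))
SUMMER = timezone(timedelta(hours=-7))

def to_utc_sketch(local_naive: datetime, offset: timezone) -> datetime:
    # Attach the offset, convert to UTC, then drop tzinfo again
    return local_naive.replace(tzinfo=offset).astimezone(timezone.utc).replace(tzinfo=None)

print(to_utc_sketch(datetime(2023, 1, 1, 12, 0), WINTER))   # 2023-01-01 20:00:00
print(to_utc_sketch(datetime(2023, 4, 30, 8, 45), SUMMER))  # 2023-04-30 15:45:00
```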

Now you have a comprehensive cheat sheet for 15 essential PySpark date and time functions. If you found this cheat sheet helpful, don’t forget to give it a round of applause 👏, follow for more PySpark insights 🚀, and subscribe to stay updated on the latest content! 📬✨ Happy Sparking! 🔥📆

👉 Feel free to share any additional helpful date and time functions! I’m more than happy to include them in future updates. Let’s make this cheat sheet even more valuable together! 🌟💬
