PySpark Date and Time Functions Cheat Sheet for Beginners 🚀
Dive into the world of PySpark with these date and time functions! Whether you’re a newbie or improving your skills, this cheat sheet will be your trusty guide.
1. current_date()
Get the current date in PySpark.
Example:
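from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date
df = SparkSession.builder.getOrCreate().range(1)  # one-row DataFrame reused in every example
df.select(current_date().alias("current_date")).show()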
Output:
+------------+
|current_date|
+------------+
|  2023-12-22|
+------------+
2. current_timestamp()
Fetch the current timestamp in PySpark.
Example:
from pyspark.sql.functions import current_timestamp
df.select(current_timestamp().alias("current_timestamp")).show()
Output:
+-------------------+
|  current_timestamp|
+-------------------+
|2023-12-22 12:34:56|
+-------------------+
3. date_add(date, days)
Add days to a given date in PySpark. A plain string argument is read as a column name, so a literal date is wrapped in lit().
Example:
from pyspark.sql.functions import date_add, lit
df.select(date_add(lit('2023-12-22'), 5).alias("added_date")).show()
Output:
+----------+
|added_date|
+----------+
|2023-12-27|
+----------+
4. date_sub(date, days)
Subtract days from a given date in PySpark.
Example:
from pyspark.sql.functions import date_sub, lit
df.select(date_sub(lit('2023-12-22'), 3).alias("subtracted_date")).show()
Output:
+---------------+
|subtracted_date|
+---------------+
|     2023-12-19|
+---------------+
5. datediff(end_date, start_date)
Calculate the difference in days between two dates in PySpark.
Example:
from pyspark.sql.functions import datediff, lit
df.select(datediff(lit('2023-12-31'), lit('2023-12-22')).alias("date_difference")).show()
Output:
+---------------+
|date_difference|
+---------------+
|              9|
+---------------+
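To sanity-check the day arithmetic in sections 3 to 5 without a Spark session, here is a minimal plain-Python sketch using the standard library's datetime module. It only cross-checks the expected values and is not how Spark computes them; the variable names are made up for illustration.

```python
from datetime import date, timedelta

base = date(2023, 12, 22)

# date_add(date, 5) and date_sub(date, 3) shift the date by whole days
added = base + timedelta(days=5)
subtracted = base - timedelta(days=3)

# datediff(end_date, start_date) is end minus start,
# so swapping the arguments flips the sign
diff = (date(2023, 12, 31) - base).days
reversed_diff = (base - date(2023, 12, 31)).days

print(added, subtracted, diff, reversed_diff)
```

The argument order of datediff is the usual stumbling block: the end date comes first.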
6. months_between(date1, date2)
Calculate the number of months between two dates in PySpark.
Example:
from pyspark.sql.functions import months_between, lit
df.select(months_between(lit('2023-12-31'), lit('2023-12-22')).alias("months_between")).show()
Output:
+--------------+
|months_between|
+--------------+
|    0.29032258|
+--------------+
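The fractional part is easier to trust once you see where it comes from. When the two dates fall on different days of the month (and are not both month-ends), Spark bases the fraction on a 31-day month and rounds to 8 decimal places by default. The sketch below reproduces that arithmetic for plain dates in standard-library Python. months_between_sketch and is_month_end are hypothetical helpers, time-of-day is ignored, and this is a simplification rather than Spark's actual implementation.

```python
import calendar
from datetime import date

def is_month_end(d: date) -> bool:
    # True when d is the last calendar day of its month
    return d.day == calendar.monthrange(d.year, d.month)[1]

def months_between_sketch(d1: date, d2: date) -> float:
    whole_months = (d1.year - d2.year) * 12 + (d1.month - d2.month)
    # Same day-of-month, or both month-ends: a whole number of months
    if d1.day == d2.day or (is_month_end(d1) and is_month_end(d2)):
        return float(whole_months)
    # Otherwise the leftover days are counted against a 31-day month
    return round(whole_months + (d1.day - d2.day) / 31, 8)

same_month = months_between_sketch(date(2023, 12, 31), date(2023, 12, 22))
month_ends = months_between_sketch(date(2024, 2, 29), date(2023, 12, 31))
print(same_month, month_ends)
```

For the dates in this section, that is 9 days over 31, or about 0.29 months.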
7. trunc(date, format)
Truncate a date in PySpark to the first day of the unit named by format, such as 'YEAR' or 'MONTH'.
Example:
from pyspark.sql.functions import trunc, lit
df.select(trunc(lit('2023-12-22'), 'MONTH').alias("truncated_date")).show()
Output:
+--------------+
|truncated_date|
+--------------+
|    2023-12-01|
+--------------+
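If "truncate" feels abstract, think of it as snapping a date back to the start of its month or year. Here is a small plain-Python sketch of that idea. trunc_sketch is a hypothetical helper that covers only the 'YEAR' and 'MONTH' units; it is an illustration, not Spark's implementation.

```python
from datetime import date

def trunc_sketch(d: date, unit: str) -> date:
    unit = unit.upper()
    if unit == "YEAR":
        # Snap to January 1 of the same year
        return d.replace(month=1, day=1)
    if unit == "MONTH":
        # Snap to the first day of the same month
        return d.replace(day=1)
    raise ValueError(f"unit not covered by this sketch: {unit}")

month_start = trunc_sketch(date(2023, 12, 22), "MONTH")
year_start = trunc_sketch(date(2023, 12, 22), "year")
print(month_start, year_start)
```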
8. add_months(start_date, num_months)
Add months to a given date in PySpark.
Example:
from pyspark.sql.functions import add_months, lit
df.select(add_months(lit('2023-12-22'), 2).alias("added_months")).show()
Output:
+------------+
|added_months|
+------------+
|  2024-02-22|
+------------+
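One edge case worth knowing: when the start day does not exist in the target month (for example the 31st landing in February), the result is clamped to the last day of that month. The plain-Python sketch below mimics that behavior. add_months_sketch is a hypothetical helper written for illustration, not Spark's code.

```python
import calendar
from datetime import date

def add_months_sketch(d: date, n: int) -> date:
    # Count months from year 0 so that negative n and year rollover just work
    month_index = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(month_index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    # Clamp the day when the target month is shorter
    return date(year, month0 + 1, min(d.day, last_day))

regular = add_months_sketch(date(2023, 12, 22), 2)
clamped = add_months_sketch(date(2023, 12, 31), 2)
backwards = add_months_sketch(date(2024, 1, 31), -2)
print(regular, clamped, backwards)
```

Passing a negative number of months moves the date backwards, which is handy because there is no separate "subtract months" function in this list.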
9. year(date)
Extract the year from a date in PySpark.
Example:
from pyspark.sql.functions import year, lit
df.select(year(lit('2023-12-22')).alias("extracted_year")).show()
Output:
+--------------+
|extracted_year|
+--------------+
|          2023|
+--------------+
10. quarter(date)
Extract the quarter from a date in PySpark.
Example:
from pyspark.sql.functions import quarter, lit
df.select(quarter(lit('2023-12-22')).alias("extracted_quarter")).show()
Output:
+-----------------+
|extracted_quarter|
+-----------------+
|                4|
+-----------------+
11. dayofmonth(date)
Extract the day of the month from a date in PySpark.
Example:
from pyspark.sql.functions import dayofmonth, lit
df.select(dayofmonth(lit('2023-12-22')).alias("day_of_month")).show()
Output:
+------------+
|day_of_month|
+------------+
|          22|
+------------+
12. dayofweek(date)
Extract the day of the week from a date in PySpark, numbered from 1 (Sunday) to 7 (Saturday). 2023-12-22 is a Friday, hence 6.
Example:
from pyspark.sql.functions import dayofweek, lit
df.select(dayofweek(lit('2023-12-22')).alias("day_of_week")).show()
Output:
+-----------+
|day_of_week|
+-----------+
|          6|
+-----------+
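The numbering trips people up because Python's own datetime counts differently: isoweekday() runs Monday=1 to Sunday=7, while dayofweek here runs Sunday=1 to Saturday=7. A minimal plain-Python sketch of the conversion, where spark_style_dayofweek is a hypothetical helper name:

```python
from datetime import date

def spark_style_dayofweek(d: date) -> int:
    # isoweekday(): Monday=1 ... Sunday=7
    # Shift so that Sunday=1 ... Saturday=7
    return d.isoweekday() % 7 + 1

friday = spark_style_dayofweek(date(2023, 12, 22))
saturday = spark_style_dayofweek(date(2023, 12, 23))
sunday = spark_style_dayofweek(date(2023, 12, 24))
print(friday, saturday, sunday)
```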
13. dayofyear(date)
Extract the day of the year from a date in PySpark.
Example:
from pyspark.sql.functions import dayofyear, lit
df.select(dayofyear(lit('2023-12-22')).alias("day_of_year")).show()
Output:
+-----------+
|day_of_year|
+-----------+
|        356|
+-----------+
14. date_format(date, format)
Format a date in PySpark according to a specified format.
Example:
from pyspark.sql.functions import date_format, lit
df.select(date_format(lit('2023-12-22'), 'MM/dd/yyyy').alias("formatted_date")).show()
Output:
+--------------+
|formatted_date|
+--------------+
|    12/22/2023|
+--------------+
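The pattern letters are case-sensitive: MM means month, while mm means minutes, so 'mm/dd/yyyy' is a classic beginner bug. If you already know Python's strftime, the sketch below lines up a few strftime codes with the corresponding pattern letters. It uses only the standard library, and the variable names are made up for illustration.

```python
from datetime import date

d = date(2023, 12, 22)

us_style = d.strftime("%m/%d/%Y")   # pattern letters: MM/dd/yyyy
iso_style = d.strftime("%Y-%m-%d")  # pattern letters: yyyy-MM-dd
compact = d.strftime("%Y%m%d")      # pattern letters: yyyyMMdd

print(us_style, iso_style, compact)
```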
15. to_utc_timestamp()
Interprets a timestamp as local time in the given time zone and converts it to UTC (Coordinated Universal Time).
Example:
from pyspark.sql.functions import to_utc_timestamp
# Assuming 'timestamp_col' is the timestamp column in the DataFrame
# and its values are wall-clock times in America/Los_Angeles
df = df.withColumn("utc_timestamp_col", to_utc_timestamp("timestamp_col", "America/Los_Angeles"))
df.show()
Output:
+-------------------+-------------------+
|      timestamp_col|  utc_timestamp_col|
+-------------------+-------------------+
|2023-01-01 12:00:00|2023-01-01 20:00:00|
|2023-02-15 18:30:00|2023-02-16 02:30:00|
|2023-04-30 08:45:00|2023-04-30 15:45:00|
+-------------------+-------------------+
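to_utc_timestamp reads each value as wall-clock time in the zone you pass and shifts it to UTC, so the size of the shift depends on that zone's offset on that date. The plain-Python sketch below shows the idea with two fixed offsets, assuming a source zone that is UTC-8 in winter and UTC-7 under daylight saving time (as US Pacific time is). PST, PDT, and to_utc_sketch are hypothetical names, and real zone rules are more involved than fixed offsets.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8))  # assumed fixed offset, standard time
PDT = timezone(timedelta(hours=-7))  # assumed fixed offset, daylight time

def to_utc_sketch(naive: datetime, source_tz: timezone) -> datetime:
    # Attach the source zone to the wall-clock time, convert to UTC,
    # then drop the zone again to get a plain timestamp back
    return naive.replace(tzinfo=source_tz).astimezone(timezone.utc).replace(tzinfo=None)

winter = to_utc_sketch(datetime(2023, 1, 1, 12, 0), PST)
spring = to_utc_sketch(datetime(2023, 4, 30, 8, 45), PDT)
print(winter, spring)
```

Passing a zone that is already UTC, such as "GMT", leaves the values unchanged.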
Now you have a cheat sheet for 15 essential PySpark date and time functions. Happy Sparking! 🔥📆
👉 Feel free to share any additional helpful date and time functions! I’m more than happy to include them in future updates. Let’s make this cheat sheet even more valuable together! 🌟💬