How to Sort a DataFrame in Descending Order in PySpark

Published Jan 6, 2022  ∙  Updated May 2, 2022

How can we sort a DataFrame in descending order based on a particular column in PySpark?

Suppose we have a DataFrame df with the column col.

We can achieve this with either sort() or orderBy().

Sort using sort() or orderBy()

We can use sort() with col() or desc() to sort in descending order.

Note that every example below works the same way with orderBy() in place of sort().

Sort with external libraries

We can sort with col().

from pyspark.sql.functions import col
df.sort(col('col').desc())

We can also sort with desc().

from pyspark.sql.functions import desc
df.sort(desc('col'))

Both are valid options, but each needs an extra import from pyspark.sql.functions. Let’s see how to sort without one.

Sort without external libraries

df.sort(df.col.desc())
# OR
df.sort('col', ascending=False)

Remember that all of the examples above can be done using orderBy() instead of sort().

Sort multiple columns

Suppose our DataFrame df had two columns instead: col1 and col2.

Let’s sort based on col2 first, then col1, both in descending order.

We’ll see the same code with both sort() and orderBy().

from pyspark.sql.functions import col
df.sort(col("col2").desc(), col("col1").desc())
df.orderBy(col("col2").desc(), col("col1").desc())

Let’s try without the extra import.

df.sort(['col2', 'col1'], ascending=[0, 0])
df.orderBy(['col2', 'col1'], ascending=[0, 0])

Note: in the Spark DataFrame API, sort() and orderBy() both produce a total ordering of the whole dataset. Neither sorts partition by partition; for that, use sortWithinPartitions().