How to sort a list of strings case-insensitively in Python

Aug 10, 2024 #python

In Python, you can sort a list of strings with either the sort() method or the sorted() function. They differ mainly in whether they modify the original data and in what they return.

  • The sort() method sorts the list in place, meaning it modifies the original list. It returns None rather than a new list, and it is available only on lists.
  • The sorted() function returns a new sorted list built from the elements of any iterable (e.g., tuples, strings, dictionaries), not just lists. The original iterable remains unchanged.
list.sort(key=None, reverse=False)
sorted(iterable, key=None, reverse=False)
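
The difference in return values and side effects described above can be seen in a short sketch. The variable names here are only illustrative:

```python
words = ["Banana", "apple", "Cherry"]

# sorted() builds a new list and leaves the input untouched
new_list = sorted(words)
print(new_list)    # ['Banana', 'Cherry', 'apple']
print(words)       # ['Banana', 'apple', 'Cherry']

# sorted() accepts any iterable (here a tuple) and always returns a list
from_tuple = sorted(("b", "c", "a"))
print(from_tuple)  # ['a', 'b', 'c']

# list.sort() reorders the list in place and returns None
result = words.sort()
print(result)      # None
print(words)       # ['Banana', 'Cherry', 'apple']
```

A common mistake is writing words = words.sort(), which replaces the list with None.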

Both accept the optional keyword arguments key and reverse to customize the sort order. The key parameter takes any callable that accepts one argument and returns a value that can be compared; reverse=True sorts in descending order.
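
As a rough illustration of these two parameters, the sketch below sorts the same list by length, in descending order, and with a small lambda as the key. The lambda is only an example of a one-argument callable:

```python
words = ["Banana", "apple", "Cherry", "date"]

# key=len sorts by string length; equal lengths keep their original order
by_length = sorted(words, key=len)
print(by_length)     # ['date', 'apple', 'Banana', 'Cherry']

# reverse=True returns the default ordering in descending order
descending = sorted(words, reverse=True)
print(descending)    # ['date', 'apple', 'Cherry', 'Banana']

# Any one-argument callable works as a key, e.g. sorting by the last character
by_last_char = sorted(words, key=lambda w: w[-1])
print(by_last_char)  # ['Banana', 'apple', 'date', 'Cherry']
```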

By default, Python compares strings by Unicode code point, so sorting is case-sensitive: every uppercase ASCII letter is considered “less than” every lowercase one, and words starting with an uppercase letter appear first.

words = ["Banana", "apple", "Cherry", "date"]
sorted_words = sorted(words)
print(sorted_words)
# ['Banana', 'Cherry', 'apple', 'date']

If you want a case-insensitive sort, you can use the key parameter with str.lower.

words = ["Banana", "apple", "Cherry", "date"]
sorted_words = sorted(words, key=str.lower)
print(sorted_words)
# ['apple', 'Banana', 'Cherry', 'date']

The key parameter in sort() or sorted() takes a function as an argument. This function is applied to each element in the iterable, and the elements are then sorted based on the return values of this function.

When you pass key=str.lower, it tells Python to sort the items based on their lowercase versions, rather than their original case-sensitive forms. It is helpful when you want alphabetical sorting without distinguishing between uppercase and lowercase letters.
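
To make the role of the key function more concrete, here is a small sketch that prints the keys Python compares and shows that the result still contains the original strings. The keys list is built by hand purely for illustration; sorted() computes the keys internally.

```python
words = ["Banana", "apple", "Cherry", "date"]

# The values Python compares when key=str.lower is used
keys = [str.lower(w) for w in words]
print(keys)    # ['banana', 'apple', 'cherry', 'date']

# The result holds the original strings, not the lowercased keys
result = sorted(words, key=str.lower)
print(result)  # ['apple', 'Banana', 'Cherry', 'date']

# The same key works with the in-place method
words.sort(key=str.lower)
print(words)   # ['apple', 'Banana', 'Cherry', 'date']
```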

If you want to sort a list of strings in a truly case-insensitive manner, including handling special cases beyond just converting to lowercase, you can use key=str.casefold.

Both str.casefold() and str.lower() return a lowercased form of the string, but casefold() is more aggressive: it applies Unicode case folding, which removes case distinctions that lower() leaves in place.

  • str.lower() is suitable for general lowercase conversion and comparison when dealing primarily with ASCII characters.
  • str.casefold() is recommended for robust case-insensitive comparisons, especially when dealing with text from different languages or character sets.
text = "Straße"  # German word for "street"
print(text.lower())  # Output: straße
print(text.casefold())  # Output: strasse
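
The practical effect shows up in equality checks. In this minimal sketch (the names a and b are arbitrary), the two spellings only match after case folding:

```python
a = "Straße"
b = "STRASSE"

# lower() keeps the ß, so the lowercased strings differ
equal_lower = a.lower() == b.lower()
print(equal_lower)     # False ('straße' vs 'strasse')

# casefold() maps ß to ss, so both become 'strasse'
equal_casefold = a.casefold() == b.casefold()
print(equal_casefold)  # True
```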

words = ["straße", "Strasse", "Straße", "Straße", "sstraße"]

print(sorted(words, key=str.lower))
# ['sstraße', 'Strasse', 'straße', 'Straße', 'Straße']

print(sorted(words, key=str.casefold))
# ['sstraße', 'straße', 'Strasse', 'Straße', 'Straße']

In this example, str.lower() leaves “ß” unchanged, so “straße” and “strasse” are compared as different keys, and “Strasse” sorts ahead of the “ß” spellings. str.casefold() maps “ß” to “ss”, so those four words share the key “strasse” and keep their original relative order.

Using str.casefold() gives more consistent case-insensitive results for non-ASCII text, because strings that differ only in case, including special cases like “ß” and “ss”, end up with the same sort key.

Remember that both sort() and sorted() are stable in Python. If two items are equal according to the sorting key, they appear in the sorted list in the same order as they did in the original list.
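
A short sketch of stability with a case-insensitive key, using a made-up list in which "A"/"a" and "b"/"B" produce equal keys:

```python
words = ["b", "A", "a", "B"]

# "A" and "a" share the key "a"; "b" and "B" share the key "b".
# Within each group, the original left-to-right order is preserved.
stable = sorted(words, key=str.lower)
print(stable)  # ['A', 'a', 'b', 'B']
```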