How to compare strings in Bash

Aug 07, 2023 #bash

Comparing strings is one of the most common tasks in shell scripting. A comparison tells you whether two strings are equal or how they are ordered lexicographically, which matters when validating user input, matching patterns, and sorting data.

In Bash, you can compare strings using various operators and constructs. Here are the common methods to compare strings:

  • Using = and == to check if the strings are equal.
  • Using != to check if the strings are not equal.
  • Using =~ to check if the string matches an extended regular expression.
  • Using < and > to compare the strings in lexicographical (alphabetical) order.
  • Using -z and -n to check if the string length is zero or non-zero.

Here are some of the best practices when comparing strings in Bash:

  • Quote variable expansions. Inside [ ... ], unquoted variables undergo word splitting and globbing, so [ "$VAR1" = "$VAR2" ] is correct and [ $VAR1 = $VAR2 ] is not. Inside [[ ... ]], word splitting does not occur, but an unquoted right-hand side of == is treated as a glob pattern, so write [[ "$VAR1" == "$VAR2" ]] for a literal comparison.
  • The double square brackets [[ ... ]] support pattern matching with wildcards and regular expressions. They are not POSIX compliant, so they do not work in strictly POSIX shells such as dash.
  • For simple, portable comparisons, use single square brackets [ ... ] with = and !=. The < and > operators must be escaped as \< and \> inside [ ... ]; otherwise the shell parses them as redirections.
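
The effect of quoting and of the bracket style can be sketched as follows. The helper names (matches_as_pattern, matches_literally, sorts_before) are hypothetical and exist only for illustration. The sketch shows that inside [[ ... ]] an unquoted right-hand side acts as a glob pattern, and that < has to be escaped inside [ ... ].

```shell
#!/bin/bash

# Hypothetical helpers, named for illustration only.

# Unquoted right-hand side inside [[ ]]: $2 is treated as a glob pattern.
matches_as_pattern() { [[ "$1" == $2 ]]; }

# Quoted right-hand side: $2 is compared character by character.
matches_literally() { [[ "$1" == "$2" ]]; }

# Inside [ ], < must be escaped; a bare < would be parsed as a redirection.
sorts_before() { [ "$1" \< "$2" ]; }

value="file.txt"
pattern="*.txt"

if matches_as_pattern "$value" "$pattern"; then
  echo "pattern match"
fi

if matches_literally "$value" "$pattern"; then
  echo "literal match"
else
  echo "no literal match"
fi

if sorts_before "apple" "banana"; then
  echo "apple sorts before banana"
fi
```

With these values the script reports a pattern match, no literal match, and that apple sorts before banana.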

Empty checks

Empty strings are strings that have zero length, meaning they contain no characters. Sometimes, it is useful to check if a string is empty before performing some actions or operations on it. For example, you may want to validate the user input, or avoid errors when concatenating strings.

There are different ways to check if a string is empty in Bash, but the most common are the -z and -n operators. The -z operator returns true if the string has zero length, and false otherwise. The -n operator returns true if the string has non-zero length, and false otherwise.

#!/bin/bash

# Define a variable with an empty string
var=""

# Check if the variable is empty using -z
if [ -z "$var" ]; then
  echo "The variable is empty"
else
  echo "The variable is not empty"
fi

# Check if the variable is not empty using -n
if [ -n "$var" ]; then
  echo "The variable is not empty"
else
  echo "The variable is empty"
fi

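Note that you need to quote the variable when using these operators inside [ ... ]. If the variable is empty and unquoted, [ -n $var ] collapses to [ -n ], which is true because the single remaining argument, -n, is itself a non-empty string. If the variable contains spaces, the unquoted test fails with a syntax error.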
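
The quoting pitfall can be reproduced with a small sketch. The two helper functions are hypothetical and differ only in whether the argument is quoted; the unquoted version is intentionally wrong.

```shell
#!/bin/bash

# Hypothetical helpers for illustration only.

# Intentionally unquoted: an empty $1 vanishes, leaving [ -n ], which is true.
is_nonempty_unquoted() { [ -n $1 ]; }

# Quoted: [ receives an empty argument and the test fails as expected.
is_nonempty_quoted() { [ -n "$1" ]; }

var=""

if is_nonempty_unquoted "$var"; then
  echo "unquoted test: reported as not empty (wrong)"
fi

if is_nonempty_quoted "$var"; then
  echo "quoted test: not empty"
else
  echo "quoted test: empty (correct)"
fi
```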

Equality checks

You can use both = and == to check if the strings are equal, and != to check if the strings are not equal. The = operator is defined by POSIX and works in any POSIX shell, while == is an extension supported by Bash and several other shells but not required by POSIX. All three are case-sensitive.

#!/bin/bash

string1="Hello"
string2="World"

# Check if string1 is equal to string2
if [ "$string1" = "$string2" ]; then
  echo "Strings are equal."
else
  echo "Strings are not equal."
fi

# Check if string1 is not equal to string2
if [ "$string1" != "$string2" ]; then
  echo "Strings are not equal."
else
  echo "Strings are equal."
fi

To compare strings case-insensitively in Bash, one common approach is to convert both strings to either lowercase or uppercase before comparing them. The ${var,,} and ${var^^} expansions used here require Bash 4.0 or later.

#!/bin/bash

string1="Hello"
string2="hello"

# Convert both strings to lowercase before comparison
if [[ "${string1,,}" == "${string2,,}" ]]; then
  echo "The strings are equal (case-insensitive)."
else
  echo "The strings are not equal (case-insensitive)."
fi

# Convert both strings to uppercase before comparison
if [[ "${string1^^}" == "${string2^^}" ]]; then
  echo "The strings are equal (case-insensitive)."
else
  echo "The strings are not equal (case-insensitive)."
fi
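
Another technique is Bash's nocasematch shell option, which makes comparisons inside [[ ... ]] ignore case. The following is a minimal sketch; the function name equals_ignore_case is hypothetical, and running the comparison in a subshell is one possible way to keep the option from affecting the rest of the script.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if both arguments are equal, ignoring case.
equals_ignore_case() {
  # The parentheses start a subshell, so nocasematch is only set in there
  # and the caller's shell options stay untouched.
  (
    shopt -s nocasematch
    [[ "$1" == "$2" ]]
  )
}

string1="Hello"
string2="hELLO"

if equals_ignore_case "$string1" "$string2"; then
  echo "The strings are equal (case-insensitive)."
else
  echo "The strings are not equal (case-insensitive)."
fi
```

The subshell costs a fork for each call, so converting the case of both strings is likely the better fit inside tight loops.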

Partial comparison

Partial string comparison checks whether a string contains another string (a substring). Bash offers several ways to do this.

You can use the * wildcard, which matches zero or more characters. Placing it on both sides of a quoted substring matches any string that contains that substring.

#!/bin/bash

string="This is a sample text."
substring="sample"

if [[ "$string" == *"$substring"* ]]; then
  echo "Substring found: $substring"
else
  echo "Substring not found: $substring"
fi
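
The same wildcard can be placed on only one side to test for a prefix or a suffix. A short sketch, with hypothetical helper names and an illustrative file name:

```shell
#!/bin/bash

# Hypothetical helpers for illustration only.

# Succeeds if $1 begins with $2 (wildcard after the quoted prefix).
starts_with() { [[ "$1" == "$2"* ]]; }

# Succeeds if $1 ends with $2 (wildcard before the quoted suffix).
ends_with() { [[ "$1" == *"$2" ]]; }

filename="report-2023.csv"

if starts_with "$filename" "report-"; then
  echo "Prefix found: report-"
fi

if ends_with "$filename" ".csv"; then
  echo "Suffix found: .csv"
fi
```

Keeping the prefix and suffix quoted ensures that any wildcard characters they contain are matched literally.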

You can also use the grep command, which searches text for a pattern. By default, grep prints the lines that contain a match; with the -q option it prints nothing and reports the result only through its exit status, which makes it suitable for an if condition. Note that grep treats the pattern as a regular expression by default; add the -F option to match it as a fixed string.

#!/bin/bash

string="Bash is awesome!"

# Check if the substring "awesome" exists in the larger string using grep
# (-q: print nothing, -F: treat the pattern as a fixed string)
if echo "$string" | grep -qF "awesome"; then
  echo "Substring found: awesome"
else
  echo "Substring not found: awesome"
fi
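
When starting an external process is undesirable, a case statement offers a substring check that relies only on shell pattern matching and also works in POSIX shells. A minimal sketch, where the function name contains is hypothetical:

```shell
#!/bin/bash

# Hypothetical helper: succeeds if $1 contains $2 as a literal substring.
contains() {
  case "$1" in
    *"$2"*) return 0 ;;  # quoted $2 is matched literally
    *)      return 1 ;;
  esac
}

string="Bash is awesome!"

if contains "$string" "awesome"; then
  echo "Substring found: awesome"
else
  echo "Substring not found: awesome"
fi
```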

Lexicographical comparisons

Lexicographical comparison, also known as dictionary order or alphabetical order, is a way of comparing strings based on the order of their characters. The comparison is performed character by character, starting from the first character of each string and moving from left to right until a difference is found or one of the strings ends.

The ordering of individual characters depends on the construct. With [[ ... ]], Bash compares strings using the collation order of the current locale. With [ ... ], Bash uses ASCII ordering by default. In ASCII, each character is assigned a numerical value, and characters with lower values come before characters with higher values, so all uppercase letters sort before all lowercase letters.

#!/bin/bash

string1="apple"
string2="banana"

if [[ "$string1" < "$string2" ]]; then
  echo "$string1 comes before $string2 in lexicographical order."
else
  echo "$string1 does not come before $string2 in lexicographical order."
fi
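
Because the comparison proceeds character by character, strings of digits are not ordered by numeric value. The sketch below contrasts the string operator < with the integer operator -lt; the helper names are hypothetical.

```shell
#!/bin/bash

# Hypothetical helpers for illustration only.

# String comparison: decided by the first differing character.
string_less() { [[ "$1" < "$2" ]]; }

# Integer comparison: decided by numeric value.
numeric_less() { [ "$1" -lt "$2" ]; }

a="10"
b="9"

# "1" sorts before "9", so "10" comes first as a string.
if string_less "$a" "$b"; then
  echo "As strings: $a sorts before $b"
fi

# As integers, 10 is not less than 9.
if numeric_less "$a" "$b"; then
  echo "As numbers: $a is less than $b"
else
  echo "As numbers: $a is not less than $b"
fi
```

Use -lt, -gt, and the other integer operators, or an arithmetic expression in (( ... )), whenever the values are numbers.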

Pattern matching with extended regular expressions

The =~ operator inside [[ ... ]] checks whether a string matches a POSIX extended regular expression (ERE). POSIX defines two regular expression syntaxes, basic (BRE) and extended (ERE). In ERE, metacharacters such as +, ?, |, and parentheses do not need to be escaped, which makes expressions more readable. Leave the pattern on the right-hand side unquoted, because any quoted part is matched as a literal string.

#!/bin/bash

email="john.doe@example.com"

# Check if the string matches the pattern for an email address
if [[ "$email" =~ ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ ]]; then
  echo "Valid email address: $email"
else
  echo "Invalid email address: $email"
fi
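
After a successful =~ match, Bash stores the text matched by each parenthesized group in the BASH_REMATCH array. The sketch below reuses the email pattern with two capture groups added, which is an assumption made for illustration; the names email_pattern and split_email are hypothetical. Storing the pattern in a variable and expanding it unquoted avoids quoting problems inside [[ ... ]].

```shell
#!/bin/bash

# Email pattern with two capture groups added for illustration:
# group 1 is the part before the @, group 2 is the domain.
email_pattern='^([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})$'

# Hypothetical helper: prints "<local part> <domain>" and succeeds on a match,
# prints nothing and fails otherwise.
split_email() {
  if [[ "$1" =~ $email_pattern ]]; then
    # BASH_REMATCH[0] holds the whole match; [1] and [2] hold the groups.
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

split_email "john.doe@example.com"   # prints: john.doe example.com
```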