Cara mengurutkan array di Bash

Question 1

Saya memiliki array di Bash, misalnya:

array=(a c b f 3 5)

Saya perlu mengurutkan array. Tidak hanya menampilkan konten dengan cara yang diurutkan, tetapi untuk mendapatkan array baru dengan elemen yang diurutkan. Array yang diurutkan baru bisa menjadi yang benar-benar baru atau yang lama.

Question 2

Anda tidak benar-benar membutuhkan banyak kode:

IFS=$'\n' sorted=($(sort <<<"${array[*]}"))
unset IFS

Mendukung spasi dalam elemen (selama itu bukan baris baru), dan berfungsi di Bash 3.x.

misalnya:

$ array=("a c" b f "3 5")
$ IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS
$ printf "[%s]\n" "${sorted[@]}"
[3 5]
[a c]
[b]
[f]

Catatan: @sorontar telah menunjukkan bahwa kehati-hatian diperlukan jika elemen berisi karakter pengganti seperti *atau ?:

Bagian diurutkan = ($ (...)) menggunakan operator "split and glob". Anda harus mematikan glob: set -fatau set -o noglobatau shopt -op noglobatau elemen dari array *akan diperluas ke daftar file.

Apa yang terjadi:

Hasilnya adalah enam hal puncak yang terjadi dalam urutan ini:

IFS=$'\n'
"${array[*]}"
<<<
sort
sorted=($(...))
unset IFS

Pertama, `IFS=$'\n'`

Ini adalah bagian penting dari operasi kami yang memengaruhi hasil 2 dan 5 dengan cara berikut:

Diberikan:

"${array[*]}" meluas ke setiap elemen yang dibatasi oleh karakter pertama IFS
sorted=() membuat elemen dengan memisahkan setiap karakter IFS

IFS=$'\n' mengatur segala sesuatunya sehingga elemen diperluas menggunakan baris baru sebagai pemisah, dan kemudian dibuat sedemikian rupa sehingga setiap baris menjadi elemen. (yaitu Memisahkan di baris baru.)

Membatasi dengan baris baru penting karena begitulah cara sortkerjanya (pengurutan per baris). Memisahkan hanya dengan baris baru tidak begitu penting, tetapi diperlukan untuk mempertahankan elemen yang berisi spasi atau tab.

Nilai defaultnya IFSadalah spasi , tab , diikuti oleh baris baru , dan tidak sesuai untuk operasi kita.

Selanjutnya, `sort <<<"${array[*]}"`bagian

<<<, disebut di sini string , mengambil perluasan dari "${array[*]}", seperti dijelaskan di atas, dan memasukkannya ke dalam input standar sort.

Dengan contoh kami, sortdiberi makan string berikut ini:

a c
b
f
3 5

Sejak sort diurutkan , menghasilkan:

3 5
a c
b
f

Selanjutnya, `sorted=($(...))`bagian

Bagian $(...), yang disebut substitusi perintah , menyebabkan kontennya ( sort <<<"${array[*]}) berjalan sebagai perintah normal, sementara mengambil keluaran standar yang dihasilkan sebagai literal yang berjalan di mana pun $(...).

Dalam contoh kami, ini menghasilkan sesuatu yang mirip dengan sekadar menulis:

sorted=(3 5
a c
b
f
)

sorted then becomes an array that's created by splitting this literal on every new line.

Finally, the `unset IFS`

This resets the value of IFS to the default value, and is just good practice.

It's to ensure we don't cause trouble with anything that relies on IFS later in our script. (Otherwise we'd need to remember that we've switched things around--something that might be impractical for complex scripts.)

Question 3

Original response:

array=(a c b "f f" 3 5)
readarray -t sorted < <(for a in "${array[@]}"; do echo "$a"; done | sort)

output:

$ for a in "${sorted[@]}"; do echo "$a"; done
3
5
a
b
c
f f

Note this version copes with values that contains special characters or whitespace (except newlines)

Note readarray is supported in bash 4+.

Edit Based on the suggestion by @Dimitre I had updated it to:

readarray -t sorted < <(printf '%s\0' "${array[@]}" | sort -z | xargs -0n1)

which has the benefit of even understanding sorting elements with newline characters embedded correctly. Unfortunately, as correctly signaled by @ruakh this didn't mean the the result of readarray would be correct, because readarray has no option to use NUL instead of regular newlines as line-separators.

Question 4

Here's a pure Bash quicksort implementation:

#!/bin/bash

# quicksorts positional arguments
# return is in array qsort_ret
qsort() {
   local pivot i smaller=() larger=()
   qsort_ret=()
   (($#==0)) && return 0
   pivot=$1
   shift
   for i; do
      if (( i < pivot )); then
         smaller+=( "$i" )
      else
         larger+=( "$i" )
      fi
   done
   qsort "${smaller[@]}"
   smaller=( "${qsort_ret[@]}" )
   qsort "${larger[@]}"
   larger=( "${qsort_ret[@]}" )
   qsort_ret=( "${smaller[@]}" "$pivot" "${larger[@]}" )
}

Use as, e.g.,

$ array=(a c b f 3 5)
$ qsort "${array[@]}"
$ declare -p qsort_ret
declare -a qsort_ret='([0]="3" [1]="5" [2]="a" [3]="b" [4]="c" [5]="f")'

This implementation is recursive… so here's an iterative quicksort:

#!/bin/bash

# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
qsort() {
   (($#==0)) && return 0
   local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
   qsort_ret=("$@")
   while ((${#stack[@]})); do
      beg=${stack[0]}
      end=${stack[1]}
      stack=( "${stack[@]:2}" )
      smaller=() larger=()
      pivot=${qsort_ret[beg]}
      for ((i=beg+1;i<=end;++i)); do
         if [[ "${qsort_ret[i]}" < "$pivot" ]]; then
            smaller+=( "${qsort_ret[i]}" )
         else
            larger+=( "${qsort_ret[i]}" )
         fi
      done
      qsort_ret=( "${qsort_ret[@]:0:beg}" "${smaller[@]}" "$pivot" "${larger[@]}" "${qsort_ret[@]:end+1}" )
      if ((${#smaller[@]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[@]}-1))" ); fi
      if ((${#larger[@]}>=2)); then stack+=( "$((end-${#larger[@]}+1))" "$end" ); fi
   done
}

In both cases, you can change the order you use: I used string comparisons, but you can use arithmetic comparisons, compare wrt file modification time, etc. just use the appropriate test; you can even make it more generic and have it use a first argument that is the test function use, e.g.,

#!/bin/bash

# quicksorts positional arguments
# return is in array qsort_ret
# Note: iterative, NOT recursive! :)
# First argument is a function name that takes two arguments and compares them
qsort() {
   (($#<=1)) && return 0
   local compare_fun=$1
   shift
   local stack=( 0 $(($#-1)) ) beg end i pivot smaller larger
   qsort_ret=("$@")
   while ((${#stack[@]})); do
      beg=${stack[0]}
      end=${stack[1]}
      stack=( "${stack[@]:2}" )
      smaller=() larger=()
      pivot=${qsort_ret[beg]}
      for ((i=beg+1;i<=end;++i)); do
         if "$compare_fun" "${qsort_ret[i]}" "$pivot"; then
            smaller+=( "${qsort_ret[i]}" )
         else
            larger+=( "${qsort_ret[i]}" )
         fi
      done
      qsort_ret=( "${qsort_ret[@]:0:beg}" "${smaller[@]}" "$pivot" "${larger[@]}" "${qsort_ret[@]:end+1}" )
      if ((${#smaller[@]}>=2)); then stack+=( "$beg" "$((beg+${#smaller[@]}-1))" ); fi
      if ((${#larger[@]}>=2)); then stack+=( "$((end-${#larger[@]}+1))" "$end" ); fi
   done
}

Then you can have this comparison function:

compare_mtime() { [[ $1 -nt $2 ]]; }

and use:

$ qsort compare_mtime *
$ declare -p qsort_ret

to have the files in current folder sorted by modification time (newest first).

NOTE. These functions are pure Bash! no external utilities, and no subshells! they are safe wrt any funny symbols you may have (spaces, newline characters, glob characters, etc.).

Question 5

If you don't need to handle special shell characters in the array elements:

array=(a c b f 3 5)
sorted=($(printf '%s\n' "${array[@]}"|sort))

With bash you'll need an external sorting program anyway.

With zsh no external programs are needed and special shell characters are easily handled:

% array=('a a' c b f 3 5); printf '%s\n' "${(o)array[@]}" 
3
5
a a
b
c
f

ksh has set -s to sort ASCIIbetically.

Question 6

tl;dr:

Sort array a_in and store the result in a_out (elements must not have embedded newlines^[1] ):

Bash v4+:

readarray -t a_out < <(printf '%s\n' "${a_in[@]}" | sort)

Bash v3:

IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[@]}" | sort)

Advantages over antak's solution:

You needn't worry about accidental globbing (accidental interpretation of the array elements as filename patterns), so no extra command is needed to disable globbing (set -f, and set +f to restore it later).
You needn't worry about resetting IFS with unset IFS.^[2]

Optional reading: explanation and sample code

The above combines Bash code with external utility sort for a solution that works with arbitrary single-line elements and either lexical or numerical sorting (optionally by field):

Performance: For around 20 elements or more, this will be faster than a pure Bash solution - significantly and increasingly so once you get beyond around 100 elements.
^{(The exact thresholds will depend on your specific input, machine, and platform.)}
- The reason it is fast is that it avoids Bash loops.
printf '%s\n' "${a_in[@]}" | sort performs the sorting (lexically, by default - see sort's POSIX spec):
- "${a_in[@]}" safely expands to the elements of array a_in as individual arguments, whatever they contain (including whitespace).
- printf '%s\n' then prints each argument - i.e., each array element - on its own line, as-is.
Note the use of a process substitution (<(...)) to provide the sorted output as input to read / readarray (via redirection to stdin, <), because read / readarray must run in the current shell (must not run in a subshell) in order for output variable a_out to be visible to the current shell (for the variable to remain defined in the remainder of the script).
Reading sort's output into an array variable:
- Bash v4+: readarray -t a_out reads the individual lines output by sort into the elements of array variable a_out, without including the trailing \n in each element (-t).
- Bash v3: readarray doesn't exist, so read must be used:
  IFS=$'\n' read -d '' -r -a a_out tells read to read into array (-a) variable a_out, reading the entire input, across lines (-d ''), but splitting it into array elements by newlines (IFS=$'\n'. $'\n', which produces a literal newline (LF), is a so-called ANSI C-quoted string).
  (-r, an option that should virtually always be used with read, disables unexpected handling of \ characters.)

Annotated sample code:

#!/usr/bin/env bash

# Define input array `a_in`:
# Note the element with embedded whitespace ('a c')and the element that looks like
# a glob ('*'), chosen to demonstrate that elements with line-internal whitespace
# and glob-like contents are correctly preserved.
a_in=( 'a c' b f 5 '*' 10 )

# Sort and store output in array `a_out`
# Saving back into `a_in` is also an option.
IFS=$'\n' read -d '' -r -a a_out < <(printf '%s\n' "${a_in[@]}" | sort)
# Bash 4.x: use the simpler `readarray -t`:
# readarray -t a_out < <(printf '%s\n' "${a_in[@]}" | sort)

# Print sorted output array, line by line:
printf '%s\n' "${a_out[@]}"

Due to use of sort without options, this yields lexical sorting (digits sort before letters, and digit sequences are treated lexically, not as numbers):

*
10
5
a c
b
f

If you wanted numerical sorting by the 1st field, you'd use sort -k1,1n instead of just sort, which yields (non-numbers sort before numbers, and numbers sort correctly):

*
a c
b
f
5
10

^{[1] To handle elements with embedded newlines, use the following variant (Bash v4+, with GNU sort):

readarray -d '' -t a_out < <(printf '%s\0' "${a_in[@]}" | sort -z).

Michał Górny's helpful answer has a Bash v3 solution.}

^{[2] While IFS is set in the Bash v3 variant, the change is scoped to the command.

By contrast, what follows IFS=$'\n' in antak's answer is an assignment rather than a command, in which case the IFS change is global.}

Question 7

In the 3-hour train trip from Munich to Frankfurt (which I had trouble to reach because Oktoberfest starts tomorrow) I was thinking about my first post. Employing a global array is a much better idea for a general sort function. The following function handles arbitary strings (newlines, blanks etc.):

declare BSORT=()
function bubble_sort()
{   #
    # @param [ARGUMENTS]...
    #
    # Sort all positional arguments and store them in global array BSORT.
    # Without arguments sort this array. Return the number of iterations made.
    #
    # Bubble sorting lets the heaviest element sink to the bottom.
    #
    (($# > 0)) && BSORT=("$@")
    local j=0 ubound=$((${#BSORT[*]} - 1))
    while ((ubound > 0))
    do
        local i=0
        while ((i < ubound))
        do
            if [ "${BSORT[$i]}" \> "${BSORT[$((i + 1))]}" ]
            then
                local t="${BSORT[$i]}"
                BSORT[$i]="${BSORT[$((i + 1))]}"
                BSORT[$((i + 1))]="$t"
            fi
            ((++i))
        done
        ((++j))
        ((--ubound))
    done
    echo $j
}

bubble_sort a c b 'z y' 3 5
echo ${BSORT[@]}

This prints:

3 5 a b c z y

The same output is created from

BSORT=(a c b 'z y' 3 5) 
bubble_sort
echo ${BSORT[@]}

Note that probably Bash internally uses smart-pointers, so the swap-operation could be cheap (although I doubt it). However, bubble_sort demonstrates that more advanced functions like merge_sort are also in the reach of the shell language.

Question 8

Another solution that uses external sort and copes with any special characters (except for NULs :)). Should work with bash-3.2 and GNU or BSD sort (sadly, POSIX doesn't include -z).

local e new_array=()
while IFS= read -r -d '' e; do
    new_array+=( "${e}" )
done < <(printf "%s\0" "${array[@]}" | LC_ALL=C sort -z)

First look at the input redirection at the end. We're using printf built-in to write out the array elements, zero-terminated. The quoting makes sure array elements are passed as-is, and specifics of shell printf cause it to reuse the last part of format string for each remaining parameter. That is, it's equivalent to something like:

for e in "${array[@]}"; do
    printf "%s\0" "${e}"
done

The null-terminated element list is then passed to sort. The -z option causes it to read null-terminated elements, sort them and output null-terminated as well. If you needed to get only the unique elements, you can pass -u since it is more portable than uniq -z. The LC_ALL=C ensures stable sort order independently of locale — sometimes useful for scripts. If you want the sort to respect locale, remove that.

The <() construct obtains the descriptor to read from the spawned pipeline, and < redirects the standard input of the while loop to it. If you need to access the standard input inside the pipe, you may use another descriptor — exercise for the reader :).

Now, back to the beginning. The read built-in reads output from the redirected stdin. Setting empty IFS disables word splitting which is unnecessary here — as a result, read reads the whole 'line' of input to the single provided variable. -r option disables escape processing that is undesired here as well. Finally, -d '' sets the line delimiter to NUL — that is, tells read to read zero-terminated strings.

As a result, the loop is executed once for every successive zero-terminated array element, with the value being stored in e. The example just puts the items in another array but you may prefer to process them directly :).

Of course, that's just one of the many ways of achieving the same goal. As I see it, it is simpler than implementing complete sorting algorithm in bash and in some cases it will be faster. It handles all special characters including newlines and should work on most of the common systems. Most importantly, it may teach you something new and awesome about bash :).

Question 9

try this:

echo ${array[@]} | awk 'BEGIN{RS=" ";} {print $1}' | sort

Output will be:

3
5
a
b
c
f

Problem solved.

Question 10

If you can compute a unique integer for each element in the array, like this:

tab='0123456789abcdefghijklmnopqrstuvwxyz'

# build the reversed ordinal map
for ((i = 0; i < ${#tab}; i++)); do
    declare -g ord_${tab:i:1}=$i
done

function sexy_int() {
    local sum=0
    local i ch ref
    for ((i = 0; i < ${#1}; i++)); do
        ch="${1:i:1}"
        ref="ord_$ch"
        (( sum += ${!ref} ))
    done
    return $sum
}

sexy_int hello
echo "hello -> $?"
sexy_int world
echo "world -> $?"

then, you can use these integers as array indexes, because Bash always use sparse array, so no need to worry about unused indexes:

array=(a c b f 3 5)
for el in "${array[@]}"; do
    sexy_int "$el"
    sorted[$?]="$el"
done

echo "${sorted[@]}"

Pros. Fast.
Cons. Duplicated elements are merged, and it can be impossible to map contents to 32-bit unique integers.

Question 11

min sort:

#!/bin/bash
array=(.....)
index_of_element1=0

while (( ${index_of_element1} < ${#array[@]} )); do

    element_1="${array[${index_of_element1}]}"

    index_of_element2=$((index_of_element1 + 1))
    index_of_min=${index_of_element1}

    min_element="${element_1}"

        for element_2 in "${array[@]:$((index_of_element1 + 1))}"; do
            min_element="`printf "%s\n%s" "${min_element}" "${element_2}" | sort | head -n+1`"      
            if [[ "${min_element}" == "${element_2}" ]]; then
                index_of_min=${index_of_element2}
            fi
            let index_of_element2++
        done

        array[${index_of_element1}]="${min_element}"
        array[${index_of_min}]="${element_1}"

    let index_of_element1++
done

Question 12

array=(a c b f 3 5)
new_array=($(echo "${array[@]}" | sed 's/ /\n/g' | sort))    
echo ${new_array[@]}

echo contents of new_array will be:

3 5 a b c f

Question 13

There is a workaround for the usual problem of spaces and newlines:

Use a character that is not in the original array (like $'\1' or $'\4' or similar).

This function gets the job done:

# Sort an Array may have spaces or newlines with a workaround (wa=$'\4')
sortarray(){ local wa=$'\4' IFS=''
             if [[ $* =~ [$wa] ]]; then
                 echo "$0: error: array contains the workaround char" >&2
                 exit 1
             fi

             set -f; local IFS=$'\n' x nl=$'\n'
             set -- $(printf '%s\n' "${@//$nl/$wa}" | sort -n)
             for    x
             do     sorted+=("${x//$wa/$nl}")
             done
       }

This will sort the array:

$ array=( a b 'c d' $'e\nf' $'g\1h')
$ sortarray "${array[@]}"
$ printf '<%s>\n' "${sorted[@]}"
<a>
<b>
<c d>
<e
f>
<gh>

This will complain that the source array contains the workaround character:

$ array=( a b 'c d' $'e\nf' $'g\4h')
$ sortarray "${array[@]}"
./script: error: array contains the workaround char

description

We set two local variables wa (workaround char) and a null IFS
Then (with ifs null) we test that the whole array $*.
Does not contain any woraround char [[ $* =~ [$wa] ]].
If it does, raise a message and signal an error: exit 1
Avoid filename expansions: set -f
Set a new value of IFS (IFS=$'\n') a loop variable x and a newline var (nl=$'\n').
We print all values of the arguments received (the input array $@).
but we replace any new line by the workaround char "${@//$nl/$wa}".
send those values to be sorted sort -n.
and place back all the sorted values in the positional arguments set --.
Then we assign each argument one by one (to preserve newlines).
in a loop for x
to a new array: sorted+=(…)
inside quotes to preserve any existing newline.
restoring the workaround to a newline "${x//$wa/$nl}".
done

Question 14

This question looks closely related. And BTW, here's a mergesort in Bash (without external processes):

mergesort() {
  local -n -r input_reference="$1"
  local -n output_reference="$2"
  local -r -i size="${#input_reference[@]}"
  local merge previous
  local -a -i runs indices
  local -i index previous_idx merged_idx \
           run_a_idx run_a_stop \
           run_b_idx run_b_stop

  output_reference=("${input_reference[@]}")
  if ((size == 0)); then return; fi

  previous="${output_reference[0]}"
  runs=(0)
  for ((index = 0;;)) do
    for ((++index;; ++index)); do
      if ((index >= size)); then break 2; fi
      if [[ "${output_reference[index]}" < "$previous" ]]; then break; fi
      previous="${output_reference[index]}"
    done
    previous="${output_reference[index]}"
    runs+=(index)
  done
  runs+=(size)

  while (("${#runs[@]}" > 2)); do
    indices=("${!runs[@]}")
    merge=("${output_reference[@]}")
    for ((index = 0; index < "${#indices[@]}" - 2; index += 2)); do
      merged_idx=runs[indices[index]]
      run_a_idx=merged_idx
      previous_idx=indices[$((index + 1))]
      run_a_stop=runs[previous_idx]
      run_b_idx=runs[previous_idx]
      run_b_stop=runs[indices[$((index + 2))]]
      unset runs[previous_idx]
      while ((run_a_idx < run_a_stop && run_b_idx < run_b_stop)); do
        if [[ "${merge[run_a_idx]}" < "${merge[run_b_idx]}" ]]; then
          output_reference[merged_idx++]="${merge[run_a_idx++]}"
        else
          output_reference[merged_idx++]="${merge[run_b_idx++]}"
        fi
      done
      while ((run_a_idx < run_a_stop)); do
        output_reference[merged_idx++]="${merge[run_a_idx++]}"
      done
      while ((run_b_idx < run_b_stop)); do
        output_reference[merged_idx++]="${merge[run_b_idx++]}"
      done
    done
  done
}

declare -ar input=({z..a}{z..a})
declare -a output

mergesort input output

echo "${input[@]}"
echo "${output[@]}"

Question 15

I am not convinced that you'll need an external sorting program in Bash.

Here is my implementation for the simple bubble-sort algorithm.

function bubble_sort()
{   #
    # Sorts all positional arguments and echoes them back.
    #
    # Bubble sorting lets the heaviest (longest) element sink to the bottom.
    #
    local array=($@) max=$(($# - 1))
    while ((max > 0))
    do
        local i=0
        while ((i < max))
        do
            if [ ${array[$i]} \> ${array[$((i + 1))]} ]
            then
                local t=${array[$i]}
                array[$i]=${array[$((i + 1))]}
                array[$((i + 1))]=$t
            fi
            ((i += 1))
        done
        ((max -= 1))
    done
    echo ${array[@]}
}

array=(a c b f 3 5)
echo " input: ${array[@]}"
echo "output: $(bubble_sort ${array[@]})"

This shall print:

 input: a c b f 3 5
output: 3 5 a b c f

Question 16

a=(e b 'c d')
shuf -e "${a[@]}" | sort >/tmp/f
mapfile -t g </tmp/f

Question 17

sorted=($(echo ${array[@]} | tr " " "\n" | sort))

In the spirit of bash / linux, I would pipe the best command-line tool for each step. sort does the main job but needs input separated by newline instead of space, so the very simple pipeline above simply does:

Echo array content --> replace space by newline --> sort

$() is to echo the result

($()) is to put the "echoed result" in an array

Note: as @sorontar mentioned in a comment to a different question:

The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off: set -f or set -o noglob or shopt -op noglob or an element of the array like * will be expanded to a list of files.

Cara mengurutkan array di Bash

Apa yang terjadi:

Pertama, IFS=$'\n'

Selanjutnya, sort <<<"${array[*]}"bagian

Selanjutnya, sorted=($(...))bagian

Finally, the unset IFS