Edit: I've fixed a typo in the regex. It needed \x80, not \80.
The regex to filter out invalid UTF-8 forms, for strict adherence to UTF-8, is as follows:
perl -l -ne '/
^( ([\x00-\x7F]) # 1-byte pattern
|([\xC2-\xDF][\x80-\xBF]) # 2-byte pattern
|((([\xE0][\xA0-\xBF])|([\xED][\x80-\x9F])|([\xE1-\xEC\xEE-\xEF][\x80-\xBF]))([\x80-\xBF])) # 3-byte pattern
|((([\xF0][\x90-\xBF])|([\xF1-\xF3][\x80-\xBF])|([\xF4][\x80-\x8F]))([\x80-\xBF]{2})) # 4-byte pattern
)*$ /x or print'
Output (key lines only, from Test 1):
Codepoint
=========
00001000 Test=1 mode=strict valid,invalid,fail=(1000,0,0)
0000E000 Test=1 mode=strict valid,invalid,fail=(D800,800,0)
0010FFFF mode=strict test-return=(0,0) valid,invalid,fail=(10F800,800,0)
Q. How does one create test data to test a regex which filters out invalid Unicode?
A. Create your own UTF-8 test algorithm, and break its rules...
Catch-22: but then, how do you test your test algorithm?
The regex above has been tested (using iconv as the reference) for every integer value from 0x00000 to 0x10FFFF. This upper value is the maximum integer value of a Unicode codepoint.
According to the Wikipedia UTF-8 page:
- UTF-8 encodes each of the 1,112,064 code points in the Unicode character set, using one to four 8-bit bytes.
This number (1,112,064) equates to a range of 0x000000 to 0x10F7FF, which is 0x0800 shy of the actual maximum integer value for the highest Unicode codepoint: 0x10FFFF.
This block of integers is missing from the spectrum of Unicode codepoints because the UTF-16 encoding needed to step beyond its original design intent, via a system called surrogate pairs. A block of 0x0800 integers has been reserved for use by UTF-16. This block spans the range 0x00D800 to 0x00DFFF. None of these integers is a legal Unicode value, and therefore none is a valid UTF-8 value.
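The arithmetic behind these figures can be checked directly in the shell. The variable names below are mine; the constants are the ones quoted above.

```shell
#!/usr/bin/env bash
# Count the integers in the Unicode range and subtract the surrogate block.
total=$(( 0x10FFFF + 1 ))               # every integer 0x000000..0x10FFFF
surrogates=$(( 0xDFFF - 0xD800 + 1 ))   # block reserved for UTF-16 surrogates
valid=$(( total - surrogates ))         # what is left for real code points

printf 'total=%d surrogates=0x%X valid=%d (0x%X)\n' \
       "$total" "$surrogates" "$valid" "$valid"
```

The valid count comes out as 1112064, i.e. 0x10F800, which is the size of a 0x000000 to 0x10F7FF range and matches the figure quoted from Wikipedia.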
In Test 1, the regex has been tested against every number in the range of Unicode codepoints, and it matches the results of iconv exactly, i.e. 0x10F800 valid values and 0x000800 invalid values.
However, the question now arises: how does the regex handle out-of-range UTF-8 values, above 0x10FFFF? (UTF-8 can extend to 6 bytes, with a maximum integer value of 0x7FFFFFFF.)
To generate the necessary non-Unicode UTF-8 byte values, I've used the following command:
perl -C -e 'print chr 0x'$hexUTF32BE
To test their validity (in some fashion), I've used Gilles' UTF-8 regex:
perl -l -ne '/
^( [\000-\177] # 1-byte pattern
|[\300-\337][\200-\277] # 2-byte pattern
|[\340-\357][\200-\277]{2} # 3-byte pattern
|[\360-\367][\200-\277]{3} # 4-byte pattern
|[\370-\373][\200-\277]{4} # 5-byte pattern
|[\374-\375][\200-\277]{5} # 6-byte pattern
)*$ /x or print'
The output of perl's print chr matches the filtering of Gilles' regex. One reinforces the validity of the other. I can't use iconv here, because it only handles the valid Unicode Standard subset of the broader (original) UTF-8 standard.
The numbers involved are rather large, so I've tested top-of-range, bottom-of-range, and several scans stepping by increments such as 11111, 13579, 33333, 53441... The results all match, so all that remains is to test the regex against these out-of-range UTF-8-style values (invalid for Unicode, and therefore also invalid for strict UTF-8 itself).
Here is the test module:
[[ "$(locale charmap)" != "UTF-8" ]] && { echo "ERROR: locale must be UTF-8, but it is $(locale charmap)"; exit 1; }
# Testing the UTF-8 regex
#
# Tests to check that the observed byte-ranges (above) have
# been accurately observed and included in the test code and final regex.
# =========================================================================
: 2 bytes; B2=0 # run-test=1 do-not-test=0
: 3 bytes; B3=0 # run-test=1 do-not-test=0
: 4 bytes; B4=0 # run-test=1 do-not-test=0
: regex; Rx=1 # run-test=1 do-not-test=0
((strict=16)); mode[$strict]=strict # iconv -f UTF-16BE then iconv -f UTF-32BE beyond 0xFFFF)
(( lax=32)); mode[$lax]=lax # iconv -f UTF-32BE only)
# modebits=$strict
# UTF-8, in relation to UTF-16 has invalid values
# modebits=$strict automatically shifts to modebits=$lax
# when the tested integer exceeds 0xFFFF
# modebits=$lax
# UTF-8, in relation to UTF-32, has no restrictions
# Test 1 Sequentially tests a range of Big-Endian integers
# * Unicode Codepoints are a subset of Big-Endian integers
# ( based on 'iconv' -f UTF-32BE -t UTF-8 )
# Note: strict UTF-8 has a few quirks because of UTF-16
# Set modebits=16 to "strictly" test the low range
Test=1; modebits=$strict
# Test=2; modebits=$lax
# Test=3
mode3wlo=$(( 1*4)) # minimum chars * 4 ( '4' is for UTF-32BE )
mode3whi=$((10*4)) # maximum chars * 4 ( '4' is for UTF-32BE )
#########################################################################
# 1 byte UTF-8 values: Nothing to do; no complexities.
#########################################################################
# 2 Byte UTF-8 values: Verifying that I've got the right range values.
if ((B2==1)) ; then
echo "# Test 2 bytes for Valid UTF-8 values: ie. values which are in range"
# =========================================================================
time \
for d1 in {194..223} ;do
# bin oct hex dec
# lo 11000010 302 C2 194
# hi 11011111 337 DF 223
B2b1=$(printf "%0.2X" $d1)
#
for d2 in {128..191} ;do
# bin oct hex dec
# lo 10000000 200 80 128
# hi 10111111 277 BF 191
B2b2=$(printf "%0.2X" $d2)
#
echo -n "${B2b1}${B2b2}" |
xxd -p -u -r |
iconv -f UTF-8 >/dev/null || {
echo "ERROR: Invalid UTF-8 found: ${B2b1}${B2b2}"; exit 20; }
#
done
done
echo
# Now do a negated test.. This takes longer, because there are more values.
echo "# Test 2 bytes for Invalid values: ie. values which are out of range"
# =========================================================================
# Note: 'iconv' will treat a leading \x00-\x7F as a valid leading single,
# so this negated test primes the first UTF-8 byte with values starting at \x80
time \
for d1 in {128..193} {224..255} ;do
#for d1 in {128..194} {224..255} ;do # force a valid UTF-8 (needs $B2b2)
B2b1=$(printf "%0.2X" $d1)
#
for d2 in {0..127} {192..255} ;do
#for d2 in {0..128} {192..255} ;do # force a valid UTF-8 (needs $B2b1)
B2b2=$(printf "%0.2X" $d2)
#
echo -n "${B2b1}${B2b2}" |
xxd -p -u -r |
iconv -f UTF-8 2>/dev/null && {
echo "ERROR: VALID UTF-8 found: ${B2b1}${B2b2}"; exit 21; }
#
done
done
echo
fi
#########################################################################
# 3 Byte UTF-8 values: Verifying that I've got the right range values.
if ((B3==1)) ; then
echo "# Test 3 bytes for Valid UTF-8 values: ie. values which are in range"
# ========================================================================
time \
for d1 in {224..239} ;do
# bin oct hex dec
# lo 11100000 340 E0 224
# hi 11101111 357 EF 239
B3b1=$(printf "%0.2X" $d1)
#
if [[ $B3b1 == "E0" ]] ; then
B3b2range="$(echo {160..191})"
# bin oct hex dec
# lo 10100000 240 A0 160
# hi 10111111 277 BF 191
elif [[ $B3b1 == "ED" ]] ; then
B3b2range="$(echo {128..159})"
# bin oct hex dec
# lo 10000000 200 80 128
# hi 10011111 237 9F 159
else
B3b2range="$(echo {128..191})"
# bin oct hex dec
# lo 10000000 200 80 128
# hi 10111111 277 BF 191
fi
#
for d2 in $B3b2range ;do
B3b2=$(printf "%0.2X" $d2)
echo "${B3b1} ${B3b2} xx"
#
for d3 in {128..191} ;do
# bin oct hex dec
# lo 10000000 200 80 128
# hi 10111111 277 BF 191
B3b3=$(printf "%0.2X" $d3)
#
echo -n "${B3b1}${B3b2}${B3b3}" |
xxd -p -u -r |
iconv -f UTF-8 >/dev/null || {
echo "ERROR: Invalid UTF-8 found: ${B3b1}${B3b2}${B3b3}"; exit 30; }
#
done
done
done
echo
# Now do a negated test.. This takes longer, because there are more values.
echo "# Test 3 bytes for Invalid values: ie. values which are out of range"
# =========================================================================
# Note: 'iconv' will treat a leading \x00-\x7F as a valid leading single,
# so this negated test primes the first UTF-8 byte with values starting at \x80
#
# real 26m28.462s \
# user 27m12.526s | stepping by 2
# sys 13m11.193s /
#
# real 239m00.836s \
# user 225m11.108s | stepping by 1
# sys 120m00.538s /
#
time \
for d1 in {128..223..1} {240..255..1} ;do
#for d1 in {128..224..1} {239..255..1} ;do # force a valid UTF-8 (needs $B2b2,$B3b3)
B3b1=$(printf "%0.2X" $d1)
#
if [[ $B3b1 == "E0" ]] ; then
B3b2range="$(echo {0..159..1} {192..255..1})"
#B3b2range="$(echo {0..160..1} {192..255..1})" # force a valid UTF-8 (needs $B3b1,$B3b3)
elif [[ $B3b1 == "ED" ]] ; then
B3b2range="$(echo {0..127..1} {160..255..1})"
#B3b2range="$(echo {0..128..1} {160..255..1})" # force a valid UTF-8 (needs $B3b1,$B3b3)
else
B3b2range="$(echo {0..127..1} {192..255..1})"
#B3b2range="$(echo {0..128..1} {192..255..1})" # force a valid UTF-8 (needs $B3b1,$B3b3)
fi
for d2 in $B3b2range ;do
B3b2=$(printf "%0.2X" $d2)
echo "${B3b1} ${B3b2} xx"
#
for d3 in {0..127..1} {192..255..1} ;do
#for d3 in {0..128..1} {192..255..1} ;do # force a valid UTF-8 (needs $B2b1)
B3b3=$(printf "%0.2X" $d3)
#
echo -n "${B3b1}${B3b2}${B3b3}" |
xxd -p -u -r |
iconv -f UTF-8 2>/dev/null && {
echo "ERROR: VALID UTF-8 found: ${B3b1}${B3b2}${B3b3}"; exit 31; }
#
done
done
done
echo
fi
#########################################################################
# Brute force testing in the Astral Plane will take a VERY LONG time..
# Perhaps selective testing is more appropriate, now that the previous tests
# have panned out okay...
#
# 4 Byte UTF-8 values:
if ((B4==1)) ; then
echo "# Test 4 bytes for Valid UTF-8 values: ie. values which are in range"
# ==================================================================
# real 58m18.531s \
# user 56m44.317s |
# sys 27m29.867s /
time \
for d1 in {240..244} ;do
# bin oct hex dec
# lo 11110000 360 F0 240
# hi 11110100 364 F4 244 -- F4 encodes some values greater than 0x10FFFF;
# such a sequence is invalid.
B4b1=$(printf "%0.2X" $d1)
#
if [[ $B4b1 == "F0" ]] ; then
B4b2range="$(echo {144..191})" ## f0 90 80 80 to f0 bf bf bf
# bin oct hex dec 010000 -- 03FFFF
# lo 10010000 220 90 144
# hi 10111111 277 BF 191
#
elif [[ $B4b1 == "F4" ]] ; then
B4b2range="$(echo {128..143})" ## f4 80 80 80 to f4 8f bf bf
# bin oct hex dec 100000 -- 10FFFF
# lo 10000000 200 80 128
# hi 10001111 217 8F 143 -- F4 encodes some values greater than 0x10FFFF;
# such a sequence is invalid.
else
B4b2range="$(echo {128..191})" ## fx 80 80 80 to f3 bf bf bf
# bin oct hex dec 0C0000 -- 0FFFFF
# lo 10000000 200 80 128 0A0000
# hi 10111111 277 BF 191
fi
#
for d2 in $B4b2range ;do
B4b2=$(printf "%0.2X" $d2)
#
for d3 in {128..191} ;do
# bin oct hex dec
# lo 10000000 200 80 128
# hi 10111111 277 BF 191
B4b3=$(printf "%0.2X" $d3)
echo "${B4b1} ${B4b2} ${B4b3} xx"
#
for d4 in {128..191} ;do
# bin oct hex dec
# lo 10000000 200 80 128
# hi 10111111 277 BF 191
B4b4=$(printf "%0.2X" $d4)
#
echo -n "${B4b1}${B4b2}${B4b3}${B4b4}" |
xxd -p -u -r |
iconv -f UTF-8 >/dev/null || {
echo "ERROR: Invalid UTF-8 found: ${B4b1}${B4b2}${B4b3}${B4b4}"; exit 40; }
#
done
done
done
done
echo "# Test 4 bytes for Valid UTF-8 values: END"
echo
fi
########################################################################
# There is no test (yet) for negated range values in the astral plane. #
# (all negated range values must be invalid) #
# I won't bother; This was mainly for me to get the general feel of #
# the tests, and the final test below should flush anything out.. #
# Traversing the entire UTF-8 range takes quite a while... #
# so no need to do it twice (albeit in a slightly different manner) #
########################################################################
################################
### The construction of: ####
### The Regular Expression ####
### (de-construction?) ####
################################
# BYTE 1 BYTE 2 BYTE 3 BYTE 4
# 1: [\x00-\x7F]
# ===========
# ([\x00-\x7F])
#
# 2: [\xC2-\xDF] [\x80-\xBF]
# =================================
# ([\xC2-\xDF][\x80-\xBF])
#
# 3: [\xE0] [\xA0-\xBF] [\x80-\xBF]
# [\xED] [\x80-\x9F] [\x80-\xBF]
# [\xE1-\xEC\xEE-\xEF] [\x80-\xBF] [\x80-\xBF]
# ==============================================
# ((([\xE0][\xA0-\xBF])|([\xED][\x80-\x9F])|([\xE1-\xEC\xEE-\xEF][\x80-\xBF]))([\x80-\xBF]))
#
# 4 [\xF0] [\x90-\xBF] [\x80-\xBF] [\x80-\xBF]
# [\xF1-\xF3] [\x80-\xBF] [\x80-\xBF] [\x80-\xBF]
# [\xF4] [\x80-\x8F] [\x80-\xBF] [\x80-\xBF]
# ===========================================================
# ((([\xF0][\x90-\xBF])|([\xF1-\xF3][\x80-\xBF])|([\xF4][\x80-\x8F]))([\x80-\xBF]{2}))
#
# The final regex
# ===============
# 1-4: (([\x00-\x7F])|([\xC2-\xDF][\x80-\xBF])|((([\xE0][\xA0-\xBF])|([\xED][\x80-\x9F])|([\xE1-\xEC\xEE-\xEF][\x80-\xBF]))([\x80-\xBF]))|((([\xF0][\x90-\xBF])|([\xF1-\xF3][\x80-\xBF])|([\xF4][\x80-\x8F]))([\x80-\xBF]{2})))
# 4-1: (((([\xF0][\x90-\xBF])|([\xF1-\xF3][\x80-\xBF])|([\xF4][\x80-\x8F]))([\x80-\xBF]{2}))|((([\xE0][\xA0-\xBF])|([\xED][\x80-\x9F])|([\xE1-\xEC\xEE-\xEF][\x80-\xBF]))([\x80-\xBF]))|([\xC2-\xDF][\x80-\xBF])|([\x00-\x7F]))
#######################################################################
# The final Test; for a single character (multi chars to follow) #
# Compare the return code of 'iconv' against the 'regex' #
# for the full range of 0x000000 to 0x10FFFF #
# #
# Note; this script has 3 modes: #
# Run this test TWICE, set each mode Manually! #
# #
# 1. Sequentially test every value from 0x000000 to 0x10FFFF #
# 2. Throw a spanner into the works! Force random byte patterns #
# 3. Throw a spanner into the works! Force random longer strings #
# ============================== #
# #
# Note: The purpose of this routine is to determine if there is any #
# difference how 'iconv' and 'regex' handle the same data #
# #
#######################################################################
if ((Rx==1)) ; then
# real 191m34.826s
# user 158m24.114s
# sys 83m10.676s
time {
invalCt=0
validCt=0
failCt=0
decBeg=$((0x00110000)) # increment by decimal integer
decMax=$((0x7FFFFFFF)) # increment by decimal integer
#
for ((CPDec=decBeg;CPDec<=decMax;CPDec+=13247)) ;do
((D==1)) && echo "=========================================================="
#
# Convert decimal integer '$CPDec' to Hex-digits; 8-long (dec2hex)
hexUTF32BE=$(printf '%0.8X\n' $CPDec) # hexUTF32BE
# progress count
if (((CPDec%$((0x1000)))==0)) ;then
((Test>2)) && echo
echo "$hexUTF32BE Test=$Test mode=${mode[$modebits]} "
fi
if ((Test==1 || Test==2 ))
then # Test 1. Sequentially test every value from 0x000000 to 0x10FFFF
#
if ((Test==2)) ; then
bits=32
UTF8="$( perl -C -e 'print chr 0x'$hexUTF32BE |
perl -l -ne '/^( [\000-\177]
| [\300-\337][\200-\277]
| [\340-\357][\200-\277]{2}
| [\360-\367][\200-\277]{3}
| [\370-\373][\200-\277]{4}
| [\374-\375][\200-\277]{5}
)*$/x and print' |xxd -p )"
UTF8="${UTF8%0a}"
[[ -n "$UTF8" ]] \
&& rcIco32=0 || rcIco32=1
rcIco16=
elif ((modebits==strict && CPDec<=$((0xFFFF)))) ;then
bits=16
UTF8="$( echo -n "${hexUTF32BE:4}" |
xxd -p -u -r |
iconv -f UTF-16BE -t UTF-8 2>/dev/null)" \
&& rcIco16=0 || rcIco16=1
rcIco32=
else
bits=32
UTF8="$( echo -n "$hexUTF32BE" |
xxd -p -u -r |
iconv -f UTF-32BE -t UTF-8 2>/dev/null)" \
&& rcIco32=0 || rcIco32=1
rcIco16=
fi
# echo "1 mode=${mode[$modebits]}-$bits rcIconv: (${rcIco16},${rcIco32}) $hexUTF32BE "
#
#
#
if ((${rcIco16}${rcIco32}!=0)) ;then
# 'iconv -f UTF-16BE' failed to produce a reliable UTF-8
if ((bits==16)) ;then
((D==1)) && echo "bits-$bits rcIconv: error $hexUTF32BE .. 'strict' failed, now trying 'lax'"
# iconv failed to create a 'strict' UTF-8 so
# try UTF-32BE to get a 'lax' UTF-8 pattern
UTF8="$( echo -n "$hexUTF32BE" |
xxd -p -u -r |
iconv -f UTF-32BE -t UTF-8 2>/dev/null)" \
&& rcIco32=0 || rcIco32=1
#echo "2 mode=${mode[$modebits]}-$bits rcIconv: (${rcIco16},${rcIco32}) $hexUTF32BE "
if ((rcIco32!=0)) ;then
((D==1)) && echo -n "bits-$bits rcIconv: Cannot gen UTF-8 for: $hexUTF32BE"
rcIco32=1
fi
fi
fi
# echo "3 mode=${mode[$modebits]}-$bits rcIconv: (${rcIco16},${rcIco32}) $hexUTF32BE "
#
#
#
if ((rcIco16==0 || rcIco32==0)) ;then
# 'strict(16)' OR 'lax(32)'... 'iconv' managed to generate a UTF-8 pattern
((D==1)) && echo -n "bits-$bits rcIconv: pattern* $hexUTF32BE"
((D==1)) && if [[ $bits == "16" && $rcIco32 == "0" ]] ;then
echo " .. 'lax' UTF-8 produced a pattern"
else
echo
fi
# regex test
if ((modebits==strict)) ;then
#rxOut="$(echo -n "$UTF8" |perl -l -ne '/^(([\x00-\x7F])|([\xC2-\xDF][\x80-\xBF])|((([\xE0][\xA0-\xBF])|([\xED][\x80-\x9F])|([\xE1-\xEC\xEE-\xEF][\x80-\xBF]))([\x80-\xBF]))|((([\xF0][\x90-\xBF])|([\xF1-\xF3][\x80-\xBF])|([\xF4][\x80-\x8F]))([\x80-\xBF]{2})))*$/ or print' )"
rxOut="$(echo -n "$UTF8" |
perl -l -ne '/^( ([\x00-\x7F]) # 1-byte pattern
|([\xC2-\xDF][\x80-\xBF]) # 2-byte pattern
|((([\xE0][\xA0-\xBF])|([\xED][\x80-\x9F])|([\xE1-\xEC\xEE-\xEF][\x80-\xBF]))([\x80-\xBF])) # 3-byte pattern
|((([\xF0][\x90-\xBF])|([\xF1-\xF3][\x80-\xBF])|([\xF4][\x80-\x8F]))([\x80-\xBF]{2})) # 4-byte pattern
)*$ /x or print' )"
else
if ((Test==2)) ;then
rx="$(echo -n "$UTF8" |perl -l -ne '/^([\000-\177]|[\300-\337][\200-\277]|[\340-\357][\200-\277]{2}|[\360-\367][\200-\277]{3}|[\370-\373][\200-\277]{4}|[\374-\375][\200-\277]{5})*$/ and print')"
[[ "$UTF8" != "$rx" ]] && rxOut="$UTF8" || rxOut=
rx="$(echo -n "$rx" |sed -e "s/\(..\)/\1 /g")"
else
rxOut="$(echo -n "$UTF8" |perl -l -ne '/^([\000-\177]|[\300-\337][\200-\277]|[\340-\357][\200-\277]{2}|[\360-\367][\200-\277]{3}|[\370-\373][\200-\277]{4}|[\374-\375][\200-\277]{5})*$/ or print' )"
fi
fi
if [[ "$rxOut" == "" ]] ;then
((D==1)) && echo " rcRegex: ok"
rcRegex=0
else
((D==1)) && echo -n "bits-$bits rcRegex: error $hexUTF32BE .. 'strict' failed,"
((D==1)) && if [[ "12" == *$Test* ]] ;then
echo # " (codepoint) Test $Test"
else
echo
fi
rcRegex=1
fi
fi
#
elif [[ $Test == 2 ]]
then # Test 2. Throw a randomizing spanner into the works!
# Then test the arbitrary bytes ASIS
#
hexLineRand="$(echo -n "$hexUTF32BE" |
sed -re "s/(.)(.)(.)(.)(.)(.)(.)(.)/\1\n\2\n\3\n\4\n\5\n\6\n\7\n\8/" |
sort -R |
tr -d '\n')"
#
elif [[ $Test == 3 ]]
then # Test 3. Test single UTF-16BE bytes in the range 0x00000000 to 0x7FFFFFFF
#
echo "Test 3 is not properly implemented yet.. Exiting"
exit 99
else
echo "ERROR: Invalid mode"
exit
fi
#
#
if ((Test==1 || Test==2)) ;then
if ((modebits==strict && CPDec<=$((0xFFFF)))) ;then
((rcIconv=rcIco16))
else
((rcIconv=rcIco32))
fi
if ((rcRegex!=rcIconv)) ;then
[[ $Test != 1 ]] && echo
if ((rcRegex==1)) ;then
echo "ERROR: 'regex' ok, but NOT 'iconv': ${hexUTF32BE} "
else
echo "ERROR: 'iconv' ok, but NOT 'regex': ${hexUTF32BE} "
fi
((failCt++));
elif ((rcRegex!=0)) ;then
# ((invalCt++)); echo -ne "$hexUTF32BE exit-codes $${rcIco16}${rcIco32}=,$rcRegex\t: $(printf "%0.8X\n" $invalCt)\t$hexLine$(printf "%$(((mode3whi*2)-${#hexLine}))s")\r"
((invalCt++))
else
((validCt++))
fi
if ((Test==1)) ;then
echo -ne "$hexUTF32BE " "mode=${mode[$modebits]} test-return=($rcIconv,$rcRegex) valid,invalid,fail=($(printf "%X" $validCt),$(printf "%X" $invalCt),$(printf "%X" $failCt)) \r"
else
echo -ne "$hexUTF32BE $rx mode=${mode[$modebits]} test-return=($rcIconv,$rcRegex) val,inval,fail=($(printf "%X" $validCt),$(printf "%X" $invalCt),$(printf "%X" $failCt))\r"
fi
fi
done
} # End time
fi
exit