Saya ingin memiliki fungsi untuk membuat siput dari string Unicode, misalnya gen_slug('Andrés Cortez')
harus kembali andres-cortez
. Bagaimana saya harus melakukan itu?
andres
. Apakah Anda yakin input Anda persis "andrés"?
Saya ingin memiliki fungsi untuk membuat siput dari string Unicode, misalnya gen_slug('Andrés Cortez')
harus kembali andres-cortez
. Bagaimana saya harus melakukan itu?
andres
. Apakah Anda yakin input Anda persis "andrés"?
Jawaban:
Alih-alih ganti yang panjang, coba yang ini:
public static function slugify($text)
{
// replace non letter or digits by -
$text = preg_replace('~[^\pL\d]+~u', '-', $text);
// transliterate
$text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
// remove unwanted characters
$text = preg_replace('~[^-\w]+~', '', $text);
// trim
$text = trim($text, '-');
// remove duplicate -
$text = preg_replace('~-+~', '-', $text);
// lowercase
$text = strtolower($text);
if (empty($text)) {
return 'n-a';
}
return $text;
}
Ini didasarkan pada yang ada di tutorial Jobeet milik Symfony.
^
tepat setelah braket pembuka - itu membalikkan pertandingan.
iconv
tidak akan mengonversi dengan benar jika $text
berisi karakter yang tidak memiliki ascii yang setara. Misalnya iconv('utf-8', 'us-ascii//TRANSLIT', "EFI收购Cretaprint")
akan kembali "EFI"
dan membocorkan peringatan.
$text = trim($text, '-');
harus pada akhirnya, jika tidak Foo 收
menjadi foo-
. Juga, Foo 收 Bar
menjadi foo--bar
(yang diulang -
tampaknya berlebihan).
Karena jawaban ini mendapat perhatian, saya menambahkan beberapa penjelasan.
Solusi yang diberikan pada dasarnya akan menggantikan semuanya kecuali AZ, az, 0-9, & - (tanda hubung) dengan - (tanda hubung). Jadi, itu tidak akan berfungsi dengan baik dengan karakter unicode lainnya (yang merupakan karakter valid untuk slug / string URL). Skenario umum adalah ketika string input berisi karakter non-Inggris.
Hanya gunakan solusi ini jika Anda yakin bahwa string input tidak akan memiliki karakter unicode yang Anda mungkin ingin menjadi bagian dari output / siput.
Misalnya. "नारी शक्ति" akan menjadi "----------" (semua tanda hubung) alih-alih "नारी-शक्ति" (slug URL yang valid).
Bagaimana tentang...
$slug = strtolower(trim(preg_replace('/[^A-Za-z0-9-]+/', '-', $string)));
?
strtolower(trim(preg_replace('/[^A-Za-z0-9-]+/', '-', "Étienne")))
mengembalikan "-tienne"
bukan "etienne"
, jadi itu tidak berfungsi dengan karakter beraksen.
Jika Anda telah menginstal ekstensi intl , Anda dapat menggunakan fungsi Transliterator :: transliterate untuk membuat siput dengan mudah.
<?php
$string = 'Namnet på bildtävlingen';
$slug = \Transliterator::createFromRules(
':: Any-Latin;'
. ':: NFD;'
. ':: [:Nonspacing Mark:] Remove;'
. ':: NFC;'
. ':: [:Punctuation:] Remove;'
. ':: Lower();'
. '[:Separator:] > \'-\''
)
->transliterate( $string );
echo $slug; // namnet-pa-bildtavlingen
?>
Catatan: Saya telah mengambil ini dari wordpress dan berhasil !!
Gunakan seperti ini:
echo sanitize('testing this link');
Kode
//taken from wordpress
function utf8_uri_encode( $utf8_string, $length = 0 ) {
$unicode = '';
$values = array();
$num_octets = 1;
$unicode_length = 0;
$string_length = strlen( $utf8_string );
for ($i = 0; $i < $string_length; $i++ ) {
$value = ord( $utf8_string[ $i ] );
if ( $value < 128 ) {
if ( $length && ( $unicode_length >= $length ) )
break;
$unicode .= chr($value);
$unicode_length++;
} else {
if ( count( $values ) == 0 ) $num_octets = ( $value < 224 ) ? 2 : 3;
$values[] = $value;
if ( $length && ( $unicode_length + ($num_octets * 3) ) > $length )
break;
if ( count( $values ) == $num_octets ) {
if ($num_octets == 3) {
$unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
$unicode_length += 9;
} else {
$unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
$unicode_length += 6;
}
$values = array();
$num_octets = 1;
}
}
}
return $unicode;
}
//taken from wordpress
function seems_utf8($str) {
$length = strlen($str);
for ($i=0; $i < $length; $i++) {
$c = ord($str[$i]);
if ($c < 0x80) $n = 0; # 0bbbbbbb
elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
else return false; # Does not match any model
for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
return false;
}
}
return true;
}
//function sanitize_title_with_dashes taken from wordpress
function sanitize($title) {
$title = strip_tags($title);
// Preserve escaped octets.
$title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
// Remove percent signs that are not part of an octet.
$title = str_replace('%', '', $title);
// Restore octets.
$title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
if (seems_utf8($title)) {
if (function_exists('mb_strtolower')) {
$title = mb_strtolower($title, 'UTF-8');
}
$title = utf8_uri_encode($title, 200);
}
$title = strtolower($title);
$title = preg_replace('/&.+?;/', '', $title); // kill entities
$title = str_replace('.', '-', $title);
$title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
$title = preg_replace('/\s+/', '-', $title);
$title = preg_replace('|-+|', '-', $title);
$title = trim($title, '-');
return $title;
}
sanitize_title_with_dashes($string, null, 'save')
(perhatikan parameter tambahan), jika tidak, Anda akan mendapatkan beberapa kode karakter yang berantakan telstra%e2%80%99s-%e2%80%98all-roles-flex%e2%80%99
. Tidak terlalu cantik. :-(
sanitize
adalah nama fungsi yang aneh dan dapat dilupakan untuk menghasilkan siput.
Itu selalu merupakan ide yang baik untuk menggunakan solusi yang ada yang didukung oleh banyak pengembang tingkat tinggi. Yang paling populer adalah https://github.com/cocur/slugify . Pertama-tama, ini mendukung lebih dari satu bahasa, dan sedang diperbarui.
Jika Anda tidak ingin menggunakan seluruh paket, Anda dapat menyalin bagian yang Anda butuhkan.
Berikut ini yang lain, misalnya "Judul dengan karakter aneh ééé AX Z" menjadi "judul-dengan-karakter-aneh-eee-axz".
/**
* Function used to create a slug associated to an "ugly" string.
*
* @param string $string the string to transform.
*
* @return string the resulting slug.
*/
public static function createSlug($string) {
$table = array(
'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r', '/' => '-', ' ' => '-'
);
// -- Remove duplicated spaces
$stripped = preg_replace(array('/\s{2,}/', '/[\t\n]/'), ' ', $string);
// -- Returns the slug
return strtolower(strtr($string, $table));
}
Versi terbaru dari kode @Imran Omar Bukhsh (dari cabang Wordpress (4.0) terbaru):
<?php
// Add methods to slugify taken from Wordpress:
// - https://github.com/WordPress/WordPress/blob/master/wp-includes/formatting.php
// - https://github.com/WordPress/WordPress/blob/master/wp-includes/functions.php
/**
* Set the mbstring internal encoding to a binary safe encoding when func_overload
* is enabled.
*
* When mbstring.func_overload is in use for multi-byte encodings, the results from
* strlen() and similar functions respect the utf8 characters, causing binary data
* to return incorrect lengths.
*
* This function overrides the mbstring encoding to a binary-safe encoding, and
* resets it to the users expected encoding afterwards through the
* `reset_mbstring_encoding` function.
*
* It is safe to recursively call this function, however each
* `mbstring_binary_safe_encoding()` call must be followed up with an equal number
* of `reset_mbstring_encoding()` calls.
*
* @since 3.7.0
*
* @see reset_mbstring_encoding()
*
* @param bool $reset Optional. Whether to reset the encoding back to a previously-set encoding.
* Default false.
*/
function mbstring_binary_safe_encoding( $reset = false ) {
static $encodings = array();
static $overloaded = null;
if ( is_null( $overloaded ) )
$overloaded = function_exists( 'mb_internal_encoding' ) && ( ini_get( 'mbstring.func_overload' ) & 2 );
if ( false === $overloaded )
return;
if ( ! $reset ) {
$encoding = mb_internal_encoding();
array_push( $encodings, $encoding );
mb_internal_encoding( 'ISO-8859-1' );
}
if ( $reset && $encodings ) {
$encoding = array_pop( $encodings );
mb_internal_encoding( $encoding );
}
}
/**
* Reset the mbstring internal encoding to a users previously set encoding.
*
* @see mbstring_binary_safe_encoding()
*
* @since 3.7.0
*/
function reset_mbstring_encoding() {
mbstring_binary_safe_encoding( true );
}
/**
* Checks to see if a string is utf8 encoded.
*
* NOTE: This function checks for 5-Byte sequences, UTF8
* has Bytes Sequences with a maximum length of 4.
*
* @author bmorel at ssi dot fr (modified)
* @since 1.2.1
*
* @param string $str The string to be checked
* @return bool True if $str fits a UTF-8 model, false otherwise.
*/
function seems_utf8($str) {
mbstring_binary_safe_encoding();
$length = strlen($str);
reset_mbstring_encoding();
for ($i=0; $i < $length; $i++) {
$c = ord($str[$i]);
if ($c < 0x80) $n = 0; # 0bbbbbbb
elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
else return false; # Does not match any model
for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
return false;
}
}
return true;
}
/**
* Encode the Unicode values to be used in the URI.
*
* @since 1.5.0
*
* @param string $utf8_string
* @param int $length Max length of the string
* @return string String with Unicode encoded for URI.
*/
function utf8_uri_encode( $utf8_string, $length = 0 ) {
$unicode = '';
$values = array();
$num_octets = 1;
$unicode_length = 0;
mbstring_binary_safe_encoding();
$string_length = strlen( $utf8_string );
reset_mbstring_encoding();
for ($i = 0; $i < $string_length; $i++ ) {
$value = ord( $utf8_string[ $i ] );
if ( $value < 128 ) {
if ( $length && ( $unicode_length >= $length ) )
break;
$unicode .= chr($value);
$unicode_length++;
} else {
if ( count( $values ) == 0 ) $num_octets = ( $value < 224 ) ? 2 : 3;
$values[] = $value;
if ( $length && ( $unicode_length + ($num_octets * 3) ) > $length )
break;
if ( count( $values ) == $num_octets ) {
if ($num_octets == 3) {
$unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
$unicode_length += 9;
} else {
$unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
$unicode_length += 6;
}
$values = array();
$num_octets = 1;
}
}
}
return $unicode;
}
/**
* Sanitizes a title, replacing whitespace and a few other characters with dashes.
*
* Limits the output to alphanumeric characters, underscore (_) and dash (-).
* Whitespace becomes a dash.
*
* @since 1.2.0
*
* @param string $title The title to be sanitized.
* @param string $raw_title Optional. Not used.
* @param string $context Optional. The operation for which the string is sanitized.
* @return string The sanitized title.
*/
function sanitize_title_with_dashes( $title, $raw_title = '', $context = 'display' ) {
$title = strip_tags($title);
// Preserve escaped octets.
$title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
// Remove percent signs that are not part of an octet.
$title = str_replace('%', '', $title);
// Restore octets.
$title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
if (seems_utf8($title)) {
if (function_exists('mb_strtolower')) {
$title = mb_strtolower($title, 'UTF-8');
}
$title = utf8_uri_encode($title, 200);
}
$title = strtolower($title);
$title = preg_replace('/&.+?;/', '', $title); // kill entities
$title = str_replace('.', '-', $title);
if ( 'save' == $context ) {
// Convert nbsp, ndash and mdash to hyphens
$title = str_replace( array( '%c2%a0', '%e2%80%93', '%e2%80%94' ), '-', $title );
// Strip these characters entirely
$title = str_replace( array(
// iexcl and iquest
'%c2%a1', '%c2%bf',
// angle quotes
'%c2%ab', '%c2%bb', '%e2%80%b9', '%e2%80%ba',
// curly quotes
'%e2%80%98', '%e2%80%99', '%e2%80%9c', '%e2%80%9d',
'%e2%80%9a', '%e2%80%9b', '%e2%80%9e', '%e2%80%9f',
// copy, reg, deg, hellip and trade
'%c2%a9', '%c2%ae', '%c2%b0', '%e2%80%a6', '%e2%84%a2',
// acute accents
'%c2%b4', '%cb%8a', '%cc%81', '%cd%81',
// grave accent, macron, caron
'%cc%80', '%cc%84', '%cc%8c',
), '', $title );
// Convert times to x
$title = str_replace( '%c3%97', 'x', $title );
}
$title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
$title = preg_replace('/\s+/', '-', $title);
$title = preg_replace('|-+|', '-', $title);
$title = trim($title, '-');
return $title;
}
$title = '#PFW Alexander McQueen Spring/Summer 2015';
echo "title -> slug: \n". $title ." -> ". sanitize_title_with_dashes($title);
echo "\n\n";
$title = '«GQ»: Elyas M\'Barek gehört zu Männern des Jahres';
echo "title -> slug: \n". $title ." -> ". sanitize_title_with_dashes($title);
Lihat contoh online .
%c2%abgq%c2%bb-elyas-mbarek-geh%c3%b6rt-zu-m%c3%a4nnern-des-jahres
public static function slugify ($text) {
$replace = [
'<' => '', '>' => '', ''' => '', '&' => '',
'"' => '', 'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä'=> 'Ae',
'Ä' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Æ' => 'Ae',
'Ç' => 'C', 'Ć' => 'C', 'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D',
'Ð' => 'D', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E',
'Ę' => 'E', 'Ě' => 'E', 'Ĕ' => 'E', 'Ė' => 'E', 'Ĝ' => 'G', 'Ğ' => 'G',
'Ġ' => 'G', 'Ģ' => 'G', 'Ĥ' => 'H', 'Ħ' => 'H', 'Ì' => 'I', 'Í' => 'I',
'Î' => 'I', 'Ï' => 'I', 'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I',
'İ' => 'I', 'IJ' => 'IJ', 'Ĵ' => 'J', 'Ķ' => 'K', 'Ł' => 'K', 'Ľ' => 'K',
'Ĺ' => 'K', 'Ļ' => 'K', 'Ŀ' => 'K', 'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N',
'Ņ' => 'N', 'Ŋ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O',
'Ö' => 'Oe', 'Ö' => 'Oe', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O', 'Ŏ' => 'O',
'Œ' => 'OE', 'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Š' => 'S',
'Ş' => 'S', 'Ŝ' => 'S', 'Ș' => 'S', 'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T',
'Ț' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'Ū' => 'U',
'Ü' => 'Ue', 'Ů' => 'U', 'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U',
'Ŵ' => 'W', 'Ý' => 'Y', 'Ŷ' => 'Y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'Ž' => 'Z',
'Ż' => 'Z', 'Þ' => 'T', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a',
'ä' => 'ae', 'ä' => 'ae', 'å' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a',
'æ' => 'ae', 'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c',
'ď' => 'd', 'đ' => 'd', 'ð' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e',
'ë' => 'e', 'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e',
'ƒ' => 'f', 'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h',
'ħ' => 'h', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i',
'ĩ' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ij' => 'ij', 'ĵ' => 'j',
'ķ' => 'k', 'ĸ' => 'k', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l',
'ŀ' => 'l', 'ñ' => 'n', 'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ʼn' => 'n',
'ŋ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe',
'ö' => 'oe', 'ø' => 'o', 'ō' => 'o', 'ő' => 'o', 'ŏ' => 'o', 'œ' => 'oe',
'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'š' => 's', 'ù' => 'u', 'ú' => 'u',
'û' => 'u', 'ü' => 'ue', 'ū' => 'u', 'ü' => 'ue', 'ů' => 'u', 'ű' => 'u',
'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u', 'ŵ' => 'w', 'ý' => 'y', 'ÿ' => 'y',
'ŷ' => 'y', 'ž' => 'z', 'ż' => 'z', 'ź' => 'z', 'þ' => 't', 'ß' => 'ss',
'ſ' => 'ss', 'ый' => 'iy', 'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G',
'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO', 'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I',
'Й' => 'Y', 'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O',
'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F',
'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '',
'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA', 'а' => 'a',
'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 'ё' => 'yo',
'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y', 'к' => 'k', 'л' => 'l',
'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e',
'ю' => 'yu', 'я' => 'ya'
];
// make a human readable string
$text = strtr($text, $replace);
// replace non letter or digits by -
$text = preg_replace('~[^\\pL\d.]+~u', '-', $text);
// trim
$text = trim($text, '-');
// remove unwanted characters
$text = preg_replace('~[^-\w.]+~', '', $text);
$text = strtolower($text);
return $text;
}
Jangan gunakan preg_replace untuk ini. Ada fungsi php yang dibangun hanya untuk tugas: strtr () http://php.net/manual/en/function.strtr.php
Diambil dari komentar di tautan di atas (dan saya mengujinya sendiri; berfungsi:
function normalize ($string) {
$table = array(
'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r',
);
return strtr($string, $table);
}
Saya menggunakan:
function slugify($text)
{
$text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
return strtolower(preg_replace('/[^A-Za-z0-9-]+/', '-', $text));
}
Satu-satunya kekurangan adalah bahwa karakter Cyrillic tidak akan dikonversi, dan saya sedang mencari solusi yang tidak lama str_replace untuk setiap karakter Cyrillic tunggal.
set_locale('cyrillic.UTF-8')
terlebih dahulu. Nilai persisnya tergantung pada lokal yang Anda instal.
Saya tidak tahu yang mana yang harus digunakan jadi saya membuat bangku cepat di phptester.net
<?php
// First test
// https://stackoverflow.com/a/42740874/10232729
function slugify(STRING $string, STRING $separator = '-'){
$accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
$special_cases = [ '&' => 'and', "'" => ''];
$string = mb_strtolower( trim( $string ), 'UTF-8' );
$string = str_replace( array_keys($special_cases), array_values( $special_cases), $string );
$string = preg_replace( $accents_regex, '$1', htmlentities( $string, ENT_QUOTES, 'UTF-8' ) );
$string = preg_replace('/[^a-z0-9]/u', $separator, $string);
return preg_replace('/['.$separator.']+/u', $separator, $string);
}
// Second test
// https://stackoverflow.com/a/13331948/10232729
function slug(STRING $string, STRING $separator = '-'){
$string = transliterator_transliterate('Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();', $string);
return str_replace(' ', $separator, $string);;
}
// Third test - My choice
// https://stackoverflow.com/a/38066136/10232729
function slugbis($text){
$replace = [
'<' => '', '>' => '', '-' => ' ', '&' => '',
'"' => '', 'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä'=> 'Ae',
'Ä' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Æ' => 'Ae',
'Ç' => 'C', 'Ć' => 'C', 'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D',
'Ð' => 'D', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E',
'Ę' => 'E', 'Ě' => 'E', 'Ĕ' => 'E', 'Ė' => 'E', 'Ĝ' => 'G', 'Ğ' => 'G',
'Ġ' => 'G', 'Ģ' => 'G', 'Ĥ' => 'H', 'Ħ' => 'H', 'Ì' => 'I', 'Í' => 'I',
'Î' => 'I', 'Ï' => 'I', 'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I',
'İ' => 'I', 'IJ' => 'IJ', 'Ĵ' => 'J', 'Ķ' => 'K', 'Ł' => 'K', 'Ľ' => 'K',
'Ĺ' => 'K', 'Ļ' => 'K', 'Ŀ' => 'K', 'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N',
'Ņ' => 'N', 'Ŋ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O',
'Ö' => 'Oe', 'Ö' => 'Oe', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O', 'Ŏ' => 'O',
'Œ' => 'OE', 'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Š' => 'S',
'Ş' => 'S', 'Ŝ' => 'S', 'Ș' => 'S', 'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T',
'Ț' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'Ū' => 'U',
'Ü' => 'Ue', 'Ů' => 'U', 'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U',
'Ŵ' => 'W', 'Ý' => 'Y', 'Ŷ' => 'Y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'Ž' => 'Z',
'Ż' => 'Z', 'Þ' => 'T', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a',
'ä' => 'ae', 'ä' => 'ae', 'å' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a',
'æ' => 'ae', 'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c',
'ď' => 'd', 'đ' => 'd', 'ð' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e',
'ë' => 'e', 'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e',
'ƒ' => 'f', 'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h',
'ħ' => 'h', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i',
'ĩ' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ij' => 'ij', 'ĵ' => 'j',
'ķ' => 'k', 'ĸ' => 'k', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l',
'ŀ' => 'l', 'ñ' => 'n', 'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ʼn' => 'n',
'ŋ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe',
'ö' => 'oe', 'ø' => 'o', 'ō' => 'o', 'ő' => 'o', 'ŏ' => 'o', 'œ' => 'oe',
'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'š' => 's', 'ù' => 'u', 'ú' => 'u',
'û' => 'u', 'ü' => 'ue', 'ū' => 'u', 'ü' => 'ue', 'ů' => 'u', 'ű' => 'u',
'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u', 'ŵ' => 'w', 'ý' => 'y', 'ÿ' => 'y',
'ŷ' => 'y', 'ž' => 'z', 'ż' => 'z', 'ź' => 'z', 'þ' => 't', 'ß' => 'ss',
'ſ' => 'ss', 'ый' => 'iy', 'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G',
'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO', 'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I',
'Й' => 'Y', 'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O',
'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F',
'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '',
'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA', 'а' => 'a',
'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 'ё' => 'yo',
'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y', 'к' => 'k', 'л' => 'l',
'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e',
'ю' => 'yu', 'я' => 'ya'
];
// make a human readable string
$text = strtr($text, $replace);
// replace non letter or digits by -
$text = preg_replace('~[^\pL\d.]+~u', '-', $text);
// trim
$text = trim($text, '-');
// remove unwanted characters
$text = preg_replace('~[^-\w.]+~', '', $text);
return strtolower($text);
}
// Fourth test
// https://stackoverflow.com/a/2955521/10232729
function slugagain($string){
$table = [
'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r', ' '=>'-'
];
return strtr($string, $table);
}
// Fifth test
// https://stackoverflow.com/a/27396804/10232729
function slugifybis($url){
$url = trim($url);
$url = str_replace(' ', '-', $url);
$url = str_replace('/', '-slash-', $url);
return rawurlencode($url);
}
// Sixth and last test
// https://stackoverflow.com/a/39442034/10232729
setlocale( LC_ALL, "en_US.UTF8" );
function slugifyagain($string){
$string = iconv('utf-8', 'us-ascii//translit//ignore', $string); // transliterate
$string = str_replace("'", '', $string);
$string = preg_replace('~[^\pL\d]+~u', '-', $string); // replace non letter or non digits by "-"
$string = preg_replace('~[^-\w]+~', '', $string); // remove unwanted characters
$string = preg_replace('~-+~', '-', $string); // remove duplicate "-"
$string = trim($string, '-'); // trim "-"
$string = trim($string); // trim
$string = mb_strtolower($string, 'utf-8'); // lowercase
return urlencode($string); // safe;
};
$string = $newString = "¿ Àñdréß l'affreux ğarçon & nøël en forêt !";
$max = 10000;
echo '<pre>';
echo 'Beginning :';
echo '<br />';
echo '<br />';
echo '> Slugging '.$max.' iterations of following :';
echo '<br />';
echo '>> ' . $string;
echo '<br />';
echo '<br />';
echo 'Output results :';
echo '<br />';
echo '<br />';
$start = microtime(true);
for($i = 0 ; $i < $max ; $i++){
$newString = slugify($string);
}
$time = (microtime(true) - $start) * 1000;
echo '> First test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';
$start = microtime(true);
for($i = 0 ; $i < $max ; $i++){
$newString = slug($string);
}
$time = (microtime(true) - $start) * 1000;
echo '> Second test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';
$start = microtime(true);
for($i = 0 ; $i < $max ; $i++){
$newString = slugbis($string);
}
$time = (microtime(true) - $start) * 1000;
echo '> Third test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';
$start = microtime(true);
for($i = 0 ; $i < $max ; $i++){
$newString = slugagain($string);
}
$time = (microtime(true) - $start) * 1000;
echo '> Fourth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';
$start = microtime(true);
for($i = 0 ; $i < $max ; $i++){
$newString = slugifybis($string);
}
$time = (microtime(true) - $start) * 1000;
echo '> Fifth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';
$start = microtime(true);
for($i = 0 ; $i < $max ; $i++){
$newString = slugifyagain($string);
}
$time = (microtime(true) - $start) * 1000;
echo '> Sixth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '</pre>';
Awal:
Slugging 10000 iterasi sebagai berikut:
¿Àñdréß l'affreux ğarçon & nøël en forêt!
Hasil keluaran:
Tes pertama lulus dalam 120,78 ms
Hasil: -iquest-andresz-laffreux-arcon-dan-noel-en-foret-
Tes kedua lulus dalam 3883,82 ms
Hasil: -andreß-laffreux-garcon - nøel-en-foret-
Tes ketiga lulus dalam 56,83 ms
Hasil: andress-l-affreux-garcon-noel-en-foret
Tes keempat lulus dalam 18,93 ms
Hasil: ¿-AndreSs-l'affreux-ğarcon - & - noel-en-foret-!
Tes kelima lulus dalam 6,45 ms
Hasil:% C2% BF-% C3% 80% C3% B1dr% C3% A9% C3% 9F-l% 27affreux-% C4% 9Far% C3% A7on-% 26-n% C3% B3% B3% AB3- id-untuk% C3% AAt-% 21
Tes keenam lulus dalam 112,42 ms
Hasil: andress-laffreux-garcon-n-el-en-foret
Diperlukan tes lebih lanjut.
Sunting: tes iterasi yang lebih sedikit
Awal:
Slugging 100 iterasi berikut ini:
¿Àñdréß l'affreux ğarçon & nøël en forêt!
Hasil keluaran:
Tes pertama lulus dalam 1,72 ms
Hasil: -iquest-andresz-laffreux-arcon-dan-noel-en-foret-
Tes kedua lulus dalam 48,59 ms
Hasil: -andreß-laffreux-garcon - nøel-en-foret-
Tes ketiga lulus dalam 0,91 ms
Hasil: andress-l-affreux-garcon-noel-en-foret
Tes keempat lulus dalam 0,3 ms
Hasil: ¿-AndreSs-l'affreux-ğarcon - & - noel-en-foret-!
Tes kelima lulus dalam 0,14 ms
Hasil:% C2% BF-% C3% 80% C3% B1dr% C3% A9% C3% 9F-l% 27affreux-% C4% 9Far% C3% A7on-% 26-n% C3% B3% B3% AB3- id-untuk% C3% AAt-% 21
Tes keenam lulus dalam 1.4ms
Hasil: andress-laffreux-garcon-n-el-en-foret
Anda bisa melihatnya Normalizer::normalize()
, lihat di sini . Itu hanya perlu memuat modul intl untuk PHP
Karena gTLD dan IDN semakin banyak digunakan, saya tidak bisa melihat mengapa URL tidak boleh mengandung Andrés.
Hanya rawurlencode $ URL yang Anda inginkan. Sebagian besar peramban menampilkan karakter UTF-8 dalam URL (mungkin bukan IE6 kuno) dan bit.ly / goo.gl dapat digunakan untuk membuatnya singkat dalam kasus-kasus seperti Rusia dan Arab jika perlu untuk keperluan iklan atau hanya menuliskannya di iklan seperti pengguna akan menulisnya di URL browser.
Satu-satunya perbedaan adalah spasi "" mungkin merupakan ide bagus untuk menggantinya dengan "-" dan "/" jika Anda tidak ingin mengizinkannya.
<?php
function slugify($url)
{
$url = trim($url);
$url = str_replace(" ","-",$url);
$url = str_replace("/","-slash-",$url);
$url = rawurlencode($url);
}
?>
URL sebagai dikodekan http://www.hurtta.com/RU/%D0%9F%D1%80%D0%BEDD0%B4%D1%83%D0%BA%D1%82%D1%8B/
Url seperti yang tertulis http://www.hurtta.com/RU/Продукты/
Saya menulis ini berdasarkan tanggapan Maerlyn. Fungsi ini akan berfungsi terlepas dari pengkodean karakter pada halaman. Itu juga tidak akan mengubah tanda kutip tunggal menjadi tanda hubung :)
function slugify ($string) {
$string = utf8_encode($string);
$string = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
$string = preg_replace('/[^a-z0-9- ]/i', '', $string);
$string = str_replace(' ', '-', $string);
$string = trim($string, '-');
$string = strtolower($string);
if (empty($string)) {
return 'n-a';
}
return $string;
}
pada localhost saya semuanya baik-baik saja, tetapi pada server itu membantu saya "set_locale" dan "utf-8" di "mb_strtolower".
<?
setlocale( LC_ALL, "en_US.UTF8" );
function slug( $string )
{
$string = iconv( "utf-8", "us-ascii//translit//ignore", $string ); // transliterate
$string = str_replace( "'", "", $string );
$string = preg_replace( "~[^\pL\d]+~u", "-", $string ); // replace non letter or non digits by "-"
$string = preg_replace( "~[^-\w]+~", "", $string ); // remove unwanted characters
$string = preg_replace( "~-+~", "-", $string ); // remove duplicate "-"
$string = trim( $string, "-" ); // trim "-"
$string = trim( $string ); // trim
$string = mb_strtolower( $string, "utf-8" ); // lowercase
$string = urlencode( $string ); // safe
return $string;
};
?>
Cara paling elegan menurut saya adalah menggunakan Behat \ Transliterator \ Transliterator.
Saya perlu memperluas kelas ini dengan kelas Anda karena ini adalah abstrak, beberapa seperti ini:
<?php
use Behat\Transliterator\Transliterator;
class Urlizer extends Transliterator
{
}
Dan kemudian, gunakan saja:
$text = "Master Ápiu";
$urlizer = new Urlizer();
$slug = $urlizer->transliterate($slug, "-");
echo $slug; // master-apiu
Tentu saja Anda harus meletakkan semua ini di komposer Anda juga.
composer require behat/transliterator
Info lebih lanjut di sini https://github.com/Behat/Transliterator
Ada solusi bagus di sini yang berhubungan dengan karakter khusus juga.
Texto Fantástico => texto-fantastico
function slugify( $string, $separator = '-' ) {
$accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
$special_cases = array( '&' => 'and', "'" => '');
$string = mb_strtolower( trim( $string ), 'UTF-8' );
$string = str_replace( array_keys($special_cases), array_values( $special_cases), $string );
$string = preg_replace( $accents_regex, '$1', htmlentities( $string, ENT_QUOTES, 'UTF-8' ) );
$string = preg_replace("/[^a-z0-9]/u", "$separator", $string);
$string = preg_replace("/[$separator]+/u", "$separator", $string);
return $string;
}
Penulis: Natxet
Ini mungkin cara untuk melakukannya juga. Terinspirasi dari tautan ini, Pertukaran ahli dan alinalexander
function slugifier($txt){
/* Get rid of accented characters */
$search = explode(",","ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,e,i,ø,u");
$replace = explode(",","c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,e,i,o,u");
$txt = str_replace($search, $replace, $txt);
/* Lowercase all the characters */
$txt = strtolower($txt);
/* Avoid whitespace at the beginning and the ending */
$txt = trim($txt);
/* Replace all the characters that are not in a-z or 0-9 by a hyphen */
$txt = preg_replace("/[^a-z0-9]/", "-", $txt);
/* Remove hyphen anywhere it's more than one */
$txt = preg_replace("/[\-]+/", '-', $txt);
return $txt;
}
Karena saya telah melihat banyak metode di sini tetapi saya telah menemukan metode paling sederhana untuk diri saya sendiri. Mungkin itu akan membantu seseorang.
$slug = strtolower(preg_replace('/[^a-zA-Z0-9\-]/', '',preg_replace('/\s+/', '-', $string) ));
Bagi saya varian ini sempurna, juga berubah &
menjadi and
. Ini kode:
function dSlug($string) {
return strtolower(trim(preg_replace('~[^0-9a-z]+~i', '-', html_entity_decode(preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1',htmlentities(preg_replace('/[&]/', ' and ', $title), ENT_QUOTES, 'UTF-8')), ENT_QUOTES, 'UTF-8')), '-'));
}`