Saya punya meja besar entities
dengan ~ 15 juta catatan. Saya ingin menemukan 5 baris teratas yang cocok dengan 'hoki' di name
.
Saya memiliki indeks teks lengkap name
, yang digunakan:gin_ix_entity_full_text_search_name
Pertanyaan:
SELECT "entities".*,
ts_rank(to_tsvector('english', "entities"."name"::text),
to_tsquery('english', 'hockey'::text)) AS "rank0.48661998202865475"
FROM "entities"
WHERE "entities"."place" = 'f'
AND (to_tsvector('english', "entities"."name"::text) @@ to_tsquery('english', 'hockey'::text))
ORDER BY "rank0.48661998202865475" DESC LIMIT 5
Durasi 25.623 ms
Jelaskan rencana 1 Batas (biaya = 12666,89..12666,89 baris = 5 lebar = 3116) 2 -> Sortir (biaya = 12666.89..12670.18 baris = 6571 lebar = 3116) 3 Kunci Urut: (ts_rank (to_tsvector ('english' :: regconfig, (name) :: text), '' 'hockey' '' :: tsquery)) 4 -> Bitmap Heap Scan pada entitas (biaya = 124,06..12645,06 baris = 6571 lebar = 3116) 5 Periksa kembali Cond: (to_tsvector ('english' :: regconfig, (name) :: text) @@ '' 'hockey' '' :: tsquery) 6 Filter: (TIDAK tempat) 7 -> Pemindaian Indeks Bitmap di gin_ix_entity_full_text_search_name (biaya = 0,00..123,74 baris = 6625 lebar = 0) 8 Indeks Cond: (to_tsvector ('english' :: regconfig, (name) :: text) @@ '' 'hockey' '' :: tsquery)
Saya tidak mengerti mengapa memverifikasi kondisi indeks dua kali. (Rencana kueri langkah 4 dan 7). Apakah karena kondisi boolean saya ( not place
)? Jika demikian, haruskah saya menambahkannya ke indeks saya untuk mendapatkan permintaan yang sangat cepat? Atau apakah kondisi penyortiran membuatnya lambat?
EXPLAIN ANALYZE
keluaran:
Batas (biaya = 4447.28..4447.29 baris = 5 lebar = 3116) (waktu aktual = 18509.274..18509.282 baris = 5 putaran = 1) -> Sortir (biaya = 4447,28..4448.41 baris = 2248 lebar = 3116) (waktu aktual = 18509.271..18509.273 baris = 5 putaran = 1) Kunci Sortir: (ts_rank (to_tsvector ('english' :: regconfig, (nama) :: teks), '' 'test' '' :: tsquery)) Metode Sortir: heapsort top-N Memori: 19kB -> Bitmap Heap Scan pada entitas (biaya = 43.31..4439.82 baris = 2.248 lebar = 3116) (waktu aktual = 119.003..18491.408 baris = 2533 loop = 1) Periksa kembali Cond: (to_tsvector ('english' :: regconfig, (name) :: text) @@ '' 'test' '' :: tsquery) Saring: (BUKAN tempat) -> Pemindaian Indeks Bitmap pada gin_ix_entity_full_text_search_name (biaya = 0,00..43,20 baris = 2266 lebar = 0) (waktu aktual = 74,093..74,093 baris = 2593 loop = 1) Indeks Cond: (to_tsvector ('english' :: regconfig, (name) :: text) @@ '' 'test' '' :: tsquery) Total runtime: 18509.381 ms
Berikut adalah parameter DB saya. Di-host oleh Heroku, di layanan Amazon. Mereka menggambarkannya sebagai memiliki 1,7GB RAM, 1 unit pemrosesan dan DB maks 1TB.
nama | pengaturan saat ini ------------------------------ + ------------------- -------------------------------------------------- ------------------------------------ versi | PostgreSQL 9.0.7 pada i486-pc-linux-gnu, dikompilasi oleh GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 32-bit archive_command | test -f /etc/postgresql/9.0/main/wal-ed/ARCHIVING_OFF || envdir /etc/postgresql/9.0/resource29857_heroku_com/wal-ed/env wal-e wal-e-push% p archive_mode | di archive_timeout | 1 menit checkpoint_completion_target | 0,7 checkpoint_segments | 40 client_min_messages | memperhatikan cpu_index_tuple_cost | 0,001 cpu_operator_cost | 0,0005 cpu_tuple_cost | 0,003 effective_cache_size | 1530000kB hot_standby | di lc_collate | en_US.UTF-8 lc_ctype | en_US.UTF-8 listen_addresses | * log_checkpoints | di log_destination | syslog log_line_prefix | % u [KUNING] log_min_duration_statement | 50 ms log_min_messages | memperhatikan logging_collector | di maintenance_work_mem | 64MB max_connections | 500 max_prepared_transactions | 500 max_stack_depth | 2MB max_standby_archive_delay | -1 max_standby_streaming_delay | -1 max_wal_senders | 10 port | random_page_cost | 2 server_encoding | UTF8 shared_buffers | 415MB ssl | di syslog_ident | resource29857_heroku_com Zona Waktu | UTC wal_buffers | 8MB wal_keep_segments | 127 wal_level | hot_standby work_mem | 100MB (39 baris)
EDIT
Sepertinya ORDER BY
bagian yang lambat:
d6ifslbf0ugpu=> EXPLAIN ANALYZE SELECT "entities"."name",
ts_rank(to_tsvector('english', "entities"."name"::text),
to_tsquery('english', 'banana'::text)) AS "rank0.48661998202865475"
FROM "entities"
WHERE (to_tsvector('english', "entities"."name"::text) @@ to_tsquery('english', 'banana'::text))
LIMIT 5;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=43.31..53.07 rows=5 width=24) (actual time=76.583..103.623 rows=5 loops=1)
-> Bitmap Heap Scan on entities (cost=43.31..4467.60 rows=2266 width=24) (actual time=76.581..103.613 rows=5 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''banana'''::tsquery)
-> Bitmap Index Scan on gin_ix_entity_full_text_search_name (cost=0.00..43.20 rows=2266 width=0) (actual time=53.592..53.592 rows=1495 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''banana'''::tsquery)
Total runtime: 103.680 ms
Vs. dengan ORDER BY
:
d6ifslbf0ugpu=> EXPLAIN ANALYZE SELECT "entities"."name",
ts_rank(to_tsvector('english', "entities"."name"::text),
to_tsquery('english', 'banana'::text)) AS "rank0.48661998202865475"
FROM "entities"
WHERE (to_tsvector('english', "entities"."name"::text) @@ to_tsquery('english', 'banana'::text))
ORDER BY "rank0.48661998202865475" DESC
LIMIT 5;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4475.12..4475.13 rows=5 width=24) (actual time=15013.735..15013.741 rows=5 loops=1)
-> Sort (cost=4475.12..4476.26 rows=2266 width=24) (actual time=15013.732..15013.735 rows=5 loops=1)
Sort Key: (ts_rank(to_tsvector('english'::regconfig, (name)::text), '''banana'''::tsquery))
Sort Method: top-N heapsort Memory: 17kB
-> Bitmap Heap Scan on entities (cost=43.31..4467.60 rows=2266 width=24) (actual time=0.872..15006.763 rows=1495 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''banana'''::tsquery)
-> Bitmap Index Scan on gin_ix_entity_full_text_search_name (cost=0.00..43.20 rows=2266 width=0) (actual time=0.549..0.549 rows=1495 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (name)::text) @@ '''banana'''::tsquery)
Total runtime: 15013.805 ms
Saya masih sedikit tidak mengerti mengapa ini lebih lambat. Sepertinya ini mengambil jumlah baris yang sama dari Bitmap Heap Scan, tetapi butuh waktu lebih lama?