Exponential Speedups in External Memory


15

Background

External memory, or the DAM model, measures the cost of an algorithm by the number of I/Os it performs (essentially, the number of cache misses). These running times are generally given in terms of M, the size of the memory, and B, the number of words that can be moved to memory at one time. Sometimes L and Z are used in place of B and M respectively.
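
To make the cost measure concrete, here is a toy sketch (mine, purely illustrative; the parameter values are assumptions, not from the question) of how the DAM model charges an algorithm: nothing is counted except transfers of B-word blocks between slow memory and the fast memory of M words. A straight scan, for instance, costs about N/B I/Os no matter what M is.

    import math

    def scan_io_cost(N, B):
        """I/Os charged for streaming N words through fast memory:
        one block transfer per B consecutive words."""
        return math.ceil(N / B)

    # Assumed example values; the answer is independent of M.
    print(scan_io_cost(N=10**6, B=1024))   # -> 977 I/Os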

For example, sorting costs Θ((N/B) log_{M/B}(N/B)) and naive matrix multiplication costs Θ(n³/(B√M)).
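
As a rough numeric illustration (mine, with assumed example sizes), the two bounds can be evaluated directly with constants dropped; growing M shaves only a logarithmic factor off the sorting bound but a polynomial (√M) factor off the matrix-multiplication bound.

    import math

    def sort_ios(N, B, M):
        # Theta((N/B) * log_{M/B}(N/B)), constants dropped
        return (N / B) * math.log(N / B, M / B)

    def naive_matmul_ios(n, B, M):
        # Theta(n^3 / (B * sqrt(M))), constants dropped
        return n**3 / (B * math.sqrt(M))

    N, n, B = 2**30, 2**15, 2**10      # assumed sizes: words to sort, matrix side, block size
    for M in (2**20, 2**24):           # grow the memory by a factor of 16
        print(M, round(sort_ios(N, B, M)), round(naive_matmul_ios(n, B, M)))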

This model is used to analyze "cache-oblivious algorithms", which have no knowledge of B or M. In general the goal is for a cache-oblivious algorithm to perform optimally in the external memory model; this is not always possible, as in the Permutation problem for example (shown in Brodal, Fagerberg 2003). See this writeup by Erik Demaine for a further explanation of cache-oblivious algorithms, including discussions of sorting and matrix multiplication.
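
For flavor, here is a minimal sketch (mine, not taken from any of the references above) of the divide-and-conquer trick behind many cache-oblivious algorithms, applied to out-of-place matrix transpose: the recursion keeps shrinking the subproblem until it fits in a cache of any size, and the code never mentions B or M.

    def co_transpose(A, T, r0, r1, c0, c1):
        """Write the transpose of A[r0:r1][c0:c1] into T[c0:c1][r0:r1]
        by recursively splitting the longer dimension."""
        rows, cols = r1 - r0, c1 - c0
        if rows <= 16 and cols <= 16:               # small tile: copy directly
            for i in range(r0, r1):
                for j in range(c0, c1):
                    T[j][i] = A[i][j]
        elif rows >= cols:                          # split the longer dimension
            mid = r0 + rows // 2
            co_transpose(A, T, r0, mid, c0, c1)
            co_transpose(A, T, mid, r1, c0, c1)
        else:
            mid = c0 + cols // 2
            co_transpose(A, T, r0, r1, c0, mid)
            co_transpose(A, T, r0, r1, mid, c1)

    # usage
    n = 64
    A = [[i * n + j for j in range(n)] for i in range(n)]
    T = [[0] * n for _ in range(n)]
    co_transpose(A, T, 0, n, 0, n)
    assert all(T[j][i] == A[i][j] for i in range(n) for j in range(n))

The fixed 16×16 base case is only there to limit Python call overhead; the cache-obliviousness comes from the recursion itself, not from any tuned block size.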

We can see that changing M gives a logarithmic speedup for sorting and a polynomial speedup for matrix multiplication. (This result originally comes from Hong & Kung 1981 and actually predates both cache-obliviousness and the formalization of the external memory model.)

My question is this:

Is there any case where the speedup is exponential in M? The running time would be something like f(N, B)/2^{O(M)}. I am particularly interested in a cache-oblivious algorithm or data structure that fits this description, but would be happy with a cache-aware algorithm/data structure, or even the best known lower bound.

It is generally assumed in most models that the word size w = Ω(log N), where N is the size of the input, and clearly M > w. Then a speedup of 2^M gives a polynomial speedup in N. This makes me believe that if the problem I am looking for exists, it is not polynomial. (Otherwise we could change the cache size by a constant to get a constant number of I/Os, which seems unlikely.)
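
Spelling out that step, using only the assumptions already stated (w ≥ c·log N for some constant c > 0, and M > w):

    \[
        2^{M} \;\ge\; 2^{w} \;\ge\; 2^{c \log N} \;=\; N^{c},
    \]

so dividing a running time by 2^{O(M)} is already at least a polynomial-in-N speedup.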


can guess, but a question: if a case were found that gives the speedup at N = B^{polylog(B)}, is that enough?
vzn

It has to be in terms of M for my purposes, unfortunately. I would be interested in the reference, though.
SamM

wikipedia on cache-oblivious algorithms. fyi there is some subtlety in the notation of this field. the footnote on p. 7 of Demaine says that in this area N is the problem size & sometimes n = N/B where n is the number of blocks, "but the lowercase notation seems to be disfavored". you use n above and alternatively N, apparently both as the input size. think you should at least standardize that in your question.
vzn

I edited it for consistency. N is the size of the input, and n is only used for matrix multiplication because the running time for that problem is generally defined in terms of an n×n matrix (i.e. N = n²).
SamM

don't see this case after scanning the literature. maybe there is no such reference? maybe a case could be made that such algorithms would be complicated and therefore hard to analyze theoretically for such a speedup...? or maybe they would have to be contrived...? or, maybe it is simply not possible? could it be that random access to memory is the worst possible case? it seems like the gain in speed is linear in M for that case...? or, maybe some fractal pattern of access to memory is the worst case? this line of study is only a little over a decade old....
vzn

Answers:


3

I am not sure I understood the question. But it seems to me that, under the assumption that PSPACE contains problems requiring exponential time, such problems would fulfill your requirements: if M is O(log N) you would need an exponential number of I/O operations (since you can't "stay" in the same memory block of size O(log N) for more than a polynomial number of steps without going into a cycle), while if M = Ω(N) far fewer I/O operations are needed, so the gain from increasing M is exponential. As for your observation that such a problem cannot be polynomial, intuitively I believe this is not true, because I feel it should be possible to design a problem that is in fact the concatenation of smaller problems of size O(log N), each requiring exponential time in its own size and an exponential number of I/O operations (that is, poly(N), since poly(N) is exponential in O(log N)). In practice I believe PSPACE-complete problems such as TQBF fulfill your condition.
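
A minimal counting sketch of the "cycling" step above, under the (assumed) reading that the entire in-cache state, memory block plus control state, is described by O(log N) bits:

    \[
        \#\{\text{distinct in-cache configurations}\}
            \;\le\; 2^{O(\log N)} \;=\; \mathrm{poly}(N),
    \]

so a computation that runs for more than poly(N) steps without performing any I/O must repeat a configuration and loop forever; hence an algorithm taking time T(N) must perform at least T(N)/poly(N) I/Os when M = O(log N).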


I'm afraid I don't follow some of your arguments. If M = Ω(N), any problem in external memory is trivial. As I mentioned, M = O(log N) is somewhat silly as that means memory has only a constant number of words (possible, but not how external memory is generally examined). I see what you're saying about the exponential gain, but that doesn't say anything about intermediate values. For example, optimal Partition in external memory is the min of two terms (essentially, if everything fits in memory we do something entirely different than if it doesn't). Can you rule that out?
SamM

1
I don't understand why any problem in external memory is trivial if M = Ω(N). If M = N/2 and the algorithm takes exponential time, you might be forced to swap back and forth between (so to speak) the two halves of the memory an exponential number of times.
user8477

Ah, of course you're right about the constant factor being important. That makes a lot of sense; this is certainly a good starting point.
SamM

I don't have any argument for intermediate values. At a very superficial level, I guess that some backtracking algorithms would have an exponential dependence on the memory size because I/O operations would be required at nodes of lower depth in the search tree. This dependence would apply to intermediate values. This says nothing about the inherent complexity of the problem, of course. Also, if you have M = ω(log N), the pigeonhole (cycling) argument given above would still yield a gain of T(N)/2^M where T(N) is the time complexity of the problem.
user8477

-4

this question appears to veer into terra incognita ie unexplored territory. after some thinking & review here are some thoughts/ideas on this apparently very difficult/subtle question which hopefully will be intelligent but are not meant to be definitive.

Demaine, whom you cite, writes: "the principle idea [of cache oblivious algorithms] is simple: design external-memory algorithms without knowing B and M. But this simple idea has several surprisingly powerful consequences."

it also appears to have surprisingly subtle implications. analyzing this model for extremal behavior seems hard, and apparently nobody has done so yet. there are a few surveys, and they seem to cover the entire field to date; this is not surprising given the field is only ~1 decade old.

  • the model does not appear to have been translated to Turing machines yet, where more strict/formal/theoretical/general/standard analysis may be possible. the whole apparatus of Big-Oh notation and its implications and intuitions, such as the irrelevance of constants, does not necessarily carry over automatically to this area of cache-oblivious algorithms. for example the model already seems to work with constants B, M that measure exact memory sizes. it appears the field does not yet have a set of basic axioms for its dynamics.

  • TMs were invented in 1936 by Turing, and the Hartmanis-Stearns time/space hierarchy theorems (which you are somewhat alluding to in this question) were discovered in 1965. that's a remarkable gap of ~3 decades, yet the time/space hierarchy theorems are now considered somewhat elementary and are taught in undergraduate classes. there do not seem to be corresponding hierarchy theorems established in this model yet, and how to do that is not a trivial problem. this model actually seems to have more "moving parts" than the standard Turing machine (which already has rather fiendishly complex dynamics), ie it is like an augmented Turing machine.

  • another idea is to convert this external memory model to TMs somehow, which again does not appear to show up in the literature/surveys on cache-oblivious algorithms, and then see how the Hartmanis-Stearns hierarchy theorems would translate. it appears to relate to a two-tape TM where one tape is of size 'M' and the other tape is infinite, and blocks are transferred to 'M' in sizes of 'B'. a difficulty here, though, is that it is more of a RAM model than the Turing model, which uses sequential access to the tape. on the other hand this could run into the subtleties associated with conversions between single- and multi-tape TMs.

  • suggest attacking this problem empirically with simulations, which is where the literature tends to focus, eg as in this great survey by Kumar on cache-oblivious algorithms (2003). there are many programs and papers online for cache simulations, and one could possibly answer your question without a large amount of code, using some simplifications. one basic idea is to create probabilistic algorithms which access "nearby" areas of memory based on probabilities. "nearby" here is not necessarily contiguous memory, but instead an algorithm that selects random memory pages (blocks) based on keeping track of its own most recently accessed pages. suggest using power laws to determine the probability of selecting "near" pages in the "most recently accessed" list. this appears to be the key aspect that cache-based performance improvements are related to. (a rough sketch of this idea is given after this list.)

  • here's a rough argument based on 'M', which is basically a measure of "core memory" versus disk memory. for an algorithm, one would expect that as core memory increases, one only comes close [asymptotically] to a linear improvement in algorithmic speed, and adding "core memory" to get any super-linear increase in algorithm speed seems intuitively almost like exceeding the speed limit and "getting something for nothing". however, don't grasp the model well enough to prove this [but apparently no one else has either, including the authorities/founders/experts].

  • the cache-oblivious algorithm literature has focused on algorithms in P; did not see any case at all of an analysis of a non-P algorithm so far, and perhaps none exists. this could possibly be because the analysis is too difficult! in which case, questions about extremal behavior may go unanswered in this field for a long time.

  • it does not even seem intuitively clear, and has possibly not been studied at all yet, how "small"-complexity algorithms, such as those in L, or "big" algorithms, such as non-P ones (eg Expspace etc), behave in this model. the model measures memory locality, which seems to be fundamentally different from, but intertwined with, time and space hierarchies.

  • there is a somewhat obscure measurement of Turing machine complexity called "reversal complexity" with some study, results, and papers that might be related. basically the TM can cover a certain amount of time and space, but reversals are a somewhat independent measurement of the tape head during the computation. it seems that "high reversals" might be related to "high memory locality" because in both cases the tape head is tending to stay in a "smaller" region versus moving into larger regions.

  • this question and model remind me of Amdahl's law, and I suspect some kind of yet-undiscovered similar law related to diminishing returns or balances/tradeoffs might hold or be applicable in this area. rough reasoning: cache-oblivious algorithm theory is looking at a balance/tradeoff between a finite "local" memory and a cost-based external "infinite" disk. these are basically two resources that behave "in parallel" and there is likely some kind of optimal tradeoff between them.
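
following up on the simulation bullet above, here is a rough sketch (the function names, parameter values and the power-law form are my own assumptions, not anything from the cited literature): generate an access trace that prefers recently used blocks with power-law probabilities, replay it against an LRU cache of M/B block frames, and watch how the miss (I/O) count falls as the cache grows.

    import random
    from collections import OrderedDict

    def powerlaw_trace(num_blocks, length, alpha=1.5, seed=0):
        """Access stream that reuses the k-th most recently used block
        with probability proportional to (k+1)^-alpha."""
        rng = random.Random(seed)
        recency = list(range(num_blocks))            # index 0 = most recently used
        weights = [1.0 / (k + 1) ** alpha for k in range(num_blocks)]
        trace = []
        for _ in range(length):
            k = rng.choices(range(num_blocks), weights)[0]
            blk = recency.pop(k)
            recency.insert(0, blk)
            trace.append(blk)
        return trace

    def lru_misses(trace, frames):
        """Replay a block trace against an LRU cache with the given
        number of block frames (roughly M/B); count misses = I/Os."""
        cache = OrderedDict()
        misses = 0
        for blk in trace:
            if blk in cache:
                cache.move_to_end(blk)
            else:
                misses += 1
                if len(cache) >= frames:
                    cache.popitem(last=False)        # evict least recently used
                cache[blk] = True
        return misses

    trace = powerlaw_trace(num_blocks=4096, length=50000)
    for frames in (16, 64, 256, 1024):               # frames ~ M/B
        print(frames, lru_misses(trace, frames))

sweeping alpha and the number of frames would give an empirical feel for how strongly the I/O count can depend on M for this family of access patterns.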


2
In your first point, what would "more theoretical analysis" even mean? A TM doesn't have a k-level memory hierarchy. The ideal-cache model is easy and natural to work with, and has been experimentally proven to be a useful model with respect to memory behavior on real systems.
Juho

the TM model is the basic model of TCS, and "bridge thms" between its complexity hierarchies (time/space, basic complexity classes such as P/NP etc) and cache-oblivious algorithms apparently remain to be mapped out. the external memory models & related cache-oblivious models are basically attempting to capture real-world performance characteristics, & the literature so far is not interested in the greater theoretical abstractions the question asks about.
vzn