Efficiently sampling shortest $s$-$t$ paths

14

Let $G$ be a graph, and let $s$ and $t$ be two vertices of $G$ . Can we efficiently sample a shortest $s$ - $t$ path uniformly and independently at random from the set of all shortest paths between $s$ and $t$ ? For simplicity, we can assume $G$ is simple, undirected and unweighted.

Even in many restricted graph classes, the number of shortest paths between $s$ and $t$ can be exponential in the size of $G$ . Therefore, we would naturally like to avoid actually computing all the shortest $s$ - $t$ paths. I don't know about the general case, but it seems to me that we can achieve this for some special graph classes.
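To make the exponential growth concrete, here is a hypothetical example that is not part of the original question: a chain of $k$ four-cycles glued end to end has $3k+1$ vertices but $2^k$ shortest paths between its two ends, since each cycle offers two ways around. A minimal sketch, with the illustrative names `diamond_chain` and `count_shortest_paths`, that builds such a chain and counts its shortest paths with a single BFS:

```python
from collections import deque

def diamond_chain(k):
    """Adjacency lists for k four-cycles glued end to end (illustrative graph)."""
    adj = {0: []}
    hub = 0
    next_id = 1
    for _ in range(k):
        top, bottom, new_hub = next_id, next_id + 1, next_id + 2
        next_id += 3
        for v in (top, bottom, new_hub):
            adj[v] = []
        for a, b in ((hub, top), (hub, bottom), (top, new_hub), (bottom, new_hub)):
            adj[a].append(b)
            adj[b].append(a)
        hub = new_hub
    return adj, 0, hub  # graph, s, t

def count_shortest_paths(adj, s, t):
    """Count shortest s-t paths in an unweighted graph without listing them."""
    dist = {s: 0}
    count = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                # Every shortest path to u extends to a shortest path to v.
                count[v] += count[u]
    return count.get(t, 0)
```

With $k=3$ the chain has 10 vertices and 8 shortest paths between its ends, so counting is cheap even though listing every path is not.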

This feels like something someone must have considered before. Is there any existing research on this, or is this in fact simple to do even for general graphs?

— Juho

Good question, Juho. While considering an answer, what exactly do you mean by "sampling a shortest path uniformly at random"? If it suffices for $s$ and $t$ to be taken at random, the question is trivial, so I guess you mean that all nodes on the shortest paths appear with a frequency (i.e., probability) that follows a uniform distribution. Or is there some other definition? In particular, for bipartite graphs, your question seems very easy, doesn't it?

— Carlos Linares López

1

@CarlosLinaresLópez Consider, say, the diamond graph, and say $s$ is on the right side of the "vertical edge" and $t$ is on the left side. Now there are 2 shortest paths between $s$ and $t$ . The algorithm should return one of these two paths, each with equal probability. So $s$ and $t$ are not "taken at random"; they are given as input. Does that clarify it? In that case, I'm not sure the problem really is very easy for bipartite graphs.

— Juho

1

@CarlosLinaresLópez In other words, we are given a graph $G$ and two vertices $s,t \in V(G)$ . Let $S$ be the set of all shortest paths between $s$ and $t$ . Output an element of $S$ uniformly at random.

— Juho

6

I'm not 100% sure this answer is correct, but here goes:

I think you can reduce this to sampling uniformly at random any path from $s-t$ in a DAG with a single source and a single sink.

Given a graph $G$

Make a new empty digraph, $H$ .
First: run the BFS part of Dijkstra's shortest-path algorithm, starting from $s$ , marking all nodes with their shortest distance from $s$ .
Let $d(s,v)$ be the minimum distance from $s-v$ , which we know from the BFS step of Dijkstra's shortest-path algorithm.
Then do the next step of Dijkstra's shortest-path algorithm: get the shortest path and store it in $\mathbf p$ (going backwards from $t$ to $s$ ).
Now start the following loop; expansion in comments, and below:
- $q_0=\{t\}$
- While $q_0 \neq \emptyset$
  - $q_1= \emptyset$
  - For $u \in q_0$
    - So we want to find all possible next nodes for this shortest-subpath from $t-u$
    - For all edges $(u,v)$ in $G$ such that $d(s,v) < d(s,u)$
      - $v$ is a neighboring node, with less $d(s,\cdot)$ (it will be $1$ less)
      - Therefore, $t-u-v$ is possible subpath in a shortest path.
      - Put $v \rightarrow H, \text{di-edge}(u,v)\rightarrow H$
      - Now we need to check $v$ 's lesser-neighbors next turn.
      - Put $v \rightarrow q_1$
  - Set $q_0$ to $q_1$ :
    - $q_0 \leftarrow q_1$

Essentially, I am collecting all possible nodes that can be used in the shortest-path, and placing them in $H$ .
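The loop above can be sketched in Python as follows. This is only an illustration under assumptions of my own: the graph is an adjacency-list dict for a simple, undirected, unweighted graph, and the names `bfs_distances` and `build_h` are hypothetical. $H$ is returned as a dict mapping each node $u$ to the list of nodes $v$ with a di-edge $(u,v)$, so edges point from $t$ back toward $s$.

```python
from collections import deque

def bfs_distances(adj, s):
    """d(s, v) for every v reachable from s in an unweighted graph."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def build_h(adj, s, t):
    """Collect every node and edge that lies on some shortest s-t path."""
    dist = bfs_distances(adj, s)
    if t not in dist:
        return {}  # t is unreachable, so there is nothing to collect
    h = {t: []}
    q0 = {t}
    while q0:
        q1 = set()
        for u in q0:
            for v in adj[u]:
                # v is one step closer to s, so t-...-u-v is a shortest subpath.
                if dist[v] == dist[u] - 1:
                    h[u].append(v)
                    if v not in h:  # the set-like check avoids revisiting v
                        h[v] = []
                        q1.add(v)
        q0 = q1
    return h
```

Each node enters the queue once and each edge is examined at most twice, so the construction stays linear in the size of $G$ even when the number of shortest paths is exponential.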

More on how this works:

Dijkstra's shortest-path algorithm works by first running a BFS, and marking all the nodes $v\in G$ with their-shortest paths from $s-v$ . The next step is to go back from $t-s$ , and follow the least-neighboring nodes back.

The thing is, here you can choose any of the least neighboring nodes. What I do here is collect all the least-neighboring nodes each step, which means I account for all the shortest-paths.

Now you might quickly think: but hey, why is enumerating them exponential, while my way is not?

The answer is, because I use a set to avoid adding the same nodes twice, I avoid recalculating this for each possible path.

Now we have a DAG that we can traverse in any way from $t-s$ , and obtain a shortest-reversed-path from $s-t$ . The graph should have $t$ as the only source, and $s$ as the only sink.

If the above is correct, then I think we can take this a step further and solve the problem as follows.

Give each node in the DAG a node-weight; the node-weight will be the number of paths from that node to $s$ . Let us call this $w(v)$ .

You can compute these quickly; see "Algorithm that finds the number of simple paths from s to t in G".

Once we have the node-weight, we can uniformly pick a path by:

~~Layout the DAG as a level-structure (for visualization)~~
~~At each level, choose an arbitrary ordering between the nodes ie. a notion of "left to right".~~
Traversing down the DAG: at each step $i \in 1 \dots |\mathbf p|$ (where $|\cdot|$ means size-of; in this case, the length of the shortest path):
- Let $u_i$ be the current node (starting at $t$ )
- Add up all the weights of the children of $u_i$ , and using an RNG, choose one child node, $v_i$ , at random, where each child is picked with probability proportional to its weight.
- Set $u_{i+1} = v_i$ , and go to the next step
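A minimal sketch of the weighted walk, assuming $H$ is given as a dict from each node to the list of its children (edges pointing toward $s$). The function names `path_counts` and `sample_path` are hypothetical, and `random.choices` stands in for "an RNG".

```python
import random

def path_counts(h, s):
    """w[v] = number of directed paths from v to s in the DAG h."""
    w = {}

    def count(v):
        # Memoised recursion; very long paths may need an iterative version.
        if v not in w:
            w[v] = 1 if v == s else sum(count(c) for c in h[v])
        return w[v]

    for v in h:
        count(v)
    return w

def sample_path(h, s, t, rng=random):
    """Walk from t down to s, picking each child in proportion to its weight."""
    w = path_counts(h, s)
    u = t
    path = [t]
    while u != s:
        children = h[u]
        u = rng.choices(children, weights=[w[c] for c in children])[0]
        path.append(u)
    path.reverse()  # report the path in s-to-t order
    return path
```

Note that `random.choices` works with floating-point weights, so for astronomically large path counts an exact integer draw such as `random.randrange` would be the safer choice.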

— Realz Slaw

The level-structure, and notion of left-to-right were part of my initial attempt to simply generate $r\in \left[0,w(t)\right)$ , and choose a path that way, but I didn't figure that out, so you can safely ignore them.

— Realz Slaw

1

This answer looks great! I love the ideas! I tried to write it out in a slightly different way (in my answer), as a test of my understanding. In any case, I just wanted to share my appreciation for this lovely answer!

— D.W.

5

Here is a solution based upon the ideas in Realz Slaw's answer. It is basically a re-exposition of his ideas that might be clearer or easier to follow. The plan is that we will proceed in two steps:

First, we will build a graph $S$ with the following property: any path from $s$ to $t$ in $S$ is a shortest path from $s$ to $t$ in $G$ , and every shortest path from $s$ to $t$ in $G$ is also present in $S$ . Thus, $S$ contains exactly the shortest paths in $G$ : all the shortest paths, and nothing more. As it happens, $S$ will be a DAG.
Next, we will sample uniformly at random from all paths from $s$ to $t$ in $S$ .

This approach generalizes to an arbitrary directed graph $G$ , as long as all edges have positive weight, so I'll explain my algorithm in those terms. Let $w(u,v)$ denote the weight on the edge $u \to v$ . (This generalizes the problem statement you gave. If you have an unweighted graph, just assume every edge has weight 1. If you have an undirected graph, treat each undirected edge $(u,v)$ as the two directed edges $u\to v$ and $v\to u$ .)

Step 1: extract $S$ . Run a single-source shortest-paths algorithm (e.g., Dijkstra's algorithm) on $G$ , starting from source $s$ . For each vertex $v$ in $G$ , let $d(s,v)$ denote the distance from $s$ to $v$ .

Now define the graph $S$ as follows. It consists of every edge $u \to v$ such that (1) $u \to v$ is an edge in $G$ , and (2) $d(s,v) = d(s,u) + w(u,v)$ .

The graph $S$ has some convenient properties:

Every shortest path from $s$ to $t$ in $G$ exists as a path in $S$ : a shortest path $s=v_0,v_1,v_2,\dots,v_k=t$ in $G$ has the property that $d(s,v_{i+1})=d(s,v_i)+w(v_i,v_{i+1})$ , so the edge $v_i \to v_{i+1}$ is present in $S$ .
Every path in $S$ from $s$ to $t$ is a shortest path in $G$ . In particular, consider any path in $S$ from $s$ to $t$ , say $s=v_0,v_1,v_2,\dots,v_k=t$ . Its length is given by the sum of the weights of its edges, namely $\sum_{i=1}^k w(v_{i-1},v_i)$ , but by the definition of $S$ , this sum is $\sum_{i=1}^k (d(s,v_i)-d(s,v_{i-1}))$ , which telescopes to $d(s,t)-d(s,s)=d(s,t)$ . Therefore, this path is a shortest path from $s$ to $t$ in $G$ .
Finally, the absence of zero-weight edges in $G$ implies that $S$ is a DAG: along every edge of $S$ the distance $d(s,\cdot)$ strictly increases, so no cycle is possible.
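Step 1 might look like the following in Python. This is a sketch under assumptions that are mine rather than the answer's: the graph is a dict of dicts `{u: {v: weight}}` with positive integer weights (so the equality test is exact), vertex labels are mutually comparable (needed for heap tie-breaking), and the names `dijkstra` and `shortest_path_dag` are illustrative.

```python
import heapq

def dijkstra(graph, s):
    """Return d(s, v) for every vertex v reachable from s."""
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter route was already found
        for v, weight in graph.get(u, {}).items():
            nd = d + weight
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def shortest_path_dag(graph, s):
    """Keep edge u -> v exactly when d(s, v) = d(s, u) + w(u, v)."""
    dist = dijkstra(graph, s)
    succ = {u: [] for u in dist}
    for u in dist:
        for v, weight in graph.get(u, {}).items():
            if dist[v] == dist[u] + weight:
                succ[u].append(v)
    return succ, dist
```

With floating-point weights the equality test would need a tolerance, which is why integer weights are assumed here.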

Step 2: sample a random path. Now we can throw away the weights on the edges in $S$ , and sample a random path from $s$ to $t$ in $S$ .

To help with this, we will do a precomputation to compute $n(v)$ for each vertex $v$ in $S$ , where $n(v)$ counts the number of distinct paths from $v$ to $t$ . This precomputation can be done in linear time by scanning the vertices of $S$ in reverse topologically sorted order (so that every successor of $v$ is processed before $v$ itself), using the following recurrence relation:

$n(v) = \sum_{w \in \text{succ}(v)} n(w)$

where $\text{succ}(v)$ denotes the successors of $v$ , i.e., $\text{succ}(v) = \{w : v \to w \text{ is an edge in $S$}\}$ , and where we have the base case $n(t)=1$ .
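The recurrence can be evaluated with one depth-first pass that finalises each vertex only after all of its successors. A minimal sketch, assuming $S$ is given as a dict `succ` in which every vertex (including $t$ and any dead ends) appears as a key; the name `count_paths_to_t` is hypothetical.

```python
def count_paths_to_t(succ, t):
    """n[v] = number of paths from v to t in the DAG described by succ."""
    order = []   # vertices in post-order, i.e. successors before predecessors
    seen = set()
    for root in succ:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            v, successors = stack[-1]
            child = next(successors, None)
            if child is None:
                order.append(v)  # all successors of v are already finalised
                stack.pop()
            elif child not in seen:
                seen.add(child)
                stack.append((child, iter(succ[child])))
    n = {}
    for v in order:
        # Base case n(t) = 1; dead ends get an empty sum, i.e. 0.
        n[v] = 1 if v == t else sum(n[w] for w in succ[v])
    return n
```

Vertices that cannot reach $t$ end up with $n(v)=0$, so the weighted choice in the sampling step never steps onto them.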

Next, we use the $n(\cdot)$ annotation to sample a random path. We first visit node $s$ . Then, we randomly choose one of the successors of $s$ , with successor $w$ weighted by $n(w)$ . In other words:

choosesuccessor(v):
    total = 0
    for each w in succ(v):
        total = total + n(w)
    r = a random integer between 0 and total-1
    acc = 0
    for each w in succ(v):
        acc = acc + n(w)
        if r < acc:
            return w
To choose a random path, we repeatedly iterate this process: i.e., $v_0=s$ , and $v_{i+1} =$ choosesuccessor $(v_i)$ . The resulting path is the desired path, and it will be sampled uniformly at random from all shortest paths from $s$ to $t$ .
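As a sanity check, the sampling step can be sketched in Python. The DAG in the usage example is hand-made for illustration and is not derived from any graph in this thread, and the names `choose_successor` and `sample_shortest_path` are hypothetical. The inputs `succ` and `n` are assumed to be the successor lists of $S$ and the path counts $n(\cdot)$.

```python
import random

def choose_successor(v, succ, n, rng=random):
    """Pick a successor w of v with probability n[w] / n[v]."""
    total = sum(n[w] for w in succ[v])
    r = rng.randrange(total)  # exact integer draw from 0 .. total-1
    acc = 0
    for w in succ[v]:
        acc += n[w]
        if r < acc:
            return w

def sample_shortest_path(s, t, succ, n, rng=random):
    """Iterate choose_successor from s until t is reached."""
    if n.get(s, 0) == 0:
        raise ValueError("there is no path from s to t")
    path = [s]
    while path[-1] != t:
        path.append(choose_successor(path[-1], succ, n, rng))
    return path

# Hand-made DAG with three s-t paths: s-a-t, s-b-c-t and s-b-d-t.
example_succ = {'s': ['a', 'b'], 'a': ['t'], 'b': ['c', 'd'],
                'c': ['t'], 'd': ['t'], 't': []}
example_n = {'t': 1, 'a': 1, 'c': 1, 'd': 1, 'b': 2, 's': 3}
```

Uniformity follows from a telescoping product: the path $s=v_0,\dots,v_k=t$ is produced with probability $\prod_{i=0}^{k-1} n(v_{i+1})/n(v_i) = n(t)/n(s) = 1/n(s)$ , which is the same for every path. In the example, picking uniformly among successors would return `s-a-t` half the time, whereas weighting by $n(\cdot)$ gives each of the three paths probability $1/3$ .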

Hopefully this helps you understand Realz Slaw's solution more easily. All credit to Realz Slaw for the beautiful and clean solution to this problem!

The one case this doesn't handle is the case where some edges have weight 0 or negative weight. However, the problem is potentially not well-defined in that case, as you can have infinitely many shortest paths.

— D.W.

Glad you took the time to fully get my answer; I wasn't sure it is correct. Now I am vindicated :D.

— Realz Slaw