cards-deck: 100-199_university::111-120_theoretic_cs::111_algorithms_datastructures

| DFS | Depth first search |

Wir möchten uns jetzt eine Form anschauen, die zuerst in die Tiefe sucht, bevor sie die restlichen Knoten betrachten.

Konzept von DFS:

[!Important] Konzeption eines DFS Wir schauen uns einen Startknoten an. Von diesem aus werden wir anschließend einen weiteren, beliebigen Knoten anschauen. Innerhalb diesem werden wir das Prinzip wiederholen und wieder einen beliebigen Knoten anschauen, der erreichbar ist. Dies machen wir bis wir das Ende erreichen. Danach haben wir quasi einen Pfad abgeschlossen und werden jetzt zurückspringen und ab dem vorherigen Knoten den nächsten Nachbar besuchen etc.

Das heißt jetzt:

Von Startknoten immer Nachbar betrachten, das bis zum Ende dessen (keine Nachbarn) und dann mit backtracking immer zurück, um entsprechend einen Pfad zu finden.

[!Error] Wichtig wie oft besuchen wir Knoten mit DFS? #card Wir möchten jeden Knoten maximal 1 besuchen! Würden wir das erlauben, würden wir einen Loop erzeugen, weil wir so nach und nach uns in den gleichen Knoten wiederinden werden und sich alles wiederholt. ^1704708844168

Wir möchten das Ganze noch als PseudoCode betrachten:

PseudoCode:

[!Definition] Iterativer PseudoCode DFS wie bauen wir ihn auf? #card

for all u in V 
  u.color = white // indicates, that it wasnt visited yet 
forall u in V // traversing through all nodes available!
  if u.color = white // we have not used it yet!
  	DFS-visit(G,u) # traversing to next neighbor, if it wasnt visited before 
  	
DFS-Visit(G,u) // function that is actually traversing through graph 
u.color = grey # grey signals "in process"
for all v in Adj(u) ## going trough all neighbors
  if v.color == white 
  	v.pre = u # remember where we came from >> only for analysis
  	DFS-VIIST(G,v) # visiting this neighbor, because it was not visited yet 
u.color = black # once traversed trough all vertices, we can mark this node as visited

^1704708844177

PseudoCode Kaufmann:

[!Definition] Rekursive Beschreibung von DFS wie bauen wir das auf? #card Wir setzen hier $S^{'}$ als impliziten calling-stack der Rekursion auf ( also er gibt an, was wir wann aufrufen bzw wie wir die Graphen durchlaufen) Weiterhin werden dfsnum und compnum die Aufrufs und Abschlussreihenfolge angeben ( also wann wir es aufgerufen / abgeschlossen haben) -> diese Information kann helfen, weil wir somit nachvollziehen können, wann welcher Knoten durchlaufen / aufgerufen wurde. ( also etwa, dass er innerhalb einer anderen aufgerufen wurde und deswegen Nummer $n + 1$ ist)
T = tree-edges // also alle die, mit dem wir einen Baum aufbauen!
F = forward-edges // alle die, wo wir von einem Zwei in einen anderen kommen
B = backwards-edges // alle die, die von einem Zweig zu einem vorherigen Knoten zurückspringen 
C = cross-edges // alle die, die außerhalb oder zu einem Punkt springen 

def DFS(v): Unit = 
  for all ( ( v,w ) in E)
  	if w not in S 
  		S = S mit w // adding w as node that we now have traversed ( because we have to track the ones we have visited already!)
  		dfsnum(w) = count1  // indicating that we have called node _w_ at which step 
  		// this could be at the beginning or later within another recursion --> this helps to keep track when we traverse a node!
  		count1 ++ //
  		DFS(w) // calling recursively to explore within the available neighbors of this node!
  		
  		// once we are done, returning to this state
  		compnum(w) = count2  // indicates when this node was completed / explored completely 
  		count ++ 
  		add (v,w) to T
  	else if ( v --> w ) // also ein Pfad von v nach w existiert ( wir ihn schon kennen):
  		füge (v,w) zu F hinzu
  	else if ( w --> v) // es gibt einen Pfad zurück zum Startpunkt 
  		füge (v,w) zu B 
  	else: 
  		füge  (v,w) zu C 

^1704708844183

| T,F,B,C Partitionen im Graph |

[!Example] Betrachtung von T,F,B,C Kanten in Graph: Consider the following graph: ![[Pasted image 20231125175624.png]] Welche Kanten beschreiben wir hier mit folgenden Termen? T(schwarz) = tree-edges // also alle die, mit dem wir einen Baum aufbauen! F(blau) = forward-edges // alle die, wo wir von einem Zwei in einen anderen kommen B(grün) = backwards-edges // alle die, die von einem Zweig zu einem vorherigen Knoten zurückspringen C(rot) = cross-edges // alle die, die außerhalb oder zu einem Punkt springen
#card ![[Pasted image 20231125175741.png]] ^1704708844190

[!Example] Anwendung des DFS:

![[Pasted image 20221104113634.png]] Unter Betrachtung des obigen Graphen: Wende den DFS an! #card

^1704708844196

Wir haben zuvor schon die vier möglichen Parts betrachten, in die wir Kanten einordnen können, wenn wir DFS laufen lassen. Welche sind das ? #card T = tree-edges // also alle die, mit dem wir einen Baum aufbauen! F = forward-edges // alle die, wo wir von einem Zwei in einen anderen kommen B = backwards-edges // alle die, die von einem Zweig zu einem vorherigen Knoten zurückspringen C = cross-edges // alle die, die außerhalb oder zu einem ^1704708844202

[!Tip] Partition T entspricht dem Aufruf-baum was bedeutet das? #card Mit Partition T meinen wir die tree-edges, also alle Knoten, die bei der Suche einen Baum ergeben haben. Wir können hieraus jetzt ziehen, dass dieser Baum auch dem Ablauf/Aufbau-Baum der Rekursion entspricht ^1704708844208

[!Important] Aussage wenn $\exists P f a d (v, w)$ , wobei beide tree-edges (T) sind was können wir folgern? #card Existiert dieser Pfad dann muss hier folglich dfsnum(v) kleiner als dfsnum(w) sein. Also v wurde früher aufgerufen als w. ^1704708844214

Es gelten jetzt für alle Kanten $(v, w) \in E$ folgende Eigenschaften:

[!Definition] 1. $(v, w) \in T \cup F ⟺$ #card 1. $(v, w) \in T \cup F ⟺$ dfsnum(v) $<$ dfsnum(w) // kommt also vor w! ^1704708844221

[!Definition] 2. $(v, w) \in B ⟺$ #card mit B meinen wir die backwards-edges 2. $(v, w) \in B ⟺$ dfsnum(v) $>$ dfsnum(w) $\land$ compnum(w) $>$ compnum(v) // also v wird später erst gefunden und $w$ wurde früher abgeschlossen ( weil wir quasi bei w sind und danach zu v springen) ^1704708844227

[!Definition] 3. $(v, w) \in C ⟺$ Mit C meinen wir die Querkanten/ cross-edges 3. $(v, w) \in C ⟺$ dfsnum(v) $>$ dfsnum(w) $\land$ compnum(w) $<$ compnum(v) Also wird v zuerst besucht, jedoch beenden wir v viel später als w ( weil wir von zu einem viel komplexeren / tieferen Bereich gesprungen sind!)

Folgerungen:

Wir können aus dieser Betrachtung jetzt zwei wichtige Punkte ableiten:

[!Important] Einteilung in die Edge-partitions wie können wir das realisieren? #card Wir können die Abhängigkeiten zwischen dfsnum und compnum nutzen, um entsprechend einteilen zu können ^1704708844235

[!Important] Folgerung Bei Anwendung von DFS auf DAG? #card Wenden wir einen DFS auf einen DAG an, dann gibt es da keine Backwards-edges. Hierbei gilt dann auch für alle Kanten: $\forall (v, w) \in E : compnum (v) > compnum (w)$

Das heißt: –> $n + 1 - compnum (v), \forall v \in V$ bildet eine topologische Ordnung ^1704708844242

[!Error] DFS dient als alternativer Algorithmus zu Top-Sort

Betrachtung der Laufzeit:

[!Important] Faktoren zur Laufzeitberechnung welche gibt es? #card Die Laufzeit baut sich hier aus zwei wichtigen Faktoren auf:

wir traversieren höchstens 1x durch jeden Knoten

wir schauen uns jede Kante maximal 1x an!

Aus dieser Betrachtung heraus werden wir wohl wahrscheinlich folgende Laufzeit erhalten: $O (∣ V ∣ + ∣ E ∣)$

^1704708844248

Geschwindigkeit mit [[111.13_Graphen_basics#Darstellen als Adjazenzliste|Adjazenzliste]]

Wir möchten die Geschwindigkeit / Laufzeit mit einer Adjazenzliste näher betrachten. Dafür müssen wir diverse Falle betrachten: welche ? #card

time spent in DFS(G) itself without subroutines is $O (∣ V ∣)$ –> we are visiting each node after all
within each node we look at its edges, which is denoted with $d (v) =$ weight of each of given vertex
which is also equal to amount of edges and thus results with $O (d (v))$
- Furthermore all visits together are represented by : $v \in V \sum (O (d (v))) = O (∣ E ∣)$
- Each Edge is only visited once $O (∣ E ∣)$ ^1704708844254

[!Definition] Laufzeit DFS mit Adjazenzliste: womit resultieren wir? #card Wir erhalten eine Laufzeit von $O (∣ V ∣ + ∣ E ∣)$ ^1704708844261

Geschwindigkeit mit [[111.13_Graphen_basics#Darstellen als Matrix|Adjazenzmatrix]]

while the procedure remains the same as for adjacency lists, we have an important change of time required, due to matrix: which? #card Finding neighbors of a vertex requires us to traverse through a row in the matrix, thus taking $O (n)$ Because of this we running time for DFS-visit is now $O (n)$ . ^1704708844267

Mit dieser wichtigen Modifikation ändert sich die Laufzeit dieser Struktur immens:

[!Definition] Laufzeit DFS mit Adjazenzmatrix: womit resultieren wir jetzt? #card Wir erhalten eine neue Laufzeit von $O (n^{2})$ ^1704708844274

Using DFS > Connected components

undirected graphs | scanning for connected components with DFS

in undirected graph, a connected component is a maximal subset A of vertices such that there exists a path between each tow vertices $u, v \in A$ that means that each subset discovered by the DFS-Visit is connected to the preceeding node and thus part of the connected component within the graph.

Observation :: ^1704708844282 DFS-Visit starts in vertex u, then the discovered tree by DFS- Visit (G,u) contains all vertices in the connected component of u.

![[Pasted image 20221107143423.png]]

:: start in one vertices and gather all the other vertices within the component of connected graphs and combine them to be a a component.

directed graph - scanning for connected components with DFS:

component Graph $G^{SCC}$ : Recap definition? #card in a directed graph, a strongly connected component is maximal subset S of vertices such that for each two vertices of S theres a directed path from one to the other and vice versa

graph TD
point1 --> point2;
point2 --> point1;
point3 --> point2;
point2 --> point3;

Definition of a component graph $G^{SCC}$ of directed graph

vertices of $G^{SCC}$ correspond to components of G
there exists an edge between vertices A and B in $G^{SCC}$ if there exists vertices u and v in the connected components represented by A and B such that there is an edge from u to v ![[Pasted image 20221107165934.png]] also to be represented in shorter notation :: ^1704708844311 ![[Pasted image 20221107165956.png]]

Proposition:cted acyclic graph, meaning that there is no loop present, there is no circle

Proof: If G(SCC) would contain a cycle, then all corresponding vertices would be part of a larger SCC, which contradicts the definition of the SCC_{s} itself.

Sink and Source components of a directed Graph :

sink component :: if the corresponding vertex in a $G^{SCC}$ does not have an out-edge >> only directed towards, it does not help with loops. (example would be B and D) ^1704708844323

source component :: if the corresponding $G^{SCC}$ does not have an in-edge, only edges pointing outward (example is A and C) ^1704708844330

![[Pasted image 20221107144356.png]] A component can either be a sink, source or none of them. They cannot be sink and source at the saem time either, unless they are “isolated” - there are no in and out-going edges from the component. A DAG - directed acyclic graph always has at least one source and at least one sink.

running DFS starting in a sink component ::

^1704708844336 :: requirement : sink component B If DFS started on G in vertex $u \in B$ then the tree constructed with DFS-Visit(G,u) covers whole component B > because it cant leave the component, but only traverse within. However starting the DFS-Visit in $u \neq \in B$ in another component, we are able to construct a tree that is greater than the component choosen >>> because it can possibly enter B

![[Pasted image 20221107195110.png]]

Resulting Idea:: ^1704708844342 To discover a SCC we start a DFS within a sink component and then work backwards along $G^{SCC}$ . However we ough to figure out how to find sink holes efficiently first :: ^1704708844347

Finding a sink component within a graph ::

^1704708844353 requirement :: default DFS on directed graph ^1704708844359

defining the discovery time d(u) and finishing time f(u) of a vertex as: discovery time : time when DFS algorithm first visited the node - and marked it grey for in process - $d (A) := mi n_{u \in A} d (u)$ $f ini s hin g t im e : t im e w h e n D FS ma r k s t h e n o d e a s d o n e an df ini s h e d i t - ma r kin g i t b l a c k . -$ f(A) := max_{u\in A}f(u)$

Proposition :: finding sink hole based on time passed on node ::

^1704708844365 Let A and B be two SCCs of a graph G and assume that B is a descendent of A in $G^{SCC}$. Then f(B) < f(A) > B will always be faster than A, because in order to finish A, the DFS ought to finish B first -> traverses into b, taking all depth checks and once done its returning to A. ![[Pasted image 20221107200303.png]]

Proof ::

^1704708844370 Case d(A) < d(B) :: Denote by u the first vertex we visit in A. Then the DFS builds a tree starting from u, and the tree will cover all of B before it is done with u. ^1704708844376 ![[Pasted image 20221107200424.png]] Thus all of B is processed, before A is able to finish its execution

Case d(A) > d(B) :: Denote by u the first vertex we visit in B. ^1704708844382 Then the DFS will first visit all of B before moving ![[Pasted image 20221107200633.png]]

However this result does not conclude, that the vertex with the smallest finishing time is necessarily a sin component.

Attention Statement of Proposition onlys talk about finishing times of components, not individual vertices.

Proposition 12 :: finding a source component within a graph ::

^1704708844388

Assuming that we run a DFS on G ( with any starting vertex v) and recording all finishing times of each vertices. Then the verte with the largest finishing time is in a source component.

Proof :: ^1704708844393

Consequence of previous propositon - finding source component - that the component with the largest finishing time f(A) ought to be a source component.

Converting sources to sinks >> sources can be found easily::

In order to map strongly connected components, we can apply a trick that helps finding the sink components, although we are only able to spot and located source locations so far: Inverting the graph We consider the graph $G^{t} $w hi c hha s t h es am e v er t i ces a s G b u t a ll e d g es a rese t in re v erse dd i rec t i o n s .$ G^{t} $d oes in c l u d e t h es am es t ro n l g yco nn ec t e d co m p o n e n t s a s G d oes, so w ec an e a s i l yse a rc h f or t h eso u rces in$ G^{t}$, as they are equal to the sink components in G. Graph before converting : ![[Pasted image 20221107210128.png]] Graph after converting : ![[Pasted image 20221107210147.png]]

Final Algorithm SCC ::

^1704708844399

General Idea for the algorithm :: ^1704708844405

run a first DFS on G, with any arbitrary starting vertex - not important yet. The verte u* with the largest finishing time f(u) is in the source of the sought $G^{SCC} $> t h ese t o f co m p o n e n t s, t ha tw e a re t ry in g t o l oc a t e a f t er a ll . T hi s m e an s, t ha tt h e v er t e xu * i s ina s ink o f t h ere v erse d G r a p h$ (G^{t})^{SCC} $- s t a r tt h eseco n d D FS o n u * in t h ere v erse d g r a p h$ G^{t} $. T h e d i sco v ere d t reeo f D FS ($ G^{t},u^{*}$) is the first strongly connected component of our Graph.
Continue with a DFS on the vertex V = v* > that has the highest f(u) among the remaining vertices.
this continues until done.

SCC(G)
Call DFS(G) to compute finishing times f(u)
Compute reverse graph Gt
Call DFS(Gt) where the vertices in the main loop are considered in order of decreasing f(u) > start with the largest execution time, because it is a sink component now - by our proposition
Output the subsets that have been discovered by the individual calls of DFS-Visit

Starting with the reversed graph and then running the second DFS on the original graph is also possible, because both graphs share the same subsets

SCC running time ::

Running time when the graph is an adjacency list - faster than matrix as seen [[111.13_Graphen_basics#Runtime analysis of DFS ::|DFS runtime]] ^1704708844411 execution of DFS twice :: $\mathcal{O}(|V| + |E|) $^{1} 704708844417 re v ers in gg r a p h :$ \mathcal{O}(|E|) $or d er v er t i ces a ccor d in g t o f ini s hin g t im es (in t hi se x am pl e a c hi e v ab l e w i t h$ \mathcal{O}(|V|) $ma y v a ry, see n l a t er) >> h o wt o k ee ppt r a c k o ff ini s hin g t im es d u r in g a l g or i t hm * * R es u lt s w i t h :: * *$ \mathcal{O}(|V|+|E|)$ ^1704708844423

is there a more efficient solution / could there be one ?

:: probably not because with this running time we are constant -> scaling with size of graph of course <- already. And we have to traverse trough each edge and vertices at least once so thats the minimum effort we have for the whole graph

Proposition - DFS on directed Graphs ::

^1704708844429

A directed graph has a cycle if and only if its DFS reveals a back edge - an edge that goes from the current vertex to a previously visited vertex.

Proof $\Longleftarrow $: A ss u m e t ha t D FS re v e a l s aba c k e d g e . T hi s m e an s t ha tt h eD FS ha s a l re a d y v i s i t e d u, so m e in t er m e d ia t e v er t i ces$ v_{1},\ldots,v_{k} $an d t h e n v, an d n o w sees a l inkba c k t o u . T h u s t h e g r a p hha s t h ecyc l e u - >$ v_{1},\ldots,v_{k}->u$

Proof $\Longrightarrow $:! [[P a s t e d ima g e 20221107212435. p n g]] A ss u m e t h ere i s a cyc l e f ro m v 0 - > .. v k - > v 0 in t h e g r a p h . L e t$ v_{i} $b e t h e v er t e x w i t h t h es ma ll es t d i sco v ery t im e . A ll o t h er$ v_{j} $a rere a c h e df ro m$ v_{i} $an d a re t h u s d esce n d an t s in t h eD FSt ree . I n p a r t i c u l a r t hi s i s t r u e f or$ v_{i-1} $w hi c hi s g o in g t ore v e a lt h ee d e$ (v_{i-1},v_{i}$ which is a back edge >> thus proof.

Proposition - RUN DFS on a DAG ::

^1704708844435 If we run DFS on a DAG, all graph edges go from larger to smaller finishing times.

DFS on a DAG does not have any back edges Then we always finish the descendants first before finishing their ancestors :: ^1704708844442 ![[Pasted image 20221107212747.png]]

Algorithm for Topological sort ::

^1704708844449

Run DFS with arbitrary starting vertex
If DFS reveals back-edge > no topological sort, abort
If not >> sort vertices by decreasing finishing times.s

scattered-lenity