Graph

Graph#

Graphs can be used to represent many interesting things about our world, including systems of roads, airline flights from city to city, how the Internet is connected, or even the sequence of classes you must take to complete a major in computer science.

Vocabulary and Definitions#

Vertex: A vertex (also called a “node”) is a fundamental part of a graph. It can have a name, which we will call the “key”. A vertex may also have additional information. We will call this additional information the “payload.”
Edge: An edge (also called an “arc”) is another fundamental part of a graph. An edge connects two vertices to show that there is a relationship between them. Edges may be one-way or two-way. If the edges in a graph are all one-way, we say that the graph is a directed graph, or a digraph.
Weight: Edges may be weighted to show that there is a cost to go from one vertex to another. For example in a graph of roads that connect one city to another, the weight on the edge might represent the distance between the two cities.
Path: A path in a graph is a sequence of vertices that are connected by edges. Formally we would define a path as w1,w2,...,wn such that (wi,wi+1) ∈ E for all 1 ≤ i ≤ n−1. The unweighted path length is the number of edges in the path, specifically n−1. The weighted path length is the sum of the weights of all the edges in the path.
Cycle: A cycle in a directed graph is a path that starts and ends at the same vertex. A graph with no cycles is called an acyclic graph. A directed graph with no cycles is called a directed acyclic graph(DAG).

A graph can be represented by G where G=(V,E). For the graph G, V is a set of vertices and E is a set of edges. Each edge is a tuple (v,w) where w,v ∈ V. We can add a third component to the edge tuple to represent a weight. A subgraph s is a set of edges e and vertices v such that e ⊂ E and v ⊂ V.

Types of Graphs#

Undirected Graph#

An undirected graph is a graph in which edges have no orientation. The edge (u, v) is identical to the edge (v, u).

graph LR; A((A)) B((B)) C((C)) D((D)) E((E)) F((F)) B---C; B---D; A---B; A---C; C---D; C---E; D---F; E---F;

Directed Graph (Digraph)#

A directed graph or digraph is a graph in which edges have orientations. For example, the edge (u, v) is the edge from node u to node v.

graph LR; A((A)) B((B)) C((C)) D((D)) E((E)) F((F)) A-->B; A-->C; B-->D; C-->D; D-->E; D-->F; E-->F;

Weighted Graphs#

Many graphs can have edges that contain a certain weight to represent an arbitrary value such as cost, distance, quantity, etc.

graph LR; A((A)) B((B)) C((C)) D((D)) E((E)) F((F)) A-- 4 ---B; A-- 2 ---C; B-- 4 ---D; D-- 1 --- F; B-- 8 ---C; C-- 3 ---D; D-- 2 ---E; D-- 1 ---F; E-- 11 ---F;

Special Graphs#

Trees#

A tree is an undirected graph with no cycles. Equivalently, it is a connected graph with N nodes and N-1 edges.

flowchart TB; subgraph id1 [ ] direction TB a1((A)) a2((B)) a3((C)) a4((D)) a1---a2 a2---a3 a3---a4 end subgraph id2 [ ] direction TB b1((A)) b2((B)) b3((C)) b4((D)) b5((E)) b6((F)) b1---b2 b1---b3 b2---b4 b2---b5 b3---b6 end subgraph id3 [ ] direction LR c1((A)) c2((B)) c3((C)) c4((D)) c5((E)) c6((F)) c7((G)) c8((H)) c9((I)) c1---c3 c2---c3 c3---c4 c3---c5 c3---c6 c4---c7 c5---c9 c5---c8 end

Rooted Trees#

A rooted tree is a tree with a designated root node where every edge either points away from or towards the root node. When edges point away from the root the graph is called an arborescence (out-tree) and anti-arborescence (in-tree) otherwise.

flowchart TB; subgraph ide1 [ ] direction RL a1((A)) a2((B)) a3((C)) a4((D)) a2-->a1 a3-->a1 a4-->a3 end subgraph ide2 [ ] direction TB b1((A)) b2((B)) b3((C)) b4((D)) b5((E)) b6((F)) b7((G)) b1-->b2 b1-->b3 b2-->b4 b2-->b5 b3-->b6 b3-->b7 end subgraph ide3 [ ] direction RL c0((A)) c1((B)) c2((C)) c3((D)) c4((E)) c5((F)) c6((G)) c7((H)) c8((I)) c9((J)) c1-->c0 c2-->c0 c3-->c0 c7-->c0 c4-->c1 c5-->c1 c6-->c1 c8-->c2 c9-->c2 end style a1 fill:#FF8000 style b1 fill:#FF8000 style c0 fill:#FF8000

Directed Acyclic Graphs (DAGs)#

DAGs are directed graphs with no cycles. These graphs play an important role in representing structures with dependencies. Several efficient algorithms exist to operates on DAGs.

Cool fact: All out-trees are DAGs but not all DAGs are out-trees.

flowchart TB; subgraph ide1 [ ] direction LR a1((A)) a2((B)) a3((C)) a4((D)) a5((E)) a6((F)) a7((G)) a8((H)) a9((I)) a1-->a2 a1-->a3 a2-->a4 a2-->a5 a3-->a4 a3-->a5 a4-->a6 a4-->a7 a5-->a8 a5-->a9 end subgraph ide2 [ ] direction TB b1((A)) b2((B)) b3((C)) b4((D)) b5((E)) b6((F)) b7((G)) b1-->b2 b1-->b3 b2-->b4 b2-->b5 b3-->b6 b3-->b7 end

Bipartite Graph#

A bipartite graph is one whose vertices can be split into two independent groups U, V such that every edge connects betweens U and V.

Other definitions exist such as: The graph is two colourable or there is no odd length cycle.

flowchart LR; a1((A)) a2((B)) a3((C)) a4((D)) b1((P)) b2((Q)) b3((R)) b4((S)) a1---b1 a1---b3 a1---b4 a2---b2 a2---b4 a3---b1 a3---b4 a4---b4 style a1 fill:#FF8000 style a2 fill:#FF8000 style a3 fill:#FF8000 style a4 fill:#FF8000 style b1 fill:#FFFFFF style b2 fill:#FFFFFF style b3 fill:#FFFFFF style b4 fill:#FFFFFF

Complete Graphs#

A complete graph is one where there is a unique edge between every pair of nodes. A complete graph with n vertices is denoted as the graph Kn.

flowchart BT; subgraph id1 [K1] a1((A)) end subgraph id2 [K2] b1((A)) b2((B)) b1---b2 end subgraph id3 [K3] c1((A)) c2((B)) c3((C)) c1---c2 c2---c3 c3---c1 end subgraph id4 [K4] d1((A)) d2((B)) d3((C)) d4((D)) d1---d2 d2---d3 d3---d4 d4---d1 d1---d3 d2---d4 end

Representing Graphs#

Adjacency Matrix#

A adjacency matrix m is a very simple way to represent a graph. The idea is that the cell m[i][j] represents the edge weight of going from node i to node j.

Pros	Cons
Space efficient for representing dense graphs	Requires Θ(V²) space
Edge weight lookup is O(1)	Iterating over all edges takes Θ(V²) time
Simplest graph representation

Adjacency List#

An adjacency list is a way to represent a graph as a map from nodes to lists of edges.

Pros	Cons
Space efficient for representing sparse graphs	Less space efficient for denser graphs.
Iterating over all edges is efficient	Edge weight lookup is O(E)
	Slightly more complex graph representation

Edge List#

An edge list is a way to represent a graph simply as an unordered list of edges. Assume the notation for any triplet (u,v,w) means: “the cost from node u to node v is w” This representation is seldomly used because of its lack of structure. However, it is conceptually simple and practical in a handful of algorithms.

Pros	Cons
Space efficient for representing sparse graphs	Less space efficient for denser graphs.
Iterating over all edges is efficient	Edge weight lookup is O(E)
Very simple structure

Graph Abstract Data Type#

The graph abstract data type (ADT) is defined as follows:

Graph() creates a new, empty graph.
add_vertex(vert) adds an instance of Vertex to the graph.
add_edge(fromVert, toVert) Adds a new, directed edge to the graph that connects two vertices.
add_edge(fromVert, toVert, weight) Adds a new, weighted, directed edge to the graph that connects two vertices.
get_vertices() returns the list of all vertices in the graph.
in returns True for a statement of the form vertex in graph, if the given vertex is in the graph, False otherwise.

Implementing a Graph in Python#

An abstract data type (ADT) when given a physical implementation then we refer to the implementation as a data structure.

In any object-oriented programming language, the implementation of choice for an abstract data type such as a Graph is the creation of a new class. The Graph operations are implemented as methods.

Using dictionaries, it is easy to implement the adjacency list in Python. In our implementation of the Graph abstract data type we will create a classes Graph, which holds the master list of vertices, and edges.

class Graph:

    def __init__(self):
        self.vertices = set()
        self.num_vertices = 0

        self.edges = {}
        self.num_edges = 0

    def add_vertex(self, v):
        if v not in self.vertices:
            self.vertices.add(v)
            self.num_vertices += 1

    def add_edge(self, v, u, w=None):
        self.add_vertex(v)
        self.add_vertex(u)

        if v not in self.edges:
            self.edges[v] = {u: w}
        else:
            self.edges[v][u] = w

        self.num_edges += 1

    def get_vertices(self):
        return self.vertices

    def get_edges(self, v):
        return self.edges[v].items()


g = Graph()
g.add_edge(0, 1, 5)
g.add_edge(0, 4, 4)
g.add_edge(4, 1, 0)
g.add_edge(4, 3, 1)
g.add_edge(1, 0, 6)
g.add_edge(1, 4, 7)
g.add_edge(1, 3, 8)
g.add_edge(1, 2, 1)
g.add_edge(2, 3, 3)
g.add_edge(3, 4, 2)

for v in g.get_vertices():
    print(f"{v}")
    for u, w in g.get_edges(v):
        print(f" └──> {u}: {w}")

0
 └──> 1: 5
 └──> 4: 4
1
 └──> 0: 6
 └──> 4: 7
 └──> 3: 8
 └──> 2: 1
2
 └──> 3: 3
3
 └──> 4: 2
4
 └──> 1: 0
 └──> 3: 1

Practical usage in python#

dict module provides an implementation of the graph data structure. And it could be used to represent graphs.

graph = {0: [1], 1: [3, 5], 2: [1], 3: [2, 4], 4: [5], 5: [7], 7: [6], 6: [4]}

Breadth First Search#

Breadth first search (BFS. is one of the easiest algorithms for searching a graph. It also serves as a prototype for several other important graph algorithms.

Given a graph G and a starting vertex s, a breadth first search proceeds by exploring edges in the graph to find all the vertices in G for which there is a path from s. The remarkable thing about a breadth first search is that it finds all the vertices that are a distance k from s before it finds any vertices that are a distance k+1.

Breadth First Search Analysis#

Breadth First Search loops one time for each vertex in the graph |V|, this gives us O(V) for the outer loop. The inner loop, which is nested is executed at most once for each edge in the graph, |E|. The reason is that every vertex is dequeued at most once and we examine an edge from node u to node v only when node u is dequeued. This gives us O(E. for the for loop, combining the two loops gives us O(V+E).

from collections import deque

graph = {0: [1], 1: [3], 2: [1], 3: [2, 4], 4: [5], 5: [7], 7: [6], 6: [4]}


def bfs(graph, start):
    path = []
    seen = set()
    queue = deque([start])

    while queue:
        v = queue.popleft()

        if v not in seen:
            seen.add(v)
            path.append(v)
            queue.extend(graph[v])

    return path


bfs(graph, 0)

[0, 1, 3, 2, 4, 5, 7, 6]

Applications of Breadth First Search#

Shortest Path and Minimum Spanning Tree for unweighted graph In an unweighted graph, the shortest path is the path with least number of edges. With Breadth First, we always reach a vertex from given source using the minimum number of edges. Also, in case of unweighted graphs, any spanning tree is Minimum Spanning Tree.
Peer to Peer Networks. In Peer to Peer Networks like BitTorrent, Breadth First Search is used to find all neighbor nodes.
Crawlers in Search Engines.
Social Networking Websites: In social networks, we can find people within a given distance ‘k’ from a person using Breadth First Search till ‘k’ levels.
GPS Navigation systems: Breadth First Search is used to find all neighboring locations.
Broadcasting in Network: In networks, a broadcasted packet follows Breadth First Search to reach all nodes.
In Garbage Collection: Breadth First Search is used in copying garbage collection using Cheney’s algorithm. Refer this and for details. Breadth First Search is preferred over Depth First Search because of better locality of reference.

def find_shortest_path(graph, start, end):
    dist = {start: [start]}
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + [neighbor]
                queue.append(neighbor)
    return dist.get(end)

find_shortest_path(graph, 1, 5)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 12
      9                 queue.append(neighbor)
     10     return dist.get(end)
---> 12 find_shortest_path(graph, 1, 5)

Cell In[3], line 3, in find_shortest_path(graph, start, end)
      1 def find_shortest_path(graph, start, end):
      2     dist = {start: [start]}
----> 3     queue = deque(start)
      4     while queue:
      5         node = queue.popleft()

TypeError: 'int' object is not iterable

Depth First Search#

Given a graph G and a starting vertex s, a depth first search proceeds by exploring edges in the graph to create the deepest depth first tree, without any branches all the vertices in G for which there is a path from s. It is even possible that a depth first search will create more than one tree. When the depth first search algorithm creates a group of trees we call this a depth first forest.

Depth First Search Analysis#

Depth First Search loops one time for each vertex in the graph |V|, this gives us O(V) for the outer loop. The inner loop, which is nested is executed at most once for each edge in the graph, |E|. The reason is that every vertex is pushed at most once and we examine an edge from node u to node v only when node u is poped. This gives us O(E) for the for loop, combining the two loops gives us O(V+E).

from collections import deque

graph = {0: [1], 1: [3], 2: [1], 3: [2, 4], 4: [5], 5: [7], 7: [6], 6: [4]}


def dfs(graph, start):
    path = []
    seen = set()
    stack = deque([start])

    while stack:
        v = stack.pop()

        if v not in seen:
            seen.add(v)
            path.append(v)
            stack.extend(graph[v])

    return path


dfs(graph, 0)

[0, 1, 3, 4, 5, 7, 6, 2]

Applications of Depth First Search#

For a weighted graph, DFS traversal of the graph produces the minimum spanning tree and all pair shortest path tree.
Detecting Cycle: A graph has cycle if and only if we see a back edge during DFS. So we can run DFS for the graph and check for back edges. (See this for details)
Path Finding: DFS algorithm can be modified to find a path between two given vertices u and z.
1. Call DFS(G, u) with u as the start vertex.
2. Use a stack S to keep track of the path between the start vertex and the current vertex.
3. As soon as destination vertex z is encountered, return the path as the contents of the stack
Topological Sorting: used for scheduling jobs from the given dependencies among jobs.
Test if a graph is bipartite We can augment either BFS or DFS when we first discover a new vertex, color it opposited its parents, and for each other edge, check it doesn’t link two vertices of the same color. The first vertex in any connected component can be red or black! See this for details.
Finding Strongly Connected Components of a graph A directed graph is called strongly connected if there is a path from each vertex in the graph to every other vertex. (See this for DFS based algo for finding Strongly Connected Components)
Solving puzzles with only one solution, such as mazes. (DFS can be adapted to find all solutions to a maze by only including nodes on the current path in the visited set.)

Topological Sorting#

Topological sort gives the precise order in which we should do each of the steps (visit the graph nodes) in order to achieve our goal.

A topological sort takes a directed acyclic graph (DAG) and produces a linear ordering of all its vertices such that if the graph G contains an edge (v, u) then the vertex v comes before the vertex u in the ordering. Directed acyclic graphs are used in many applications to indicate the precedence of events. The topological sort is a simple but useful adaptation of a depth first search.

The algorithm for the topological sort is as follows:

Call dfs(g) for some graph g. The main reason we want to call depth first search is to compute the finish times for each of the vertices.
Store the vertices in a list in decreasing order of finish time.
Return the ordered list as the result of the topological sort.

from collections import deque


def topological_sort(graph):
    old_graph = graph

    graph = {}
    order = []
    nodes = set()

    for node, edges in old_graph.items():
        nodes.add(node)
        for edge in edges:
            neighbour, weight = edge
            nodes.add(neighbour)
            graph[neighbour] = graph.get(neighbour, set()) | set([node])
    
    nodes_without_prerequisites = nodes - set(graph.keys())

    while nodes_without_prerequisites:
        current_node = nodes_without_prerequisites.pop()

        if current_node in graph:
            del graph[current_node]

        for node, edges in graph.items():
            if current_node in edges:
                edges.remove(current_node)

            if not edges:
                nodes_without_prerequisites.add(node)

        if len(nodes_without_prerequisites) == 0 and len(graph) > 0:
            return []

        order.append(current_node)

    return order


graph = {
    'A': [('B', 4), ('C', 2)],
    'B': [('C', 8), ('D', 4)],
    'C': [('D', 3)],
    'D': [('E', 2), ('F', 1), ('G', 1)],
    'E': [('F', 11)],
    'F': [('I', 2)],
    'G': [('H', 4)],
    'H': [('I', 4)],
}
topological_sort(graph)

['A', 'B', 'C', 'D', 'E', 'G', 'F', 'H', 'I']

Strongly Connected Components#

Strongly Connected Components (SCCs) can be thought of as self-contained cycles within a directed graph where every vertex in a given cycle can reach every other vertex in the same cycle.

Once again we will see that we can create a very powerful and efficient algorithm by making use of a depth first search. Before we tackle the main SCC algorithm we must look at one other definition. The transposition of a graph G is defined as the graph GT where all the edges in the graph have been reversed. That is, if there is a directed edge from node A to node B in the original graph then GT will contain and edge from node B to node A.

Algorithm to compute the strongly connected components for a graph.

Call dfs for the graph G to compute the finish times for each vertex.
Compute GT.
Call dfs for the graph GT but in the main loop of DFS explore each vertex in decreasing order of finish time.
Each tree in the forest computed in step 3 is a strongly connected component. Output the vertex ids for each vertex in each tree in the forest to identify the component.

Shortest Path Problems#

Given a weighted graph, find the shortest path of edges from node A to node B.

graph LR; A((A)) B((B)) C((C)) D((D)) E((E)) F((F)) G((G)) H((H)) I((I)) A-- 4 -->B; A-- 2 -->C; B-- 4 -->D; B-- 8 -->C; C-- 3 -->D; D-- 2 -->E; D-- 1 -->F; D-- 1 -->G; E-- 7 -->F; F-- 2 -->I; G-- 4 -->H; H-- 2 -->I;

from collections import deque


def find_shortest_path(graph, start, end):
    dist = {start: [start]}
    queue = deque(start)

    while queue:
        node = queue.popleft()

        for edge in graph.get(node, []):
            neighbour, weight = edge
            if neighbour not in dist:
                dist[neighbour] = dist[node] + [neighbour]
                queue.append(neighbour)

    return dist.get(end)


graph = {
    'A': [('B', 4), ('C', 2)],
    'B': [('C', 8), ('D', 4)],
    'C': [('D', 3)],
    'D': [('E', 2), ('F', 1), ('G', 1)],
    'E': [('F', 11)],
    'F': [('I', 2)],
    'G': [('H', 4)],
    'H': [('I', 4)],
}

find_shortest_path(graph, 'A', 'I')

['A', 'B', 'D', 'F', 'I']

Dijkstra’s Algorithm#

from heapq import heappush, heappop


def dijkstras(graph, start):
    output = {node: -1 for node in graph.keys()}
    visited = set()
    priority_queue = [(0, start)]

    while priority_queue:
        distance, node = heappop(priority_queue)

        if node in visited:
            continue

        visited.add(node)
        output[node] = distance

        for neighbour, neighbour_distance in graph.get(node, []):
            if neighbour not in visited:
                heappush(priority_queue, (distance +
                         neighbour_distance, neighbour))

    return output


graph = graph = {
    'A': [('B', 4), ('C', 2)],
    'B': [('C', 8), ('D', 4)],
    'C': [('D', 3)],
    'D': [('E', 2), ('F', 1), ('G', 1)],
    'E': [('F', 11)],
    'F': [('I', 2)],
    'G': [('H', 4)],
    'H': [('I', 4)],
}
dijkstras(graph, 'A')

{'A': 0, 'B': 4, 'C': 2, 'D': 5, 'E': 7, 'F': 6, 'G': 6, 'H': 10, 'I': 8}

A* (A Star) Algorithm#

Minimum Spanning Tree (MST)#

A minimum spanning tree (MST) is a subset of the edges of a connected, edge-weighted graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight.

Bonus Material#

Dijkstra’s Algorithm
- By Computerphile
A (A Star) Search Algorithm*
- By Computerphile