Graph#

Graphs can be used to represent many interesting things about our world, including systems of roads, airline flights from city to city, how the Internet is connected, or even the sequence of classes you must take to complete a major in computer science.

Vocabulary and Definitions#

  • Vertex: A vertex (also called a “node”) is a fundamental part of a graph. It can have a name, which we will call the “key”. A vertex may also have additional information. We will call this additional information the “payload.”

  • Edge: An edge (also called an “arc”) is another fundamental part of a graph. An edge connects two vertices to show that there is a relationship between them. Edges may be one-way or two-way. If the edges in a graph are all one-way, we say that the graph is a directed graph, or a digraph.

  • Weight: Edges may be weighted to show that there is a cost to go from one vertex to another. For example in a graph of roads that connect one city to another, the weight on the edge might represent the distance between the two cities.

  • Path: A path in a graph is a sequence of vertices that are connected by edges. Formally we would define a path as w1,w2,...,wn such that (wi,wi+1) E for all 1 i n−1. The unweighted path length is the number of edges in the path, specifically n−1. The weighted path length is the sum of the weights of all the edges in the path.

  • Cycle: A cycle in a directed graph is a path that starts and ends at the same vertex. A graph with no cycles is called an acyclic graph. A directed graph with no cycles is called a directed acyclic graph(DAG).

A graph can be represented by G where G=(V,E). For the graph G, V is a set of vertices and E is a set of edges. Each edge is a tuple (v,w) where w,v V. We can add a third component to the edge tuple to represent a weight. A subgraph s is a set of edges e and vertices v such that e E and v V.

Types of Graphs#

Undirected Graph#

An undirected graph is a graph in which edges have no orientation. The edge (u, v) is identical to the edge (v, u).

graph LR; A((A)) B((B)) C((C)) D((D)) E((E)) F((F)) B---C; B---D; A---B; A---C; C---D; C---E; D---F; E---F;

Directed Graph (Digraph)#

A directed graph or digraph is a graph in which edges have orientations. For example, the edge (u, v) is the edge from node u to node v.

graph LR; A((A)) B((B)) C((C)) D((D)) E((E)) F((F)) A-->B; A-->C; B-->D; C-->D; D-->E; D-->F; E-->F;

Weighted Graphs#

Many graphs can have edges that contain a certain weight to represent an arbitrary value such as cost, distance, quantity, etc.

graph LR; A((A)) B((B)) C((C)) D((D)) E((E)) F((F)) A-- 4 ---B; A-- 2 ---C; B-- 4 ---D; D-- 1 --- F; B-- 8 ---C; C-- 3 ---D; D-- 2 ---E; D-- 1 ---F; E-- 11 ---F;

Special Graphs#

Trees#

A tree is an undirected graph with no cycles. Equivalently, it is a connected graph with N nodes and N-1 edges.

flowchart TB; subgraph id1 [ ] direction TB a1((A)) a2((B)) a3((C)) a4((D)) a1---a2 a2---a3 a3---a4 end subgraph id2 [ ] direction TB b1((A)) b2((B)) b3((C)) b4((D)) b5((E)) b6((F)) b1---b2 b1---b3 b2---b4 b2---b5 b3---b6 end subgraph id3 [ ] direction LR c1((A)) c2((B)) c3((C)) c4((D)) c5((E)) c6((F)) c7((G)) c8((H)) c9((I)) c1---c3 c2---c3 c3---c4 c3---c5 c3---c6 c4---c7 c5---c9 c5---c8 end

Rooted Trees#

A rooted tree is a tree with a designated root node where every edge either points away from or towards the root node. When edges point away from the root the graph is called an arborescence (out-tree) and anti-arborescence (in-tree) otherwise.

flowchart TB; subgraph ide1 [ ] direction RL a1((A)) a2((B)) a3((C)) a4((D)) a2-->a1 a3-->a1 a4-->a3 end subgraph ide2 [ ] direction TB b1((A)) b2((B)) b3((C)) b4((D)) b5((E)) b6((F)) b7((G)) b1-->b2 b1-->b3 b2-->b4 b2-->b5 b3-->b6 b3-->b7 end subgraph ide3 [ ] direction RL c0((A)) c1((B)) c2((C)) c3((D)) c4((E)) c5((F)) c6((G)) c7((H)) c8((I)) c9((J)) c1-->c0 c2-->c0 c3-->c0 c7-->c0 c4-->c1 c5-->c1 c6-->c1 c8-->c2 c9-->c2 end style a1 fill:#FF8000 style b1 fill:#FF8000 style c0 fill:#FF8000

Directed Acyclic Graphs (DAGs)#

DAGs are directed graphs with no cycles. These graphs play an important role in representing structures with dependencies. Several efficient algorithms exist to operates on DAGs.

Cool fact: All out-trees are DAGs but not all DAGs are out-trees.

flowchart TB; subgraph ide1 [ ] direction LR a1((A)) a2((B)) a3((C)) a4((D)) a5((E)) a6((F)) a7((G)) a8((H)) a9((I)) a1-->a2 a1-->a3 a2-->a4 a2-->a5 a3-->a4 a3-->a5 a4-->a6 a4-->a7 a5-->a8 a5-->a9 end subgraph ide2 [ ] direction TB b1((A)) b2((B)) b3((C)) b4((D)) b5((E)) b6((F)) b7((G)) b1-->b2 b1-->b3 b2-->b4 b2-->b5 b3-->b6 b3-->b7 end

Bipartite Graph#

A bipartite graph is one whose vertices can be split into two independent groups U, V such that every edge connects betweens U and V.

Other definitions exist such as: The graph is two colourable or there is no odd length cycle.

flowchart LR; a1((A)) a2((B)) a3((C)) a4((D)) b1((P)) b2((Q)) b3((R)) b4((S)) a1---b1 a1---b3 a1---b4 a2---b2 a2---b4 a3---b1 a3---b4 a4---b4 style a1 fill:#FF8000 style a2 fill:#FF8000 style a3 fill:#FF8000 style a4 fill:#FF8000 style b1 fill:#FFFFFF style b2 fill:#FFFFFF style b3 fill:#FFFFFF style b4 fill:#FFFFFF

Complete Graphs#

A complete graph is one where there is a unique edge between every pair of nodes. A complete graph with n vertices is denoted as the graph Kn.

flowchart BT; subgraph id1 [K1] a1((A)) end subgraph id2 [K2] b1((A)) b2((B)) b1---b2 end subgraph id3 [K3] c1((A)) c2((B)) c3((C)) c1---c2 c2---c3 c3---c1 end subgraph id4 [K4] d1((A)) d2((B)) d3((C)) d4((D)) d1---d2 d2---d3 d3---d4 d4---d1 d1---d3 d2---d4 end

Representing Graphs#

Adjacency Matrix#

A adjacency matrix m is a very simple way to represent a graph. The idea is that the cell m[i][j] represents the edge weight of going from node i to node j.

Pros

Cons

Space efficient for representing dense graphs

Requires Θ(V²) space

Edge weight lookup is O(1)

Iterating over all edges takes Θ(V²) time

Simplest graph representation

Adjacency List#

An adjacency list is a way to represent a graph as a map from nodes to lists of edges.

Pros

Cons

Space efficient for representing sparse graphs

Less space efficient for denser graphs.

Iterating over all edges is efficient

Edge weight lookup is O(E)

Slightly more complex graph representation

Edge List#

An edge list is a way to represent a graph simply as an unordered list of edges. Assume the notation for any triplet (u,v,w) means: “the cost from node u to node v is w” This representation is seldomly used because of its lack of structure. However, it is conceptually simple and practical in a handful of algorithms.

Pros

Cons

Space efficient for representing sparse graphs

Less space efficient for denser graphs.

Iterating over all edges is efficient

Edge weight lookup is O(E)

Very simple structure

Graph Abstract Data Type#

The graph abstract data type (ADT) is defined as follows:

  • Graph() creates a new, empty graph.

  • add_vertex(vert) adds an instance of Vertex to the graph.

  • add_edge(fromVert, toVert) Adds a new, directed edge to the graph that connects two vertices.

  • add_edge(fromVert, toVert, weight) Adds a new, weighted, directed edge to the graph that connects two vertices.

  • get_vertices() returns the list of all vertices in the graph.

  • in returns True for a statement of the form vertex in graph, if the given vertex is in the graph, False otherwise.

Implementing a Graph in Python#

An abstract data type (ADT) when given a physical implementation then we refer to the implementation as a data structure.

In any object-oriented programming language, the implementation of choice for an abstract data type such as a Graph is the creation of a new class. The Graph operations are implemented as methods.

Using dictionaries, it is easy to implement the adjacency list in Python. In our implementation of the Graph abstract data type we will create a classes Graph, which holds the master list of vertices, and edges.

class Graph:

    def __init__(self):
        self.vertices = set()
        self.num_vertices = 0

        self.edges = {}
        self.num_edges = 0

    def add_vertex(self, v):
        if v not in self.vertices:
            self.vertices.add(v)
            self.num_vertices += 1

    def add_edge(self, v, u, w=None):
        self.add_vertex(v)
        self.add_vertex(u)

        if v not in self.edges:
            self.edges[v] = {u: w}
        else:
            self.edges[v][u] = w

        self.num_edges += 1

    def get_vertices(self):
        return self.vertices

    def get_edges(self, v):
        return self.edges[v].items()


g = Graph()
g.add_edge(0, 1, 5)
g.add_edge(0, 4, 4)
g.add_edge(4, 1, 0)
g.add_edge(4, 3, 1)
g.add_edge(1, 0, 6)
g.add_edge(1, 4, 7)
g.add_edge(1, 3, 8)
g.add_edge(1, 2, 1)
g.add_edge(2, 3, 3)
g.add_edge(3, 4, 2)

for v in g.get_vertices():
    print(f"{v}")
    for u, w in g.get_edges(v):
        print(f" └──> {u}: {w}")
0
 └──> 1: 5
 └──> 4: 4
1
 └──> 0: 6
 └──> 4: 7
 └──> 3: 8
 └──> 2: 1
2
 └──> 3: 3
3
 └──> 4: 2
4
 └──> 1: 0
 └──> 3: 1

Practical usage in python#

dict module provides an implementation of the graph data structure. And it could be used to represent graphs.

graph = {0: [1], 1: [3, 5], 2: [1], 3: [2, 4], 4: [5], 5: [7], 7: [6], 6: [4]}

Topological Sorting#

Topological sort gives the precise order in which we should do each of the steps (visit the graph nodes) in order to achieve our goal.

A topological sort takes a directed acyclic graph (DAG) and produces a linear ordering of all its vertices such that if the graph G contains an edge (v, u) then the vertex v comes before the vertex u in the ordering. Directed acyclic graphs are used in many applications to indicate the precedence of events. The topological sort is a simple but useful adaptation of a depth first search.

The algorithm for the topological sort is as follows:

  1. Call dfs(g) for some graph g. The main reason we want to call depth first search is to compute the finish times for each of the vertices.

  2. Store the vertices in a list in decreasing order of finish time.

  3. Return the ordered list as the result of the topological sort.

from collections import deque


def topological_sort(graph):
    old_graph = graph

    graph = {}
    order = []
    nodes = set()

    for node, edges in old_graph.items():
        nodes.add(node)
        for edge in edges:
            neighbour, weight = edge
            nodes.add(neighbour)
            graph[neighbour] = graph.get(neighbour, set()) | set([node])
    
    nodes_without_prerequisites = nodes - set(graph.keys())

    while nodes_without_prerequisites:
        current_node = nodes_without_prerequisites.pop()

        if current_node in graph:
            del graph[current_node]

        for node, edges in graph.items():
            if current_node in edges:
                edges.remove(current_node)

            if not edges:
                nodes_without_prerequisites.add(node)

        if len(nodes_without_prerequisites) == 0 and len(graph) > 0:
            return []

        order.append(current_node)

    return order


graph = {
    'A': [('B', 4), ('C', 2)],
    'B': [('C', 8), ('D', 4)],
    'C': [('D', 3)],
    'D': [('E', 2), ('F', 1), ('G', 1)],
    'E': [('F', 11)],
    'F': [('I', 2)],
    'G': [('H', 4)],
    'H': [('I', 4)],
}
topological_sort(graph)
['A', 'B', 'C', 'D', 'E', 'G', 'F', 'H', 'I']

Strongly Connected Components#

Strongly Connected Components (SCCs) can be thought of as self-contained cycles within a directed graph where every vertex in a given cycle can reach every other vertex in the same cycle.

Once again we will see that we can create a very powerful and efficient algorithm by making use of a depth first search. Before we tackle the main SCC algorithm we must look at one other definition. The transposition of a graph G is defined as the graph GT where all the edges in the graph have been reversed. That is, if there is a directed edge from node A to node B in the original graph then GT will contain and edge from node B to node A.

Algorithm to compute the strongly connected components for a graph.

  1. Call dfs for the graph G to compute the finish times for each vertex.

  2. Compute GT.

  3. Call dfs for the graph GT but in the main loop of DFS explore each vertex in decreasing order of finish time.

  4. Each tree in the forest computed in step 3 is a strongly connected component. Output the vertex ids for each vertex in each tree in the forest to identify the component.

Shortest Path Problems#

Given a weighted graph, find the shortest path of edges from node A to node B.

graph LR; A((A)) B((B)) C((C)) D((D)) E((E)) F((F)) G((G)) H((H)) I((I)) A-- 4 -->B; A-- 2 -->C; B-- 4 -->D; B-- 8 -->C; C-- 3 -->D; D-- 2 -->E; D-- 1 -->F; D-- 1 -->G; E-- 7 -->F; F-- 2 -->I; G-- 4 -->H; H-- 2 -->I;
from collections import deque


def find_shortest_path(graph, start, end):
    dist = {start: [start]}
    queue = deque(start)

    while queue:
        node = queue.popleft()

        for edge in graph.get(node, []):
            neighbour, weight = edge
            if neighbour not in dist:
                dist[neighbour] = dist[node] + [neighbour]
                queue.append(neighbour)

    return dist.get(end)


graph = {
    'A': [('B', 4), ('C', 2)],
    'B': [('C', 8), ('D', 4)],
    'C': [('D', 3)],
    'D': [('E', 2), ('F', 1), ('G', 1)],
    'E': [('F', 11)],
    'F': [('I', 2)],
    'G': [('H', 4)],
    'H': [('I', 4)],
}

find_shortest_path(graph, 'A', 'I')
['A', 'B', 'D', 'F', 'I']

Dijkstra’s Algorithm#

from heapq import heappush, heappop


def dijkstras(graph, start):
    output = {node: -1 for node in graph.keys()}
    visited = set()
    priority_queue = [(0, start)]

    while priority_queue:
        distance, node = heappop(priority_queue)

        if node in visited:
            continue

        visited.add(node)
        output[node] = distance

        for neighbour, neighbour_distance in graph.get(node, []):
            if neighbour not in visited:
                heappush(priority_queue, (distance +
                         neighbour_distance, neighbour))

    return output


graph = graph = {
    'A': [('B', 4), ('C', 2)],
    'B': [('C', 8), ('D', 4)],
    'C': [('D', 3)],
    'D': [('E', 2), ('F', 1), ('G', 1)],
    'E': [('F', 11)],
    'F': [('I', 2)],
    'G': [('H', 4)],
    'H': [('I', 4)],
}
dijkstras(graph, 'A')
{'A': 0, 'B': 4, 'C': 2, 'D': 5, 'E': 7, 'F': 6, 'G': 6, 'H': 10, 'I': 8}

A* (A Star) Algorithm#

Minimum Spanning Tree (MST)#

A minimum spanning tree (MST) is a subset of the edges of a connected, edge-weighted graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight.

Bonus Material#

  • Dijkstra’s Algorithm

    • By Computerphile

    Dijkstra's Algorithm

  • A (A Star) Search Algorithm*

    • By Computerphile

    Dijkstra's Algorithm