Graph#
Graphs can be used to represent many interesting things about our world, including systems of roads, airline flights from city to city, how the Internet is connected, or even the sequence of classes you must take to complete a major in computer science.
Vocabulary and Definitions#
Vertex: A vertex (also called a “node”) is a fundamental part of a graph. It can have a name, which we will call the “key”. A vertex may also have additional information. We will call this additional information the “payload.”
Edge: An edge (also called an “arc”) is another fundamental part of a graph. An edge connects two vertices to show that there is a relationship between them. Edges may be one-way or two-way. If the edges in a graph are all one-way, we say that the graph is a directed graph, or a digraph.
Weight: Edges may be weighted to show that there is a cost to go from one vertex to another. For example in a graph of roads that connect one city to another, the weight on the edge might represent the distance between the two cities.
Path: A path in a graph is a sequence of vertices that are connected by edges. Formally we would define a path as
w1,w2,...,wn
such that(wi,wi+1) ∈ E
for all1 ≤ i ≤ n−1
. The unweighted path length is the number of edges in the path, specifically n−1. The weighted path length is the sum of the weights of all the edges in the path.Cycle: A cycle in a directed graph is a path that starts and ends at the same vertex. A graph with no cycles is called an acyclic graph. A directed graph with no cycles is called a directed acyclic graph(DAG).
A graph can be represented by G
where G=(V,E)
. For the graph G
, V
is a set of vertices and E
is a set of edges. Each edge is a tuple (v,w)
where w,v ∈ V
. We can add a third component to the edge tuple to represent a weight. A subgraph s is a set of edges e and vertices v such that e ⊂ E
and v ⊂ V
.
Types of Graphs#
Undirected Graph#
An undirected graph is a graph in which edges have no orientation. The edge (u, v) is identical to the edge (v, u).
Directed Graph (Digraph)#
A directed graph or digraph is a graph in which edges have orientations. For example, the edge (u, v) is the edge from node u to node v.
Weighted Graphs#
Many graphs can have edges that contain a certain weight to represent an arbitrary value such as cost, distance, quantity, etc.
Special Graphs#
Trees#
A tree is an undirected graph with no cycles. Equivalently, it is a connected graph with N nodes and N-1 edges.
Rooted Trees#
A rooted tree is a tree with a designated root node where every edge either points away from or towards the root node. When edges point away from the root the graph is called an arborescence (out-tree) and anti-arborescence (in-tree) otherwise.
Directed Acyclic Graphs (DAGs)#
DAGs are directed graphs with no cycles. These graphs play an important role in representing structures with dependencies. Several efficient algorithms exist to operates on DAGs.
Cool fact: All out-trees are DAGs but not all DAGs are out-trees.
Bipartite Graph#
A bipartite graph is one whose vertices can be split into two independent groups U, V such that every edge connects betweens U and V.
Other definitions exist such as: The graph is two colourable or there is no odd length cycle.
Complete Graphs#
A complete graph is one where there is a unique edge between every pair of nodes. A complete graph with n vertices is denoted as the graph Kn.
Representing Graphs#
Adjacency Matrix#
A adjacency matrix m is a very simple way to represent a graph. The idea is that the cell m[i][j] represents the edge weight of going from node i to node j.
Pros |
Cons |
---|---|
Space efficient for representing dense graphs |
Requires Θ(V²) space |
Edge weight lookup is O(1) |
Iterating over all edges takes Θ(V²) time |
Simplest graph representation |
Adjacency List#
An adjacency list is a way to represent a graph as a map from nodes to lists of edges.
Pros |
Cons |
---|---|
Space efficient for representing sparse graphs |
Less space efficient for denser graphs. |
Iterating over all edges is efficient |
Edge weight lookup is O(E) |
Slightly more complex graph representation |
Edge List#
An edge list is a way to represent a graph simply as an unordered list of edges. Assume the notation for any triplet (u,v,w) means: “the cost from node u to node v is w” This representation is seldomly used because of its lack of structure. However, it is conceptually simple and practical in a handful of algorithms.
Pros |
Cons |
---|---|
Space efficient for representing sparse graphs |
Less space efficient for denser graphs. |
Iterating over all edges is efficient |
Edge weight lookup is O(E) |
Very simple structure |
Graph Abstract Data Type#
The graph abstract data type (ADT) is defined as follows:
Graph()
creates a new, empty graph.add_vertex(vert)
adds an instance of Vertex to the graph.add_edge(fromVert, toVert)
Adds a new, directed edge to the graph that connects two vertices.add_edge(fromVert, toVert, weight)
Adds a new, weighted, directed edge to the graph that connects two vertices.get_vertices()
returns the list of all vertices in the graph.in
returns True for a statement of the form vertex in graph, if the given vertex is in the graph, False otherwise.
Implementing a Graph in Python#
An abstract data type (ADT) when given a physical implementation then we refer to the implementation as a data structure.
In any object-oriented programming language, the implementation of choice for an abstract data type such as a Graph is the creation of a new class. The Graph operations are implemented as methods.
Using dictionaries, it is easy to implement the adjacency list in Python. In our implementation of the Graph abstract data type we will create a classes Graph
, which holds the master list of vertices, and edges.
class Graph:
def __init__(self):
self.vertices = set()
self.num_vertices = 0
self.edges = {}
self.num_edges = 0
def add_vertex(self, v):
if v not in self.vertices:
self.vertices.add(v)
self.num_vertices += 1
def add_edge(self, v, u, w=None):
self.add_vertex(v)
self.add_vertex(u)
if v not in self.edges:
self.edges[v] = {u: w}
else:
self.edges[v][u] = w
self.num_edges += 1
def get_vertices(self):
return self.vertices
def get_edges(self, v):
return self.edges[v].items()
g = Graph()
g.add_edge(0, 1, 5)
g.add_edge(0, 4, 4)
g.add_edge(4, 1, 0)
g.add_edge(4, 3, 1)
g.add_edge(1, 0, 6)
g.add_edge(1, 4, 7)
g.add_edge(1, 3, 8)
g.add_edge(1, 2, 1)
g.add_edge(2, 3, 3)
g.add_edge(3, 4, 2)
for v in g.get_vertices():
print(f"{v}")
for u, w in g.get_edges(v):
print(f" └──> {u}: {w}")
0
└──> 1: 5
└──> 4: 4
1
└──> 0: 6
└──> 4: 7
└──> 3: 8
└──> 2: 1
2
└──> 3: 3
3
└──> 4: 2
4
└──> 1: 0
└──> 3: 1
Practical usage in python#
dict
module provides an implementation of the graph data structure. And it could be used to represent graphs.
graph = {0: [1], 1: [3, 5], 2: [1], 3: [2, 4], 4: [5], 5: [7], 7: [6], 6: [4]}
Breadth First Search#
Breadth first search (BFS. is one of the easiest algorithms for searching a graph. It also serves as a prototype for several other important graph algorithms.
Given a graph G
and a starting vertex s
, a breadth first search proceeds by exploring edges in the graph to find all the vertices in G
for which there is a path from s
. The remarkable thing about a breadth first search is that it finds all the vertices that are a distance k
from s before it finds any vertices that are a distance k+1
.
Breadth First Search Analysis#
Breadth First Search loops one time for each vertex in the graph |V|
, this gives us O(V)
for the outer loop. The inner loop, which is nested is executed at most once for each edge in the graph, |E|
. The reason is that every vertex is dequeued at most once and we examine an edge from node u to node v only when node u is dequeued. This gives us O(E. for the for loop, combining the two loops gives us O(V+E).
from collections import deque
graph = {0: [1], 1: [3], 2: [1], 3: [2, 4], 4: [5], 5: [7], 7: [6], 6: [4]}
def bfs(graph, start):
path = []
seen = set()
queue = deque([start])
while queue:
v = queue.popleft()
if v not in seen:
seen.add(v)
path.append(v)
queue.extend(graph[v])
return path
bfs(graph, 0)
[0, 1, 3, 2, 4, 5, 7, 6]
Applications of Breadth First Search#
Shortest Path and Minimum Spanning Tree for unweighted graph In an unweighted graph, the shortest path is the path with least number of edges. With Breadth First, we always reach a vertex from given source using the minimum number of edges. Also, in case of unweighted graphs, any spanning tree is Minimum Spanning Tree.
Peer to Peer Networks. In Peer to Peer Networks like BitTorrent, Breadth First Search is used to find all neighbor nodes.
Crawlers in Search Engines.
Social Networking Websites: In social networks, we can find people within a given distance ‘k’ from a person using Breadth First Search till ‘k’ levels.
GPS Navigation systems: Breadth First Search is used to find all neighboring locations.
Broadcasting in Network: In networks, a broadcasted packet follows Breadth First Search to reach all nodes.
In Garbage Collection: Breadth First Search is used in copying garbage collection using Cheney’s algorithm. Refer this and for details. Breadth First Search is preferred over Depth First Search because of better locality of reference.
def find_shortest_path(graph, start, end):
dist = {start: [start]}
queue = deque(start)
while queue:
node = queue.popleft()
for neighbor in graph.get(node, []):
if neighbor not in dist:
dist[neighbor] = dist[node] + [neighbor]
queue.append(neighbor)
return dist.get(end)
find_shortest_path(graph, 1, 5)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[3], line 12
9 queue.append(neighbor)
10 return dist.get(end)
---> 12 find_shortest_path(graph, 1, 5)
Cell In[3], line 3, in find_shortest_path(graph, start, end)
1 def find_shortest_path(graph, start, end):
2 dist = {start: [start]}
----> 3 queue = deque(start)
4 while queue:
5 node = queue.popleft()
TypeError: 'int' object is not iterable
Depth First Search#
Given a graph G
and a starting vertex s
, a depth first search proceeds by exploring edges in the graph to create the deepest depth first tree, without any branches all the vertices in G
for which there is a path from s
. It is even possible that a depth first search will create more than one tree. When the depth first search algorithm creates a group of trees we call this a depth first forest.
Depth First Search Analysis#
Depth First Search loops one time for each vertex in the graph |V|
, this gives us O(V)
for the outer loop. The inner loop, which is nested is executed at most once for each edge in the graph, |E|
. The reason is that every vertex is pushed at most once and we examine an edge from node u to node v only when node u is poped. This gives us O(E) for the for loop, combining the two loops gives us O(V+E).
from collections import deque
graph = {0: [1], 1: [3], 2: [1], 3: [2, 4], 4: [5], 5: [7], 7: [6], 6: [4]}
def dfs(graph, start):
path = []
seen = set()
stack = deque([start])
while stack:
v = stack.pop()
if v not in seen:
seen.add(v)
path.append(v)
stack.extend(graph[v])
return path
dfs(graph, 0)
[0, 1, 3, 4, 5, 7, 6, 2]
Applications of Depth First Search#
For a weighted graph, DFS traversal of the graph produces the minimum spanning tree and all pair shortest path tree.
Detecting Cycle: A graph has cycle if and only if we see a back edge during DFS. So we can run DFS for the graph and check for back edges. (See this for details)
Path Finding: DFS algorithm can be modified to find a path between two given vertices u and z.
Call DFS(G, u) with u as the start vertex.
Use a stack S to keep track of the path between the start vertex and the current vertex.
As soon as destination vertex z is encountered, return the path as the contents of the stack
Topological Sorting: used for scheduling jobs from the given dependencies among jobs.
Test if a graph is bipartite We can augment either BFS or DFS when we first discover a new vertex, color it opposited its parents, and for each other edge, check it doesn’t link two vertices of the same color. The first vertex in any connected component can be red or black! See this for details.
Finding Strongly Connected Components of a graph A directed graph is called strongly connected if there is a path from each vertex in the graph to every other vertex. (See this for DFS based algo for finding Strongly Connected Components)
Solving puzzles with only one solution, such as mazes. (DFS can be adapted to find all solutions to a maze by only including nodes on the current path in the visited set.)
Topological Sorting#
Topological sort gives the precise order in which we should do each of the steps (visit the graph nodes) in order to achieve our goal.
A topological sort takes a directed acyclic graph (DAG) and produces a linear ordering of all its vertices such that if the graph G contains an edge (v, u)
then the vertex v
comes before the vertex u
in the ordering. Directed acyclic graphs are used in many applications to indicate the precedence of events. The topological sort is a simple but useful adaptation of a depth first search.
The algorithm for the topological sort is as follows:
Call dfs(g) for some graph g. The main reason we want to call depth first search is to compute the finish times for each of the vertices.
Store the vertices in a list in decreasing order of finish time.
Return the ordered list as the result of the topological sort.
from collections import deque
def topological_sort(graph):
old_graph = graph
graph = {}
order = []
nodes = set()
for node, edges in old_graph.items():
nodes.add(node)
for edge in edges:
neighbour, weight = edge
nodes.add(neighbour)
graph[neighbour] = graph.get(neighbour, set()) | set([node])
nodes_without_prerequisites = nodes - set(graph.keys())
while nodes_without_prerequisites:
current_node = nodes_without_prerequisites.pop()
if current_node in graph:
del graph[current_node]
for node, edges in graph.items():
if current_node in edges:
edges.remove(current_node)
if not edges:
nodes_without_prerequisites.add(node)
if len(nodes_without_prerequisites) == 0 and len(graph) > 0:
return []
order.append(current_node)
return order
graph = {
'A': [('B', 4), ('C', 2)],
'B': [('C', 8), ('D', 4)],
'C': [('D', 3)],
'D': [('E', 2), ('F', 1), ('G', 1)],
'E': [('F', 11)],
'F': [('I', 2)],
'G': [('H', 4)],
'H': [('I', 4)],
}
topological_sort(graph)
['A', 'B', 'C', 'D', 'E', 'G', 'F', 'H', 'I']
Strongly Connected Components#
Strongly Connected Components (SCCs) can be thought of as self-contained cycles within a directed graph where every vertex in a given cycle can reach every other vertex in the same cycle.
Once again we will see that we can create a very powerful and efficient algorithm by making use of a depth first search. Before we tackle the main SCC algorithm we must look at one other definition. The transposition of a graph G
is defined as the graph GT
where all the edges in the graph have been reversed. That is, if there is a directed edge from node A
to node B
in the original graph then GT
will contain and edge from node B
to node A
.
Algorithm to compute the strongly connected components for a graph.
Call dfs for the graph
G
to compute the finish times for each vertex.Compute GT.
Call dfs for the graph
GT
but in the main loop of DFS explore each vertex in decreasing order of finish time.Each tree in the forest computed in step 3 is a strongly connected component. Output the vertex ids for each vertex in each tree in the forest to identify the component.
Shortest Path Problems#
Given a weighted graph, find the shortest path of edges from node A to node B.
from collections import deque
def find_shortest_path(graph, start, end):
dist = {start: [start]}
queue = deque(start)
while queue:
node = queue.popleft()
for edge in graph.get(node, []):
neighbour, weight = edge
if neighbour not in dist:
dist[neighbour] = dist[node] + [neighbour]
queue.append(neighbour)
return dist.get(end)
graph = {
'A': [('B', 4), ('C', 2)],
'B': [('C', 8), ('D', 4)],
'C': [('D', 3)],
'D': [('E', 2), ('F', 1), ('G', 1)],
'E': [('F', 11)],
'F': [('I', 2)],
'G': [('H', 4)],
'H': [('I', 4)],
}
find_shortest_path(graph, 'A', 'I')
['A', 'B', 'D', 'F', 'I']
Dijkstra’s Algorithm#
from heapq import heappush, heappop
def dijkstras(graph, start):
output = {node: -1 for node in graph.keys()}
visited = set()
priority_queue = [(0, start)]
while priority_queue:
distance, node = heappop(priority_queue)
if node in visited:
continue
visited.add(node)
output[node] = distance
for neighbour, neighbour_distance in graph.get(node, []):
if neighbour not in visited:
heappush(priority_queue, (distance +
neighbour_distance, neighbour))
return output
graph = graph = {
'A': [('B', 4), ('C', 2)],
'B': [('C', 8), ('D', 4)],
'C': [('D', 3)],
'D': [('E', 2), ('F', 1), ('G', 1)],
'E': [('F', 11)],
'F': [('I', 2)],
'G': [('H', 4)],
'H': [('I', 4)],
}
dijkstras(graph, 'A')
{'A': 0, 'B': 4, 'C': 2, 'D': 5, 'E': 7, 'F': 6, 'G': 6, 'H': 10, 'I': 8}
A* (A Star) Algorithm#
Minimum Spanning Tree (MST)#
A minimum spanning tree (MST) is a subset of the edges of a connected, edge-weighted graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight.