UnionFind#

UnionFind is a data structure that is capable of tracking and merging of disjoint setuf. As a structure it is very important inside other algorithms like Prolog unification or percolation problem.

graph BT; A((A)) B((B)) C((C)) D((D)) E((E)) F((F)) G((G)) H((H)) I((I)) A --> A; B --> A; D --> A; C --> E; E --> E; F --> E; G --> E; H --> H; I --> H; style A fill:#f96, stroke-width:0px style B fill:#f96, stroke-width:0px style D fill:#f96, stroke-width:0px style C fill:#36d399, stroke-width:0px style E fill:#36d399, stroke-width:0px style F fill:#36d399, stroke-width:0px style G fill:#36d399, stroke-width:0px style H fill:#f000b8, stroke-width:0px style I fill:#f000b8, stroke-width:0px

UnionFind Abstract Data Type#

The UnionFind (disjoint-set/merge–find set) is a data structure that stores a collection of disjoint (non-overlapping) setuf. Equivalently, it stores a partition of a set into disjoint subsetuf. It provides operations for adding new sets, merging sets (replacing them by their union), and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different setuf.

UnionFind Operations#

  • UnionFind() creates a new UnionFind that is empty. It needs no parameteruf.

  • create_set(value) adds a new item to the set containing only itself if not already present. It needs the item and returns nothing.

  • find() returns the representative value of the set for which a value belongs to. This can be any value in the set, but it should always be the same value, regardless of which value in the set.

  • union() takes in two values and determines which sets they are in. If they are in different sets, the sets are combined into a single set. If either value is not in a set or they are in the same set, the function should have no effect.

uf = UnionFind() creates a disjoint-set that starts out empty, table below shows the results of a sequence of union-find operationuf.

UnionFind Operation

UnionFind Contents

Return Value

uf.create_set(5)

{5: 5}

uf.create_set(10)

{5: 5, 10: 10}

uf.find(5)

{5: 5, 10: 10}

5

uf.find(10)

{5: 5, 10: 10}

10

uf.union(5, 10)

{5: 5, 10: 5}

uf.find(5)

{5: 5, 10: 5}

5

uf.find(10)

{5: 5, 10: 5}

5

uf.create_set(20)

{5: 5, 10: 5, 20: 20}

uf.find(20)

{5: 5, 10: 5, 20: 20}

20

uf.union(10, 20)

{5: 5, 10: 5, 20: 5}

uf.find(5)

{5: 5, 10: 5, 20: 5}

5

uf.find(10)

{5: 5, 10: 5, 20: 5}

5

uf.find(20)

{5: 5, 10: 5, 20: 5}

5

Implementing a UnionFind in Python#

An abstract data type (ADT) when given a physical implementation then we refer to the implementation as a data structure.

In any object-oriented programming language, the implementation of choice for an abstract data type such as a UnionFind is the creation of a new clasuf. The disjoint set operations are implemented as methoduf. Further, to implement a union-find, which is a collection of elements, it makes sense to utilize the primitive collections provided.

Python provides dict an ordered collection of keys and values mapping mechanism and a set of methoduf.

class UnionFind:

    def __init__(self) -> None:
        self.parents = {}
        self.rankings = {}
    
    def create_set(self, value) -> None:
        self.parents[value] = value
        self.rankings[value] = 0
    
    def find(self, value):
        if value not in self.parents:
            return None

        if self.parents[value] != value:
            self.parents[value] = self.find(self.parents[value])

        return self.parents[value]
    
    def union(self, value_one, value_two):
        if value_one not in self.parents or value_two not in self.parents:
            return None
        
        root_one = self.find(value_one)
        root_two = self.find(value_two)

        ranking_one = self.rankings[root_one]
        ranking_two = self.rankings[root_two]

        if ranking_one < ranking_two:
            self.parents[root_one] = root_two
        elif ranking_one > ranking_two:
            self.parents[root_two] = root_one
        else:
            self.parents[root_two] = root_one
            self.rankings[root_one] += 1
# Create a new union-find/disjoint-set
uf = UnionFind()

# Adds a new item 5 to the set
uf.create_set(5)

# Adds a new item 10 to the set
uf.create_set(10)

# Returns the representative value of the set for which a value 5 belong
print(uf.find(5))

# Returns the representative value of the set for which a value 10 belong
print(uf.find(10))

# Does union for the sets for both the values 5 and 10
uf.union(5, 10)

# Returns the representative value of the set for which a value 5 belong
print(uf.find(5))

# Returns the representative value of the set for which a value 10 belong
print(uf.find(10))

# Adds a new item 10 to the set
uf.create_set(20)

# Returns the representative value of the set for which a value 20 belong
print(uf.find(20))

# Does union for the sets for both the values 10 and 20
uf.union(10, 20)

# Returns the representative value of the set for which a value 5 belong
print(uf.find(5))

# Returns the representative value of the set for which a value 10 belong
print(uf.find(10))

# Returns the representative value of the set for which a value 20 belong
print(uf.find(20))
5
10
5
5
20
5
5
5