Disjoint Set/ Union Find (Union by Rank + Path Compression)

Note:

**Must practice the problems mentioned at the end of this post to fully grasp this algorithm.

There can be many ways one can write the Union() and the Find() functions. I have written two approaches to do the Union functions here. Both of them are the most optimized. Both of them are based on the Quick Union algorithm implemented with the two optimizing techniques called Union by Rank and Path Compression. But the difference is just how I am doing the Union by Rank, that is how I am updating the ranking vector inside the Union() function.

In the first oneUnion_ranking_by_total_connected_nodes(), the ranking vector is updated each time with the ranking of the other root node it is being connected to. That is, at the end of calling the Union() function for each pair of nodes to be connected, the ranking vector will contain the total number of connected nodes to each node.

In the second one Union_ranking_by_height(), I use the least possible value in the ranking[] vector. That is, I increase the ranking of the root when both rankings are equal. Otherwise, I just don’t increase the ranking since we are already pushing the lower-valued ranking of a node to a higher-valued one. That is the higher-valued one would remain higher-valued or will be equal to the other now. But in the case of comparison, the comparison result will remain the same. In summary, the ranking vector here contains the “height” of each node.

You can use any one of them. I prefer the first one as that can be quite handy while solving many related problems.

Code:

#include <bits/stdc++.h>

using namespace std;

int Find(int x, vector<int> &root) //O(alpha(N)) //O(1)
{
    if(root[x]==x) return x;
    return root[x]=Find(root[x],root); //Path compression
}

/// Use any one of the following two - maybe rename as Union() 

//ranking_by_total_connected_nodes
void Union_ranking_by_total_connected_nodes(int x, int y, vector<int> &root, vector<int> &ranking) //O(alpha(N)) //O(1) 
{
    int rootX=Find(x, root);
    int rootY=Find(y, root);
    if(rootX!=rootY) //Union by Rank
    {
        if(ranking[rootX]>ranking[rootY]) root[rootY]=rootX;
        else if(ranking[rootY]>ranking[rootX]) root[rootX]=rootY;
        else
        {
            root[rootY]=rootX;
            ranking[rootX]++;
        }
    }
}

//ranking_by_height
void Union_ranking_by_height(int x, int y, vector<int> &root, vector<int> &ranking) //O(alpha(N)) //O(1)
{
    int rootX=Find(x, root);
    int rootY=Find(y, root);
    if(rootX!=rootY) //Union by Rank
    {
        if(ranking[rootX]>ranking[rootY]) root[rootY]=rootX;
        else if(ranking[rootY]>ranking[rootX]) root[rootX]=rootY;
        else
        {
            root[rootY]=rootX;
            ranking[rootX]++;
        }
    }
}

bool unionFind(int x, int y, vector<int> &root)
{
    return Find(x, root)==Find(y, root);
}

int main()
{
    int n,m;
    cout<<"Enter the number of nodes: ";
    cin>>n;
    //Initially all nodes are independent
    vector<int> root(n+1), ranking(n+1);//will be (n) if indexing of nodes starts from 0
    for(int i=1; i<=n; i++)//will be (i=0; i<n;..) if indexing of nodes starts from 0
    {
        root[i]=i;
        ranking[i]=1;
    }

    cout<<"Enter the number of connections: "<<endl;
    cin>>m;
    cout<<"Enter the connections i.e. two nodes wo are connected"<<endl;
    for(int i=0; i<m; i++)
    {
        int x, y;
        cin>>x>>y;
        Union_ranking_by_total_connected_nodes(x, y, root, ranking);
        /*
        cout<<"####Root vector#####"<<endl;
        for(int i=1; i<=n; i++)
        {
            cout<<i<<" "<<root[i]<<endl;
        }
        */
    }

    cout<<"#######################"<<endl;
    int ans=0;
    for(int i=1; i<=n; i++)//will be (i=0; i<n;..) if indexing of nodes starts from 0
    {
        if(root[i]==i) ans++;
    }
    cout<<"There are "<<ans<<" different connected components"<<endl;

    cout<<"#######################"<<endl;
    int q;
    cout<<"Enter the number of queries: "<<endl;
    cin>>q;
    cout<<"Enter the nodes whose connections you want to check: "<<endl;
    for(int i=0; i<q; i++)
    {
        int x, y;
        cin>>x>>y;
        bool ans=unionFind(x, y, root);
        if(ans==1) cout<<"Connected!"<<endl;
        else cout<<"Not connected"<<endl;
    }
    return 0;
}
// Input: 1 2-3-4-5-8-9-10  6 7
/*
10
6
2 3
3 4
3 5
8 9
9 10
5 10
*/
//Output
/*
#######################
There are 4 different connected components
#######################
Enter the number of queries:
...
*/




**One important thing to observe here is – after connecting all the connected nodes using the Union function, the root vector does not contain all the actual root nodes of the other nodes! That is most of the nodes are having the actual root (as ensured by the Find() function that is called at the beginning of the Union() ), but some node’s actual roots are not updated yet, instead, in the root vector, they are containing their parent node may be.

The reason is: Remember that at the beginning of the Union function, we are updating root nodes while calling the Find function for each of the two nodes. But then we are leaving the Union function by just updating the parent nodes’ root of Y (root[rootY]=rootX). We are not doing root[y]=rootX (Please see the code below for the reference of x, y, rootX, rootY).

So after using the function Union to connect all the connected nodes, to get any nodes root node, say for a node called z, we’ll have to use the Find function, or we can just check the root vector and go through it until we find the root node using this – while(z!=root[z]) {z=root[z]; return z;}. Now z is the root node.

Time Complexity:

Each module – Union, Fnd, and UnionFind are O(α(N)). Here α is the Inverse Ackermann function. In practice, we assume it’s a constant. In other words, O(α(N)) is regarded as O(1) on average.

So the overall time complexity is O(N) if those are called N times.

Related Problems to solve (Sorted according to the INCREASING difficulty level):

Must do problems 1, 5 & 8 to understand the algorithm fully. These 3 problems cover every aspect of the Union-Find algorithm. Other problems are mainly about manipulating the input to feed to the basic Union-Find algorithm.

  1. UVa 10583 – Ubiquitous Religions
  2. UVa 11966 – Galactic Bonding
  3. UVa 459 – Graph Connectivity
  4. UVa 793 – Network Connections
  5. UVa 10608 – Friends
  6. UVa – 10685 – Nature
  7. UVa – 11690 – Money Matters
  8. UVa 11503 – Virtual Friends
  9. UVa 10178 – Count the Faces
  10. UVa 10227 – Forests
  11. UVa 10158 – War
  12. UVa 10507 – Waking Up Brain

2 thoughts on “Disjoint Set/ Union Find (Union by Rank + Path Compression)

Leave a comment