Sunday, June 21, 2015

UNION and UNION ALL



The commonly used function to perform UNION ALL operation in R is rbind().

Assuming we have Customers and ArchivedCustomers as two Data Frame, we can call the following:

rbind(Customers, ArchivedCustomers)

However, a few things need to be in place for the above to work properly. Both Customers as well as Archived Customers must have the same number and names of columns. The column order can be different in the two Data Frames, but the names need to be the same. This behavior can be useful, but is not what a SQL minded person may expect. So, be careful.

Using rbind() when numbers of columns are different

If the number of columns is different, either drop the unwanted columns by subsetting the Data Frame or add null columns with matching names to the other data frame.

Using rbind() when column names are different

If the column names are different and we want to UNION ALL based on the column order, we can use the following:

rbind(Customers, 
     setNames(ArchivedCustomers, 
          names(Customers)
     )
)

Here, we are copying the column names from the Customers Data Frame to the ArchivedCustomers Data Frame.

Performing UNION

Remember, rbind() does UNION ALL and not UNION. We can use the unique() function in addition to rbind in order to achieve the result similar to UNION in SQL


unique(
     rbind(Customers, 
          setNames(ArchivedCustomers, 
               names(Customers)
          )
     )
)

No comments:

Post a Comment