Sunday, June 14, 2015

INSERT data into Data Frames

More often than not, we read data into a Data Frame from external sources such as a database, a csv file, or even a compressed csv file. R provides functions and libraries that make this reading easy. I intend to write separately on data exchange between R and a SQL database; in this post, I will focus on common ways of populating a Data Frame.

Reading data from a CSV File

Here is how we can read the data from a csv file:

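Customers <- read.csv("Customers.csv", stringsAsFactors = FALSE)

What we get is a Data Frame named Customers. Setting stringsAsFactors = FALSE keeps text columns as character vectors. Before R 4.0.0, read.csv() converted them to factors by default, and inserting a new text value that is not an existing factor level, as we do below, would store NA and raise a warning.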

read.csv() can read not only from a file path but also from a URL. It can also read compressed files (for example, gzip-compressed csv files) and decompress them transparently before parsing. It takes parameters such as header, sep, col.names, colClasses and na.strings; see ?read.csv for the details.
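As a minimal sketch of these options, the snippet below writes a small gzip-compressed csv file to a temporary location and reads it back with read.csv(). The file and its columns (ID, Name, Balance) are made up for illustration; colClasses and stringsAsFactors are standard read.csv() arguments.

```r
# Hypothetical sample data, written to a temporary gzip-compressed csv file
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, open = "w")
writeLines(c("ID,Name,Balance",
             "1,AFitness,100.5",
             "2,BFitness,250"), con)
close(con)

# read.csv() decompresses the file transparently before parsing it
Sample <- read.csv(path,
                   header = TRUE,                                     # first line holds column names
                   colClasses = c("integer", "character", "numeric"), # one class per column
                   stringsAsFactors = FALSE)                          # keep text as character
str(Sample)
```

Specifying colClasses up front also saves read.csv() from guessing each column's type.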

Populating a Data Frame

If we want to populate a Data Frame computationally rather than read it from an external data source, there are a few ways: assigning rows by row index, or combining rows with rbind().

Inserting Row using Row Index

We can explicitly specify the row number at which to populate a Data Frame. If a row already exists at that index, its values are replaced; if not, a new row is created.
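A minimal sketch of both cases, using a hypothetical three-column Customers Data Frame (the column names ID, Name and Balance are assumptions, not the post's actual columns). The new values are passed as a list so that each value keeps its own type.

```r
# Hypothetical two-row Customers data frame; column names are assumptions
Customers <- data.frame(ID = c(1, 2),
                        Name = c("AFitness", "BFitness"),
                        Balance = c(100.5, 250),
                        stringsAsFactors = FALSE)

# Row 2 already exists, so its values are replaced
Customers[2, ] <- list(2, "BFitnessPlus", 300)

# Row 3 does not exist yet, so a new row is created
Customers[3, ] <- list(3, "CFitness", 75.25)

print(Customers)
```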

This approach requires knowing the index of the new row. This is how we can append a new row at the end of the Data Frame in a generic manner:

# list() keeps each value's type; c() would coerce every value to character
Customers[nrow(Customers) + 1, ] <- list(4, 'YFitness', 123.435, 'North', TRUE, 'Issaquah')

nrow() gives the number of rows in a Data Frame.
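One caveat with the row-index approach: c() coerces all of its arguments to a single type, so mixing numbers, text and logicals yields a character vector, and assigning it to a row turns numeric columns into character columns. The sketch below, using a hypothetical two-column Data Frame, contrasts it with list(), which keeps each value's type.

```r
# Hypothetical data frame with a numeric and a character column
WithC    <- data.frame(ID = c(1, 2), Name = c("A", "B"), stringsAsFactors = FALSE)
WithList <- WithC

# c(3, "C") is the character vector c("3", "C"), so the ID column becomes character
WithC[nrow(WithC) + 1, ] <- c(3, "C")

# list(3, "C") keeps 3 numeric, so the ID column stays numeric
WithList[nrow(WithList) + 1, ] <- list(3, "C")

class(WithC$ID)     # "character"
class(WithList$ID)  # "numeric"
```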

We can also use this approach to insert a new row at a particular position in the Data Frame. Let's say we want to insert a new row at the 3rd position in the Customers Data Frame. This takes two steps: first, we shift rows 3 through the last row down by one, to positions 4 through (last + 1); second, we write the new row at position 3.

# Assumes Customers already has at least 3 rows
Customers[seq(4, nrow(Customers) + 1), ] <- Customers[seq(3, nrow(Customers)), ]
Customers[3, ] <- list(3, 'HFitness', 123.435, 'North', TRUE, 'Issaquah')
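The same two steps can be wrapped in a small function for an arbitrary position. This is a minimal sketch: insert_row_index() is a hypothetical helper name, the sample columns are assumptions, and the new row is passed as a list. The if() guard skips the shift when appending at the end; without it, a call such as seq(4, 3) would count downwards and shift the wrong rows.

```r
# Hypothetical helper: shift rows pos..n down by one, then write `row` at pos
insert_row_index <- function(df, row, pos) {
  n <- nrow(df)
  stopifnot(pos >= 1, pos <= n + 1)
  if (pos <= n) {
    # Step 1: copy rows pos..n into pos+1..n+1 (the right side is evaluated first)
    df[(pos + 1):(n + 1), ] <- df[pos:n, ]
  }
  # Step 2: overwrite (or append) the row at pos
  df[pos, ] <- row
  df
}

# Usage with hypothetical data: insert ID 3 between IDs 2 and 4
Customers <- data.frame(ID = c(1, 2, 4),
                        Name = c("AFitness", "BFitness", "DFitness"),
                        stringsAsFactors = FALSE)
Customers <- insert_row_index(Customers, list(3, "CFitness"), 3)
print(Customers)
```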

Inserting Data using rbind()

This is useful when we have a few Data Frames with the same columns and want to combine them, much like a SQL UNION ALL (rbind() keeps duplicate rows).

Customers <- rbind(Customers,
                   OldCustomers,
                   list(4, 'YFitness', 123.435, 'North', TRUE, 'Issaquah'))

In the above line, we combine the Customers Data Frame, the OldCustomers Data Frame, and a new row created on the fly, and assign the result back to Customers. The rows appear in the result in the order in which their arguments are passed to rbind(). Because rbind() matches the columns of Data Frames by name, OldCustomers needs the same column names as Customers.
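A minimal sketch of the union behaviour, using two hypothetical Data Frames that share one row. rbind() keeps the duplicate, like SQL UNION ALL; wrapping the result in unique() drops repeated rows, which is closer to SQL UNION. The column names and values are assumptions for illustration.

```r
# Hypothetical data frames with the same column names
Customers    <- data.frame(ID = c(1, 2), Name = c("AFitness", "BFitness"),
                           stringsAsFactors = FALSE)
OldCustomers <- data.frame(ID = c(2, 9), Name = c("BFitness", "ZFitness"),
                           stringsAsFactors = FALSE)

# Rows are stacked in argument order; the (2, "BFitness") row appears twice
AllRows <- rbind(Customers, OldCustomers, list(4, "YFitness"))
nrow(AllRows)   # 5

# unique() removes the repeated row
Distinct <- unique(AllRows)
nrow(Distinct)  # 4
```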

rbind() can also insert a new record at a specified location. Let's say we want to insert a new row at the 3rd position in the Customers Data Frame. We split Customers into two subsets at the position of interest (rows 1 to 2, and rows 3 onward) and then combine them with rbind(), placing the new record in between.

Customers <- rbind(Customers[1:2, ],
                   list(3, 'HFitness', 123.435, 'North', TRUE, 'Issaquah'),
                   Customers[-(1:2), ])
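For a general position p, the expressions 1:(p - 1) and -(1:(p - 1)) misbehave when p is 1 (1:0 is c(1, 0), not an empty index), so this minimal sketch uses seq_len() instead. insert_row_rbind() is a hypothetical helper name and the sample columns are assumptions. The new record is first turned into a one-row Data Frame with the same column names, so rbind() can match it by name even when it ends up first.

```r
# Hypothetical helper: split df at pos and rbind the pieces around the new record
insert_row_rbind <- function(df, row, pos) {
  n <- nrow(df)
  stopifnot(pos >= 1, pos <= n + 1)
  # One-row data frame with df's column names and types, filled from the list `row`
  new_row <- df[0, , drop = FALSE]
  new_row[1, ] <- row
  before <- df[seq_len(pos - 1), , drop = FALSE]                # rows 1..pos-1 (may be empty)
  after  <- df[seq_len(n - pos + 1) + (pos - 1), , drop = FALSE] # rows pos..n (may be empty)
  result <- rbind(before, new_row, after)
  rownames(result) <- NULL  # renumber rows 1..n+1
  result
}

# Usage with hypothetical data: insert ID 3 between IDs 2 and 4
Customers <- data.frame(ID = c(1, 2, 4),
                        Name = c("AFitness", "BFitness", "DFitness"),
                        stringsAsFactors = FALSE)
Customers <- insert_row_rbind(Customers, list(3, "CFitness"), 3)
print(Customers)
```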

