Saturday, June 6, 2015

Primitive Data Types in R

Data types are the fundamental building blocks we need to learn before we get into data manipulation, so let's start with them.

Note: I often see R tutorial websites mixing R's primitive data types and R objects and listing the two together. It is like saying that Table is a data type in SQL. This can create confusion. So, let's keep the two separate.


Primitive Data Types


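R has the following main primitive data types:
Character, Logical, Integer, Double, Complex, and Raw

Date is often listed alongside these, and it appears in the SQL mapping in this post, but strictly speaking it is a class stored on top of Double rather than a primitive type.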
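As a quick check, typeof() reports the underlying storage type of a value. The snippet below is a minimal sketch using made-up sample values; note that a Date turns out to be stored as a double with a class attribute on top.

```r
# typeof() reports the underlying storage type of a value
typeof("hello")   # "character"
typeof(TRUE)      # "logical"
typeof(42L)       # "integer" (the L suffix asks for an integer)
typeof(42)        # "double"  (plain numbers are doubles by default)
typeof(3 + 2i)    # "complex"

# A Date is stored as a double (days since 1970-01-01) plus a class attribute
d <- as.Date("2015-06-06")
typeof(d)         # "double"
class(d)          # "Date"
```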


Mapping SQL Data Types to R Data Types

This is how SQL data types map to R data types:

Char, Varchar : Character
Binary, Varbinary : Raw (a vector of bytes)
Boolean : Logical
SmallInt, Integer : Integer
Float, Real, Double, BigInt, Decimal, Numeric : Double
Date : Date
Time : No dedicated time-of-day type in base R; date-times are stored as POSIXct
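Two of these mappings have consequences worth seeing in practice. The sketch below, using illustrative values only, shows binary data held in a raw vector, the 32-bit limit of R's integer type, and why storing a BigInt as a double can lose precision for very large values.

```r
# Binary data: a raw vector holds one byte per element
b <- charToRaw("SQL")
typeof(b)              # "raw"
b                      # 53 51 4c

# R integers are 32-bit, which is why BigInt does not fit in Integer
.Machine$integer.max   # 2147483647

# Doubles represent whole numbers exactly only up to 2^53,
# so very large BigInt values can lose precision
big <- 2^53
big == big + 1         # TRUE
```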


NULL

In SQL, NULL represents missing data. In R, however, there are two related concepts: NA and NULL. NA represents a missing value: a placeholder for something that exists but is unknown. NULL, by contrast, represents something that does not exist at all. The distinction is more nuanced than this, but for now, just keep in mind that what SQL calls NULL comes in two forms in R.

The is.na() and is.null() functions tell you whether a value is NA or NULL, respectively.
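The difference is easiest to see inside a vector. In this small sketch (the values are made up), NA occupies a slot while NULL simply disappears:

```r
# NA is a placeholder: it occupies a position in the vector
x <- c(10, NA, 30)
length(x)               # 3
is.na(x)                # FALSE  TRUE FALSE
mean(x)                 # NA, because one value is unknown
mean(x, na.rm = TRUE)   # 20, after dropping the NA

# NULL is nothing at all: it vanishes when combined with other values
y <- c(10, NULL, 30)
length(y)               # 2

is.null(NULL)           # TRUE
is.null(NA)             # FALSE
```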

NaN and Inf

Other interesting values in R are NaN and Inf. NaN ("not a number") represents an undefined numeric result, such as 0/0. Inf represents infinity, the result of operations such as 1/0; its negative counterpart is -Inf.
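A few lines at the console show how these values arise and how to test for them. This is a minimal sketch; the variable names are only for illustration.

```r
undefined <- 0 / 0
undefined               # NaN
pos_inf <- 1 / 0
pos_inf                 # Inf
neg_inf <- -1 / 0
neg_inf                 # -Inf

is.nan(undefined)       # TRUE
is.na(undefined)        # TRUE: NaN also counts as missing
is.nan(NA)              # FALSE: but NA is not NaN

# is.finite() is FALSE for Inf, NaN and NA alike
is.finite(c(1, Inf, NaN, NA))   # TRUE FALSE FALSE FALSE
```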

CAST and CONVERT

Often, when you load data into R from an external source, you may find that the data types are not what you expected them to be. This can be frustrating because R performs some implicit conversions, and the results may differ from what you expect.

The following R functions are helpful in such situations:

class(x): Returns the class of x.
as.X(y): Returns y converted to type X, where X can be character, numeric, logical, etc. (for example, as.numeric(y)).
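Here is a short sketch of these functions on made-up values, roughly what you might do after a numeric column arrives as text. It also shows one implicit conversion: c() silently coerces mixed values to a common type.

```r
# Implicit conversion: mixing types in c() coerces everything to character
mixed <- c(1, "a", TRUE)
class(mixed)             # "character"

# Explicit conversion: values that cannot be parsed become NA
x <- c("1", "2", "three")
class(x)                 # "character"
n <- suppressWarnings(as.numeric(x))  # without suppressWarnings(), R warns about the NA
n                        # 1  2 NA
class(n)                 # "numeric"

as.integer(3.9)          # 3 (truncates rather than rounds)
as.character(42L)        # "42"
as.logical(c("TRUE", "false", "yes"))  # TRUE FALSE NA
```

as.X() returns a converted copy and leaves the original untouched, so assign the result back (for example, x <- as.numeric(x)) if you want to keep it.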



