Introduction to R

What is R?

  • R is a powerful statistical programming language that allows scientists to perform statistical computing and visualization.
  • R is based on a well-developed programming language (“S”, which was developed by John Chambers at Bell Labs) and thus contains all the essential elements of a computer programming language, such as conditionals, loops, and user-defined functions.
  • R is open source and available to the community at no charge. This has made it much more accessible and encouraged contribution from other developers.
  • R can store large datasets and query them efficiently. It can also query external database management systems such as MySQL through add-on packages.
  • R makes it easy to produce plots quickly. For example, a histogram can be created with the single command hist().
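
The points above about language features and quick plotting can be illustrated with a minimal sketch. The function name count_above and the data in measurements are made up for this illustration and are not part of the course material.

```r
# A user-defined function containing a loop and a conditional
count_above <- function(values, cutoff) {
  n <- 0
  for (v in values) {
    if (v > cutoff) {
      n <- n + 1
    }
  }
  n
}

set.seed(42)                                     # make the random numbers repeatable
measurements <- rnorm(100, mean = 50, sd = 10)   # made-up example data

count_above(measurements, 50)      # how many values exceed 50
h <- hist(measurements)            # draw a histogram with one command
```

In everyday R the loop would usually be replaced by the vectorized form sum(measurements > 50); the loop is spelled out here only to show the language elements.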

Main websites to learn more about the R project

R-project

http://www.r-project.org – Main destination to find everything you want to know about R including links to tutorials and learning about how you can contribute.

CRAN – Comprehensive R Archive Network

https://cran.r-project.org/ – Place to download R and other packages.

Installing R

There are many tutorials on the web that will help you install R. For this course we will also be using RStudio as the IDE to help us manage our R data and code.

You can check out my video or simply go to the following sites. Please install R first and then RStudio.

Using R

Using R on command line

Requires knowledge of command-line operations. Uses fewer computer resources because no window system is required.

The Window System (Rgui)

The most convenient way to use R is at a graphics workstation running a windowing system. On Windows, R ships with its own graphical console (Rgui); on Unix-like systems, R uses the X Window System for graphics.

RStudio (www.rstudio.org)

https://www.rstudio.org/

A much more powerful and user-friendly interface than Rgui. It requires that R already be installed on the system.

PLEASE DOWNLOAD THE DESKTOP VERSION

R Console Prompt

  • R is used by typing in a list of commands.
  • Commands are entered after the prompt >.
  • After you type a command and its arguments, simply press the Return key.
  • Separate commands using ; or a newline (Enter).
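
As a small illustration of separating commands, the sketch below enters the same kind of assignment once per line and once on a single line with a semicolon. The variable names are arbitrary.

```r
# Two commands on separate lines
x <- 5
y <- 10

# Two commands on one line, separated by ;
a <- 5; b <- 10

x + y   # typing an expression at the prompt prints its value
```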

R session

Default Workspace

  • The workspace contains the R objects only (not the code).
  • The default workspace is saved in a file named .RData.
  • To load .RData, set the directory where .RData is located as the current directory and then select “load Default workspace”.

Working Directory

It is a good idea to keep a separate workspace and history for each project, saved in different directories (folders).
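
The same workspace and working-directory steps can also be done from the console. The sketch below assumes a throwaway folder under tempdir() as a stand-in for a real project directory; the folder name my_project and the object scores are hypothetical.

```r
# Use a temporary directory as a stand-in for a project folder
project_dir <- file.path(tempdir(), "my_project")   # hypothetical folder name
dir.create(project_dir, showWarnings = FALSE)

old_dir <- setwd(project_dir)   # change the working directory
getwd()                         # show the current working directory

scores <- c(90, 85, 77)
save.image(".RData")            # save all workspace objects to .RData

rm(scores)                      # remove the object ...
load(".RData")                  # ... and restore it from the saved workspace
scores

setwd(old_dir)                  # go back to where we started
```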

R editor and Scripts

  • A history of your commands is saved, and it can be accessed by using the up and down arrow keys.
  • Your history is saved as .Rhistory in your working directory.
  • It is a good idea to save your successful commands in a separate file, because your history will also contain your mistakes.
  • Open the editor by selecting “New Script” from the File menu. Similar to Notepad, it lets you type and save code as plain text.
  • MS Word is not a good choice for this, because pasting from it can insert unwanted characters (such as curly quotes).
  • You can execute an entire R script by selecting “Source R code” or by calling the source() function.
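
A minimal sketch of running a script with source() follows. The script is written to a temporary file only so that the example is self-contained; in practice it would be a file you saved from the editor.

```r
# Write a two-line script to a temporary file (stand-in for a saved script)
script_path <- tempfile(fileext = ".R")
writeLines(c("radius <- 2",
             "area <- pi * radius^2"),
           script_path)

# Run every command in the file, as if it had been typed at the console
source(script_path)

area   # the object created by the script is now available
```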

Familiar commands that work in R

Ctrl-c : copy

Ctrl-v : paste

Ctrl-L : clear the console

Esc : stop

Packages

Packages contain specialized functions and data that can be used for your analysis. Many packages are available on CRAN; Bioconductor is a separate repository that hosts packages for bioinformatics.
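
Installing and loading a package typically looks like the sketch below. The install line is commented out because it needs an internet connection, and "ggplot2" is only an example package name. The tools package is used for library() because it ships with every R installation.

```r
# Download and install a package from CRAN (needs an internet connection,
# so it is commented out here; "ggplot2" is just an example name)
# install.packages("ggplot2")

# Load an installed package so that its functions can be used
library(tools)

# List the packages currently attached to the session
search()
```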

R object

  • An object is a container for a piece of data or lines of code.
  • Objects can be named so they can be accessed at any point.

Three ways to assign data to a named object:
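
The original examples are not reproduced here, so the three forms below are an assumption about what was meant. They are the leftward arrow, the equals sign, and the assign() function; the rightward arrow (10 -> x) also exists.

```r
x <- 10            # leftward arrow, the most common form
y = 20             # equals sign
assign("z", 30)    # assign() takes the object name as a character string

x; y; z            # print all three objects
```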

R functions

Functions contain lines of pre-written code that perform some task. A function can:

  • Gather information about the R environment
  • Change properties of an environment
  • Perform a task on one or more data structures

Below is an example of the function sum()
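
A short sketch of calling sum(), including its na.rm argument for missing values. The vector values is made-up example data.

```r
values <- c(2, 4, 6, NA)

sum(1, 2, 3)                 # add individual numbers
sum(values)                  # NA, because one element is missing
sum(values, na.rm = TRUE)    # ignore missing values

total <- sum(values, na.rm = TRUE)
total
```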

Basic Syntax

In order to see the contents of an object, simply type its name. If you type a name that does not belong to any object, you will get an error. Names of objects are case sensitive, so “Print” is not the same as “print”.

You can add a comment to your code, without it being evaluated, by preceding it with #.

When a command does not fit on one line, or you want to make it more readable, you can press “Return” before the command is complete and R will start the next line with the continuation prompt +.
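
These syntax rules can be tried out with the sketch below. The object name weight is arbitrary, and tryCatch() is used only so that the deliberate error does not stop the script.

```r
# This whole line is a comment and is ignored by R
weight <- 70      # a comment can also follow code on the same line

weight            # typing the name prints the contents

# Names are case sensitive: "Weight" was never created, so looking it up fails.
# tryCatch() is used here only so the script keeps running after the error.
msg <- tryCatch(Weight, error = function(e) conditionMessage(e))
msg

# A command left incomplete at the end of a line continues on the next line
# (at the console the second line would start with the + prompt)
total <- weight +
  5
total
```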

Getting Help with functions and features
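
The source gives no detail under this heading, so the sketch below simply lists a few standard base R help tools. The search term "hist" is just an example.

```r
help("sum")                  # open the help page for sum(); ?sum is a shortcut
args(sum)                    # show the arguments a function accepts

matches <- apropos("hist")   # find objects whose names contain "hist"
matches
```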

R objects: Modes

The mode of an object is the type of its components.

Numeric : numbers

Complex : complex numbers

Logical : TRUE/FALSE

Character: Alphanumeric values

Raw : Bytes
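
The mode() function reports which of these modes an object has. A minimal sketch with one made-up value per mode:

```r
mode(3.14)           # "numeric"
mode(2+3i)           # "complex"
mode(TRUE)           # "logical"
mode("gene")         # "character"
mode(as.raw(255))    # "raw"
```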

The class of an object

  • For a simple vector, the class is usually the same as its mode (an integer vector is an exception: its mode is “numeric” but its class is “integer”).
  • Other classes include “matrix”, “array”, “factor” and “data.frame”.
  • These classes help R act like an object-oriented language. For example, the plot function behaves differently for an object of class matrix than for a numeric vector.
  • Use unclass() if you do not want the object to be treated as a member of its class.
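
A small sketch of inspecting and removing a class. The objects v, f and m are made up for the illustration. inherits() is used for the matrix because recent versions of R report more than one class name for a matrix.

```r
v <- c(1.5, 2.5)
f <- factor(c("low", "high", "low"))
m <- matrix(1:4, nrow = 2)

class(v)                 # "numeric"
class(f)                 # "factor"
inherits(m, "matrix")    # TRUE

unclass(f)               # the underlying integer codes plus a levels attribute
```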

R objects: Vectors

  • The most basic data structure.
  • A sequence of data that can be numbers, characters, or logical values.
  • A scalar is a vector of length 1.
  • A vector of more than one element can be created using the c() function.
  • Elements of a vector must be of the same type and mode.
  • Characters must be enclosed in either single or double quotes.
  • Missing data can be represented as NA.
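
The sketch below creates a few vectors with c() and shows what happens when types are mixed. All names and values are made up for the illustration.

```r
heights <- c(1.62, 1.75, NA, 1.80)   # numeric vector with one missing value
genes   <- c("BRCA1", 'TP53')        # single or double quotes both work
single  <- 42                        # a scalar is a vector of length 1

length(heights)       # 4
length(single)        # 1
is.na(heights)        # FALSE FALSE TRUE FALSE

mixed <- c(1, "a", TRUE)   # mixing types: everything is converted to character
mode(mixed)                # "character"
```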

R objects: Character Vectors

Denoted by double quotes, for example “x-values” and “New iteration results”.

R objects: Logical Vectors

A vector with three possible values:

TRUE, FALSE, NA (Not available)

Logical vectors are generated by conditions.

List of Logical operators

<, <=, >, >=, == for exact equality, != for inequality, & (and), | (or)
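
A minimal sketch of how conditions built from these operators produce logical vectors, including the NA case. The vector ages is made-up data.

```r
ages <- c(12, 35, NA, 67)

adult  <- ages >= 18               # FALSE TRUE NA TRUE
middle <- ages > 30 & ages < 60    # FALSE TRUE NA FALSE
edge   <- ages < 18 | ages > 65    # TRUE FALSE NA TRUE
ages == 35                         # exact equality
ages != 35                         # inequality

ages[which(adult)]                 # keep the elements where the condition is TRUE
```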

Vector examples

R objects: Factors

  • A factor is a vector, usually built from a character vector, whose elements are defined by groups (called levels).
  • Use factor() on a character vector to create a factor.
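
A short sketch of creating a factor and looking at its groups. The treatment labels are made-up example data.

```r
treatment <- c("drug", "placebo", "drug", "drug", "placebo")
treatment_f <- factor(treatment)

levels(treatment_f)        # "drug" "placebo"
table(treatment_f)         # number of elements in each group
as.integer(treatment_f)    # 1 2 1 1 2, the group codes stored internally
```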

R objects: Arrays and Matrices

  • Array: a multiply subscripted collection of data entries.
  • Matrix: a two-dimensional array.
  • A matrix can be created by using the matrix() function or the array() function. The first argument for both functions is a data vector.
  • matrix() then takes nrow and ncol arguments, whereas array() takes a vector defining the dim property of the array.
  • The dim() function can also be used to convert a vector to a matrix, by assigning a dimension vector to it.
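
The three routes to a matrix described above can be compared in a minimal sketch. The names mat, arr, vec and cube are chosen for this illustration.

```r
mat <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
arr <- array(1:6, dim = c(2, 3))         # same data via array()

vec <- 1:6
dim(vec) <- c(2, 3)                      # turn a vector into a 2 x 3 matrix

identical(mat, arr)   # TRUE
identical(mat, vec)   # TRUE
dim(mat)              # 2 3

cube <- array(1:24, dim = c(2, 3, 4))    # a three-dimensional array
dim(cube)             # 2 3 4
```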

class(mat)

Different ways to index
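
The original examples for this heading are not available, so the sketch below shows common indexing forms on a small made-up matrix called scores.

```r
scores <- matrix(1:6, nrow = 2,
                 dimnames = list(c("r1", "r2"), c("a", "b", "c")))

scores[1, 2]          # single element: row 1, column 2
scores[2, ]           # whole second row
scores[, "c"]         # column selected by name
scores[, c(1, 3)]     # several columns at once
scores[scores > 4]    # logical index: elements greater than 4
scores[-1, ]          # negative index drops row 1
```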

R objects: Creating matrices using cbind() and rbind()

Arguments to cbind() must be either vectors of any length or matrices with the same column length, that is, the same number of rows. Vectors that are shorter than the matrix are recycled: their values are repeated cyclically to fill the column. rbind() works the same way but binds by rows.
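
A minimal sketch of binding vectors by column and by row, including recycling of a shorter vector. The names a, b, by_col, by_row and wider are made up for the illustration.

```r
a <- 1:4
b <- c(10, 20)          # shorter vector: recycled to 10 20 10 20

by_col <- cbind(a, b)   # 4 x 2 matrix, the vectors become columns
by_row <- rbind(a, b)   # 2 x 4 matrix, the vectors become rows

dim(by_col)   # 4 2
dim(by_row)   # 2 4

wider <- cbind(by_col, total = a + b)   # add a column to an existing matrix
wider
```

The shorter vector here has a length that divides the longer one evenly; when it does not, R still recycles but issues a warning.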

R objects: Data Frames

  • A drawback of matrices is that all the values have to be of the same mode.
  • A data frame is composed of vectors of the same length that can be of different modes. This makes it a perfect structure for mixed-type biomedical data.
  • The header of the data frame can be obtained or set using the names() function.
  • Specific columns can be accessed using $ or with matrix-style indexing: Dataframe$column or Dataframe[,1].
  • Row labels can be modified using the rownames() function, and similarly column labels can be modified using the colnames() function.
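
A small sketch of building a data frame and using the accessors mentioned above. The data frame patients and its columns are made-up example data.

```r
patients <- data.frame(
  id     = c("p1", "p2", "p3"),
  age    = c(54, 61, 47),
  smoker = c(TRUE, FALSE, FALSE)
)

names(patients)            # "id" "age" "smoker"
patients$age               # column by name
patients[, 2]              # the same column by position

names(patients)[3] <- "smokes"        # rename a column
rownames(patients) <- patients$id     # label the rows
patients["p2", "age"]                 # 61
```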

R objects: Lists

A list is a collection of objects. It can contain vectors, matrices, and data frames of different lengths, which makes it a great way to collate different kinds of information.
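
A minimal sketch of a list holding objects of different kinds and sizes. The list study and its element names are hypothetical.

```r
study <- list(
  title  = "pilot",
  doses  = c(5, 10, 20),
  design = matrix(1:4, nrow = 2)
)

names(study)         # "title" "doses" "design"
study$doses          # element by name
study[["design"]]    # double brackets return the element itself
study["title"]       # single brackets return a one-element list
length(study)        # 3
```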

Some useful commands

  • mode() and typeof() – Return the mode and type of an object
  • attributes() – Returns useful information such as dimensions and names
  • as() – Coerces an object from one type to another
  • sample() – Gets a random sample of numbers
  • order() – Returns a numeric vector of the element positions that would put the vector in ascending order
  • sort() – Returns the values in ascending order
  • paste() – Creates a character vector by concatenating two or more vectors
  • print() – Prints the content of an object to the screen
  • range() – Returns the minimum and maximum value of a vector
  • t() – Transposes a matrix or data frame
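
The commands above can be tried on small made-up objects, as in the sketch below. set.seed() is added only so that the random sample is repeatable.

```r
x <- c(30, 10, 20)

typeof(x)                  # "double"
mode(x)                    # "numeric"
order(x)                   # 2 3 1
sort(x)                    # 10 20 30
range(x)                   # 10 30
paste("sample", 1:3)       # "sample 1" "sample 2" "sample 3"
as(x, "character")         # "30" "10" "20"

m <- matrix(1:6, nrow = 2)
attributes(m)              # the dim attribute is 2 3
t(m)                       # 3 x 2 transpose

set.seed(1)                # fix the seed so the draw is repeatable
picked <- sample(1:10, 3)  # three values drawn without replacement
print(picked)
```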