R (created by Ross Ihaka and Robert Gentleman) is a great language for statistical analysis and graphical display. It is inspired by the S language.
It is ideal for interacting with, interpreting and analysing data (e.g. ANOVA, regression), and for data manipulation, modelling and chart making.
Like Python, R is rich in libraries. Its strength is the huge collection of packages developed by its users.
The Bioconductor project has generated many packages for R.
Other related tools are SPSS, Stata and SAS.
Environments: RStudio (Linux, and also Windows/macOS), RGui (Windows)
Using RGui for Windows
Step 1. Install R (from any mirror site)
Step 2. Start R (use the shortcut icon on the desktop)
Step 3. Tutorials (at the Quick-R project; try Bioconductor too)
Create a script with the .R or .r extension (e.g. script.R, script.r)
Load the script: when it opens in the editor, select the content and run it by right-clicking and choosing 'Run line or selection'. The result appears in the console; a graph opens in its own graphics window.
Save the graph
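A graph can also be saved from a script instead of through the menu: open a file device, draw, then close the device. The lines below are a minimal sketch, assuming a PDF written to the session's temporary directory; the file name is made up for illustration.

```r
# Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "example_plot.pdf")

pdf(out_file, width = 6, height = 4)       # open a PDF file device
plot(c(1, 2, 3), c(10, 9, 8), type = "l")  # draw into the file, not the screen
invisible(dev.off())                       # close the device so the file is written

file.exists(out_file)
```

Other file devices such as png() or jpeg() follow the same open, draw, close pattern.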
-----------------------------------------------------------
R is a vector-based language.
Vectors store measurements and can be numeric, character or logical.
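As a small illustration of the three vector types, here is a sketch with made-up measurements (all variable names are assumptions chosen for the example):

```r
# Made-up measurements, one vector per basic type
heights <- c(1.62, 1.75, 1.80)      # numeric
names_v <- c("Ann", "Bob", "Cy")    # character
is_tall <- heights > 1.70           # logical, computed element by element

class(heights)
class(names_v)
class(is_tall)

# Arithmetic applies to every element at once, no loop needed
heights_cm <- heights * 100
heights_cm

# A logical vector can pick out matching elements of another vector
names_v[is_tall]
```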
-----------------------------------------------------------
#R is rich in packages.
R packages that ship with a standard R installation:
base, boot, class, cluster, codetools, compiler, datasets, foreign, graphics, grDevices, grid, KernSmooth, lattice, MASS, Matrix, methods, mgcv, nlme, nnet, parallel, rpart, spatial, splines, stats, stats4, survival, tcltk, tools
If a package is not part of the standard installation, it needs to be installed from a CRAN mirror.
e.g. To install the package ggplot2, the following lines are used. If the download mirror is not set yet, R asks for one (or it can be given through the repos argument).
install.packages("ggplot2")
library("ggplot2")
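To avoid the mirror prompt, the mirror can be set once per session before installing. The sketch below assumes the CRAN cloud mirror; load_or_install is a hypothetical helper written for this example, not a function that comes with R.

```r
# Assumption: the CRAN "cloud" mirror; any mirror from the CRAN mirror list works
mirror <- "https://cloud.r-project.org"
options(repos = c(CRAN = mirror))

# Hypothetical helper: install a package only when it is missing, then load it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

# load_or_install("ggplot2")   # uncomment to run; needs an internet connection

getOption("repos")
```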
#Other important packages are data.table and plyr (to split up big data)
CRAN (Comprehensive R Archive Network) is the R package repository. It hosts more than 7,000 packages.
Many packages are already installed with RGui and only need to be loaded.
Others are available through the mirrors and can be installed without visiting the CRAN website.
A package can also be installed into RGui by hand from the CRAN website.
Suppose the stringr package is to be installed:
Download its zip file, then install it from the local zip file.
-----------------------------------------------------------
#Important R functions
table(), head(), rownames(), colnames(), nrow(), ncol(), by(), with()
rowSums(), rowMeans(), summary()
#Use of table()
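table() counts how often each value occurs. A minimal sketch, using a made-up character vector and the built-in mtcars data frame:

```r
# Made-up sample data: the season recorded for seven observations
season <- c("spring", "autumn", "winter", "autumn", "spring", "autumn", "winter")

counts <- table(season)   # one count per distinct value
counts
counts["autumn"]          # look up a single count by name

# Two vectors give a two-way table: cylinders (rows) by gears (columns)
cyl_by_gear <- table(mtcars$cyl, mtcars$gear)
cyl_by_gear
```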
-----------------------------------------------------------
Useful links
http://rseek.org/
http://www.r-tutor.com/
-----------------------------------------------------------
#Environment
#Open the help system
help()
# Get all variables
ls()
# Checking the type of variable
typeof(x)
# Get working directory
getwd()
#Calculations
1 + 2
2+3+sqrt(4)
2^3
10%%3
10%/%3
pi
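# Rounding a number to a whole value (the result is still of type double)
floor(3.7)
ceiling(3.2)
trunc(-3.7)
round(3.567, 2)
# Mathematical functions
abs(-4)
sqrt(16)
# Exponential and logarithmic functions
exp(1)
log(10)
log10(100)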
#Assigning value 1 to x
x = 1
x
class(x)
x = 10.5
x
class(x)
#The function c() combines 3 numbers into a vector
c(1, 2, 3)
#Numeric, integer, complex, logical
x = 5
x
class(x)
is.integer(x)
y = as.integer(5)
y
class(y)
is.integer(y)
as.integer(5.7)
as.integer("6.8")
as.integer("Autumn")
as.integer(TRUE)
as.integer(FALSE)
z = 5 + 3i
z
class(z)
sqrt(-3)
sqrt(-4+0i)
sqrt(as.complex(-7))
x = 8; y = 5
z = x > y
z
class(z)
#Logical operations are "&" (and), "|" (or), and "!" (negation)
a = TRUE; b = FALSE
a & b
a | b
!a
!b
#Numeric to string conversion
x = as.character(4.7)
x
class(x)
#Concatenation
fname = "Carl"; lname ="Jones"
paste(fname, lname)
#String manipulation (printing, sub-string extraction, substitution)
sprintf("%s is %d years old", "Mila", 15)
substr("Autumn is my favorite season", start=1, stop=6)
sub("Banana", "Apple", "Banana is my favorite fruit")
-----------------------------------------------------------
Vector (1D)
c(3,7,8)
c(TRUE, TRUE, FALSE)
c("rt", "vg", "em")
length(c(3,7,8))
a = c(1,8,5)
b = c("spring", "autumn", "winter")
c(a,b)
a=c("AA","V")
a[0]
a[-1]
a[-2]
a[-3]
#Vector arithmetic
a = c(2,4)
b = c(3,7)
3 * a
a + b
a - b
a * b
a / b
#Recycling of smaller vector
a = c(2,3,8)
b = c(4,5,6,9)
a + b
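When the lengths differ, R reuses the shorter vector from its start. Because 4 is not a multiple of 3, a + b also raises a warning. The sketch below spells out what happens; suppressWarnings() is used only to keep the output quiet.

```r
a <- c(2, 3, 8)
b <- c(4, 5, 6, 9)

# The fourth element of b is paired with the first element of a again
total <- suppressWarnings(a + b)
total                                   # 6 8 14 11

# The same result with the recycling written out by hand
by_hand <- rep(a, length.out = length(b)) + b
by_hand
```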
#Vector index
a = c("grapes", "apple", "cherry")
a[2]
a[c(1, 2)]
a[c(1, 1)]
a[c(3,1,2)]
a[-2]
a[5]
a[1:3]
a[1:2]
L = c(FALSE, TRUE, FALSE)
a[L]
L = c(TRUE, TRUE, FALSE)
a[L]
a = c("Alice", "Mulan")
a
names(a) = c("First", "Last")
a
a["First"]
a[c("Last", "First")]
Creating vectors
> x=c(1, 2, 3)
> x
[1] 1 2 3
Functions to generate vectors
> x=1:10
> x
[1] 1 2 3 4 5 6 7 8 9 10
> x=seq(10)
> x
[1] 1 2 3 4 5 6 7 8 9 10
> x=seq(1,20,2)
> x
[1] 1 3 5 7 9 11 13 15 17 19
> x=rep(10,5)
> x
[1] 10 10 10 10 10
Summary Statistics
> x <-c(1,3,5,7,9,2,4,6,8,-1)
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  -1.00    2.25    4.50    4.40    6.75    9.00
Statistical functions
> x <-c(1,3,5,7,9,2,4,6,8,-1)
> length(x)
[1] 10
> min(x)
[1] -1
> max(x)
[1] 9
> sort(x)
[1] -1 1 2 3 4 5 6 7 8 9
> median(x)
[1] 4.5
> mean(x)
[1] 4.4
> sd(x)
[1] 3.204164
> var(x)
[1] 10.26667
> quantile(x,0.25)
25%
2.25
> quantile(x,0.75)
75%
6.75
Matrix (2D)
a = matrix(
c(6,8,5,4),
nrow=2,
ncol=2,
byrow = TRUE)
#a = matrix(c(6,8,5,4), nrow=2, ncol=2, byrow = TRUE)
a
#prints element at row 2 and column 1
a[2, 1]
#prints row 2
a[2, ]
#prints column 2
a[ ,2]
#prints column 1 and 2
a[ ,c(1,2)]
a = matrix(
c(6,8,5,4),
nrow=2,
ncol=2,
byrow = TRUE)
dimnames(a) = list(
c("row1", "row2"),
c("col1", "col2"))
a
#matrix construction
a = matrix(
c(3,7,5,8,9,2),
nrow=3,
ncol=2)
a
#Transposition turns rows into columns and vice versa
t(a)
#Another matrix
b = matrix(
c(2,7,3,9,5,8),
nrow=3,
ncol=2)
b
#Another matrix
d = matrix(
c(8,9,7),
nrow=3,
ncol=1)
d
#Combining matrices a and b column-wise. For this, both matrices must have the same number of rows.
cbind(a,b)
#Combining matrices a and d. Again, both must have the same number of rows.
cbind(a,d)
#Flattening matrix a into a vector (column by column)
c(a)
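The row and column requirements are easy to check with dim(). A short sketch, reusing the values of a and d from above and adding a made-up extra row for rbind():

```r
a <- matrix(c(3, 7, 5, 8, 9, 2), nrow = 3, ncol = 2)
d <- matrix(c(8, 9, 7), nrow = 3, ncol = 1)

wide <- cbind(a, d)         # same number of rows: columns add up (2 + 1)
tall <- rbind(a, c(1, 4))   # same number of columns: rows add up (3 + 1)

dim(wide)   # 3 3
dim(tall)   # 4 2
```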
-----------------------------------------------------------
List (slicing, member reference)
#A list is a generic vector whose components can be of different types and lengths
a = c(4,7)
b = c("sweet", "sour", "bitter")
d = c(TRUE, FALSE,FALSE)
x = list(a,b,d)
z = list(a,b,d,9)
x[1]
x[2]
x[3]
x[c(2, 3)]
z[c(1, 4)]
x[[1]]
x[[1]][1] = 6
x[[1]]
a
a = list(spring=c("bird", "flower", "breeze"), autumn=c("fruits", "foliage"))
a
a["spring"]
a[c("spring", "autumn")]
a[["spring"]]
a$spring
attach(a)
spring
detach(a)
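The difference between single and double brackets is worth checking once: single brackets return a smaller list (a slice), double brackets return the member itself. A minimal sketch with the same kind of list as above:

```r
x <- list(c(4, 7), c("sweet", "sour", "bitter"), c(TRUE, FALSE, FALSE))

slice  <- x[1]     # still a list, holding one component
member <- x[[1]]   # the numeric vector stored in that component

class(slice)       # "list"
class(member)      # "numeric"

# Only the member can be used directly in arithmetic
doubled <- member * 2
doubled            # 8 14
```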
-----------------------------------------------------------
Data frames
#A data frame is used to store data tables.
#It is a list of vectors of equal length.
#There are many built-in data frames in R.
a = c(1, 3, 5)
b = c("ef", "gb", "hj")
c = c("ed", "rt", "yu")
df = data.frame(a, b, c)
a = c(4,7,9)
b = c("sweet", "sour", "bitter")
d = c(TRUE, FALSE,FALSE)
df = data.frame(a,b,d)
df
#Element at row1 and column2
df[1, 2]
nrow(df)
ncol(df)
head(df)
#column slicing
df[[2]]
df[["b"]]
df$b
df[,"b"]
df[1]
df["a"]
df[c("a", "b")]
#row slicing
#First row
df[1,]
#First and second row
df[c(1, 2),]
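Rows can also be selected by a condition rather than by position. A small sketch, rebuilding the same data frame so that it runs on its own:

```r
df <- data.frame(a = c(4, 7, 9),
                 b = c("sweet", "sour", "bitter"),
                 d = c(TRUE, FALSE, FALSE))

# Keep the rows where column a is greater than 5
big <- df[df$a > 5, ]
big

# A logical column can be used as the row selector directly
only_b <- df[df$d, "b"]
only_b
```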
#Example of a built-in data frame
#The header row contains the column names; each following row is one observation.
mtcars
#Find the value at the first row, second column
mtcars[1, 2]
#The same cell, addressed by row name and column name
mtcars["Mazda RX4", "cyl"]
head(mtcars)
nrow(mtcars)
ncol(mtcars)
help(mtcars)
#Importing a data frame
getwd()
setwd( "C:/Users/Seema/Desktop")
library(XLConnect)
wk = loadWorkbook("translation.xls")
df = readWorksheet(wk, sheet="Sheet1")
df
-----------------------------------------------------------
R can be used for NGS, microarray and ChIP-seq data analysis
-----------------------------------------------------------
#To clear the console, press Ctrl+L
# print the current working directory
getwd()
# list files and folders in the current directory
dir()
# list the objects in the current workspace
ls()
# change to specified directory, suppose to Desktop
setwd("C:/Users/Seema/Desktop")
# list all R libraries
library()
#To find help page for a function
?data.frame
#To search the help pages for a particular word
??list
#To quit console
q()
----------------------------------------------------------------------------
#Assignment with symbol <-
x<- 5
x %% y
x == 4
#Operators
x<-c(1:5)
x
x>4
logical1<- x>4
logical2<- x<3
logical1
logical2
logical1 | logical2
logical1 & logical2
#&& compares single values only (longer vectors are an error since R 4.3)
logical1[1] && logical2[1]
x[logical1]
x[logical2]
----------------------------------------------------------------------------
The link below is a great online tool for R practice
http://www.tutorialspoint.com/r_terminal_online.php
Execute the script by the command
source ('script.R')
Being familiar with the naming conventions makes it easier to become well-versed in the language and to customize output, so here is some common R jargon.
---------------------------------------------------------------------
help(Distributions), help(Normal), help(TDist), help(Chisquare), help(Binomial)
---------------------------------------------------------------------
seq(from, to, by=)
NA: Not available
NaN: Not a number
dt: Density of the t distribution
pt: Cumulative distribution function of the t distribution
qt: Quantile function (inverse cumulative distribution function) of the t distribution
rt: Random number generation from the t distribution
dnorm: Gives the height of the normal probability density
pnorm: Gives the normal cumulative distribution function
qnorm: Gives the normal quantile function
rnorm: Generates random normal deviates
col: plotting color (related settings exist for axis, labels, titles, subtitles, foreground, background)
par: queries or sets the graphical parameters
mar: margin sizes (a par setting)
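The d/p/q/r prefixes follow the same pattern for every distribution. A minimal sketch for the standard normal, where the seed value is an arbitrary choice made only so that the random draws are repeatable:

```r
p <- pnorm(1.96)   # cumulative probability below 1.96, about 0.975
q <- qnorm(p)      # the quantile function undoes pnorm, giving 1.96 back
h <- dnorm(0)      # density height at the mean, equal to 1/sqrt(2*pi)

set.seed(42)       # arbitrary seed
r <- rnorm(5)      # five random draws from N(0, 1)

p
q
h
r
```

The same four calls exist with the t, chisq and binom suffixes, each with that distribution's own parameters (for example df for the t distribution).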
----------------------------------------------------------------------------
Some examples that return an infinite value or NaN
0 / 0
1/0
sin(Inf)
cos(Inf)
tan(Inf)
> 0/0
[1] NaN
> 1/0
[1] Inf
> sin (Inf)
[1] NaN
> cos (Inf)
[1] NaN
> tan(Inf)
[1] NaN
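Such values can be detected with the is.* helper functions. A short sketch on a made-up vector that mixes NaN, infinite, missing and ordinary values:

```r
vals <- c(0/0, 1/0, -1/0, NA, 3)

nan_flags      <- is.nan(vals)        # TRUE only for NaN
infinite_flags <- is.infinite(vals)   # TRUE for Inf and -Inf
na_flags       <- is.na(vals)         # TRUE for NA and also for NaN
finite_flags   <- is.finite(vals)     # TRUE only for ordinary numbers

nan_flags
infinite_flags
na_flags
finite_flags
```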
##########################################
Plot
x <-c(1,2,3)
y <-c(10,9,8)
plot(x,y,type="l")
To plot a distribution graph (using dt)
x <- seq(-10,20,by=.5)
y <- dt(x,df=5)
plot(x,y)
y <- dt(x,df=20)
plot(x,y)
Cumulative probabilities (using pt)
pt(3,df=10)
1-pt(3,df=5)
x = c(-3,-6,-2,-1)
pt((mean(x)-2)/sd(x),df=10)
Quantiles (using qt)
qt(0.05,df=10)
v <- c(0.005,.025,.05, 0.5)
qt(v,df=27)
Random numbers (using rt)
rt(3,df=5)
##########################################
To plot graph (using dnorm)
dnorm(0)
dnorm(0)*sqrt(2*pi)
x <- seq(-10,10,by=0.5)
y <- dnorm(x)
plot(x,y)
y <- dnorm(x,mean=1,sd=0.5)
plot(x,y)
##########################################
To plot graph (using pnorm)
x <- seq(-5, 2, 1)
y1 <- pnorm(x)
y2 <- pnorm(x,1,4)
plot(x,y1,type="l",col="green")
plot(x,y2,type="l",col="blue")
##########################################
To plot both y1 and y2 in one graph
x <- seq(-1, 1, 0.5)
y1 <- pnorm(x)
y2 <- pnorm(x, 1, 2)
matplot(x, cbind(y1,y2),type="l",col=c("blue","red"),lty=c(1,1))
##########################################
Chi square calculation (dchisq, pchisq, qchisq, rchisq)
x <- seq(-10,20,by=.5)
y <- dchisq(x,df=5)
plot(x,y)
y <- dchisq(x,df=10)
plot(x,y)
----------------------------
pchisq(2,df=10)
x = c(2,4,5,6)
pchisq(x,df=20)
----------------------------
qchisq(0.05,df=5)
y <- c(0.005,.025,.05)
qchisq(y,df=20)
----------------------------
rchisq(3,df=10)
##########################################
Binomial calculation (dbinom, pbinom, qbinom, rbinom)
x <- seq(0,20,by=1)
y <- dbinom(x,20,0.2)
plot(x,y)
pbinom(10,20,0.5)
qbinom(0.5,25,0.5)
rbinom(5,20,0.5)
##########################################
#Use of ggplot2 (a graphics package built on top of R)
library(ggplot2)
ggplot(ToothGrowth, aes(x=as.factor(dose), y=len, color=supp)) +
geom_boxplot(position=position_dodge(0.5))+
geom_jitter(position=position_dodge(0.4)) +
xlab("dose")
-------------------------------------------------------------------------------
#Loading Excel file data to make a data frame
#Option 1: read.xls from the gdata package (needs Perl)
install.packages("gdata")
library(gdata)
help(read.xls)
data = read.xls("data.xls")
#Option 2: XLConnect (needs Java)
install.packages("XLConnect")
library(XLConnect)
wk = loadWorkbook("data.xls")
df = readWorksheet(wk, sheet="Sheet1")
#Loading text file data
mydata = read.table("mydata.txt")
mydata
**For the loading code above to work, the data file must be in the working directory (or be given with its full path). The code below checks and changes the working directory.
getwd()
setwd( "C:/Users/Seema/Desktop")
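Passing a full path is an alternative to changing the working directory. The sketch below writes a small made-up table to a temporary file and reads it back with read.table(); the file name and column names are assumptions for illustration.

```r
# Hypothetical file in the session's temporary directory
path <- file.path(tempdir(), "mydata.txt")

# Made-up data, written as a tab-separated text file with a header line
sample_df <- data.frame(gene = c("g1", "g2", "g3"), value = c(1.5, 2.0, 0.7))
write.table(sample_df, path, sep = "\t", row.names = FALSE, quote = FALSE)

# header = TRUE tells read.table that the first line holds the column names
mydata <- read.table(path, header = TRUE, sep = "\t")
mydata
str(mydata)
```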
-------------------------------------------------------------------------------
R for statistics
#Regression command
practice <- 21:100
score <- 20 + 2 * practice + rnorm(n=80,mean=0,sd=20)
plot(practice,score)
fit <- lm(score ~ practice)
lines(practice, predict(fit), col='orange')
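Beyond the plot, the fitted model can be inspected numerically. A minimal sketch on the same kind of simulated data; the seed is an arbitrary choice so that the numbers are repeatable, and the value 50 used for prediction is made up.

```r
set.seed(1)   # arbitrary seed
practice <- 21:100
score <- 20 + 2 * practice + rnorm(n = 80, mean = 0, sd = 20)

fit <- lm(score ~ practice)

coef(fit)                 # estimated intercept and slope (true values: 20 and 2)
summary(fit)$r.squared    # share of the variance explained by the model

# Predicted score for a new, made-up amount of practice
new_score <- predict(fit, newdata = data.frame(practice = 50))
new_score
```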
#K-means clustering (expression is assumed to be a numeric matrix or data frame loaded beforehand)
kmeans_clust <- kmeans(expression, 10)
#Hierarchical clustering
dist_mat <- dist(expression, method = "euclidean")
hier_clust <- hclust(dist_mat, method="ward.D")
plot(hier_clust)
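Since no expression data is loaded in these notes, here is a self-contained sketch on simulated data: two made-up groups of 10 samples with 3 features each, placed far apart so that both methods should recover them. The name expr, the seed and the group means are assumptions for illustration.

```r
set.seed(7)   # arbitrary seed

# Simulated stand-in for an expression matrix: rows are samples, columns are features
expr <- rbind(matrix(rnorm(30, mean = 0),  ncol = 3),
              matrix(rnorm(30, mean = 10), ncol = 3))

# K-means with two centres
km <- kmeans(expr, centers = 2)
table(km$cluster)

# Hierarchical clustering, then cut the tree into two groups
hc <- hclust(dist(expr, method = "euclidean"), method = "ward.D")
groups <- cutree(hc, k = 2)
table(groups)
```

cutree() turns the dendrogram into group labels, which makes the result easy to compare with the k-means assignment.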
Limma: Linear Models for Microarray Data