Tuesday, July 25, 2017

Language: R (for bioinformatics problems)....

#Functions
 > seq = "atGCTGGATGAGaGCGAaaGAGCAGAGAGCCA"
 > toupper(seq)
 [1] "ATGCTGGATGAGAGCGAAAGAGCAGAGAGCCA"
 > tolower(seq)
 [1] "atgctggatgagagcgaaagagcagagagcca"

#Substring
seq=toupper(seq)
substr(seq, start = 1, stop = 3)
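
A common next step with substr() is to cut a sequence into codons. The lines below are a minimal sketch of that idea using the vectorised substring(); the variable names (seq_dna, starts, codons) are only illustrative, and any bases left over after the last full triplet are ignored.

```r
# Split a DNA string into non-overlapping triplets (codons).
seq_dna <- "ATGCTGGATGAGAGCGAAAGAGCAGAGAGCCA"  # the upper-cased sequence from above

# Start position of every complete codon: 1, 4, 7, ...
starts <- seq(1, nchar(seq_dna) - 2, by = 3)

# substring() is vectorised over start/stop, so one call extracts all codons
codons <- substring(seq_dna, starts, starts + 2)
codons
```

The sequence has 32 bases, so this gives 10 codons and the last two bases are dropped.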

#Joining strings
x <-"AAA"
x
y <- "BBB"
c(x,y)

paste(x,y)
paste(x,y,sep="")
 [1] "AAABBB"
#Statistical operations on string vectors

 > x=c("Orange","Apple","Banana")
 > min(x)
 [1] "Apple"
 > max(x)
 [1] "Orange"
 > sort(x)
 [1] "Apple"  "Banana" "Orange"
#Summary statistics

 > x=c("A","T","A","G","C","A","A","T")
 > summary(x)
    Length     Class      Mode
         8 character character
 > y=factor(x)
 > summary(y)
 A C G T
 4 1 1 2
 > y[0]
 factor(0)
 Levels: A C G T
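
The same per-base counts can be obtained with table(), and the GC fraction follows directly from the vector of bases. This is a small sketch on the vector used above; the names bases, counts and gc_fraction are only illustrative.

```r
# Count bases and compute the GC fraction of a vector of single letters.
bases <- c("A", "T", "A", "G", "C", "A", "A", "T")

# table() tabulates each distinct value, like summary() on a factor
counts <- table(bases)
counts

# Fraction of positions that are G or C
gc_fraction <- mean(bases %in% c("G", "C"))
gc_fraction

# A sequence stored as one string can be split into bases first
strsplit("ATGC", "")[[1]]
```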


 
# filter() (and slice())
# These functions come from the dplyr package, so load it first
 library(dplyr)

 # filter based on text match
 filter(mydata, Danio_symbol=="hoxd4a" )

 # filter based on regular expression match
 filter(mydata, grepl("hox", Danio_symbol) )

 # filter based on columns above or below levels
 filter(mydata, main.EO > 20000, Sach.s.EO > 20000, Hunter.s.EO> 20000, sk..muscle< 100 )

 # filter based on ratio of columns
 filter(mydata, Sach.s.EO/sk..muscle > 50)
 filter(mydata, Sach.s.EO/sk..muscle > 50, sk..muscle > 10)
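
To try these filters without the original expression table, the sketch below builds a small made-up data frame. The column names are copied from the calls above, but the values and the name mydata_toy are invented for illustration, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data: column names mirror the ones above, values are invented
mydata_toy <- data.frame(
  Danio_symbol = c("hoxd4a", "hoxb1b", "actb1"),
  Sach.s.EO    = c(30000, 500, 1000),
  sk..muscle   = c(50, 5, 4000),
  stringsAsFactors = FALSE
)

# Rows whose gene symbol contains "hox"
hox_rows <- filter(mydata_toy, grepl("hox", Danio_symbol))

# Rows with a high ratio, excluding very low denominators
high_ratio <- filter(mydata_toy, Sach.s.EO / sk..muscle > 50, sk..muscle > 10)

hox_rows
high_ratio
```

The second condition (sk..muscle > 10) is what removes the hoxb1b row: its ratio is large only because the denominator is tiny.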

 # arrange()

 # select() (and rename())

 # distinct()


 # mutate() (and transmute())

 # summarise()


 # sample_n() (and sample_frac())
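
The verbs listed above have no examples yet, so here is a minimal sketch of each on a made-up data frame. The data frame toy and its columns gene and expr are hypothetical, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data with one duplicated row
toy <- data.frame(
  gene = c("g1", "g2", "g3", "g3"),
  expr = c(10, 200, 50, 50),
  stringsAsFactors = FALSE
)

uniq    <- distinct(toy)                               # drop duplicated rows
sorted  <- arrange(uniq, desc(expr))                   # sort by expr, highest first
renamed <- rename(select(sorted, gene, expr),          # keep columns, then rename one
                  expression = expr)
logged  <- mutate(renamed, log2_expr = log2(expression))      # add a derived column
overall <- summarise(logged, mean_expr = mean(expression))    # collapse to one row
picked  <- sample_n(logged, 2)                         # two randomly chosen rows

sorted
overall
```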

#Inspecting the data (mydata is the data frame, not a file name)
dim(mydata)
colnames(mydata)
names(mydata)
head(mydata)
head(mydata$gene_ID)

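#Reading and writing csv files
#read.csv assumes comma-separated values; the first column is used as row names
mydata = read.csv("/share/data/expression.csv", header=TRUE, row.names = 1)
#write.table with sep="\t" writes a tab-separated file
 write.table(mydata, "file", sep="\t")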
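
To check that reading and writing behave as expected, a round trip through a temporary file is a quick test. The sketch below uses an invented two-gene table; the names toy, csv_path and back are only illustrative.

```r
# Write a small data frame to a CSV file and read it back.
toy <- data.frame(
  liver = c(1.5, 20),
  brain = c(3, 0.5),
  row.names = c("geneA", "geneB")
)

csv_path <- tempfile(fileext = ".csv")

# write.csv puts the row names into the first column of the file
write.csv(toy, csv_path)

# row.names = 1 turns that first column back into row names
back <- read.csv(csv_path, row.names = 1)
back
```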

#Reading and writing excel files
 library(gdata)
 mydata <- read.xls("/share/data/expression.xls", sheet=1)

#Read as matrix

 mydata <- as.matrix(read.xls("/share/data/expression.xls", sheet=1))


#Process fastq files
library(ShortRead)
seq <- readFastq("a.fq")
primer <- "GGA"
trimmed <- trimLRPatterns(Lpattern=primer, subject=sread(seq), Lfixed="subject")
primer
(The first line loads a library called ‘ShortRead’. The second line reads the fastq file ‘a.fq’ into an object named ‘seq’. Fastq is a fasta-like format, typically used for short reads, that also stores a quality score for each base. The third line creates a string called ‘primer’ containing the sequence ‘GGA’.
The fourth line goes through all short reads read from a.fq and trims those that have GGA, or part of it, at the 5’ end.)
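
The trimming idea can be illustrated in base R, without Bioconductor. The function below is a simplified model only, not how ShortRead or Biostrings implement trimLRPatterns: it assumes exact matching with no mismatches, works on plain character strings, and removes the longest suffix of the primer found at the start of each read. The function name trim_left_primer and the example reads are invented for illustration.

```r
# Simplified base-R model of trimming a 5' primer (or part of it) from reads.
trim_left_primer <- function(reads, primer) {
  n <- nchar(primer)
  # Suffixes of the primer, longest first: for "GGA" these are "GGA", "GA", "A"
  suffixes <- substring(primer, seq_len(n), n)

  vapply(reads, function(read) {
    for (s in suffixes) {
      if (startsWith(read, s)) {
        # Drop the matched prefix and keep the rest of the read
        return(substring(read, nchar(s) + 1))
      }
    }
    read  # no suffix of the primer found at the 5' end: leave the read unchanged
  }, character(1), USE.NAMES = FALSE)
}

reads <- c("GGATTTCC", "GATTTCC", "CCCTTT")   # made-up example reads
trimmed_toy <- trim_left_primer(reads, "GGA")
trimmed_toy
```

Note that with exact matching even a single leading ‘A’ counts as a partial primer in this model, which may trim more than intended on real data.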

‘getwd()’ (tells you the working directory, which is where R looks for files given by a relative path)
‘q()’ or ‘quit()’ (to quit)
