Biomed&Informatics: Language: R (for bioinformatics problems)....

#Case conversion functions
> seq = "atGCTGGATGAGaGCGAaaGAGCAGAGAGCCA"
> toupper(seq)
[1] "ATGCTGGATGAGAGCGAAAGAGCAGAGAGCCA"
> tolower(seq)
[1] "atgctggatgagagcgaaagagcagagagcca"

#Substring
seq=toupper(seq)
substr(seq, start = 1, stop = 3)
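
The same idea extends to cutting a whole sequence into codons. The lines below are a minimal sketch in base R, not part of the original post. The variable names `dna`, `starts` and `codons` are made up for illustration, and `substring()` is used instead of `substr()` because it accepts a vector of start and stop positions.

```r
# Toy sequence from the section above, converted to upper case.
dna <- toupper("atGCTGGATGAGaGCGAaaGAGCAGAGAGCCA")

# Start position of every complete codon: 1, 4, 7, ...
starts <- seq(1, nchar(dna) - 2, by = 3)

# substring() is vectorised over the start/stop positions,
# so this returns one three-letter string per codon.
codons <- substring(dna, starts, starts + 2)

codons[1]        # first codon, same result as substr(dna, 1, 3)
length(codons)   # number of complete codons (trailing bases are ignored)
```

Any bases left over after the last complete codon are dropped by this sketch.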

#Joining strings
x <-"AAA"
x
y <- "BBB"
c(x,y)

paste(x,y)
[1] "AAA BBB"
paste(x,y,sep="")
[1] "AAABBB"

#Statistical operations on string vectors

> x=c("Orange","Apple","Banana")
> min(x)
[1] "Apple"
> max(x)
[1] "Orange"
> sort(x)
[1] "Apple" "Banana" "Orange"
#Summary statistics

> x=c("A","T","A","G","C","A","A","T")
> summary(x)
Length Class Mode
8 character character
> y=factor(x)
> summary(y)
A C G T
4 1 1 2
> y[0]
factor(0)
Levels: A C G T
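
Counting bases as above is the first step towards a simple composition statistic such as GC content. The following is a small sketch in base R under that assumption. The names `dna`, `bases`, `counts` and `gc_fraction` are illustrative and do not come from the original post.

```r
# Split a sequence string into single characters.
dna   <- "ATAGCAAT"
bases <- strsplit(dna, "")[[1]]

# table() gives the same per-base counts as summary(factor(bases)).
counts <- table(bases)

# Fraction of bases that are G or C.
gc_fraction <- sum(bases %in% c("G", "C")) / length(bases)

counts
gc_fraction
```

`table()` works directly on the character vector, so converting to a factor first is optional here.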

# filter() (and slice())
# These verbs come from the dplyr package, which must be loaded first
library(dplyr)

# filter based on text match
filter(mydata, Danio_symbol=="hoxd4a" )

# filter based on regular expression match
filter(mydata, grepl("hox", Danio_symbol) )

# filter based on columns above or below levels
filter(mydata, main.EO > 20000, Sach.s.EO > 20000, Hunter.s.EO> 20000, sk..muscle< 100 )

# filter based on ratio of columns
filter(mydata, Sach.s.EO/sk..muscle > 50)
filter(mydata, Sach.s.EO/sk..muscle > 50, sk..muscle > 10)
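
The filters above assume a data frame `mydata` that the post never shows. To try them out, a toy table can stand in for it. In the sketch below the column names are copied from the filters, while the gene rows and all the numbers are invented for illustration only. It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical expression table: column names mirror the filters above,
# the values are made up and carry no biological meaning.
mydata <- data.frame(
  Danio_symbol = c("hoxd4a", "hoxa9b", "actb1", "myod1"),
  main.EO      = c(25000, 30000, 500, 100),
  Sach.s.EO    = c(22000, 40000, 450, 120),
  Hunter.s.EO  = c(21000, 15000, 480, 90),
  sk..muscle   = c(50, 900, 300, 20000),
  stringsAsFactors = FALSE
)

# Exact text match on the gene symbol.
exact <- filter(mydata, Danio_symbol == "hoxd4a")

# Regular expression match: every symbol containing "hox".
hox <- filter(mydata, grepl("hox", Danio_symbol))

# Several conditions separated by commas are combined with AND.
high <- filter(mydata, main.EO > 20000, Sach.s.EO > 20000,
               Hunter.s.EO > 20000, sk..muscle < 100)

# Filter on a ratio of two columns, guarding against tiny denominators.
ratio <- filter(mydata, Sach.s.EO / sk..muscle > 50, sk..muscle > 10)
```

With this toy table only the `hoxd4a` row passes the last two filters, which makes it easy to check each condition by eye.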

# arrange()

# select() (and rename())

# distinct()

# mutate() (and transmute())

# summarise()

# sample_n() (and sample_frac())
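
The verbs listed above have no examples in the original post. The lines below are one possible sketch of each, assuming dplyr is installed. The toy data frame and every result name (`sorted`, `two_cols`, `renamed`, and so on) are hypothetical and the values are invented.

```r
library(dplyr)

# Hypothetical table with one duplicated row (actb1) so distinct() has work to do.
mydata <- data.frame(
  Danio_symbol = c("hoxd4a", "hoxa9b", "actb1", "actb1"),
  main.EO      = c(25000, 30000, 500, 500),
  sk..muscle   = c(50, 900, 300, 300),
  stringsAsFactors = FALSE
)

sorted      <- arrange(mydata, desc(main.EO))           # sort rows, highest main.EO first
two_cols    <- select(mydata, Danio_symbol, main.EO)    # keep only the named columns
renamed     <- rename(mydata, gene = Danio_symbol)      # new_name = old_name
unique_rows <- distinct(mydata)                         # drop exact duplicate rows
with_ratio  <- mutate(mydata, ratio = main.EO / sk..muscle)  # add a computed column
overall     <- summarise(mydata, mean_EO = mean(main.EO), n_rows = n())  # collapse to one row

set.seed(1)                                             # make the random sample repeatable
picked      <- sample_n(mydata, 2)                      # two rows chosen at random
```

Each call returns a new data frame and leaves `mydata` unchanged, so the results have to be assigned to be kept.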

#Inspecting a data frame named mydata
dim(mydata)
colnames(mydata)
names(mydata)
head(mydata)
head(mydata$gene_ID)

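#Reading and writing csv files
# read.csv() splits fields on commas; read.table() splits on whitespace by default
mydata = read.csv("/share/data/expression.csv", header=TRUE, row.names = 1)
# writes tab-separated output; use write.csv(mydata, "file") for comma-separated output
write.table(mydata, "file", sep="\t")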
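
The path `/share/data/expression.csv` will not exist on most machines, so a round trip through a temporary file is an easy way to check what these functions do. This is a minimal sketch in base R. The data frame `expr`, its values and the file names are invented for illustration.

```r
# Small made-up expression table with gene names as row names.
expr <- data.frame(
  liver = c(10.5, 3.2),
  brain = c(0.4, 8.8),
  row.names = c("geneA", "geneB")
)

# Comma-separated round trip.
csv_path <- tempfile(fileext = ".csv")
write.csv(expr, csv_path)
back_csv <- read.csv(csv_path, header = TRUE, row.names = 1)

# Tab-separated round trip. write.table() writes a header with one field
# fewer than the data rows, and read.table() then uses the first column
# as row names automatically.
tsv_path <- tempfile(fileext = ".tsv")
write.table(expr, tsv_path, sep = "\t", quote = FALSE)
back_tsv <- read.table(tsv_path, header = TRUE, sep = "\t")

all.equal(expr, back_csv)
all.equal(expr, back_tsv)
```

Reading a comma-separated file without `sep = ","` (or without `read.csv()`) usually leaves each line as a single unsplit field, which is the most common symptom of a wrong separator.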

#Reading and writing excel files
library(gdata)
mydata <- read.xls("/share/data/expression.xls", sheet=1)

#Read as matrix

mydata <- as.matrix(read.xls("/share/data/expression.xls", sheet=1))

#Process fastq files
library(ShortRead)
seq <- readFastq("a.fq")
primer <- "GGA"
# R names are case sensitive, so the variable must be written primer, not Primer
trimmed <- trimLRPatterns(Lpattern=primer,subject=sread(seq),Lfixed="subject")
primer
(The first line loads the ‘ShortRead’ package. The second line reads the fastq file ‘a.fq’ into an object named ‘seq’. Fastq is a fasta-like format, typically used for short reads, that also stores a quality score for each base. The third line creates a string called ‘primer’ containing the sequence ‘GGA’.
The fourth line goes through all the short reads from a.fq and trims those that have GGA, or part of it, at the 5’ end. The last line prints the primer.)
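
To make "GGA or part of it at the 5’ end" concrete, here is a small base R sketch of the idea. It is not the ShortRead or Biostrings implementation and ignores mismatches and quality scores. The function `trim_left_primer` and the example reads are hypothetical. It removes the longest end piece (suffix) of the primer that the read starts with.

```r
# Illustrative only: trim the longest suffix of `primer` found at the start of `read`.
trim_left_primer <- function(read, primer) {
  n <- nchar(primer)
  for (k in rev(seq_len(n))) {                 # try the longest suffix first
    suffix <- substr(primer, n - k + 1, n)     # last k characters of the primer
    if (startsWith(read, suffix)) {
      return(substr(read, k + 1, nchar(read))) # drop the k matched characters
    }
  }
  read                                         # no match: leave the read unchanged
}

# Made-up reads: full primer, two partial primers, and no primer at all.
reads   <- c("GGATTCA", "GATTCA", "ATTCA", "CTTCA")
trimmed <- vapply(reads, trim_left_primer, character(1),
                  primer = "GGA", USE.NAMES = FALSE)
trimmed
```

In this sketch the first three reads all end up as the same trimmed sequence and the last one is left untouched. Which partial matches the real `trimLRPatterns()` accepts depends on its mismatch arguments, which are not modelled here.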

‘getwd()’ (tells you the working directory, which is where R looks for files given by a relative path)
‘q()’ or ‘quit()’ (to quit)

Tuesday, July 25, 2017
