Saturday, July 22, 2017

Language: Bash with a Perl one-liner (unified diff for finding FASTA sequence differences)...

#!/usr/bin/bash
# --------------------------------------------------------
# Find the differences between a reference FASTA sequence and a query FASTA sequence.
# First argument: reference genome. Second argument: query genome.
reference="$1"
query="$2"

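# Read the FASTA sequence: skip the header line and print one base per line
# (this assumes a single-record FASTA file)
read-fasta () {
    tail -n+2 "$1" | \
        perl -n -e 'chomp; print join("\n",split("",$_))."\n"'
}

# Get the sequence ID: the first word of the header line, without the leading ">"
get-id () {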
    head -n1 "$1" | sed 's/>//' | cut --fields=1 --delimiter=' '
}

refid=$(get-id "$reference")
# A here-document (cat <<EOF ... EOF) is useful for passing a multi-line string
# to a variable, a file or a piped command. The text to print goes between the two markers.
cat <<EOF
EOF

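# Options used for the diff command:
#   --ignore-case          treat upper- and lower-case bases as equal
#   --unified=1            unified output with one line of context
#   --minimal              try harder to find a smaller set of changes
#   --ignore-blank-lines   ignore changes whose lines are all blank
# The reference and query sequences are passed in by process substitution,
# and the unified diff is piped to udiff2vcf.
# udiff2vcf is a separate program that is not defined in this script; judging by its
# name and the -r option, it turns the unified diff into VCF records for the reference ID.
diff --ignore-case \
     --unified=1 \
     --minimal \
     --ignore-blank-lines \
     <(read-fasta "$reference")  <(read-fasta "$query") | \
    udiff2vcf -r "$refid"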

Laboratory tools and reagents (Micro-pipettes)...

Micro-pipettes are essential tools in R&D labs and an integral part of Good Laboratory Practices (GLPs). Micro-pipetting methods include ...