Biomed&Informatics: September 2017

Friday, September 22, 2017

Laboratory tools and reagents (Micro-pipettes)...

Micropipettes are essential tools in R&D labs and an integral part of Good Laboratory Practice (GLP).

Micro-pipetting methods include forward and reverse.

Manual micropipettes come in 4 sizes.
P10: 1.0 - 10.0 µL
P20: 2.0 - 20.0 µL
P200: 20 - 200 µL
P1000: 200 - 1000 µL

Accuracy decreases when you use an unnecessarily large pipette for a small volume, so pick the smallest pipette whose range covers the volume you need.
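The selection rule can be sketched in a few lines of Python. This is only an illustration built on the four size ranges listed above; the function name `choose_pipette` is made up for this example.

```python
# Volume ranges in microlitres, taken from the size list above.
PIPETTE_RANGES_UL = {
    "P10": (1.0, 10.0),
    "P20": (2.0, 20.0),
    "P200": (20.0, 200.0),
    "P1000": (200.0, 1000.0),
}


def choose_pipette(volume_ul):
    """Return the smallest pipette whose range covers volume_ul, or None."""
    candidates = [
        (high, name)
        for name, (low, high) in PIPETTE_RANGES_UL.items()
        if low <= volume_ul <= high
    ]
    if not candidates:
        return None
    # The candidate with the smallest maximum volume wastes the least range.
    return min(candidates)[1]


for volume in (5.0, 15.0, 150.0, 500.0):
    print(volume, "uL ->", choose_pipette(volume))
```

A volume outside every range (for example 0.5 µL) returns None instead of a guess.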

Use filtered pipette tips when performing PCR or working with RNA. Filtered tips are vitally important for preventing contamination during PCR and in other sensitive work such as molecular biology, bacteriology, and handling radioactive samples. Primarily, they prevent the sample solution from contaminating the pipette cone during aspiration.

(https://www.genfollower.com/)

Pipettes can be single-channel or multi-channel, and manual or electronic.

It's important to calibrate pipettes periodically. For high-precision molecular protocols, even a slight variation in the delivered volume can lead to errors.
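One common way to check a pipette is to dispense water repeatedly, weigh each dispense, and compare the results with the set volume. The sketch below is a rough illustration and not a calibration procedure: it assumes the weights have already been converted to volumes in µL, and the replicate values are invented for the example.

```python
from statistics import mean, stdev


def pipette_check(nominal_ul, measured_ul):
    """Return (systematic error %, coefficient of variation %) for replicates."""
    avg = mean(measured_ul)
    # Systematic error: how far the mean delivered volume is from the set volume.
    systematic_error_pct = 100.0 * (avg - nominal_ul) / nominal_ul
    # Random error: spread of the replicates relative to their mean.
    cv_pct = 100.0 * stdev(measured_ul) / avg
    return systematic_error_pct, cv_pct


# Hypothetical replicate volumes for a pipette set to 100 uL.
replicates = [99.2, 100.4, 99.8, 100.1, 99.5]
error_pct, cv_pct = pipette_check(100.0, replicates)
print(f"systematic error: {error_pct:.2f}%  CV: {cv_pct:.2f}%")
```

Acceptable limits depend on the pipette model and the lab's own procedure, so none are hard-coded here.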

Monday, September 18, 2017

About Thermo Fisher sequencing.......

Ion semiconductor sequencing technology is used for Ion Torrent (Thermo Fisher) sequencers.

It uses a sequencing-by-synthesis method with emulsion PCR (emPCR), but no light (no optics; "post-light" sequencing). Instead of fluorescence or chemiluminescence, the H+ ions released during base incorporation are measured.

It offers sequencing within 2-3 hours, a rapid turnaround time (~2 days) from sample to DNA sequences, and high accuracy (99.97%).

(Ion PGM sequencer)

(Ion S5 sequencer)

The steps occurring in the sequencer include primer annealing, polymerase binding, and chip loading.

Kits: library kit, template kit, sequencing kit. The library kit is based on 200 bp chemistry and includes barcodes to enable cost-effective and flexible processing of samples. The barcode adapters contain unique index sequences, which make it possible to pool up to 24 amplicon libraries and run a multiplexed NGS analysis, reducing the sequencing cost per sample.
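After a pooled run, reads are assigned back to their samples by the barcode at the start of each read. The sketch below shows the idea only. The barcode sequences and sample names are hypothetical, all barcodes are assumed to have the same length, and only exact matches are accepted (real software also tolerates sequencing errors in the barcode).

```python
from collections import defaultdict

# Hypothetical barcodes for illustration; real kits define their own sequences.
BARCODES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}


def demultiplex(reads, barcodes):
    """Group reads by their leading barcode and trim the barcode off."""
    barcode_length = len(next(iter(barcodes)))
    bins = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:barcode_length])
        if sample is None:
            # No exact barcode match: keep the read untouched.
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[barcode_length:])
    return dict(bins)


pooled = ["ACGTACGGGG", "TGCATGCCCC", "AAAAAAAAAA"]
print(demultiplex(pooled, BARCODES))
```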

Life Technologies / Thermo Fisher NGS portfolio:

Ion Personal Genome Machine (PGM) sequencer: device cost $50k, run cost $300 to $750, output 1 Gb, run time 2 hours, read length 200 b, input DNA 10 ng, consensus accuracy 99.99%.

Ion Proton™: device cost $149k. The first chip, the PI, generates ~10 Gb per run, while the PII generates ~100 Gb per run. It is intended for whole-genome sequencing.

The S5 system is superior to the Proton and the PGM; in particular, it deals with indel issues better. With the Ion Chef System's library construction functionality, eight AmpliSeq libraries can be prepared in an eight-hour run. Combining Ion Torrent AmpliSeq technology for target selection, the Ion Chef System for automated library and template preparation, and Ion Reporter Software for automated variant annotation makes targeted sequencing straightforward.

Workflow consists of four major steps: library construction (construct library), template prep (prepare template), sequencing (run sequence) and analysis (analyze data).

Library Construction: take DNA (or RNA converted to DNA), fragment it to a uniform size (generally 200-400 b), and add sequencing adapters.

Template Prep/Amplification: fragments generated during the library prep are attached to beads and amplified using emulsion PCR. Beads coated with complementary primers are mixed with a dilute aqueous solution containing the fragments to be sequenced along with the necessary PCR reagents. This solution is then mixed with oil to form an emulsion of microdroplets. The concentration of beads and fragments is kept low enough such that each microdroplet contains only one of each. Clonal amplification of each fragment is then performed within the microdroplets. Following amplification the emulsion is ‘broken’ and the amplified beads are enriched in a glycerol gradient.
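Why the mixture has to be dilute can be shown with a small calculation. The sketch below assumes that fragments land in droplets independently, so the number of fragments per droplet follows a Poisson distribution. That is a common simplification, not something stated by the vendor, and the loading values are arbitrary examples.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k fragments in a droplet at mean loading lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def droplet_occupancy(lam):
    """Fractions of droplets with zero, one, or several fragments."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


def monoclonal_fraction(lam):
    """Among droplets that contain any fragment, the share holding exactly one."""
    occupancy = droplet_occupancy(lam)
    return occupancy["single"] / (1.0 - occupancy["empty"])


for lam in (0.1, 0.5, 1.0):
    occ = droplet_occupancy(lam)
    print(f"mean fragments/droplet={lam}: "
          f"empty={occ['empty']:.3f} single={occ['single']:.3f} "
          f"multiple={occ['multiple']:.3f} "
          f"monoclonal={monoclonal_fraction(lam):.3f}")
```

Under this model, lower loading gives cleaner (monoclonal) beads but leaves most droplets empty, which is why an enrichment step follows amplification.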

(http://www.thermofisher.com/us/en/home/technical-resources/research-tools/image-gallery/image-gallery-detail.19800.html)

Sequencing: Individual bases are introduced one at a time and incorporated by DNA polymerase. The direct release of H+ (protons) from the reaction is measured. The chips act as pH meters and are disposable. Image scans are not needed, so sequencing reactions are relatively fast, with 200b reads taking about 2 hours.

#Error rates are ~1%, and, as with pyrosequencing, the chemistry has trouble with long homopolymers (e.g., AAAAAA). A stretch of the same base is incorporated in one flow and results in a single, but stronger, signal, so the exact length of a long run is hard to read off.
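The homopolymer effect is easier to see in "flow space". The sketch below computes idealized, noise-free signals for a strand being synthesized. It assumes a simple repeating flow order of T, A, C, G, which is a simplification chosen for this example (instruments may use other flow orders), and it treats the signal as exactly proportional to the number of bases incorporated.

```python
def ideal_flowgram(sequence, flow_order="TACG"):
    """Return (flowed base, bases incorporated) for each nucleotide flow.

    `sequence` is the strand being synthesized, 5' to 3'.
    """
    if not set(sequence) <= set(flow_order):
        raise ValueError("sequence contains a base that is never flowed")
    signals = []
    position = 0
    flow_index = 0
    while position < len(sequence):
        base = flow_order[flow_index % len(flow_order)]
        run_length = 0
        # Every consecutive matching base is incorporated in this single flow.
        while position < len(sequence) and sequence[position] == base:
            run_length += 1
            position += 1
        signals.append((base, run_length))
        flow_index += 1
    return signals


print(ideal_flowgram("TAAAG"))
```

In this ideal picture the AAA run is one flow with a signal of 3. With real noise, telling a signal of 6 from a signal of 7 is much harder than telling 1 from 2, which is where homopolymer indel errors come from.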

Data analysis: Standard output files are FASTQ, BAM, SFF, and VCF. The ‘Torrent Browser’ software is the main interface, while the cloud-based solution ‘Ion Reporter’ is used for data analysis.

Chip:

The chip has a dense array of >1 million micro-machined wells. Each well contains a different DNA template, allowing massively parallel sequencing. One chip is used per run. Beneath the wells is an ion-sensitive layer, with a proprietary ion sensor at the bottom.

Three chips are available for the S5 instruments: the 520, 530, and 540, all of which have 2.5-hour sequencing run times for 200 base single-end reads.

The 520 chip generates between 1 Gb and 2 Gb of data, or 3 million to 5 million reads with read lengths up to 400 bases. Analysis takes five hours on the S5 and one-and-a-half hours on the XL. The 520 chip is geared primarily toward gene panels.

The 530 chip generates between 3 Gb and 5 Gb of data, or 15 million to 20 million reads with read lengths up to 400 bases. The standard eight-hour analysis time can be shortened to two-and-a-half hours with the XL.

The 540 chip generates between 10 Gb and 15 Gb of data, or 60 million to 80 million reads. It does not yet support 400-base reads. Analysis time is 16.5 hours on the S5 and five hours with the XL. The 540 chip is similar to the PI chip on the Proton, able to sequence exomes and transcriptomes, as well as larger gene panels.
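A quick way to compare the chips is to turn the yields above into an expected mean depth for a given target. The sketch below is back-of-the-envelope arithmetic only. The 50 Mb exome size, the 0.5 Mb panel size, and the on-target fraction are assumptions made for this example and are not vendor figures.

```python
# Yield ranges in gigabases, taken from the chip descriptions above.
CHIP_YIELD_GB = {"520": (1, 2), "530": (3, 5), "540": (10, 15)}


def mean_depth(yield_gb, target_size_mb, on_target_fraction=1.0):
    """Mean depth = usable sequenced bases / target bases (1 Gb = 1000 Mb)."""
    return yield_gb * 1000.0 * on_target_fraction / target_size_mb


# Assumed target sizes (hypothetical): a 0.5 Mb gene panel and a 50 Mb exome.
TARGETS_MB = {"gene panel": 0.5, "exome": 50.0}

for chip, (low_gb, high_gb) in CHIP_YIELD_GB.items():
    for target, size_mb in TARGETS_MB.items():
        low = mean_depth(low_gb, size_mb, on_target_fraction=0.8)
        high = mean_depth(high_gb, size_mb, on_target_fraction=0.8)
        print(f"chip {chip}, {target}: about {low:.0f}x to {high:.0f}x per run")
```

Dividing the result by the number of barcoded samples pooled on a chip gives a rough per-sample depth.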

The semiconductor chips generate fewer reads than HiSeq or SOLiD, so their use has been confined to small genomes and targeted sequencing.

(http://www.genomics.cn/en/navigation/show_navigation?nid=2640)

(https://www.researchgate.net/publication/255715498_Single-Cell_Semiconductor_Sequencing)

(http://allseq.com/knowledge-bank/sequencing-platforms/ion-torrent/)

#If a problem occurs, shut down the system and the server and reboot them. The server may enter a system check after the reboot, which will take 3-4 hours. If you have a monitor and keyboard connected directly to the Torrent Server, you can avoid the system check by pressing “c” on the keyboard. The data will automatically transfer to the server once the connection is restored.

Uses of this sequencer include exome sequencing, noninvasive prenatal testing, variant detection by targeted gene sequencing in cancer and genetic disorders, characterization of novel microbes, and typing of pathogens.

Friday, September 1, 2017

Indel calling....

Insertions and deletions (indels) are a common class of mutations in the human genome.

According to the Human Gene Mutation Database (HGMD), indels are associated with at least 22% of human diseases.

Indels are linked to human genetic diseases and cancer (cystic fibrosis, fragile X syndrome, Bloom syndrome, Huntington disease, acute myeloid leukemia, lung cancer, etc.). They are a common mechanism of kinase activation in cancer.

Insertion of transposable elements such as Alu, L1, and SVA can interrupt gene function and cause diseases like hemophilia, neurofibromatosis, muscular dystrophy, and cancer.

On the other hand, deletions in the CFTR gene have been found to cause cystic fibrosis.

The serial replication slippage model explains simple deletions and tandem duplications, which can lead to gene rearrangements.

Indels can occur in coding or non-coding regions, and an indel can be in-frame or frameshift. Frameshift indels can lead to premature stop codons. Indels can also change gene expression by altering the phasing and spacing of DNA sequences in promoter regions. A small insertion of 5 bp can rotate a binding site to the opposite face of the DNA helix, whereas a long insertion of 100 bp can increase the spacing between two binding sites. Studies have shown that 16 to 25% of all sequence polymorphisms in the human genome are indels. Yet a high percentage of indels still remain undetected.
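Both effects come down to simple arithmetic on the indel length. In coding sequence, an indel is in-frame only when its length is a multiple of 3. For the helix, the sketch below assumes roughly 10.5 bp per turn of B-form DNA, which is an approximate textbook value, so a 5 bp insertion turns a site by about half a turn.

```python
BP_PER_HELICAL_TURN = 10.5  # approximate value for B-form DNA (assumption)


def indel_frame_effect(length_bp):
    """Classify a coding indel by whether it preserves the reading frame."""
    if length_bp <= 0:
        raise ValueError("indel length must be positive")
    return "in-frame" if length_bp % 3 == 0 else "frameshift"


def helical_rotation_degrees(length_bp):
    """Angular shift of a downstream site caused by inserting length_bp bases."""
    return (length_bp / BP_PER_HELICAL_TURN * 360.0) % 360.0


for length in (1, 3, 5, 100):
    print(f"{length} bp: {indel_frame_effect(length)}, "
          f"rotation of about {helical_rotation_degrees(length):.0f} degrees")
```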

So, indel discovery and characterization in the exonic regions is important.

To that end, popular tools include:

Read mappers: BFAST, Bowtie2, BWA, Novoalign, SHRiMP

Indel callers: Dindel, FreeBayes, SNVer, GATK UnifiedGenotyper, VarScan, SAMtools, GATK HaplotypeCaller, and Platypus.

Bayesian probabilistic model (Dindel, GATK Unified Genotyper, SAMtools )

Heuristic model (VarScan)

Alignment-based methods map the reads to the reference sequence using read mapping software such as BWA and Novoalign. The indel callers then call indels from the alignment data, followed by filtering steps. “True call” refers to indels that pass the filters.
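The raw evidence for an indel in a single aligned read is the CIGAR string that the mapper writes into the SAM/BAM record, where `I` marks an insertion and `D` a deletion relative to the reference. The sketch below only extracts those candidates from one read. It is not how any of the callers above work internally: they add realignment, statistical models, and filters on top. The function name and the example CIGAR are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(ref_start, cigar):
    """List (type, reference position, length) for each indel in a read.

    ref_start is the 1-based leftmost mapping position (the SAM POS field).
    Insertions are reported at the reference base that follows them,
    deletions at the first deleted reference base.
    """
    indels = []
    ref_pos = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "I":
            indels.append(("insertion", ref_pos, length))
        elif op == "D":
            indels.append(("deletion", ref_pos, length))
            ref_pos += length
        elif op in "MN=X":
            # These operations consume reference bases.
            ref_pos += length
        # S, H and P do not consume reference bases.
    return indels


print(indels_from_cigar(100, "10M2I5M3D20M"))
```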

Analysis: Copy Number Alterations (CNA) calling/ Copy Number Variation (CNV) calling....

Copy number alterations (CNAs) are changes in copy number that have arisen in somatic tissue

(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3017908/)

Identification of gene CNAs in an individual patient’s tumor is increasingly relevant.

FASTQ files, sample sheet -> Parsing -> Variant calling -> CNA calling

Calling CNA from amplicon sequencing data is feasible. Non-tumor (germline) DNA is needed for data normalization.
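The role of the non-tumor sample can be illustrated with per-amplicon read counts. The sketch below scales each sample by its total read count and reports a log2 tumor/normal ratio per amplicon. It is a deliberately simple assumption-laden example: real pipelines use more robust normalization and segmentation, and the amplicon names and counts here are invented.

```python
import math


def log2_ratios(tumor_counts, normal_counts):
    """Per-amplicon log2(tumor/normal) after scaling by total read count."""
    tumor_total = sum(tumor_counts.values())
    normal_total = sum(normal_counts.values())
    ratios = {}
    for amplicon, normal in normal_counts.items():
        tumor = tumor_counts.get(amplicon, 0)
        if tumor == 0 or normal == 0:
            # The ratio is undefined without reads in both samples.
            ratios[amplicon] = None
            continue
        ratios[amplicon] = math.log2(
            (tumor / tumor_total) / (normal / normal_total)
        )
    return ratios


# Hypothetical read counts per amplicon.
normal = {"amp_1": 1000, "amp_2": 1000, "amp_3": 1000, "amp_4": 1000}
tumor = {"amp_1": 1000, "amp_2": 1000, "amp_3": 1000, "amp_4": 3000}
for amplicon, ratio in log2_ratios(tumor, normal).items():
    print(amplicon, None if ratio is None else round(ratio, 2))
```

A ratio near 0 suggests no change, positive values suggest a gain, and negative values a loss. Because the scaling uses total counts, one strongly amplified amplicon pulls the others slightly below 0, which is one reason real tools normalize more carefully.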

---------------
Copy number variation (CNV) is a structural variation in the genome in which sections are repeated and the number of copies varies, arising from duplication or deletion.

(https://en.wikipedia.org/wiki/Copy-number_variation#/media/File:Gene-duplication.png)

BCFtools can detect copy number alterations, aneuploidy, and contamination. Chromosome aneuploidy (an abnormal number of chromosomes) is one cause of IVF failure, as embryos with aneuploidy mostly either fail to implant or miscarry during the first trimester of pregnancy.

#compile with
make USE_GPL=1 clean all
#type
bcftools polysomy
#the polysomy command takes a VCF as input
#Find chromosomal aberrations
bcftools polysomy -v -o outdir/ file.vcf
#results can be found in outdir/dist.dat
#See the output file
cat outdir/dist.dat | awk '$1=="CN" && $3!=2.0'
#For clean data, the third column should be 2.0 for normal diploid state, 1.0 for a loss, 3.0 for gain, and -1 is used when the program cannot determine the state, usually because of noisy data
#The distribution can be plotted using the auto-generated matplotlib script

#to detect differences between two samples
bcftools cnv -c control_sample -s query_sample -o outdir/ -p 0 file.vcf
#The -p 0 option tells the program to automatically call matplotlib and produce plots
# Annotation file with BAF values for two samples
$ zcat baf.txt.gz | head -2

# Index the annotation file and fill in the BAF values. Add a BAF definition into the VCF header
$ tabix -s1 -b2 -e2 baf.txt.gz
$ echo '##FORMAT=<ID=BAF,Number=1,Type=Float,Description="NGS estimate of BAF">' > baf.hdr
$ bcftools annotate -a baf.txt.gz -h baf.hdr -c CHROM,POS,FMT/BAF -Ob -o output.bcf input.bcf