Friday, September 22, 2017

Laboratory tools and reagents (Micro-pipettes)...

Micro-pipettes are essential tools in R&D labs and an integral part of Good Laboratory Practices (GLPs).
Micro-pipetting methods include forward and reverse pipetting.
Manual micropipettes come in four sizes:
P10: 1.0 - 10.0 µL
P20: 2.0 - 20.0 µL
P200: 20 - 200 µL
P1000: 200 - 1000 µL

Accuracy decreases as you use unnecessarily large pipettes for small volumes.
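As a rough illustration of that rule, the sketch below picks the smallest pipette whose range covers a target volume. The ranges are the ones listed above; the function name and the tie-breaking rule (prefer the smaller pipette when two ranges overlap) are my own assumptions, not a lab standard.

```python
# Volume ranges in microlitres, taken from the list of manual pipettes above.
PIPETTE_RANGES_UL = {
    "P10": (1.0, 10.0),
    "P20": (2.0, 20.0),
    "P200": (20.0, 200.0),
    "P1000": (200.0, 1000.0),
}


def choose_pipette(volume_ul):
    """Return the smallest pipette whose range covers volume_ul, or None."""
    candidates = [
        (high, name)
        for name, (low, high) in PIPETTE_RANGES_UL.items()
        if low <= volume_ul <= high
    ]
    # min() sorts by the upper limit first, so the smallest suitable pipette wins.
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    for volume in (5, 15, 150, 500):
        print(volume, "uL ->", choose_pipette(volume))
```

A volume outside every range (for example 0.5 µL) returns None rather than guessing.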
Use filtered pipette tips when performing PCR or working with RNA. Filtered tips are vitally important for preventing contamination during PCR experiments and other molecular biology applications such as bacteriology and work in radioactive areas. Primarily, they prevent the sample solution from contaminating the pipette cone during aspiration.
(https://www.genfollower.com/)

Pipettes can be single-channel or multi-channel, and manual or electronic.
It's important to calibrate pipettes periodically. For high-precision molecular protocols, even a slight variation in settings can lead to errors.


Monday, September 18, 2017

About Thermo Fisher sequencing.......

Ion semiconductor sequencing technology is used for Ion Torrent (Thermo Fisher) sequencers.
It uses a sequencing-by-synthesis method with emulsion PCR (emPCR), but no light is involved (no optics; post-light sequencing). Instead of fluorescence or chemiluminescence, the H+ ions released during base incorporation are measured.

It offers sequencing within 2-3 hours, a rapid turnaround time (~2 days) from sample to DNA sequences, and high accuracy (99.97%).
(Ion PGM sequencer)

(Ion S5 sequencer)
The steps occurring in the sequencer include primer annealing, polymerase binding, and chip loading.
Kits: library kit, template kit, sequencing kit. The library kit is based on 200 bp chemistry and includes barcodes to enable cost-effective and flexible processing of samples. The barcode adapters contain sequences that allow up to 24 amplicon libraries to be pooled for multiplexed NGS analysis, reducing the sequencing cost per sample.
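To make the pooling idea concrete, here is a minimal sketch of how pooled reads could be sorted back to their samples by barcode. The two barcode sequences and sample names are hypothetical placeholders (real kits ship their own barcode set), and exact prefix matching is a simplification; production software also tolerates sequencing errors in the barcode.

```python
# Hypothetical barcode-to-sample table, for illustration only.
BARCODES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by their leading barcode and strip the barcode off.

    Reads that match no barcode exactly are collected under 'unassigned'.
    """
    bins = {name: [] for name in barcodes.values()}
    bins["unassigned"] = []
    for read in reads:
        for code, name in barcodes.items():
            if read.startswith(code):
                bins[name].append(read[len(code):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return bins


if __name__ == "__main__":
    pooled = ["ACGTACGGGG", "TGCATGAAAA", "CCCCCC"]
    for sample, reads in demultiplex(pooled).items():
        print(sample, reads)
```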

Life Technologies/Thermo Fisher NGS portfolio:
Ion Personal Genome Machine (PGM) sequencer: device cost $50k, run cost $300 to $750, output up to 1 Gb, run time 2 hours, read length 200 b. This sequencer needs 10 ng of input DNA. Consensus accuracy is 99.99%.

Ion Proton™: device cost $149k. The first chip, PI, generates ~10 Gb per run, while the PII generates ~100 Gb per run. It is intended for whole-genome sequencing.

S5 system is superior to Proton or the PGM.  S5 deals with Indel issues better. With the Ion Torrent Chef's library construction functionality, eight AmpliSeq libraries can be prepared in an eight-hour run. Combined with Ion Torrent AmpliSeq technology for target selection, the Ion  Chef System for automated library and template preparation, and Ion Reporter Software for automated variant annotation, targeted sequencing is easy.

Workflow consists of four major steps: library construction (construct library), template prep (prepare template), sequencing (run sequence) and analysis (analyze data).
Library Construction:  taking DNA (or RNA converted to DNA), fragmenting it to a uniform size (generally 200-400b),  adding sequencing adapters.

Template Prep/Amplification: fragments generated during the library prep are attached to beads and amplified using emulsion PCR. Beads coated with complementary primers are mixed with a dilute aqueous solution containing the fragments to be sequenced along with the necessary PCR reagents. This solution is then mixed with oil to form an emulsion of microdroplets. The concentration of beads and fragments is kept low enough such that each microdroplet contains only one of each. Clonal amplification of each fragment is then performed within the microdroplets. Following amplification the emulsion is ‘broken’ and the amplified beads are enriched in a glycerol gradient.
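The reason for keeping the concentration low can be sketched numerically. Assuming templates are distributed among droplets according to a Poisson distribution (a common modelling assumption, not something stated in the sources above), the fraction of droplets holding exactly one template can be compared with the fraction holding more than one for different mean loadings.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k templates in a droplet at mean loading lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def droplet_fractions(lam):
    """Fractions of empty, singly loaded, and multiply loaded droplets."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


if __name__ == "__main__":
    for lam in (0.1, 0.5, 1.0):
        f = droplet_fractions(lam)
        print(
            f"mean={lam}: empty={f['empty']:.3f} "
            f"single={f['single']:.3f} multiple={f['multiple']:.3f}"
        )
```

Under this model a lower mean loading wastes more droplets as empty, but sharply reduces the share of droplets with mixed templates, which would otherwise give polyclonal beads.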


(http://www.thermofisher.com/us/en/home/technical-resources/research-tools/image-gallery/image-gallery-detail.19800.html)
Sequencing: Individual bases are introduced one at a time and incorporated by DNA polymerase.  The direct release of H+ (protons) from the reaction is measured. The chips act as pH meters and are disposable. Image scans are not needed, so sequencing reactions are relatively fast, with 200b reads taking about 2 hours. 
#Error rates are ~1%, but, as with pyrosequencing, the chemistry has trouble with long homopolymers (e.g., AAAAAA). A stretch of the same base results in a single, but stronger, signal, so the exact length of a long run is hard to resolve.
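The homopolymer behaviour is easier to see in "flow space". The sketch below converts a read sequence into idealised flow signals, where each nucleotide flow yields a value equal to the number of bases incorporated. The repeating flow order "TACG" and the noise-free integer signals are assumptions for illustration; real instruments record noisy analogue values, which is why long runs are hard to count.

```python
from itertools import cycle


def flowgram(sequence, flow_order="TACG", max_flows=400):
    """Return idealised (base, signal) pairs for a synthesised sequence.

    Each flow offers one nucleotide; the signal is the length of the
    homopolymer run incorporated during that flow (0 if none).
    max_flows guards against bases that never appear in flow_order.
    """
    signals = []
    pos = 0
    for base in cycle(flow_order):
        if pos >= len(sequence) or len(signals) >= max_flows:
            break
        run = 0
        while pos < len(sequence) and sequence[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals


if __name__ == "__main__":
    print(flowgram("TTAACG"))
    print(flowgram("AAAAAA"))
```

In the second call the six A bases produce one flow with signal 6 rather than six separate signals, which is the single, stronger signal described above.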

Data analysis: Standard output files are FASTQ, BAM, SFF, and VCF. The ‘Torrent Browser’ software is the main interface, while the cloud-based solution ‘Ion Reporter’ is used for data analysis.

Chip:
The chip has a dense array of >1 million micro-machined wells, each with an ion sensor. Each well contains a different DNA template, allowing massively parallel sequencing. One chip is used per run. Beneath the wells is an ion-sensitive layer, with a proprietary ion sensor at the bottom.


Three chips are available for the S5 instruments: the 520, 530, and 540, all of which have 2.5-hour sequencing run times for 200 base single-end reads. 
The 520 chip generates between 1 Gb and 2 Gb of data, or 3 million to 5 million reads with read lengths up to 400 bases. Analysis takes five hours on the S5 and one-and-a-half hours on the XL. The 520 chip is geared primarily toward gene panels.
The 530 chip generates between 3 Gb and 5 Gb of data, or 15 million to 20 million reads with read lengths up to 400 bases. The standard eight-hour analysis time can be shortened to two-and-a-half hours with the XL.
The 540 chip generates between 10 Gb and 15 Gb of data, or 60 million to 80 million reads. It does not yet support 400-base reads. Analysis time is 16.5 hours on the S5 and five hours with the XL. The 540 chip is similar to the PI chip on the Proton, able to sequence exomes and transcriptomes, as well as larger gene panels.
The microprocessor chips generate fewer reads than HiSeq or SOLiD, so their use was confined to small genomes and targeted sequencing.
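As a quick sanity check on the chip figures quoted above, the implied average read length is simply total bases divided by number of reads. The sketch below pairs the low output with the low read count and the high output with the high read count, which is my own assumption about how the ranges line up.

```python
# Output (Gb) and read-count (millions) ranges as quoted above for each S5 chip.
CHIPS = {
    "520": {"gb": (1, 2), "million_reads": (3, 5)},
    "530": {"gb": (3, 5), "million_reads": (15, 20)},
    "540": {"gb": (10, 15), "million_reads": (60, 80)},
}


def mean_read_length(gigabases, million_reads):
    """Average bases per read implied by a total output and a read count."""
    return gigabases * 1e9 / (million_reads * 1e6)


if __name__ == "__main__":
    for chip, spec in CHIPS.items():
        low = mean_read_length(spec["gb"][0], spec["million_reads"][0])
        high = mean_read_length(spec["gb"][1], spec["million_reads"][1])
        print(f"chip {chip}: about {low:.0f} to {high:.0f} bases per read")
```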

(http://www.genomics.cn/en/navigation/show_navigation?nid=2640)

(https://www.researchgate.net/publication/255715498_Single-Cell_Semiconductor_Sequencing)


(http://allseq.com/knowledge-bank/sequencing-platforms/ion-torrent/)
#If a problem occurs, shut down the system and the server, then reboot them. The server may enter a system check after the reboot, which will take 3-4 hours. If you have a monitor and keyboard connected directly to the Torrent Server, you can avoid the system check by pressing “c” on the keyboard. The data will automatically transfer to the server after connection.

Uses of this sequencer include exome sequencing, noninvasive prenatal testing, variant detection by targeted gene sequencing in cancer and genetic disorders, characterization of novel microbes, and typing of pathogens.

Friday, September 1, 2017

Indel calling....

Insertions and deletions (indels) are a common class of mutations in the human genome.


As per the Human Gene Mutation Database (HGMD), indels are associated with at least 22% of human diseases.
Indels are linked to human genetic diseases and cancers (cystic fibrosis, fragile X syndrome, Bloom syndrome, Huntington disease, acute myeloid leukemia, lung cancer etc.). They are a common mechanism of kinase activation in cancer.
Insertion of transposable elements such as Alu, L1, and SVA can interrupt gene function and cause diseases like hemophilia, neurofibromatosis, muscular dystrophy, and cancer.
On the other hand, deletions in the CFTR gene have been found to cause cystic fibrosis.

The serial replication slippage model explains simple deletions and tandem duplications, which lead to gene rearrangements.

Indels can occur in coding or non-coding regions. An indel can be in-frame or frameshift. Frameshift indels can lead to premature stop codons. Indels can also change gene expression by altering the phasing and spacing of DNA sequences in promoter regions. A small insertion of 5 bp can rotate a binding site to the opposite face of the DNA helix, whereas a long insertion of 100 bp can increase the spacing between two binding sites. Studies have shown that in the human genome, 16 to 25% of all sequence polymorphisms are indels. Yet, a high percentage of indels still remain undetected.
So, indel discovery and characterization in the exonic regions is important.
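The in-frame versus frameshift distinction comes down to whether the length change is a multiple of three. Below is a minimal sketch that classifies a coding indel from VCF-style REF and ALT alleles; the function name is hypothetical, and it ignores complications such as indels that span splice sites.

```python
def classify_indel(ref, alt):
    """Classify a coding indel given VCF-style REF and ALT allele strings."""
    delta = len(alt) - len(ref)
    if delta == 0:
        return "not an indel"
    kind = "insertion" if delta > 0 else "deletion"
    # A length change divisible by 3 keeps the downstream codons in frame.
    frame = "in-frame" if abs(delta) % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


if __name__ == "__main__":
    for ref, alt in (("A", "ATTT"), ("ATG", "A"), ("ACTT", "A"), ("A", "G")):
        print(ref, alt, "->", classify_indel(ref, alt))
```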


For that objective, popular tools include:
Read mappers: BFAST, Bowtie2, BWA, Novoalign, SHRIMP
Indel callers: Dindel, FreeBayes, SNVer, GATK Unified Genotyper, VarScan, SAMtools, GATK HaplotypeCaller, and Platypus.
Bayesian probabilistic model (Dindel, GATK Unified Genotyper, SAMtools)
Heuristic model (VarScan)
Alignment-based methods first map the reads to the reference sequence using read-mapping software such as BWA or Novoalign. Indel callers then call indels from the alignment data, and the calls are passed through filtering steps. A “True Call” is an indel that passes the filters.

Analysis: Copy Number Alterations (CNA) calling/ Copy Number Variation (CNV) calling....

Copy number alterations (CNAs) are changes in copy number that have arisen in somatic tissue
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3017908/)
Identification of gene CNAs in an individual patient’s tumor is increasingly relevant.
FASTQ files, sample sheet -> Parsing -> Variant calling -> CNA calling
Calling CNA from amplicon sequencing data is feasible. Non-tumor (germline) DNA is needed for data normalization.
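One simple way the normalization against non-tumor DNA can work is a per-amplicon log2 ratio of library-size-normalised read counts. The sketch below is only an illustration of that idea under my own assumptions (a diploid baseline, total-count normalisation, no correction for tumor purity or GC bias); it is not the method of any particular tool.

```python
import math


def copy_number_estimates(tumor_counts, normal_counts, ploidy=2):
    """Return {amplicon: (log2_ratio, copy_number)} from read counts.

    Counts are scaled by each library's total so that sequencing depth
    differences cancel out. Amplicons with a zero count map to None.
    """
    tumor_total = sum(tumor_counts.values())
    normal_total = sum(normal_counts.values())
    results = {}
    for amplicon, tumor in tumor_counts.items():
        normal = normal_counts[amplicon]
        if tumor == 0 or normal == 0:
            results[amplicon] = None  # ratio undefined without coverage
            continue
        ratio = (tumor * normal_total) / (normal * tumor_total)
        results[amplicon] = (math.log2(ratio), ploidy * ratio)
    return results


if __name__ == "__main__":
    tumor = {"amp_A": 100, "amp_B": 50, "amp_C": 150}      # made-up counts
    normal = {"amp_A": 100, "amp_B": 100, "amp_C": 100}    # made-up counts
    for name, value in copy_number_estimates(tumor, normal).items():
        print(name, value)
```

With these made-up counts, amp_B comes out near one copy (a loss) and amp_C near three copies (a gain).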
---------------
Copy number variation (CNV) is structural variation in the genome in which a stretch of sequence is repeated a different number of times between individuals, as a result of duplication or deletion.
(https://en.wikipedia.org/wiki/Copy-number_variation#/media/File:Gene-duplication.png)
BCFtools detects copy number alterations, aneuploidy, and contamination. Chromosome aneuploidy (an abnormal number of chromosomes) is one cause of IVF failure, as embryos with aneuploidy mostly fail to implant or miscarry during the first trimester of pregnancy.


# polysomy needs the GSL library, so compile bcftools with
make USE_GPL=1 clean all
# Running the command without arguments prints its usage (polysomy takes a VCF as input)
bcftools polysomy
# Find chromosomal aberrations
bcftools polysomy -v -o outdir/ file.vcf
# Results can be found in outdir/dist.dat
# List the entries whose copy number differs from 2.0
awk '$1=="CN" && $3!=2.0' outdir/dist.dat
# For clean data, the third column should be 2.0 for the normal diploid state, 1.0 for a loss, and 3.0 for a gain; -1 is used when the program cannot determine the state, usually because of noisy data
# The distribution can be plotted using the auto-generated matplotlib script

# To detect differences between two samples
bcftools cnv -c control_sample -s query_sample -o outdir/ -p 0 file.vcf
# The -p 0 option tells the program to call matplotlib automatically and produce plots
# Annotation file with BAF values for two samples
$ zcat baf.txt.gz | head -2

# Index the annotation file and fill in the BAF values. Add a BAF definition into the VCF header
$ tabix -s1 -b2 -e2 baf.txt.gz
$ echo '##FORMAT=<ID=BAF,Number=1,Type=Float,Description="NGS estimate of BAF">' > baf.hdr
$ bcftools annotate -a baf.txt.gz -h baf.hdr -c CHROM,POS,FMT/BAF -Ob -o output.bcf input.bcf

Tuesday, August 15, 2017

Plant tissue culture..........

Plant tissue culture is the technique to grow plant cells, tissues or organs under sterile conditions on a nutrient culture medium of known composition.
Plant tissue culture is widely used to produce clones of a plant in a method known as micropropagation. These new plantlets are grown in a short period of time, and they are free of soil-borne pathogens.
Sterile agar gel, plant hormones, and nutrients are needed for this, which makes tissue culture more expensive and more difficult than taking plant cuttings. However, it is widely used for different goals, including clonal propagation and genetic alteration for better vigor against pathogens and pests.
Four steps of tissue culture techniques:
1. Inoculation of explant 2. Incubation of culture 3. Sub-culturing 4. Transplantation of the regenerated plant.
(http://askmissteong.blogspot.com/2013/01/plant-tissue-culture-recommended-videos.html)

Step 1. Inoculation of explant:
The sterilized explants are transferred to the nutrient medium.

Step  2. Incubation of culture: 
After inoculation, the cultures are incubated in culture room or in incubator. Low humidity can cause desiccation, while high humidity can lead to microbial contamination.

Step 3. Sub-culturing:
The progress of the in vitro-grown tissues is monitored periodically.
For suspension culture, the medium is changed.
For callus culture, the sub-culturing of the callus tissue is performed.

Step 4. Transplantation of the regenerated plant:
Plants regenerated from in vitro tissue culture are transplanted to soil. These regenerated plants are acclimatized to prepare them for survival in field conditions.


Biostatistics for biomedical use..

Feature selection is used to identify the most discriminating features for biomarker discovery, medical diagnosis, and gene selection. 
Random Forest (RF): it is an ensemble (of multiple decision trees) classifier, which applies bagging technique to construct an ensemble of trees, with randomization technique for the growth of each tree.   RF is suitable for high-dimensional and small-sample datasets.
Support Vector Machine (SVM): it is a supervised classifier, generally used in bi-classification problem, but can be extended to multi-class problem.
For example, provide part of the data to a linear SVM and tune the parameters so that the SVM can act as a discriminatory function separating the ham messages from the spam messages.
# In R: load the SMS data and look at the first rows
sms_data<-read.csv("sms_spam.csv",stringsAsFactors = FALSE)

head(sms_data)
************
Parzen window based estimation calculates the probability density function (pdf) with a non-parametric approach

Each data point contributes equally to the pdf
Uniform distribution
Normal distribution (bell curve). The pdf is a Gaussian function here
Variance is the square of the standard deviation
Probability (P) = k/n, where k of the n samples fall inside the window; dividing by the window volume V gives the density estimate p(x) = k/(nV)
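The notes above can be sketched as a one-dimensional Parzen-window estimator. With a uniform (box) kernel the estimate reduces to k/(n*h), where k is the number of samples inside a window of width h centred on x; with a Gaussian kernel every sample contributes a small bell curve. The function name, the sample values, and the choice of h are all illustrative assumptions.

```python
import math


def parzen_density(x, samples, h, kernel="gaussian"):
    """1-D Parzen estimate: p(x) = (1 / (n * h)) * sum(K((x - xi) / h))."""

    def gaussian(u):
        # Standard normal density, used as the window function.
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

    def uniform(u):
        # Box window of total width 1 (so width h after scaling).
        return 1.0 if abs(u) <= 0.5 else 0.0

    window = gaussian if kernel == "gaussian" else uniform
    n = len(samples)
    return sum(window((x - xi) / h) for xi in samples) / (n * h)


if __name__ == "__main__":
    data = [1.0, 1.2, 3.0, 5.0]  # made-up sample values
    print(parzen_density(1.0, data, h=1.0, kernel="uniform"))
    print(parzen_density(1.0, data, h=1.0, kernel="gaussian"))
```

A smaller h gives a spikier estimate and a larger h a smoother one, so h usually has to be tuned to the data.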
************
Artificial neural network (ANN)
ANN helps model complex relations between input and output. It finds patterns in data (e.g. protein catabolic rate, optical character recognition (OCR))
Input---hidden---output
ANN architecture can have many layers, e.g. layer 1 (3 nodes), layer 2 (4 nodes), layer 3 (2 nodes)...
Transfer function = sum of all (weight * input)
There are many activation functions
Deep learning is about making data analysis sophisticated enough to derive personality
Lowest E means the least difference between desired and actual value (training iterations tend to minimize E)
Genetic algorithm (GA) is more random
If the error surface E(w) is wavy (many local minima), use GA
If it has a steep descent, use back propagation or anything else based on gradient descent
Clustering can be crisp or fuzzy
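The transfer function and error minimization in the ANN notes can be illustrated with a single neuron. This sketch computes the weighted sum of inputs, applies a sigmoid activation, and takes gradient-descent steps on the squared error E. The bias term, the sigmoid choice, the learning rate, and the toy input are assumptions added for the example.

```python
import math


def sigmoid(z):
    """One common activation function, squashing z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(weights, bias, inputs):
    """Transfer function (sum of weight * input, plus bias) then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def train_step(weights, bias, inputs, target, lr=0.5):
    """One gradient-descent update on E = 0.5 * (target - output)^2."""
    out = neuron_output(weights, bias, inputs)
    error = 0.5 * (target - out) ** 2
    delta = (out - target) * out * (1.0 - out)  # dE/dz for a sigmoid neuron
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias, error


def train(weights, bias, inputs, target, steps=200):
    """Repeat train_step and return the final parameters and error history."""
    errors = []
    for _ in range(steps):
        weights, bias, error = train_step(weights, bias, inputs, target)
        errors.append(error)
    return weights, bias, errors


if __name__ == "__main__":
    w, b, history = train([0.1, -0.2], 0.0, [1.0, 0.5], target=1.0)
    print("first E:", history[0], "last E:", history[-1])
```

On a smooth error surface like this one, each step lowers E; the GA alternative mentioned above is aimed at surfaces where gradient steps would get stuck in local minima.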

FDA, Pharmacy and regulatory writing...........

I got interested in a career in medical writing......
(https://www.kent.ac.uk/careers/workin/sciencewriting.htm)
Though I have an immense number of medical publications, I did not have much knowledge of regulatory writing. So, I had to learn from scratch. It had a steep learning curve, but the journey was exciting.
I had to become familiar with a number of acronyms......
AE = Adverse Event
BIMO = Bioresearch Monitoring program
CER = Clinical Evaluation Reports
CPM = Clinical Project Manager
CRC = Clinical Research Coordinator
CRF = Case Report Form
CFR = Code of Federal Regulations
CRA = Clinical Research Associate
CRO = Contract Research Organization
EDC = Electronic Data Capture
FCE = Field Clinical Engineer
FDA = Food and Drug Administration
FDF = Financial Disclosure Form
ICF = Informed Consent Form
IDE = Investigational Device Exemption
IMV = Interim Monitoring Visits
IND = Investigational New Drug
IRB = Institutional Review Board
MP = Monitoring Plan
NDA = New Drug Application
PI = Principal Investigator
PMA = Premarket Approval
QC = Quality Check
RBM = Risk Based Monitoring
RMP = Risk Management Plan
TMF = Trial Master File
SAE = Serious Adverse Event
SSR = Safety Surveillance Reports
WL = Warning Letter
483 = Inspectional Findings


FDA categorizes medical devices into three classes:
Class I : low risk and subject to the least regulatory controls
Class II : moderate risk and subject to special controls in addition to the general controls
Class III : highest risk devices and subject to the highest level of regulatory control, often requiring agency approval before they can be marketed.


