Biomed&Informatics: July 2017

Monday, July 31, 2017

Databases (CouchDB, Storm)........

As big data arrives, databases have become a critical part of the data storage and mining.

(http://crossing-technologies.com/big-data-analytics/)

Database management systems: Relational Database Management System (RDBMS), NoSQL
RDBMS (table based) : MS SQL server, Oracle, IBM DB2, MySQL
NoSQL (key-value, colun and documents based): Cassandra, DynamoDB, BigTable, MongoDB, CouchDB
These two systems vary in their architecture, data handling
Creation, deletion, updating, installation are part of database technology.
Oracle
When Oracle is running in your environment, you need some DBA tasks, like exporting data from the Oracle database. Oracle’s exp utility, it is very easy to dump data from database.
Connect to DB in exp utility
Export (full database; one or more specific Schemas/Users; one or more specific Tables; all objects in a Tablespac
exp
ls -l expdat.dmp
CouchDB
Its written in Erlang language and now operated by Apache software
Its a document store database of NoSQL
JSON to store data, javascript to mine data, HTTP for API
API is RESTful HTTP API
Communication with HTTP protocol using GET (get a particular item), HEAD (to find only header of an item), POST (to upload data), PUT (create new items), DELETE (remove items), COPY (copy items)
Interaction with CouchDB with curl utility and Futon to create, delete and update databases and documents
#Communicating with the database
curl http://127.0.0.1:5984/

#Get list of databases

curl -X GET http://127.0.0.1:5984/_all_dbs

#Create a new database

curl -X PUT http://127.0.0.1:5984/new_database

#To check if the database is created

curl -X GET http://127.0.0.1:5984/_all_dbs

#To get information of the newly created database

curl -X GET http://127.0.0.1:5984/new_database

Futon is web-based interface of CouchDB for database and documents handling.

#To create database using Futon

http://127.0.0.1:5984/_utils/

#Creating a Document using curl

#Document is Field-value separated by colon closed in braces

curl -X PUT http://127.0.0.1:5984/database_ name/"id" -d ' { document} '

curl -X PUT http://127.0.0.1:5984/database_ name/"01" -d '

{ Name : Carl, Age : 32, Job : Artist, City: Paris} '

#Response of the database as 'ok', 'id', 'rev'

#Verifying the Document using curl

curl -X GET http://127.0.0.1:5984/database_ name/01

Storm

Apache Storm is a free and open source distributed realtime computation system.

(https://hortonworks.com/apache/storm/)

Storm makes it easy to reliably process unbounded streams of data.

Storm can be used with any programming language.

Its usage include realtime analytics, online machine learning, continuous computation.

Its fast, as over a million tuples can be processed per second per node.

Databases, jargons,acronyms, tools, emerging area, omics...

Bioinformatics
Layer 1: NCBI BLAST, EBI tools, etc.
Layer 2: install and run programs like BLAST, Clustal, Bowtie or Trinity in your local machine
Layer 3: R, PERL or python.
#######################Databases################
BioCyc : Assortment of organism specific pathway
COG: This protein database was generated by comparing predicted and known proteins
DAVID: Database for Annotation, Visualization and Integrated Discovery
DDBJ: DNA
DEG: Database of essential genes
DOOR: Database of Prokaryotic operons
EggNOG: Ortholog
EMBL: Molecular Biology
ENCODE: Encyclopedia of DNA Elements (an annotation database)
Ensemble: Vertebrate and eukaryote genome
GenBank: Genome
GEO: Gene Expression Omnibus (repository of high throughput gene expression data and hybridization arrays, chips, microarrays)
GO: Gene Ontology
GOLD : Metagenomics studies (Genome online database)
HMDB: Human Metabolome Database
IMG: Integrated Microbial Genomes comparative analysis system
KEGG: Kyoto Encyclopedia of Genes and Genomes

MetaCyc: metabolic-pathway

MG-RAST : Metagenomic samples

miRBase: the microRNA database (http://www.mirbase.org/)
nr: non-redundant (Protein search ) (All non-redundant GenBank CDS translations)
OMIM: Online Mendelian Inheritance in Man
Panther: Protein ANalysis THrough Evolutionary Relationships
PATRIC: Bacterial bioinformatics
PDB: Protein Data Bank
pfam: Proten family
PIR: Protein Information Resource
PRF: Protein Research Foundation
Rebase: Restriction enzymes, DNA methyltransferases
RefSeq: Reference Sequence
SRA: Short Read Archive (contains small reads of genomes)
Swiss Prot: Protein database
UCSC: Human genome reference
(versions of human genome references hg18 and hg19)
*Other databases: FlyMine, MouseMine, RatMine, YeastMine, EuPathDB, EpiGRAPH, GenomeSpace
###########################Bioinformatics acronyms#################################
ANN: Artificial neural networks
ASD: Autism Spectrum Disorders
BAC: Bacterial Artificial Chromosome
BAM: Binary Alignment Map
BLASR: Basic Local Alignment with Successive Refinement (a mapper_
BLAST: Basic Local Alignment Tool
BUSCO: Benchmarking Universal Single Copy Orthologs
CDS: Coding Sequence
COG: Clusters of Orthologous Groups

CRISPR: Clustered regularly interspaced short palindromic repeats

DCA: DNA Composition Analysis
DDBJ: DNA Database of Japan
EMBL: European Molecular Biology Laboratory
EST: Expressed Sequence Tag
FBA: Flux balance analysis
FMRI: Functional Magnetic Resonance Imaging (for brain imaging)
GATK: Genome Analysis Toolkit
GEE: Generalized estimating equations
GML: Graph Modelling Language

GRN: Genetic Regulatory Network

GSS: Genome Survey Sequences
GWAS: Genome-wide association studies (e.g. association between a SNP and each phenotype)
HDF: Hierarchical Data Format (sequencers generate high throughput data in this format)
HSP: Heat Shock Protein
HGAP: Hierarchical Genome Assembly Process

HMP: Human Microbiome Project

HGP: Human Genome Project
HTGS: High-Throughput Genome Sequence
MG-RAST: Metagenomics Rapid Annotation using Subsystem Technology
MANOVA: Multivariate analysis of variance
NCBI:National Center for Biotechnology Information
NIPT: Non-invasive Prenatal Testing
ORF: Open Reading Frame
QTL: Quantitative Trait Loci
RAD: Restriction-site Associated DNA
RFLP: Restriction fragment length polymorphism (exploits variations in homologous DNA sequences)
RMSD: Root Mean Square Distance
SAM: Sequence Alignment Map
SMRT: Single Molecule Real Time
SMS:Single Molecule Sequencing

TFBS: transcription factor-binding sites
TMHMM: Transmembrane Hidden Markov model (membrane protein topology prediction )

TSS: transcription start site

TU: transcription units
VCF: Variant calling format
WGS: Whole Genome Sequencing
##########Bioinformatics jargon###########
Bioinformatics is a multidisciplinary area, so is full of jargons. Some key words have been presented below.
Annotation: Generating metadata of data
Base calling: Identifying a nucleotide base
Bit score: Statistical properties of raw alignment score
Bootstrap: Any test that relies on random sampling
Built-in functions: Comes as standard package
Canonical: Conventional
Central dogma: DNA to make RNA, and then translation uses RNA to
make proteins
Clonal expansion: Progeny or daughter cells arising from single parent cell
cluster computing: parallel computing, ganglia, load distribution
Consensus sequence: Calculated order of most frequent residues
Coverage depth: Depth in squencing in terms of number of times a nucleotide is read
Deterministic: When returned result is always same
Epigenetics: Study of the complete set of epigenetic modifications on the genome
E-value: Measure of similarity between sequences (value 0 is best concordance, less value means more congruent sequences)
frameshift: stop codon mutation or indel modify ORF
GC bais: Dependence between GC content and coverage depth of a genomic region
Genotyping: Lineage determination
Horizonatal transfer: Transfer of genes between organisms
Inversion: Sequence put in other orientation
In vitro passage: Changes in the cells or microbes due to serial culturing
Isoform: Splice variants
Ka/Ks: rate of non-synonymous to synonymous substitutions
Moonlighting protein: When a protein performs many functions
Non-synonymous substitution: When a base change causes change of amino acidPromoter: 100 bases upstream of start position
Pleiotropy: When one gene influences more than one phenotypic traits
Pulling: Extracting
Quality Trimming: Wrongly called bases are removed
Redundancy of the genetic code: Multiple codons for one amino acid
Synteny: When two or more genomic regions are derived one genomic region
Translocation: Sequence moved to another position
File formats
aln: Alignment
Axt:
BAM: Binary alignment
BED:Browser Extensible Data (12 columns: 11th is blockSizes, 12th is blockstarts)
bigWig:
Chain:
Fasta: text files that starts with > symbol
fna: fasta nucleic acid
gbk: Detailed info, NCBI format
GenePhred:
GFF: General Feature Format) format consists of one line per feature
hmm: has position-specific scores
jnlp: java file that needed to be run by java web start
json: Javascript Object Notation
md: Its the short of MarkDown editing in Windows (it shows preview of the edit)
Microarray:
Net:
pileup: (its generated by SAMTOOLS, from it SNP are extracted)
PLINK: For genotypic analysis
SAM: Sequence Alignment Map
vCard: Simple text files with features
VCF: variant calling (It has 19 columns)
WIG: Wiggle Track Format (compact)
XML: Extensible Markup Language
Softwares
Aligners (mappers): Bowtie, BWA, BBmap, SOAPaligner (will give SAM, BAM or any other aligned files)
Local aligners (re-aligners around indels): GATK, BQSR (these aligners will give realigned BAM files) AbySS: Paired-end sequence assembler that is designed for short reads
aragorn: To find tRNA
BAMtools: To generate coverage files
BEDTOOLS: To manipulate big data files
Bowtie: Ultrafast short read alignment (takes read and ref, aligns, converts sam to bam, sort)
BreakDancer: For genome-wide detection of structural variants
Broad's variant calling software: for pathogens, cancer
BQSR: Base Quality Score Recalibration
GATK: Genome Analysis Toolkit
Gephi: for visualizing and analyzing large network graphs
HMMER: For sequence alignment and homolog finding
kallisto:For near-optimal RNA-Seq quantification
khmer :for working with DNA shotgun sequencing data from genomes, transcriptomes, metagenomes, and single cells
MAUVE: Multiple genome alignment. It takes care of large genome rearrangements
Picard: To manipulate SAM files
prodigal: To predict CDS
prokka: for de novo annotation of genomes
SAMTOOLS: For NGS analysis
tbl2asn: Generates sequence records for submission to Genbank
VarScan: Mutation Caller
velvet: de novo assembler to build long continuous sequences (contigs)
Web tools (interfaces)
Galaxy: Web-based platform for NGS data analysis
MetaboAnalyst: For metabolomic data analysis
SMART: Simple Modular Architecture Research Tool
UCSC: A genome browser
Bioinformatics Resources
Rosalind
StackExchange
SuperUser
Seqanswer
Bioconductor consortium (from R community)
Protein 3D structure prediction...........
Ab initio:
QUARK is used for ab initio modelling. it uses Monte Carlo simulation to predict protein structure, even without available global template.
Comparative: MODELLER, Phyre
Protein 3D structure viewer...........
PyMol
RasMol
Swiss pdb Viewer
Emerging areas.......
Comparative genomics
Regulatory genomics
Systems biology
Epigenomics
Cancer biology
Next generation sequencing
De novo genome assembly
Functional annotation
Gene prediction
Metageomics
------------------------------------------------------------------------------------------------------------
Ome............
Exome: Protein coding parts of genes

Genome: Complete set of DNA

Interactome: Whole set of molecular interactions
Metabolome: Entire set of small molecules
Patome: All patents
Proteome: All proteins
Reactome: All biological pathways in an organism
Transcriptome: Entire set of RNA molecules
------------------------------------------------------------------------------------------------------------------
Indel can occur by replications, recombination, mobile genetic element...
Deletions can be one bp long or entire gene length.
Frameshifts cause gene fusions, restoring functions ablated (lost) before
non-sense mutation (introduction of stop codon) cause ORF truncation
mutation stop codon cause ORF extension
Coverage of BAM file is calculated
Partitioning approach for metagenome assembly.
Prepeocessing include digital normalization, knot removal
genome separation/binning/strain extraction on raw reads.
Digital normalization and partitioning are effective methods to assemble large metagenomic data. The assembly data has been uploaded to MG-RAST for annotation

Bio/ bioinfo tools: Sharing and storage tools.....

Google drive
https://drive.google.com/drive/my-drive
create
share
name it
invite people

Dropbox

Own Cloud

Mendely

Bio/ bioinfo tools: Entrez Direct (EDirect)...

NCBI databases: publication, sequence, structure, gene, variation, expression, etc. These can be accessed from a UNIX terminal window (by command-line), using EDirect.

#To install EDirect software (Jonathan Kans), following commands should be pasted into a terminal window. It needs perl and installs the folder in user home, configuring the PATH environment variable to allow execution of programs in that location.

    cd ~
  perl -MNet::FTP -e \
    '$ftp = new Net::FTP("ftp.ncbi.nlm.nih.gov", Passive => 1); $ftp->login;
     $ftp->binary; $ftp->get("/entrez/entrezdirect/edirect.zip");'
  unzip -u -q edirect.zip
  rm edirect.zip
  export PATH=$PATH:$HOME/edirect
  ./edirect/setup.sh

Usage e.g. esearch, elink, efilter, efetch, xtract, einfo, epost, nquire

esearch -db pubmed -query "opsin gene conversion"

esearch -db pubmed -query "opsin gene conversion" | elink -related

 esearch -db pubmed -query "opsin gene conversion" | \
 elink -related | \
 elink -target protein

Bio/ bioinfo tools: Galaxy...

http://usegalaxy.org has GATK, Varscan tools.

Sunday, July 30, 2017

IT/ framework: .NET/ASP.NET.......

.Net is a framework with web-related functionalities.(object-oriented).
.Net allows to write windows applications, web applications, web services .
Its compatible with many languages, including C#, C++, Visual Basic, JScript..
Mono is a version of .NET framework that includes C# compiler and runs on multiple OS.ASP.NET is a data-driven, web application framework to build dynamic web sites. (based on C# or Visual Basic.NET, JScript, J# language).
ASP.NET along with HTTP, enable browser-server bilateral communication. It encompasses many controls to generate HTML pages.
page state: user input information
session state: global information of a session
Tool to build ASP.NET application is the IDE Visual Studio.
#This IDE can be used for web services, desktop applications, mobile applications.
ASP.NET klife cycle: Application & Page

user------>browser------>web server------>response object
request page------>initialize------>load------>rendering------>unload

#Open the file from a browser to execute it

Saturday, July 29, 2017

Language: SQL (1) ...............

SQL: Structured Query Language
SQL is the best language for database query, deleting and updating entries.

SQLite SQL IDE
http://www.tutorialspoint.com/execute_sql_online.php
Script file extension is .sql
/* Comment */
STATEMENT keywords: SELECT, INSERT, UPDATE, DELETE, TRUNCATE, DESC, ALTER, DROP, CREATE, USE, SHOW, COMMIT, ROLLBACK, ...
STATEMENTS are not case-sensitive and they end with ; (STATEMENTS operate on tables, index, databases)
CLAUSES: distinct, where, and/or, in, between, like, order by, group by, count, having,
* for all selection
% to match zero or more characters
_ to match exact character
space for selected few

Use your schema (blueprint)
e.g. use mycobacteria;
----------------------------------
Basic syntax:

select column, from table, where condition is

SELECT column1, column2...........columnN

FROM table_name
WHERE [CONDITION|EXPRESSION];

e.g.SELECT column1, column2
FROM table1, table2
WHERE column2='value';
last_name= 'Patel'
e.g. SELECT book_id, title, description
FROM books
WHERE genre = 'novel';

e.g. SELECT book_id, title, pub_year,
CONCAT(author_first, ' ', author_last) AS author
    FROM books
JOIN authors USING(author_id)
WHERE author_last = 'Smith';
----------------------------------
Some examples:
SELECT Cust_No, First_Name FROM Customers WHERE Last_Name='Smith';
SELECT First_Name, Nickname FROM Friends WHERE Nickname LIKE '%brain%';
SELECT * FROM Friends WHERE First_Name LIKE '_en';

#dst, pacbio_snps, sanger_snps, discrepant_snps,  gene_list, lineage are tables
select * from discrepant_snps;
select * from gene_list;
select * from lineage;
select Miru  from lineage;
select Spoligo, Miru from lineage;
select *  from ncbi_snps;*/
select *  from ncbi_snps where gene like '%gyrase%';*/#to serach the pattern 'gyrase’;
select Isolate_id, lineage_name from lineage*/mv scr;
select * from tb_genes

select gene_name from tb_genes

select gene_end from tb_genes

select * from dst_original

select* from pacbio_snps

select * from dst_original where inh ='R' or inh = 'S'

select count(d.isolate_id) from dst_original d where isolate_id in (select s.Isolate_Id from sanger_sequenced_list s

where inhA_promoter='Y') and isolate_id in (select distinct isolate_id from seq_data) and d.inh in ('R','S')

select count(distinct s.isolate_id) from sanger_sequenced_list s, dst_original d where inh in ('R','S') and katG='Y'

select count(distinct s.isolate_id) from seq_data s, dst_original d where inh in ('R','S')

select * from true_ra_snps

Select distinct p.isolate_id from pacbio_snps p, dst d where p.isolate_id=d.isolate_id and d.fq='R'

select * from country;

SELECT count(distinct position) FROM pacbio_snps;

SELECT count(distinct position, mutation) FROM pacbio_snps;

SELECT count(*) FROM pacbio_snps WHERE position = 761110;

SELECT count(*)FROM pacbio_snps WHERE position = 761110 AND mutation='T';

SELECT count(*) FROM pacbio_snps WHERE position = 761110 AND mutation='G';

SELECT count(pacbio_snps.isolate_id) FROM pacbio_snps, dst WHERE pacbio_snps.isolate_id = dst.isolate_id AND dst.rif = 'R' AND pacbio_snps.position=761110;

SELECT position, count(p.isolate_id), rif from pacbio_snps p, dst d WHERE p.isolate_id = d.isolate_id AND position in max(count(position)) /*AND rif = 'R'*/

GROUP BY rif;

SELECT p.isolate_id #(select isolate)

FROM pacbio_snps p, dst d #(from table1 and table2)

WHERE p.isolate_id=d.isolate_id #(those satisfy the condition the isolates are same)

AND d.DRUG_NAME='R'; # (and are resistant)

select count(distinct s.isolate_id)

from sanger_sequenced_list s, dst_original d

where inh in ('R','S')

and inhA_promoter='Y';

select count(sanger_sequenced_list. isolate_id, pacbio_snps.isolate_id)

from sanger_sequenced_list, pacbio_snps, dst_original

where sanger_sequenced_list.isolate_id = dst_original.isolate_id

and pacbio_snps.isolate_id = dst_original.isolate_id

and inh is not null

select distinct position, #(select count(isolate_id)

from pacbio_snps_original p

where exists (select1

from dst d

where p.isolate_id=d.isolate_id

and rif = 'r')

and p.position = s.position

and position not in(761155)) num_res,

(select count(isolate_id)

From pacbio_snps_original p

where exists (select 1 from dst d where p.isolate_id=d.isolate_id and rif = 's')

and p.position = s.position)

num_sus (#no. susceptible)

from pacbio_snps_original s

order by num_sus, num_res desc

limit 50;
--------------------------------

BEGIN TRANSACTION;

/* Create a table PRACTICE */
CREATE TABLE PRACTICE (Id integer PRIMARY KEY, Name text);
/* Create records in the table made*/
INSERT INTO PRACTICE VALUES(1,'Rick');
INSERT INTO PRACTICE VALUES(2,'Alice');
INSERT INTO PRACTICE VALUES(3,'Judy');
INSERT INTO PRACTICE VALUES(4,'Nicole');
COMMIT;
/* Display all the records from the table */
SELECT * FROM PRACTICE;

1|Rick                                                                                        

2|Alice                                                                                       

3|Judy                                                                                        

4|Nicole                                                                                      

----------------------------------
BEGIN TRANSACTION;
CREATE TABLE SUMMER (Id integer PRIMARY KEY, Name text);
/* Create records in the table made*/
INSERT INTO SUMMER VALUES(1,'Mango');
INSERT INTO SUMMER VALUES(2,'Pineapple');
SELECT * FROM SUMMER;

SELECT Id integer
FROM SUMMER;

SELECT Name
FROM SUMMER;

SELECT DISTINCT ID integer, Name
FROM SUMMER;

SELECT *
FROM SUMMER
WHERE Name = 'Mango';
COMMIT;

1|Mango                                                                                       

2|Pineapple                                                                                   

1

2

Mango                                                                                         

Pineapple                                                                                     

1|Mango                                                                                       

2|Pineapple                                                                                   

1|Mango
----------------------------------
#If it says table exists, delete the pre-formed database and execute the script again.
BEGIN TRANSACTION;
/* Create a table CUSTOMERS */
CREATE TABLE CUSTOMERS (ID integer PRIMARY KEY, Name text, Age integer, Address text, Salary float);
/* Create records in the table made*/
INSERT INTO CUSTOMERS VALUES(1,'Rick', 34, 'Paris', 200.00);
INSERT INTO CUSTOMERS VALUES(2,'Alice', 28, 'Chicago', 500.00);
INSERT INTO CUSTOMERS VALUES(3,'Judy', 24, 'Seattle', 300.00);
INSERT INTO CUSTOMERS VALUES(4,'Nicole', 19, 'Fresno', 100.00);
INSERT INTO CUSTOMERS VALUES(5,'Kelly', 26, 'Boise', 150.00);
INSERT INTO CUSTOMERS VALUES(6,'Nicole', 37, 'Augusta', 250.00);
COMMIT;

/* Display all the records from the table */
SELECT * FROM CUSTOMERS;

1|Rick|34|Paris|200.0                                                                                    

2|Alice|28|Chicago|500.0                                                                                 

3|Judy|24|Seattle|300.0                                                                                  

4|Nicole|19|Fresno|100.0                                                                                 

5|Kelly|26|Boise|150.0                                                                                   

6|Nicole|37|Augusta|250.0                                                                                

SELECT * FROM CUSTOMERS WHERE SALARY = 100;
4|Nicole|19|Fresno|100.0

SELECT * FROM CUSTOMERS WHERE SALARY > 200;

2|Alice|28|Chicago|500.0                                                                                 

3|Judy|24|Seattle|300.0                                                                                  

6|Nicole|37|Augusta|250.0                                                                                

/* To perform any mathematical operation in any query */
SELECT numerical_expression as OPERATION_NAME
FROM table_name
WHERE CONDITION;
SELECT (15 + 6) AS ADDITION
FROM CUSTOMERS
WHERE Age = 28;
21
SELECT COUNT(*) AS "RECORDS" FROM CUSTOMERS;
6
SELECT CURRENT_TIMESTAMP;
2015-12-05 06:27:20

Database creation and table insertion into it (yet to make it work)
CREATE DATABASE testDB;
CREATE DATABASE bioPedia;
SHOW DATABASES;
DROP DATABASE testDB;
SHOW DATABASES;
USE bioPedia;
CREATE TABLE EMPLOYEES(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
DESC EMPLOYEES;
/*DROP TABLE EMPLOYEES;*/
DESC EMPLOYEES;

INSERT INTO EMPLOYEES (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (1, 'Cindy', 37, 'CA', 700.00 );

INSERT INTO EMPLOYEES (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (2, 'Ryan', 42, 'MT', 1200.00 );

INSERT INTO EMPLOYEES (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (3, 'Rob', 60, 'MA', 600.00 );
-------------------------------------------------------
Database creation..............
mysql -u root -p
mysql> CREATE DATABASE testdb;
mysql> GRANT ALL PRIVILEGES ON testdb.* TO Seema@localhost IDENTIFIED BY 'pass';
$ mysql -u Seema -p
mysql> SHOW DATABASES;

-------------------------------------------------------

#Printing server version and current database name
mysql> SELECT VERSION(), DATABASE();
mysql> use database_name;
#Select all names except Smith
mysql> SELECT * FROM database_name WHERE user !=Smith;
#Use of AND and OR operators
mysql> SELECT * FROM database_name WHERE user = Carl AND root = Mike
mysql> SELECT * FROM database_name WHERE user != Carl OR root != Mike
#To see only certain rows from a result set
mysql> SELECT * FROM database_name LIMIT 1;
mysql> SELECT * FROM database_name LIMIT 5;
mysql> SELECT * FROM database_name ORDER BY column_name LIMIT 1;
mysql> SELECT CURRENT_DATE();
#To export tables as an XML file in MySQL
mysql -u USER_NAME –xml -e 'SELECT * FROM table_name' > table_name.xml

mysql -u root -p
MariaDB [(none)] > CREATE DATABASE my_db;

MariaDB is RDBMS and it is one of the most popular database servers
#Install
sudo apt install default-mysql-
sudo apt install mariadb-server

#Check installation
mysql -u root -p
#You will greeted with MariaDB mysql prompt. You can type sql commands to see databases, version and much more:
MariaDB [(none)]> show databases;
MariaDB [(none)]> exit
-------------------------------------------
#Connect to mysql database of ucsc
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A

#Extract Similar Genes From Ucsc Via Mysql Command
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select * from knownGene' -N | awk -f script.awk
#script.awk
BEGIN {
FS="\t";
}
{
split($9,exonStarts,",");
split($10,exonEnds,",");
geneSize=1.0*int($5)-int($4);
exonCount=int($8);
if(exonCount<2)
{
next;
}
if($3=="+")
{
printf("%f\t%s\n",(exonStarts[2]-exonEnds[1])/geneSize,$0);
}
else
{
printf("%f\t%s\n",(exonStarts[exonCount]-exonEnds[exonCount-1])/geneSize,$0);
}
}
#Download genome annotations from UCSC's MySQL database
echo -e "chr1\t10000\t20000\nchr1\t30000\t40000" |\
awk -F '\t' '{printf("select \"%s\",\"%s\",\"%s\", G.name,G.txStart,G.txEnd from refGene as G where chrom=\"%s\" and not(%s>txEnd or %s<txStart);\n",$1,$2,$3,$1,$2,$3);}' |\
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -D hg38

Biomed&Informatics