Gene set enrichment analysis (GSEA), also called functional enrichment analysis, is a method for identifying classes of genes or proteins (for example, those that share a biological annotation or belong to the same signaling pathway) that are over-represented in a large set of genes or proteins and may be associated with disease phenotypes. GSEA evaluates microarray data at the level of gene sets. The gene sets are defined from prior biological knowledge, e.g., published information about biochemical pathways or coexpression in previous experiments. GSEA determines whether an a priori defined set of genes (for example, genes encoding products in a metabolic pathway, located in the same cytogenetic band, or sharing the same GO category) shows statistically significant, concordant differences between two biological states.
GO annotations, the PANTHER Classification System, the Molecular Signatures Database (MSigDB), GOrilla, and the Kyoto Encyclopedia of Genes and Genomes (KEGG) are useful resources for enrichment analysis.
Type the names of the genes to be analyzed.
Calculate the enrichment (for biological process, molecular function, or cellular component).
Background frequency is the number of genes annotated to a GO term in the entire background set, while sample frequency is the number of genes annotated to that GO term in the input list.
For example, if the input list contains 20 genes and the enrichment is computed for biological process in M. bacterium, whose background set contains 4000 genes, and 10 of the 20 input genes are annotated to the GO term DNA repair, then the sample frequency for DNA repair is 10/20.
The P-value is the probability of seeing, by chance, at least x of the n genes in the input list annotated to a particular GO term, given the proportion of genes in the background set that are annotated to that term.
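The frequencies and the P-value described above can be sketched with a one-sided hypergeometric test, which is one common way to compute this kind of enrichment P-value; the tools listed earlier may use a different test or apply a multiple-testing correction. The 20 input genes, the 4000 background genes, and the 10 annotated input genes come from the example. The number of background genes annotated to DNA repair is not given in these notes, so the value 100 below is a purely hypothetical assumption, as is the function name `hypergeom_enrichment_p`.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, x):
    """P(X >= x) when n genes are drawn without replacement from N genes,
    K of which are annotated to the GO term (one-sided hypergeometric test)."""
    upper = min(K, n)  # cannot observe more hits than annotated or drawn genes
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, upper + 1))
    return favourable / comb(N, n)


background_size = 4000          # from the example in the notes
annotated_in_background = 100   # ASSUMPTION: not stated in the notes
input_size = 20                 # from the example in the notes
annotated_in_input = 10         # from the example in the notes

sample_frequency = annotated_in_input / input_size                 # 10/20
background_frequency = annotated_in_background / background_size   # 100/4000

p_value = hypergeom_enrichment_p(
    background_size, annotated_in_background, input_size, annotated_in_input
)

print(f"sample frequency:     {sample_frequency:.3f}")
print(f"background frequency: {background_frequency:.3f}")
print(f"P(X >= {annotated_in_input}) = {p_value:.3e}")
```

Under the assumed background count, the sample frequency (0.5) is far above the background frequency (0.025), so the P-value is very small. With a different background count the result would change accordingly.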
GSEA is a knowledge-based approach for interpreting genome-wide expression profiles.
It studies groups of genes that share a common biological function, chromosomal location, or regulation.
High-scoring gene sets can be grouped on the basis of the leading-edge subsets of genes that they share.
The P-value can be estimated by permuting the genes. GSEA can be useful for hypothesis generation.
The choice of ranking metric is important, because it determines the order of the gene list on which the enrichment score is computed.
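The following is a minimal sketch of how the points above fit together: a weighted running-sum enrichment score in the style of GSEA, the leading-edge subset, and a P-value estimated by gene permutation. It is an illustration under stated assumptions, not the implementation of any particular GSEA tool. The gene names, the scores (standing in for a precomputed ranking metric such as a signal-to-noise ratio or t-statistic), the gene set, and the function names are all hypothetical. The `+1` in the P-value is a common convention for avoiding a P-value of zero, also an assumption here.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Return (ES, peak index) for genes already sorted by descending score.

    Assumes the gene set has at least one member in the list, does not
    contain every gene, and that its members have non-zero scores.
    """
    in_set = [g in gene_set for g in ranked_genes]
    hit_total = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    miss_step = 1.0 / (len(ranked_genes) - sum(in_set))
    running, best, best_idx = 0.0, 0.0, -1
    for i, (s, hit) in enumerate(zip(scores, in_set)):
        # Step up (weighted by the ranking metric) on a hit, down on a miss.
        running += abs(s) ** weight / hit_total if hit else -miss_step
        if abs(running) > abs(best):
            best, best_idx = running, i
    return best, best_idx


def gene_permutation_p(ranked_genes, scores, gene_set, n_perm=1000, seed=0):
    """Estimate a P-value by scoring random gene sets of the same size."""
    rng = random.Random(seed)
    observed, _ = enrichment_score(ranked_genes, scores, gene_set)
    size = sum(g in gene_set for g in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        null_es, _ = enrichment_score(ranked_genes, scores, random_set)
        # Count null scores at least as extreme, in the observed direction.
        if (observed >= 0 and null_es >= observed) or (
            observed < 0 and null_es <= observed
        ):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical toy data: 10 genes ranked by a precomputed metric.
ranked_genes = [f"g{i}" for i in range(1, 11)]
scores = [2.5, 2.0, 1.5, 1.0, 0.5, -0.5, -1.0, -1.5, -2.0, -2.5]
gene_set = {"g1", "g2", "g4"}

es, peak = enrichment_score(ranked_genes, scores, gene_set)
# For a positive ES, the leading edge is the set members at or before the peak.
leading_edge = [g for g in ranked_genes[: peak + 1] if g in gene_set]
p_value = gene_permutation_p(ranked_genes, scores, gene_set)

print(f"ES = {es:.3f}, leading edge = {leading_edge}, P = {p_value:.3f}")
```

In this toy case all three set members sit near the top of the ranking, so the running sum peaks at the fourth gene and the leading edge is the whole set. Changing the ranking metric reorders the list and can change both the score and the leading edge, which is why the metric matters.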
#Application of GSEA
To identify functional gene sets correlated with p53 status.
The GSEA method is implemented in the 64-bit MATLAB R2016a programming environment.
The computational experiment is based on a collection of 28 microarray data sets, each of which is assigned a target pathway.
Publicly available microarray data sets from two Bioconductor packages were used.
Boxplots of gene set analysis
--------------------------
Network analysis: iGraph