Following are some examples of gene sets that can be explored with GeneAnalytics:
For detailed examples, view our case studies.
Currently, GeneAnalytics is recommended for the analysis of gene sets that contain 300 or fewer genes. Analyzing longer lists may yield biased results, with over-representation of entities that contain a higher number of genes.
If you insert a gene set with more than 300 genes, you will be asked whether you want to proceed with your long set, or to trim the list to 300 genes. If you choose to trim the list, the first 300 genes will be used (duplicate genes will be removed automatically).
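The trimming behavior described above can be sketched in a few lines of Python. This is only an illustration and not the GeneAnalytics implementation: the function name `prepare_gene_set`, the case-insensitive comparison and the choice to remove duplicates before trimming are all assumptions.

```python
def prepare_gene_set(symbols, max_genes=300, trim=True):
    """Remove duplicate gene symbols, then optionally keep only the first max_genes.

    Assumptions (not confirmed by GeneAnalytics): symbols are compared
    case-insensitively, and duplicates are removed before trimming.
    """
    seen = set()
    unique = []
    for raw in symbols:
        symbol = raw.strip()
        key = symbol.upper()
        if not symbol or key in seen:
            continue  # skip blank entries and repeated symbols
        seen.add(key)
        unique.append(symbol)
    if trim and len(unique) > max_genes:
        return unique[:max_genes]  # keep the first max_genes in input order
    return unique
```

Passing `trim=False` corresponds to choosing to proceed with the long set.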
You can insert your gene symbols by either:
GeneAnalytics does not enable data normalization and calculation of differentially expressed genes.
However, we offer a service for identifying differentially expressed genes from microarray experiments. For more information, please contact us.
Unidentified genes are those which were not recognized as official mouse or human gene symbols.
GeneAnalytics identifies tissues, cells, diseases, molecular pathways, GO terms (biological processes and molecular functions) and compounds that match the query gene set.
The full list of matched genes can be viewed by clicking on the number of matched genes for each entity in the table. The list of matched genes in the pop-up window can be used for a new GeneAnalytics query.
The GeneAnalytics analysis is based on proprietary, comprehensive and organized databases which are part of the LifeMap suite.
The following table summarizes the data sources for each section in GeneAnalytics. For more details, see Resources and statistics.
Section | Underlying databases |
---|---|
Tissues & cells | LifeMap Discovery®: the embryonic development and stem cells compendium |
Diseases | MalaCards: the human malady compendium |
Pathways | PathCards: the human biological pathway unification |
GO terms | GeneCards®: the human gene compendium |
Compounds | GeneCards®: the human gene compendium |
Individual entity scores are classified as high, medium or low. This classification is indicated by the color of the score bar (dark green, light green or beige, respectively). The distribution of the scores can be viewed by clicking on the pie icon.
In the detailed results table, clicking on the entity (e.g., cell, tissue, disease) name takes you to the entity card in the underlying database. Specifically, you will be directed to the gene section in the relevant card.
In addition, the full list of matched genes can be viewed by clicking on the ‘number of matched genes’ for each entity in the table. In the tissues & cells results, the matched gene list indicates whether this information is derived directly from scientific manuscripts, high throughput gene expression comparisons, or large scale datasets. In the diseases section, the matched gene list indicates the gene-disease relations.
In the tissues & cells section, the matching algorithm relies on the annotations of each gene in each entity available in LifeMap Discovery®. These annotations are derived from the scientific literature and/or from bioinformatics calculations executed on expression data in LifeMap Discovery.
Each gene can have one or more of the following annotations:
Each gene in each entity is scored, based on both the entity type and the combination of the above mentioned annotations of that particular gene in the specific entity.
The matching score is calculated at three levels: the entity, the tissue and the system. The calculation procedure differs between single-gene queries and larger queries:
Query size | Entity score | Tissue/system score |
---|---|---|
1 gene | Equal to the gene score | The maximal gene score among all genes in the tissue/system |
>1 gene | A weighted sum of the scores of all genes matched to this entity, normalized to log of the maximal score that can be potentially achieved for the entity | A weighted sum of the scores of all genes matched to this tissue/system, normalized to log of the maximal score that can be potentially achieved for the tissue/system |
All matched genes are presented in descending order of their score. If several matches have the same score, they are ordered by the ratio of matched to total number of genes in the entity (from highest to lowest). In single-gene queries, in vivo entities will appear before in vitro entities with the same score.
The tissues and cells section in GeneAnalytics contains four powerful filters which enable you to focus on specific sub-sets of the results:
1. Tissue/system:
2. ‘Expressed in’: used to filter for specific entity types:
3. In vivo/In vitro filter
4. Prenatal/postnatal filter
The matching score is positively affected by higher numbers of matched genes, but is also significantly influenced by three additional parameters:
The organ/tissue column on the right side of the detailed result table indicates the organs, tissues, developmental paths and cell classes to which the matched entities belong.
The tissues in the tissue/system filter serve to filter the detailed result table, to only show the matches from the selected tissue(s). In addition, each tissue receives its own match score.
In the tissue/system filter, ‘tissues’ is a more general term that refers to all organs, tissues and cell classes defined in LifeMap Discovery.
To receive a list of all genes expressed in a specific tissue, organ or developmental path, including annotations for selective markers, specific genes and tissue-enriched genes, please contact us.
The disease matching score is determined using the following parameters:
For each gene, the maximal score of all the above mentioned possible scores is used as the final gene score. The final disease score is based on the final scores of all the matched genes.
All matched diseases are presented in descending order of their score. If several matches have the same score, they are ordered by the ratio of matched to total number of genes in the disease (from highest to lowest).
The disease section in GeneAnalytics contains four powerful filters which enable you to focus on specific sub-sets of the results:
Gene-disease relations: filtration of diseases that have either differentially expressed genes or genes with genetic association to the disease. Read more on gene-disease relations.
Genetic association type: filtration of diseases that have genes with genetic association to the disease.
Disease categories: filtration of diseases in accordance with the MalaCards disease categorization. Read more about disease categorization.
GeneAnalytics disease results are derived from information available in MalaCards. The MalaCards algorithm categorizes each disease into up to four anatomical categories and/or up to five global categories, based on:
Please note that not all the diseases in the detailed results table are categorized.
This binomial distribution is used to test the null hypothesis that the user’s input genes are not over-represented within any SuperPath, GO term or compound in the databases from which GeneAnalytics extracts its data (See Resources and statistics).
The presented score is a transformation (-log2) of the resulting p-value, where higher scores indicate better matches.
The score range is divided into three levels of quality, based on the p-value corrected for the multiple comparisons (using the false discovery rate method):
High: corrected p-value smaller than or equal to 0.0001.
Medium: corrected p-value higher than 0.0001 but smaller than or equal to 0.05.
Low: corrected p-value higher than 0.05.
The score is presented by a score bar whose color indicates the match quality: dark green for high, light green for medium and beige for low. This graphic visualization of the score enables the user to evaluate the overall quality of the results.
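The scoring steps described above (binomial test, false discovery rate correction, -log2 transformation and quality levels) can be sketched as follows. This is a rough illustration under stated assumptions, not the GeneAnalytics code: the function names are hypothetical, the success probability `p` is assumed to be the entity's share of a background gene list, Benjamini-Hochberg is assumed as the specific false discovery rate procedure, and whether the displayed score uses the raw or the corrected p-value is not specified here, so `match_score` simply accepts whichever p-value you pass in.

```python
import math
import sys


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more matches out of n genes."""
    tail = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return min(1.0, tail)  # guard against rounding slightly above 1


def benjamini_hochberg(p_values):
    """Return FDR-adjusted p-values (Benjamini-Hochberg), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def match_score(p_value):
    """-log2 transformation: smaller p-values give higher scores."""
    return -math.log2(max(p_value, sys.float_info.min))  # avoid log2(0)


def quality_level(corrected_p):
    """Classify a corrected p-value using the thresholds listed above."""
    if corrected_p <= 0.0001:
        return "high"
    if corrected_p <= 0.05:
        return "medium"
    return "low"
```

In this sketch, `n` would be the size of the query gene set and `k` the number of query genes matched to a given SuperPath, GO term or compound.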
Results are ranked in descending order of their score. If several matches have the same score, they are ordered by the ratio of matched to total number of genes in the entity (from highest to lowest).
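The ranking rule above amounts to a two-key sort. A minimal sketch, assuming each result is a dictionary with the hypothetical fields `score`, `matched` and `total`:

```python
def rank_results(results):
    """Sort by score (descending); break ties by matched/total ratio (descending)."""
    return sorted(
        results,
        key=lambda r: (r["score"], r["matched"] / r["total"]),
        reverse=True,
    )


# Toy example with made-up values: A and C tie on score, so the ratio decides.
example = [
    {"name": "A", "score": 5.0, "matched": 2, "total": 10},
    {"name": "B", "score": 7.0, "matched": 1, "total": 50},
    {"name": "C", "score": 5.0, "matched": 3, "total": 10},
]
ranked_names = [r["name"] for r in rank_results(example)]
```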
The detailed result tables in the pathways, GO terms and compounds sections present all the results whose score was derived from a p-value (after correction for multiple comparisons) smaller than or equal to 1.
If fewer than 20 results pass this threshold, the best 20 results will be displayed, even though they have lower statistical significance.
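The display rule above can be illustrated with a short sketch. The function name and the `corrected_p` field are hypothetical, and the threshold is left as a parameter rather than hard-coded:

```python
def select_displayed(results, threshold, min_results=20):
    """Return results passing the corrected p-value threshold, best first.

    If fewer than min_results pass, fall back to the best min_results overall.
    """
    by_significance = sorted(results, key=lambda r: r["corrected_p"])
    passing = [r for r in by_significance if r["corrected_p"] <= threshold]
    if len(passing) >= min_results:
        return passing
    return by_significance[:min_results]
```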
A SuperPath clusters one or multiple pathways from various PathCards data sources, based on similarity in their associated genes.
In GeneAnalytics, the SuperPaths are presented with links to their cards in PathCards, the list of their constituent pathways and the number of matched genes versus their total number of genes.
(Read more about pathway analysis in PathCards).
The Pathways section includes a filter that enables viewing results derived from a specific data source only.
Note that for pathways originating from Reactome, the matched genes are highlighted in the original source pathway illustration.
The compounds section in GeneAnalytics is the only section which does not yet rely on a single unique database that unifies multiple compound sources and provides one web page for each compound (in contrast: tissues and cells data are unified in LifeMap Discovery, diseases are unified in MalaCards, pathways are unified in PathCards and GO terms in the GeneOntology database). The GeneAnalytics Compounds section takes advantage of multiple sources which relate to more than 83,000 compounds, including those found in GeneCards® (for more information about the compounds data sources click here).
The Novoseek data source extracts knowledge from biological databases and text repositories, providing relationships between chemical compounds and genes based on a scoring algorithm run on PubMed articles. Note that the Novoseek website is no longer available. Read more about Novoseek data in GeneCards and about its literature-text mining algorithm.
We have applied a unification process which seeks out similar compounds described in different data sources, to enable gene aggregation for unified compounds, and to avoid redundancy in the resulting compounds list. Compounds unification is established by an identical name and/or a combination of other identifiers, such as CAS number, PubChem ID and synonyms. Unified compounds are shown with links to all relevant data sources (the exact compound name is shown near the original data source name).
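To make the idea of unification concrete, here is a simplified sketch that merges compound records sharing a name, synonym, CAS number or PubChem ID, and aggregates their genes. It is not the actual GeneAnalytics procedure: the record fields, the function name and the rule that any single shared identifier is enough to merge are assumptions made for illustration.

```python
def unify_compounds(records):
    """Group records that share any identifier and pool their associated genes."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)  # keep the earliest record as root

    first_seen = {}  # identifier -> index of the first record carrying it
    for i, rec in enumerate(records):
        keys = [("name", rec["name"].strip().lower())]
        keys += [("name", s.strip().lower()) for s in rec.get("synonyms", [])]
        if rec.get("cas"):
            keys.append(("cas", rec["cas"]))
        if rec.get("pubchem_id"):
            keys.append(("pubchem", rec["pubchem_id"]))
        for key in keys:
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    groups = {}
    for i, rec in enumerate(records):
        group = groups.setdefault(find(i), {"sources": [], "genes": set()})
        group["sources"].append((rec["source"], rec["name"]))  # keep original names
        group["genes"].update(rec["genes"])
    return list(groups.values())


# Toy records for illustration only; the source names are placeholders.
toy_records = [
    {"source": "SourceA", "name": "Aspirin", "cas": "50-78-2", "genes": ["PTGS1"]},
    {"source": "SourceB", "name": "Acetylsalicylic acid", "cas": "50-78-2", "genes": ["PTGS2"]},
    {"source": "SourceC", "name": "Caffeine", "genes": ["ADORA2A"]},
]
unified = unify_compounds(toy_records)
```

Each unified group keeps the original name used by every source, mirroring how unified compounds are shown with links to all relevant data sources.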
Metabolites unification: the following compound families contain thousands of metabolites which were unified based on their primary name and associated genes.
If genes associated with these compounds are matched to your gene set, GeneAnalytics presents only the matched group, to avoid a multitude of identical results. The evidence link enables viewing all the relevant metabolites in the original database. The unified compounds and their specific groups are as follows:
1. Triglycerides
Group name | # of associated genes | # of metabolites in the group |
---|---|---|
Triglycerides group A | 26 | 170 |
Triglycerides group B | 30 | 113 |
Triglycerides group C | 39 | 6 |
Triglycerides group D | 34 | 13631 |
2. Diglycerides
Group name | # of associated genes | # of metabolites in the group |
---|---|---|
Diglycerides group A | 130 | 803 |
Diglycerides group B | 131 | 39 |
Diglycerides group C | 131 | 1 |
Diglycerides group D | 115 | 435 |
3. Phosphatidylcholines
Group name | # of associated genes | # of metabolites in the group |
---|---|---|
Phosphatidylcholines group A | 78 | 955 |
Phosphatidylcholines group B | 72 | 119 |
Phosphatidylcholines group C | 44 | 73 |
4. Phosphatidylethanolamines
Group name | # of associated genes | # of metabolites in the group |
---|---|---|
Phosphatidylethanolamines group A | 43 | 959 |
Phosphatidylethanolamines group B | 30 | 114 |
If you want to cite GeneAnalytics in a journal article or on-line publication, please note the URL geneanalytics.genecards.org and cite the relevant publication below.
Ben-Ari Fuchs S, Lieder I, Stelzer G, Mazor Y, Buzhor E, Kaplan S, Bogoch Y, Plaschkes I, Shitrit A, Rappaport N, Kohn A, Edgar R, Shenhav L, Safran M, Lancet D, Guan-Golan Y, Warshawsky D, and Strichman R. GeneAnalytics: An integrative gene set analysis tool. OMICS (2016).