FAQs | GeneAnalytics

What kinds of gene sets can be used as a query in GeneAnalytics?

Following are some examples of gene sets that can be explored with GeneAnalytics:

Differentially expressed genes identified by microarray experiment, RNA sequencing, real-time PCR or any other molecular method
Gene sets that contain genomic variants known to be related to a specific disease or group of diseases
Genes that encode protein targets of a specific drug
Genes encoding proteins known to be a part of a specific molecular pathway or process

For detailed examples, view our case studies.

What is required of the gene query in GeneAnalytics?

Official gene symbols only!
GeneAnalytics is designed to identify mouse and human gene symbols. The user must indicate the query species to enable the identification procedure but this indication does not impact the results.
Currently, GeneAnalytics is recommended for the analysis of gene sets that contain 300 or fewer genes. Analyzing longer lists may yield biased results, with over-representation of entities that contain a higher number of genes.

If you insert a gene set with more than 300 genes, you will be asked whether you want to proceed with your long set, or to trim the list to 300 genes. If you choose to trim the list, the first 300 genes will be used (duplicate genes will be removed automatically).
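The trimming behavior described above can be approximated locally before submitting a query. The following is a minimal sketch, not the GeneAnalytics implementation: the function name `prepare_gene_query` is hypothetical, and it assumes that duplicates are removed before the list is trimmed and that symbols are compared case-sensitively (the FAQ does not specify either detail).

```python
MAX_RECOMMENDED_GENES = 300  # recommended upper bound stated in the FAQ


def prepare_gene_query(symbols, limit=MAX_RECOMMENDED_GENES, trim=True):
    """Drop blanks and duplicates (keeping the first occurrence), then optionally trim.

    Assumption: de-duplication happens before trimming, so the result holds
    up to `limit` distinct symbols in their original order.
    """
    seen = set()
    unique = []
    for raw in symbols:
        symbol = raw.strip()
        if not symbol or symbol in seen:
            continue
        seen.add(symbol)
        unique.append(symbol)
    if trim and len(unique) > limit:
        return unique[:limit]
    return unique


query = prepare_gene_query(["TP53", "BRCA1", " TP53 ", "", "EGFR"])
print(query)  # ['TP53', 'BRCA1', 'EGFR']
```

Passing `trim=False` keeps the full de-duplicated list, which corresponds to choosing to proceed with the long set.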

How do I submit a query to GeneAnalytics?

You can insert your gene symbols by either:

Typing the gene symbol(s) into the input window, one gene at a time. Use the auto-complete feature to find the correct official gene symbol.
Pasting a list of genes into the input window. The pasted genes automatically undergo an identification procedure.
Uploading a file containing the gene list. Only text files are accepted. The uploaded genes automatically undergo an identification procedure.

Which source species can be analyzed by GeneAnalytics?

GeneAnalytics is designed to identify mouse and human gene symbols. The user must indicate the query species to enable the identification procedure. However, the query species does not impact the results:
- The expression matching algorithm is performed on all the genes available in LifeMap Discovery®, regardless of their species (human, mouse, rat, chicken, pig).
- The matching algorithm used in the diseases, pathways, GO terms and compounds sections converts all gene symbols into human gene symbols.
Gene symbols from another species can be queried, but GeneAnalytics will identify them according to the official mouse or human gene symbols.
Please note that changing the input species after inserting genes will automatically result in a new identification process.

Can I use GeneAnalytics to normalize my data and reveal differentially expressed genes?

GeneAnalytics does not enable data normalization and calculation of differentially expressed genes.

However, we offer a service for identifying differentially expressed genes from microarray experiments. For more information, please contact us.

What are unidentified genes?

Unidentified genes are those which were not recognized as official mouse or human gene symbols.

How can I correct my unidentified genes?

Unidentified genes are genes that were not recognized as official human or mouse gene symbols.
Note: unidentified genes are excluded from the analysis and do not impact its results.
To correct unidentified gene symbols perform one of the following:

Click the ‘edit’ icon and start typing in your postulated gene symbol. Use the auto-complete function to identify the correct symbol.
Click the ‘search in GeneCards’ icon. Then, copy the correct symbol from GeneCards® into GeneAnalytics.
Try to switch the selected input species from human to mouse or vice versa. Note that changing the input species after inserting gene symbols will lead to a new identification process.

How can I optimize my input gene set?

Try to maximize the number of identified genes in your query by searching for the official gene symbols for your unidentified genes (see: How can I correct my unidentified genes?).
On the results page, examine the ‘notes’ in the ‘analyzed genes’ panel. Consider removing abundant and housekeeping genes from your query, particularly if you are interested in expression analysis, since including such genes can lead to nonspecific results.

What kinds of matching results are supplied by GeneAnalytics?

GeneAnalytics identifies tissues, cells, diseases, molecular pathways, GO terms (biological process and molecular function) and compounds that match the query gene set.

How can I view the matched genes?

The full list of matched genes can be viewed by clicking on the number of matched genes for each entity in the table. The list of matched genes in the pop-up window can be used for a new GeneAnalytics query.

What are the GeneAnalytics data sources?

The GeneAnalytics analysis is based on proprietary, comprehensive and organized databases which are part of the LifeMap suite.

The following table summarizes the data sources for each section in GeneAnalytics. For more details, see Resources and statistics.

Section	Underlying databases
Tissues & cells	LifeMap Discovery®, the embryonic development and stem cells compendium
Diseases	MalaCards, the human malady compendium
Pathways	PathCards, the human biological pathway unification
GO terms	GeneCards®, the human gene compendium
Compounds	GeneCards®, the human gene compendium

How can I evaluate the match quality?

Individual entity scores are classified as high, medium or low. This classification is indicated by the color of the score bar (dark green, light green or beige, respectively). The distribution of the scores can be viewed by clicking on the pie icon.

In the tissues & cells and the diseases sections, the classification of the matching scores is based upon an empirical examination of many test cases, and is different for each query size.
In the pathways, GO terms and compounds sections, the score is a transformation of the binomial distribution p-value. The score range is divided into three quality levels according to the p-value from which the score is derived (after correction for multiple comparisons), as follows:

High: corrected p-value smaller than or equal to 0.0001.
Medium: corrected p-value higher than 0.0001 but smaller than or equal to 0.05.
Low: corrected p-value higher than 0.05.

How can I view the evidence supporting the matching results?

In the detailed results table, clicking on the entity name (e.g., cell, tissue, disease) takes you to the entity card in the underlying database. Specifically, you will be directed to the gene section in the relevant card.

In addition, the full list of matched genes can be viewed by clicking on the ‘number of matched genes’ for each entity in the table. In the tissues & cells results, the matched gene list indicates whether the information is derived directly from scientific manuscripts, high throughput gene expression comparisons, or large scale datasets. In the diseases section, the matched gene list indicates the gene-disease relations.

How are the tissues and cells results scored and ranked?

In the tissues & cells section, the matching algorithm relies on the annotations of each gene in each entity available in LifeMap Discovery®. These annotations are derived from the scientific literature and/or from bioinformatics calculations executed on expression data in LifeMap Discovery.

Each gene can have one or more of the following annotations:

Specific gene: a gene that is only expressed in a few tissues/organs
Enriched gene: a gene that is expressed in many entities of the same tissue/organ
Selective gene: an established cell-specific marker or a gene suggested to be characteristic of the cell
Expressed gene: a gene known to be expressed in the entity, but that is not defined as a selective cell marker
Abundant gene: a gene that is expressed in a large number of organs and tissues
Housekeeping gene: a gene that appears in a list of housekeeping genes established by integrating information from several studies
A low confidence level gene: a gene for which expression evidence originates from the analysis of a large scale dataset but lacks strong supporting evidence (for example, appears in a small number of cells in a specific organ).

Each gene in each entity is scored, based on both the entity type and the combination of the above mentioned annotations of that particular gene in the specific entity.

The matching score is calculated at three levels: the entity, the tissue and the system. The calculation procedure differs between single-gene queries and larger queries:

Query size	Entity score	Tissue/system score
1 gene	The gene score	The maximal gene score among all genes in the tissue/system
>1 gene	A weighted sum of the scores of all genes matched to this entity, normalized to the log of the maximal score that can potentially be achieved for the entity	A weighted sum of the scores of all genes matched to this tissue/system, normalized to the log of the maximal score that can potentially be achieved for the tissue/system

All matched entities are presented in descending order of their score. If several matches have the same score, they are ordered by the ratio of matched to total number of genes in the entity (from highest to lowest). In single-gene queries, in vivo entities will appear before in vitro entities with the same score.
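The ordering rules above can be expressed as a single sort key. This is only an illustrative sketch: the `Match` record and its field names are hypothetical, and placing the in vivo/in vitro tie-break ahead of the matched-to-total ratio for single-gene queries is an assumption, since the FAQ does not state which tie-break takes precedence.

```python
from dataclasses import dataclass


@dataclass
class Match:
    name: str
    score: float
    matched_genes: int
    total_genes: int  # assumed to be positive
    in_vivo: bool = True


def rank_matches(matches, single_gene_query=False):
    """Sort by score (descending), then by the tie-breaks described in the FAQ."""

    def key(m):
        ratio = m.matched_genes / m.total_genes
        # Assumption: for single-gene queries, in vivo entities precede in vitro
        # ones before the matched/total ratio is considered.
        vivo_rank = 1 if (single_gene_query and not m.in_vivo) else 0
        return (-m.score, vivo_rank, -ratio)

    return sorted(matches, key=key)


matches = [
    Match("entity A", 5.0, 2, 10),
    Match("entity B", 7.0, 1, 50),
    Match("entity C", 5.0, 3, 10),
]
print([m.name for m in rank_matches(matches)])  # ['entity B', 'entity C', 'entity A']
```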

How can I focus my results?

The tissues and cells section in GeneAnalytics contains four powerful filters which enable you to focus on specific sub-sets of the results:

1. Tissue/system:

The tissue/system results panel is available in the upper left part of the results screen, and also serves as a filter for the detailed results.
Note that in this panel, ‘Tissues’ are all organs, tissues and cell classes defined in LifeMap Discovery.
Entities that are not categorized into a specific tissue or system (e.g., fibroblasts) are presented as “uncategorized” tissues/systems, which appear at the end of the tissues/systems list, with no score.

2. ‘Expressed in’ filter, used to select specific entity types:

Cells (in vivo and in vitro)
Compartments (anatomical compartments)
Organs and tissues
High throughput comparisons (and large scale dataset samples)

3. In vivo/In vitro filter

4. Prenatal/postnatal filter

Why do results with a smaller number of matched genes sometimes get high scores?

The matching score is positively affected by higher numbers of matched genes, but is also significantly influenced by three additional parameters:

The specific annotations of the matched genes (see annotation details). ‘Specific’, ‘enriched’ and ‘selective’ annotations increase the gene score, while ‘abundant’ and ‘housekeeping’ annotations decrease it.
The total number of genes in the entity: there is a negative effect of a high total number of genes in the entity on the match score.
The entity type: genes in large scale data set samples have lower scores when compared to genes in other entities.

What is the difference between the organ/tissue column in the detailed result table, and the tissue/system filter?

The organ/tissue column on the right side of the detailed result table indicates the organs, tissues, developmental paths and cell classes to which the matched entities belong.

The tissues in the tissue/system filter serve to filter the detailed result table, to only show the matches from the selected tissue(s). In addition, each tissue receives its own match score.

In the tissue/system filter, ‘tissues’ is a more general term that refers to all organs, tissues and cell classes defined in LifeMap Discovery.

How can I identify cell markers and genes that are specific or enriched within a specific organ/tissue?

You can view annotations for selective cell markers by clicking on the number of matched genes for a specific cell in the results table. Matched genes that serve as cellular markers are indicated by a dedicated marker icon.
To view genes in your query that are annotated as abundant or housekeeping genes, go to the ‘Analyzed genes’ panel at the top of the results page and click on the corresponding icon. Abundant genes are genes that appear in a large number of organs or tissues in LifeMap Discovery (for more information about LifeMap Discovery entities, click here). Housekeeping genes are genes contained within a list that was established by integrating information from several studies that aimed to identify human housekeeping genes.
The LifeMap Discovery database contains more gene annotations which are not yet presented to the user but are used for the GeneAnalytics calculations:
- Specific gene: a gene that is specific to or is expressed in only a few organs or tissues.
- Enriched gene: a gene that is expressed in many entities of the same organ or tissue.

To receive a list of all genes expressed in a specific tissue, organ or developmental path, including annotations for selective markers, specific genes and tissue-enriched genes, please contact us.

How are the disease results scored?

The disease matching score is determined using the following parameters:

The number of genes in your query that match a specific disease normalized by the total number of genes specifically associated with the disease.
The quality and type of the gene-disease relation. These relations are determined from MalaCards sources and include the following relation types:

Differentially expressed (DE) genes are genes that were found to be significantly up- or downregulated in the diseased tissues in comparison to identical tissues obtained from unaffected subjects. The disease card contains up to 200 up- and/or 200 downregulated genes, calculated by a LifeMap Discovery algorithm, derived from high throughput experiments extracted from the Gene Expression Omnibus (GEO) or from the literature. Each gene is tagged with a differential expression score determined by the fold-change in expression of the gene in the disease vs. the normal tissue.
Genetic associations to diseases are determined from several MalaCards data sources. Since each data source has its own annotation terminology, we categorized all possible associations as shown in the table, in descending order of their score.
The ‘GeneCards-inferred’ relation indicates that the disease name appears in the gene page in GeneCards. Of note, the ‘GeneCards-inferred’ relation does not imply causality between the gene and the disease, and the nature of this relation can sometimes be unclear. For example, the gene can be indicated as ‘unaffected’ in the disease. Similarly, if a gene-disease relation is based on a ‘publication’, it means that the gene and the disease were mentioned in the title and/or abstract of the same publication. There are two score levels for genes with GeneCards-inferred relations: a gene that appears in three or more different sections in GeneCards has a higher score than a gene that appears in only one or two sections.

For each gene, the maximal score of all the above mentioned possible scores is used as the final gene score. The final disease score is based on the final scores of all the matched genes.
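The "maximal score" rule can be illustrated as follows. This is a sketch only: the relation names and numeric scores are hypothetical placeholders, and because the FAQ does not specify how the final gene scores are combined into the disease score, the example stops at the per-gene step.

```python
def final_gene_score(relation_scores):
    """Return the highest score among all gene-disease relations found for one gene."""
    return max(relation_scores.values())


# Hypothetical relation names and scores, for illustration only.
gene_relations = {
    "GENE_A": {"differentially_expressed": 2.5, "genecards_inferred": 0.5},
    "GENE_B": {"genetic_association": 4.0},
}

final_scores = {gene: final_gene_score(rel) for gene, rel in gene_relations.items()}
print(final_scores)  # {'GENE_A': 2.5, 'GENE_B': 4.0}
```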

All matched diseases are presented in descending order of their score. If several matches have the same score, they are ordered by the ratio of matched to total number of genes in the disease (from highest to lowest).

How can I focus my results?

The disease section in GeneAnalytics contains three powerful filters which enable you to focus on specific sub-sets of the results:

Gene-disease relations: filtration of diseases that have either differentially expressed genes or genes with genetic association to the disease. Read more on gene-disease relations.

Genetic association type: filtration of diseases that have genes with genetic association to the disease.

Disease categories: filtration of diseases in accordance with the MalaCards disease categorization. Read more about disease categorization.

How are the diseases categorized?

GeneAnalytics disease results are derived from information available in MalaCards. The MalaCards algorithm categorizes each disease into up to four anatomical categories and/or up to five global categories, based on:

Existing and widely used disease categorization systems, like ICD10 and Orphanet.
Identification of category-specific keywords contained in disease names and annotations.

Please note that not all the diseases in the detailed results table are categorized.

How are the pathways/GO terms/compounds results scored and ranked?

A binomial distribution is used to test the null hypothesis that the user’s input genes are not over-represented within any SuperPath, GO term or compound in the databases from which GeneAnalytics extracts its data (see Resources and statistics).

The presented score is a transformation (-log₂) of the resulting p-value, where higher scores indicate better matches.

The score range is divided into three levels of quality, based on the p-value corrected for the multiple comparisons (using the false discovery rate method):

High: corrected p-value smaller than or equal to 0.0001.

Medium: corrected p-value higher than 0.0001 but smaller than or equal to 0.05.

Low: corrected p-value higher than 0.05.

The score is presented by a score bar whose color indicates the match quality: dark green for high, light green for medium and beige for low. This graphic visualization of the score enables the user to evaluate the overall quality of the results.

Results are ranked in descending order of their score. If several matches have the same score, they are ordered by the ratio of matched to total number of genes in the entity (from highest to lowest).
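The scoring steps described in this answer can be sketched end to end. The following is an illustrative approximation rather than the GeneAnalytics code. It assumes an upper-tail binomial test in which the success probability is the entity size divided by a hypothetical background gene count, uses the Benjamini-Hochberg procedure as the false discovery rate correction, and computes the score from the corrected p-value. None of these details are confirmed by the FAQ beyond "binomial distribution", "-log₂" and "false discovery rate".

```python
import math
import sys


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k matches among n query genes."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def score(corrected_p):
    """-log2 transformation; the floor avoids log(0) if the p-value underflows."""
    return -math.log2(max(corrected_p, sys.float_info.min))


def quality(corrected_p):
    """Quality levels using the thresholds listed in this answer."""
    if corrected_p <= 0.0001:
        return "high"
    if corrected_p <= 0.05:
        return "medium"
    return "low"


# Hypothetical numbers: a 50-gene query against a 20,000-gene background.
background_size = 20000
query_size = 50
entities = {"entity_1": (100, 5), "entity_2": (500, 2), "entity_3": (2000, 5)}  # (total, matched)

raw = [binomial_upper_tail(matched, query_size, total / background_size)
       for total, matched in entities.values()]
corrected = benjamini_hochberg(raw)
for name, p in zip(entities, corrected):
    print(name, round(score(p), 2), quality(p))
```

Because the correction depends on how many entities are tested, the same raw p-value can fall into different quality levels in different sections.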

What is the threshold score for showing results in the detailed result tables?

The detailed result tables in the pathways, GO terms and compounds sections present all results whose score was derived from a p-value (after correction for multiple comparisons) smaller than or equal to 0.05.

If fewer than 20 results pass this threshold, the best 20 results will be displayed, even though they have lower statistical significance.
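The display rule above amounts to a threshold filter with a minimum-count fallback. In this sketch the function name is hypothetical, the threshold is passed in by the caller, and the values used in the example (including `minimum=2`, chosen only to keep the example short) are illustrative.

```python
def results_to_display(results, threshold, minimum=20):
    """Select results to show from (name, corrected_p) pairs, best first.

    Results whose corrected p-value is at or below `threshold` are shown;
    if fewer than `minimum` pass, the best `minimum` are shown instead.
    """
    ordered = sorted(results, key=lambda r: r[1])
    passing = [r for r in ordered if r[1] <= threshold]
    if len(passing) < minimum:
        return ordered[:minimum]
    return passing


results = [("a", 0.2), ("b", 0.01), ("c", 0.5)]
print(results_to_display(results, threshold=0.05, minimum=2))  # [('b', 0.01), ('a', 0.2)]
```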

What is a SuperPath?

A SuperPath clusters one or multiple pathways from various PathCards data sources, based on similarity in their associated genes.

In GeneAnalytics, the SuperPaths are presented with links to their cards in PathCards, the list of their constituent pathways and the number of matched genes versus their total number of genes.

(Read more about pathway analysis in PathCards).

How can I focus the pathways results?

The Pathways section includes a filter that enables viewing results derived from a specific data source only.

Note that for pathways originating from Reactome, the matched genes are highlighted in the original source pathway illustration.

How are the compounds unified from multiple data sources?

The compounds section in GeneAnalytics is the only section that does not yet rely on a single database that unifies multiple compound sources and provides one web page for each compound (in contrast, tissues and cells data are unified in LifeMap Discovery, diseases in MalaCards, pathways in PathCards and GO terms in the Gene Ontology database). Instead, the GeneAnalytics compounds section draws on multiple sources covering more than 83,000 compounds, including those found in GeneCards® (for more information about the compounds data sources click here).

The Novoseek data source extracts knowledge from biological databases and text repositories, providing relationships between chemical compounds and genes based on a scoring algorithm run on PubMed articles. Note that the Novoseek website is no longer available. Read more about Novoseek data in GeneCards and about its literature text-mining algorithm.

We have applied a unification process which seeks out similar compounds described in different data sources, to enable gene aggregation for unified compounds and to avoid redundancy in the resulting compounds list. Compounds are unified when they share an identical name and/or a combination of other identifiers, such as CAS number, PubChem ID and synonyms. Unified compounds are shown with links to all relevant data sources (the exact compound name is shown near the original data source name).
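One way to picture this unification is as grouping records that share an identifier and pooling their genes. The sketch below is a simplification and not the GeneAnalytics procedure: it merges two records whenever they share any single key (lower-cased name or synonym, CAS number, or PubChem ID), whereas the actual rules for combining identifiers are not described here. The record layout and source names are hypothetical.

```python
from collections import defaultdict


def unify_compounds(records):
    """Group records sharing a name/synonym, CAS number or PubChem ID (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)  # keep the earliest record as root

    first_seen = {}  # identifier key -> index of the first record carrying it
    for idx, rec in enumerate(records):
        keys = [("name", rec["name"].strip().lower())]
        keys += [("name", s.strip().lower()) for s in rec.get("synonyms", [])]
        if rec.get("cas"):
            keys.append(("cas", rec["cas"]))
        if rec.get("pubchem_id"):
            keys.append(("pubchem", rec["pubchem_id"]))
        for key in keys:
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return [
        {
            # Keep each source's own compound name, as in the unified display.
            "sources": [(records[i]["source"], records[i]["name"]) for i in members],
            "genes": sorted(set().union(*(records[i]["genes"] for i in members))),
        }
        for members in groups.values()
    ]


# Illustrative records; the source names are placeholders.
records = [
    {"source": "SourceA", "name": "Aspirin", "cas": "50-78-2", "genes": ["PTGS1"]},
    {"source": "SourceB", "name": "acetylsalicylic acid", "cas": "50-78-2", "genes": ["PTGS2"]},
    {"source": "SourceC", "name": "Caffeine", "pubchem_id": 2519, "genes": ["ADORA2A"]},
]
unified = unify_compounds(records)
print(len(unified))  # 2
```

Pooling the genes of merged records is what allows one unified compound to be matched against the query instead of several redundant entries.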

Metabolites unification: the following compound families contain thousands of metabolites which were unified based on their primary name and associated genes.

If genes associated with these compounds are matched to your gene set, GeneAnalytics presents only the matched group, to avoid a multitude of identical results. The evidence link enables viewing all the relevant metabolites in the original database. The unified compounds and their specific groups are as follows:

1. Triglycerides

Group name	# of associated genes	# of metabolites in the group
Triglycerides group A	26	170
Triglycerides group B	30	113
Triglycerides group C	39	6
Triglycerides group D	34	13631

2. Diglycerides

Group name	# of associated genes	# of metabolites in the group
Diglycerides group A	130	803
Diglycerides group B	131	39
Diglycerides group C	131	1
Diglycerides group D	115	435

3. Phosphatidylcholines

Group name	# of associated genes	# of metabolites in the group
Phosphatidylcholines group A	78	955
Phosphatidylcholines group B	72	119
Phosphatidylcholines group C	44	73

4. Phosphatidylethanolamines

Group name	# of associated genes	# of metabolites in the group
Phosphatidylethanolamines group A	43	959
Phosphatidylethanolamines group B	30	114

Citing us

If you want to cite GeneAnalytics in a journal article or on-line publication, please note the URL geneanalytics.genecards.org and cite the relevant publication below.

Publications

Ben-Ari Fuchs S, Lieder I, Stelzer G, Mazor Y, Buzhor E, Kaplan S, Bogoch Y, Plaschkes I, Shitrit A, Rappaport N, Kohn A, Edgar R, Shenhav L, Safran M, Lancet D, Guan-Golan Y, Warshawsky D, and Strichman R. GeneAnalytics: An integrative gene set analysis tool. OMICS (2016).
