User Guide

Input

Select Input Species

You must indicate the input species before inserting your gene set. This information is required only to identify your gene symbols and their orthologs.

The matching algorithm considers both genes and their orthologs, and differs between sections:

The tissues and cells matching algorithm considers all the genes available in LifeMap Discovery®, regardless of their species (human, mouse, rat, chicken, pig).
The matching algorithm in all other sections converts all gene symbols into human gene symbols.

Please note that changing the input species after inserting gene symbols will activate a new identification process.

Enter Gene Symbol

GeneAnalytics identifies official human and mouse gene symbols only.

Currently, GeneAnalytics is recommended for the analysis of gene sets that contain 300 or fewer genes. Analyzing longer lists may yield biased results, with over-representation of entities that contain a higher number of genes.

If you insert a gene set with more than 300 genes, you will be asked whether you want to proceed with your long set, or to trim the list to 300 genes. If you choose to trim the list, the first 300 genes will be used (duplicate genes will be removed automatically).
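The trimming step can be pictured with a short Python sketch. It is illustrative only: `prepare_gene_list` is a hypothetical helper, and the sketch assumes that duplicates are dropped first (keeping the first occurrence and the original order) and that the first 300 remaining genes are then kept. The guide does not state the exact order of these two steps.

```python
MAX_GENES = 300  # recommended upper limit for a GeneAnalytics gene set


def prepare_gene_list(symbols, trim=True, limit=MAX_GENES):
    """Remove duplicate symbols (keeping first occurrences, in order) and,
    if trim is True, keep only the first `limit` genes.

    Hypothetical helper: assumes de-duplication happens before trimming.
    """
    seen = set()
    unique = []
    for symbol in symbols:
        symbol = symbol.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            unique.append(symbol)
    return unique[:limit] if trim else unique


if __name__ == "__main__":
    print(prepare_gene_list(["TP53", "BRCA1", "TP53", "EGFR"]))  # ['TP53', 'BRCA1', 'EGFR']
```

Choosing `trim=False` corresponds to proceeding with the long set; duplicates are still removed.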

You can insert your gene symbols by either:

1. Typing in the gene symbol(s) in the input window, one gene at a time. Use the auto-complete feature to define the correct official gene symbol.

2. Pasting a list of genes into the input window. The pasted genes automatically undergo an identification procedure.

3. Uploading a file containing the gene list. Only text files are accepted. The uploaded genes automatically undergo an identification procedure.

Your Gene List

Unidentified genes:

Unidentified genes are genes that were not recognized as official human or mouse gene symbols.
Unidentified genes are not included in the analysis and do not impact its results.

To correct an unidentified gene:

Click the ‘edit’ icon and start typing in your postulated gene symbol. Use the auto-complete function to identify the correct symbol.
Click the ‘search in GeneCards’ icon, then copy the correct symbol into GeneAnalytics.
Try switching the selected input species from human to mouse or vice versa. Note that changing the input species after inserting genes will activate a new identification process.

Following the gene symbol correction, the identified genes will be automatically added to the “ready for analysis” gene list.

Ready for analysis:

Only the genes included in this list will be analyzed.

Each gene in the ‘ready for analysis’ list is shown with its official symbol, full name and all available aliases/synonyms.

In order to edit gene symbols in this list, delete the gene symbol and re-type your desired gene symbol in the input box above.

Results

Analyzed Genes

This section presents all the queried genes that were identified and included in the analysis.

Click on ‘notes’ to see the genes in your query that were found to be abundant or that are defined as housekeeping genes in human (read more on abundant and housekeeping genes).

These genes get lower scores in the tissues and cells analysis. You may consider removing them from your query to optimize the results.

Tissues and Cells Analysis

Detailed Results Table

The detailed results table presents all entities in which at least one of the analyzed genes is expressed, along with links to their cards in LifeMap Discovery.

The entities are presented in descending order of their matching score. If several entities have the same score, they are ordered by the ratio of matched to total number of genes in the entity (from highest to lowest). In single-gene queries, in vivo entities appear before in vitro entities with the same score.
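The ordering rules above can be expressed as a single sort key. This is a minimal sketch, not GeneAnalytics code: the `Entity` record and `order_entities` function are hypothetical, and for single-gene queries it assumes the in vivo/in vitro rule is applied before the matched-to-total ratio, which the guide does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    score: float
    matched_genes: int
    total_genes: int
    in_vivo: bool


def order_entities(entities, single_gene_query=False):
    """Sort by score (high to low), then by matched/total gene ratio (high to low).

    For single-gene queries, in vivo entities precede in vitro entities that
    have the same score (assumed to take precedence over the ratio).
    """
    def key(e):
        ratio = e.matched_genes / e.total_genes if e.total_genes else 0.0
        # Negated numbers give descending order; False (in vivo) sorts before True.
        if single_gene_query:
            return (-e.score, not e.in_vivo, -ratio)
        return (-e.score, -ratio)

    return sorted(entities, key=key)
```

Because Python's `sorted` is stable, entities that tie on every key keep their original relative order.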

The list can be sorted by any other parameter presented in the table, by simply clicking on the column title. Please note that sorting by the number of matched genes per entity can provide important information, but should be considered with caution due to the large variance in the total number of genes per entity.

Score

Each gene in each entity has a score, which is based on the entity type and the gene annotations. These annotations are based on information from the scientific literature and/or on bioinformatics calculations performed using expression data in LifeMap Discovery. Each gene can have one or more of the following annotations:

Specific gene: a gene that is expressed in only a few tissues/organs.
Enriched gene: a gene that is expressed in many entities of the same tissue/organ.
Selective gene: an established cell-specific marker or a gene suggested to be characteristic of the cell.
Expressed gene: a gene known to be expressed that is not defined as a selective cell marker.
Abundant gene: a gene that is expressed in a large number of different organs/tissues/developmental paths in LifeMap Discovery.
Housekeeping gene: a gene that appears in a list of housekeeping genes established by integrating information from several studies.
Low confidence gene: a gene whose expression evidence originates from the analysis of a large-scale dataset, with no strong supporting evidence (for example, it appears in a small number of cells in a specific organ).

To receive a list of all genes expressed in a specific tissue, organ or developmental path, including annotations for selective markers, specific genes and tissue-enriched genes, please contact us.

The entity score calculations are based on the gene score, and are different between single-gene and multiple-gene queries:

The entity score for single-gene queries is the score of the gene in this entity.
The entity score for queries that include more than one gene is a weighted sum of the scores of the matched genes in this entity, normalized to the log of the maximal score that can be achieved for the tissue/system.
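The two cases can be sketched in Python to make the normalization explicit. The weights are not documented in this guide, so the `weights` values are placeholders, the natural logarithm is an assumption, and both function names are hypothetical.

```python
import math


def single_gene_entity_score(gene_score):
    # Single-gene query: the entity score is the gene's score in that entity.
    return gene_score


def multi_gene_entity_score(gene_scores, weights, max_achievable_score):
    """Weighted sum of matched-gene scores, normalized by log(maximal score).

    gene_scores: dict gene -> score of that matched gene in this entity
    weights: dict gene -> weight (placeholders; the real weights are not documented)
    max_achievable_score: maximal score attainable for the tissue/system (must be > 1)
    """
    if max_achievable_score <= 1:
        raise ValueError("log normalization needs a maximal score greater than 1")
    weighted_sum = sum(weights.get(gene, 1.0) * score for gene, score in gene_scores.items())
    # Natural log is an assumption; the base is not specified in the guide.
    return weighted_sum / math.log(max_achievable_score)
```

Genes missing from `weights` default to a weight of 1.0 in this sketch.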

The entity scores are classified by their quality (high, medium or low), indicated by the color of the score bar (dark green, light green or beige, respectively). The distribution of the scores among the different quality levels is shown by clicking on the pie icon.

Entity Name

Clicking on the entity name will lead you to the entity card in LifeMap Discovery, which contains a full list of all genes known to be expressed in the entity, and additional information about its development, signaling pathways, related diseases and more.

Please note the following:

Large-scale data set samples: the sample name includes the name of the large-scale data set that contains this sample (in parentheses).
In vivo cells: the cell name includes the name of the anatomical compartment that contains the cells (in parentheses).
Protocol-derived cells: the cell name includes the name of the author and the publication year of the manuscript describing these cells (in parentheses).
Identical cells: two or more cells that have identical names and genes, but are located in different anatomical compartments within the same organ. In such cases, only one of the cell names is presented, and the others can be seen by clicking on the plus sign.

Matched Genes

The number of genes in the entity that match the query, and the total number of genes in the entity (shown in parentheses).

Sorting the list by this parameter can be informative but should be considered with caution since the total number of genes per entity varies significantly.

Clicking on the number of matched genes opens a list of the genes that includes gene symbols, full names, links to GeneCards® and NCBI, and information about their expression, localization and evidence:

Expression information:

Matched genes can be indicated as either 'expressed' or 'positive selective marker':

Positive selective marker: this indication appears only in cells and describes genes that are either established cell markers or that have been suggested, based on their expression, to be characteristic of the cell.

Expressed gene: a gene that is known to be expressed in the entity but is not defined as a cell marker.

Evidence:

Indicates the type of evidence supporting the expression of the gene (clicking on the entity name leads to a table with active links to all supporting sources of evidence):

Scientific literature.

High throughput experiments, such as microarray and RNA sequencing, available at GEO and/or in the scientific literature.

Large-scale data set.

Filters

Tissue/System Filter

This panel has two functions:

Summarizing the expression results at the tissue and system levels, providing a matching score for each tissue/system.
Enabling filtration of the detailed results by tissues and systems.

Tissue/system results:

The tissue/system score calculations are based on the gene score (read more about gene scores in the tissue and cell analysis score section), and differ between single-gene and multiple-gene queries:
- The tissue/system score for single-gene queries is the actual maximal score calculated for this gene in all matched entities within the tissue/system.
- The tissue/system score for queries that include more than one gene is a weighted sum of the scores of all the matched genes in this tissue/system, normalized to the log of the maximal score that can be achieved for the tissue/system.
The tissues/systems are presented in descending order of their matching score. If two tissues/systems have the same score, they are ordered by the percentage of matched genes out of the total number of genes in the tissue/system.
In this panel, ‘Tissues’ is used as a general term that includes the organs, tissues and cell classes defined in LifeMap Discovery.
Entities that are not categorized as a specific tissue or system (e.g., fibroblasts are not categorized as a part of any system) are presented under “uncategorized”, which appears at the end of the tissues/systems list, with no score.
Clicking on the name of a tissue/system opens an information box that details the total number of genes and entities in the tissue/system described in LifeMap Discovery (“total”), as well as the number of genes and entities that match the query (“matched”). This box also includes the list of all matched genes in this tissue/system and enables a new search using this gene subset as a query.
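The single-gene tissue/system score and the panel ordering can be sketched as follows. Both function names and the `(name, score, matched_genes, total_genes)` tuple layout are hypothetical; multi-gene tissue/system scores follow the weighted, log-normalized form described above and are not repeated here.

```python
def single_gene_tissue_score(entity_scores):
    """Single-gene query: the tissue/system score is the maximal score of the
    gene across all matched entities within that tissue/system."""
    return max(entity_scores)


def order_tissues(tissues):
    """Order (name, score, matched_genes, total_genes) tuples: score descending,
    ties broken by the percentage of matched genes (descending); entries without
    a score ("uncategorized") are placed last."""
    def key(tissue):
        _name, score, matched, total = tissue
        if score is None:
            return (1, 0.0, 0.0)
        percentage = 100.0 * matched / total if total else 0.0
        return (0, -score, -percentage)

    return sorted(tissues, key=key)
```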

Tissue/system filter:

Clicking on specific tissue(s)/system(s) will filter the detailed results list on the right and only entities related to the selected tissue/system will be shown.
Tissue/system filtration affects the number of hits shown in all the other filters.

In Vivo / In Vitro Filter

This filter can be used to show only in vivo or only in vitro results.

'Expressed in' Filter

This filter can be used to show only specific entity types including:

Cells (in vivo and in vitro)
Compartments (anatomical compartments)
Organs and tissues
High throughput comparisons (and large scale dataset samples)

Prenatal / Postnatal Filter

This filter can be used to show only prenatal or only postnatal results.

Entities that are defined as “prenatal-postnatal” appear when filtering for both prenatal and postnatal results.

Disease Analysis

Score

The disease matching score is based on the following parameters:

The number of genes in your query that match a specific disease, normalized by the total number of genes specifically associated with the disease.
The quality and type of the gene-disease relation. These relations are based on MalaCards sources and include the following relation types:

Differentially expressed (DE) genes are genes that were found to be significantly up- or downregulated in diseased tissues compared with the same tissues obtained from unaffected subjects. The disease card contains up to 200 up- and 200 downregulated genes, derived from high throughput experiments extracted from the Gene Expression Omnibus (GEO) or from the literature. The differentially expressed genes are calculated by an algorithm that first filters out genes whose expression is highly variable among the samples. Differentially expressed genes are then identified using the empirical Bayes method: a gene is defined as differentially expressed if it is up- or downregulated by more than 2-fold in the diseased tissue compared with the matched normal tissue, and its p-value is 0.05 or lower.
Genetic associations to diseases are determined from several MalaCards data sources. Since each data source has its own annotation terminology, we categorized all possible associations as shown in the Genetic Associations Table, in descending order of their score.
The “GeneCards-inferred” relation indicates that the disease name appears in the gene's page in GeneCards. Of note, the ‘GeneCards-inferred’ relation does not imply causality between the gene and the disease, and the nature of this relation may sometimes be unclear. For example, the gene can be indicated as ‘unaffected’ in the disease. Similarly, if a gene is related to a disease based on a ‘publication’, it means that the gene and the disease were mentioned in the title and/or abstract of the same publication. There are two score levels for genes with GeneCards-inferred relations: a gene that appears in three or more different sections in GeneCards has a higher score than a gene that appears in only one or two sections.
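The final differential-expression criterion (more than 2-fold change in either direction, p ≤ 0.05) can be written as a small predicate. This sketch covers only that threshold; the variance filter and the empirical Bayes statistics that produce the p-value are not reproduced, and the name `is_differentially_expressed` is hypothetical.

```python
def is_differentially_expressed(fold_change, p_value, min_fold=2.0, alpha=0.05):
    """Apply the DE threshold to one gene.

    fold_change: diseased / normal expression ratio on a linear scale (> 0);
                 values below 1 mean downregulation.
    """
    up = fold_change > min_fold          # upregulated by more than 2-fold
    down = fold_change < 1.0 / min_fold  # downregulated by more than 2-fold
    return (up or down) and p_value <= alpha
```

A gene with exactly a 2-fold change is not counted, because the guide requires a change of more than 2-fold.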

For each gene, the maximal score among all of the relation types listed above is used as the final gene score. The disease score is based on the final scores of all the matched genes.
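The "maximum over relations" rule can be sketched as below. The numeric scores and their relative order are placeholders invented for illustration (the guide only states that each relation type carries a score), and all names are hypothetical. How the final gene scores are combined into the disease score is not described here, so the sketch stops at the gene level.

```python
# Hypothetical relation scores: placeholders only, not GeneAnalytics values.
HYPOTHETICAL_RELATION_SCORES = {
    "causative_mutation": 10.0,
    "risk_factor": 8.0,
    "differentially_expressed": 6.0,
    "genecards_inferred_high": 3.0,  # disease appears in 3+ GeneCards sections
    "genecards_inferred_low": 1.0,   # disease appears in 1-2 GeneCards sections
}


def genecards_inferred_level(n_sections):
    """Pick the GeneCards-inferred score level from the number of sections."""
    return "genecards_inferred_high" if n_sections >= 3 else "genecards_inferred_low"


def final_gene_score(relations, relation_scores=HYPOTHETICAL_RELATION_SCORES):
    """Final gene score for one disease: the maximum over the gene's relations."""
    return max(relation_scores[r] for r in relations)
```

For example, a gene that is both differentially expressed and GeneCards-inferred from two sections receives the differential-expression score, since it is the larger of the two.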

Disease Categories Filter

This panel filters results according to the MalaCards disease categorization. The MalaCards algorithm categorizes each disease into 0-4 anatomical and 0-5 global disease categories based on:

Existing and widely used categorization systems such as ICD-10 and Orphanet.
Identification of category-specific keywords contained in disease names and annotations.

Use this filter to focus the results.

The numbers indicate the number of hits per disease category. These numbers are updated when the additional category filter is applied.

Note that not all the diseases are categorized.

Genetic Associations Table

Association category	Genetic association	Source
Causative mutation	Pathogenic	ClinVar
	Likely pathogenic	ClinVar
	Molecular basis known	OMIM
	Causative germline mutation	Orphanet
	Causative somatic mutation	Orphanet
	Causative mutation	UniProt
Risk factor	Confers sensitivity	ClinVar
	Risk factor	ClinVar
	Genetic association	OMIM
	Susceptibility factor	OMIM
	Modifying germline mutation	Orphanet
	Role in phenotype	Orphanet
	Modifying somatic mutation	Orphanet
Resistance factor	Protective	ClinVar
	Resistance factor	OMIM
Genetic tests	Genetic tests	GeneTests
Drug response	Drug response	ClinVar
Structural gene variation	Structural variation	OMIM
	Gene fusion	Orphanet
Unconfirmed association	Unconfirmed association	OMIM
	Candidate gene tested	Orphanet
	Genetic linkage	OMIM

Pathway, GO terms, phenotypes and compound analysis

Detailed Results Table

This list presents pathways, GO terms, phenotypes or compounds that match your gene set, with links to their cards in the relevant database (PathCards or GeneCards).


The matches are presented in descending order of the matching scores. If several matches have the same score, they are ordered by the ratio of matched to total number of genes associated with the matched entity (from highest to lowest).

Clicking on the number of matched genes for each match opens a list of these genes, which can be used as a new query.

Score

The binomial distribution is used to test the null hypothesis that the user’s input genes are not over-represented within any SuperPath, GO term or compound. The presented score for each match is a transformation (-log₂) of the resulting p-value, where higher scores indicate better matches. Results with p-values lower than 10^-50 are assigned the maximum score.
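A minimal sketch of the binomial over-representation score, using only the Python standard library. The background probability (`set_size / genome_size`) and the choice of the upper tail P(X ≥ k) are assumptions about how the test is parameterized, and the helper names are hypothetical. The maximum score is taken to be −log₂(10⁻⁵⁰), which is what capping the p-value at 10⁻⁵⁰ implies.

```python
import math

P_VALUE_FLOOR = 1e-50  # p-values below this threshold receive the maximum score


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment_score(matched, query_size, set_size, genome_size):
    """-log2 of the binomial over-representation p-value (hypothetical helper).

    matched: query genes found in the SuperPath / GO term / compound
    query_size: number of analyzed query genes
    set_size, genome_size: genes in the set and in the assumed background
    """
    p = set_size / genome_size  # assumed probability that a random gene is in the set
    p_value = binomial_upper_tail(matched, query_size, p)
    # Clamp to [1e-50, 1]: keeps the score finite and non-negative.
    p_value = min(max(p_value, P_VALUE_FLOOR), 1.0)
    return -math.log2(p_value)
```

Higher matched counts give smaller p-values and therefore higher scores, consistent with "higher scores indicate better matches".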

The score range is divided into three quality levels, based on the p-value corrected for multiple comparisons (using the false discovery rate method):

High: corrected p-value smaller than or equal to 0.0001

Medium: corrected p-value higher than 0.0001 but smaller than or equal to 0.05

Low: corrected p-value higher than 0.05
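The correction and the three quality levels can be sketched as follows. The guide names only "the false discovery rate method"; this sketch assumes the Benjamini-Hochberg procedure, the most common FDR correction, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def quality_level(corrected_p):
    """Map a corrected p-value to the quality levels listed above."""
    if corrected_p <= 0.0001:
        return "high"
    if corrected_p <= 0.05:
        return "medium"
    return "low"
```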

The scores are classified by their quality (high, medium or low), indicated by the color of the score bar (dark green, light green or beige, respectively). The distribution of the quality of the results can be viewed by clicking on the pie icon.


The table presents all results whose score was derived from corrected p-values smaller than or equal to 1. If fewer than 20 results pass this threshold, the best 20 results are presented, even if they are of lower statistical significance.

Compounds unification and data sources

The GeneAnalytics Compounds section draws on multiple data sources related to more than 83,000 compounds, including those found in GeneCards.

The Novoseek data source extracts knowledge from biological databases and text repositories, providing relationships between chemical compounds and genes based on a scoring algorithm applied to PubMed articles. Note that the Novoseek website is no longer available. Read more about Novoseek data in GeneCards and about its literature text-mining algorithm.

We have applied a unification process that identifies similar compounds described in different data sources, to enable gene aggregation for unified compounds and to avoid redundancy in the resulting compounds list. Compounds are unified based on an identical name and/or a combination of other identifiers such as CAS number, PubChem ID and synonyms. Unified compounds are shown with links to all relevant data sources (the exact compound name is shown next to the original data source name).
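One way to picture the unification is as a union-find (disjoint-set) grouping of records that share an identifier. This is a hypothetical sketch: the record fields (`name`, `synonyms`, `cas`, `pubchem`) and the rule that any single shared identifier is enough to merge two records are assumptions, since the guide does not say exactly which identifier combinations are required.

```python
def unify_compounds(records):
    """Group compound records that share a name/synonym, CAS number or PubChem ID.

    records: list of dicts with optional keys "name", "synonyms", "cas", "pubchem".
    Returns a list of groups, each a list of record indices.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # (identifier type, value) -> first record index carrying it
    for idx, rec in enumerate(records):
        names = [rec.get("name", "")] + list(rec.get("synonyms", []))
        keys = [("name", n.strip().lower()) for n in names if n]
        for field in ("cas", "pubchem"):
            if rec.get(field):
                keys.append((field, str(rec[field])))
        for key in keys:
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    groups = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), []).append(idx)
    return list(groups.values())
```

Union-find makes the grouping transitive: if record A shares a CAS number with B, and B shares a synonym with C, all three end up in one unified compound.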

Metabolites unification:

The following compound families contain thousands of metabolites which were unified based on their primary name and associated genes.

If genes associated with these compounds match your gene set, GeneAnalytics presents only the matched group, to avoid a multitude of identical results. The evidence link enables viewing all the relevant metabolites in the original database. The unified compounds and their specific groups are as follows:

1. Triglycerides

Group name	# of associated genes	# of metabolites in the group
Triglycerides group A	26	170
Triglycerides group B	30	113
Triglycerides group C	39	6
Triglycerides group D	34	13631

2. Diglycerides

Group name	# of associated genes	# of metabolites in the group
Diglycerides group A	130	803
Diglycerides group B	131	39
Diglycerides group C	131	1
Diglycerides group D	115	435

3. Phosphatidylcholines

Group name	# of associated genes	# of metabolites in the group
Phosphatidylcholines group A	78	955
Phosphatidylcholines group B	72	119
Phosphatidylcholines group C	44	73

4. Phosphatidylethanolamines

Group name	# of associated genes	# of metabolites in the group
Phosphatidylethanolamines group A	43	959
Phosphatidylethanolamines group B	30	114
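Reporting one result per metabolite group instead of one per metabolite can be sketched as a simple collapse step. The function name and the metabolite-to-group mapping in the usage note are hypothetical; they are not taken from the GeneAnalytics tables above.

```python
def collapse_metabolite_hits(hits, metabolite_to_group):
    """Replace individual metabolite hits by their unified group.

    hits: iterable of (compound_name, matched_genes) pairs
    metabolite_to_group: dict mapping metabolite name -> group name
    Compounds that belong to no group are kept under their own name.
    """
    collapsed = {}
    for name, genes in hits:
        key = metabolite_to_group.get(name, name)
        collapsed.setdefault(key, set()).update(genes)
    return collapsed
```

For instance, if two triglyceride species were both mapped (hypothetically) to "Triglycerides group A", their hits would be reported once under that group name.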
