Pathway Analysis

Ariadne continually develops new workflows, tools, and algorithms for mining and analyzing molecular interaction and pathway data, and conducts research on pathway discovery in large biomolecular networks.

Network Navigation and Data Mining Tools for Pathway Building
Tools for Mining Pathway Collections and Ontologies
Tools for Building Correlation Networks from Microarray Data
Algorithms for Analysis of Differential Expression Data
Pathway Reconstruction Algorithms

Network Navigation and Data Mining Tools for Pathway Building

The most advanced package of graph navigation queries is implemented in the Build Pathway tool of Pathway Studio. Using a wave-front propagation algorithm, the Build Pathway tool allows:

  • Searching for the interacting partners of a biological entity or a group of entities (proteins, complexes, small molecules, etc.) in the network;
  • Searching for common targets or common regulators of a group of entities;
  • Finding the direct links and the shortest paths between entities.

All these searches can be tuned with a wide range of filters, such as Entity Type and Attributes, Relation Type and Attributes, and Direction of the Relations, to address different bioinformatics questions.
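The three query types listed above can be illustrated with a minimal breadth-first (wave-front) sketch. This is not the Build Pathway implementation: the toy edge list, the function names, and the choice to treat every relation as a directed regulator-to-target edge are assumptions made only for illustration.

```python
from collections import deque

# Toy directed network (regulator -> target), for illustration only.
EDGES = [
    ("EGF", "EGFR"), ("EGFR", "GRB2"), ("GRB2", "SOS1"),
    ("SOS1", "KRAS"), ("KRAS", "RAF1"), ("EGFR", "STAT3"),
    ("IL6", "STAT3"),
]


def build_adjacency(edges):
    """Map each entity to the set of its direct downstream neighbors."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set())
    return adj


def wavefront(adj, seeds, max_steps):
    """Expand from the seeds layer by layer; return {entity: distance}."""
    dist = {s: 0 for s in seeds}
    front = deque(dist)
    while front:
        node = front.popleft()
        if dist[node] == max_steps:
            continue  # do not expand past the requested number of steps
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                front.append(nxt)
    return dist


def shortest_path(adj, source, target):
    """Return one shortest directed path as a list, or None if unreachable."""
    parent = {source: None}
    front = deque([source])
    while front:
        node = front.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                front.append(nxt)
    return None


def common_targets(adj, entities):
    """Entities that are direct targets of every entity in the group."""
    sets = [adj.get(e, set()) for e in entities]
    return set.intersection(*sets) if sets else set()


adj = build_adjacency(EDGES)
partners = wavefront(adj, ["EGFR"], max_steps=1)
path = shortest_path(adj, "EGF", "RAF1")
shared = common_targets(adj, ["EGFR", "IL6"])
```

Filters such as entity type or relation direction could be applied in this sketch by pruning the edge list before the adjacency map is built.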

Publications:
Pathway Studio - the analysis and navigation of molecular networks. Nikitin A, Egorov S, Daraselia N, and Mazo I, Bioinformatics 19:2155-2157, 2003

Tools for Mining Pathway Collections and Ontologies

The "Find Groups" and "Find Pathways" tools calculate the significance of the overlap between the user’s pathway or gene list and every group or pathway existing in the database using Fisher's Exact test.

The "Find Groups" tool is instrumental, for instance, in mining the Gene Ontology groups. It identifies the functional categories, biological processes, or any other groups of entities that are overrepresented among the entities in the user’s pathway or, more generally, in any set of “interesting genes” (e.g. the set of differentially expressed genes from a microarray experiment). The "Find Pathways" tool finds the existing pathways in the database that overlap significantly with a set of “interesting genes.”
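The overlap significance can be sketched as a one-sided Fisher's exact test, which for a 2x2 table reduces to a hypergeometric tail probability. The function names, the choice of a one-sided test, and the example numbers are assumptions for illustration; the actual tools may define the gene universe or the test sidedness differently.

```python
from math import comb


def overlap_p_value(universe_size, group_size, list_size, overlap):
    """P(overlap >= observed) when list_size genes are drawn at random
    from a universe that contains group_size members of the group."""
    max_overlap = min(group_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(group_size, k) * comb(universe_size - group_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def rank_groups(user_genes, groups, universe):
    """Score every group against the user's gene list, smallest p-value first."""
    universe = set(universe)
    user = set(user_genes) & universe
    results = []
    for name, members in groups.items():
        members = set(members) & universe
        k = len(user & members)
        p = overlap_p_value(len(universe), len(members), len(user), k)
        results.append((name, k, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical example: 20 genes in total, a group of 5, a user list of 5,
# and 4 genes shared between the two.
p = overlap_p_value(universe_size=20, group_size=5, list_size=5, overlap=4)
```

Because every group in the database is tested, a multiple-testing correction would normally be applied to the resulting p-values; that step is omitted here.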

Tools for Building Correlation Networks from Microarray Data

The algorithm for building a Correlation Network, or Relevance Network, is available as the Pearson Correlation tool in Pathway Studio. In a Relevance Network, genes with similar expression profiles are connected by links indicating the significance of the correlation. Groups of tightly correlated genes form clusters in the correlation network. The algorithm can be used for clustering genes according to their expression profiles across multiple samples, for example, time-course measurements.

The Relevance Network algorithm is based on the theory that the expression of interacting or otherwise closely related entities is correlated. The tool calculates correlation coefficients between all pairs of gene expression profiles measured in the experiment and outputs clusters of highly correlated genes. Identified gene clusters can be further validated and analyzed using relations from the ResNet database that have been extracted from the literature using MedScan technology. The function of a gene cluster can be inferred by comparison with Gene Ontology or a pathway collection.
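A minimal sketch of the idea: compute the Pearson coefficient for every pair of profiles, keep the pairs above a cutoff, and read clusters off as connected components. The cutoff on the absolute coefficient, the use of connected components as clusters, and all names and data below are assumptions; the Pathway Studio tool may link genes by a significance measure instead.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def relevance_network(profiles, threshold=0.9):
    """Link every gene pair whose |r| reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


def clusters(profiles, edges):
    """Connected components of the network, each as a sorted gene list."""
    adj = {g: set() for g in profiles}
    for g1, g2, _ in edges:
        adj[g1].add(g2)
        adj[g2].add(g1)
    seen, result = set(), []
    for gene in sorted(adj):
        if gene in seen:
            continue
        stack, comp = [gene], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        result.append(sorted(comp))
    return result


# Hypothetical four-point time course for four genes.
profiles = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [4, 3, 2, 1],
    "D": [1, 5, 1, 5],
}
edges = relevance_network(profiles)
groups = clusters(profiles, edges)
```

Gene C is anti-correlated with A and B and still joins their cluster, because the sketch thresholds the absolute value of the coefficient.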

Algorithms for Analysis of Differential Expression Data

Another set of algorithms performs the analysis using differential gene expression. In contrast to the unsupervised Relevance Network, these algorithms take into account existing pathways and knowledge of the entire biomolecular network.

  • Gene Set Enrichment Analysis (GSEA) is available in the "Find Differentially Expressed Groups" and "Find Differentially Expressed Networks" tools in Pathway Studio Enterprise. The algorithms calculate the statistical significance of the expression changes across every group or pathway in the database, thus allowing identification of the groups or pathways most strongly affected by the observed expression changes. Both tools use the non-parametric Mann-Whitney statistical test to calculate the p-value indicating the significance of the enrichment score. This avoids any assumptions about the shape of the sampling distribution. Calculating p-values analytically makes the algorithm many times faster than permutation-based approaches.
  • Network Enrichment Analysis (NEA) is available as the "Find Significant Regulators" (FSR) tool in Pathway Studio Enterprise. This algorithm first breaks the global network into a set of small networks, each consisting of a regulator and all of its targets. It then evaluates the statistical significance of the differential expression changes across the targets of each regulator individually. The FSR algorithm thus helps in hypothesis building by providing a list of putative upstream regulators driving the observed expression changes. FSR also takes network connectivity into account, eliminating the bias due to promiscuous targets.

The statistical significance in the GSEA and NEA algorithms is determined by comparing the observed distribution of expression values in each network with the expected “baseline” distribution. These algorithms can detect relatively weak but consistent expression changes across the pathway genes or the targets of a particular regulator.
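The comparison of a group's expression values against a baseline can be sketched with a Mann-Whitney U test whose p-value comes from the normal approximation rather than from permutations. The baseline used here (all measured genes outside the group), the omission of tie and continuity corrections, and all names are assumptions; the connectivity correction mentioned for FSR is not reproduced.

```python
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_p(group, background):
    """Return (U, two-sided p) using the normal approximation.
    Assumption: no tie or continuity correction is applied."""
    n1, n2 = len(group), len(background)
    ranks = average_ranks(list(group) + list(background))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, erfc(abs(z) / sqrt(2))


def score_groups(log_ratios, groups):
    """Test each group (a pathway, or a regulator's target set) against
    all other measured genes; most significant group first."""
    results = []
    for name, members in groups.items():
        inside = [v for g, v in log_ratios.items() if g in members]
        outside = [v for g, v in log_ratios.items() if g not in members]
        if not inside or not outside:
            continue
        u, p = mann_whitney_p(inside, outside)
        results.append((name, u, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical log-ratios: three group genes shifted above four background genes.
u, p = mann_whitney_p([5, 6, 7], [1, 2, 3, 4])
```

For NEA-style use, `groups` would map each regulator to the set of its targets, so that the ranked output reads as a list of candidate upstream regulators.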

Publications:
Identification of Significant Transcription Regulators through Integration of Microarray Data with Regulatory Networks. Sivachenko A, Yuryev A, Daraselia N, Mazo I, ISBM, Aspen, CO, 2005
Identifying Local Gene Expression Patterns in Biomolecular Networks. Sivachenko A, Yuryev A, Daraselia N, Mazo I, IEEE Computational Systems Bioinformatics Conference, Aug. 11, 2005

  • The Build Differentially Expressed Networks (BDEN) algorithm is available in Pathway Studio Enterprise. It searches the ResNet network of interactions for dense clusters of differentially expressed genes. A network cluster consists of genes or metabolites that are more densely interconnected with each other through ResNet relations than with the rest of the network. At the same time, the clusters contain significantly differentially expressed genes. The network clustering is performed by mapping genes into the same “spin” domains of a q-state Potts-like Hamiltonian.

Publications:
Finding mesoscopic communities in sparse networks. I Ispolatov, I Mazo and A Yuryev, J. Stat. Mech. P09014, 2006
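To make the "spin domain" idea concrete, the sketch below evaluates one common Potts-type energy for a given assignment of nodes to spin states: each same-spin pair contributes its link (1 or 0) minus gamma times the overall link density, with a minus sign in front, so that densely connected groups lower the energy. This generic form is an assumption chosen for illustration and is not necessarily the Hamiltonian used by BDEN or in the publication above; the differential-expression term and the energy minimization itself are omitted.

```python
from itertools import combinations


def potts_energy(nodes, edges, spins, gamma=1.0):
    """Energy of a spin assignment: -sum over same-spin pairs of
    (link present ? 1 : 0) - gamma * p, where p is the link density."""
    edge_set = {frozenset(e) for e in edges}
    n = len(nodes)
    p = len(edge_set) / (n * (n - 1) / 2)
    energy = 0.0
    for a, b in combinations(nodes, 2):
        if spins[a] == spins[b]:
            linked = 1.0 if frozenset((a, b)) in edge_set else 0.0
            energy -= linked - gamma * p
    return energy


# Hypothetical network: two triangles joined by a single link (c-d).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]

by_triangle = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_domain = {v: 0 for v in nodes}
scrambled = {"a": 0, "d": 0, "b": 1, "e": 1, "c": 2, "f": 2}

e_good = potts_energy(nodes, edges, by_triangle)
e_all = potts_energy(nodes, edges, one_domain)
e_bad = potts_energy(nodes, edges, scrambled)
```

In this toy case the assignment that follows the two triangles has the lowest energy of the three, which is the sense in which densely interconnected genes end up in the same spin domain.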

Pathway Reconstruction Algorithms

Ariadne has also developed algorithms for factoring a large biomolecular network into putative signaling and regulatory pathways. These algorithms have been applied to the whole ResNet database, and the resulting automatically generated pathways are shipped with ResNet in addition to the curated pathways.

  • The first algorithm predicts regulome pathways and is based on the notion that the regulatory interactions between proteins are mediated by physical interactions between them. It starts with a ligand-receptor pair and finds all proteins regulated by either the receptor or the ligand in the ResNet database. Next, it connects all downstream targets found by direct physical interactions, such as Binding or Protein Modification, and then removes unconnected entities.

    Publications:
    Automatic pathway building in biological association networks. Yuryev A, Mulyukov Z, Kotelnikova E, Maslov S, Egorov S, Nikitin A, Daraselia N and Mazo I, BMC Bioinformatics 2006, 7:171

  • The Signaling Line Pathways algorithm predicts the optimal signaling path from a receptor to a downstream transcription factor through the various signal transduction proteins, such as kinases and phosphatases. The algorithm first assigns weights to all relations in ResNet based on their reference count and the number of similar relations for paralogous proteins. It then uses Dijkstra's algorithm to find the optimal path in the weighted graph between the receptor and the downstream transcription factor using physical interactions in ResNet. The algorithm considers only transcription factors that are regulated by the receptor in the ResNet database. In the last step, the optimal paths found by the algorithm are manually curated.
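The path search in the Signaling Line Pathways algorithm can be sketched with a standard Dijkstra search over weighted physical interactions. The cost function (the inverse of the reference count), the treatment of physical interactions as undirected, and the entity names are assumptions for illustration; the source does not give the actual weighting formula, and the paralog-based term is left out.

```python
import heapq


def relation_cost(reference_count):
    """Assumption: a relation with more supporting references is cheaper."""
    return 1.0 / max(reference_count, 1)


def optimal_path(relations, source, target):
    """Dijkstra search; relations are (entity_a, entity_b, reference_count).
    Returns (path, total_cost), or (None, inf) if no path exists."""
    adj = {}
    for a, b, refs in relations:
        cost = relation_cost(refs)
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))

    best = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    done = set()
    while heap:
        dist, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == target:
            break
        for nxt, cost in adj.get(node, ()):
            new_dist = dist + cost
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                parent[nxt] = node
                heapq.heappush(heap, (new_dist, nxt))

    if target not in parent:
        return None, float("inf")
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1], best[target]


# Hypothetical physical interactions with reference counts.
relations = [
    ("Receptor", "KinaseA", 10),
    ("KinaseA", "TF", 5),
    ("Receptor", "TF", 1),
    ("Receptor", "Adaptor", 2),
    ("Adaptor", "TF", 2),
]
path, cost = optimal_path(relations, "Receptor", "TF")
```

With these toy counts the well-supported two-step route through KinaseA is preferred over the direct but weakly supported Receptor-TF link. In the described algorithm, only transcription factors regulated by the receptor would be passed in as targets, and the resulting paths would then go to manual curation.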