Software and Datasets

Please note that our software is freely available for non-commercial use only.

DATA MANAGEMENT AND ANALYTICS

  SOFTWARE DETAILS
5 DUALSIM Publication (SIGMOD 2016) Subgraph enumeration is important for many applications such as subgraph frequencies, network motif discovery, graphlet kernel computation, and studying the evolution of social networks. Recently, efforts to enumerate all subgraphs in a large-scale graph have seemed to enjoy some success by partitioning the data graph and exploiting distributed frameworks such as MapReduce and distributed graph engines. However, we notice that all existing distributed approaches have serious performance problems for subgraph enumeration due to the explosive number of partial results. DUALSIM is a disk-based, single-machine parallel subgraph enumeration solution that can handle massive graphs without maintaining exponential numbers of partial results. Specifically, it implements a novel concept, the dual approach for subgraph enumeration, which swaps the roles of the data graph and the query graph. DUALSIM outperforms the state-of-the-art methods by up to orders of magnitude, while those methods fail for many queries due to explosive intermediate results. Download
4 Structure-Preserving Query Service Publication (ICDE 2015, TKDE 2015) This software implements the first practical private approach for subgraph query services, asymmetric structure-preserving subgraph query processing, where the data graph is publicly known and the query structure/topology is kept secret. Such a query service is useful when the query computation is outsourced to a third-party service provider. Download
3 ASTERIX Publication (SIGIR 2017, SIGMOD 2013) Existing XML keyword search (XKS) engines primarily suffer from two limitations. First, although the smallest lowest common ancestor (SLCA) algorithm (or a variant, e.g., ELCA) is widely accepted as a meaningful way to identify subtrees containing the query keywords, SLCA typically performs poorly on documents with missing elements, i.e., (sub)elements that are optional, or appear in some instances of an element type but not all. Second, since keyword search can be ambiguous with multiple possible interpretations, it is desirable for an XKS engine to automatically expand the original query by providing a classification of different possible interpretations of the query w.r.t. the original results. However, existing XKS systems do not support such result-based query expansion. ASTERIX is an innovative XKS engine that addresses these limitations. Download
2 Generalized Subgraph Search Publication (CIKM 12) This software implements a new type of graph query that injectively maps its edges to paths of the graphs in a given database, where the length of each path is constrained by a threshold specified by the weight of the corresponding matching edge. Download
1 MustBlend Publication (DASFAA 2013, ICDE 09, ICDE 06) MUSTBLEND (MUlti-Source Twig BLENDer) is a novel visual XML querying paradigm where visual query formulation and processing are interleaved. A key practical feature of MUSTBLEND is its portability, as it does not employ any special-purpose storage, indexing, or query cost estimation schemes. Download
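
To make the Generalized Subgraph Search entry above more concrete, here is a minimal sketch of the matching condition it describes. It is not the released software or the published algorithm: the function names, the adjacency-dict graph representation, and the reading of a query edge weight as a maximum hop count are all assumptions made for illustration. The sketch only checks node-level injectivity and the existence of a short enough path for each query edge.

```python
from collections import deque


def within_hops(adj, src, dst, max_hops):
    """Return True if dst can be reached from src using at most max_hops edges."""
    if src == dst:
        return True
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue  # expanding further would exceed the hop budget
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


def is_valid_match(query_edges, mapping, adj):
    """Check a candidate mapping of query nodes to data-graph nodes.

    query_edges: iterable of (u, v, weight); weight is treated as the
                 maximum allowed path length (an assumption of this sketch).
    mapping:     dict from query node to data-graph node.
    adj:         dict from data-graph node to a list of neighbours.
    """
    # The mapping must not send two query nodes to the same data node.
    if len(set(mapping.values())) != len(mapping):
        return False
    return all(
        within_hops(adj, mapping[u], mapping[v], w) for u, v, w in query_edges
    )
```

For example, on the path graph a-b-c-d, a query edge with weight 2 can be matched to the node pair (a, c), while the same edge with weight 1 cannot.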

 

COMPUTATIONAL SYSTEMS BIOLOGY AND BIOINFORMATICS

  SOFTWARE DETAILS
7 TINTIN Publication (ACM BCB 2017) A network-based approach that ranks a given set of networks based on their "similarity" to a reference network. TINTIN exploits target feature-based network similarity in order to determine if two networks are similar. Specifically, it leverages topological and dynamic features of targets to compute similarity distances between signaling networks and rank them accordingly. TINTIN is useful for addressing problems such as target prioritization and drug target repositioning. Download
6 TAPESTRY Publication (ACM BCB 2016) Target prioritization ranks molecules in biological networks according to a score that seeks to identify molecules that fulfill particular roles (e.g., drug targets). Tapestry is a network-based approach that prioritizes candidate targets in a given signaling network with unknown targets by utilizing knowledge (target characteristics) gained from curated targets in another set of signaling networks. It exploits a knowledge base of characterization models and predictive topological features of a set of signaling networks (candidate networks) with curated targets. Given a signaling network G with unknown targets, Tapestry identifies the candidate network most similar to G and selects its characterization model as the prioritization model for computing a topological feature-based rank of each candidate node in G. Then, a dynamic feature-based rank is computed for these nodes by leveraging the time-series curves of the ODEs associated with the edges in G. Finally, these two ranks are integrated and used for prioritizing candidate targets. Download
5 TENET Publication (Bioinformatics 2015) A network-based approach that characterizes known targets in signaling networks using topological features. TENET first computes a set of topological features and then leverages a support vector machine-based approach to identify predictive topological features that characterize known targets. The resulting characterization model specifies which topological features are important for discriminating the targets and how these features should be combined to quantify the likelihood of a node being a target. Download
4 DUALALIGNER Publication (Bioinformatics 2014) DualAligner performs dual network alignment, in which both region-to-region alignment, where a whole subgraph of one network is aligned to a subgraph of another, and protein-to-protein alignment, where individual proteins in the networks are aligned to one another, are performed to achieve higher-accuracy network alignments. Dual network alignment is achieved in DualAligner via background information provided by a combination of Gene Ontology annotation information and protein interaction network data. Download
3 DiffNet Publication (Methods 2014) The study of genetic interaction networks that respond to changing conditions is an emerging research problem. Bandyopadhyay et al. (2010) proposed a technique to construct a differential network (dE-MAP network) from two static gene interaction networks in order to map the interaction differences between them under a change of environment or condition (e.g., a DNA-damaging agent). This differential network is then manually analyzed to conclude that DNA repair is differentially affected by the condition change. Unfortunately, manually constructing a differential functional summary from a dE-MAP network that summarizes all pertinent functional responses is time-consuming, laborious and error-prone, impeding large-scale analysis. DiffNet is a novel data-driven algorithm that leverages Gene Ontology (GO) annotations to automatically summarize a dE-MAP network to obtain a high-level map of functional responses due to condition change. Download
2 FACETS Publication (Bioinformatics 2012) FACETS is a novel PPI network decomposition algorithm that makes sense of the deluge of interaction data using Gene Ontology (GO) annotations. It finds not just a single functional decomposition of the PPI network, but a multi-faceted atlas of functional decompositions that portray alternative perspectives of the functional landscape of the underlying PPI network. Each facet in the atlas represents a distinct interpretation of how the network can be functionally decomposed and organized. Our algorithm maximizes the interpretative value of the atlas by optimizing inter-facet orthogonality and intra-facet cluster modularity. Download
1 BIDEL Publication (DASFAA 2007) Warehousing heterogeneous, dynamic biological data is a key technique for biological data integration as it greatly improves performance. However, it requires complex maintenance procedures to update the warehouse in light of changes to the sources. Consequently, a key issue to address is how to detect changes to the underlying biological data sources. BIDEL is a software tool for detecting exact changes to biological annotations. In our approach, we transform heterogeneous biological data to XML format and then detect changes between two versions of the XML representation of the biological data. Download
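
As an illustration of the general idea behind the BIDEL entry above (comparing two XML versions of the same data), the following is a minimal sketch only. It is not BIDEL's algorithm: the helper names, the positional-path scheme, and the three change categories are assumptions chosen for illustration, and the sketch compares element text only, ignoring attributes and element reordering.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Map each element's positional path (e.g. /entry/go[2]) to its stripped text."""
    root = ET.fromstring(xml_text)
    out = {}

    def walk(elem, path):
        out[path] = (elem.text or "").strip()
        counts = {}  # per-tag sibling counter, used to build positional paths
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    walk(root, f"/{root.tag}")
    return out


def diff_versions(old_xml, new_xml):
    """Report paths that were inserted, deleted, or whose text was updated."""
    old, new = flatten(old_xml), flatten(new_xml)
    return {
        "inserted": sorted(p for p in new if p not in old),
        "deleted": sorted(p for p in old if p not in new),
        "updated": sorted(p for p in old if p in new and old[p] != new[p]),
    }
```

Given an old version containing one `name` and one `go` element and a new version where the name text changes and a second `go` element is added, `diff_versions` reports one updated path and one inserted path.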

 

 
