HiPiler

Visual exploration
of large genome interaction matrices
with interactive small multiples.

HiPiler is an interactive web application for exploring and visualizing many features in large genome interaction matrices. Genome interaction matrices approximate the physical distance of pairs of genomic regions to each other and can contain up to 3 million rows and columns. Traditional matrix aggregation or pan-and-zoom interfaces largely fail in supporting search, inspection, and comparison of local features. HiPiler represents features as thumbnail-like snippets. Snippets can be laid out automatically based on their data and meta attributes. They are linked back to the matrix, visualized with HiGlass, and can be explored interactively.

News

Fritz presented HiPiler at the Hi-C Data Analysis Bootcamp from Harvard, MIT, and UMassMed. [Slides] May 8th, 2018
Nils presented HiPiler among other tools at Visualizing Biological Data (VIZBI) 2018. [Slides] Mar 28th, 2018
Nils presented HiPiler at Dana-Farber's Center for Functional Cancer Epigenetics. Mar 2nd, 2018
Version 1.2 is out, supporting five color maps with custom color scaling and featuring HiGlass version 0.10.11 among other things. Jan 21st, 2018
Nils talked about HiPiler at Harvard's 2017 PQG Conference. [Slides] Nov 2nd, 2017
Fritz presented HiPiler at Microsoft's Computational Aspects of Biological Information 2017. [Poster] Oct 13th, 2017
Fritz presented HiPiler at IEEE VIS in Phoenix at InfoVis. [Presentation | Slides] Oct 5th, 2017
Fritz was invited to pitch HiPiler at Harvard - Novartis Machine Learning and Computational Meeting. Sep 22nd, 2017
Nature published an entire blog post about HiPiler. Sep 11th, 2017
Fritz demonstrated HiPiler at 4D Nucleome Annual Meeting 2017. [Slides | Poster] Sep 7th, 2017
HiPiler got accepted at IEEE VIS (InfoVis). Jul 11th, 2017

Screencast

Introduction

The human genome is about 2 meters long and tightly folded into each cell nucleus. This results in a dense, fractal-like, three-dimensional structure in which sequences that are far apart along the genome can be in close spatial proximity. It has been shown [1] that this 3D structure is an important factor for regulation of gene expression, replication, DNA repair, and other biological functions. Biologists are interested in uncovering the mechanisms that drive global and local folding to better understand the vast and complex gene regulation network.

The probability of two sequences being in close proximity to each other, i.e., interacting, can be inferred using modern genome sequencing techniques, which yield for every genome a huge symmetric genome interaction matrix with up to 3 million rows and 3 million columns (Fig. 1). Each of the 9 trillion matrix cells represents the proximity of two genomic regions. Repetitive and hierarchically nested visual patterns (Fig. 2), ranging from millions down to a few thousand base pairs in size, can be identified across the matrix; these represent so-called regions of interest (ROIs).

Fig. 1: Hi-C methodology: as the DNA (1) is organized non-arbitrarily in the cell nucleus (2), certain parts (highlighted in orange and blue) are frequently in close contact (3). These contacts are quantified over a set of several hundred million cells (4), leading to an interaction matrix of up to 3 million by 3 million cells (5). Dark colors indicate more frequent contact occurrences of two loci.

Fig. 2: Hi-C patterns: examples of frequent patterns by increasing size. Loops (1) appear as dark central dots. Hierarchically organized domains (2) are darker rectangles. Flames (3) are horizontal or vertical lines. Active and inactive compartments create a global checkerboard pattern (4).
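To put the matrix dimensions quoted above in perspective, the following back-of-the-envelope calculation counts the cells of a full-resolution matrix and estimates its size if every cell were stored. The 4 bytes per cell is an assumption made for illustration only; it is not a statement about how Hi-C data or HiPiler actually store matrices.

```python
# Back-of-the-envelope size of a full-resolution genome interaction matrix.
ROWS = 3_000_000       # upper bound on rows and columns quoted in the text
BYTES_PER_CELL = 4     # assumption: one 32-bit value per cell

# All cells of the square matrix.
total_cells = ROWS * ROWS

# The matrix is symmetric, so only the diagonal and one triangle are unique.
unique_cells = ROWS * (ROWS + 1) // 2

# Size of a dense, uncompressed copy in terabytes (10^12 bytes).
dense_terabytes = total_cells * BYTES_PER_CELL / 1e12

print(f"{total_cells:,} cells in total")
print(f"{unique_cells:,} unique cells")
print(f"{dense_terabytes:.0f} TB if stored densely")
```

Even after exploiting symmetry, a matrix of this size is far too large to inspect cell by cell, which is why small local features are hard to find by panning and zooming alone.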

Algorithms for automatic pattern detection are being developed. However, these algorithms can be very complex and often identify tens of thousands of pattern instances. Results of algorithms designed to identify the same type of pattern often differ substantially [2], and the lack of a ground-truth pattern collection hinders thorough evaluation of these algorithms.

Interactive visualization tools have been developed [3] but are focused on supporting visualization of a single or a small number of views of the matrix and navigation through pan and zoom [4, 5]. Detailed exploration and comparison of thousands of small ROIs is unsupported by current tools yet needed, due to the size and multiscale nature of the folded genome.

HiPiler is an interactive visualization tool designed for exploration and analysis of thousands of ROIs extracted from one or more genome interaction matrices.

To overcome the contextual constraints of exploring local patterns in large matrices, HiPiler follows a divide-and-explore approach that extracts ROIs from the matrix and enables independent exploration (Fig. 3). HiPiler assumes a given set of ROIs, derived from specialized pattern recognition algorithms. HiPiler then visualizes these ROIs as small heatmaps (matrices), which we call snippets. A snippet is associated with a set of ordinal and categorical attributes, such as its noisiness, size, or source dataset. These attributes are derived from the matrix itself or point to prior knowledge. Based on these attributes, HiPiler enables automatic and manual ordering, positioning, grouping, filtering, and visual manipulation to identify patterns present across the set of snippets. Additionally, the context of snippets in the matrix is maintained by highlighting snippet locations in the interaction matrix.

Fig. 3: The snippets approach: decompose a large matrix (1) into small snippets (2) and explore these snippets (3) using different layouts, arrangements, and styles, while maintaining global context.
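The snippets approach described above can be sketched in a few lines of Python. This is a minimal illustration, not HiPiler's implementation: the `Snippet` class, the helper functions, the toy matrix, and the use of the standard deviation as a "noisiness" measure are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Dict, List, Sequence


@dataclass
class Snippet:
    """A small square cut-out of an interaction matrix plus its attributes."""
    row: int                  # top-left row of the ROI in the source matrix
    col: int                  # top-left column of the ROI in the source matrix
    size: int                 # side length of the ROI in matrix bins
    dataset: str              # categorical attribute: source of the ROI
    cells: List[List[float]]  # the extracted values (a small heatmap)

    @property
    def noisiness(self) -> float:
        # Hypothetical noise measure: population standard deviation of all cells.
        return pstdev([value for line in self.cells for value in line])


def extract_snippet(matrix: Sequence[Sequence[float]], row: int, col: int,
                    size: int, dataset: str) -> Snippet:
    """Copy a size-by-size window out of the matrix."""
    cells = [list(matrix[r][col:col + size]) for r in range(row, row + size)]
    return Snippet(row, col, size, dataset, cells)


def order_by(snippets: List[Snippet], attribute: str) -> List[Snippet]:
    """Return the snippets sorted by one of their attributes."""
    return sorted(snippets, key=lambda s: getattr(s, attribute))


def group_by(snippets: List[Snippet], attribute: str) -> Dict[object, List[Snippet]]:
    """Collect snippets that share the same attribute value."""
    groups: Dict[object, List[Snippet]] = {}
    for snippet in snippets:
        groups.setdefault(getattr(snippet, attribute), []).append(snippet)
    return groups


# Toy symmetric 6x6 "interaction matrix" and three ROIs given as
# (row, col, size, dataset). In practice the ROIs would come from a
# pattern detection algorithm.
matrix = [[float(i * j) for j in range(6)] for i in range(6)]
rois = [(4, 4, 2, "B"), (0, 0, 2, "A"), (2, 2, 2, "A")]

snippets = [extract_snippet(matrix, *roi) for roi in rois]
ordered = order_by(snippets, "noisiness")
groups = group_by(snippets, "dataset")

print([(s.row, s.col) for s in ordered])
print({name: len(members) for name, members in groups.items()})
```

Because each snippet keeps its `row` and `col`, its location in the source matrix can always be recovered, which is what makes it possible to link snippets back to the matrix view.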

The design of HiPiler (Fig. 4) is informed by semi-structured interviews with ten domain experts from various genomics research labs as well as iterative design sessions over the course of several months. These interviews led to the formulation of six tasks for the exploration of many ROIs in large matrices.

Fig. 4: Design of visualization and interaction concepts in HiPiler.

HiPiler is designed to support four types of scenarios: (i) visual evaluation of the results of pattern detection algorithms; (ii) characterization, aggregation, and outlier detection in large pattern collections (Fig. 5.1); (iii) comparison of ROIs across multiple matrices (Fig. 5.2), e.g., to compare different datasets, experimental conditions, or extraction algorithms; and (iv) correlation of matrix patterns with other genomic attributes (Fig. 5.3), e.g., genes or protein-binding sites.

We evaluated the usability and appropriateness of HiPiler through a user study with five domain experts. The results show that HiPiler is easy to learn and use, and that it offers important benefits for analyzing genome interaction matrices.

Fig. 5: Illustrations of how HiPiler supports (1) filtering and grouping, (2) comparison, and (3) correlation.
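Scenario (ii), aggregation and outlier detection, can be illustrated with a small sketch. It assumes that a group of equally sized snippets is summarized by its cell-wise mean and that a snippet counts as an outlier when its Euclidean distance to that mean exceeds a chosen threshold. The function names, the distance measure, and the threshold are hypothetical and are not taken from HiPiler itself.

```python
from typing import List

Cells = List[List[float]]  # one snippet as a small square matrix


def mean_snippet(snippets: List[Cells]) -> Cells:
    """Aggregate equally sized snippets into their cell-wise mean."""
    count = len(snippets)
    size = len(snippets[0])
    return [
        [sum(s[r][c] for s in snippets) / count for c in range(size)]
        for r in range(size)
    ]


def distance(a: Cells, b: Cells) -> float:
    """Euclidean distance between two snippets of the same size."""
    squared = sum(
        (x - y) ** 2
        for line_a, line_b in zip(a, b)
        for x, y in zip(line_a, line_b)
    )
    return squared ** 0.5


def find_outliers(snippets: List[Cells], threshold: float) -> List[int]:
    """Indices of snippets that lie further than `threshold` from the mean."""
    average = mean_snippet(snippets)
    return [i for i, s in enumerate(snippets) if distance(s, average) > threshold]


# Three toy 2x2 snippets; the last one has an unusually strong top-left cell.
group = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[9.0, 0.0], [0.0, 1.0]],
]

aggregate = mean_snippet(group)
outliers = find_outliers(group, threshold=4.0)

print(aggregate)
print(outliers)
```

A single deviating snippet pulls the mean toward itself, so a more robust summary such as the cell-wise median may be preferable when outliers are expected.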

Publication

HiPiler: Visual Exploration Of Large Genome Interaction Matrices With Interactive Small Multiples
1. Fritz Lekschas
2. Benjamin Bach
3. Peter Kerpedjiev
4. Nils Gehlenborg
5. Hanspeter Pfister
IEEE Transactions on Visualization and Computer Graphics (InfoVis), 24, 1, 522-531, 2018.

Presentations

For VIS researchers: IEEE VIS InfoVis, Phoenix, 2017

For biomedical researchers: 4D Nucleome Annual Meeting, Bethesda, 2017

Source Code

All of HiPiler's code is open source and publicly accessible.

Authors

Fritz Lekschas
Harvard John A. Paulson School of Engineering and Applied Sciences
Benjamin Bach
University of Edinburgh
Peter Kerpedjiev
Harvard Medical School
Nils Gehlenborg
Harvard Medical School
Hanspeter Pfister
Harvard John A. Paulson School of Engineering and Applied Sciences

References

[1] Fraser and Bickmore. (2007) Nuclear organization of the genome and the potential for gene regulation. Nature, 447, 7143, 413–417.
[2] Forcato et al. (2017) Comparison of computational methods for Hi-C data analysis. Nature Methods, 14, 7, 679.
[3] Yardımcı and Noble. (2017) Software tools for visualizing Hi-C data. Genome Biology, 18, 1, 26.
[4] Durand et al. (2016) Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Systems, 3, 1, 99–101.
[5] Kerpedjiev et al. (2017) HiGlass: Web-based visual comparison and exploration of genome interaction maps. bioRxiv.
