The Galaxy Track Browser: Transforming the Genome Browser from Visualization Tool to Analysis Tool
Jeremy Goecks *, Kanwei Li Ω, Dave Clements ℵ, The Galaxy Team, James Taylor ℇ
Emory University

ABSTRACT
The proliferation of next-generation sequencing (NGS) technologies and analysis tools presents new challenges to genome browsers. These challenges include supporting very large datasets, integrating analysis tools with data visualization to help reason about and improve analyses, and sharing or publishing fully interactive visualizations. The Galaxy Track Browser (GTB) is a Web-based genome browser integrated into the Galaxy platform that addresses these challenges. GTB is the first Web-based genome browser to provide a full multi-resolution data model; this model supports efficient data retrieval from very large datasets. GTB leverages the Galaxy platform to combine data visualization and data analysis; users can specify parameter values and run tools to produce new data, all within GTB. GTB also provides interactive filters that dynamically show and hide data and can be used to identify data for further investigation. GTB is available on every Galaxy server, and visualizations can be created for both standard and custom genome builds. Fully interactive GTB visualizations can be shared with colleagues and published on the Web using a simple graphical user interface.

KEYWORDS: Galaxy Track Browser, genomics, genome browser, visual analytics.
INDEX TERMS: H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous, J.3 Life and Medical Sciences: biology and genetics.

1 INTRODUCTION
Genomics (the study of DNA and related molecules, their functions, and their impact on health) is a growing biomedical science that is highly reliant on computational tools and methods.
Visualizations play an important role in genomics by helping scientists understand the numerical and textual data that many genomics tools produce. Using visualizations, scientists can find problems and debug an analysis, identify results for subsequent investigation, and communicate findings, both to colleagues and in publications. Genome browsers are powerful visualization tools that enable scientists to map numerical and textual data onto a visual representation of the genome [1, 2]. Genome browsers have a long history and remain an active area of research and development. Early genome browsers such as the UCSC genome browser [3], Ensembl [4], and GBrowse [5] are regularly updated and extended. Recently, new genome browsers, such as IGB [6], IGV [7], JBrowse [8], and Savant [9], have been developed and demonstrate new models and techniques for genome browsing.

* jeremy.goecks@emory.edu
Ω kanwei@gmail.com
ℵ clements@galaxyproject.org
outreach@galaxyproject.org
ℇ james.taylor@emory.edu
IEEE Symposium on Biological Data Visualization, October 23-24, Providence, Rhode Island, USA

Recent activity in genome browser development is motivated in large part by the adoption of next-generation sequencing (NGS) technologies [10] and analysis tools [11] in genomic experiments. NGS experiments produce very large datasets, use complex analyses, and require significant collaboration to complete. These demands have driven recent genome browser research.

Despite much progress, challenges remain that limit the usefulness of genome browsers for NGS data. Current desktop browsers use full multi-resolution data models for viewing and customizing data, but Web-based browsers lack this needed functionality. NGS analyses are often long and complex, but it is difficult to use genome browsers to reason about and improve an analysis because analysis tools and genome browsers are not integrated.
Finally, sharing and publishing fully interactive visualizations is difficult in current browsers, limiting the usefulness of visualizations for collaboration.

The Galaxy Track Browser (GTB) is a Web-based genome browser integrated into the Galaxy platform [12-14] that addresses these challenges. GTB utilizes a full multi-resolution, client-server data model to support gigabyte-sized datasets. Each dataset is a track in the browser, and all tracks are fully interactive and customizable. GTB leverages the Galaxy platform to combine data visualization and data analysis. Using GTB, users can set parameter values and run tools to produce and visualize new data. Dynamic filters can be used to interactively explore data and find data that meets particular criteria. GTB can be used to visualize genomic data for both standard and custom genome builds. Fully interactive GTB visualizations can be shared with particular individuals or published on the Web through a simple graphical user interface (GUI). While GTB leverages many of Galaxy's features, it is modular and can be configured to work outside of Galaxy and with different data providers.

This paper's contributions include: (1) a discussion of three challenges that NGS tools and data pose to current genome browsers; (2) a description of the Galaxy Track Browser and how it addresses these challenges; (3) a comparison between the Galaxy Track Browser and other recent genome browsers.

2 THREE CURRENT CHALLENGES FOR GENOME BROWSERS
Genome browsers are an important tool for helping scientists visualize and understand their data [1, 2]. The proliferation of next-generation sequencing (NGS) technologies [10] and analysis tools [11] presents new challenges to genome browsers. In this section, we discuss three challenges that genome browsers must address to support scientists doing NGS experiments and analyses, and how current genome browsers address these challenges.
2.1 Challenge: Supporting Very Large Datasets in a Web-based Browser
NGS technologies and analysis pipelines often produce data that is tens or hundreds of gigabytes in size or larger. For instance, consider a gene expression experiment using RNA-seq [15]. Currently, an RNA-seq dataset obtained from a single lane of an Illumina HiSeq 2000 system is approximately ten to twenty gigabytes in size. Following a simple pipeline to map reads, assemble transcripts, and perform differential expression testing amongst different RNA-seq samples will often generate many datasets that are 500 to 1000 megabytes in size. A genome browser for NGS experiments, then, must efficiently scale to render large datasets and enable users to customize data display to meet their needs.

Scaling to support visualization of gigabyte-sized datasets requires a multi-resolution data model that provides access to data at multiple levels of resolution. To display large genomic regions with many data elements, aggregation is done ahead of time and the aggregated data is used for rendering. To display individual features in a small region, indices such as bigwig [16] and tabix [17] are used to quickly find the region's data in a large file. A multi-resolution data model provides efficient access to a dataset but does not influence how a dataset is rendered.

Genome browsers that support complete multi-resolution data models include the desktop-based browsers IGV [7] and Savant [9]. Desktop-based genome browsers are convenient when there is no access to the Web or when datasets are stored locally and uploading them is undesirable. Web-based browsers complement desktop-based browsers by providing access to data and visualizations over the Web via a standard Web browser, and they do not require users to download or install any software. No Web-based browser, however, implements a complete multi-resolution model.
Anno-J [18] supports a multi-resolution model but does not implement one, and JBrowse [8] uses a partial multi-resolution data model. Hence, there is a need for a Web-based browser that supports a full multi-resolution data model.

2.2 Challenge: Integrating Data Visualization and Data Analysis Tools
NGS analysis tools and pipelines are often complex and highly parameter-dependent, and it can be difficult to determine how changes in parameter values will impact the output of a tool or pipeline. Using genome browsers to debug or tune parameter values can be useful because it is often straightforward to visually confirm that a tool's output is acceptable or needs improvement. However, moving between running analyses and visualizing output can be a tedious process because genome browsers act as endpoints for an analysis pipeline: a series of tools are applied to produce outputs, which are then visualized in a browser.

Genome browsers would be more useful if they were integrated with analysis tools. For example, a browser might enable a user to change a tool's parameter value and then, through the browser, rerun the tool and observe how the change impacts the tool's output. If a user could repeatedly change tool parameter values and visualize the corresponding output, she could tune a tool's parameter values to produce the desired output. Similarly, others have argued that more on-the-fly computation is needed in browsers [2]. Integrating analyses with visualization is a foundation of visual analytics; visual analytics is the science of using interactive visualizations to support analytic reasoning [19].

There are examples of limited visual analytics functionality in current browsers. In the UCSC genome browser [3], a user can run a BLAT [20] query and then visualize the results as a track in the current browser. Savant provides a plug-in framework that developers can use to prototype and run analysis tools.
Savant's framework is limited to tools specifically written for Savant because there is no method for incorporating existing tools as plug-ins. Principles from visual analytics have been applied to develop Hawkeye, a visualization for performing genome assemblies that uses dynamic filtering and automated clustering to help users identify problematic areas of an assembly [21]. These examples illustrate the value of integrating analysis tools into genomic visualizations. The challenge, then, is developing a genome browser that integrates with an analysis framework so that users can run and rerun many different tools to produce novel output, all while in the browser.

2.3 Challenge: Sharing and Publishing Interactive Custom Visualizations
At the leading edge of biomedical research are large, collaborative experiments that employ NGS technologies and analysis tools to explore complex biomedical questions. Custom genome browser visualizations (visualizations of experimental data, often coupled with public data) are useful for collaboration and communication because they can efficiently convey information.

Despite the value of custom visualizations, they are currently quite difficult to share. The UCSC genome browser supports sharing custom tracks via URL, but there are limitations, including the deletion of tracks 48 hours after they were last accessed. IGB [6] and IGV support data sharing, but users must set up their own visualization to see the data. Using JBrowse, a fully interactive visualization can be shared via the Web; however, JBrowse requires that a server be configured to support the visualization, a task that can be difficult for scientists without programming experience.

Custom visualizations are often prominently featured in publications, yet little attention is paid to their reproducibility.
Reproducing experimental results (including visualizations) is an essential facet of scientific inquiry, providing the foundation for understanding, integrating, and extending results toward new discoveries. Reproducibility of genomics experiments has been shown to be limited [22]; NGS technologies, due to their complexity, have exacerbated problems with experimental reproducibility [23]. It is also often difficult to reproduce custom visualizations because essential details of the analysis or the visualization are lost.

There are, then, significant issues that limit the usefulness of custom visualizations for scientific collaboration and public communication. A framework is needed that facilitates sharing and publication of fully interactive and reproducible custom genome browser visualizations.

3 GALAXY TRACK BROWSER
The Galaxy Track Browser (Figure 1) is a Web-based browser integrated into the Galaxy platform that addresses the three challenges discussed in the previous section. Hence, it is designed for NGS data and tools.

Galaxy is an open Web-based platform for accessible, reproducible, and transparent genomic research [12-14]. The public Galaxy service makes analysis tools, genomic data, tutorial demonstrations, persistent graphical workspaces, and publication services available to any scientist who has access to the Internet. Local Galaxy servers can be set up by downloading the Galaxy application and customizing it to meet particular needs. Galaxy has established a significant community of users and developers.

Figure 1. Using the Galaxy Track Browser to analyze ENCODE RNA-seq data. From top to bottom: (1) UCSC knowngene gene annotation track; (2) UCSC all mrna annotation track; (3) UCSC vertebrate quantitative conservation track; (4) mapped RNA-seq reads from ENCODE cell line h1-hesc; (5) a form for running Cufflinks [24], a tool for assembling mapped reads into transcripts; (6) first attempt at transcript assembly; (7-9) improving the assembly using different parameter values for Cufflinks; (10) filtering assembled transcripts from the GM12878 cell line using transcript attributes.

Like Galaxy, the Galaxy Track Browser (GTB) is designed to be usable by all scientists, especially those without programming experience. GTB is written in JavaScript, and all of its functionality is available using only a Web browser. As is done in most genome browsers, GTB's horizontal axis denotes genome coordinates and each dataset is displayed as a track; individual
data elements are drawn at their genome coordinates. GTB supports four types of NGS datasets: mapped reads (SAM/BAM file format), features/annotations/intervals (BED, GFF), variants (VCF), and continuously-valued data (WIG). GTB is available on every Galaxy server, and Galaxy users can create, save, and share any number of visualizations. GTB visualizations can be created for both standard and custom genome builds.

3.1 Adding, Navigating, and Customizing Data
Adding a dataset to a GTB visualization can be done either within a visualization, from a user's Galaxy workspace, or from Galaxy data libraries. To support visualization of very large datasets, GTB automatically creates multiple indices for each dataset added to a visualization and manages the connections between datasets and indices so that indices can be reused if a dataset is used in more than one visualization. Automatically managing indices is an important usability feature because many users may have difficulty creating indices.

GTB's use of multiple indices enables smooth user interactions and complete customization of data: users can freely and continuously pan, zoom, and navigate, and GTB fetches data as needed from the server. When GTB is showing a large genomic region, the data obtained is aggregated and is shown as a coverage histogram or feature spans. As a user zooms in, details for individual data points or features become available and are shown (Figure 2). GTB always obtains data from the server (not precomputed image tiles) so that a user can completely customize each data track's display as needed. For instance, users can view a continuously-valued track as a histogram, line graph, filled line graph, or heatmap; users can also adjust the maximum, minimum, and total track height. For mapped read tracks and feature/annotation tracks, display modes include histogram, dense, squished, and packed.

Figure 2.
GTB renders data differently depending on whether a large or small region is being viewed. Mapped RNA-seq reads are visible as a histogram when viewing a large region (top); zooming in to view a smaller region shows the read structure (middle) and, finally, the read labels (bottom). Users can set a desired level of detail as well.

3.2 Integrating Analysis and Visualization
GTB provides access to analysis tools in its visualizations, enabling users to run tools on currently visualized tracks to produce new datasets and tracks. GTB enables users to rerun the tool used to create a track using different parameter values for the visible genomic region. Rerunning a tool produces new output, and GTB automatically renders the new output when the tool is finished (Figure 3). Parameter values can be repeatedly changed and a tool can be rerun many times (see Figure 1 for an example). Running a tool and viewing its output can be done interactively (i.e., quickly) because a tool runs only on the subset of data visible to the user. This functionality is useful for seeing how parameter choices impact tool output, visually comparing output for different parameter values, and tuning values to return desired results. Once a user has chosen a set of parameter values, he can run the tool on the entire dataset, and Galaxy puts the tool output in his Galaxy analysis workspace.

Figure 3. Analysis tools can be run in GTB: (1) UCSC knowngene annotation track; (2) assembled transcripts from RNA-seq data; (3) intersections between tracks 1 and 2, computed in Galaxy (not GTB) with the "for at least" (overlap) parameter set to 1 base; (4) interface for setting parameters and rerunning the Intersect tool; (5) the "for at least" (overlap) parameter is set to 4000 and the tool is run; (6) the tool with new parameter values is run on visible data and a new track is rendered.

For instance, transcript assembly via the Cufflinks tool [24] is highly parameter dependent, generating different numbers of
transcripts with different characteristics depending on parameter values. Rerunning Cufflinks in GTB using different parameter values can help make clear how parameters influence the assembly process. Also, a user can quickly generate different Cufflinks assemblies using different parameter values and visually compare the assemblies to choose the best one.

GTB provides a generic framework for integrating tools that requires little or no additional work from tool developers. Using
this framework, a tool's GTB configuration is specified in its Galaxy definition file. After a tool's GTB configuration is specified, all tracks generated using the tool will automatically provide the option to rerun the tool. This approach does not require any setup or configuration by GTB users, and it ensures that users only run tools that are compatible with GTB. However, it does limit users to running tools that have been explicitly configured to work with GTB and, for security reasons, prevents users from running arbitrary tools within GTB. GTB's approach to tool integration is detailed in Section 4.2.

The following tools use GTB's integration framework and are currently available in GTB: (i) genomic interval operations such as intersecting, clustering, or subtracting interval sets; (ii) Cufflinks; and (iii) the Unified Genotyper, a SNP and indel caller [25]. GTB's tool integration framework makes it possible to use any Galaxy tool in a visualization provided it meets the following criteria: (a) it produces output that GTB can render and (b) it produces correct output when a subset of the input dataset is provided. Some bioinformatics tools require the complete input dataset to produce correct output; Section 6 discusses this issue in detail.

GTB provides dynamic filters that show and hide data elements in real time as users adjust them (Figure 4). Each filter specifies a range for a particular attribute value; a user can set a filter's range by using a two-handle slider or by clicking on its text label and typing in a new range. Data elements with attribute values outside the specified range are hidden. Filters are additive so that multiple filters can be applied simultaneously. GTB creates filters based on a track's type. Read tracks can be filtered by quality scores. Feature/annotation/interval tracks can be filtered by their score column. Tracks in GFF/GTF format can be filtered by score and by attributes that have numerical values.
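The filtering behaviour described above can be sketched in a few lines of Python. This is only an illustration, not GTB's JavaScript implementation: the `RangeFilter` class, the `visible_elements` function, and the sample transcripts are hypothetical, and the sketch assumes that "additive" means an element is shown only when it passes every active filter and that range bounds are inclusive.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class RangeFilter:
    """Hypothetical filter: keeps elements whose attribute lies in [low, high]."""
    attribute: str
    low: float
    high: float

    def accepts(self, element: Dict[str, Any]) -> bool:
        value = element.get(self.attribute)
        # Assumption: elements that lack the attribute are hidden.
        return value is not None and self.low <= value <= self.high


def visible_elements(elements: List[Dict[str, Any]],
                     filters: List[RangeFilter]) -> List[Dict[str, Any]]:
    """Return the elements that pass every filter (filters combine with AND)."""
    return [e for e in elements if all(f.accepts(e) for f in filters)]


# Made-up transcripts with a score and an FPKM value.
transcripts = [
    {"name": "t1", "score": 900, "fpkm": 35.0},
    {"name": "t2", "score": 300, "fpkm": 50.0},
    {"name": "t3", "score": 950, "fpkm": 2.0},
]
filters = [RangeFilter("score", 800, 1000), RangeFilter("fpkm", 10.0, 100.0)]
names = [t["name"] for t in visible_elements(transcripts, filters)]
print(names)
```

Only `t1` passes both ranges here; `t2` fails the score filter and `t3` fails the FPKM filter.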
Filters are useful for visually identifying data elements that meet certain criteria and for understanding the distribution of attribute values in a dataset. At any time, a user can create a Galaxy dataset of the filtered data that is visible or create a dataset by applying the filter to the whole track. Newly created datasets are available in the user's Galaxy workspace for use or download.

Figure 4 demonstrates how a set of transcripts might be meaningfully filtered. First, transcripts are filtered by score, which is a measure of relative expression amongst a set of isoforms; next, transcripts are filtered by FPKM, a measure of overall transcript expression. The remaining transcripts are dominant, highly expressed isoforms.

Figure 4. GTB dynamic filters applied to a feature/annotation track: (1) filtered for higher scoring features and (2) filtered for features with higher FPKM (an abundance measure).

3.3 Sharing and Publishing Visualizations
GTB visualizations are Galaxy objects and, like all Galaxy objects, can be shared or published to the Web using Galaxy's sharing and publication features [14]. Users share GTB visualizations via a GUI; no programming or server configuration is needed to share a GTB visualization.

GTB visualizations can be shared in multiple ways. Visualizations can be shared with individuals or can be made accessible on the Web via a URL. A visualization can also be published in Galaxy's public repository, where it is browsable and searchable. GTB visualizations can also be embedded in Galaxy Pages. Pages are custom Web-based documents that enable users to communicate an entire computational experiment using standard document elements such as text, tables, and figures as well as interactive embedded datasets, workflows, and visualizations. Pages are ideal for an online publication supplement.

A shared GTB visualization can be viewed using only a Web browser and is fully functional.
A colleague or guest viewing a shared visualization can scroll, zoom, run tools, and dynamically filter data. No configuration is necessary, nor does any data or software need to be downloaded.

4 IMPLEMENTATION
The Galaxy Track Browser uses an asynchronous HTTP client-server model where the server is a Galaxy instance. The GTB client communicates with the server using seven distinct actions: (a) get chromosome lengths; (b) get available tracks; (c) get track definition; (d) get data; (e) get reference genome data; (f) run a tool on a track subset; and (g) save. Each action corresponds to an asynchronous HTTP request, and all data exchanged between client and server is encoded in JSON format. Both GTB and the Galaxy platform are open source under the Academic Free License [26].

4.1 Client
The GTB client is written entirely in object-oriented JavaScript. The client leverages JavaScript's ecosystem of libraries, APIs, and tools. The GTB client uses several jQuery libraries and adheres to CommonJS encapsulation and modularity principles so that it can be repurposed by other JavaScript applications.

The GTB client's most frequent action is drawing tracks. Hence, the client is optimized to draw tracks as quickly as possible while ensuring that each track is completely
customizable. To meet these goals, the client fetches and caches track data from the server and draws tracks itself. As discussed below, drawing track data is very fast. By starting with the track data, the client can draw a track using any configuration specified by the user. Redrawing a track due to a configuration change is also fast because cached data can very often be reused and data need not be fetched from the server.

The client renders genomic tracks as a set of adjacent tiles. This is advantageous because as the user zooms, pans, and scrolls, only new tiles need to be drawn; the client caches and reuses existing tiles when possible. Track tiles are drawn in the background and in parallel. Each tile uses its own request to obtain the data to be drawn on it. Drawing tiles in the background ensures that the GTB client is responsive to user interaction while drawing. Drawing tiles in parallel ensures that delays encountered when drawing a tile, such as network latency or drawing a large number of elements, do not impact the drawing of other tiles or tracks. However, additional code is needed to determine when all of a track's tiles have been drawn so that post-draw actions can be taken. Post-draw actions are used to animate the showing and hiding of data elements when a user is filtering data and to quickly fetch data when running tools.

The GTB client renders each tile as an HTML <canvas> element. Canvas elements provide the ability to dynamically and precisely draw using a 2D API; the canvas element is supported by recent versions of all major Web browsers. The main advantages of using the canvas element are the speed, precision, and scale at which data elements can be drawn as compared to using HTML elements to represent data elements. Using JavaScript, the GTB client can render up to 5000 elements per tile while supporting smooth navigation throughout the visualization.
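The tile reuse described above can be illustrated with a small sketch. It is written in Python rather than the client's JavaScript, it leaves out the background and parallel drawing, and every name in it (`tiles_for_view`, `TileCache`, `fake_draw`, the fixed tile width) is a hypothetical choice for illustration rather than part of GTB.

```python
from typing import Callable, Dict, List, Tuple

TileKey = Tuple[str, int, int]  # (track id, bases per tile, tile index)


def tiles_for_view(start: int, end: int, tile_bases: int) -> List[int]:
    """Indices of fixed-width tiles overlapping the half-open view [start, end)."""
    first = start // tile_bases
    last = (end - 1) // tile_bases
    return list(range(first, last + 1))


class TileCache:
    """Draws a tile at most once per (track, resolution, index) and reuses it."""

    def __init__(self, draw_tile: Callable[[str, int, int], str]) -> None:
        self._draw_tile = draw_tile
        self._tiles: Dict[TileKey, str] = {}
        self.draw_count = 0

    def get(self, track: str, tile_bases: int, index: int) -> str:
        key = (track, tile_bases, index)
        if key not in self._tiles:
            # Cache miss: this is where data would be fetched and drawn.
            self._tiles[key] = self._draw_tile(track, tile_bases, index)
            self.draw_count += 1
        return self._tiles[key]

    def render_view(self, track: str, start: int, end: int,
                    tile_bases: int) -> List[str]:
        return [self.get(track, tile_bases, i)
                for i in tiles_for_view(start, end, tile_bases)]


def fake_draw(track: str, tile_bases: int, index: int) -> str:
    """Stand-in for real drawing: returns a label naming the tile's span."""
    return f"{track}:{index * tile_bases}-{(index + 1) * tile_bases}"


cache = TileCache(fake_draw)
first_view = cache.render_view("reads", 0, 2000, 1000)      # needs tiles 0 and 1
draws_after_first = cache.draw_count
second_view = cache.render_view("reads", 1000, 3000, 1000)  # tile 1 is reused
draws_after_pan = cache.draw_count
print(draws_after_first, draws_after_pan)
```

Panning by one tile width draws only one new tile. Because the tile width is part of the cache key, zooming to a different resolution in this sketch would produce a separate set of tiles.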
The GTB client is modular and can be used with any server that implements the seven actions required by the client (listed previously). Tracks are added to a client by specifying a track definition that includes its name, dataset id, dataset type, filters, and tool (if there is one). Track filters and tools are structured so that the client can render and use them without requiring any a priori knowledge. Each filter includes an index into the track's data that denotes the value to use for filtering. A tool's parameters and inputs are encapsulated in an HTML form that the client can use as an interface that enables a user to set parameter values and run the tool.

4.2 GTB Server
The GTB server is integrated into the Galaxy framework as a controller in Galaxy's model-view-controller architecture and implements the actions needed to support the client.

In order to provide data quickly to the client, the GTB server creates and uses multiple indices for each dataset in a GTB visualization. The GTB server manages index creation and associations between datasets and indices so that neither users nor the client are burdened. Indices for a dataset are created when a client requests them or when a client requests data from the dataset. BAI indices are created for SAM/BAM datasets using SAMtools [27], tabix [17] indices are created for interval files (including BED, GFF, and VCF), and bigwig [16] indices are created for continuously-valued datasets. In addition, a summary tree index is created for all visualized datasets; a summary tree is a custom Galaxy format used to provide aggregated data over large genomic regions. The server stores associations between datasets and their indices so that, regardless of how often a dataset is used in visualizations, indices are created only once. For library datasets shared amongst all users of a Galaxy instance, indices are created once and shared as well.
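To make the two access paths concrete, here is a minimal Python sketch of a multi-resolution data provider: precomputed per-bin counts answer requests for large regions, and a sorted lookup answers requests for small ones. It does not reproduce Galaxy's summary tree format or the tabix, BAI, or bigwig indices; the `MultiResolutionTrack` class, its bin size, and its `detail_limit` threshold are all assumptions made for illustration.

```python
from bisect import bisect_left
from typing import Any, Dict, List, Tuple

Feature = Tuple[int, int, str]  # (start, end, name), half-open coordinates


class MultiResolutionTrack:
    """Hypothetical provider: summaries for wide views, features for narrow ones."""

    def __init__(self, features: List[Feature], bin_size: int = 1000,
                 detail_limit: int = 5000) -> None:
        self.features = sorted(features)
        self.starts = [f[0] for f in self.features]
        self.bin_size = bin_size
        self.detail_limit = detail_limit
        # Aggregation is done ahead of time: bin index -> features starting there.
        self.summary: Dict[int, int] = {}
        for start, _end, _name in self.features:
            b = start // bin_size
            self.summary[b] = self.summary.get(b, 0) + 1

    def get_data(self, start: int, end: int) -> Dict[str, Any]:
        if end - start > self.detail_limit:
            # Large region: return one (bin start, count) pair per bin.
            bins = range(start // self.bin_size, (end - 1) // self.bin_size + 1)
            data = [(b * self.bin_size, self.summary.get(b, 0)) for b in bins]
            return {"mode": "summary", "data": data}
        # Small region: binary search bounds the candidates by start position,
        # then a scan keeps those that overlap. A real index avoids the scan.
        hi = bisect_left(self.starts, end)
        overlapping = [f for f in self.features[:hi] if f[1] > start]
        return {"mode": "detail", "data": overlapping}


features = [(100, 200, "a"), (1500, 1600, "b"), (1550, 1700, "c"), (9000, 9100, "d")]
track = MultiResolutionTrack(features)
wide = track.get_data(0, 10000)
narrow = track.get_data(1540, 1560)
print(wide["mode"], narrow["mode"], narrow["data"])
```

The same request interface serves both zoom levels, so a client does not need to know which index produced the answer.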
The GTB server takes multiple actions so that running a tool on a subset of data can be done quickly. The first time a tool is rerun, the server identifies all input datasets needed to produce the alternative dataset and creates indices for them. Indices are used to quickly extract data from the input datasets whenever they are used as inputs for a tool and hence reduce a tool's total execution time. Oftentimes indices will already be present because it is common to visualize both a tool's inputs and outputs, and the server will have created indices for all visualized datasets. To run a tool, the server uses indices to create small input datasets that contain only the data in the visible genomic region, runs the tool on these datasets via the Galaxy framework, and returns a track definition for the new output to the client.

A tool integrated with Galaxy can be used in GTB with minimal additional work. Any tool run from the command line can be integrated into Galaxy via a tool wrapper; a tool wrapper is an XML file that specifies a tool's parameters, inputs, and outputs. Adding the <trackster-conf> tag to a tool's wrapper indicates that it works correctly in GTB and will make the tool available in GTB. Tools compatible with GTB produce the same output for a particular genomic region regardless of whether the input is the subset of data from the region or the complete dataset. Section 6 discusses this issue in more detail.

Each track definition that the GTB server sends to the client includes both the track's filters and its tool. The server determines filters based on the dataset's type; for GTF datasets, the server includes all attribute values that are numerical and hence are filterable. The server reuses the Galaxy framework to generate tool definitions, including an interface for specifying parameters.

4.3 Performance
GTB's performance is dependent on numerous factors: client Web browser speed and memory, network latency, and server load and speed.
Performance profiling and analysis of GTB has largely focused on the client because Galaxy, acting as GTB's server, can be scaled as needed to effectively support significant Web traffic. A full evaluation of the GTB client's performance will be undertaken soon. However, informal evaluation indicates that (a) rendering takes less than 0.5 seconds per track for all track types at all levels of detail, including data-intensive tracks such as mapped reads and ESTs; and (b) data transfer time is significantly larger than rendering time. These observations suggest that optimizing GTB to mitigate data transfers may lead to large performance gains.

5 CONTRASTING GENOME BROWSERS
The Galaxy Track Browser complements other genome browser research by exploring an alternative approach that leverages Galaxy to provide novel genome browser functionality. This section contrasts GTB with six popular genome browsers that are compatible with NGS data: Ensembl [4], IGB [6], IGV [7], JBrowse [8], Savant [9], and UCSC [3]. Table 1 summarizes the functionality of GTB and these six browsers along several important dimensions.

The similarities amongst all browsers are indicative of core browser functionality. All browsers can obtain data both from a user's local computer and from remote sources via HTTP. Ensembl, IGB, and IGV also support visualizing data via the DAS protocol [28]. All browsers provide support for viewing mapped reads in BAM format, intervals or annotations in BED and GFF format, and continuously-valued data in WIG format.

Full multi-resolution data models that support complete display customization are available in GTB, IGV, and Savant. IGB and JBrowse both use precomputed image data for some track types. IGB uses precomputed data to render tracks and does not provide customization options. JBrowse uses precomputed images rather
than data for quantitative (WIG) tracks. The three most recently developed browsers (GTB, IGV, and Savant) all use full multi-resolution data models. This trend is driven by user demand for more interactive visualization tools and by technological advances in data indices.

[Table 1 row labels: (1) Can obtain data locally and from public sources via HTTP and/or DAS; (2) Can view mapped reads (BAM), interval files (BED, GFF), and continuously-valued data (WIG); (3) Multi-resolution data model for all datatypes that supports complete display customization; (4) Run bioinformatics tools to produce and visualize new data; (5) Filters for dynamically showing and hiding data based on attribute values; (6) Share and customize fully interactive visualizations.]

GTB's integration of analysis tools and its dynamic filters are novel features not present in other genome browsers. Savant's plug-in framework has the potential to produce user interactions similar to GTB's tool use and filtering. Using Savant's framework, developers can write plug-ins that operate on visualized data; an example plug-in is a SNP finder that highlights potential SNPs based on mapped reads.

Both GTB and Savant aim to integrate analysis tools into a browser environment, and each approach has advantages. Savant's plug-in framework is useful for prototyping new functionality because plug-in development is straightforward and requires little programming. Integrating tools into GTB/Galaxy is equally simple and potentially no programming is needed, but Galaxy requires that a tool run from the command line. GTB's strength is its integration into Galaxy, a fully functional analysis environment. Integration with Galaxy enables GTB to automatically leverage the tools already available in Galaxy without additional work by users or tool developers. Galaxy has a significant community of developers that have integrated many tools into Galaxy [12].
Making tools available in GTB is beneficial to tool developers because GTB provides access to their tools in a visualization environment. Users benefit because GTB is very usable: users can run tools and visualize output datasets without using a command line or installing any software. As tools are increasingly integrated into Savant and GTB, it may be appropriate to compare them not only with genome browsers but also with analysis environments that incorporate visualization components. Many genome browsers enable users to share data or even complete visualizations. The UCSC genome browser supports sharing custom tracks but not complete visualizations via URL, but there are limitations, including the deletion of tracks 48 hours after they were last accessed. IGB and IGV servers can be set up to make data publicly available or accessible via password. To use this data, IGB and IGV users must download and install the genome browser software and then find and add data from a public server. JBrowse and GTB make sharing data simpler because they are Web-based, and hence fully interactive visualizations can be shared via URL. However, GTB s approach for sharing and modifying visualizations is more flexible and user-friendly than JBrowse. JBrowse visualizations are publicly accessible once they are created; GTB visualizations can be shared with individual users, made accessible via URL, or published to Galaxy s repository. Adding datasets to a JBrowse visualization requires that a user access the server to install and preprocess the new data. GTB users add datasets to a visualization using a GUI. GTB users need no programming experience to share and modify visualizations. Table 1. Comparing genome browsers. Ensembl GTB IGB IGV JBrows e In summary, GTB is the only Web-based browser to implement current best practices amongst genome browsers for fetching, representing, and displaying data. 
Also, GTB s interface provides unique features, enabling users to run tools to produce new data, dynamically filter data, and share fully interactive and customizable visualizations. 6 DISCUSSION Savant UCSC 6.1 Making Visualization-compatible Analysis Tools Interactive visualizations such as genome browsers are useful primarily because users can, in real time, manipulate data and receive feedback. Analysis tools, then, must run quickly if they are to be useful additions to genome browsers. Currently, many bioinformatics analysis tools can run for minutes or hours, especially when processing large datasets, and are unsuitable for an interactive visualization. GTB addresses this issue by running an analysis tool on the subset of data for the genomic region being viewed. Seeing a tool s output for a chosen genomic region provides useful information while minimizing a tool s execution time. For many analysis tools, this approach is sufficient to ensure that runtime is on the order of seconds. However, this approach is not compatible with all analysis tools. Some tools require not only input data for a particular genomic region but also data from other regions or from all regions. There are two common reasons that data outside a region of interest may be needed. First, a tool may build a global model as part of its execution; for instance, both the transcript assembly tool Cufflinks [24] and the ChIP-seq analysis program MACS [29] use global models to perform normalization. Second, a tool may require a complete input dataset because it is not possible to identify, prior to runtime, a subset of input data needed to produce correct output in a particular genomic region. This is true for the NGS read mapping tools Bowtie [30] and Tophat [31]. We are exploring two approaches to address this issue. One approach is to augment tools to store a global model for an input dataset and use the stored model when processing a subset of the input. 
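The stored-model idea can be illustrated with a minimal Python sketch. It is not how Cufflinks or MACS work internally; it assumes a deliberately simple "global model" (the total read count, used for reads-per-million normalization), and the names `Read`, `GlobalModel`, `build_global_model`, and `reads_per_million` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


@dataclass(frozen=True)
class GlobalModel:
    """Hypothetical whole-dataset statistic, computed once and stored."""
    total_reads: int


def build_global_model(reads):
    # One-time pass over the full dataset; the result would be saved
    # alongside the dataset and reused for every region request.
    return GlobalModel(total_reads=len(reads))


def reads_per_million(reads, model, chrom, start, end):
    # Only reads overlapping [start, end) are processed; normalization
    # uses the stored model instead of rescanning the whole dataset.
    in_region = [
        r for r in reads
        if r.chrom == chrom and r.start < end and r.end > start
    ]
    return len(in_region) * 1_000_000 / model.total_reads


reads = [
    Read("chr1", 100, 150),
    Read("chr1", 400, 450),
    Read("chr2", 100, 150),
    Read("chr1", 120, 170),
]
model = build_global_model(reads)
value = reads_per_million(reads, model, "chr1", 90, 200)
```

In a real browser the reads for a region would presumably come from an indexed file rather than a list scan; the point of the sketch is only that the expensive global pass happens once, while each region request touches a small subset of the input.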
We have successfully applied this approach to modify Cufflinks to work with subsets of input data. As the use of analysis tools in genome browsers becomes more common, tool developers are likely to support stored global models and other approaches that enable tools to be run on subsets of data or particular genomic regions.

For tools that cannot be modified to run on subsets of input data, an alternative approach is to use dynamic filtering to simulate running a tool using different parameters. In this approach, a tool's parameters are relaxed so that all potential output is produced and attribute values are attached to output data. A user can then use attribute values to filter data and observe which data points would be produced for particular parameter values. This approach is ideal for tools that use parameter settings to omit data from their output, including Bowtie and TopHat.

6.2 Towards a Visual Analysis Environment

Currently, GTB users can rerun tools both to understand how tool parameter values influence tool output and to refine their selection of parameter values to achieve a desired output. This is a first step toward creating a general-purpose visual analysis environment. GUI environments for bioinformatics analyses such as Galaxy and GenePattern [32] have proven very popular because they make it easier for non-programmers to run bioinformatics tools and perform complete analyses. GUI environments represent inputs, tools, and outputs using text and utilize GUI widgets such as drop-down boxes and text fields that enable users to choose inputs and tools.

A complementary approach to GUI environments is a visual analysis environment where a user performs the same actions visually. A visual analysis environment is another approach for making bioinformatics tools, and especially their outputs, more accessible. We are extending GTB so that users can run any compatible tool on any set of input tracks. For example, to create a track of the intersecting regions for two annotation tracks, a user might select the intersect tool and then drag two tracks onto the new track. GTB would then run the tool and render the tool output. Repeated use of tools would produce more tracks, and tracks could be organized so that a user could scroll through the tracks to see the steps in her analysis.
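As a rough illustration of what the intersect example computes, the sketch below intersects two annotation tracks in Python. It is not Galaxy's intersect tool; it assumes each track is a list of half-open `(start, end)` intervals on a single chromosome, sorted by start and non-overlapping within the track, and the function name `intersect` and the sample tracks are hypothetical.

```python
def intersect(track_a, track_b):
    """Return the regions covered by both tracks.

    Each track is assumed to be a sorted list of non-overlapping,
    half-open (start, end) intervals on the same chromosome.
    """
    result = []
    i = j = 0
    while i < len(track_a) and j < len(track_b):
        start = max(track_a[i][0], track_b[j][0])
        end = min(track_a[i][1], track_b[j][1])
        if start < end:
            # The two current intervals overlap on [start, end).
            result.append((start, end))
        # Advance whichever interval ends first; it cannot overlap
        # any later interval in the other track.
        if track_a[i][1] < track_b[j][1]:
            i += 1
        else:
            j += 1
    return result


genes = [(100, 200), (300, 400), (500, 600)]
peaks = [(150, 320), (390, 520)]
overlap = intersect(genes, peaks)
```

The output of such a step is itself a list of intervals, so it could be displayed as a new track and used as input to a further tool, which is the chaining behavior the paragraph above describes.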
When inputs and tools can be selected and used in GTB, GTB will function very much like a standard GUI bioinformatics environment.

7 CONCLUDING THOUGHTS

Next-generation sequencing technologies and analysis tools present new challenges for genome browsers. NGS datasets are extremely large and growing, and full multi-resolution data models are needed to enable smooth interaction with such large datasets. NGS tools and analyses are complicated and often require use of both tools and visualizations to understand, debug, and improve. To support use of analysis and visualization together, genome browsers need to integrate tools and enable tools to be used to produce new data, all within a browser. NGS experiments are highly collaborative, and genome browsers should facilitate fast and simple sharing of visualizations; shared visualizations can also play a large role when publishing analyses.

The Galaxy Track Browser is a Web-based genome browser integrated into the Galaxy platform that addresses these challenges. GTB is the only Web-based genome browser that provides a full multi-resolution data model. GTB provides tools and dynamic filters that enable users to produce, visualize, and analyze data all within GTB. Using the Galaxy framework, fully-functional GTB visualizations can be shared with colleagues or published to the Web. GTB is user friendly; GTB requires no programming experience to use and all GTB functionality is available using only a Web browser.

REFERENCES

[1] M. S. Cline and W. J. Kent, "Understanding genome browsing," Nat. Biotechnol., vol. 27.
[2] C. B. Nielsen, et al., "Visualizing genomes: techniques and challenges," Nature Methods, vol. 7, pp. S5-S15.
[3] W. J. Kent, "The human genome browser at UCSC," Genome Res., vol. 12.
[4] P. Flicek, et al., "Ensembl 2011," Nucleic Acids Research, vol. 39, pp. D800-D806, January 1.
[5] L. D. Stein, et al., "The Generic Genome Browser: A Building Block for a Model Organism System Database," Genome Research, vol. 12.
[6] J. W. Nicol, et al., "The Integrated Genome Browser: free software for distribution and exploration of genome-scale data sets," Bioinformatics, vol. 25.
[7] J. T. Robinson, et al., "Integrative genomics viewer," Nat Biotech, vol. 29.
[8] M. E. Skinner, et al., "JBrowse: A next-generation genome browser," Genome Research, vol. 19.
[9] M. Fiume, et al., "Savant: genome browser for high-throughput sequencing data," Bioinformatics, vol. 26.
[10] E. R. Mardis, "Next-generation DNA sequencing methods," Annu. Rev. Genomics Hum. Genet., vol. 9.
[11] S. Pepke, et al., "Computation for ChIP-seq and RNA-seq studies," Nat Meth, vol. 6, pp. S22-S32.
[12] D. Blankenberg, et al., "A framework for collaborative analysis of ENCODE data: making large-scale analyses biologist-friendly," Genome Research, vol. 17.
[13] B. Giardine, "Galaxy: a platform for interactive large-scale genome analysis," Genome Res., vol. 15.
[14] J. Goecks, et al., "Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences," Genome Biology, vol. 11, p. R86.
[15] Z. Wang, et al., "RNA-Seq: a revolutionary tool for transcriptomics," Nat Rev Genet, vol. 10, Jan.
[16] W. J. Kent, et al., "BigWig and BigBed: enabling browsing of large distributed datasets," Bioinformatics, vol. 26.
[17] H. Li, "Tabix: fast retrieval of sequence features from generic TAB-delimited files," Bioinformatics, vol. 27.
[18] J. Tonti-Filippini. (April 21, 2011). Anno-J: Annotation Browsing 2.0. Available:
[19] K. A. Cook and J. J. Thomas, Illuminating the Path: The Research and Development Agenda for Visual Analytics.
[20] W. J. Kent, "BLAT--The BLAST-Like Alignment Tool," Genome Research, vol. 12.
[21] M. C. Schatz, et al., "Hawkeye: an interactive visual analytics tool for genome assemblies," Genome Biol., vol. 8, p. R34.
[22] J. P. A. Ioannidis, et al., "Repeatability of published microarray gene expression analyses," Nat Genet, vol. 41.
[23] "Devil in the details," Nature, vol. 470.
[24] C. Trapnell, et al., "Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation," Nat Biotech, vol. 28.
[25] A. McKenna, et al., "The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data," Genome Res, vol. 20, Sep.
[26] Open Source Initiative. (April 21, 2011). The Academic Free License 3.0. Available:
[27] H. Li, et al., "The Sequence Alignment/Map format and SAMtools," Bioinformatics, vol. 25.
[28] R. Dowell, et al., "The Distributed Annotation System," BMC Bioinformatics, vol. 2, p. 7.
[29] Y. Zhang, et al., "Model-based Analysis of ChIP-Seq (MACS)," Genome Biology, vol. 9, p. R137.
[30] B. Langmead, et al., "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome," Genome Biology, vol. 10, p. R25.
[31] C. Trapnell, et al., "TopHat: discovering splice junctions with RNA-Seq," Bioinformatics, vol. 25.
[32] M. Reich, et al., "GenePattern 2.0," Nature Genetics, vol. 38.
NGS Data Visualization and Exploration Using IGV
1 What is Galaxy Galaxy for Bioinformaticians Galaxy for Experimental Biologists Using Galaxy for NGS Analysis NGS Data Visualization and Exploration Using IGV 2 What is Galaxy Galaxy for Bioinformaticians
More informationGalaxy. Daniel Blankenberg The Galaxy Team
Galaxy Daniel Blankenberg The Galaxy Team http://galaxyproject.org Overview What is Galaxy? What you can do in Galaxy analysis interface, tools and datasources data libraries workflows visualization sharing
More informationGalaxy Platform For NGS Data Analyses
Galaxy Platform For NGS Data Analyses Weihong Yan wyan@chem.ucla.edu Collaboratory Web Site http://qcb.ucla.edu/collaboratory Collaboratory Workshops Workshop Outline ü Day 1 UCLA galaxy and user account
More informationGenome Browser Background and Strategy
Genome Browser Background and Strategy April 12th, 2017 BIOL 7210 - Faction I (Outbreak) - Genome Browser Group Adam Dabrowski Mrunal Dehankar Shareef Khalid Hubert Pan Ajay Ramakrishnan Ankit Srivastava
More informationA short Introduction to UCSC Genome Browser
A short Introduction to UCSC Genome Browser Elodie Girard, Nicolas Servant Institut Curie/INSERM U900 Bioinformatics, Biostatistics, Epidemiology and computational Systems Biology of Cancer 1 Why using
More informationGenomic Analysis with Genome Browsers.
Genomic Analysis with Genome Browsers http://barc.wi.mit.edu/hot_topics/ 1 Outline Genome browsers overview UCSC Genome Browser Navigating: View your list of regions in the browser Available tracks (eg.
More informationGenome Browser. Background & Strategy. Spring 2017 Faction II
Genome Browser Background & Strategy Spring 2017 Faction II Outline Beginning of the Last Phase Goals State of Art Applicable Genome Browsers Not So Genome Browsers Storing Data Strategy for the website
More informationWeb-Based Visualization and Visual Analysis for High-Throughput Genomics. Jeremy Goecks! Computational Biology Institute
Web-Based Visualization and Visual Analysis for High-Throughput Genomics with Galaxy! Jeremy Goecks! Computational Biology Institute Topics Galaxy Visualization framework Large-scale visualization Integrated
More informationReproducible & Transparent Computational Science with Galaxy. Jeremy Goecks The Galaxy Team
Reproducible & Transparent Computational Science with Galaxy Jeremy Goecks The Galaxy Team 1 Doing Good Science Previous talks: performing an analysis setting up and scaling Galaxy adding tools libraries
More informationChIP-seq hands-on practical using Galaxy
ChIP-seq hands-on practical using Galaxy In this exercise we will cover some of the basic NGS analysis steps for ChIP-seq using the Galaxy framework: Quality control Mapping of reads using Bowtie2 Peak-calling
More informationBGGN-213: FOUNDATIONS OF BIOINFORMATICS (Lecture 14)
BGGN-213: FOUNDATIONS OF BIOINFORMATICS (Lecture 14) Genome Informatics (Part 1) https://bioboot.github.io/bggn213_f17/lectures/#14 Dr. Barry Grant Nov 2017 Overview: The purpose of this lab session is
More informationNGS Analysis Using Galaxy
NGS Analysis Using Galaxy Sequences and Alignment Format Galaxy overview and Interface Get;ng Data in Galaxy Analyzing Data in Galaxy Quality Control Mapping Data History and workflow Galaxy Exercises
More informationIntroduction to Galaxy
Introduction to Galaxy Dr Jason Wong Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW Day 1 Thurs 28 th January 2016 Overview What is Galaxy? Description of
More informationIntroduction to Read Alignment. UCD Genome Center Bioinformatics Core Tuesday 15 September 2015
Introduction to Read Alignment UCD Genome Center Bioinformatics Core Tuesday 15 September 2015 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG
More informationNGS FASTQ file format
NGS FASTQ file format Line1: Begins with @ and followed by a sequence idenefier and opeonal descripeon Line2: Raw sequence leiers Line3: + Line4: Encodes the quality values for the sequence in Line2 (see
More informationAccessible, Transparent and Reproducible Analysis with Galaxy
Accessible, Transparent and Reproducible Analysis with Galaxy Application of Next Generation Sequencing Technologies for Whole Transcriptome and Genome Analysis ABRF 2013 Saturday, March 2, 2013 Palm Springs,
More informationRNA-seq. Manpreet S. Katari
RNA-seq Manpreet S. Katari Evolution of Sequence Technology Normalizing the Data RPKM (Reads per Kilobase of exons per million reads) Score = R NT R = # of unique reads for the gene N = Size of the gene
More informationAdvanced UCSC Browser Functions
Advanced UCSC Browser Functions Dr. Thomas Randall tarandal@email.unc.edu bioinformatics.unc.edu UCSC Browser: genome.ucsc.edu Overview Custom Tracks adding your own datasets Utilities custom tools for
More informationChIP-Seq Tutorial on Galaxy
1 Introduction ChIP-Seq Tutorial on Galaxy 2 December 2010 (modified April 6, 2017) Rory Stark The aim of this practical is to give you some experience handling ChIP-Seq data. We will be working with data
More informationAnalyzing Variant Call results using EuPathDB Galaxy, Part II
Analyzing Variant Call results using EuPathDB Galaxy, Part II In this exercise, we will work in groups to examine the results from the SNP analysis workflow that we started yesterday. The first step is
More informationGenome Browsers - The UCSC Genome Browser
Genome Browsers - The UCSC Genome Browser Background The UCSC Genome Browser is a well-curated site that provides users with a view of gene or sequence information in genomic context for a specific species,
More informationJBrowse. To get started early: Double click VirtualBox on the desktop Click JBrowse 2016 Tutorial Click Start
JBrowse To get started early: Double click VirtualBox on the desktop Click JBrowse 2016 Tutorial Click Start JBrowse PAG 2015 Scott Cain GMOD Coordinator scott@scottcain.net What is GMOD? A set of interoperable
More informationSequence Analysis Pipeline
Sequence Analysis Pipeline Transcript fragments 1. PREPROCESSING 2. ASSEMBLY (today) Removal of contaminants, vector, adaptors, etc Put overlapping sequence together and calculate bigger sequences 3. Analysis/Annotation
More informationDr. Gabriela Salinas Dr. Orr Shomroni Kaamini Rhaithata
Analysis of RNA sequencing data sets using the Galaxy environment Dr. Gabriela Salinas Dr. Orr Shomroni Kaamini Rhaithata Microarray and Deep-sequencing core facility 30.10.2017 RNA-seq workflow I Hypothesis
More informationAdvanced genome browsers: Integrated Genome Browser and others Heiko Muller Computational Research
Genomic Computing, DEIB, 4-7 March 2013 Advanced genome browsers: Integrated Genome Browser and others Heiko Muller Computational Research IIT@SEMM heiko.muller@iit.it List of Genome Browsers Alamut Annmap
More informationDavid Crossman, Ph.D. UAB Heflin Center for Genomic Science. GCC2012 Wednesday, July 25, 2012
David Crossman, Ph.D. UAB Heflin Center for Genomic Science GCC2012 Wednesday, July 25, 2012 Galaxy Splash Page Colors Random Galaxy icons/colors Queued Running Completed Download/Save Failed Icons Display
More informationAnalyzing ChIP- Seq Data in Galaxy
Analyzing ChIP- Seq Data in Galaxy Lauren Mills RISS ABSTRACT Step- by- step guide to basic ChIP- Seq analysis using the Galaxy platform. Table of Contents Introduction... 3 Links to helpful information...
More informationChIP-seq hands-on practical using Galaxy
ChIP-seq hands-on practical using Galaxy In this exercise we will cover some of the basic NGS analysis steps for ChIP-seq using the Galaxy framework: Quality control Mapping of reads using Bowtie2 Peak-calling
More informationGalaxy workshop at the Winter School Igor Makunin
Galaxy workshop at the Winter School 2016 Igor Makunin i.makunin@uq.edu.au Winter school, UQ, July 6, 2016 Plan Overview of the Genomics Virtual Lab Introduce Galaxy, a web based platform for analysis
More informationWelcome to MAPHiTS (Mapping Analysis Pipeline for High-Throughput Sequences) tutorial page.
Welcome to MAPHiTS (Mapping Analysis Pipeline for High-Throughput Sequences) tutorial page. In this page you will learn to use the tools of the MAPHiTS suite. A little advice before starting : rename your
More informationIntegrative Genomics Viewer. Prat Thiru
Integrative Genomics Viewer Prat Thiru 1 Overview User Interface Basics Browsing the Data Data Formats IGV Tools Demo Outline Based on ISMB 2010 Tutorial by Robinson and Thorvaldsdottir 2 Why IGV? IGV
More informationBIOINFORMATICS. Savant: Genome Browser for High Throughput Sequencing Data
BIOINFORMATICS Vol. 00 no. 00 2010 Pages 1 6 Savant: Genome Browser for High Throughput Sequencing Data Marc Fiume 1,, Vanessa Williams 1, and Michael Brudno 1,2 1 Department of Computer Science, University
More informationBackground and Strategy. Smitha, Adrian, Devin, Jeff, Ali, Sanjeev, Karthikeyan
Background and Strategy Smitha, Adrian, Devin, Jeff, Ali, Sanjeev, Karthikeyan What is a genome browser? A web/desktop based graphical tool for rapid and reliable display of any requested portion of the
More informationProtocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification data
Protocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification data Table of Contents Protocol: peak-calling for ChIP-seq data / segmentation analysis for histone modification
More informationChIP-seq (NGS) Data Formats
ChIP-seq (NGS) Data Formats Biological samples Sequence reads SRA/SRF, FASTQ Quality control SAM/BAM/Pileup?? Mapping Assembly... DE Analysis Variant Detection Peak Calling...? Counts, RPKM VCF BED/narrowPeak/
More informationHelpful Galaxy screencasts are available at:
This user guide serves as a simplified, graphic version of the CloudMap paper for applicationoriented end-users. For more details, please see the CloudMap paper. Video versions of these user guides and
More informationToday's outline. Resources. Genome browser components. Genome browsers: Discovering biology through genomics. Genome browser tutorial materials
Today's outline Genome browsers: Discovering biology through genomics BaRC Hot Topics April 2013 George Bell, Ph.D. http://jura.wi.mit.edu/bio/education/hot_topics/ Genome browser introduction Popular
More informationGenome Browsers Guide
Genome Browsers Guide Take a Class This guide supports the Galter Library class called Genome Browsers. See our Classes schedule for the next available offering. If this class is not on our upcoming schedule,
More informationmrna-seq Basic processing Read mapping (shown here, but optional. May due if time allows) Gene expression estimation
mrna-seq Basic processing Read mapping (shown here, but optional. May due if time allows) Tophat Gene expression estimation cufflinks Confidence intervals Gene expression changes (separate use case) Sample
More informationWeb-based visual analysis for high-throughput genomics
Web-based visual analysis for high-throughput genomics Jeremy Goecks, Emory University Carl Eberhard, Emory University Tomithy Too, National University of Singapore The Galaxy Team, Emory University Anton
More informationRNA-Seq Analysis With the Tuxedo Suite
June 2016 RNA-Seq Analysis With the Tuxedo Suite Dena Leshkowitz Introduction In this exercise we will learn how to analyse RNA-Seq data using the Tuxedo Suite tools: Tophat, Cuffmerge, Cufflinks and Cuffdiff.
More informationde.nbi and its Galaxy interface for RNA-Seq
de.nbi and its Galaxy interface for RNA-Seq Jörg Fallmann Thanks to Björn Grüning (RBC-Freiburg) and Sarah Diehl (MPI-Freiburg) Institute for Bioinformatics University of Leipzig http://www.bioinf.uni-leipzig.de/
More informationThe UCSC Gene Sorter, Table Browser & Custom Tracks
The UCSC Gene Sorter, Table Browser & Custom Tracks Advanced searching and discovery using the UCSC Table Browser and Custom Tracks Osvaldo Graña Bioinformatics Unit, CNIO 1 Table Browser and Custom Tracks
More informationChIP-seq practical: peak detection and peak annotation. Mali Salmon-Divon Remco Loos Myrto Kostadima
ChIP-seq practical: peak detection and peak annotation Mali Salmon-Divon Remco Loos Myrto Kostadima March 2012 Introduction The goal of this hands-on session is to perform some basic tasks in the analysis
More informationColorado State University Bioinformatics Algorithms Assignment 6: Analysis of High- Throughput Biological Data Hamidreza Chitsaz, Ali Sharifi- Zarchi
Colorado State University Bioinformatics Algorithms Assignment 6: Analysis of High- Throughput Biological Data Hamidreza Chitsaz, Ali Sharifi- Zarchi Although a little- bit long, this is an easy exercise
More informationAnalysis of ChIP-seq data
Before we start: 1. Log into tak (step 0 on the exercises) 2. Go to your lab space and create a folder for the class (see separate hand out) 3. Connect to your lab space through the wihtdata network and
More informationUsing Galaxy to provide a NGS Analysis Platform
11/15/11 Using Galaxy to provide a NGS Analysis Platform Friedrich Miescher Institute - part of the Novartis Research Foundation - affiliated institute of Basel University - member of Swiss Institute of
More informationGetting Started. April Strand Life Sciences, Inc All rights reserved.
Getting Started April 2015 Strand Life Sciences, Inc. 2015. All rights reserved. Contents Aim... 3 Demo Project and User Interface... 3 Downloading Annotations... 4 Project and Experiment Creation... 6
More informationEasy visualization of the read coverage using the CoverageView package
Easy visualization of the read coverage using the CoverageView package Ernesto Lowy European Bioinformatics Institute EMBL June 13, 2018 > options(width=40) > library(coverageview) 1 Introduction This
More informationSequencing Data. Paul Agapow 2011/02/03
Webservices for Next Generation Sequencing Data Paul Agapow 2011/02/03 Aims Assumed parameters: Must have a system for non-technical users to browse and manipulate their Next Generation Sequencing (NGS)
More informationRNA-Seq in Galaxy: Tuxedo protocol. Igor Makunin, UQ RCC, QCIF
RNA-Seq in Galaxy: Tuxedo protocol Igor Makunin, UQ RCC, QCIF Acknowledgments Genomics Virtual Lab: gvl.org.au Galaxy for tutorials: galaxy-tut.genome.edu.au Galaxy Australia: galaxy-aust.genome.edu.au
More informationBioinformatics in next generation sequencing projects
Bioinformatics in next generation sequencing projects Rickard Sandberg Assistant Professor Department of Cell and Molecular Biology Karolinska Institutet March 2011 Once sequenced the problem becomes computational
More informationCLC Server. End User USER MANUAL
CLC Server End User USER MANUAL Manual for CLC Server 10.0.1 Windows, macos and Linux March 8, 2018 This software is for research purposes only. QIAGEN Aarhus Silkeborgvej 2 Prismet DK-8000 Aarhus C Denmark
More informationversion /1/2011 Source code Linux x86_64 binary Mac OS X x86_64 binary
Cufflinks RNA-Seq analysis tools - Getting Started 1 of 6 14.07.2011 09:42 Cufflinks Transcript assembly, differential expression, and differential regulation for RNA-Seq Site Map Home Getting started
More informationDRAGEN Bio-IT Platform Enabling the Global Genomic Infrastructure
TM DRAGEN Bio-IT Platform Enabling the Global Genomic Infrastructure About DRAGEN Edico Genome s DRAGEN TM (Dynamic Read Analysis for GENomics) Bio-IT Platform provides ultra-rapid secondary analysis of
More informationGenome Browser. Background and Strategy
Genome Browser Background and Strategy Contents What is a genome browser? Purpose of a genome browser Examples Structure Extra Features Contents What is a genome browser? Purpose of a genome browser Examples
More informationSingle/paired-end RNAseq analysis with Galaxy
October 016 Single/paired-end RNAseq analysis with Galaxy Contents: 1. Introduction. Quality control 3. Alignment 4. Normalization and read counts 5. Workflow overview 6. Sample data set to test the paired-end
More informationCyverse tutorial 1 Logging in to Cyverse and data management. Open an Internet browser window and navigate to the Cyverse discovery environment:
Cyverse tutorial 1 Logging in to Cyverse and data management Open an Internet browser window and navigate to the Cyverse discovery environment: https://de.cyverse.org/de/ Click Log in with your CyVerse
More informationLong Read RNA-seq Mapper
UNIVERSITY OF ZAGREB FACULTY OF ELECTRICAL ENGENEERING AND COMPUTING MASTER THESIS no. 1005 Long Read RNA-seq Mapper Josip Marić Zagreb, February 2015. Table of Contents 1. Introduction... 1 2. RNA Sequencing...
More informationGoal: Learn how to use various tool to extract information from RNAseq reads. 4.1 Mapping RNAseq Reads to a Genome Assembly
ESSENTIALS OF NEXT GENERATION SEQUENCING WORKSHOP 2014 UNIVERSITY OF KENTUCKY AGTC Class 4 RNAseq Goal: Learn how to use various tool to extract information from RNAseq reads. Input(s): magnaporthe_oryzae_70-15_8_supercontigs.fasta
More informationSimile Tools Workshop Summary MacKenzie Smith, MIT Libraries
Simile Tools Workshop Summary MacKenzie Smith, MIT Libraries Intro On June 10 th and 11 th, 2010 a group of Simile Exhibit users, software developers and architects met in Washington D.C. to discuss the
More informationepigenomegateway.wustl.edu
Everything can be found at epigenomegateway.wustl.edu REFERENCES 1. Zhou X, et al., Nature Methods 8, 989-990 (2011) 2. Zhou X & Wang T, Current Protocols in Bioinformatics Unit 10.10 (2012) 3. Zhou X,
More informationUsing the Galaxy Local Bioinformatics Cloud at CARC
Using the Galaxy Local Bioinformatics Cloud at CARC Lijing Bu Sr. Research Scientist Bioinformatics Specialist Center for Evolutionary and Theoretical Immunology (CETI) Department of Biology, University
More informationServices Performed. The following checklist confirms the steps of the RNA-Seq Service that were performed on your samples.
Services Performed The following checklist confirms the steps of the RNA-Seq Service that were performed on your samples. SERVICE Sample Received Sample Quality Evaluated Sample Prepared for Sequencing
More informationHigh-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg
High-throughput sequencing: Alignment and related topic Simon Anders EMBL Heidelberg Established platforms HTS Platforms Illumina HiSeq, ABI SOLiD, Roche 454 Newcomers: Benchtop machines 454 GS Junior,
More informationBIOINFORMATICS ORIGINAL PAPER