University of Chicago launches cloud to analyze cancer data

Bionimbus Protected Data Cloud eliminates need for massive storage infrastructure

May 15, 2013

The University of Chicago is launching the first secure cloud-based computing system that will enable researchers to access and analyze human genomic cancer information without the costly and cumbersome infrastructure normally needed to download and store massive amounts of data.

Until now, researchers authorized by the National Institutes of Health (NIH) to analyze The Cancer Genome Atlas (TCGA) had to set up a secure, compliant computing environment capable of managing and analyzing terabytes of data, download the data -- which could take weeks -- and then install the appropriate tools needed to perform the desired analysis.

The Bionimbus Protected Data Cloud, which is the only NIH-approved cloud-based system for TCGA data, will be equipped with the most commonly used query pipelines and will allow researchers to focus solely on the analysis of large-scale cancer genome sequencing, which experts believe can unlock paths to appropriate treatment, early detection and prevention of cancer.

"Our hope is that the Bionimbus environment will help democratize access to cancer genomics data so that more researchers can fruitfully work with large datasets to understand genomic variations that seem to be one of the keys to the precise diagnosis and treatment of cancer," said Robert L. Grossman, PhD, principal investigator of the Bionimbus project and professor of medicine at the University of Chicago Medicine.

The Bionimbus Protected Data Cloud continues to add to its current collection of the most widely used TCGA cancer DNA datasets, which includes breast, ovarian and prostate cancers.

TCGA is a comprehensive project to improve the understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. TCGA contains data from more than 6,000 cancer patients, spanning 20 different types of cancer. The TCGA is a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), both part of the NIH.

"The Bionimbus Protected Data Cloud provides cancer researchers a simple way to analyze TCGA data without having to become experts at managing big data," said Kenna Shaw, PhD, director of the TCGA Program Office.

Megan McNerney, MD, PhD, instructor of pathology at the University of Chicago, used Bionimbus to analyze data that led to her discovery that the gene CUX1, which acts as a tumor suppressor, is frequently inactivated in acute myeloid leukemia.

"Bionimbus was critical for my work, as it was used for all aspects of the project, including secure storage of protected data, quality control of next-generation sequencing results, alignments, expression analysis, and algorithm development," she said. "The strength of Bionimbus, however, is the support that is provided for end users, which enabled both expert and non-expert team members to use the cloud."

The cloud technology for the Bionimbus Protected Data Cloud was developed in part by the Open Science Data Cloud, a National Science Foundation-supported project that is developing cloud infrastructure to manage, analyze and share large scientific datasets.

About the Bionimbus Protected Data Cloud: The Bionimbus Protected Data Cloud is a collaboration between the Open Science Data Cloud and the Institute for Genomics and Systems Biology, the Center for Research Informatics, the Institute for Translational Medicine and the University of Chicago Medicine Comprehensive Cancer Center, all on the University of Chicago campus. The Protected Data Cloud allows users authorized by the National Institutes of Health to compute over human genomic data in a secure and compliant fashion. Currently, selected datasets from The Cancer Genome Atlas are available in the Protected Data Cloud. The Bionimbus project is supported in part by federal funds from the National Cancer Institute, National Institutes of Health, through SAIC-Frederick Inc. and the Frederick National Laboratory for Cancer Research. The Protected Data Cloud also uses technology developed by the Open Science Data Cloud that was supported in part by the National Science Foundation (Grants OISE-1129076 and CISE-1127316). Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NIH, National Science Foundation, or other supporters of the project. For more information, visit bionimbus.opensciencedatacloud.org.

About the TCGA: The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. The overarching goal of TCGA is to improve the ability to diagnose, treat and prevent cancer. To achieve this goal in a scientifically rigorous manner, the National Cancer Institute and the National Human Genome Research Institute used a phased-in strategy to launch TCGA. A pilot project developed and tested the research framework needed to systematically explore the entire spectrum of genomic changes involved in more than 20 types of human cancer. For more information, visit cancergenome.nih.gov.

About the Open Science Data Cloud: The Open Science Data Cloud is a petabyte-scale cloud for managing, analyzing and sharing large scientific datasets. It is managed by the not-for-profit Open Cloud Consortium. For more information, visit opensciencedatacloud.org.