A National Collaboratory for Developmental Biology

Michael D. Doyle, Ph.D.
Director, Center for Knowledge Management, University of California, San Francisco

The Visible Embryo Project is a multi-institutional, interdisciplinary research project to develop a large-scale distributed computational resource "center," or "collaboratory," to support research, education, and health care relating to developmental biology. A primary goal of this project is to provide a testbed for developing new technologies, and refining existing ones, that apply high-speed, high-performance computing and communications to current problems in biomedical science. Sets of serial microscopic cross-sections through human embryos in the collection of the National Museum of Health and Medicine will be digitized and processed to create volumetric reconstructions of normal human embryonic anatomy. During the five years of this initial project, a large portion of the Museum's Carnegie Collection of Human Embryology will be digitized, reconstructed, and archived, together with case histories, scientific articles, research notes, didactic descriptions, and other data contained within the collection. This massive database will be housed at the Museum in Washington, D.C., while teams of researchers at more than 20 universities and companies around the United States will access widely distributed supercomputing resources to develop visualization, analysis, and telecollaboration software tools, educational materials, and virtual reality simulations, and to carry out basic science investigations and clinical research projects based on the data in the collection.

This project will serve the dual purpose of providing a testbed for new technology development in high-performance computing and communications and of creating powerful new tools for the developmental biology research community. New advances in visualization technology are beginning to allow investigators to break through previous technical limitations and discover universally applicable rules for pattern formation and shape development in organisms. By applying these technologies to the archives of cross-sectional image data held in the literature and in collections around the world, we can extract an enormous amount of new information from these databases.

The task of integrating access to such massive information and computational resources is nontrivial. Just one embryo from the 650 serially sectioned specimens in the collection can yield as much as a terabyte of anatomical volume data (a 20 mm specimen, sectioned at 5 microns and digitized at a resolution of 8000 × 8000 pixels per section at 36-bit RGB, produces 1.073 TB of voxel data). No single workstation, or even supercomputer, can manipulate, process, and analyze such a large quantity of data as a single unit, much less perform computational operations on a database of hundreds of such datasets. For this reason, the Visible Embryo team at UCSF has developed tools that provide integrated Internet access (through a modified version of the NCSA Mosaic program) to remote volume visualization engines, which can distribute computation across a large number of graphics supercomputers connected by high-speed networks. Through Mosaic and the World Wide Web, this combines text, image, audio, and video data with real-time interactive control of high-performance visualizations embedded within Mosaic documents. As a result, users can access large-scale computational resources from low-end machines, and a single system can integrate information search, retrieval, creation, processing, and analysis. "Internet workgroups" of scientists will be able to telecollaborate through the system, promoting the concept of a "collaboratory," or a "laboratory without walls."
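The terabyte estimate for a single 20 mm specimen can be checked with a few lines of arithmetic. This is a minimal sketch: the helper name `voxel_bytes` is hypothetical, and it assumes uncompressed storage with exactly 36 bits per pixel, every section at full resolution, and no file overhead.

```python
def voxel_bytes(specimen_length_mm: float, section_thickness_um: float,
                width_px: int, height_px: int, bits_per_pixel: int) -> tuple[int, int]:
    """Return (number of sections, total bytes) for an uncompressed image stack.

    Hypothetical helper: assumes every section is stored at full resolution
    with exactly bits_per_pixel bits per pixel and no file overhead.
    """
    # Sections needed to cover the specimen (1 mm = 1000 microns).
    sections = round(specimen_length_mm * 1000 / section_thickness_um)
    # Total bits across all sections, converted to whole bytes.
    total_bits = sections * width_px * height_px * bits_per_pixel
    return sections, total_bits // 8


# Parameters quoted in the text: 20 mm specimen, 5 micron sections,
# 8000 x 8000 pixels per section, 36-bit RGB.
sections, total = voxel_bytes(20, 5, 8000, 8000, 36)
print(sections)                     # 4000
print(total)                        # 1152000000000
print(f"{total / 10**12:.3f} TB")   # 1.152 TB (decimal units)
print(f"{total / 2**40:.3f} TiB")   # 1.048 TiB (binary units)
```

Under these assumptions the stated parameters give 4000 sections and about 1.15 × 10^12 bytes, slightly more than the quoted 1.073 TB. The text does not say which storage or rounding conventions produced its figure, but either way the order of magnitude, roughly a terabyte per embryo, holds.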
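The text does not specify how the remote visualization engines divide work among the graphics supercomputers. One common approach, assumed here purely for illustration, is sort-last parallel volume rendering: each node renders a contiguous slab of sections, and the partial images are then composited front to back with the "over" operator. The names `Slab`, `partition_sections`, and `composite_front_to_back` are hypothetical, and compositing is shown for a single pixel with premultiplied color to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class Slab:
    node: int           # index of a hypothetical rendering node
    first_section: int  # first section in the slab (inclusive)
    last_section: int   # end of the slab (exclusive)


def partition_sections(num_sections: int, num_nodes: int) -> list[Slab]:
    """Split a stack of sections into contiguous slabs of near-equal size."""
    base, extra = divmod(num_sections, num_nodes)
    slabs, start = [], 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional section each.
        size = base + (1 if node < extra else 0)
        slabs.append(Slab(node, start, start + size))
        start += size
    return slabs


def composite_front_to_back(partials: list[tuple[float, float]]) -> tuple[float, float]:
    """Merge per-slab (premultiplied color, alpha) values for one pixel.

    `partials` must be ordered from the slab nearest the viewer to the farthest.
    Each step applies the 'over' operator: farther slabs contribute only
    through the transparency left by the nearer ones.
    """
    color, alpha = 0.0, 0.0
    for c, a in partials:
        color += (1.0 - alpha) * c
        alpha += (1.0 - alpha) * a
    return color, alpha


slabs = partition_sections(4000, 16)
print(len(slabs), slabs[0])  # 16 Slab(node=0, first_section=0, last_section=250)
print(composite_front_to_back([(0.25, 0.5), (0.5, 0.5)]))  # (0.5, 0.75)
```

For example, the roughly 4000 sections of a 20 mm specimen cut at 5 microns split evenly into 250 sections per node across 16 nodes. Keeping each node's slab contiguous means the partial images can be merged simply by sorting the slabs by depth along the viewing direction, so no node ever needs the whole dataset.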