نجوم
27-07-2008, 01:09 PM
Human Genome Project

What is the Human Genome Project?
Begun formally in 1990, the U.S. Human Genome Project was a 13-year effort coordinated by the U.S. Department of Energy and the National Institutes of Health. The project was originally planned to last 15 years, but rapid technological advances brought the completion date forward to 2003. Project goals were to:
• identify all the approximately 20,000-25,000 genes in human DNA,
• determine the sequences of the 3 billion chemical base pairs that make up human DNA,
• store this information in databases,
• improve tools for data analysis,
• transfer related technologies to the private sector, and
• address the ethical, legal, and social issues (ELSI) that may arise from the project.
To help achieve these goals, researchers also studied the genetic makeup of several nonhuman organisms. These include the common human gut bacterium Escherichia coli, the fruit fly, and the laboratory mouse.
A unique aspect of the U.S. Human Genome Project is that it was the first large scientific undertaking to address potential ELSI implications arising from project data.
Another important feature of the project was the federal government's long-standing dedication to the transfer of technology to the private sector. By licensing technologies to private companies and awarding grants for innovative research, the project catalyzed the multibillion-dollar U.S. biotechnology industry and fostered the development of new medical applications.
Sequence and analysis of the human genome working draft were published in the February 2001 and April 2003 issues of Nature and Science.
What's a genome? And why is it important?
• A genome is all the DNA in an organism, including its genes. Genes carry information for making all the proteins required by all organisms. These proteins determine, among other things, how the organism looks, how well its body metabolizes food or fights infection, and sometimes even how it behaves.
• DNA is made up of four similar chemicals (called bases and abbreviated A, T, C, and G) that are repeated millions or billions of times throughout a genome. The human genome, for example, has 3 billion pairs of bases.
• The particular order of As, Ts, Cs, and Gs is extremely important. The order underlies all of life's diversity, even dictating whether an organism is human or another species such as yeast, rice, or fruit fly, all of which have their own genomes and are themselves the focus of genome projects. Because all organisms are related through similarities in DNA sequences, insights gained from nonhuman genomes often lead to new knowledge about human biology.

How big is the human genome?
The human genome is made up of DNA, which has four different chemical building blocks. These are called bases and abbreviated A, T, C, and G. In the human genome, about 3 billion bases are arranged along the chromosomes in a particular order for each unique individual. To get an idea of the size of the human genome present in each of our cells, consider the following analogy: If the DNA sequence of the human genome were compiled in books, the equivalent of 200 volumes the size of a Manhattan telephone book (at 1000 pages each) would be needed to hold it all.
It would take about 9.5 years to read out loud (without stopping) the 3 billion bases in a person's genome sequence. This is calculated on a reading rate of 10 bases per second, equaling 600 bases/minute, 36,000 bases/hour, 864,000 bases/day, 315,360,000 bases/year.
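The reading-time figure above can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text (3 billion bases, 10 bases per second, a 365-day year); the variable names are illustrative.

```python
# Reading-rate arithmetic, using only the figures quoted in the text.
GENOME_BASES = 3_000_000_000   # about 3 billion bases
BASES_PER_SECOND = 10          # reading rate assumed in the text

per_minute = BASES_PER_SECOND * 60   # 600
per_hour = per_minute * 60           # 36,000
per_day = per_hour * 24              # 864,000
per_year = per_day * 365             # 315,360,000 (non-leap year)

years_to_read = GENOME_BASES / per_year

print(f"{per_year:,} bases per year")
print(f"about {years_to_read:.1f} years of nonstop reading")
```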
Storing all this information is a great challenge to computer experts known as bioinformatics specialists. One million bases (called a megabase and abbreviated Mb) of DNA sequence data is roughly equivalent to 1 megabyte of computer data storage space. Since the human genome is 3 billion base pairs long, 3 gigabytes of computer data storage space are needed to store the entire genome. This includes nucleotide sequence data only and does not include data annotations and other information that can be associated with sequence data.
As time goes on, more annotations will be entered as a result of laboratory findings, literature searches, data analyses, personal communications, automated data-analysis programs, and auto annotators. These annotations associated with the sequence data will likely dwarf the amount of storage space actually taken up by the initial 3 billion nucleotide sequence. Of course, that's not much of a surprise because the sequence is merely one starting point for much deeper biological understanding!
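The storage estimate above works out to roughly one byte per base. As a rough sketch, the code below reproduces that figure and, purely as an assumption that goes beyond the original text, shows that a four-letter alphabet could in principle be packed into 2 bits per base. The function names and the A/C/G/T code assignment are hypothetical choices for illustration, not a standard file format.

```python
GENOME_BASES = 3_000_000_000  # about 3 billion bases, as in the text


def storage_bytes(n_bases: int, bits_per_base: int) -> int:
    """Bytes needed to hold n_bases at the given density, rounded up."""
    return (n_bases * bits_per_base + 7) // 8


# One byte per base, the rule of thumb used in the text (1 Mb ~ 1 megabyte).
plain = storage_bytes(GENOME_BASES, 8)    # 3,000,000,000 bytes

# Assumption: 2 bits per base, since four symbols need only two bits.
packed = storage_bytes(GENOME_BASES, 2)   # 750,000,000 bytes

# Hypothetical 2-bit code; any fixed assignment of the four bases would do.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq: str) -> bytes:
    """Pack a string of A/C/G/T into 2 bits per base, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


print(f"{plain:,} bytes at one byte per base")
print(f"{packed:,} bytes at two bits per base")
print(len(pack("ACGTACGT")), "bytes for an 8-base example")
```

Either way, this covers the raw sequence only; as the text notes, annotations are stored separately and are expected to take far more space.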
Why was the Department of Energy (DOE) involved in the Human Genome Project?
After the atomic bomb was developed and used, the U.S. Congress charged DOE's predecessor agencies (the Atomic Energy Commission and the Energy Research and Development Administration) with studying and analyzing genome structure, replication, damage, and repair and the consequences of genetic mutations, especially those caused by radiation and chemical by-products of energy production. From these studies grew the recognition that the best way to study these effects was to analyze the entire human genome to obtain a reference sequence. Planning began in 1986 for DOE's Human Genome Program and in 1987 for the National Institutes of Health's (NIH) program. The DOE-NIH U.S. Human Genome Project formally began October 1, 1990, after the first joint 5-year plan was written and a memorandum of understanding was signed between the two organizations.
Consistent with the goals of the Human Genome Project, the DOE Human Genome Program focused on the following:
• Mapping human chromosomes 2, 5, 11, X, 16, 19, and 21;
• Comparative studies between mouse and human genomes;
• Development of important biological resources for the Human Genome Project and the broader biomedical research communities, including purified DNA collections for each human chromosome and sequence-ready DNA;
• Technologies, instrumentation, and robotics for more efficient DNA sequencing;
• Development of analysis algorithms and integration of databases (informatics) for managing and interpreting genome data;
• Communicating about the Human Genome Project to those who would interpret it for various professions and ultimately for the public.
Another important DOE goal was to foster research into the ethical, legal, and social implications (ELSI) of genome research. The DOE Human Genome Program ELSI component and the data it generated concentrated on two main areas: (1) privacy and confidentiality of personal genetic information, including its accumulation in large, computerized databases and databanks; and (2) development of educational materials and activities in genome science and ELSI, including curricula and TV documentaries, workshops, and seminars for targeted audiences. Other areas of interest include data privacy arising from potential uses of genetic testing in the workplace and issues related to commercialization of genome research results and technology transfer.
The Human Genome Project was sometimes reported to have cost $3 billion. However, this figure refers to the total projected funding over a 13-year period (1990–2003) for a wide range of scientific activities related to genomics. These include studies of human diseases and of experimental organisms (such as bacteria, yeast, worms, flies, and mice); development of new technologies for biological and medical research; computational methods to analyze genomes; and ethical, legal, and social issues related to genetics. Human genome sequencing represents only a small fraction of the overall 13-year budget.
What DOE investments improved the efficiency of the Human Genome Project by reducing costs, speeding progress, and furthering technology?
Making the Project Possible

Its long-standing mission to understand and characterize the potential health risks posed by energy use and production led DOE to propose, in the mid-1980s, that all three billion bases of DNA from an "average" human should be sequenced. Technologies available before that time had not enabled the routine detection of extremely rare and often minute genetic changes resulting from radiation and chemical exposures.
The scientific foundation for DOE's Human Genome Initiative already existed at the national laboratories.
• DOE had a long history of conducting large multidisciplinary projects involving biologists, chemists, engineers, and mathematicians.
• GenBank, a DNA sequence repository, had been developed at Los Alamos National Laboratory (LANL) with DOE computer and data-management expertise. Today, GenBank, the world's principal DNA sequence database, resides at the National Library of Medicine.
• Chromosome-sorting capabilities essential to a genome initiative existed at LANL and Lawrence Livermore National Laboratory (LLNL). Using this technology, LANL and LLNL began the National Laboratory Gene Library Project, a collection of cloned DNAs from single human chromosomes.
In 1986, DOE became the first federal agency to announce and fund a genome program.

DNA Sequencers
Research on capillary-based DNA sequencing contributed to the development of the two major DNA sequencing machines: the Perkin-Elmer 3700 and the MegaBace DNA sequencers. The MegaBace DNA sequencer was developed initially with DOE funds by Dr. Richard Mathies at U.C. Berkeley. The Perkin-Elmer 3700 was based, in part, on DOE-funded research by Dr. Norman Dovichi at the University of Alberta. These high-throughput instruments were among the keys to the success of the genome project.
Fluorescent dyes

DNA sequencing originally used radiolabeled DNA subunits. DOE-funded research contributed to the development of fluorescent dyes that increased the accuracy and safety of DNA sequencing as well as the ability to automate the procedures.
DNA cloning vectors

Before large DNA molecules can be sequenced, they are cut into small pieces and multiplied, or cloned, into numerous copies using microbial-based "cloning" vectors. Today, the bacterial artificial chromosome (BAC) is the most commonly used vector for initial DNA amplification before sequencing. These cloning vectors were developed with DOE funds.
BAC-end sequencing

The widely agreed-upon strategy for sequencing the human genome is based on the use of BACs that carry fragments of human DNA from known locations in the genome. DOE-funded research at The Institute for Genomic Research in Rockville, Maryland, and at the University of Washington provided the sequencing community with a complete set of over 450,000 BAC-based genetic "markers" corresponding to a sequence tag every 3 to 4 kilobases across the entire human genome. These markers were needed to assemble both the draft and the final human DNA sequence.
GRAIL
GRAIL (Gene Recognition and Assembly Internet Link) is one of the most widely used computer programs for identifying potential genes in DNA sequence and for general DNA sequence analysis. This powerful analytical tool was developed with DOE funds by Dr. Ed Uberbacher at Oak Ridge National Laboratory. Although a number of gene-finding tools are now available for use, GRAIL led the way.
Reducing Costs and Speeding up Sequencing
The above technological developments dramatically decreased DNA sequencing's cost while increasing its speed and efficiency. For example, it took 4 years for the international Human Genome Project to produce the first billion base pairs of sequence and less than 4 months to produce the second billion base pairs. In the month of January 2003, the DOE team sequenced 1.5 billion bases. The cost of sequencing has dropped dramatically since the project began and is still dropping rapidly.
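To put the speed-up in numbers, here is a minimal sketch that uses only the figures quoted above. Because the second billion took "less than 4 months," the ratio is a lower bound. The per-day figure for January 2003 is simple division over a 31-day month and describes the DOE team alone, so it is not directly comparable to the international totals.

```python
BILLION = 1_000_000_000

# First billion base pairs: 4 years. Second billion: less than 4 months.
months_first = 4 * 12    # 48 months
months_second = 4        # upper bound, so the speed-up is a lower bound

min_speedup = months_first / months_second   # at least 12x faster

# January 2003: 1.5 billion bases from the DOE team in a 31-day month.
january_bases = 1_500_000_000
bases_per_day = january_bases // 31

print(f"second billion produced at least {min_speedup:.0f}x faster")
print(f"about {bases_per_day:,} bases per day in January 2003")
```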

Where can I find details about the Department of Energy's Human Genome Program and other DOE genome programs?
• GTL (Genomes to Life) is DOE's next step in genomics. It builds on data and resources from the Human Genome Project, the Microbial Genome Program, and systems biology. GTL will accelerate understanding of dynamic living systems for solutions to DOE mission challenges in energy and the environment.
What's Next?
Turning Genomics Vision Into Reality
In "A Vision for the Future of Genomics Research," published in the April 24, 2003 issue of the journal Nature, the National Human Genome Research Institute (NHGRI) details a myriad of research opportunities in the genome era. This backgrounder describes a few of the more visible, large-scale opportunities.
The International HapMap Project
Launched in October 2002 by NHGRI and its partners, the International HapMap Project has enlisted a worldwide consortium of scientists with the goal of producing the "next-generation" map of the human genome to speed the discovery of genes related to common illnesses such as asthma, cancer, diabetes and heart disease.
Expected to take three years to complete, the "HapMap" will chart genetic variation within the human genome at an unprecedented level of precision. By comparing genetic differences among individuals and identifying those specifically associated with a condition, consortium members believe they can create a tool to help researchers detect the genetic contributions to many diseases. Whereas the Human Genome Project provided the foundation on which researchers are making dramatic genetic discoveries, the HapMap will begin building the framework to make the results of genomic research applicable to individuals.
Encyclopedia of DNA Elements (ENCODE)
This NHGRI-led project is designed to develop efficient ways of identifying and precisely locating all of the protein-coding genes, non-protein-coding genes and other sequence-based, functional elements contained in the human DNA sequence. Creating this monumental reference work will help scientists mine and fully utilize the human sequence, gain a deeper understanding of human biology, predict potential disease risk, and develop new strategies for the prevention and treatment of disease.
The ENCODE project will begin as a pilot, in which participating research teams will work cooperatively to develop efficient, high-throughput methods for rigorously and fully analyzing a defined set of target regions comprising approximately 1 percent of the human genome. Analysis of this first 30 megabases (Mb) of human genome sequence will allow the project participants to test and compare a variety of existing and new technologies to find the functional elements in human DNA.
Chemical Genomics
NHGRI is exploring the acquisition and/or creation of publicly available libraries of organic chemical compounds, also referred to as small molecules, for use by basic scientists in their efforts to chart biological pathways. Such compounds have a number of attractive features for genome analysis, including their wide structural diversity, which mirrors the diversity of the genome; their ability in many cases to enter cells readily; and the fact that they can often serve as starting points for drug development. The use of these chemical compounds to probe gene function will complement more conventional nucleic acid approaches.
This initiative offers enormous potential. However, it is a fundamentally new approach to genomics, and largely new to basic biomedical research as a whole. As a result, substantial investments in physical and human capital will be needed. NHGRI is currently planning for these needs, which will include large libraries of chemical compounds (500,000 to 1,000,000 total); capacity for robotic-enabled, high-throughput screening; and medicinal chemistry to convert compounds identified through such screening into useful biological tools.
Genomes to Life
The Department of Energy's "Genomes to Life" program focuses on single-cell organisms, or microbes. The fundamental goal is to understand the intricate details of the life processes of microbes so well that computational models can be developed to accurately describe and predict their responses to changes in their environment.
"Genomes to Life" aims to understand the activities of single-cell organisms on three levels: the proteins and multi-molecular machines that perform most of the cell's work; the gene regulatory networks that control these processes; and microbial associations or communities in which groups of different microbes carry out fundamental functions in nature. Once researchers understand how life functions at the microbial level, they hope to use the capabilities of these organisms to help meet many of our national challenges in energy and the environment.
Structural Genomics Consortium
Structural genomics is the systematic, high-throughput generation of the three-dimensional structure of proteins. The ultimate goal for studying the structural genomics of any organism is the complete structural description of all proteins encoded by the genome of that organism. Such three-dimensional structures will be crucial for rational drug design, for diagnosis and treatment of disease, and for advancing our understanding of basic biology. A broad collection of structures will provide valuable biological information beyond that which can be obtained from individual structures.
To complement various international efforts in structural genomics, the United Kingdom's Wellcome Trust is considering creating a charitable organization, the Structural Genomics Consortium, with a group of pharmaceutical and other companies. The model for the new consortium would be the highly successful SNP Consortium formed in 1999 by the Trust and 12 companies to map human genetic variations, called single nucleotide polymorphisms (SNPs). Like the SNP Consortium, the Structural Genomics Consortium will develop pre-competitive data, placing all protein structures in public databases.


COPY RIGHT NUJOOM