RCSB PDB Newsletter
Number 34 -- July 2007

Published quarterly by the Research Collaboratory for Structural Bioinformatics Protein Data Bank

Weekly RCSB PDB news is published at www.pdb.org

To change your subscription options, please visit lists.sdsc.edu/mailman/listinfo.cgi/rcsb-news

-----------------------------------------
TABLE OF CONTENTS

Message from the RCSB PDB

Remediation Project Details
  Accessing the PDB Archive
  Remediation of the Entire PDB Archive
  The Chemical Component Dictionary
  Accessing the Remediated Data from the RCSB PDB

Data Deposition and Processing
  Depositing NMR Structures with ADIT-NMR
  PDB Focus: First Time ADIT Depositors
  Deposition Statistics

Data Query, Reporting, and Access
  Exploring Ligands in the RCSB PDB Database
  Using Simple Viewer to Visualize Functional Biological Units
  Using PubMed Abstracts to Search the PDB
  Website Statistics

Outreach and Education
  Help Desks ...
  Structural Genomics News
  Currently Available Job Openings
  Molecules of the Quarter

PDB Education Corner: Structures and Other NIGMS Booklets Make Science Accessible

PDB Community Focus: Alex Wlodawer

Statement of Support, Partners, Leadership Team

Snapshot

--------------------------------------------
MESSAGE FROM THE RCSB PDB

The wwPDB is pleased to announce that the PDB archive (ftp://ftp.wwpdb.org/) now consists of remediated data. In the past, querying across the complete PDB archive has been limited by missing, erroneous, and inconsistently reported data, nomenclature, and other annotations. The evolution of experimental methods and the techniques used to process these data has introduced various inconsistencies into the PDB archive. Over the years, wwPDB members -- the RCSB PDB, MSD-EBI, PDBj, and BMRB -- have worked together to ensure the uniformity of entries archived in the PDB.
The entire archive has now been reviewed and remediated with the objectives of improving the detailed chemical description of non-polymer and monomer chemical components; standardizing atom nomenclature; updating sequence database references and taxonomies; resolving any remaining differences between the chemical and macromolecular sequences; improving the representation of viruses; and verifying primary citation assignments. As these corrections were being incorporated, the wwPDB worked with software developers and database maintainers to ensure a smooth transition. This year, the remediated archive was made public for testing, and the resulting feedback was incorporated into the archive. The wwPDB greatly appreciates the efforts of the many people who have taken the time to review and work with these data files, and the advice and discussions with our advisory committees. Descriptions of this remediation process are available in this newsletter and at the wwPDB site at www.wwpdb.org.

--------------------------------------------
REMEDIATION PROJECT DETAILS

Accessing the PDB Archive

The PDB archive has been remediated by wwPDB members the RCSB PDB, MSD-EBI, PDBj, and the BMRB. It can be accessed from ftp://ftp.wwpdb.org. New files processed and released into the archive by the wwPDB sites will reflect the new features incorporated as part of this project, including standardized IUPAC nomenclature for all chemical components*. Users may have to download new software to properly view the files with the new nomenclature (e.g., RasMol, Chimera)**. Links to resources are available at www.wwpdb.org. A snapshot of the unremediated PDB archive (as of July 31, 2007) is available at ftp://snapshots.rcsb.org.

*see J.L. Markley et al. (1998) Pure & Applied Chem. 70:117-142.
**see sourceforge.net/projects/openrasmol and www.cgl.ucsf.edu/chimera

Remediation of the Entire PDB Archive

Highlights of the types of information improved through remediation include:

* Sequence: updated references to databases and taxonomies; resolved differences between chemical and macromolecular sequences
* Citation: verified and updated primary citation assignments
* Assembly and virus information: improved representation of deposited and experimental coordinate frames, symmetry, and frame transformations
* Nucleic acid labeling: deoxy and ribose nucleotides assigned separate chemical definitions
* Beamline data: beamline and synchrotron facility names have been made consistent with BioSync
* Chemical components: standardization of chemistry and nomenclature in monomers and ligands

Remediated data are available for each PDB entry in three formats:

* mmCIF (mmcif.pdb.org) -- all remediation work was done using the PDB Exchange Dictionary (PDBx) that follows the mmCIF syntax
* PDBML-XML (pdbml.pdb.org) -- a direct translation from the files in mmCIF format
* PDB File Format (wwpdb.org) version 3.0 -- this version of the file format incorporates standardized atom nomenclature, and distinguishes deoxyribonucleic acid from ribonucleic acid

The Chemical Component Dictionary

The Chemical Component Dictionary (formerly known as the "HET dictionary") describes all residues in the PDB, standard and non-standard, and all small molecules. It has been remediated to address the inconsistencies in older dictionary entries that resulted in valence problems, missing model coordinates, redundant ligands, and more. The full Chemical Component Dictionary and the companion Amino Acid Variants Dictionary can be downloaded from remediation.wwpdb.org/downloads.html.

Accessing the Remediated Data from the RCSB PDB Website

The latest release of the RCSB PDB website utilizes the data from the wwPDB Remediation Project.
This new site offers:

* Improved searching and reporting capabilities
* Updated sequence references
* Updated primary citation information and links
* Better representations for complex assemblies (such as viruses)
* Access to remediation data and pre-remediation data
* Advanced access to ligand information
* Enhanced sequence details page for each structure

For More Information

A variety of documents describing the remediation project are available at wwpdb.org, including format descriptions and further information about what was changed in these files. Links to software resources are also provided. Questions and comments about the remediated data should be sent to info@wwpdb.org.

--------------------------------------------
DATA DEPOSITION AND PROCESSING

Depositing NMR Structures with ADIT-NMR

Users can now deposit NMR structure and experimental data using one tool: ADIT-NMR. Available from batfish.bmrb.wisc.edu/bmrb-adit and nmradit.protein.osaka-u.ac.jp/bmrb-adit, ADIT-NMR can be used to precheck, validate, and deposit NMR structures. Coordinates and constraint data will be processed and released by the RCSB PDB and PDBj, while other NMR spectral data (such as chemical shifts, coupling constants, and relaxation parameters) will be processed and archived by BMRB.

All new NMR depositions at RCSB PDB and PDBj will be submitted using ADIT-NMR. The assignment of PDB/BMRB IDs and the movement of data files between sites is fully automated. More than 100 joint depositions have already been processed through this new system. Any unfinished NMR deposition sessions that were started using ADIT before May 16, 2007 will continue to be available at that site.

Other tools for NMR depositions include pdb_extract (pdb-extract.rcsb.org) and the Validation Server at the RCSB PDB (deposit.pdb.org/validate) and PDBj (pdbdep.protein.osaka-u.ac.jp/validate). NMR structures may also be deposited using AutoDep at MSD-EBI.
PDB Focus: First Time ADIT Depositors

There are a few steps a depositor can take to make the process of depositing a structure to the PDB quick, easy, and accurate! This is an iterative process. If you encounter problems at a particular step, please make the correction(s) and go through the steps again.

1. Use the pdb_extract Program Suite to extract information needed for deposition from output files produced by many structure determination applications.
2. Check your structure with the Validation Server to ensure that the data being deposited are accurate and reflect what you intend to submit.
3. Compare your sequence with sequence database references. Any necessary corrections can then be made to your sequence and coordinates (try BLAST at www.ebi.ac.uk/blast2 or www.ncbi.nih.gov/BLAST).
4. Use Ligand Depot to find the proper codes for existing ligands, to link to other entries with a particular ligand, and to search for substructures. If a ligand related to a deposition is not in Ligand Depot, please email the chemical diagram, name, and formula to deposit@deposit.rcsb.org.
5. Deposit your structure using ADIT, using its editor to add any missing information to the deposition.

For a detailed packet of information about first-time deposition, including reprints about validation and Ligand Depot, please send your postal address to info@rcsb.org with the subject line "first time depositor packet".

Deposition Statistics

In the first half of 2007, 4535 structures were deposited to the PDB archive. The entries were processed by the wwPDB. Of the structures deposited, 63.8% were deposited with a release status of "hold until publication"; 17.9% were released as soon as annotation of the entry was complete; and 18.3% were held until a particular date. 83.7% of these entries were determined by X-ray crystallographic methods; 16.1% were determined by NMR methods. 84.1% of these structures were deposited with experimental data.
93.7% of the crystal structures were deposited with structure factors; 34.7% of NMR structures were deposited with restraints.

--------------------------------------------
DATA QUERY, REPORTING, AND ACCESS

Exploring Ligands in the RCSB PDB Database

A ligand name can be entered in the keyword text search at the top of any page from the RCSB PDB website. The Advanced Search query engine can also be used to search for structures based on a ligand's name, ID code, or SMILES string. In addition to reviewing the structures that match the given query constraints, users can select the Ligand Hits tab, which lists the ligands known to interact with the structures matching the query. The Ligand Hits tab also offers a gallery view of ligand images.

Selecting one of the ligands from this tab returns a summary page with chemical and structural details. The page offers interactive and static views of the ligand. Users may also download "model" coordinates (the experimental coordinates from the first deposition of the ligand) and "ideal" coordinates (generated from the model coordinates and their connectivity) in a variety of formats including CIF, XML, SDF, and PDB.

Using Simple Viewer to Visualize Functional Biological Units

When crystallographic structures are deposited in the PDB, the primary coordinate file generally contains one asymmetric unit -- a concept that has applicability only to crystallography. For many of these structures, the asymmetric unit represents the functional biological molecule. In other cases, the biological unit can be generated from the asymmetric unit. In these cases, Protein Workshop can be used to display the asymmetric unit and Simple Viewer can be used to explore the functional biological unit of a structure. Simple Viewer can rotate a structure, zoom in and out, and then save the view of the biological unit as an image file. The Simple Viewer tool can be launched from the Display Options found on an entry's Structure Summary page.
Simple Viewer requires Java version 1.4 or greater.

Using PubMed Abstracts to Search the PDB

PubMed abstracts are accessible from a published entry's Structure Summary page. The Abstract link returns a page with the article title, abstract, keywords, authors, organizational affiliation, journal, and PubMed identifier. The PubMed abstract at NCBI can also be viewed by clicking on the icon next to Abstract. The text box at the bottom of the Abstract page can be used to search for related structures in the PDB using any word in the abstract or keyword fields. Terms can be entered into the text box either by typing the word manually or by clicking the mouse over any word in the abstract or the keyword fields.

Website Statistics

Access statistics for the second quarter of 2007 are given below for the RCSB PDB website at www.pdb.org.

Month       Unique Visitors   Number of Visits   Bandwidth
Apr 2007    122,991           293,105            503.15 GB
May 2007    123,069           300,115            504.13 GB
Jun 2007    104,693           258,107            424.74 GB

--------------------------------------------
OUTREACH AND EDUCATION

Help Desks Answer Questions about Remediated Data, the RCSB PDB Website, Deposition, and More

Electronic help desks are available to support users exploring PDB data.

info@wwpdb.org is available to address questions regarding the remediated PDB archive. The wwPDB appreciates the feedback from users who have examined the Chemical Component Dictionary and files in the remediated archive.

deposit@deposit.rcsb.org answers questions about the deposition and annotation process at the RCSB PDB. Support pages at deposit.pdb.org include a file deposition and release FAQ, an overview of software tools, and tutorials for using ADIT, pdb_extract, the Validation Server, and Ligand Depot.

info@rcsb.org responds to requests relating to the navigation of the RCSB PDB website.
Questions about searching, reporting, and using all of the resources available from the RCSB PDB should be sent to this address.

The RCSB PDB help system launches a separate browser window to allow users to access the help information and the website at the same time. It offers detailed topics (including Getting Started, Download Files, Search/Browse the Database, and Results), an index, glossary, and search engine. Select any of the buttons on the website to launch this system.

Structural Genomics News: PSI Highlighted Structures, Technical Advances, and Assessment

The RCSB PDB Structural Genomics Information Portal (sg.pdb.org) offers online tools, summary reports, and target information related to structural genomics. This site also links to new features from the Protein Structure Initiative (PSI). The PSI Structures of the Month highlights recent structures solved by the current centers, while the PSI Technical Highlight describes methods developed by the effort's researchers to speed the structure determination process. These features include links to related information, such as PDB structure summaries, published papers, and center websites.

To date, the overall PSI effort has resulted in nearly 2,500 structures of which about 70 percent share less than 30 percent of their sequence with other known proteins. Methods and tools developed during the first phase of the PSI have been incorporated into the centers' structural genomics pipelines and adopted by structural biology labs throughout the world.

Currently Available Job Openings

The RCSB PDB is currently looking for new people to join our team. Available positions include Scientific Lead, Biochemical Information & Annotation Specialist, Web Developer/Database Programmer, and a Database Application Programmer. For more details, please see www.pdb.org

Molecules of the Quarter

The Molecule of the Month series explores the function and significance of selected biological macromolecules for a general audience.
The molecules featured this quarter were clathrin, aconitase/iron regulatory protein 1, and fatty acid synthase. The complete features are accessible from www.pdb.org.

--------------------------------------------
PDB EDUCATION CORNER: Structures and Other NIGMS Booklets Make Science Accessible

By Alisa Zapp Machalek, a science writer at the National Institute of General Medical Sciences (NIGMS) at NIH

As a scientist, you probably don't need to be convinced of the value, importance, and beauty of molecular structures or the thrill of studying them. But try explaining it to the public -- or to teenagers. That's just what The Structures of Life seeks to do. This free science education booklet is published by the National Institute of General Medical Sciences (NIGMS), a part of the National Institutes of Health that supports a good chunk of the world's structural biology research. Naturally, the Protein Data Bank is featured throughout the booklet, both as a source of several images and as the repository into which structural biologists deposit their data to make them freely available to the scientific community.

Why does NIGMS produce science education materials?

The Structures of Life and our many other science education materials help NIGMS show the public how their tax dollars are leading to research advances. Improving K-12 science education in America is important for many reasons, says Jeremy M. Berg, Ph.D., NIGMS director. "Of course, part of it is long-range workforce development," he says. "But it's broader than that. The ability to think critically and to solve problems is hugely important for all aspects of society. Many have cited the uncomfortably low math and science scores of American students*** as evidence that, to remain leaders in the global marketplace, we will need to improve K-12 science education." Our goal is for NIGMS educational materials to contribute to this effort.
We try to encourage an understanding and appreciation of science in all readers by showcasing scientists doing cutting-edge research and explaining its potential implications. We hope that the materials help inspire some readers to pursue careers in biomedical research. Because role models can be pivotal for young people choosing and pursuing careers, we feature male and female scientists from diverse backgrounds, geographic locations, career stages, and scientific fields. We also strive to show that scientists have full, interesting lives and unique personalities. In our semi-annual magazine Findings, we've written about a crystallographer whose clarinet skill landed him in Carnegie Hall, an NMR spectroscopist who is also a former professional basketball player, a computational biologist who is an expert mountain climber, and many others. To increase understanding of the nature and importance of basic, untargeted research, we use examples from areas of science within the NIGMS mission, including structural biology, computational biology, cell biology, genetics, pharmacology, and chemistry.

Who uses the materials?

Our booklets are used by teachers, homeschoolers, museums and science personnel, student workshop leaders, science curriculum advisors, and teacher training programs around the country. Most of the materials are geared for a high school audience, but the publications are also used in some advanced middle school classes and introductory college courses. Here are a few examples of how NIGMS science education publications have been used recently.

* The Massachusetts Institute of Technology distributes NIGMS booklets to the teachers in its Summer Teacher Workshop as examples of exemplary supplementary resources and uses PowerPoint slides from Findings to instruct teachers how to incorporate multimedia into their lessons.
* The Arizona Biomedical Research Commission uses the publications to educate its members about the science underlying the grant applications they are reviewing for funding.
* The Distance Learning Unit in Queensland, Australia included part of an NIGMS booklet in its Senior Biology curriculum, which is distributed on CD-ROM and posted online for students who can't attend school because they live in remote areas or are disadvantaged by personal circumstances.

What is available and how can I get them?

In addition to The Structures of Life and Findings, NIGMS publishes booklets on genetics, pharmacology, cell biology, and biochemistry; a monthly electronic newsletter called Biomedical Beat; and a number of fact sheets. We also offer a small but growing collection of images and other multimedia resources on our website. Our newest publication, available this summer, is called Computing Life and covers computational biology. If you have suggestions about how to improve or use any of our publications, we'd love to hear from you. Contact the NIGMS Office of Communications and Public Liaison at info@nigms.nih.gov or 301-496-7301.

For related resources, see the online version of the newsletter at www.pdb.org

***See results from PISA (Programme for International Student Assessment) at www.pisa.oecd.org. PISA is run by the Organisation for Economic Co-operation and Development, a multinational body dedicated to building strong economies worldwide. PISA tests reading, math, and science skills of 15-year-olds around the globe. In 2003, it also tested real-world problem solving skills.

--------------------------------------------
PDB COMMUNITY FOCUS: Alex Wlodawer, Macromolecular Crystallography Laboratory, National Cancer Institute

Dr. Wlodawer is Chief, Macromolecular Crystallography Laboratory and Chief, Protein Structure Section at the National Cancer Institute in Frederick, MD.
Q: In Acta D, you recently expressed the point of view that experimental data for structures solved by X-ray and NMR should be deposited and released under the same policies as coordinate files (Acta Crystallogr. 2007 D63:421-423). Why do you think this is so important?

A: The question of which crystallographic results should be deposited in PDB and on what schedule has been asked many times, but still does not have the final answer. Here, the experimental data refer to the processed data, e.g., the structure factors in X-ray diffraction, not the raw images. The rules changed very substantially about 8 years ago, when the International Union of Crystallography modified its deposition regulations, and their recommendations became generally accepted by most funding agencies and by scientific journals. The coordinates of published structures must now be deposited in PDB and released upon publication of the relevant papers. However, although structure factors must also be deposited, their release can be delayed by up to 6 months. In a recent Letter to the Editor published in Acta Cryst. D, I proposed that such a delay should be disallowed. I feel very strongly that the coordinates and structure factors are a matched pair, and one needs the other. The heart of the matter is that scientific results should be useful for the community (I consider description of a structure without the availability of coordinates to be advertising and not science), and verifiable (how can we prove that the structure is correct if not by comparison with the structure factors?). Let me give an example from a paper which I recently reviewed. The authors presented a series of structures of enzyme-inhibitor complexes, with one of the structures repeating a previously published experiment. However, the conformation of the inhibitor reported in the new paper was very different, changing in a substantial way the interpretation of the enzymatic mechanism.
Unfortunately, with the diffraction data for the original structure never deposited (against the journal rules!), it was not possible to verify if the differences were real (and thus significant for the understanding of how the enzyme works) or due to errors in the interpretation present in the original paper. This is just one example, but I could cite many more. With the acceleration of the process of structure solution we should not have to wait half a year to verify what we read in the papers, if any doubts are raised. And let us remember that the most interesting results are often the ones that are most controversial.

Q: As a member of several editorial boards, what types of information are you looking for when reviewing macromolecular structure papers? Has a journal published a paper, only to be surprised by the validation remarks in the corresponding PDB file? What types of information do you think would be valuable to a reviewer of a paper describing a macromolecular structure?

A: Oh boy, have we been surprised! I have seen many papers, often published in Science and Nature (these journals seem to care more about getting the scoop than getting it right) where a look at the PDB files would bring a very unpleasant surprise about the quality of structural work. Whenever I review a paper that describes a structure I look first to see if the coordinates have already been deposited. However, that still tells me very little of what is hidden beyond the accession code. There has been much discussion of whether the reviewers should be given the actual coordinates. In an ideal world they should, but even I am a realist, and I know what is possible, and what is not. However, a minimum of what I would like to see as an Editor is the header of the PDB file and a brief version of the validation report.
The former will tell me if the authors were lazy and did not bother to calculate the rmsd's, or disclose what programs were used to solve the structure, where data were collected, etc. Many PDB data sets have all such records populated by a uniform answer NULL. A short validation report would tell me if the structure might have some serious problems. If I see a D amino acid in an otherwise normal protein, or interatomic distances of 0.1 Angstrom, I would like at least to ask the authors some questions before accepting their otherwise brilliant paper.

Q: Compared to pharmaceutical companies, what is the National Cancer Institute's approach to focusing structural studies to cancer?

A: I am not allowed to talk about the policy of NCI without obtaining all sorts of permissions, so I better not delve too deeply into this matter. In general, it is not the mission of NCI to create drugs, but rather to create knowledge that might be the basis for drug development by pharmaceutical companies. We have fewer chemists than even some startup biotech/pharma companies, so not too much should be expected of us in this area. However, we do have some superb scholars doing fundamental research who generate data allowing understanding of the basis of cancer, delineating novel drug targets, creating new treatment methods and protocols, etc. Thus our role and that of the pharmaceutical companies should be considered to be different, but complementary.

Q: What are your thoughts on the current state of crystallographic education?

A: What education? As far as I know, there is none. I do not believe that crystallography is still taught as a discipline, at least in the United States. Whether students will be exposed to it rigorously depends entirely on a good will of a faculty member old enough to know what he/she teaches.
I am afraid that the education of most young people that actually solve crystal structures is limited to reading the manuals for HKL2000, CCP4, SHELX, COOT, or other black boxes. I am really afraid that when my generation retires, there will be few of the younger people who will be able not only to solve structures, but also understand the methods and develop them further. Hopefully I am wrong (happens often to me), but certainly I am not optimistic in this respect.

----------------------------------------
STATEMENT OF SUPPORT

The RCSB PDB is supported by funds from the National Science Foundation, the National Institute of General Medical Sciences, the Office of Science, Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Center for Research Resources, the National Institute of Biomedical Imaging and Bioengineering, the National Institute of Neurological Disorders and Stroke, and the National Institute of Diabetes & Digestive & Kidney Diseases.

The RCSB PDB is managed by two partner sites of the Research Collaboratory for Structural Bioinformatics:

RUTGERS
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
610 Taylor Road
Piscataway, NJ 08854-8087

SDSC/Skaggs/UCSD
San Diego Supercomputer Center and the Skaggs School of Pharmacy and Pharmaceutical Sciences
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093-0537

RCSB PDB LEADERSHIP TEAM

Dr. Helen M. Berman - Director
Rutgers University
berman@rcsb.rutgers.edu

Dr. Philip E. Bourne - Co-Director
SDSC/Skaggs/UCSD
bourne@sdsc.edu

A list of current RCSB PDB Team Members is available from the website.
The RCSB PDB is a member of the Worldwide PDB (www.wwpdb.org)

--------------------------------------------
SNAPSHOT
July 1, 2007

44320 released atomic coordinate entries

* Molecule Type
  40730 proteins, peptides, and viruses
  1760 nucleic acids
  1795 protein/nucleic acid complexes
  35 other

* Experimental Technique
  37716 X-ray
  6367 NMR
  149 electron microscopy
  88 other

27026 structure factor files
3512 NMR restraint files
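
Readers who want to cross-check the snapshot figures above can do so with a few lines of Python. The sketch below is illustrative only: the counts are copied from the snapshot, while the variable and function names are hypothetical and not part of any RCSB PDB tool. It confirms that both breakdowns add up to the total number of released entries and prints each technique's share of the archive.

```python
# Counts copied from the July 1, 2007 snapshot; all names here are illustrative.
TOTAL_ENTRIES = 44320

by_molecule_type = {
    "proteins, peptides, and viruses": 40730,
    "nucleic acids": 1760,
    "protein/nucleic acid complexes": 1795,
    "other": 35,
}

by_technique = {
    "X-ray": 37716,
    "NMR": 6367,
    "electron microscopy": 149,
    "other": 88,
}


def share(count, total=TOTAL_ENTRIES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Each breakdown should account for every released entry.
molecule_total = sum(by_molecule_type.values())
technique_total = sum(by_technique.values())

print(f"molecule type total: {molecule_total}")
print(f"technique total:     {technique_total}")

for label, count in by_technique.items():
    print(f"{label}: {count} entries ({share(count)}% of archive)")
```

The same pattern could be applied to the deposition statistics earlier in the newsletter, provided the underlying counts (rather than only percentages) are available.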