RCSB PDB Newsletter Number 40 -- January 2009

Published quarterly by the Research Collaboratory for Structural Bioinformatics Protein Data Bank

Weekly RCSB PDB news is published at www.pdb.org

To change your subscription options, please visit lists.sdsc.edu/mailman/listinfo.cgi/rcsb-news

-----------------------------------------
TABLE OF CONTENTS

Message from the RCSB PDB
Data Deposition and Processing
Data Query, Reporting, and Access
Outreach and Education
Education Corner:
PDB Community Focus:
Statement of Support, Partners, Leadership Team
Snapshot

--------------------------------------------
MESSAGE FROM THE RCSB PDB

In 2008, the RCSB PDB reached several milestones and released many new resources. The 50,000th structure was released in the PDB archive in April, and by the end of December, more than 55,000 structures were available. All of these structures offer opportunities for learning, whether through their novelty, complexity, or even their similarity with other structures.

April's Molecule of the Month feature on adrenergic receptors was the 100th installment of the series. Since January 2000, Molecule of the Month has explored the structure and function of proteins and nucleic acids found in the PDB archive such as transfer RNA, anthrax toxin, and multidrug resistance transporters. By highlighting these structures, this ongoing feature provides a good entry point for navigating through all of the structures available in the PDB archive.

Another educational resource, the recently released Looking at Structures, is intended to help researchers and educators get the most out of the PDB archive. Broad topics include how to understand PDB data, how to visualize structures, how to read coordinate files, and potential challenges to exploring the archive.

For depositors, a variety of tools were released to help streamline the deposition process.
Resources such as SF-Tool, which validates and translates structure factor files, and Ligand Expo, which can be used to search and build chemical components, join proven resources such as pdb_extract and ADIT. The wwPDB's publication of the Comprehensive Format Guide Version 3.2 marks another achievement towards the standardization of the archive.

Website features, including the enhanced RSS feed, 3D views of domain information, and Advanced Search, offer a diverse tool set for accessing PDB data. These features were all developed with input and feedback from our diverse user community. We look forward to this continued collaboration in 2009.

--------------------------------------------
DATA DEPOSITION AND PROCESSING

PDB Archive Version 3.15 to be Released

A new standardized version of the PDB archive will be available from ftp://ftp.wwpdb.org in early 2009. The date will be announced at wwpdb.org.

As of December 2, 2008, all new PDB releases follow PDB File Format Contents Guide Version 3.20. With the new version of the archive, all entries released prior to December 2, 2008 will be re-released as PDB Format Version 3.15 files. This release will overwrite all existing files. A snapshot of the archive before this release will be available from ftp://snapshots.wwpdb.org/.

Tools for downloading the archive can be found at www.wwpdb.org/downloads.html. For file format documentation, please see www.wwpdb.org/docs.html. Questions may be sent to info@wwpdb.org.

Tips for Depositing Multiple Related Structures using ADIT/ADIT-NMR

When depositing many structures that are related to one another, there are a few ways to make the ADIT/ADIT-NMR deposition process simpler:

* Use pdb_extract when preparing structures solved using X-ray crystallography or NMR. Not only does pdb_extract minimize the amount of manual typing needed during the deposition process, it also utilizes an author information form that can be filled out just one time for use with multiple entries.
pdb_extract takes information about data collection, phasing, density modification, and the final structure refinement from the output files and log files produced by the various applications used for structure determination. The collected information is organized into a file ready for deposition using ADIT/ADIT-NMR. The author information form in pdb_extract contains author names, citation information, protein names and source--the types of information that are repeated in multiple related entries. This form can be filled out once and used with pdb_extract to prepare several structures for deposition.

* For structures solved by other experimental methods, first deposit one representative structure. After it has been annotated and processed, use this finalized entry as a template for the related depositions by replacing the coordinates and updating information in the PDB or mmCIF file as necessary.

* If the structures have bound ligands, drugs, or inhibitors, please check Ligand Expo for matching chemical components. If a match is found, use that corresponding ID code for the component in your coordinates. If a match is not found, choose a new three-character code for the component, and upload the chemical name and a file showing the chemical drawing for the new component into the Ligand Information section of ADIT/ADIT-NMR.

These resources and more can be found at www.pdb.org.

2008 Deposition Statistics

In 2008, 7043 experimentally-determined structures were deposited to the PDB archive. The entries were processed by wwPDB teams at the RCSB PDB, PDBe, and PDBj.

Of the structures deposited in 2008, 75.7% were deposited with a release status of "hold until publication"; 19.5% were released as soon as annotation of the entry was complete; and 4.8% were held until a particular date. 90.9% of these entries were determined by X-ray crystallographic methods; 8.1% were determined by NMR methods.
Since February 1, 2008, depositing structure factor amplitudes/intensities (for crystal structures) and restraints (for NMR structures) has been a mandatory requirement for PDB deposition. As a result, 98.7% of the 2008 depositions were deposited with experimental data.

Also in 2008, 7072 structures were released into the archive.

--------------------------------------------
DATA QUERY, REPORTING, AND ACCESS

Website Statistics

2008 access statistics for www.pdb.org are given below. Download statistics are available from www.wwpdb.org.

Month   Unique Visitors   No. Visits   Bandwidth
Jan     128781            319459       426.87 GB
Feb     139444            338946       567.18 GB
Mar     152264            361999       642.98 GB
Apr     134119            309222       585.77 GB
May     123862            286612       607.73 GB
Jun     132168            317814       651.02 GB
Jul     161567            355065       636.25 GB
Aug     133412            296024       514.27 GB
Sep     168114            366983       631.13 GB
Oct     178581            404453       843.63 GB
Nov     173891            381565       728.30 GB
Dec     141294            303359       491.65 GB

View Domain Annotations in 3D

Sequence Details pages for all protein structures now include a Jmol view of the structure that can display domain annotations from SCOP, CATH, DP, PDP, Pfam, and InterPro.

To activate this view from a structure summary page, first select the Sequence Details tab. The default view displays a 2D graphical representation of the UniProt, PDB-ATOM and PDB-SEQRES sequences. Users can also select third-party domain annotations from this 2D image to appear with the corresponding structure in a Jmol viewer.

To view these annotations mapped onto the 3D structure, select [show 3D in Jmol] from the top of the page. Then, click on any of the domains on the sequence view. The corresponding colors for that domain will appear in the 3D Jmol viewer. To change the annotations shown in Jmol, click on a different annotation in the 2D view.

By default, the Jmol window stays positioned at the top of the page. Select [dynamic Jmol position] to have the Jmol viewer adjust so that it is always at the top right of the page as you scroll down.

Getting Started with the RCSB PDB Website

Not sure how to find what you're looking for? To help users access all of the data and related resources available from the RCSB PDB website, the Getting Started page has been updated.

This introduction offers a quick start to using the website and explains the left-hand menu and the tabbed navigation system. For example, selecting each tab offers rich ways of exploring individual structures and search result sets. The left-hand menus organize resources by topic.

The Getting Started page is available from the bottom of www.pdb.org.

Browser Check for Compatibility with Website Features

Is your web browser configured to fully utilize RCSB PDB website features such as changing menus, temporarily stored queries, and Advanced Search? Click on the browser check page from the bottom of www.pdb.org to find out.

Most modern browsers are fully supported. Users may encounter difficulty in certain portions of the site when using unsupported browsers or when different options are turned off. The browser check page reports if there are any problems with your browser or browser settings, and provides instructions if changes are needed.

Any other questions or problems? Please let our help desk know at info@rcsb.org.

--------------------------------------------
OUTREACH AND EDUCATION

Meetings and Presentations

The RCSB PDB has been participating in a wide variety of meetings. For a full list, please see the January 2009 newsletter at www.pdb.org.

Looking at Structures: A Resource for Learning About PDB Data

Where are all the hydrogen atoms in this file? Should I care about the R-factor? Why are there 20 overlapped structures in my file?

These questions and many others are explored in the RCSB PDB's new Looking at Structures online resource. Using text, images, and interactive Jmols, Looking at Structures intends to help researchers and educators get the most out of the PDB archive.
Broad topics include how to understand PDB data, how to visualize structures, how to read coordinate files, and potential challenges in exploring the archive. A Table of Contents appears on the right side of every page so at any time, users can access the individual pages: Biological Units, Dealing with Coordinates, Methods for Determining Structure, Missing Coordinates and Biological Units, Molecular Graphics Programs, Resolution, and R-value and R-free.

Looking at Structures is available from the General Education section of the left-hand menu at www.pdb.org.

Publications: 2008 Annual Report and More

The 2008 Annual Report features current progress and accomplishments, and explores the RCSB PDB's different activities in data deposition, query, and education. The 2008 report highlights milestones, such as the publication of the 100th installment of the Molecule of the Month, and online resources such as Advanced Search.

This publication is currently being distributed to the diverse community of PDB users in academia, industry, and education. If you would also like a printed copy, please send your postal address to info@rcsb.org.

Interested in an Education Corner from an older newsletter? Want to know what papers have been published that discuss the RCSB PDB project? Looking for a flyer to guide you through the deposition process?

The News and Publications page offers access to all RCSB PDB publications. Located in the General Information section of the left-hand menu, this page archives our Annual Reports on the history, mission, and yearly accomplishments of the project; publications in peer-reviewed journals; weekly news items about recent features and upcoming events; the quarterly newsletter; and more.

Requests for printed copies may also be sent to info@rcsb.org.

--------------------------------------------
EDUCATION CORNER: The Science Learning Center at Brookhaven National Laboratory
by Bernadette Uzzi, Coordinator

Many science educators face the challenge of successfully making the connection between what is learned in a classroom and real science. At Brookhaven National Laboratory's Science Learning Center, the two go hand in hand; its science-based educational facility is located within a world-class scientific research facility.

The Science Learning Center offers hands-on lab experiences encompassing such disciplines as biology, chemistry, physics, and nanotechnology for secondary level students. These programs are aligned with the National Science Education Standards, and each program highlights the Laboratory's scientific research and achievements.

The Protein Data Bank (PDB) originated at Brookhaven in 1971 and was managed by the Lab until 1999. Many of the protein structures stored in the PDB were experimentally determined using intense X-ray beams at the Laboratory's National Synchrotron Light Source (NSLS). Many important discoveries were made at the NSLS. For example, the discovery of key proteins, OspA and OspC, located on the outer surface of the bacterium that causes Lyme disease, led to the development of Lyme vaccines for humans. Also at this laboratory, scientists were able to get a three-dimensional image of a virus enzyme, adenovirus protease, which may lead to the development of new anti-viral drugs. Both of these protein structures can be found in the PDB.

How does this complex information get translated to a level that a middle school student can understand? The answer: through hands-on science. At the Science Learning Center, we focus on making abstract scientific concepts real using activities grounded in research done at BNL.
We offer a variety of classes on challenging subjects, including DNA Extraction, Gene Transfer and Genetic Engineering, and Protein Structural Biology in 3D: The Shape of Things to Come. These classes are scalable, so they can offer an introduction to the material for students or reinforce what was already learned in their classrooms as needed.

Students come to the Science Learning Center to build cell models and perform DNA extractions. They genetically transform bacteria with a jellyfish gene, Green Fluorescent Protein (GFP), and culture the green bacteria that fluoresce under UV light. Students purify the GFP and learn how scientists use this same technique. The students don stereographic glasses and view a variety of molecules, including GFP, in a 3D visualization theater. Our educators make the connection between DNA, protein shape and function, and how that function is expressed as a trait.

The students have the opportunity to use a tool that scientists at Brookhaven use, the PDB. The students are given laptop computers so they can search for the GFP structure. They manipulate the protein nicknamed "the light in the can," seek out additional fluorescent proteins, and compare their structures and traits. Students are encouraged to continue this research on their own.

Using scientific tools is an effective way to motivate and excite students of all ages about science. Developing an understanding of why it is important and relevant to everyday life is a challenge. The Science Learning Center has found great success in bridging the gap between classroom learning and world-class scientific research.

The Office of Educational Programs' Science Learning Center (www.bnl.gov/slc) offers programs to students in grades 1-12, featuring interactive exhibits, hands-on labs, and programs that demonstrate basic scientific principles and utilize the inquiry method of teaching. Brookhaven National Laboratory is operated and managed for the U.S.
Department of Energy's Office of Science by Brookhaven Science Associates, a limited-liability company founded by the Research Foundation of the State University of New York on behalf of Stony Brook University and Battelle, a nonprofit, applied science and technology organization.

--------------------------------------------
PDB Community Focus: Johannes Kirchmair, Ph.D., and Gerhard Wolber, Ph.D.
University of Innsbruck

Q: You recently published an extensive paper describing the PDB archive, its history, and related resources. What surprised you when you were preparing this manuscript?

A: We were overwhelmed by the number and diversity of tools provided by the PDB portals and related websites. Before this work, we regularly used only a small part of PDB-related tools, simply because we were accustomed to them. After beginning to thoroughly investigate services and software available for structure-based PDB-related drug development, we immediately started using many of these tools and applications for research as well as for teaching. Another positively surprising fact was that many of these approaches take care of small organic ligands, while providing a high level of cross-linking; i.e., it becomes possible to solve a specific problem by jumping from one service to another one without losing intermediate results. The best thing is that most of these services are free for non-commercial use despite their high quality. Naturally, this is great for teaching students, since we have access to a similar level of information as an industrial environment. The PDB bridges the two worlds of biology (macromolecules) and of medicinal chemistry (small molecules); it also provides a large quantity of easy-to-use tools for scientists that may not have been too much involved in computational chemistry or modeling so far.
We see a strong trend for chemists to use PDB data to derive new ideas for synthesis and SAR within a short time without the need for installing any software; everything's on the web, free for academics!

Q: What do you think your online category of PDB-related tools (www.uibk.ac.at/pharmazie/phchem/camd/pdbtools.html) will look like in 10 years?

A: Looking at the development in the past few years, we hope, and are confident, that ligand chemistry will become more important to the PDB. A tighter integration with initiatives like PubChem certainly bears great potential, such as being able to correlate ligand similarity with binding pocket similarity, which could lead to integrating virtual screening tools into the web interface of the PDB. 3D pharmacophores could be a good way to formulate the interaction of a ligand with its surrounding protein. Another possibility is that more software could be developed to further analyze the binding site. Protein-ligand docking is also an interesting but currently controversial topic. If docking were to be regarded less commercially, eventually the PDB could offer a freely parameterizable docking toolbox that could help solve the scoring problem by large-scale statistics. We also hope that there will be more membrane proteins crystallized in the next 10 years, which would trigger the creation of a plethora of new tools that deal with membrane-drug interactions and homology modeling.

Q: How do you use the PDB when training pharmacy students?

A: The RCSB PDB is an invaluable resource for teaching: the web application has improved so much in the past few years that many aspects of computational chemistry teaching can be directly covered using the standard RCSB PDB interface. Visualization of the proteins and binding pockets is only one; the ability to perform sequence similarity searches and the EC classification to identify similar proteins with and without bound ligands are others.
We also use the PDB as input to our own tools, such as the 3D pharmacophore generator LigandScout for developing structure-based 3D pharmacophores. For teaching medicinal chemistry, the clarity of the RCSB PDB web interface allows for demonstrating essential structure-activity relationships (e.g., Which geometric chemical features are essential for ligand binding? If there is a reactive group on the ligand, why is an irreversible inhibitor bad? The PDB structure complex can show that the ligand is covalently bound to its co-factor).

Q: What are some challenges facing structure-based drug discovery today?

A: The ligand affinity problem: protein-ligand docking has frequently addressed, but never solved, this challenge. The practical approach that most scientists choose is to define rule-based scoring functions for their problems, and for that the PDB can help to better understand a problem by providing experimental data. However, there are still several issues with how the PDB stores small organic molecules. In some cases, it is still impossible to get the correct chemistry of a ligand from the PDB in an automated way. Sometimes, crystallographers do not pay much attention to the ligand or crystal waters and ions. Hence, it could be useful to store initial, un-refined electron densities without model bias only for the ligand to allow for re-interpretation. Other challenges are the lack of crystal structures for important protein classes, such as membrane proteins, and protein flexibility, especially conformational flexibility at the binding site, which could be analyzed by multiple X-ray structures of one and the same target interaction site with multiple ligands.

Q: What are some of the new exciting opportunities in drug discovery? What role would the PDB play in these?
A: The large collection of useful tools shows that the PDB provides extremely useful data for drug discovery, also regarding small molecules, which probably has never been the primary focus of the PDB. Ligand Expo shows that ligands are becoming important, and we see a huge potential in paying more attention to ligand chemistry. Getting correct ligands directly from the PDB bears the potential of providing a lot of new cross-linking applications. Structure-based parallel screening and polypharmacology approaches are exciting topics that seem to be tailored for a database like the PDB.

----------------------------------------
STATEMENT OF SUPPORT

The RCSB PDB is supported by funds from the National Science Foundation, the National Institute of General Medical Sciences, the Office of Science, Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Institute of Neurological Disorders and Stroke, and the National Institute of Diabetes & Digestive & Kidney Diseases.

The RCSB PDB is managed by two partner sites of the Research Collaboratory for Structural Bioinformatics:

RUTGERS
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
610 Taylor Road
Piscataway, NJ 08854-8087

SDSC/Skaggs/UCSD
San Diego Supercomputer Center and the Skaggs School of Pharmacy and Pharmaceutical Sciences
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093-0537

RCSB PDB LEADERSHIP TEAM

Dr. Helen M. Berman - Director
Rutgers University
berman@rcsb.rutgers.edu

Dr. Martha Quesada - Deputy Director
Rutgers University
mquesada@rcsb.rutgers.edu

Dr. Philip E. Bourne - Associate Director
SDSC/Skaggs/UCSD
bourne@sdsc.edu

A list of current RCSB PDB Team Members is available from the website.
The RCSB PDB is a member of the Worldwide PDB (www.wwpdb.org)

--------------------------------------------
SNAPSHOT
January 1, 2009

55072 released atomic coordinate entries

* Molecule Type
  50854 proteins, peptides, and viruses
  1949 nucleic acids
  2236 protein/nucleic acid complexes
  33 other

* Experimental Technique
  47132 X-ray
  7627 NMR
  209 electron microscopy
  104 other

36256 structure factor files
4313 NMR restraint files
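
For readers who want to work with the snapshot figures, the short Python sketch below checks that each breakdown adds up to the 55072 released entries and converts the counts to percentage shares. It is only an illustration: the counts are copied from the snapshot above, while the variable names and the percent_shares helper are made up for this example and are not part of any RCSB PDB tool.

```python
# Counts copied from the January 1, 2009 snapshot; all names are illustrative.
TOTAL_ENTRIES = 55072

by_molecule_type = {
    "proteins, peptides, and viruses": 50854,
    "nucleic acids": 1949,
    "protein/nucleic acid complexes": 2236,
    "other": 33,
}

by_technique = {
    "X-ray": 47132,
    "NMR": 7627,
    "electron microscopy": 209,
    "other": 104,
}


def percent_shares(counts, total):
    """Return each category's share of the total, as a percentage (1 decimal)."""
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# Each breakdown should account for every released entry.
molecule_total = sum(by_molecule_type.values())
technique_total = sum(by_technique.values())

molecule_shares = percent_shares(by_molecule_type, TOTAL_ENTRIES)
technique_shares = percent_shares(by_technique, TOTAL_ENTRIES)

if __name__ == "__main__":
    print("molecule types sum to total:", molecule_total == TOTAL_ENTRIES)
    print("techniques sum to total:", technique_total == TOTAL_ENTRIES)
    for name, share in technique_shares.items():
        print(f"{name}: {share}%")
```

Both breakdowns sum to the stated total, so the shares within each breakdown add up to 100% apart from rounding.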