RCSB PDB Newsletter
Number 35 -- October 2007

Published quarterly by the Research Collaboratory for Structural Bioinformatics Protein Data Bank

Weekly RCSB PDB news is published at www.pdb.org

To change your subscription options, please visit lists.sdsc.edu/mailman/listinfo.cgi/rcsb-news

-----------------------------------------
TABLE OF CONTENTS

Message from the RCSB PDB

Data Deposition and Processing
  Structure Deposition Overview
  Deposition Statistics
  Annotation Job Opening at the RCSB PDB

Data Query, Reporting, and Access
  New Query and Reporting Capabilities and Features
  Website Statistics

Outreach and Education
  Meeting Report: ACA, ISMB, BSR, and ACS
  Art of Science Update
  2007 RCSB PDB Poster Prizes Awarded at ACA, ISMB, and ECM

Education Corner: Navigating the Molecular Universe in 3D: Teaching biology students protein structure-function relationships using StarBiochem. By Dr. Melissa Kosinski-Collins, Brandeis University

PDB Community Focus: Dr. Philip E. Bourne, RCSB PDB

Statement of Support, Partners, Leadership Team

Snapshot

--------------------------------------------
MESSAGE FROM THE RCSB PDB

In September, the RCSB PDB hosted a series of important meetings at the Chauncey Conference Center in Princeton, New Jersey. These meetings, held back-to-back, provided an unprecedented opportunity for discussion and planning. The RCSB PDB Advisory Committee, an organization of international experts in X-ray crystallography, NMR, 3-D EM, bioinformatics, and education, listened to presentations and discussed future plans and goals. This review was followed by the Worldwide Protein Data Bank's Advisory Committee (wwPDB AC) Meeting. This panel of expert structural biologists includes representatives from the International Union of Crystallography and the International Conferences on Magnetic Resonance in Biological Systems. After the advisory committee considered reports from the wwPDB, a "Funding Forum" took place.
In this session, the wwPDB AC sought advice from the representatives present from the agencies that fund the individual groups about funding options for the continued operation of the wwPDB organization. Representatives from the academic and industrial research communities that rely on the PDB for their research efforts also described the value of the PDB to those present.

To take advantage of having so many people in the same place, a retreat was held for members from all of the wwPDB sites. September's retreat was attended by nearly 50 people from the four groups. While the wwPDB sites interact regularly, this was the first meeting on such a large scale. Many colleagues met in person for the first time.

The retreat also provided an opportunity to celebrate the August release of the remediated PDB archive (ftp://ftp.wwpdb.org). This wwPDB milestone represents years of work and unprecedented international collaboration.

At the start of the meeting, the wwPDB team was treated to presentations from PDB users and advisors: Edward N. Baker (Professor of Structural Biology, University of Auckland), Angela Gronenborn (UPMC Rosalind Franklin Professor and Chair, Department of Structural Biology, University of Pittsburgh), Gerard Kleywegt (Research Fellow of the Royal Swedish Academy of Sciences, Research Scientist, Uppsala University), Marin Van Heel (Professor of Structural Biology, Imperial College London), and Soichi Wakatsuki (Professor, Structural Biology Research Center, High Energy Accelerator Research Organization, Japan). The retreat then focused on discussing how the wwPDB could evolve with and anticipate the needs of the scientific community.

Special thanks to all of the advisors, funding agency representatives, and wwPDB collaborators who traveled to New Jersey for these important meetings.
--------------------------------------------
DATA DEPOSITION AND PROCESSING

Structure Deposition Overview

Structures can be deposited to the wwPDB using the tools ADIT, ADIT-NMR, or AutoDep. Data deposited to the archive is processed using agreed-upon standards for full validation of the data. These data are forwarded to the RCSB PDB for release into the archive. wwPDB members also maintain websites that provide different views of the data.

"5 Easy Steps for Fast, Accurate, and Complete Data Deposition using the ADIT system" was presented this summer by Lead Annotator Jasmine Young at the American Crystallographic Association's Annual Meeting. For a description of this presentation, please see www.rcsb.org/pdb/general_information/news_publications/newsletters/2007q3/

Deposition Statistics

In the first three quarters of 2007, 6358 structures were deposited to the PDB archive and processed by the wwPDB.

Of the structures deposited, 66.9% were deposited with a release status of "hold until publication"; 19.2% were released as soon as annotation of the entry was complete; and 13.9% were held until a particular date. 85.4% of these entries were determined by X-ray crystallographic methods; 14.2% were determined by NMR methods. 86.6% of these structures were deposited with experimental data. 94.1% of the crystal structures were deposited with structure factors; 43.6% of NMR structures were deposited with restraints.

Annotation Job Opening at the RCSB PDB

In addition to curating data, annotation staff at the RCSB PDB are involved in a variety of educational and outreach projects, attend professional society meetings, and assist in software development. This position offers the opportunity to participate in an exciting project with significant impact on the scientific community. To apply, please send your resume to Dr. Helen M. Berman at pdbjobs@rcsb.rutgers.edu.
--------------------------------------------
DATA QUERY, REPORTING, AND ACCESS

New Query and Reporting Capabilities and Features

Since the RCSB PDB website and database utilize data from the wwPDB Remediation Project, queries now return more accurate results. New developments in query and reporting features also provide improved access to these data.

* Access to Remediation and Pre-remediation Data

All data in the PDB archive (ftp://ftp.wwpdb.org) reflect the new features incorporated as part of the wwPDB Remediation Project, including standardized IUPAC nomenclature for chemical components. These data have been incorporated into the RCSB PDB website and database to provide improved searching and reporting capabilities.

Access to the unremediated data is possible for individual structures and for the entire archive. The left hand menu of each Structure Summary page provides download options for either remediated or unremediated data in a variety of formats. The Remediation Tab will appear on this page to describe any changes to chain and residue naming conventions made for consistency in the archive. An example description would be "This structure's single unnamed chain was assigned chain id A."

A snapshot of the entire unremediated PDB archive (as of July 31, 2007) is available at ftp://ftp.rcsb.org. This archive will not be updated.

* Advanced Search

The data in the PDB archive offers a wealth of valuable metadata. Advanced Search is a powerful and easy-to-use interface to the underlying search architecture and remediated data. Complex queries are constructed by combining simple "subqueries" chosen from a drop-down list. Users get a feel for the likely success of their search strategy while constructing the search by checking the number of results for each subquery. A broad range of subqueries is available including sequence searches; Gene Ontology (GO) assignments; SCOP and CATH domain assignments; and author name searches.
These subqueries may be combined into a complex query by searching "all" or "any" of the user-specified subqueries.

* Improved Sequence Details

The Sequence Details tab offers a customizable report that displays polymer chain sequences annotated with properties such as domain and secondary structure. This feature utilizes data from the Remediation Project to provide an exact mapping of the structure sequence to the UniProt sequence. Annotations from CATH, DSSP, PDP, and the author-approved secondary structure can be applied to either the sequence in UniProt or in the PDB entry's SEQRES information. The size of the report can be customized for use in presentations.

* Search Result Tabs

Keyword or Advanced Searches will also return different ways of exploring the search results list. Options available from the tabs shown above the default results list include:

** Citations: The primary citations for all structures have been verified as part of the Remediation Project. This improved mapping between structure and associated reference is reflected in the database. The Citations Tab provides a PubMed-like list of the primary citations for the structures that match a query.

** Ligand Hits: This tab lists the ligands known to interact with the structure matching the query. For example, a keyword search for "protein kinase" will return all ligands known to bind protein kinases. Linked images, names, IDs, and formulas appear for each ligand.

** Web Page Hits: Any of the more than 900 curated web pages found at the RCSB PDB website, including Molecule of the Month features, that contain a requested keyword are found on this tab.

** GO, SCOP, CATH Hits: These tabs link to the structures that have the same mapping in the GO, SCOP, and CATH resources. Entries are returned in a tree browser that indicates where these structures reside in the respective hierarchies. The SCOP tab, for example, indicates which hits belong to which class of proteins.
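
The "all" and "any" modes described for Advanced Search above can be pictured as set operations on the entries returned by each subquery. The following is only a minimal sketch of that idea, not the RCSB PDB's actual implementation: the function name `combine_subqueries`, the subquery labels, and the placeholder entry IDs are all assumptions made for illustration.

```python
from typing import Dict, Set


def combine_subqueries(hits: Dict[str, Set[str]], mode: str = "all") -> Set[str]:
    """Combine per-subquery result sets (hypothetical helper).

    mode="all" keeps entries matched by every subquery (intersection);
    mode="any" keeps entries matched by at least one subquery (union).
    """
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    result_sets = list(hits.values())
    if not result_sets:
        return set()
    if mode == "all":
        return set.intersection(*result_sets)
    return set.union(*result_sets)


# Hypothetical subqueries with placeholder entry IDs (not real search results).
example_hits = {
    "author name": {"ENTRY_1", "ENTRY_2", "ENTRY_3"},
    "SCOP domain assignment": {"ENTRY_2", "ENTRY_3", "ENTRY_4"},
    "GO assignment": {"ENTRY_3", "ENTRY_5"},
}

if __name__ == "__main__":
    # Report the hit count of each subquery before combining them.
    for name, ids in example_hits.items():
        print(f"{name}: {len(ids)} hits")
    print("all:", sorted(combine_subqueries(example_hits, "all")))
    print("any:", sorted(combine_subqueries(example_hits, "any")))
```

Printing the per-subquery counts first mirrors the way a user can check the number of results for each subquery while building a search.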
For references, please see www.rcsb.org/pdb/general_information/news_publications/newsletters/2007q3/

Website Statistics

Access statistics for the third quarter of 2007 are given below for the RCSB PDB website at www.pdb.org.

Month       Unique Visitors   Number of Visits   Bandwidth
Jul 2007         93,719            244,152       592.23 GB
Aug 2007         87,494            225,482       380.69 GB
Sep 2007        118,631            294,060       482.76 GB

--------------------------------------------
OUTREACH AND EDUCATION

Meeting Report: ACA, ISMB, BSR, and ACS

* Thanks to everyone who stopped by the RCSB PDB exhibit booth for demonstrations of the RCSB PDB website and discussions about the remediated data at the American Crystallographic Association's Annual Meeting (ACA; July 21-26 in Salt Lake City, UT). We also appreciate those who viewed the poster "Remediation of the PDB Archive." The session "Informatics in Structural Biology," organized by John Westbrook (RCSB PDB) and Kim Henrick (MSD-EBI), focused on the applications of structural informatics and inspired a lot of interesting conversations. Annotator Jasmine Young's presentation at the Fun Lectures for Young Scientists symposium is described in this newsletter.

* Demonstrations of the RCSB PDB website and the Art of Science exhibit were found at the 15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) & 6th European Conference on Computational Biology.

* A poster about the BioSync resource (biosync.rcsb.org) was presented at the 9th International Conference on Biology and Synchrotron Radiation (BSR) by Judith L. Flippen-Anderson (August 13-17; Manchester, UK).

* At the American Chemical Society's National Meeting (ACS), Shuchismita Dutta presented a poster describing the information contained in the remediated Chemical Component Dictionary and how this dictionary was used by the wwPDB to help remediate the PDB archive (August 19-23; Boston, MA).
Art of Science Update

The Art of Science is a traveling exhibit of images from the RCSB PDB website and the Molecule of the Month. This exhibit was recently hosted by the International Society for Computational Biology at the ISMB meeting.

The show also was on display from September 25 to October 5 at The University of Texas Southwestern Medical Center in Dallas, Texas. That show was sponsored by the Molecular Biophysics Graduate Program, and presented in conjunction with the Molecular Biophysics' Meet the Program Poster Presentations.

If you would be interested in sponsoring this exhibit at your institution, please let us know at info@rcsb.org.

2007 RCSB PDB Poster Prizes Awarded at ACA, ISMB, and ECM

Thanks to everyone who participated in the recent RCSB PDB Poster Prize competitions. The creators of the best student posters related to macromolecular crystallography at the ACA's Annual Meeting and the European Crystallographic Association's Meeting (ECM) and in the Structure and Function Prediction category at the ISMB meeting were awarded a subscription to Science and a related book. The same prize will also be awarded at the Asian Crystallographic Association meeting later this year.

For a list of winners, please see www.rcsb.org/pdb/static.do?p=general_information/about_pdb/poster_prize_2007.html

--------------------------------------------
EDUCATION CORNER: by Dr. Melissa Kosinski-Collins, Brandeis University

Navigating the Molecular Universe in 3D: Teaching biology students protein structure-function relationships using StarBiochem

StarBiochem is the result of collaboration between the Department of Biology, the Academic Computing Group of the Department of Information Services and Technology, and the Department of Physics at the Massachusetts Institute of Technology (MIT). The founding members of the project include Dr. Melissa Kosinski-Collins, Dr. Graham Walker, Dr. John Belcher, Michael Danziger, Charles Shubert, and Ivica Ceraj.
We have been privileged to be assisted by many other talented individuals including Andrew McKinney, Justin Riley, Violeta Ivanova, Professor Dan Hastings (Dean of Undergraduate Education at MIT), Dr. Vijay Kumar (Associate Dean and Director of the Office of Educational Innovation and Technology), and Dr. Jerry Grochow (MIT Vice President of Information Services and Technology). We further have been supported by the educational efforts of Dr. Julia Khodor, Dr. Megan Rokop, Dr. Mandana Sassafar, and Dr. Robyn Tanny in introductory biology courses at MIT and in the high school outreach efforts.

Funding for this work was provided in part by the Department of Information Services and Technology at MIT, a Howard Hughes Medical Institute (HHMI) Professorship Grant awarded to Graham Walker, and a Davis Educational Foundation Grant awarded to John Belcher. The Academic Computing group that participated in this project is now the Software Tools for Academics and Researchers (STAR) group in the Office of Educational Innovation and Technology for the Dean of Undergraduate Education at MIT. Melissa Kosinski-Collins is now an Assistant Professor of Biology at Brandeis University.

"Why does a protein do its job in the cell? Because of its shape and chemistry." As biologists, we understand and appreciate the overwhelming knowledge value in these simple statements, but as educators, we battle to try to make the protein structure-function relationship clear to our students. It is easy for us to understand this now, but to get through to our students we need to remember back to a time when we did not understand. How did we originally learn this? The answer is simple: practice.

It is clear that a very efficient way to teach the structure-function relationship comes from letting students view some of the many deposited PDB molecules in a 3D environment. Many of the stereotypically structural "ah-ha" moments come from this type of hands-on interaction with the molecule.
Students can not only identify binding pockets and partners, see disease-associated mutations, and observe structural contexts, but they can physically manipulate and, in a sense, control the molecule in real-time. Students of introductory biology need these types of hands-on experiences as well as practice with multiple molecules to really "get" structural biology.

Implementing such an interactive yet understandable series of exercises in the average college-level introductory biology course is a daunting task for many reasons. These hurdles include class size, computer and technology access both in the classroom and at home, time devoted to the topic in the syllabus, time involved in creating this type of homework, and the level of understanding of the incoming student. Although there are many freely available software packages that allow the students to explore in 3D, few present the material in a format that makes sense to the average biology student and are simple enough so that the student can use the program outside of the classroom on their own for additional practice.

In 2004, a project was begun at MIT to create a new program that filled the pedagogical void left in the world of structural biology. We wanted to create a viewer and a series of exercises that presented structures and functions in the same way we presented them in class, that could be used outside of the classroom without staff supervision, and that allowed students many of the freedoms and exploratory options of the research-level PDB viewers. The beta version of this software was named StarBiochem.

StarBiochem has one particular option that has become paramount to its success in the context of biology education. In class we invariably introduce protein structure as a build-up of primary, to secondary, to tertiary, to quaternary structures.
Most software packages avoid mention of these levels altogether, leaving the student to wonder where the levels fit in and how they are related to the 3D structure they see on the screen. StarBiochem can open any protein PDB coordinate file and categorize it into these different levels, allowing the student to conceptually analyze the 3D structure that they see on their screen. In the examples of hemoglobin and sickle cell anemia, the student is asked to look first at the primary structure change in the molecule, but then to determine at which structural level the disease manifests itself. Using the program as a conceptual guide, the students are asked to understand that the primary structure change from glutamic acid to valine at position 6 does not manifest as a disease until you see a change of intermolecular interaction chemistry in the quaternary structure.

StarBiochem was first piloted in an HHMI-sponsored high school field trip at MIT in March 2006. A series of guided exercises led students through an in-depth exploration of proteins with defined structure-function relationships, like sucrose-specific porin and hemoglobin. For example, the students were asked to look at the barrel-like structure of porin and reflect on how that shape might be conducive to molecular transport. The students were further asked to investigate the outer chemistry of the molecule and determine how the hydrophobic exterior of the protein might influence the ability of the protein to remain stable in its cellular membrane-bound location. StarBiochem was found to be an effective and easy-to-use teaching tool in this context and is now being used by several of the visiting teachers as a curricular tool in their classroom. The StarBiochem high school initiative is now being further disseminated in MIT and Harvard's Broad Institute Outreach Program.
For the rest of this article, please see www.rcsb.org/pdb/general_information/news_publications/newsletters/2007q3/

StarBiochem is freely available for download at web.mit.edu/star/biochem. Questions about StarBiochem may be sent to kosinski@brandeis.edu.

--------------------------------------------
PDB COMMUNITY FOCUS: Dr. Philip E. Bourne, RCSB PDB

Philip E. Bourne is a Professor in the Department of Pharmacology at the University of California, San Diego, co-director of the RCSB PDB, and an Adjunct Professor at the Burnham Institute and the Keck Graduate Institute. He received his Ph.D. in chemistry from the Flinders University of South Australia in 1980 where he studied the structural and electrophilic effects of substitution on fully saturated caged hydrocarbon molecules. While a post-doctoral fellow at Sheffield University UK he contributed to the understanding of the structural role of ferritin in iron storage. Later as a Senior Research Scientist at Columbia University, he proposed mechanisms for the role of caracurines and snake toxins that operate postsynaptically. During the 80's, first as the Director of the Cancer Center Computer Facility, and later Director of the Medical School Computer Facility at Columbia, he helped establish a tumor registry and various applications and databases in support of patient care. As a Senior Associate of the Howard Hughes Medical Institute in the early 90s, he worked on developing high performance hardware and software for computational structural biology. He moved to UCSD in 1995 to work on structural bioinformatics. His current research interests are in structural genomics, the structural basis of evolution and immunology, apoptosis, cell signaling, data and knowledge modeling and scientific visualization.

Bourne is an elected Fellow of the American Medical Informatics Association and past President of the International Society for Computational Biology.
He is the Founding Editor-in-Chief of the open access journal PLoS Computational Biology, on the Advisory Board of Biopolymers and on the Editorial Boards of Proteins: Structure Function and Bioinformatics, Biosilico and IEEE Trends in Computational Biology and Bioinformatics. He is the author of over 200 scientific papers and 4 books. He has received two UCSD Connect Awards for new inventions in the areas of comparative protein structure analysis and shared visualization. He was the recipient of the 2002 Sun Microsystems Convergence Award and the 2004 Convocation Medal for career achievement from his graduate university. He has co-founded four companies.

Q: What is the current impact of the PDB archive on biology, and what is the future of the archive?

A: Given that more than six million data sets are downloaded from the wwPDB ftp archives each month, clearly its impact is large. The archive is recognized as a critical component in new drug discovery and development processes, and in the advancement of structural biology. While part of this usage is well understood (for example, there are many instances where structure provided a better understanding of biological function in disease states that led to the treatment of those diseases through new drugs), I suspect that there is a lot more to this story. A challenge in the next 5 years for all of the wwPDB is to better understand usage patterns and to help specific communities use the PDB archive in a way that would be the most beneficial to research and education.

Education is of keen interest at the RCSB PDB. Students in grades K-12 will be the leading scientists of tomorrow, and make up a key focus for our outreach programs. Structural biology has an advantage, as it is a visual science that can captivate young people. The RCSB PDB reaches out to these students through resources such as the New Jersey Science Olympiad and the Molecule of the Month. Of course, we would very much like to do more.
One way we could proceed would be to take advantage of changing usage patterns on the web. Students today are very communicative online and use various social networking sites for hours on end. They are also part of the "Wiki Generation", where knowledge is defined by community input and consensus. Perhaps we at the RCSB PDB could capture this collective knowledge from teachers and students to create lessons around specific molecules and classes of molecules?

Q: What about the older generation?

A: For established life scientists, structure is often not a consideration and yet it has a great deal to offer. It is my experience that many life scientists associate molecular biology with DNA and protein sequences, and then skip structural biology to consider biochemical pathways, cellular processes, and whole cells and organisms.

Let me give you an example from recent research work in my laboratory that makes this point using evolutionary biologists as the test case. Since the time of Darwin, evolution has been studied through simple observation by paleontologists, zoologists, and botanists. Molecular biology, through protein and DNA sequencing, has revolutionized these evolutionary studies and allowed us to confirm and adjust the tree of life. But sequence has its limitations. The sequence signal degrades over long evolutionary time scales, and distant relationships cannot be seen. Structure is far more conserved than sequence over evolutionary time scales. With our ability to map structures to the ever-increasing number of fully completed proteomes, new insights can be made. Very few evolutionary biologists think of using structure in this way. One recent study from our laboratory showed how the tree of life could be reconstructed just by considering whether given species did or did not contain specific structural superfamilies of proteins defined by SCOP.
In my view, the RCSB PDB has a role in facilitating these new kinds of studies to bring them to the attention of a broader community. So in this example, we could facilitate these studies by mapping structural domains and their changing arrangements onto the tree of life.

Q: Given these kinds of developments, where do you see the RCSB PDB in 10 years?

A: The core mission of the RCSB PDB (providing timely delivery of high quality and complete structure data and useful and unique views of that data to enable scientific innovation) will not change. Of course, there will continue to be more and different types of data and the RCSB PDB will need to maintain these high standards of quality while catering to new types of delivery technology.

It is hard to believe that the Internet has only been with us in a big way for ten years or so. Given the fundamental change in how we do science that has been brought about by the web, it is at least conceivable that how we do science will change even more dramatically in the future, even though we are hard-pressed to detail what those changes might be at this time. I would guess that we would need to provide data to people, software, and applications in seamless ways at very different degrees of granularity. Currently, most RCSB PDB queries return specific structures, but in the future you can imagine many more fine-grained requests from specific classes of scientist. For example, the pharmacy students I teach might use their handheld devices to ask a question like "we see significant instances of myocardial infarction in patients on selective estrogen receptor modulator drugs like tamoxifen; what is the underlying biochemistry and molecular biology causing these side effects?" The RCSB PDB's role in this request could conceivably be to return and compare the receptors known to bind this class of drugs and allow the student to better understand the molecular implications.
Inherent in this kind of request is the RCSB PDB's ability to integrate with other resources that permit the field of genomic medicine to advance and to return data such that non-specialists can answer their questions. These are significant (but fun) challenges.

Q: Let's bring you back more to the immediate future. The wwPDB recently remediated the entire PDB archive. What effect has this had on the RCSB PDB's query and reporting engine?

A: The remediation effort is fundamental to the more far-reaching developments like those I have just discussed. Consistent representation of the data we have and the data we will collect going forward is critical if we are to use the archive effectively and integrate with other sources of data. A very pragmatic example is the work that has gone into the Chemical Component Dictionary. As a result of this project, we can now reliably query ligands in the PDB archive through their names and/or chemical structures.

For the rest of this article, please see www.rcsb.org/pdb/general_information/news_publications/newsletters/2007q3/

----------------------------------------
STATEMENT OF SUPPORT

The RCSB PDB is supported by funds from the National Science Foundation, the National Institute of General Medical Sciences, the Office of Science, Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Center for Research Resources, the National Institute of Biomedical Imaging and Bioengineering, the National Institute of Neurological Disorders and Stroke, and the National Institute of Diabetes & Digestive & Kidney Diseases.
The RCSB PDB is managed by two partner sites of the Research Collaboratory for Structural Bioinformatics:

RUTGERS
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
610 Taylor Road
Piscataway, NJ 08854-8087

SDSC/Skaggs/UCSD
San Diego Supercomputer Center and the Skaggs School of Pharmacy and Pharmaceutical Sciences
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093-0537

RCSB PDB LEADERSHIP TEAM

Dr. Helen M. Berman - Director
Rutgers University
berman@rcsb.rutgers.edu

Dr. Philip E. Bourne - Co-Director
SDSC/Skaggs/UCSD
bourne@sdsc.edu

A list of current RCSB PDB Team Members is available from the website.

The RCSB PDB is a member of the Worldwide PDB (www.wwpdb.org)

--------------------------------------------
SNAPSHOT
October 1, 2007

46051 released atomic coordinate entries

* Molecule Type
42350 proteins, peptides, and viruses
1787 nucleic acids
1881 protein/nucleic acid complexes
33 other

* Experimental Technique
39184 X-ray
6621 NMR
154 electron microscopy
92 other

28451 structure factor files
3648 NMR restraint files
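
As a quick arithmetic check on the Snapshot figures above, the breakdowns by molecule type and by experimental technique should each add up to the total number of released entries. The short sketch below copies the counts from the Snapshot; the variable and function names are illustrative assumptions and are not part of any RCSB PDB tool.

```python
from typing import Dict

# Counts copied from the Snapshot (October 1, 2007).
TOTAL_ENTRIES = 46051

BY_MOLECULE_TYPE: Dict[str, int] = {
    "proteins, peptides, and viruses": 42350,
    "nucleic acids": 1787,
    "protein/nucleic acid complexes": 1881,
    "other": 33,
}

BY_TECHNIQUE: Dict[str, int] = {
    "X-ray": 39184,
    "NMR": 6621,
    "electron microscopy": 154,
    "other": 92,
}


def is_consistent(counts: Dict[str, int], total: int) -> bool:
    """Return True when the category counts add up to the stated total."""
    return sum(counts.values()) == total


def shares(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Express each category as a percentage of the total."""
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    for title, counts in (("Molecule Type", BY_MOLECULE_TYPE),
                          ("Experimental Technique", BY_TECHNIQUE)):
        print(title, "adds up to total:", is_consistent(counts, TOTAL_ENTRIES))
        for label, pct in shares(counts, TOTAL_ENTRIES).items():
            print(f"  {label}: {pct:.1f}%")
```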