RCSB PDB Newsletter Number 38 -- July 2008 Published quarterly by the Research Collaboratory for Structural Bioinformatics Protein Data Bank Weekly RCSB PDB news is published at www.pdb.org To change your subscription options, please visit lists.sdsc.edu/mailman/listinfo.cgi/rcsb-news ----------------------------------------- TABLE OF CONTENTS Message from the RCSB PDB Data Deposition and Processing SF-Tool: A Tool for Crystallographic Experimental Data Validation Ligand Expo: A Small Molecule Resource Workshop on Next Generation Validation Tools for the wwPDB Deposition Statistics Data Query, Reporting, and Access Website Statistics Secondary Structure Information RCSB PDB Usage Statistics Outreach and Education RCSB PDB Meetings and Presentations RCSB PDB Poster Prize Awarded at ACA Meeting Princeton HS Wins NJSO Protein Modeling State Finals Papers Published Structural Displays Education Corner: BAMBED, a Journal for University, College, and High School Educators by Judith G. Voet and Donald H. Voet PDB Community Focus: John Norvell, National Institute of General Medical Sciences Statement of Support, Partners, Leadership Team Snapshot -------------------------------------------- MESSAGE FROM THE RCSB PDB The PDB archive reached a significant milestone in its 37-year history this past spring. The 50,000th molecule structure was released into the archive on April 8, 2008, joining other structures vital to biology, medicine, and education. The worldwide Protein Data Bank (wwPDB) has seen the archive double in size since 2004, and estimates that the size of the PDB archive will triple to 150,000 structures by the year 2014. The archive was founded in 1971 with seven structures at Brookhaven National Laboratory. Today, the wwPDB receives approximately 25 new experimentally-determined structures from scientists each day. These data are checked and processed by annotators located around the world. 
wwPDB members provide a variety of ways to explore PDB data by developing online databases that promote searching, reporting, and visualizing structures. The wwPDB will exhibit at the XXI Congress and General Assembly of the International Union of Crystallography (August 21-31; Osaka, Japan). Please stop by booth #14 to learn about the latest developments. -------------------------------------------- DATA DEPOSITION AND PROCESSING SF-Tool: A Tool for Crystallographic Experimental Data Validation A streamlined, web-based tool is available for validating crystallographic experimental data. SF-Tool (pdb-extract.rcsb.org/auto-check/index-ext.html) can be used to: * Validate your model coordinates against structure factor data * Easily translate your structure factor file between different formats * Check for twinned or detwinned data Ligand Expo: A Small Molecule Resource The Chemical Component Dictionary archives chemical and structural information about all of the small molecules found within PDB structure entries. Ligand Expo is a new tool that can access, visualize, and build reports about these data. With Ligand Expo (ligand-expo.rcsb.org), users can also * Search for a chemical component * Browse tables of components that contain o modified amino acids and nucleotides o popular drugs (trade and generic names) o common ring systems * Review related information in chemical dictionaries and resources * Download model and ideal chemical component coordinates * View all instances of a component in released PDB entries Workshop on Next Generation Validation Tools for the wwPDB A meeting of the wwPDB X-ray Validation Task Force was held to collect recommendations and develop consensus on additional validation that should be performed on PDB entries, and to identify software applications to perform validation tasks. The workshop was organized by Randy Read (Cambridge University), and sponsored by the RCSB PDB & PDBe. 
Detailed information about the workshop is available at www.wwpdb.org/workshop/2008/index.html. Deposition Statistics In the second quarter of 2008, 1689 experimentally-determined structures were deposited to the PDB archive. The entries were processed by wwPDB teams at the RCSB PDB, PDBe, and PDBj. Of the structures deposited, 76.8% were deposited with a release status of "hold until publication"; 14.3% were released as soon as annotation of the entry was complete; and 8.9% were held until a particular date. 89.5% of these entries were determined by X-ray crystallographic methods; 9.5% were determined by NMR methods. During the same time period, 2534 structures were released into the archive. -------------------------------------------- DATA QUERY, REPORTING, AND ACCESS Website Statistics Access statistics for www.pdb.org and ftp://ftp.wwpdb.org for the second quarter of 2008 are given below.

Month      Unique     Number of   Bandwidth    HTTP        FTP
           Visitors   Visits                   Downloads   Downloads
Apr 2008   134119     309222      585.77 GB    3463641     15633685
May 2008   123862     286612      607.73 GB    3237762     10749266
Jun 2008   132168     317814      651.02 GB    3328550     8903617

Secondary Structure Information A plain text file containing sequence and secondary structure information in FASTA format for all structures is available from www.rcsb.org/pdb/files/ss.txt. A separate file, www.rcsb.org/pdb/files/ss_dis.txt, includes disordered regions in addition to the secondary structure. These files, which replace analogous ones previously found in the FTP tree, are updated weekly. RCSB PDB Usage Statistics Approximately 140,000 unique visitors explore the RCSB PDB website at www.pdb.org each month (identified by unique IP address). During this period, these users download more than 500 GB (GigaBytes) of data. At the same time, about 7,000 unique visitors download more than 10 million files from the FTP site at ftp://ftp.wwpdb.org, for a total of about 2 TB (TeraBytes) of data. 
The PDB archive is also accessed through FTP sites supported by wwPDB members PDBe and PDBj. There were more than 100 million FTP file downloads, with peak download rates of more than 10 files per second, during the six-month period that followed the August 2007 release of the archive of remediated data. In comparison, fewer than 10 million files were downloaded in all of the year 2000. Visitors from 150 countries use the RCSB PDB FTP site and website. Approximately one third of all visitors are from the United States, another third from Europe, and a final third from the rest of the world. Since users are accessing data from around the globe, traffic is evenly distributed over the course of 24 hours. On an average day, download rates never drop below 50% of peak daytime values. Standard server log information, such as IP address, time spent on the site, and browser type, is collected and assessed on an aggregate, rather than individual, basis in order to track site statistics, identify data popular to our visitors, monitor performance, and troubleshoot. Studying access statistics and usage patterns helps to project future hardware needs, and aids in the design of new functionality. We also use this information for site and system security. We do not share server log information with third parties for marketing or other purposes. -------------------------------------------- OUTREACH AND EDUCATION RCSB PDB Meetings and Presentations The RCSB PDB has been exhibiting at several meetings, including: * The American Crystallographic Association's Annual Meeting (May 31-June 5, 2008; Knoxville, TN) * The Experimental Biology Annual Meeting, which is attended by the members of the American Society for Biochemistry and Molecular Biology (April 5-9, 2008; San Diego, CA) RCSB PDB members routinely give oral presentations at meetings. 
Here are recent highlights: * John Westbrook described the process of "Automating PDB Deposition" at the Cold Spring Harbor Laboratory's (CSHL) X-ray Crystallography Course X-ray Methods in Structural Biology (April 29-May 15, 2008). This year's class was held at the Chinese Academy of Sciences in Beijing, China and chaired by Zihe Rao. For a description of this class, please see the Winter 2007 Education Corner article by one of the organizers, Gary Gilliland. * As part of the Sigma Xi Distinguished Lecturers series for 2008-2009, Director Helen Berman discussed "How the History of the Protein Data Bank Informs the Future of Biology" at Ramapo College (NJ), SUNY College at Old Westbury, and Indiana University of Pennsylvania. She also spoke at Montana State University as part of the Women In Bioinformatics seminar series. * Peter Rose led an RCSB PDB workshop and a demonstration at From Molecules to Medicine: Integrating Crystallography in Drug Discovery (May 29 - June 8, 2008; Erice, Italy) RCSB PDB Poster Prize Awarded at ACA Meeting The RCSB PDB Poster Prize for best student poster related to macromolecular crystallography at the American Crystallographic Association's Annual Meeting (May 31 - June 5, 2008; Knoxville, TN) went to Wei Yong for "X-ray crystallographic studies of pig sarcosine dehydrogenase" (Wei Yong, Ila Misra, Jung-Ja Kim, Medical College of Wisconsin). Yong will receive a subscription to Science and an International Tables of Crystallography volume of his choosing. Thanks to everyone who participated, especially our judges: Robert Rose (Chair; North Carolina State University); Gloria E.O. Borgstahl (Eppley Institute for Cancer Research and Allied Diseases); Antonella Longo (North Carolina State University); Robert McKenna (University of Florida); and Joseph E. Wedekind (University of Rochester). 
Princeton HS Students Win NJSO Protein Modeling State Finals The team from Princeton High School that came in first place at the 2007 Protein Modeling event at the Science Olympiad State Finals in New Jersey held on to their title at this year's competition. Teams from all over the Garden State presented their hand-built 3D models of a calmodulin protein, along with an abstract, to be judged by staff from the RCSB PDB at the March 11 meet. At the competition, teams built a model of a selected region of the structure using Jmol and took a written exam about the structure. Teams used the RCSB PDB Molecule of the Month and other resources to help prepare for this event. Following Princeton High School (First Place) came Livingston High School (Second), and West Windsor-Plainsboro High School North (Third). Congratulations to all participating teams--there were many great models, abstracts, and responses to the written exam. Pictures of the event and rubrics used in judging are available at education.pdb.org/olympiad. Questions about the NJ Science Olympiad Protein Modeling trial event should be sent to buildmodels@deposit.rcsb.org. Congratulations to National Tournament Champions At the Science Olympiad National Tournament held on May 30-31, 2008 at The George Washington University, New Jersey's West Windsor-Plainsboro High School North won first place at the protein modeling event. Papers Published * wwPDB deposition tools, methods (including validation), and policies are discussed in: Data deposition and annotation at the Worldwide Protein Data Bank. Shuchismita Dutta, Kyle Burkhardt, Ganesh J. Swaminathan, Takashi Kosada, Kim Henrick, Haruki Nakamura, Helen M. Berman (2008) in Methods in Molecular Biology, vol. 426: Structural Proteomics: High-Throughput Methods (Bostjan Kobe, Mitchell Guss, Thomas Huber, eds.), pp. 81-101. 
* Issues relating to NMR depositions are outlined in: BioMagResBank (BMRB) as a partner in the Worldwide Protein Data Bank (wwPDB): new policies affecting biomolecular NMR depositions. John L. Markley, Eldon L. Ulrich, Helen M. Berman, Kim Henrick, Haruki Nakamura, and Hideo Akutsu (2008) J Biomol NMR 40(3): 153-155. * Resources and efforts to ensure that PDB data are used by scientists, students, and teachers inside and outside of the structural biology community are described in: Interesting structures: Education and outreach at the RCSB Protein Data Bank. Christine Zardecki (2008) PLoS Biology 6:e117. doi: 10.1371/journal.pbio.0060117 Structural Displays Monitors that show rotating proteins and nucleic acids are on display on the campuses of UCSD and Rutgers. The set of entries highlighted includes structures solved on campus and structures related to human health. The program "floats" each structure across the screen before moving on to the next. -------------------------------------------- EDUCATION CORNER: BAMBED, a Journal for University, College, and High School Educators by Judith G. Voet and Donald H. Voet Donald Voet received a B.S. in Chemistry from the California Institute of Technology, a Ph.D. in Chemistry from Harvard University with William Lipscomb, and did postdoctoral research in the Biology Department at MIT with Alexander Rich. As an associate professor at the University of Pennsylvania, Dr. Voet uses X-ray crystallography to study the structure of biologically interesting molecules, including yeast inorganic pyrophosphatase and granulocyte-macrophage colony-stimulating factor. Judith ("Judy") Voet received her B.S. in Chemistry from Antioch College and her Ph.D. in Biochemistry from Brandeis University with Robert H. Abeles. She has done postdoctoral research at the University of Pennsylvania, Haverford College, and the Fox Chase Cancer Center. Dr. Voet is the James H. Hammons Professor, Emeritus, at Swarthmore College. 
Her main area of research involves enzyme reaction mechanisms and inhibition. This husband and wife team serve as joint editors-in-chief of the journal Biochemistry and Molecular Biology Education (BAMBED), published by John Wiley & Sons for the International Union of Biochemistry and Molecular Biology (IUBMB). They have also co-authored the seminal textbook Biochemistry, now in its third edition, and Fundamentals of Biochemistry, along with Charlotte Pratt, now in its third edition. The aim of Biochemistry and Molecular Biology Education (BAMBED) is to enhance teacher preparation and student learning in biochemistry, molecular biology, and related sciences such as biophysics and cell biology, by promoting the worldwide dissemination of educational materials. BAMBED seeks and communicates articles on many topics, including: * innovative techniques in teaching and learning, * new pedagogical approaches, * research in biochemistry and molecular biology education, * reviews on emerging areas of biochemistry and molecular biology to provide background for the preparation of lectures, seminars, student presentations, dissertations, etc., * historical reviews describing past research under the title: Paths to Discovery, * novel and proven laboratory experiments that have both skill-building and discovery-based characteristics, * reviews of relevant textbooks, software, and websites, * descriptions of software for educational use, and * descriptions of multimedia materials such as tutorials on various aspects of biochemistry and molecular biology. The journal is published bimonthly by John Wiley & Sons for the International Union of Biochemistry and Molecular Biology (IUBMB). All articles are freely available after a 2-year hold. Four years of BAMBED articles (2002-2005) are already freely available at the website (www.bambed.org). Soon we will add PDFs of back issues from 2000 and 2001. 
One topic that has received increasing attention over the years is protein structure and its relationship to function. At one time a subject for graduate and post-doctoral study, now even high school students are exposed to the amazing field of protein structure. BAMBED has been an important tool in helping educators expand their ability to teach this important subject as more and more information has become available. Dissemination and use of this information has been made possible by the concomitant growth in the power of computers and computer software to allow visualization of these structures. In the 1960s and 1970s, students of protein structure built physical models of the few proteins for which structures were available, and molecular artists like Irving Geis drew 2D representations of these models. Now students visualize these structures on laptop computers using data from the RCSB PDB and freely-available software programs, or even directly from web browsers. One of BAMBED's roles has been to help educators learn and teach the use of these visualization techniques, both in the classroom and in the laboratory. Since the year 2000, when we became co-editors-in-chief of BAMBED, many articles on this subject have been published. We urge the users of the RCSB PDB to share their educational techniques and programs by submitting manuscripts to BAMBED. Instructions to authors can be found at www.bambed.org. -------------------------------------------- PDB Community Focus: John Norvell, Ph.D., Program Director, NIGMS John C. Norvell is a Program Director and Branch Chief at the National Institute of General Medical Sciences (NIGMS), National Institutes of Health (NIH). After receiving a Ph.D. from Yale University, Dr. Norvell conducted research in protein crystallography at the University of Wisconsin, Brookhaven National Laboratory, and the NIH. 
At NIGMS, he has directed national support programs in structural biology and computational biology, in addition to directing the Institute's research training programs. He is the director of the NIGMS Protein Structure Initiative, a national research program in structural genomics. Q: How did your research training prepare you for your career at the National Institutes of Health (NIH)? A: My graduate field was physics and I studied magnetic materials. I learned the principles of crystallography and came to realize that my main scientific interests were somewhere between physics and biology. Subsequent research in biophysics and protein/tRNA crystallography got me hooked on structural biology. I have now worked at the National Institute of General Medical Sciences (NIGMS) for 30 years and, although I am not in a research lab, my knowledge of this field is important almost every day. Q: How do you view your relationship with the resources that you fund, such as the RCSB PDB and the Protein Structure Initiative? A: As a NIGMS program director for these resources, my main responsibility is to ensure that the scientific community--and ultimately the broader public--receive the maximum benefit from these funded resources. The issues are usually complicated, and there are seldom easy answers. But I enjoy solving problems, the wide variety of the work, the interaction with many groups (federal staff, resource scientists, advisors, scientists, etc.), and the overall NIGMS/NIH mission. As you know, the RCSB PDB is supported by a consortium of federal agencies headed by the National Science Foundation. I find managing the support of resources to be challenging, invigorating, and rewarding. Q: In addition to supporting research, you also administer the NIGMS Predoctoral Institutional Training Grants. What are your goals for training future scientists? A: These training grants support multidisciplinary graduate training programs and students all over the country. 
Although these training grants make only a small contribution to the total support for graduate students in the biomedical sciences, they are important because they focus on the highest quality graduate programs and students. The goal of these training grants is to promote multidisciplinary activities to prepare students for research careers. The NIGMS training grants include core courses, lab rotations, seminars, training in the responsible conduct of research, journal clubs, research retreats, etc. The NIH requires all these programs to recruit students from groups that are underrepresented in the biomedical sciences. The NIGMS training grant program has evolved with time and several new areas such as computational biology and bioinformatics have been added. I anticipate that the program will continue to change and promote even better graduate education and lead to a better national research effort with many benefits to the public. Q: Structural genomics is focused on solving as many structures as possible in a short period of time. Since the year 2000, the Protein Structure Initiative (PSI) has determined more than 3000 structures, and developed new methods and technologies. How has this changed structural biology? A: When we first considered the PSI in 1999, it was not clear that a structural genomics pipeline could be built. The development of many new techniques and methods and the now widespread utilization of robotics have automated many of the steps of producing and determining the structure of many new proteins and have improved both efficiency and success rates. In addition, the RCSB PDB has worked with the centers to improve the automation of structural deposition. I believe the PSI centers have played a major role in this transformation. Many new tools and methods developed in part by the PSI centers are now in labs around the world. 
Q: The PSI Structural Genomics Knowledgebase (SGKB) makes the products of the PSI--the structures, methodologies, target information--available from an integrated website (kb.psi-structuralgenomics.org). Who do you envision will use this tool? What types of research can it enable? A: The PSI SGKB will bring together the resources developed by the PSI centers so that researchers across biomedical fields can access data, tools, materials, methods, and structures that will help them advance their own studies. Structure has only slowly been incorporated into the general scientific toolkit, and as several structural biologists have remarked, this is a cultural problem that must be corrected. Structure is so powerful that it should become a major tool for the broad scientific community. Q: How will community input, in terms of annotations and target suggestions, impact the PSI centers and the PSI SGKB "marketplace of ideas"? A: A major goal of the PSI and the PSI SGKB is to involve the broader scientific community in the PSI. The PSI is now at the end of the third year of the second 5-year phase and it is imperative to accelerate this process. Our focus right now is on making the products of the PSI easily available and known to the scientific community and to involve non-structural biologists in target selection decisions. I am confident that the PSI SGKB will play a dramatic role in making this happen. ---------------------------------------- STATEMENT OF SUPPORT The RCSB PDB is supported by funds from the National Science Foundation, the National Institute of General Medical Sciences, the Office of Science, Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Center for Research Resources, the National Institute of Biomedical Imaging and Bioengineering, the National Institute of Neurological Disorders and Stroke, and the National Institute of Diabetes & Digestive & Kidney Diseases. 
The RCSB PDB is managed by two partner sites of the Research Collaboratory for Structural Bioinformatics: RUTGERS Rutgers, The State University of New Jersey Department of Chemistry and Chemical Biology 610 Taylor Road Piscataway, NJ 08854-8087 SDSC/Skaggs/UCSD San Diego Supercomputer Center and the Skaggs School of Pharmacy and Pharmaceutical Sciences University of California, San Diego 9500 Gilman Drive La Jolla, CA 92093-0537 RCSB PDB LEADERSHIP TEAM Dr. Helen M. Berman - Director Rutgers University berman@rcsb.rutgers.edu Dr. Philip E. Bourne - Associate Director SDSC/Skaggs/UCSD bourne@sdsc.edu Dr. Martha Quesada - Deputy Director Rutgers University mquesada@rcsb.rutgers.edu A list of current RCSB PDB Team Members is available from the website. The RCSB PDB is a member of the Worldwide PDB (www.wwpdb.org) -------------------------------------------- SNAPSHOT July 1, 2008 51491 released atomic coordinate entries * Molecule Type 47526 proteins, peptides, and viruses 1870 nucleic acids 2062 protein/nucleic acid complexes 33 other * Experimental Technique 43855 X-ray 7355 NMR 182 electron microscopy 99 other 33017 structure factor files 4054 NMR restraint files