RCSB PDB Newsletter
Number 29 -- Spring 2006

Published quarterly by the Research Collaboratory for Structural Bioinformatics Protein Data Bank

Weekly RCSB PDB news is published at www.pdb.org

To change your subscription options, please visit lists.sdsc.edu/mailman/listinfo.cgi/rcsb-news

-----------------------------------------
TABLE OF CONTENTS

Message from the RCSB PDB

Data Deposition and Processing
  Deposition Statistics
  Validating Structures Saves Deposition Time

Data Query, Reporting, and Access
  PDB Statistics: Structures Solved by Multiple Methods
  Time-stamped Copies of PDB Archive Available via FTP
  Structural Genomics Tools and Portal Described in NAR Database Issue
  Website Statistics

Outreach and Education
  Protein Modeling at the NJ Science Olympiad
  PDB-in-a-Cave: Virtual Reality Environment
  Meetings and Exhibits
  RCSB PDB Focus: Frequently Asked Questions

PDB Education Corner: Making Viruses with Middle School Students

PDB Community Focus: Shri C. Jain, RCSB Protein Data Bank

Molecules of the Quarter

RCSB PDB Partners, Leadership Team, and Statement of Support

Snapshot

-----------------------------------------
MESSAGE FROM THE RCSB PDB

On December 30, 2005, the RCSB PDB upgraded www.pdb.org to the improved database and website that had been in beta testing since July 2004. In addition to enhanced navigation and more accurate searching and reporting, the new site has also brought significant performance enhancements, such as faster searching and page display.

The RCSB PDB website is accessed by about 100,000 unique visitors per month from nearly 140 different countries. More than 600 gigabytes of data are transferred each month. On a typical weekday, two pages from the site are viewed every second.

The new features of the site, which include enhanced searching, browsing, navigating, and reporting, have received significant usage.
For example, the options for exploring structures through browsers that navigate the PDB archives using classifications from Gene Ontology, EC nomenclature, source organism, disease, genome, SCOP, and CATH were used more than 6,000 times in January alone. The narrated Flash tutorial, which provides an introduction to using the new site, was viewed more than 11,000 times. The RCSB PDB thanks everyone who has contributed to the development of this new resource since testing. Questions about the transition to this site not addressed in the FAQ, which includes information about direct linking and downloading, should be sent to info@rcsb.org. -------------------------------------------- DATA DEPOSITION AND PROCESSING DEPOSITION STATISTICS In the first quarter of 2006, 1717 experimentally-determined structures were deposited to the PDB archive. The entries were processed by wwPDB teams at RCSB-Rutgers, MSD-EBI, and PDBj. Of the structures deposited in 2006, 71.5% were deposited with a release status of "hold until publication"; 17.9% were released as soon as annotation of the entry was complete; and 10.6% were held until a particular date. 84.1% of these entries were determined by X-ray crystallographic methods; 12.0% were determined by NMR methods. 83.3% of these depositions were deposited with experimental data. VALIDATING STRUCTURES SAVES DEPOSITION TIME To lower the number of revisions and problems found during the annotation process, depositors should validate their structure, provide the correct and complete sequence, and run BLAST (1). The Validation Server (deposit.pdb.org/validate) allows the user to check the format of coordinate and structure factor files, and to create a variety of validation reports about a structure. 
When the validation process is complete, users are presented with a validation report which includes an Atlas entry, a summary report, and a collection of structural diagnostics including bond distance and angle comparisons, torsion angle comparisons, base morphology comparisons (for nucleic acids), and an image of the molecule. Reports from MolProbity(2), PROCHECK(3), NUCheck, and SFCheck(4) are made available. Validating a structure using this tool helps authors spot possible errors in the structure prior to starting a deposition session. Structures can also be validated using ADIT (deposit.pdb.org/adit) by selecting the 'validate' option before proceeding to the 'deposit' option.

It is also important to provide the complete and correct sequence for polymers. The deposited sequence should include all residues in the crystal or NMR tube used for the experiment, including uncleaved His tags, cloning artifacts, and any residues missing from the coordinates due to lack of electron density or disorder. The one-letter code sequence should not conflict with the sequence from the coordinates.

Annotators perform a BLAST search to find a match for the deposited sequence with a sequence database reference. Based on these search results, sequence database records are generated in the PDB entry. If the deposited sequence is in conflict with the known database sequence, records are created in the entry with the proper explanation of the conflict. Running BLAST on the sequence prior to deposition can help authors find possible mismatches. If the polymer has engineered mutations, they should be mentioned in the "Molecule Details, Specific mutation" section of ADIT for proper annotation.

Providing the correct and complete sequence, running a BLAST search, and validating the structure help make the annotation process fast and easy, and each PDB entry complete and accurate.

References:
1 Wheeler, D.L., Barrett, T., Benson, D.A., Bryant, S.H., et al., (2005).
Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 33, D39-45. 2 Davis, I.W., Murray, L.W., Richardson, J.S., and Richardson, D.C. (2004). MOLPROBITY: structure validation and all-atom contact analysis for nucleic acids and their complexes. Nucleic Acids Res 32, W615-619. 3 Laskowski, R.A., Rullmann, J.A., MacArthur, M.W., Kaptein, R., and Thornton, J.M. (1996). AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR 8, 477-486. 4 Vaguine, A.A., Richelle, J., and Wodak, S.J. (1999). SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Crystallogr D Biol Crystallogr 55, 191-205. -------------------------------------------- DATA QUERY, REPORTING, AND ACCESS PDB STATISTICS: STRUCTURES SOLVED BY MULTIPLE METHODS As the PDB archive grows larger, there is an overlap of data generated by different experimental methods. The RCSB PDB provides a statistics table to highlight structures solved by multiple techniques at www.rcsb.org/pdb/clusterExpMethods.do. This list of proteins solved by multiple experimental methods was generated by clustering PDB protein chains of at least 100 amino acids into clusters of greater than 95% sequence similarity, and then listing only those clusters which contain at least one structure solved by one method (e.g. X-ray) and at least one structure solved by a different method (e.g. NMR). The clustering was done using CD-HIT (bioinformatics.org/cd-hit). TIME-STAMPED COPIES OF THE PDB ARCHIVE AVAILABLE VIA FTP Time-stamped yearly snapshots of the PDB Archive are available from ftp://snapshots.rcsb.org/. It is hoped that these snapshots will provide readily identifiable data sets for research using PDB data. The directory 20060103, which contains the exact and complete contents of the FTP archive as it appeared on January 3, 2006, has been added. 
It includes the 34,421 experimentally-determined coordinate files that were current (i.e. not obsolete) as of January 3, 2006. It joins the directory 20050106, which contains the frozen contents of the FTP archive as of January 6, 2005. A script is available at ftp://snapshots.rcsb.org/rsyncSnapshots.sh to download all or part of a yearly snapshot via rsync. Other scripts are provided to assist in the automated download of current data from the production ftp site. These can be used to make copies of the current archive (ftp://ftp.rcsb.org/pub/pdb/software/getPdbStructures.pl), portions of the archive (ftp://ftp.rcsb.org/pub/pdb/software/rsyncPDB.sh), or weekly updates (ftp://ftp.rcsb.org/pub/pdb/software/getPdbUpdate.pl). STRUCTURAL GENOMICS TOOLS AND PORTAL DESCRIBED IN NUCLEIC ACIDS RESEARCH DATABASE ISSUE "The RCSB PDB information portal for structural genomics" has been published in the latest issue of Nucleic Acids Research. The article describes the online tools, summary reports, and target information related to structural genomics from a new information portal at sg.pdb.org. From this site, information and links are provided for the structural genomics initiatives located worldwide, including reports for each center that provide target lists, target status progress, targets in the PDB, and sequence redundancy analyses. Databases that track the progress of protein studies are available. TargetDB contains information about the progress of the production and solution of structures. PepcDB extends the content of TargetDB with status history, stop conditions, reusable text protocols and contact information collected from the PSI Centers. A tool is also provided to explore the distributions of functions found among structural genomics structures, PDB structures, genomes, and homology models. This functional coverage can be examined according to Enzyme Classification, Gene Ontology (Biological Process, Cell Component, or Molecular Function) and Disease. 
The abstract and full text of the article are also available from the Nucleic Acids Research website.

Andrei Kouranov, Lei Xie, Joanna de la Cruz, Li Chen, John Westbrook, Philip E. Bourne and Helen M. Berman
The RCSB PDB information portal for structural genomics
Nucleic Acids Research, 2006, Vol. 34, Database issue D302-D305

WEBSITE STATISTICS

Access statistics for www.pdb.org are given below for the first quarter of 2006.

MONTH   UNIQUE VISITORS   # OF VISITS     PAGES       HITS   BANDWIDTH
JAN               99418         22650   7367396   26282033   629.20 GB
FEB              103250        225487   5038123   24020357   725.90 GB
MAR              124000        282394   7523375   31401960   627.16 GB

--------------------------------------------
OUTREACH AND EDUCATION

PROTEIN MODELING AT THE NJ SCIENCE OLYMPIAD

Several high school teams competed in the protein modeling trial event at the New Jersey Northern Regional and State Science Olympiads.

At the start of the Olympiad, the student teams submitted a previously prepared model and a written description of the TATA-Binding Protein created using resources available from the RCSB PDB. Then, each team built a model at the competition and answered written questions about the structure, function, importance, and history of the protein. In a hushed room, the teams used the Molecule of the Month feature on Designer Proteins, the PDB coordinate file, and the structure's primary citation to complete the challenge at the regional competition. At the state final, the teams built cholera toxins. The three-dimensional protein models are built using Mini-Toober kits provided by the RCSB PDB.

The students all submitted great structures, with awards going to Montville Township High School (First Place), Freehold High School (Second Place), and Alghazly High School (Third Place) at the regional level.
In an extremely close competition at the state competition, East Brunswick High School (First Place), The Lawrenceville School (Second Place), and Montgomery High School (Third Place) created very strong models. Special thanks to our judges from the RCSB PDB (Kyle Burkhardt, Cathy Lawson, Jeramia Ory, Irina Persikova, Massy Rajabzadeh, Monica Sundd, and Jasmine Young), the NJ Science Olympiad organizers, and to the MSOE Center for BioMolecular Modeling for the design of this event. Questions about the NJ Science Olympiad Protein Modeling trial event should be sent to buildmodels@rcsb.rutgers.edu. Information is also available at education.pdb.org/olympiad. PDB-IN-A-CAVE: VIRTUAL REALITY ENVIRONMENT HIGHLIGHTS PDB STRUCTURES To offer a new way of looking at molecular structure, the RCSB PDB and CalIT2 have released the first version of CAVE (Cave Automatic Virtual Environment) software for visualizing 3D macromolecular structures in an immersive, virtual reality environment. The CAVE offers a room-sized space for users to interact with high-resolution video. Wearing stereoglasses, the viewer can move through and around a structure that is projected in the CAVE. The new software has an interface that makes a connection to the RCSB PDB web site to download and display files. In addition, the software also supports visualization of structure motions by supporting multiple file loading. For example, users may visualize structure motions from the Database of Molecular Movements (molmovdb.mbb.yale.edu/molmovdb). The software, built on the COVISE platform, runs in the CAVE in both single user and collaborative modes. With over 100 CAVEs around the world, users may now download and visualize any PDB structure while in this unique environment. 

MEETINGS AND EXHIBITS

* Biophysical Society Annual Meeting: Wolfgang Bluhm and Jeramia Ory met with users and provided demonstrations of the new site at the RCSB PDB's exhibit booth at the 50th Annual Meeting of the Biophysical Society (February 18-22, 2006 in Salt Lake City, Utah).

* Experimental Biology: Demonstrations were also available at the American Society for Biochemistry and Molecular Biology's (ASBMB) Annual Meeting (April 1-5, San Francisco, CA), which was held in conjunction with the Experimental Biology conference (EB). In addition to exhibiting, Shuchismita Dutta presented "Educational Resources for Structural Biology at the RCSB Protein Data Bank" as a talk and poster. EB is sponsored by the American Association of Anatomists, American Physiological Society, ASBMB, American Society for Investigative Pathology, American Society for Nutrition, Inc., and American Society for Pharmacology and Experimental Therapeutics.

* Virginia Tech Structural Biology Symposium: Images from the RCSB PDB's Art of Science exhibit were on display at Virginia Polytechnic Institute and State University as part of their Structural Biology Symposium (March 31 - April 1, Blacksburg, VA). Attendees also explored protein structures in the "PDB-in-a-CAVE" environment.

RCSB PDB FOCUS: FREQUENTLY ASKED QUESTIONS

The Frequently Asked Questions page answers a number of common queries about the new RCSB PDB site, including "How do I link to a Structure Summary page for a PDB ID?" and "What are the URLs to download files?"

Questions about searching, reporting, and using the resources available from the RCSB PDB that are not answered on this page should be sent to info@rcsb.org. For deposition-related queries, please see the Deposition FAQ or contact us at deposit@deposit.rcsb.org.
FAQ: www.rcsb.org/pdb/static.do?p=home/faq.html
Deposit FAQ: deposit.rcsb.org/depoinfo/depofaq.html

--------------------------------------------
PDB EDUCATION CORNER: MAKING VIRUSES WITH MIDDLE SCHOOL STUDENTS

The RCSB PDB was a part of Princeton University's Science and Engineering Expo (SEE) on Thursday, March 17. More than 900 students from 11 area middle schools participated in a variety of demonstrations and hands-on activities that ranged from crawling exotic bugs to sheep brain dissection. The goal of the event was to expose these students, who are at an age where science may start to lose its appeal, to a vibrant look at science and technology.

We wanted to demonstrate symmetry in protein structures, and selected a protein structure highlighted in the Molecule of the Month series that the students could relate to: the virus. By generating their own models of a virus structure, the students were able to understand that many virus structures, such as that of the common cold virus, are icosahedral. And since the structure is formed by repetition of the same protein (or group of proteins), it is a relatively simple and stable structure.

Armed with Molecule of the Month printouts, posters, and two laptops, several annotators were able to help students create their viruses. Students had their choice of an edible virus, made out of toothpicks and marshmallows, or a paper virus that could be decorated with pens.

The marshmallow virus was built by first creating a triangular face with 3 marshmallows and 3 toothpicks. It was then joined by 19 other faces to form the final structure. Using paper cut from a template, students artfully colored their virus papers, and then folded and glued them to make a three-dimensional form.

The kids were very enthusiastic about their creations, and walked around the event showing them off. Their teachers also responded to the activity, and appreciated how it combined biology with geometry.
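
For readers who want to repeat the marshmallow activity, the number of parts needed can be worked out from the 20 triangular faces described above. The short Python sketch below is only an illustration: the function and constant names are made up for this example, and it assumes a regular icosahedron in which five faces meet at every corner and two faces share every edge.

```python
# Counting the parts of a marshmallow-and-toothpick icosahedron.
# All names here are illustrative; they are not part of any RCSB PDB tool.

FACES = 20             # one starting triangle plus 19 more
CORNERS_PER_FACE = 3   # marshmallows on each triangle
SIDES_PER_FACE = 3     # toothpicks on each triangle
FACES_PER_CORNER = 5   # assumption: regular icosahedron, 5 faces meet at each corner
FACES_PER_EDGE = 2     # every toothpick is shared by two neighboring faces


def icosahedron_parts():
    """Return (marshmallows, toothpicks) once shared parts are counted only once."""
    marshmallows = FACES * CORNERS_PER_FACE // FACES_PER_CORNER
    toothpicks = FACES * SIDES_PER_FACE // FACES_PER_EDGE
    return marshmallows, toothpicks


marshmallows, toothpicks = icosahedron_parts()

# Euler's formula for a convex polyhedron: vertices - edges + faces = 2
euler = marshmallows - toothpicks + FACES

print(marshmallows, toothpicks, euler)  # 12 30 2
```

Although each face on its own uses 3 marshmallows and 3 toothpicks, the finished model needs only 12 marshmallows and 30 toothpicks, because neighboring faces share corners and edges.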

SEE was organized by the Department of Molecular Biology Outreach at Princeton.

--------------------------------------------
PDB COMMUNITY FOCUS: SHRI C. JAIN, RCSB PROTEIN DATA BANK

Shri C. Jain has been annotating structures for the PDB and the NDB since 1997. He has trained more than 25 members of the annotation staff and has annotated over one thousand structures.

After receiving his PhD from the University of Poona, India, Dr. Jain taught at the Indian Institute of Technology Bombay, India for about two and a half years and then continued his research at the University of Rochester, where he was a pioneer in the study of nucleic acid drug complexes. The focus of his X-ray crystallographic work was to study and understand how drugs, including intercalating compounds such as ethidium bromide, ellipticine, and acridine, bind to nucleic acids. During that period, he solved the structure of the first actinomycin nucleoside complex(1). This work was presented at the Cold Spring Harbor Symposium in 1971; the Protein Data Bank was founded at the same meeting.

In 1992, he came to Rutgers, The State University of New Jersey to join Dr. Berman's research group to continue structural studies on subtilisin and drug-nucleic acid complexes. In March 2006, he retired after nearly 15 years of working at Rutgers.

Q. You came to the PDB as a structural biologist. Tell us about your research.

A. At Rochester, I solved structures of several drug-nucleic acid complexes in which drugs such as ethidium bromide intercalate between the dinucleotide base pairs. The study of the actinomycin-deoxyguanosine complex shed light on how it interacts with DNA. At Rutgers, I was involved with structural studies of the subtilisin-propeptide complex, actinomycin-nucleic acid complex, and alpha neurotoxin.

Q. You've been annotating structures since 1997. Has it changed the way that you look at molecular structures? What types of challenges has annotation presented?

A.
When I first started annotating nucleic acid structures for the NDB in 1997, we were processing structures pretty much manually with the program MAXIT(2). We did not have the data processing tools that we have now. A lot of automation has been incorporated since then to speed up the processing time with increased accuracy. The validation report that we send to depositors helps them evaluate their structure more closely so they can correct errors, if needed. We still visualize every structure with graphics programs such as RasMol(3, 4), Molly (an internal program), or Chimera(5). There has been a significant increase in the size and complexity of the structures that are being studied by researchers and deposited with the PDB. Some of the very large structures, such as an icosahedral virus particle, have their own special beauty.

Q. What has been the most challenging/most rewarding thing about annotation?

A. Most of the time, annotation has been great fun. We get to see the latest trends in structural studies as these structures are deposited with the PDB. But more importantly, it gives a feeling of pride in being instrumental in providing an important service to the scientific community by processing structures accurately and consistently. It is not, however, as much fun when depositors do a poor job in deposition, or when an annotator is processing a huge complex structure in which a large part of the structure is defined as unknown, several chains occupy alternate locations, or the structure has a large number of new ligands.

Q. How have the annotation tools changed over the years?

A. Improvement in annotation tools is an ongoing process. There is a continuous demand for improved automation in data processing. The annotation tools available now have helped make processing structures easier and have minimized errors. Many diagnostic reports are created during the annotation process to continually warn us about what needs to be corrected.
These tools and features, including the automatic creation of sequence database records, a ligand tool to process and add new ligands, and various scripts and MAXIT options, have been of great help. Other tools, such as the one that generates reminder letters to be sent to depositors, another that assists in updating structures on hold for publication, and all of the tools created to help prepare structures for public release, have made the annotation process much easier.

Q. Do you have any advice for future annotators?

A. New annotators should pay close attention during training time and not be bashful in asking questions. Spend an hour or two every day skimming over the data processing guide during training. Also review the PDB format guide to familiarize yourself with the record format. New annotators need to be meticulous in record keeping. It will be very helpful to keep up with the status sheet that helps track your workload. Don't be afraid to ask other members of the annotation staff for help or clarification. Enjoy processing structures; it's fun to use graphics programs to look at all of the different structures with their various shapes.

Q. What will you miss about the PDB after retirement?

A. Now that I am retiring, I will perhaps sometimes miss the daily grind of structure processing. Honestly, I think what I will miss the most are all of the people that I have had an opportunity to work with for many years. It has been a great time, and I am leaving with a sense of great pride that I had the opportunity to train so many annotators.

References:
(1) Sobell, H.M., Jain, S.C., Sakore, T.D., and Ponticello, G. (1972). Concerning the stereochemistry of actinomycin binding to DNA: an actinomycin-deoxyguanosine crystalline complex. Cold Spring Harb Symp Quant Biol 36, 263-270.
(2) Feng, Z., Hsieh, S.-H., Gelbin, A., and Westbrook, J. (1998). MAXIT: Macromolecular Exchange and Input Tool, Rutgers University, New Brunswick, NJ.
(3) Sayle, R., and Milner-White, E.J. (1995). RasMol: biomolecular graphics for all. Trends Biochem. Sci. 20, 374.
(4) Bernstein, H.J. (2000). Recent changes to RasMol, recombining the variants. Trends Biochem. Sci. 25, 453-455.
(5) Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., and Ferrin, T.E. (2004). UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem 25, 1605-1612.

--------------------------------------------
MOLECULES OF THE QUARTER

The Molecule of the Month series explores the functions and significance of selected biological macromolecules for a general audience. The molecules featured this quarter were topoisomerases, alpha-amylase, and tissue factor. All Molecule of the Month features are accessible from the RCSB PDB home page.

----------------------------------------
STATEMENT OF SUPPORT

The RCSB PDB is supported by funds from the National Science Foundation, the National Institute of General Medical Sciences, the Office of Science, Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Center for Research Resources, the National Institute of Biomedical Imaging and Bioengineering, and the National Institute of Neurological Disorders and Stroke.

The RCSB PDB is managed by two partner sites of the Research Collaboratory for Structural Bioinformatics:

RUTGERS
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
610 Taylor Road
Piscataway, NJ 08854-8087

SDSC/UCSD
San Diego Supercomputer Center
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093-0537

The overall operation of the PDB is managed by the RCSB PDB Leadership Team. Technical and scientific support is provided by the RCSB PDB Members.

RCSB PDB LEADERSHIP TEAM

Dr. Helen M. Berman - Director
Rutgers University
berman@rcsb.rutgers.edu

Dr. Philip E.
Bourne - Co-Director
SDSC/UCSD
bourne@sdsc.edu

A list of current RCSB PDB Team Members is available from the website.

The RCSB PDB is a member of the Worldwide PDB (www.wwpdb.org)

-----------------------------------------
SNAPSHOT -- April 1, 2006

35,813 released atomic coordinate entries

* Molecule Type
  32,724 proteins, peptides, and viruses
  1,585 nucleic acids
  1,471 protein/nucleic acid complexes
  33 other

* Experimental Technique
  30,306 diffraction and other
  5,314 NMR

19,949 structure factor files
2,898 NMR restraint files
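
As a quick check on the snapshot figures, the short Python sketch below adds up the four molecule-type counts and converts each to a share of the released entries. The counts are copied from the snapshot above; the variable and function names are illustrative only and do not correspond to any RCSB PDB software.

```python
# Molecule-type counts as listed in the April 1, 2006 snapshot.
molecule_types = {
    "proteins, peptides, and viruses": 32724,
    "nucleic acids": 1585,
    "protein/nucleic acid complexes": 1471,
    "other": 33,
}

RELEASED_ENTRIES = 35813  # total released atomic coordinate entries in the snapshot


def shares(counts):
    """Return each count as a percentage of the overall total, to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total = sum(molecule_types.values())
percentages = shares(molecule_types)

print(total == RELEASED_ENTRIES)  # True: the four categories add up to 35,813
for name, pct in percentages.items():
    print(f"{pct:5.1f}%  {name}")
```

The molecule-type counts account for every released entry in the snapshot, with proteins, peptides, and viruses making up the large majority.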